Compare commits

...

1009 Commits

Author SHA1 Message Date
Chad Versace
8cb2b5c37a CHROMIUM: i965: Implement EGL_KHR_mutable_render_buffer
Tested with a low-latency handwriting application on Android Nougat on
the Chrome OS Pixelbook (codename Eve) with Kabylake.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: Ia816fa6b0a1158f81e5b63477451bf337c2001aa
2018-07-30 10:50:41 -07:00
Chad Versace
6158e78240 CHROMIUM: egl/android: Implement EGL_KHR_mutable_render_buffer
Specifically, implement the extension DRI_MutableRenderBufferLoader.
However, the loader enables EGL_KHR_mutable_render_buffer only if the
DRI driver implements its half of the extension,
DRI_MutableRenderBufferDriver.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: I7fe68a5a674d1707b1e7251d900b3affd5dd7660
2018-07-30 10:50:41 -07:00
Chad Versace
c689a23c77 CHROMIUM: egl/main: Add bits for EGL_KHR_mutable_render_buffer
A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
This patch is separate from the Android patch because I think it's
easier to review the platform-independent bits separately.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: I07470f2862796611b141f69f47f935b97b0e04a1
2018-07-30 10:50:41 -07:00
Chad Versace
0cbe8ad97d CHROMIUM: dri: Add param driCreateConfigs(mutable_render_buffer)
If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.

Not used yet.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: Icdf35794f3e9adf31e1f85740b87ce155efe1491
2018-07-30 10:50:41 -07:00
Chad Versace
f57e2bf44e CHROMIUM: dri: Define DRI_MutableRenderBuffer extensions
Define extensions DRI_MutableRenderBufferDriver and
DRI_MutableRenderBufferLoader. These are the two halves for
EGL_KHR_mutable_render_buffer.

Outside the DRI code there is one additional change.  Add
gl_config::mutableRenderBuffer to match
__DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither are used yet.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: I4ca03d81e4557380b19c44d8d799a7cc9365d928
2018-07-30 10:50:40 -07:00
Chad Versace
3274c56585 CHROMIUM: egl/dri2: In dri2_make_current, return early on failure
This pulls an 'else' block into the function's main body, making the
code easier to follow.

Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
transforms dri2_make_current() into spaghetti.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: I26be2b7a8e78a162dcd867a44f62d6f48b9a8e4d
2018-07-30 09:32:54 -07:00
Chad Versace
0a957c2822 CHROMIUM: egl: Simplify queries for EGL_RENDER_BUFFER
There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
eglQuerySurface(EGL_RENDER_BUFFER) and
eglQueryContext(EGL_RENDER_BUFFER).

These changes eliminate potentially very fragile code in the upcoming
EGL_KHR_mutable_render_buffer implementation.

* eglQuerySurface(EGL_RENDER_BUFFER)

  The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
  abstruse logic which required comprehending the specification
  complexities of how the two EGL_RENDER_BUFFER states interact.  The
  function sometimes returned _EGLContext::WindowRenderBuffer, sometimes
  _EGLSurface::RenderBuffer. Why? The function tried to encode the
  actual logic from the EGL spec. When did the function return which
  variable? Go study the EGL spec, hope you understand it, then hope
  Mesa mutated the EGL_RENDER_BUFFER state in all the correct places.
  Have fun.

  To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
  confidence in its correctness, flatten its indirect logic. For pixmap
  and pbuffer surfaces, simply return a hard-coded literal value, as the
  spec suggests. For window surfaces, simply return
  _EGLSurface::RequestedRenderBuffer.  Nothing difficult here.

* eglQueryContext(EGL_RENDER_BUFFER)

  The implementation of this suffered from the same issues as
  eglQuerySurface, and the solution is the same.  confidence in its
  correctness, flatten its indirect logic. For pixmap and pbuffer
  surfaces, simply return a hard-coded literal value, as the spec
  suggests. For window surfaces, simply return
  _EGLSurface::ActiveRenderBuffer.

BUG=b:77899911
TEST=No android-cts-7.1 regressions on Eve.

Change-Id: Ic5f2ab1952f26a87081bc4f78bc7fa96734c8f2a
2018-07-30 09:18:28 -07:00
Chad Versace
6ac5d83a6c FROMLIST: gallium/auxiliary: Fix Autotools on Android (v2)
Problem 1: u_debug_stack_android.cpp transitively included
"pipe/p_compiler.h", but src/gallium/include was missing from the C++
include path.

Problem 2: Add -std=c++11 to AM_CXXFLAGS. Android's libbacktrace headers
require C++11, but the Android toolchain (at least in the Chrome OS SDK)
does not enable C++11 by default.

v2: Add -std=c++11.

Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Archived-At: https://lists.freedesktop.org/archives/mesa-dev/2018-July/200979.html
(am from https://patchwork.freedesktop.org/patch/240805/)

Fixes the build for betty-arcnext.

BUG=b:77235812
TEST=emerge-betty-arcnext arc-mesa

Change-Id: Idd30463f3941d7b119207aa5afd7e37e69b0a5ae
Reviewed-on: https://chromium-review.googlesource.com/1149449
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
2018-07-25 01:54:44 +00:00
Chad Versace
53ba2797ee FROMLIST: drisw: Fix build on Android Nougat
In commit cf54bd5e8, dri_sw_winsys.c began using <sys/shm.h> to support
the new functions putImageShm, getImageShm in DRI_SWRastLoader. But
Android began supporting System V shared memory only in Oreo. Nougat has
no shm headers.

Fix the build by ifdef'ing out the shm code on Nougat.

Fixes: cf54bd5e8 "drisw: use shared memory when possible"
Cc: Marc-André Lureau <marcandre.lureau@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Archived-At: https://lists.freedesktop.org/archives/mesa-dev/2018-July/200705.html
(am from https://patchwork.freedesktop.org/patch/240173/)

Fixes the build for betty.

BUG=b:77235812
TEST=emerge-betty arc-mesa

Change-Id: I5fc7214cd30fdea49b69cdc860cefd622466c7da
Reviewed-on: https://chromium-review.googlesource.com/1146074
Tested-by: Chad Versace <chadversary@chromium.org>
Trybot-Ready: Chad Versace <chadversary@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Ilja H. Friedel <ihf@chromium.org>
2018-07-21 16:51:26 +00:00
Lepton Wu
58cbc48eac UPSTREAM: virgl: Fix flush in virgl_encoder_inline_write.
The current code is buggy: if there are only 12 dwords left in cbuf,
we emit a zero data length command which will be rejected by virglrenderer.
Fix it by calling flush in this case.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry-picked from commit 04e278f793)

BUG=b:111521569

Change-Id: Ie952ece5e31ee94c5fb7e6ecb3e1de807a21763b
Reviewed-on: https://chromium-review.googlesource.com/1140419
Tested-by: Lepton Wu <lepton@chromium.org>
Trybot-Ready: Lepton Wu <lepton@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Commit-Queue: Lepton Wu <lepton@chromium.org>
2018-07-17 23:25:10 +00:00
Lepton Wu
c884f74c5f CHROMIUM: platform_android: use general fallback.
Instead of handling software fallback inside platform_android, just
let EGL framework to handle it. With this, we still can fall back
to software driver when hardware driver actually doesn't work.

BUG=b:77302150
TEST=manual - make sure betty still boot without virgl driver.

Change-Id: I5d514f67c9dc6f68661e03fd9fc9546acd7277bd
Reviewed-on: https://chromium-review.googlesource.com/1004006
Commit-Ready: Lepton Wu <lepton@chromium.org>
Tested-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
(cherry picked from commit b16fbdb135)
[chadversary: Simplify by dropping unneeded 'swrast' param.]

BUG=b:77235812
TEST=emerge-grunt arc-mesa; emerge-eve arc-mesa

Change-Id: Ib896c3f16dcd5fdddc0a7eb9e72caa7f2037c2f3
Reviewed-on: https://chromium-review.googlesource.com/1105708
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:17:24 +00:00
Emil Velikov
ca6dc3b7f4 FROMLIST: egl/android: remove HAL_PIXEL_FORMAT_BGRA_8888 support
As said in the EGL_KHR_platform_android extensions

    For each EGLConfig that belongs to the Android platform, the
    EGL_NATIVE_VISUAL_ID attribute is an Android window format, such as
    WINDOW_FORMAT_RGBA_8888.

Although it should be applicable overall.

Even though we use HAL_PIXEL_FORMAT here, those are numerically
identical to the  WINDOW_FORMAT_ and AHARDWAREBUFFER_FORMAT_ ones.

Barring the said format of course. That one is only listed in HAL.

Keep in mind that even if we try to use the said format, you'll get
caught by droid_create_surface(). The function compares the format of
the underlying window, against the NATIVE_VISUAL_ID of the config.

Unfortunatelly it only prints a warning, rather than error out, likely
leading to visual corruption.

While SDL will even call ANativeWindow_setBuffersGeometry() with the
wrong format, and conviniently ignore the [expected] failure.

Cc: mesa-stable@lists.freedesktop.org
Cc: Chad Versace <chadversary@google.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tomasz Figa <tfiga@chromium.org>
(am from https://patchwork.freedesktop.org/patch/166176/)
(tfiga: Remove only respective EGL config, leave EGL image as is.)

BUG=b:33533853
TEST=dEQP-EGL.functional.*.rgba8888_window tests pass on eve

Change-Id: I8eacfe852ede88b24c1a45bff1445aacd86f6992
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/582263
Reviewed-by: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 42c96d393b)

BUG=b:77235812
TEST=emerge-grunt arc-mesa; emerge-eve arc-mesa

Change-Id: I74e91a3621fcc9c80633cf512a8d08782ac6e1d7
Reviewed-on: https://chromium-review.googlesource.com/1105707
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:17:23 +00:00
Stéphane Marchesin
f7f64ad0ac CHROMIUM: egl/android: Support opening render nodes from within EGL
This patch adds support for opening render nodes directly from within
display initialization, Instead of relying on private interfaces
provided by gralloc.

In addition to having better separation from gralloc and being able to
use different render nodes for allocation and rendering, this also fixes
problems encountered when using the same DRI FD for gralloc and Mesa,
when both stepped each over another because of shared GEM handle
namespace.

BUG=b:29036398
TEST=No significant regressions in dEQP inside the container

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/367215
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
(cherry picked from commit 1171bddb74)

BUG=b:77235812
TEST=emerge-grunt arc-mesa; emerge-eve arc-mesa

Change-Id: Ib76d2f02077e3f679ca86d31fc999bff3a46edf0
Reviewed-on: https://chromium-review.googlesource.com/1105706
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:17:22 +00:00
Tapani Pälli
97636c85f9 FROMLIST: glcpp: Hack to handle expressions in #line directives.
GLSL ES 320 technically allows #line to have arbitrary expression trees
rather than integer literal constants, unlike the C and C++ preprocessor.
This is likely a completely unused feature that does not make sense.

However, Android irritatingly mandates this useless behavior, so this
patch implements a hack to try and support it.

We handle a single expression:

    #line <line number expression>

but we avoid handling the double expression:

    #line <line number expression> <source string expression>

because this is an ambiguous grammar.  Instead, we handle the case that
wraps both in parenthesis, which is actually well defined:

    #line (<line number expression>) (<source string expression>)

With this change following tests pass:

   dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_vertex
   dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_fragment
   dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_vertex
   dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_fragment

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

BUG=b:33352633
BUG=b:33247335
TEST=affected tests passing on CTS 7.1_r1 sentry

Reviewed-on: https://chromium-review.googlesource.com/427305
Tested-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Commit-Queue: Haixia Shi <hshi@chromium.org>
Trybot-Ready: Haixia Shi <hshi@chromium.org>
(cherry picked from commit 5ec83db3fc)

BUG=b:77235812
TEST=emerge-grunt arc-mesa; emerge-eve arc-mesa

Change-Id: Ie12a03e7642ac4f5d79e0b6b20e16d7653c3395c
Reviewed-on: https://chromium-review.googlesource.com/1105705
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:17:21 +00:00
Stéphane Marchesin
f408dc0e46 CHROMIUM: glsl: Avoid crash when overflowing the samplers array
Fixes a crash when we have too many samplers.

BUG=chromium:141901
TEST=by hand

Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
(cherry picked from commit 403ab71152)

BUG=b:77235812
TEST=emerge-grunt arc-mesa; emerge-eve arc-mesa

Change-Id: I8ffebdae3bdab68da4277193fe367959ab719796
Reviewed-on: https://chromium-review.googlesource.com/1105702
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:17:20 +00:00
Chad Versace
15656ccb9e UPSTREAM: anv/android: Fix type error in call to vk_errorf()
In a single call to vk_errorf() in the Android code, the arguments were
swapped. The bug has existed since day one. Chrome OS used to forgive
the warning, but it is now a compilation error.

CC: <mesa-stable@lists.freedesktop.org>
Fixes: 053d4c32 "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit be5fc0d7f1)

BUG=b:77235812
TEST=emerge-eve arc-mesa

Change-Id: I0dde2e53eeb09af34d629b17d0692f0e00151fdf
Reviewed-on: https://chromium-review.googlesource.com/1133766
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
2018-07-12 18:13:03 +00:00
Chad Versace
09a9aafa81 UPSTREAM: anv/android: Fix Autotools build for VK_ANDROID_native_buffer
Changes to vk.xml and anv_entrypoints_gen.py broke the Autotools build
on Android. The changes undef'd the VK_ANDROID_native_buffer entrypoints
in anv_entrypoints.h.

Fix it with CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR.

CC: <mesa-stable@lists.freedesktop.org>
See-Also: 63525ba7 "android: enable VK_ANDROID_native_buffer"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 8e403bc959)

BUG=b:77235812
TEST=none (because anv doesn't build yet)

Change-Id: Ic3b8dbeab7ee8663ca6dcca0401dc6340dddf832
Reviewed-on: https://chromium-review.googlesource.com/1133765
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
2018-07-12 18:13:02 +00:00
Stéphane Marchesin
d346f5b8bd CHROMIUM: st/mesa: Do not flush front buffer on context flush
Make gallium work again with new chrome.

BUG=none
TEST=compile

Signed-off-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Prince Agyeman <prince.agyeman@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
(cherry picked from commit 8940a624ae)

BUG=b:77235812
TEST=emerge-grunt arc-mesa

Change-Id: I450f99e8e9c83a55626a28b80d5ed4ce884c8298
Reviewed-on: https://chromium-review.googlesource.com/1105701
Commit-Queue: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:13:02 +00:00
Chad Versace
ab8ac2c3e9 UPSTREAM: gallium: Fix automake for Android (v2)
Chromium OS uses Autotools and pkg-config when building Mesa for
Android. The gallium drivers were failing to find the headers and
libraries for zlib and Android's libbacktrace.

v2:
  - Don't add a check for zlib.pc. configure.ac already checks for
    zlib.pc elsewhere. [for tfiga]
  - Check for backtrace.pc separately from the other Android libs.
    [for tfiga]

Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit dc6665422a)

BUG=b:79872606
TEST=emerge-grunt arc-mesa
CQ-DEPEND=CL:1105634

Change-Id: If2e7fd953786723d6b7194f9174f5856a55c3b14
Reviewed-on: https://chromium-review.googlesource.com/1105700
Tested-by: Chad Versace <chadversary@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2018-07-12 18:13:00 +00:00
Tomasz Figa
d069832666 CHROMIUM: Add PRESUBMIT.cfg to disable various checks
This makes it so that we don't need to run 'repo upload --no-verify'.

BUG=b:26574868
TEST=ran upload on some CLs

Change-Id: I0b22a5ee0321dc454affeefcfac0eee75490bd6e
Reviewed-on: https://chromium-review.googlesource.com/780587
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Commit-Queue: Chad Versace <chadversary@chromium.org>
(cherry picked from commit 7cc5c96a9a)

BUG=b:77235812
TEST=none

Change-Id: Ia4285d3e178787874f439b76aaf8717f50fd7df2
Reviewed-on: https://chromium-review.googlesource.com/1105699
Reviewed-by: Bas Nieuwenhuizen <basni@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Commit-Queue: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Trybot-Ready: Ilja H. Friedel <ihf@chromium.org>
2018-06-22 23:38:47 +00:00
Timothy Arceri
a9114b5e3e util: add allow_glsl_relaxed_es to drirc for Google Earth VR
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-19 12:09:56 +10:00
Timothy Arceri
725b1a406d mesa/util: add allow_glsl_relaxed_es driconfig override
This relaxes a number of ES shader restrictions allowing shaders
to follow more desktop GLSL like rules.

This initial implementation relaxes the following:

 - allows linking ES shaders with desktop shaders
 - allows mismatching precision qualifiers
 - always enables standard derivative builtins

These relaxations allow Google Earth VR shaders to compile.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-19 12:09:56 +10:00
Timothy Arceri
781c23ece6 util: add allow_glsl_builtin_const_expression to drirc for Google Earth VR
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-19 12:09:56 +10:00
Timothy Arceri
90dbab0f9a mesa/util: add allow_glsl_builtin_const_expression driconf override
Google Earth VR shaders uses builtins in constant expressions with
GLSL 1.10. That feature wasn't allowed until GLSL 1.20.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-19 12:09:56 +10:00
Timothy Arceri
de93f546a7 util: manually extract the program name from program_invocation_name
Glibc has the same code to get program_invocation_short_name. However
for some reason the short name gets mangled for some wine apps.

For example with Google Earth VR I get:

program_invocation_name:
"/home/tarceri/.local/share/Steam/steamapps/common/EarthVR/Earth.exe"

program_invocation_short_name:
"e"

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-19 12:09:56 +10:00
Bas Nieuwenhuizen
1a8501a9dd ac/surface: Set compressZ for stencil-only surfaces.
We HTILE compress stencil-only surfaces too.

CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-06-19 02:52:01 +02:00
Jason Ekstrand
0146d79636 anv: Use a single global API patch version
The Vulkan API has only one patch version shared among all of the
major.minor versions.  We should also advertise the same patch version
regardless of major.minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106941
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-18 17:11:52 -07:00
Timothy Arceri
68bf94a8b0 radeonsi: enable OpenGL 3.3 compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-06-19 09:21:33 +10:00
Timothy Arceri
89a5d6f715 mesa: add ff fragment shader support for geom and tess shaders
This is required for compatibility profile support.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-06-19 09:21:33 +10:00
Eric Anholt
e636199c1c v3d: Set the SO offsets correctly if we have to re-emit.
This should fix TF across a glFlush() or TF pause/restart.  Fixes
dEQP-GLES3.functional.transform_feedback.array.interleaved.lines.highp_float
and many, many others.
2018-06-18 14:54:16 -07:00
Marek Olšák
94178044d5 gallium/hud: = should rename the last added data source
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-18 17:53:15 -04:00
Rafael Antognolli
ba2c18763b anv: Disable constant buffer 0 being relative.
If we are on gen8+ and have context isolation support, just make that
constant buffer address be absolute, so we can use it for push UBOs too.

v2: Do not duplicate constant_buffer_0_is_relative flag (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-18 14:41:38 -07:00
Rafael Antognolli
be18d5a0ce anv/device: Check for kernel support of context isolation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-18 14:41:38 -07:00
Rafael Antognolli
056214ebfc intel/genxml: Add bitmasks for CS_DEBUG_MODE2/INSTPM.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-18 14:41:38 -07:00
Alok Hota
a678f40e46 swr/rast: Clang-Format most rasterizer source code
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-06-18 13:57:38 -05:00
Eric Engestrom
d85fef1e34 radv: fix reported number of available VGPRs
It's a bit late to round up after an integer division.

Fixes: de88979413 "radv: Implement VK_AMD_shader_info"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
2018-06-18 17:08:22 +01:00
Eric Engestrom
9a4bd6b45f mesa: add missing return in error path
Fixes: 67f40dadaa "mesa: add support for ARB_sample_locations"
Cc: Rhys Perry <pendingchaos02@gmail.com>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-06-18 16:19:48 +01:00
Bas Nieuwenhuizen
a3d93eec7c radv: Use less conservative approximation for context rolls.
Drops the number of time we set the scissor by 4x for F1 2017,
which results in a consistent performance improvement of about 4%.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-18 16:21:10 +02:00
Eric Engestrom
4d08c1e7d1 radv: fix bitwise check
Fixes: 922cd38172 "radv: implement out-of-order rasterization when it's safe on VI+"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-18 12:15:18 +01:00
Eric Engestrom
e8eb84826e meson: fix i965/anv/isl genX static lib names
Shouldn't make any functional difference, just that `liblibanv_gen90.a`
will now be called `libanv_gen90.a`.

Fixes: 3218056e0e "meson: Build i965 and dri stack"
Fixes: d1992255bb "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-18 12:03:24 +01:00
Timothy Arceri
66673bef94 mesa: Unconditionally enable floating-point textures
ARB_texture_float references US Patent #6,650,327 [1] which has a filing date
of June 16 1998.

According to [2], patents filed after 1995 expire 20 years from the filing
date, giving an expiration of June 17 2018.

[1] https://www.google.com/patents/US6650327
[2] https://en.wikipedia.org/wiki/Term_of_patent_in_the_United_States

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-18 09:29:38 +10:00
Jose Maria Casanova Crespo
b8e099e7d5 intel/fs: shuffle_64bit_data_for_32bit_write is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a4965842d6 intel/fs: Use new shuffle_32bit_write for all 64-bit storage writes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a4d445b93c intel/fs: shuffle_32bit_load_result_to_64bit_data is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
71b319a285 intel/fs: Use shuffle_from_32bit_read for 64-bit FS load_input
As the previous use of shuffle_32bit_load_result_to_64bit_data
had a source/destination overlap for 64-bit. Now a temporary destination
is used for 64-bit cases to use shuffle_from_32bit_read that doesn't
handle src/dst overlaps.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
8003ae87f4 intel/fs: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES
Previously, the shuffle function had a source/destination overlap that
needs to be avoided to use shuffle_from_32bit_read. As we can use for
the shuffle destination the destination of removed MOVs.

This change also avoids the internal MOVs done by the previous shuffle
to deal with possible overlaps.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
5565630f85 intel/fs: Use shuffle_from_32bit_read at VS load_input
shuffle_from_32bit_read manages 32-bit reads to 32-bit destination
in the same way that the previous loop so now we just call the new
function for all bitsizes, simplifying also the 64-bit load_input.

v2: Add comment about future 16-bit support (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
152bffb69b intel/fs: Use shuffle_from_32bit_read for 64-bit gs_input_load
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle, to avoid cases of
src/dst overlap but this is not the case. And the removed MOV
was already being being done in the shuffle.

Copy propagation wasn't able to remove them because shuffle
destination values are defined with partial writes because they
have stride == 2.

v2: Reword commit log summary (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
8b26a2d96d intel/fs: shuffle_from_32bit_read for 64-bit do_untyped_vector_read
do_untyped_vector_read is used at load_ssbo and load_shared.

The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just using the proper offset.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
c2297bdf19 intel/fs: Remove old 16-bit shuffle/unshuffle functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
fd3d8a8f79 intel/fs: Use shuffle_for_32bit_write for 16-bits store_ssbo
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
20e4732f7d intel/fs: Use shuffle_from_32bit_read to read 16-bit SSBO
Using shuffle_from_32bit_read instead of 16-bit shuffle functions
avoids the need of retype. At the same time new function are
ready for 8-bit type SSBO reads.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a0891eabca intel/fs: Use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD
shuffle_from_32bit_read can manage the shuffle/unshuffle needed
for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD.
To get the specific component the first_component parameter is used.

In the case of the previous 16-bit shuffle, the shuffle operation was
generating not needed MOVs where its results where never used. This
behaviour passed unnoticed on SIMD16 because dead_code_eliminate
pass removed the generated instructions but for SIMD8 they cound't be
removed because of being partial writes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
22c654941b intel/fs: New shuffle_for_32bit_write and shuffle_from_32bit_read
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.

shuffle_src_to_dst takes care of doing a shuffle when source type is
smaller than destination type and an unshuffle when source type is
bigger than destination. So this new read/write functions just need
to call shuffle_src_to_dst assuming that writes use a 32-bit
destination and reads use a 32-bit source.

As shuffle_for_32bit_write/from_32bit_read components take components
in unit of source/destination types and shuffle_src_to_dst takes units
of the smallest type component, we adjust components and first_component
parameters.

To enable this new functions it is needed than there is no
source/destination overlap in the case of shuffle_from_32bit_read.
That never happens on shuffle_for_32bit_write as it allocates a new
destination register as it was at shuffle_64bit_data_for_32bit_write.

v2: Reword commit log and add comments to explain why first_component
    and components parameters are adjusted. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo
a5665056e5 intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function
This new function takes care of shuffle/unshuffle components of a
particular bit-size in components with a different bit-size.

If source type size is smaller than destination type size the operation
needed is a component shuffle. The opposite case would be an unshuffle.

Component units are measured in terms of the smaller type between
source and destination. As we are un/shuffling the smaller components
from/into a bigger one.

The operation allows to skip first_component number of components from
the source.

Shuffle MOVs are retyped using integer types avoiding problems with
denorms and float types if source and destination bitsize is different.
This allows to simplify uses of shuffle functions that are dealing with
these retypes individually.

Now there is a new restriction so source and destination can not overlap
anymore when calling this shuffle function. Following patches that migrate
to use this new function will take care individually of avoiding source
and destination overlaps.

v2: (Jason Ekstrand)
    - Rewrite overlap asserts.
    - Manage type_sz(src.type) == type_sz(dst.type) case using MOVs
      from source to dest. This works for 64-bit to 64-bits
      operation that on Gen7 as it doesn't support Q registers.
    - Explain that components units are based in the smallest type.
v3: - Fix unshuffle overlap assert (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-16 22:39:08 +02:00
Jose Fonseca
d882331f7a appveyor: Consume LLVM 5.0.1.
https://ci.appveyor.com/project/jrfonseca/mesa/build/47

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-06-16 18:09:20 +01:00
Bas Nieuwenhuizen
c4714f698b ac: Clear meminfo to avoid valgrind warning.
Somehow valgrind misses that the value is initialized by the ioctl.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-16 19:03:47 +02:00
Samuel Pitoiset
5917761e3d radv: fix emitting the TCS regs on GFX9
The primitive ID is NULL and this generates an invalid
select instruction which crashes because one operand is NULL.

This fixes crashes in The Long Journey Home, Quantum Break
and Just Cause 3 with DXVK.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-16 10:18:51 +02:00
Ian Romanick
355868dbfc nir: Document a couple instances of parent_instr
nir_ssa_def::parent_instr and nir_src::parent_instr have the same name,
but they mean really different things.  I choose to save the next person
the hour+ that I just spent figuring that out.  Even now that I know, I
doubt I'd notice in code review that someone typed foo->parent_instr
when they actually meant foo->ssa->parent_instr.

v2: Minor wording tweak in nir_ssa_def::parent_instr.  Suggested by
Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-15 17:36:51 -07:00
Ian Romanick
4467040cb6 i965/fs: Propagate conditional modifiers from not instructions
Skylake
total instructions in shared programs: 14399081 -> 14399010 (<.01%)
instructions in affected programs: 26961 -> 26890 (-0.26%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18%
95% mean confidence interval for instructions value: -1.50 -0.99
95% mean confidence interval for instructions %-change: -0.35% -0.25%
Instructions are helped.

total cycles in shared programs: 532978307 -> 532976050 (<.01%)
cycles in affected programs: 468629 -> 466372 (-0.48%)
helped: 33
HURT: 20
helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98
helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27%
HURT stats (abs)   min: 2 max: 172 x̄: 79.40 x̃: 43
HURT stats (rel)   min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44%
95% mean confidence interval for cycles value: -81.29 -3.88
95% mean confidence interval for cycles %-change: -1.07% 0.12%
Inconclusive result (%-change mean confidence interval includes 0).

All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell shown)
total instructions in shared programs: 12973897 -> 12973838 (<.01%)
instructions in affected programs: 25970 -> 25911 (-0.23%)
helped: 55
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18%
95% mean confidence interval for instructions value: -1.14 -1.00
95% mean confidence interval for instructions %-change: -0.32% -0.24%
Instructions are helped.

total cycles in shared programs: 410355841 -> 410352067 (<.01%)
cycles in affected programs: 578454 -> 574680 (-0.65%)
helped: 47
HURT: 5
helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18
helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38%
HURT stats (abs)   min: 2 max: 242 x̄: 51.20 x̃: 4
HURT stats (rel)   min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11%
95% mean confidence interval for cycles value: -104.89 -40.27
95% mean confidence interval for cycles %-change: -1.45% -0.66%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679351 -> 11679301 (<.01%)
instructions in affected programs: 28208 -> 28158 (-0.18%)
helped: 50
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.27% -0.19%
Instructions are helped.

total cycles in shared programs: 257445362 -> 257444662 (<.01%)
cycles in affected programs: 419338 -> 418638 (-0.17%)
helped: 40
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24
helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41%
HURT stats (abs)   min: 2 max: 1588 x̄: 634.00 x̃: 312
HURT stats (rel)   min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62%
95% mean confidence interval for cycles value: -97.96 65.41
95% mean confidence interval for cycles %-change: -1.56% -0.62%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

v2: Move 'if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ)'
check outside the loop.  Suggested by Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
f2d8bb7a7b i965/fs: Rearrange code to remove most of the gotos
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
77f269bb56 i965/fs: Refactor propagation of conditional modifiers from compares to adds
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
22f9fbc0d9 i965/vec4: Optimize OR with 0 into a MOV
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.

The "No changes on any other platforms" comment below is not quite
right.  Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes.  I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter.  I suspect other shaders would be similarly
affected.  Since this is all based on NOS, shader-db does not reflect
it.

Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.

total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
e6a9bd97b9 i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
284b563fb0 i965/fs: Optimize OR with 0 into a MOV
fs_visitor::set_gs_stream_control_data_bits generates some code like
"control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as
part of EmitVertex.  The first time this (dynamically) occurs in the
shader, control_data_bits is zero.  Many times we can determine this
statically and various optimizations will collaborate to make one of the
OR operands literal zero.

Converting the OR to a MOV usually allows it to be copy-propagated away.
However, this does not happen in at least some shaders (in the assembly
output of shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test,
search for shl).

All of the affected shaders are geometry shaders.

Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14375452 -> 14375413 (<.01%)
instructions in affected programs: 6422 -> 6383 (-0.61%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.26% -1.57%
Instructions are helped.

total cycles in shared programs: 531981179 -> 531980555 (<.01%)
cycles in affected programs: 27493 -> 26869 (-2.27%)
helped: 39
HURT: 0
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
95% mean confidence interval for cycles value: -16.00 -16.00
95% mean confidence interval for cycles %-change: -6.98% -4.90%
Cycles are helped.

No changes on earlier platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Eric Anholt
4106f6ce54 v3d: Handle a no-intersection scissor even if it's outside of the VP.
The min/maxes ended up producing a negative clip width/height for
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line.  Just make
sure they stay at 0 (or v3d 3.x's workaround) if that happens.
2018-06-15 16:09:39 -07:00
Eric Anholt
9aa670e52a v3d: Use the proper depth texture type for sampling.
Fixes failing tests in dEQP-GLES3.functional.texture.shadow
2018-06-15 16:09:39 -07:00
Eric Anholt
778594ae12 v3d: Limit shader threading according to our maximum TMU fifo usage.
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
2018-06-15 16:09:39 -07:00
Eric Anholt
e130ada243 v3d: Fix shaders using pixel center W but no varyings.
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"

Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
2018-06-15 16:09:39 -07:00
Dylan Baker
0d4f338a11 docs: Update release-notes and calendar 2018-06-15 13:53:25 -07:00
Dylan Baker
3c454fc84a docs: Add release notes for 18.1.2 2018-06-15 13:52:44 -07:00
Rafael Antognolli
9e1f208795 intel/aubinator: Use int to store getopt_long flags.
getopt_long flag parameter is an int pointer, so if we use bool to store
those values, when getopt_long writes to one of them, it might end up
overwriting the next one.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 09:03:10 -07:00
Samuel Pitoiset
f8e2c4c57c Revert "radv: always set/load both depth and stencil clear values"
This fixes a rendering regression with RoTR.

This reverts commit 4bdad9fadd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 16:52:06 +02:00
Samuel Pitoiset
a2f6e72138 radv: don't check for linear images in emit_fast_color_clear()
We don't enable CMASK for linear surfaces and addrlib only
enables DCC for tiling surfaces.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:12 +02:00
Samuel Pitoiset
3befac52db radv: allow RADV_PERFTEST=dccmsaa on GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:10 +02:00
Samuel Pitoiset
bfca15e16a radv: add RADV_DEBUG=checkir
This allows to run the LLVM verifier pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:08 +02:00
Samuel Pitoiset
706d51de7f radv: update ZRANGE_PRECISION in radv_update_bound_fast_clear_ds()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:06 +02:00
Samuel Pitoiset
fa8bc821a8 radv: clean up radv_{set,load}_depth_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:04 +02:00
Samuel Pitoiset
4bdad9fadd radv: always set/load both depth and stencil clear values
I don't think that matter much to emit both values and that
makes the code a bit simpler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:02 +02:00
Samuel Pitoiset
2193a6a828 radv: update the fast ds clear values only if the image is bound
It's unnecessary to update the fast depth/stencil clear values
if the fast cleared depth/stencil image isn't currently bound.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:00 +02:00
Samuel Pitoiset
be794fa26b radv: clean up radv_{set,load}_color_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:58 +02:00
Samuel Pitoiset
d7b772abb4 radv: update the fast color clear values only if the image is bound
It's unnecessary to update the fast color clear values if the
fast cleared color image isn't currently bound.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:55 +02:00
Christian Gmeiner
efae127993 util/bitset: include util/macro.h
BITSET_FFS(x) macro makes use of ARRAY_SIZE(x) macro which is
defined in util/macro.h. Include it directy to make usage more
straightforward.

Fixes: 692bd4a1ab ("util: replace Elements() with ARRAY_SIZE()")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-15 11:26:30 +01:00
Lukas Rusak
4cfc4cef80 meson: fix private libs when building without glx
I noticed that the generated pkg-config files will include
glx and x11 dependencies even when x11 isn't a selected platform.

This fixes the private libs and was tested by building kmscube

V2:
  - check if gallium-xlib is being used for glx

Fixes: 108d257a16 "meson: build libEGL"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-15 10:43:22 +01:00
Rhys Perry
30f1ab7a59 docs: document addition of GL_ARB_sample_locations for nvc0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
66ca7e400b nvc0: add support for programmable sample locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
2018-06-14 20:09:45 -06:00
Rhys Perry
9f217facbd st/mesa: add support for ARB_sample_locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
51a221e378 gallium: add support for programmable sample locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
67f40dadaa mesa: add support for ARB_sample_locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Eric Anholt
cd2e673abc v3d: Fix polygon offset for Z16 buffers.
Fixes:
dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units
dEQP-GLES3.functional.polygon_offset.fixed16_render_with_units
2018-06-14 17:03:16 -07:00
Eric Anholt
d91e06a065 v3d: Fix configuration setup of mixed f32 and f16 render targets.
Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
2018-06-14 16:52:25 -07:00
Eric Anholt
6784aa9870 v3d: Don't set the first_ez_state to DISABLED if after only UNDECIDED draws.
We need to have the RCL start with EZ enabled, since those undecided draws
had EZ enabled.  But we do need to update from UNDECIDED to LT or GT as
necessary still.

Fixes many simulator assertion fails in deqp
fragment_ops/interaction/basic_shader/*
2018-06-14 16:52:25 -07:00
Eric Anholt
9080642449 v3d: Use the right size for v3d 4.x TEXTURE_SHADER_STATE BO.
This doesn't really matter, since they both get rounded up to 4096.
2018-06-14 16:52:25 -07:00
Eric Anholt
31548187cf v3d: Add static asserts for other packed packet sizes. 2018-06-14 16:52:25 -07:00
Eric Anholt
0eef4d7f8f v3d: Fix the size of the packed attribute state.
Fixes segfaults in dEQP-GLES3.functional.vertex_array_objects.all_attributes.
2018-06-14 16:52:25 -07:00
Eric Anholt
7d8fe50af3 v3d: Remove some unused context fields from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt
48011c42aa v3d: Remove unused QUNIFORM_STENCIL left over from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt
4564537222 v3d: Use our #define for max attributes in shader caps. 2018-06-14 16:52:25 -07:00
Eric Anholt
a40bc33b11 v3d: Fix undefined results for a swap_color_rb RT from a float shader output.
Fixes segfaults and undefined behavior in
dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
2018-06-14 16:52:25 -07:00
Dave Airlie
600d34c822 radv: remove multisample bit from shader key.
This wasn't being used anywhere inside the shader from what I can see.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 09:33:20 +10:00
Kenneth Graunke
f6898f2b55 intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.

Similarly, dvec4s would suffer from the same problem.

v2: Rewrite the accounting, my calculations were wrong.
v3: Write a comment about partial values (requested by Jason).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v3]
2018-06-14 14:58:59 -07:00
Ian Romanick
37bd9ccd21 glsl: Don't copy propagate elements from SSBO or shared variables either
Since SSBOs can be written by a different GPU thread, copy propagating a
read can cause the value to magically change.  SSBO reads are also very
expensive, so doing it twice will be slower.

The same shader was helped by this patch and the previous.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399119 -> 14399113 (<.01%)
instructions in affected programs: 683 -> 677 (-0.88%)
helped: 1
HURT: 0

total cycles in shared programs: 532973113 -> 532971865 (<.01%)
cycles in affected programs: 524666 -> 523418 (-0.24%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
2018-06-14 11:28:12 -07:00
Ian Romanick
461a5c899c glsl: Don't copy propagate from SSBO or shared variables either
Since SSBOs can be written by other GPU threads, copy propagating a read
can cause the value to magically change.  SSBO reads are also very
expensive, so doing it twice will be slower.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399120 -> 14399119 (<.01%)
instructions in affected programs: 684 -> 683 (-0.15%)
helped: 1
HURT: 0

total cycles in shared programs: 532978931 -> 532973113 (<.01%)
cycles in affected programs: 530484 -> 524666 (-1.10%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
2018-06-14 11:26:33 -07:00
Lukas Rusak
1d92d6486a meson: only build vl_winsys_dri.c when x11 platform is used
This seems to have been missed in the move from autotools

This fixes the following build issue:

../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
 #include <X11/Xlib-xcb.h>
          ^~~~~~~~~~~~~~~~

Fixes: b1b65397d0
       ("meson: Build gallium auxiliary")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-14 10:34:51 -07:00
Brian Paul
b9e6438adf st/mesa: add missing switch cases in glsl_to_tgsi_visitor::visit()
To silence compiler warning about unhandled switch cases.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-06-14 11:29:51 -06:00
Bas Nieuwenhuizen
41dabdc475 radv: Fix output for sparse MRTs.
We need to init the cb_shader_format correctly with the changed
col_format, so this moves the col_format adjustment to before the
adjustment to before the cb_shader_mask gets generated.

Fixes: 06d3c65098 "radv: fix a GPU hang when MRTs are sparse"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-14 11:48:24 +02:00
Samuel Pitoiset
68dead112e radv: update the ZRANGE_PRECISION value for the TC-compat bug
On GFX8+, there is a bug that affects TC-compatible depth surfaces
when the ZRange is not reset after LateZ kills pixels.

The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match
the last fast clear value. Because the value is set to 1 by default,
we only need to update it when clearing Z to 0.0.

We also need to set the depth clear regs and to update
ZRANGE_PRECISION when initializing a TC-compat depth image to 0.

Original patch from James Legg.

This fixes random CTS fails with
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-14 11:38:29 +02:00
Samuel Iglesias Gonsálvez
183adc51f8 anv: reduce maxFragmentInputComponents
If the application asks for the maximum number of fragment input
components (128), use all of them plus some builtins that are
passed in the VUE, then we exceed the maximum number of used VUE
slots (32) and we break one assert that checks this limit.

Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1
builtins in brw_compute_vue_map() because we don't know if
gl_ClipDistance is going to be read/write by an adjacent stage.

Fixes VK-GL-CTS CL#2569.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-14 09:54:28 +02:00
Marek Olšák
6d671078a8 radeonsi/gfx9: fix si_get_buffer_from_descriptors for 48-bit pointers
This fixes:
GL45-CTS.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-13 22:00:12 -04:00
Marek Olšák
a4312742a5 radeonsi/gfx9: update & clean up a DPBB heuristic
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:43 -04:00
Marek Olšák
47b780be21 radeonsi/gfx9: set POPS_DRAIN_PS_ON_OVERLAP due to a hw bug
This may not be needed yet, but let's set it now.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:42 -04:00
Marek Olšák
a152ca70f2 radeonsi/gfx9: remove UINT_MAX array terminators in bin size tables
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:40 -04:00
Marek Olšák
cd0be6cdc8 radeonsi/gfx9: update bin sizes
This is based on our docs (recently updated), not amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:39 -04:00
Marek Olšák
2f51081a93 radeonsi/gfx9: update primitive binning code for EQAA
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:37 -04:00
Marek Olšák
22e994bb75 radeonsi: assume that rasterizer state is non-NULL in draw_vbo
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:36 -04:00
Marek Olšák
f3b3ee6974 radeonsi: micro-optimize prim checking and fix guardband with lines+adjacency
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:34 -04:00
Marek Olšák
d6974feb90 radeonsi: move the guardband registers into a separate state atom
They have a different frequency of updates and don't change when scissors
change.

I think this even fixes something in si_update_vs_viewport_state.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:31 -04:00
Marek Olšák
68b1c669e7 radeonsi/gfx9: implement the scissor bug workaround without performance drop
This might improve performance on Vega10 and Raven.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:27 -04:00
Marek Olšák
73b0d10152 radeonsi: don't set VGT_LS_HS_CONFIG if it doesn't change
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:25 -04:00
Marek Olšák
28ee825e19 radeonsi: move VGT_GS_OUT_PRIM_TYPE into si_shader_gs
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:23 -04:00
Marek Olšák
99e0ba6868 radeonsi: record CLIPVERTEX output usage properly for compatibility profiles
This was missed when adding CLIPVERTEX support into GS & tess.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:20 -04:00
Marek Olšák
47a57a709d radeonsi: fix FBFETCH with 2D MSAA arrays
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:17 -04:00
Marek Olšák
e5e57c3a5e ac: handle undefined EQAA samples in ac_apply_fmask_to_sample
RADV might wanna use this helper too.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:12 -04:00
Marek Olšák
a2d4c8ff6d radeonsi: return real memory usage instead of per-process usage
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 21:47:36 -04:00
Marek Olšák
95ecde42eb ac/gpu_info: report real total memory sizes
The change from MIN2 to MAX2 is intentional.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 21:47:36 -04:00
Dave Airlie
f11b664f48 docs: mark virgl GL 4.0 features as complete.
virgl should now expose GL4.1 where it can.
2018-06-14 10:38:11 +10:00
Dave Airlie
7b6f2704eb virgl: add ARB_tessellation_shader support. (v2)
This should add all the pieces to enable tess shaders on virgl.

v2: fixup transform to handle tess and strip out precise.
set default for max patch varyings to work around issue when
tess gets enabled from v1 caps but v2 caps aren't in place. (Elie)

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-06-14 10:36:31 +10:00
Dave Airlie
babd1d526b glsl: allow standalone semicolons outside main()
GLSL 4.60 offically added this but games and older CTS suites actually
had shaders that did this, we may as well enable it everywhere.

Adding stable because it appears apps in the wild do this.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2018-06-14 10:21:51 +10:00
Samuel Pitoiset
51e23d3419 radv: don't fast clear HTILE for 16-bit depth surfaces on GFX8
This causes rendering issues in Shadow Warrior 2 with DXVK.

Cc: mesa-stable@lists.freedesktop.org
Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 20:30:04 +02:00
Andrew Galante
baf16b2ea3 configure.ac: Test for __atomic_add_fetch in atomic checks
Some platforms have 64-bit __atomic_load_n but not 64-bit
__atomic_add_fetch, so test for both of them.

Bug: https://bugs.gentoo.org/655616
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-13 10:09:46 -07:00
Andrew Galante
9d547a7617 meson: Test for __atomic_add_fetch in atomic checks
Some platforms have 64-bit __atomic_load_n but not 64-bit
__atomic_add_fetch, so test for both of them.

Bug: https://bugs.gentoo.org/655616
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-13 10:09:46 -07:00
Matt Turner
b29b5a82a1 meson: Fix -latomic check
Commit 54ba73ef10 (configure.ac/meson.build: Fix -latomic test) fixed
some checks for -latomic, and then commit 54bbe600ec (configure.ac:
rework -latomic check) further extended the fixes in configure.ac but
not in Meson. This commit extends those fixes to the Meson tests.

Fixes: 54bbe600ec (configure.ac: rework -latomic check)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-13 10:09:46 -07:00
Dylan Baker
9cc577761f meson: Remove various completed todos
v3: - Remove "won't do" todos, so only completed todo's are now removed.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
2018-06-13 10:07:03 -07:00
Dylan Baker
0ce3f3538b meson: Make use of optional modules
meson 0.43 gained support for optional modules, which clover wold like
to use. Since we require 0.44.1 now we can rely on them being available
for clover.

compile tested only.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-13 10:06:58 -07:00
Dylan Baker
34bbb24ce7 meson: Add support for ppc assembly/optimizations
v2: - Use -mpower8-vector in compiler test for altivec
    - rename altivec option to power8
    - reword power8 option description to be more clear, originally I
      had made it a boolean, but replaced it with an auto option.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-13 10:06:54 -07:00
Dylan Baker
e26af22143 meson: Add support for SPARC assembly
This was blindly copied from autotools and tested by a helpful gentoo
user.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-13 10:06:25 -07:00
Dylan Baker
6eaa013685 meson: Set include dirs for asm
v2: - split this from the next patch
    - Only include x86-64 and not x86 when buiding x86_64

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-13 10:06:23 -07:00
Dylan Baker
65e447c5df meson: move cc and cpp definitions to top of main meson.build
This just makes using cc and cpp easier.

v2: - Add this patch to fix altivec

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-13 10:06:16 -07:00
Jason Ekstrand
51376cd749 Revert "intel/compiler: Properly consider UBO loads that cross 32B boundaries."
This reverts commit b8fa847c2e.

This broke about 30k Vulkan CTS tests.
2018-06-13 09:23:55 -07:00
Kenneth Graunke
b8fa847c2e intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.

Similarly, dvec4s would suffer from the same problem.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-06-13 02:07:58 -07:00
Ross Burton
3c288da5ee drivers/dri/i965: add missing #include
brw_bufmgr.h uses time_t without include time.h, so the build fails under musl.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-12 12:08:30 +01:00
Mauro Rossi
fb9ab2fbd3 anv/android: Use an address for each anv_image plane
Fixes to avoid building error after change in image->planes[] structure,
{bo,bo_offset} has to be replaced by address.{bo,offset}
and update is needed also in the assert() for debug builds.

external/mesa/src/intel/vulkan/anv_android.c:188:21:
error: no member named 'bo' in 'struct anv_image::(anonymous at external/mesa/src/intel/vulkan/anv_private.h:2647:4)'
   image->planes[0].bo = bo;
   ~~~~~~~~~~~~~~~~ ^
1 error generated.

Fixes: bf34ef16ac ("anv: Use an address for each anv_image plane")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-12 11:17:43 +03:00
Mauro Rossi
a1220e7311 anv/android: Set the BO flags in bo_cache_import (v2)
Changes to avoid building error:

external/mesa/src/intel/vulkan/anv_android.c:131:72:
error: too few arguments to function call, expected 5, have 4
   result = anv_bo_cache_import(device, &device->bo_cache, dma_buf, &bo);
            ~~~~~~~~~~~~~~~~~~~                                        ^
1 error generated.

(v2) Set the correct bo_flags based on support of 48bit addresses and soft-pin

Fixes: b0d50247a7 ("anv/allocator: Set the BO flags in bo_cache_alloc/import")
Fixes: e7d0378bd9 ("anv: Soft-pin client-allocated memory")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-12 11:16:39 +03:00
Kenneth Graunke
0d5329d626 anv: Disable __gen_validate_value if NDEBUG is set.
We were enabling undefined memory checking for genxml values based on
Valgrind being installed at build time, even for release builds.  This
generates piles and piles of assembly whenever you touch genxml.

With gcc 7.3.1 and -O3 and -march=native on a Kabylake with Valgrind
installed at build time:

      text    data    bss     dec    hex filename
   5978385  262884  13488 6254757 5f70a5 libvulkan_intel.so
   3799377  262884  13488 4075749 3e30e5 libvulkan_intel.so

That's a 36% reduction in text size.

Fixes: 047ed02723 (vk/emit: Use valgrind to validate every packed field)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-11 14:55:32 -07:00
Eric Engestrom
06e8771dec README: wording fix for previous commit
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-11 18:34:58 +01:00
Eric Engestrom
d9f54dceca README: add link to WhosWho for IRC nicks
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-11 18:33:12 +01:00
Eric Engestrom
eadc068406 add project README
Now that we're using GitLab, let's take advantage of the "landing page"
README feature with some minimal information, mostly to point people to
the right resources.

Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-11 18:02:35 +01:00
Eric Engestrom
e43c012433 i965: fix resource leak
v2: intel_miptree_release() already takes care of the planes, no need
    to hand-code the loop (Lionel)

Coverity ID: 1436909
Fixes: 3352f2d746 "i965: Create multiple miptrees for planar YUV images"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2018-06-11 14:54:23 +01:00
Rob Clark
55d1a77c29 freedreno/ir3: use pipe_image_view's cpp
At least for PIPE_BUFFER, we could get the resource used as (for
example) R32F imageBuffer.  So using cpp=1 from the rsc is wrong.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
9bb90a3255 freedreno/ir3: fix image dimensions offset
copy-pasta fail from how SSBO sizes are handled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
e9fc9c16c9 freedreno/a5xx: correct image/ssbo offset
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
132e5b0b34 freedreno/ir3: use saml always if we have lod
In some cases we get plain tex opcodes (but w/ a lod argument).. in this
case always use the saml instruction.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
cf5dda3349 freedreno/ir3: don't cp absneg into meta:fi
If using a fanin (collect) to collect of consecutive registers together,
we can CP mov's into the fanin, but not (abs) or (neg).  No places that
allow those modifiers are consuming a fanin anyways.  But this caused an
absneg to be lost between a ldgb and stgb for shaders like:

  outputs[n] = abs(input[n])

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
39e7a39e91 freedreno/ir3: rework size/type conversion instructions
With 8b and 16b, there are a lot more to handle.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
a52e698219 freedreno/ir3: propagate HALF flag across fanout
If we have a fanout (split) meta instruction to split the result of a
vector instruction, propagate the HALF flag back to the original
instruction.  Otherwise result ends up in a full precision register
while instruction(s) that use the result look in a half-precision
register.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
fc1690c9d9 freedreno/a5xx: add sample-id/sample-mask-in
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
619d2317cd freedreno/ir3: add sample-id/sample-mask-in
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
a49c87956e freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Rob Clark
067d89c2cd freedreno/ir3: image atomics use image-store path
image reads are handled via tex state, whereas image writes and atomics
are handled via SSBO state block.  Previously we were only considering
image write, and not image atomics which also uses the SSBO state block.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-06-11 09:06:03 -04:00
Kyle Brenneman
41642bdbca egl/glvnd: Fix a segfault in eglGetProcAddress.
If FindProcIndex in egldispatchstubs.c is called with a name that's less than
the first entry in the array, it would end up trying to store an index of -1 in
an unsigned integer, wrap around to 2^32, and then crash when it tries to look
that up.

Change FindProcIndex so that it uses bsearch(3) instead of implementing its own
binary search, like the GLX equivalent FindGLXFunction does.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-11 12:17:07 +01:00
Jordan Justen
e266b32059 mesa/program_binary: add implicit UseProgram after successful ProgramBinary
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106810
Fixes: b4c37ce214 "i965: Add ARB_get_program_binary support using nir_serialization"
Ref: 3fe8d04a6d "mesa: don't always set _NEW_PROGRAM when linking"
Ref: c505d6d852 "mesa: use gl_program for CurrentProgram rather than gl_shader_program"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-10 21:12:46 -07:00
Dave Airlie
525cfe5dab features.txt: update virgl GL4.1 status.
All the features for GL4.1 are done (64-bit attribs were part of
the fp64 enable).

Once tessellation shaders land this will be advertised
2018-06-11 10:49:14 +10:00
Dave Airlie
77d7d7acab virgl: enable ARB_gpu_shader_fp64
This enables ARB_gpu_shader_fp64 if the host provides it.

Tested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-06-11 08:35:03 +10:00
Samuel Pitoiset
135e4d434f radv: add a workaround for DXVK hangs by setting amdgpu-skip-threshold
Workaround for bug in llvm that causes the GPU to hang in presence
of nested loops because there is an exec mask issue. The proper
solution is to fix LLVM but this might require a bunch of work.

This fixes a bunch of GPU hangs that happen with DXVK.

Vega10:
Totals from affected shaders:
SGPRS: 110456 -> 110456 (0.00 %)
VGPRS: 122800 -> 122800 (0.00 %)
Spilled SGPRs: 7478 -> 7478 (0.00 %)
Spilled VGPRs: 36 -> 36 (0.00 %)
Code Size: 9901104 -> 9922928 (0.22 %) bytes
Max Waves: 7143 -> 7143 (0.00 %)

Code size slightly increases because it inserts more branch
instructions but that's expected. I don't see any real performance
changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-09 14:16:49 +02:00
Samuel Pitoiset
94706f0de4 radv: fix missing ZRANGE_PRECISION(1) for GFX9+
ZRANGE_PRECISION(1) seems to be the default optimal value, but
it was only set for VI and older chips.

This fixes a rendering issue with Banished through DXVK, and
might fix more than that.

There is still the ZRANGE_PRECISION bug that we need to handle
but that can be fixed later.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-09 10:57:01 +02:00
Gustavo Lima Chaves
7dfaf025c5 anv: enable VK_EXT_shader_stencil_export
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-08 11:16:01 -07:00
Gustavo Lima Chaves
7cc5178bba spirv: add/hookup SpvCapabilityStencilExportEXT
v2:
An attempt to support SpvExecutionModeStencilRefReplacingEXT's behavior
also follows, with the interpretation to said mode being we prevent
writes to the built-in FragStencilRefEXT variable when the execution
mode isn't set.

v3:
A more cautious reading of 1db44252d0 led
me to a missing change that would stop (what I later discovered were)
GPU hangs on the CTS test written to exercise this.

v4:
Turn FragStencilRefEXT decoration usage without StencilRefReplacingEXT
mode into a warning, instead of trying to make the variable read-only.
If we are to follow the originating extension on GL, the built-in
variable in question should never be readable anyway.

v5/v6: rebases.

v7:
Fix check for gen9 lost in rebase. (Ilia)
Reduce the scope of the bool used to track whether
SpvExecutionModeStencilRefReplacingEXT was used. Was in shader_info,
moved to vtn_builder. (Jason)

v8:
Assert for fragment shader handling StencilRefReplacingEXT execution
mode. (Caio)
Remove warning logic, since an entry point might not have
StencilRefReplacingEXT execution mode, but the global output variable
might still exist for another entry point in the module. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-08 11:15:37 -07:00
Eric Anholt
22cc83cf87 travis: Add the v3d driver to the automake build.
Hopefully this reduces the number of fixup commits we need for the
automake build.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-06-08 09:50:38 -07:00
Eric Anholt
3db39d84d2 travis: Do our automake build tests with srcdir != builddir.
This will catch many automake bugs that end-users get to experience first,
otherwise.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-06-08 09:50:28 -07:00
Eric Engestrom
37eb56d239 autotools/meson: compile against wayland-egl-*backend*
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=106861
Fixes: 1db4ec0546 "egl: rewire the build systems to use libwayland-egl"
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Andreas Hartmetz <ahartmetz@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-08 16:45:43 +01:00
Cameron Kumar
cb03803253 vulkan/wsi: Destroy swapchain images after terminating FIFO queues
The queue_manager thread can access the images from x11_present_to_x11,
hence this reorder prevents dereferencing of dangling pointers.

Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Fixes: e73d136a02 ("vulkan/wsi/x11: Implement FIFO mode.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-08 14:06:46 +01:00
Sonny Jiang
ce64c1b70a radeonsi: emit_dpbb_state packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:40 -04:00
Sonny Jiang
7dcfa1f46e radeonsi: emit_clip_state packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:36 -04:00
Sonny Jiang
06b47005d3 radeonsi: emit_msaa_sample_locs packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:36 -04:00
Sonny Jiang
a1b4b00ce2 radeonsi: emit_msaa_config packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:36 -04:00
Sonny Jiang
2bad413f55 radeonsi: emit_cb_render_state packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:25 -04:00
Sonny Jiang
43b0269ce3 radeonsi: emit_db_render_state packets optimization
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-06-07 23:26:25 -04:00
Jan Vesely
d797f1f47e drisw: Fix invalid pointer arithmetic
Use of void * in pointer arithmetic is illegal, use char * instead.
Fixes: cf54bd5e83 ("drisw: use shared memory when possible")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
2018-06-07 21:01:29 -04:00
Timothy Arceri
03c370d2f1 radeonsi: fix possible truncation on renderer string
Fixes truncation warning in gcc 8.1

Fixes: 8539c9bf31 ("gallium/radeon: add the kernel version into the renderer string")
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2018-06-08 10:07:55 +10:00
Timothy Arceri
fae3b38770 ac: fix possible truncation of intrinsic name
Fixes the gcc warning:
snprintf’ output between 26 and 33 bytes into a destination of size 32

Fixes: d5f7ebda3e ("ac: add LLVM build functions for subgroup instrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-08 09:24:15 +10:00
Bas Nieuwenhuizen
4fc2d5e141 amd/common: Fix number of coords for getlod.
The LLVM 6 code reduced it to a non-array call. We need to do that
with the new code too.

This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.

Fixes: a9a7993441 "amd/common: use the dimension-aware image intrinsics on LLVM 7+"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-07 23:59:52 +02:00
Dave Airlie
9be56316cf features: add virgl to the GL features list
This hopefully adds virgl to the correct places and current statuses
of various extensions.

virgl of course relies on two external things
a) host driver that can support the features
b) up to date host virglrenderer library that can support the features.

This list will be maintained as latest (a) + (b) + mesa.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-06-08 07:34:53 +10:00
Matt Turner
a5abb2da74 meson: Add support for read-only text segment on x86
Port of 6dfc5e28f7 (configure.ac: Add support to enable read-only text
segment on x86.) to Meson.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-07 14:16:44 -07:00
Dylan Baker
8f2421d73b meson: work around gentoo applying -m32 to host compiler in cross builds
Gentoo's ebuild system always adds -m32 to the compiler for doing x86_64
-> x86 cross builds, while meson expects it not to do that. This
results in an x86 -> x86 cross build, and assembly gets disabled.

Fixes: 2d62fc0646
       ("meson: disable x86 asm in fewer cases.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-07 11:54:06 -07:00
Jason Ekstrand
e0fa239962 i965/screen: Sanity check that all formats we advertise are useable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Jason Ekstrand
0e7f3febf7 i965/screen: Use RGBA non-sRGB formats for images
Not all of the MESA_FORMAT and ISL_FORMAT helpers we use can properly
handle RGBX formats.  Also, we don't want to make decisions based on
those in the first place because we can't render to RGBA and we use the
non-sRGB version to determine whether or not to allow CCS_E.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Jason Ekstrand
a266934935 i965/screen: Return false for unsupported formats in query_modifiers
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Jason Ekstrand
eeae485149 i965/screen: Refactor query_dma_buf_formats
This reworks it to work like query_dma_buf_modifiers and, in particular,
makes it more flexible so that we can disallow a non-static set of
formats.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Jason Ekstrand
3b54dd87f7 intel/isl: Add bounds-checking assertions for the format_info table
We follow the same convention as isl_format_get_layout in having two
assertions to ensure that only valid formats are passed in.  We also
check against the array size of the table because some valid formats
such as CCS formats will may be past the end of the table.  This fixes
some potential out-of-bounds array access even in valid cases.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Jason Ekstrand
778e2881a0 intel/isl: Add bounds-checking assertions in isl_format_get_layout
We add two assertions instead of one because the first assertion that
format != ISL_FORMAT_UNSUPPORTED is more descriptive and checks for a
real but unsupported enumerant while the second ensures that they don't
pass in garbage values.  We also update some other helpers to use
isl_format_get_layout instead of using the table directly so that they
get bounds checking too.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 11:23:34 -07:00
Dylan Baker
c267f46ef2 meson: Clarify why asm cannot be used in cross compile
This makes the reasoning for why a cross compile is not using asm
clearer (hopefully).

v2: - fix typos

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-07 10:40:35 -07:00
Eric Engestrom
f436ae237b docs: talk about Wayland instead of libwayland
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-07 18:06:40 +01:00
Jason Ekstrand
237c5ac4f9 anv: Set fence/semaphore types to NONE in impl_cleanup
There were some places that were calling anv_semaphore_impl_cleanup and
neither deleting the semaphore nor setting the type back to NONE.  Just
set it to NONE in impl_cleanup to avoid these issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106643
Fixes: 031f57eba "anv: Add a basic implementation of VK_KHX_external..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-07 09:46:45 -07:00
Plamena Manolova
3ba16d640e nir: Add global invocation id intrinsic.
Add the missing nir intrinsic for the gl_GlobalInvocationID
compute shader variable.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-07 14:53:12 +01:00
Eric Engestrom
61edad216e travis: bump libwayland to the first version with libwayland-egl
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-07 11:10:11 +01:00
Kenneth Graunke
3ea2d791f3 i965: Require softpin support for Cannonlake and later.
This isn't strictly necessary, but anyone running Cannonlake will
already have Kernel 4.5 or later, so there's no reason to support
the relocation model on Gen10+.

This will let us avoid dealing with them for new features.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-06 19:45:09 -07:00
Kenneth Graunke
a363bb2cd0 i965: Allocate VMA in userspace for full-PPGTT systems.
This patch enables soft-pinning of all buffers, allowing us to skip
relocation processing entirely.  All systems with full PPGTT and > 4GB
of VMA should gain these benefits.  This should be most Gen8+.

Unfortunately, this excludes a few systems:
- Cherryview (only has 32-bit addressing, despite 48-bit pointers)
- Broadwell with a 32-bit kernel
- Anybody running pre-4.5 kernel.

We may enable it for Cherryview in the future, but it would require
some tweaks to the memory zone.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-06 19:45:09 -07:00
Kenneth Graunke
74259b98aa intel/blorp: Emit VF cache invalidates for 48-bit bugs with softpin.
commit 92f01fc5f9 made i965 start emitting
VF cache invalidates when the high bits of vertex buffers change.  But
we were not tracking vertex buffers emitted by BLORP.  This was papered
over by a mistake where I emitted VF cache invalidates all the time,
which Chris fixed in commit 3ac5fbadfd.

This patch adds a new hook which allows the driver to track addresses
and request a VF cache invalidate as appropriate.

v2: Make the driver do the PIPE_CONTROL so it can apply workarounds
    (caught by Jason Ekstrand).  Rebase on anv bug fix.
v3: Don't screw up the boolean (caught by Jason Ekstrand).

Fixes: 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-06 19:45:09 -07:00
Timothy Arceri
2a74296f24 nir: add opt_if_loop_terminator()
This pass detects potential loop terminators and moves intructions
from the non breaking branch after the if-statement.

This enables both the new opt_if_simplification() pass and loop
unrolling to potentially progress further.

Unexpectedly this change speed up shader-db run times by ~3%

Ivy Bridge shader-db results (all changes in dolphin/ubershaders):

total instructions in shared programs: 9995662 -> 9995338 (-0.00%)
instructions in affected programs: 87845 -> 87521 (-0.37%)
helped: 27
HURT: 0

total cycles in shared programs: 230931495 -> 230925015 (-0.00%)
cycles in affected programs: 56391385 -> 56384905 (-0.01%)
helped: 27
HURT: 0

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-07 11:33:04 +10:00
Timothy Arceri
1098bc5e85 nir: move ends_in_break() helper to nir_loop_analyze.h
We will use the helper while simplifying potential loop terminators
in the following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-07 11:33:04 +10:00
Timothy Arceri
186988e28f radv: fix Coverity no effect control flow issue
swizzle is unsigned so "desc->swizzle[c] < 0" is never true.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-07 10:10:57 +10:00
Jason Ekstrand
44c614843c intel/blorp: Don't vertex fetch directly from clear values
On gen8+, we have to VF cache flush whenever a vertex binding aliases a
previous binding at the same index modulo 4GiB.  We deal with this in
Vulkan by ensuring that vertex buffers and the dynamic state (from which
BLORP pulls its vertex buffers) are in the same 4GiB region of the
address space.  That doesn't work if we're reading clear colors with the
VF unit.  In order to work around this we switch to using MI commands to
copy the clear value into the vertex buffer we allocate for the normal
constant data.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-06 16:32:38 -07:00
Lionel Landwerlin
b28a2510cc dri: add missing 16bits formats mapping
i965 advertises the 16-bit R and RG formats through
eglQueryDmaBufFormatsEXT but falls over when a client tries to use or
asks more information about such a format because
driImageFormatToGLFormat returns MESA_FORMAT_NONE.

Found by Eero Tamminen.

v2: Add G16R16 formats (Lionel)

v3: Fix G16R16 mapping to mesa format (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106642
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-07 00:09:21 +01:00
Eric Anholt
833c404600 nir: Look into uniform structs for samplers when counting num_textures.
mesa/st decides whether to update samplers after a program change based on
whether num_textures is nonzero.  By not counting samplers in a uniform
struct, we would segfault in
KHR-GLES3.shaders.struct.uniform.sampler_vertex if it was run in the same
context after a non-vertex-shader-uniform testcase (as is the case during
a full conformance run).

v2: Implement using two separate pure functions instead of updating
    pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-06 13:46:55 -07:00
Eric Anholt
f69473a712 v3d: Work around GFXH-1461/GFXH-1689 by using CLEAR_TILE_BUFFERS.
This doesn't seem to have done anything to my test results.  However,
given that we've still got a class of GPU hangs, following the workarounds
that the closed driver does so that we get the same command sequences
seems like a good idea.
2018-06-06 13:46:55 -07:00
Eric Anholt
9d5860310d v3d: Enable the new NIR bitfield operation lowering paths.
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.

v2: Fix commit message typo.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
73953b0713 nir: Add lowering for nir_op_bit_count.
This is basically the same as the GLSL lowering path.

v2: Fix typo in the link

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
7afa26d4e3 nir: Add lowering for nir_op_bitfield_reverse.
This is basically the same as the GLSL lowering path.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
6e1597c2d9 nir: Add an ALU lowering pass for mul_high.
This is based on the glsl/lower_instructions.cpp implementation, but
should be much more readable.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
6a0db5f08f nir: Add lowering for find_lsb.
There is a fairly simple relation to turn this into ufind_msb.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
d4c7c3c225 nir: Add lowering for ifind_msb to ufind_msb.
ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb
to that.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
af88acf4c4 nir: Add lowering from ibitfield_extract/ubitfield_extract to shifts.
V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to
glsl/lower_instructions.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt
74618ccbca nir: Add lowering for bitfieldInsert without using bfi.
If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes
things harder than keeping the "bits" argument around.

This still uses bfm, but I've added the obvious lowering of bfm if you
need it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Engestrom
735b104707 docs: add note about moving to libwayland-egl in 18.2.0
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Andres Gomez <agomez@igalia.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-06 12:12:03 -07:00
Eric Engestrom
b9361c9df0 egl: remove wayland-egl now that we're using libwayland-egl
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Daniel Stone <daniels@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-06 12:12:01 -07:00
Eric Engestrom
1db4ec0546 egl: rewire the build systems to use libwayland-egl
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Daniel Stone <daniels@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-06 12:11:57 -07:00
zhaowei yuan
67f7a16b59 glsl: Take 'double' as reserved after GLSL ES 1.0
GLSL ES 1.0.17 specifies that "double" is a keyword reserved

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106823
Signed-off-by: zhaowei yuan <zhaowei.yuan@samsung.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-05 23:39:25 -07:00
Marek Olšák
17a42062cc r300g/swtcl: make pipe_context uploaders use malloc'd memory as before
Discovered by Roland Scheidegger.

The resource_create code uses GPU memory for PIPE_BIND_CUSTOM, but
malloc'd memory otherwise. Vertex and index buffers should use malloc'd
memory.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
2018-06-05 22:52:08 -04:00
Jason Ekstrand
01ad2067bb intel/eu: Use a struct copy instead of a memcpy
The memcpy had the wrong size and this was causing crashes on 32-bit
builds of the driver.

Fixes: 6a9525bf67 "intel/eu: Switch to a logical state stack"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106830
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-05 15:51:01 -07:00
Philip Rebohle
cc21e96d5f radv: Use correct color format for fast clears
Using the image format is incorrect when the view has a different
format than the image. Instead, the view format needs to be used.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106687
2018-06-05 23:51:03 +02:00
Eric Anholt
2b1b2cbf61 v3d: Be more explicit about include directory from our generated code.
You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir.
nir was fine at that, but automake didn't have it.

Bugzilla: https://github.com/anholt/mesa/issues/104
2018-06-05 12:44:49 -07:00
Bas Nieuwenhuizen
2a10fd902d radv: Do not hardcode fast clear formats.
except for the odd one out.

This should support many more formats.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-05 20:53:21 +02:00
Scott D Phillips
6fb22114a0 intel/tools: add intel_sanitize_gpu to EXTRA_DIST
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106778
Fixes: cc41603d6d ("intel/tools: new intel_sanitize_gpu tool")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-05 10:32:35 -07:00
Scott D Phillips
08535dd886 util/tests/vma: Fix warning c++11-narrowing
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106801
Fixes: 943fecc569 ("util: Add a randomized test for the virtual memory allocator")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-05 10:32:07 -07:00
Scott D Phillips
4b123fb74b util: tests: vma test depends on C++11 support
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106776
Fixes: 943fecc569 ("util: Add a randomized test for the virtual memory allocator")
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-05 10:13:14 -07:00
Michel Dänzer
6b8f3724c8 glx: Fix number of property values to read in glXImportContextEXT
We were trying to read twice as many as the X server sent us, which
upset XCB:

[xcb] Too much data requested from _XRead
[xcb] This is most likely caused by a broken X extension library
[xcb] Aborting, sorry about that.
glx-free-context: ../../src/xcb_io.c:732: _XRead: Assertion `!xcb_xlib_too_much_data_requested' failed.

Fixing this takes 3 GLX piglit tests from crash to pass.

Fixes: 0852162950 "glx: Be more tolerant in glXImportContext (v2)"
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-06-05 18:56:43 +02:00
Eric Engestrom
c765c39ea7 configure: radv depends on mako
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=106784
Fixes: 17201a2eb0 "radv: port to using updated anv
                              entrypoint/extension generator."
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-05 16:32:48 +01:00
Eric Engestrom
5bdc38f356 travis: use correct form for array options
I'd like to eventually drop support for the confusing "an array of
a single empty string is meant to be interpreted as an empty array", so
let's start by not using it anymore.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-05 16:31:23 +01:00
Lionel Landwerlin
9aedee64ac anv: intel: add softpin flag on imported BOs
Looks like we forgot to update this bit of the driver for softpin.

Fixes: 4affeba1e9 ("anv: Soft-pin everything else")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-05 14:18:35 +01:00
Eric Engestrom
66c61797ad autotools: add missing android file to package
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=106779
Fixes: ff904978a1 "gallium/util: Android backtrace support"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-05 10:39:04 +01:00
Eric Engestrom
7c4423cce9 meson: fix platforms check for -D egl=true
Fixes: 0ed6a87a10 "meson: fix platforms=[]"
Reported-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-05 10:38:57 +01:00
Mathias Fröhlich
1ac4439d62 mesa: Make sure that imm draws are flushed before other draws execute.
The recent patch

    mesa: Remove FLUSH_VERTICES from VAO state changes.

    Pending draw calls on immediate mode or display list calls do
    not depend on changes of the VAO state. So, remove calls to
    FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.

uncovered a problem that non immediate mode draw calls do only
flush outstanding immediate mode draws if FLUSH_UPDATE_CURRENT
is set in ctx->Driver.NeedFlush.
In that case, due to the sequence of _mesa_set_draw_vao commands
we could end up with the VAO from the FLUSH_VERTICES call set
into gl_context::Array._DrawVAO when the array draw is executed.
So the change pulls FLUSH_CURRENT out of _mesa_validate_* calls
into the array draw calls being validated.
The change introduces a new macro FLUSH_FOR_DRAW beside FLUSH_VERTICES
and FLUSH_CURRENT that flushes on changed current attributes as well
as on outstanding immediate mode draw calls. Use FLUSH_FOR_DRAW
in the non immediate mode draw code paths.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106594
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-06-05 07:05:24 +02:00
gurchetansingh@chromium.org
a7b74a77fa virgl: use bits in caps set v2
Let's add another field to caps v2, that can help report boolean
values.

Suggested-by: Gert Wollny <gert.wollny@collabora.com>
Suggested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-05 14:29:00 +10:00
gurchetansingh@chromium.org
6ce94a50bb virgl: add shader offset alignment to to v2 caps struct
This is the SSBO analogue to fe0647. User supplied data must
be a multiple of GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.

This fixes 44 GLES31 tests on airlied@'s GLES31 sketch branches with
Nvidia hardware, but this patch standalone can applied to master. The
alignment restriction on Nvidia is 32, hence the default value.

Example tests:
   dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.0
   dEQP-GLES31.functional.ssbo.layout.multi_basic_types.single_buffer.std430

v2: Move to a better place in case statement
v3: Rebase

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-05 14:28:49 +10:00
Kenneth Graunke
1c9053d076 i965: Prepare batchbuffer module for softpin support.
If EXEC_OBJECT_PINNED is set, we don't want to emit any relocations.
We simply want to add the BO to the validation list, and possibly mark
it as writeable.  The new brw_use_pinned_bo() interface does just that.

To avoid having to make every caller consider both the relocation and
softpin cases, we make emit_reloc() call brw_use_pinned_bo() when given
a softpinned buffer.

We also can't grow buffers that are softpinned - the mechanism places a
larger BO at the same offset as the original, which requires moving BOs
around in the VMA.  With softpin, we only allocate enough VMA for the
original size of the BO.

v2: Assert that BOs aren't pinned if the kernel says we should move them
    (feedback from Chris Wilson)

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-04 18:38:41 -07:00
Kenneth Graunke
01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
This introduces a new fast virtual memory allocator integrated with our
BO cache bucketing.  For larger objects, it falls back to the simple
free-list allocator (util_vma).

This puts the allocators in place but doesn't enable softpin yet.

v2:
 (feedback from Chris Wilson)
 - Check (bo->kflags & EXEC_OBJECT_PINNED) instead of a global flag
 - Avoid vma_free(0ull) on the err_free path.
 - Only enable if the kernel says we have full PPGTT support
 - Make bucketing allocators more resistant to failing to grow arrays
 (feedback from Scott Phillips)
 - Don't use node after popping it from the list.
 - Avoid undefined behavior in canonicalization by reusing new helper
 - Comment updates
 (feedback from myself)
 - Avoid __vma_alloc vs. vma_alloc by making a zero_high_bits helper
   to return a non-canonical address with the high bits zeroed.
 - Don't shadow loop variable 'i' when destroying things (ugly; worked)
v3:
 - Replace zero_high_bits with new common gen_48b_address helper.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-04 18:38:41 -07:00
Jason Ekstrand
e99b32d4d6 i965: Disable internal CCS for shadows of multi-sampled windows
If window system supports Y-tiling but not CCS_E, we currently create an
internal CCS for any window system buffers and then resolve right before
handing it off to X or Wayland.  In the case of the single-sampled
shadow of a multi-sampled window system buffer, this is pointless
because the only thing we do with it is use it as a MSAA resolve target
so we do MSAA resolve -> CCS resolve -> hand to the window system.
Instead, just disable CCS for the shadow and then the MSAA resolve will
write uncompressed directly into it.  If the window system supports
CCS_E, we will still use CCS_E, we just won't do internal CCS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 15:27:29 -07:00
Jason Ekstrand
6ab9fe7673 i965/miptree: Rename a parameter to create_for_dri_image
Instead of having it be a general "is this a winsys image" boolean, make
it more specific to the actual purpose.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 15:27:16 -07:00
Jason Ekstrand
6a9525bf67 intel/eu: Switch to a logical state stack
Instead of the state stack that's based on copying a dummy instruction
around, we start using a logical stack of brw_insn_states.  This uses a
bit less memory and is way less conceptually bogus.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 14:03:03 -07:00
Jason Ekstrand
db9675f5a4 intel/eu: Set flag [sub]register number differently for 3src
Prior to gen8, the flag [sub]register number is in a different spot on
3src instructions than on other instructions.  Starting with Broadwell,
they made it consistent.  This commit fixes bugs that occur when a
conditional modifier gets propagated into a 3src instruction such as a
MAD.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 14:03:03 -07:00
Jason Ekstrand
2d20303e18 intel/eu: Copy fields manually in brw_next_insn
Instead of doing a memcpy, this moves us to start with a blank
instruction (memset to zero) and copy the fields over one at a time.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 14:03:03 -07:00
Jason Ekstrand
381fac2740 intel/eu: Add some brw_get_default_ helpers
This is much cleaner than everything that wants a default value poking
at the bits of p->current directly.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 14:03:03 -07:00
Jose Fonseca
db38c3b4ba trace: Fix parsing of recent traces.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-06-04 21:06:31 +01:00
Jose Fonseca
8652ff7cdf trace: Fix trace_context_transfer_unmap methods.
The emitted buffer_subdata/texture_subdata call didn't match the
respective signatures.

v2: Actually emit buffer_subdata call.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-06-04 21:06:31 +01:00
Nicolai Hähnle
a9a7993441 amd/common: use the dimension-aware image intrinsics on LLVM 7+
Requires LLVM trunk r329166.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-06-04 21:34:59 +02:00
Kenneth Graunke
b3ba47c592 i965: Fix batch-last mode to properly swap BOs.
On pre-4.13 kernels, which don't support I915_EXEC_BATCH_FIRST, we move
the validation list entry to the end...but incorrectly left the exec_bo
array alone, causing a mismatch where exec_bos[0] no longer corresponded
with validation_list[0] (and similarly for the last entry).

One example of resulting breakage is that we'd update bo->gtt_offset
based on the wrong buffer.  This wreaked total havoc when trying to use
softpin, and likely caused unnecessary relocations in the normal case.

Fixes: 29ba502a4e (i965: Use I915_EXEC_BATCH_FIRST when available.)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-06-04 09:43:09 -07:00
Samuel Pitoiset
06d3c65098 radv: fix a GPU hang when MRTs are sparse
When the i-th target format is set, all previous target formats
must be non-zero to avoid hangs. In other words, without this
if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs
because the target format of mrt1 is zero.

This fixes DXVK GPU hangs with "Seven: The Days Long Gone",
"GTA V" and probably more games.

Cc: "18.0" 18.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-04 14:01:33 +02:00
Bas Nieuwenhuizen
2835b6baf4 radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.
Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and
GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the
GEOMETRY shader for the TESS_EVAL shader.

This would cause the flush_constants code to emit the GEOMETRY constants
to the TESS_EVAL registers and then conclude that it did not need to set
the GEOMETRY shader registers.

Fixes: dfff9fb6f8 "radv: Handle GFX9 merged shaders in radv_flush_constants()"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-04 13:46:24 +02:00
Samuel Pitoiset
e3e929f8c3 nir: implement the GLSL equivalent of if simplication in nir_opt_if
This pass turns:

   if (cond) {
   } else {
      do_work();
   }

into:

   if (!cond) {
      do_work();
   } else {
   }

Here's the vkpipeline-db stats (from affected shaders) on Polaris10:

Totals from affected shaders:
SGPRS: 17272 -> 17296 (0.14 %)
VGPRS: 18712 -> 18740 (0.15 %)
Spilled SGPRs: 1179 -> 1142 (-3.14 %)
Code Size: 1503364 -> 1515176 (0.79 %) bytes
Max Waves: 916 -> 911 (-0.55 %)

This pass only affects Serious Sam 2017 (Vulkan) on my side. The
stats are not really good for now. Some shaders look quite dumb
but this will be improved with further NIR passes, like ifs
combination.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-04 12:41:10 +02:00
Samuel Pitoiset
e44f90eccf nir: make is_comparison() a non-static helper function
Rename and change the prototype for consistency regarding
nir_tex_instr_is_query(). This function will be used in the
following patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-04 12:41:08 +02:00
Dave Airlie
67eccd6aa2 nir: use num_components wrappers in print/validate.
These wrappers were introduces, so start using them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-04 05:58:42 +10:00
Juan A. Suarez Romero
bad7332f7c doc: update calendar, add news and link release notes for 18.0.5
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2018-06-03 10:19:32 +00:00
Juan A. Suarez Romero
41c01d79ee docs: add sha256 checksums for 18.0.5
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit aba161e63a)
2018-06-03 10:12:02 +00:00
Juan A. Suarez Romero
a89cb6711b docs: add release notes for 18.0.5
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit ca0037aaef)
2018-06-03 10:12:00 +00:00
Jose Fonseca
8841c2cda5 scons: Fix MinGW cross compilation with LLVM 5.0.
LLVM 5.0 requires additional Win32 libraries, and MinGW with pthreads.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-06-02 09:58:50 +01:00
Jason Ekstrand
64e619674e anv: Don't even bother processing relocs if we have softpin
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 16:34:26 -07:00
Jason Ekstrand
c7be17c8d3 anv: Refactor reloc handling in execbuf_add_bo
This just separates the reloc list vs. BO set cases and lets us avoid an
allocation if relocs->deps->entries == 0.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 16:34:25 -07:00
Jason Ekstrand
7105b7890a anv: Assert that the kernel leaves pinned BO addresses alone
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 16:33:07 -07:00
Scott D Phillips
4affeba1e9 anv: Soft-pin everything else
v2 (Jason Ekstrand):
 - Break up Scott's mega-patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:13 -07:00
Scott D Phillips
f3dbe0419d anv: Soft-pin batch buffers
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:12 -07:00
Jason Ekstrand
a0b133286a anv/batch_chain: Simplify secondary batch return chaining
Previously, we did this weird thing where we left space and an empty
relocation for use in a hypothetical MI_BATCH_BUFFER_START that would be
added to the secondary later.  Then, when it came time to chain it into
the primary, we would back that out and emit an MI_BATCH_BUFFER_START.
This worked well but it was always a bit hacky, fragile and ugly.  This
commit instead adds a helper for rewriting the MI_BATCH_BUFFER_START at
the end of an anv_batch_bo and we use that helper for both batch bo list
cloning and handling returns from secondaries.  The new helper doesn't
actually modify the batch in any way but instead just adjusts the
relocation as needed.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:12 -07:00
Jason Ekstrand
4f20c665b4 anv/batch_chain: Call batch_bo_finish at the end of end_batch_buffer
The only reason we were calling it in the middle was that one of the
cases for figuring out the secondary command buffer execution type
wanted batch_bo->length which gets set by batch_bo_finish.  It's easy
enough to recalculate and now batch_bo_finish is called in a sensible
location.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:11 -07:00
Jason Ekstrand
e7d0378bd9 anv: Soft-pin client-allocated memory
Now that we've done all that refactoring, addresses are now being
directly written into surface states by ISL and BLORP whenever a BO is
pinned so there's really nothing to do besides enable it.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:11 -07:00
Jason Ekstrand
caf41c78ca anv/allocator: Support softpin in the BO cache
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:11 -07:00
Jason Ekstrand
b0d50247a7 anv/allocator: Set the BO flags in bo_cache_alloc/import
It's safer to set them there because we have the opportunity to properly
handle combining flags if a BO is imported more than once.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-06-01 14:27:10 -07:00
Scott D Phillips
27cc68d9e9 anv: For pinned BOs, skip relocations, but track bo usage
References to pinned BOs won't need to be relocated at a later
point, so just write the final value of the reference into the bo
directly.

Add a `set` to the relocation lists for tracking dependencies that
were previously tracked by relocations. When a batch is executed, we
add the referenced pinned BOs to the exec list.

v2: - visit bos from the dependency set in a deterministic order (Jason)
v3: - compar => compare, drat (Jason)
    - Reworded commit message, provided by (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-01 14:27:10 -07:00
Scott D Phillips
c7db0ed4e9 anv: Use a separate pool for binding tables when soft pinning
Soft pinning lets us satisfy the binding table address
requirements without using both sides of a growing state_pool.

If you do use both sides of a state pool, then you need to read
the state pool's center_bo_offset (with the device mutex held) to
know the final offset of relocations that target the state pool
bo.

By having a separate pool for binding tables that only grows in
the forward direction, the center_bo_offset is always 0 and
relocations don't need an update pass to adjust relocations with
the mutex held.

v2: - don't introduce a separate state flag for separate binding tables (Jason)
    - replace bo and map accessors with a single binding_table_pool accessor (Jason)
v3: - assert bt_block->offset >= 0 for the separate binding table (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-01 14:27:10 -07:00
Scott D Phillips
e662bdb820 anv: Soft-pin state pools
The state_pools reserve virtual address space of the full
BLOCK_POOL_MEMFD_SIZE, but maintain the current behavior of
growing from the middle.

v2: - rename block_pool::offset to block_pool::start_address (Jason)
    - assign state pool start_address statically (Jason)
v3: - remove unnecessary bo_flags tampering for the dynamic pool (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-06-01 13:49:22 -07:00
Ian Romanick
f00fcfb7a2 nir: Lower !f2b(x) to x == 0.0
Some trivial help now, but it also prevents ~40 regressions caused by
Samuel's "nir: implement the GLSL equivalent of if simplication in
nir_opt_if" patch.

All Gen4+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14369557 -> 14369555 (<.01%)
instructions in affected programs: 442 -> 440 (-0.45%)
helped: 2
HURT: 0

total cycles in shared programs: 532425772 -> 532425743 (<.01%)
cycles in affected programs: 6086 -> 6057 (-0.48%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-06-01 10:14:53 -07:00
Ian Romanick
619c51722b nir: Add some missing "optimization undo" patterns
d8d18516b0 and 03fb13f646 added some patterns to undo conversions like

   (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c)))

If further optimization cause some of the operands to either be the same
or be constants, undoing the transformation can lead to further savings.

I don't know why these patterns were not added in those patches.  I did
not check to see which specific patterns actually helped.  I just added
all of them for symmetry.  This prevents some loop unrolling regressions
Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if
simplication in nir_opt_if" patch.

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 14369768 -> 14369557 (<.01%)
instructions in affected programs: 44076 -> 43865 (-0.48%)
helped: 141
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1
helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60%
95% mean confidence interval for instructions value: -1.67 -1.32
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 532430629 -> 532425772 (<.01%)
cycles in affected programs: 1170832 -> 1165975 (-0.41%)
helped: 101
HURT: 5
helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32
helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03%
HURT stats (abs)   min: 2 max: 22 x̄: 9.20 x̃: 4
HURT stats (rel)   min: <.01% max: 0.05% x̄: 0.02% x̃: <.01%
95% mean confidence interval for cycles value: -53.64 -38.00
95% mean confidence interval for cycles %-change: -3.06% -2.20%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-01 10:13:16 -07:00
Eric Engestrom
57fbc2ac50 docs/meson: mention how to use array options
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
03a2e7b662 meson: drop unused empty string array element
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
0ed6a87a10 meson: fix platforms=[]
Fixes: 5608d0a2ce ("meson: use array type options")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
a92cdcd598 meson: fix vulkan-drivers=[]
Fixes: 5608d0a2ce ("meson: use array type options")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
a425db4d7d meson: fix gallium-drivers=[]
Fixes: 5608d0a2ce ("meson: use array type options")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
393abd6a57 meson: fix dri-drivers=[]
Fixes: 5608d0a2ce ("meson: use array type options")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Eric Engestrom
8faa22c146 REVIEWERS: add root meson.build to the Meson reviewers group
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-01 17:53:06 +01:00
Juan A. Suarez Romero
cbe4baed1f glsl: Add ir_binop_vector_extract in NIR
Implement ir_binop_vector_extract using NIR operations. Based on SPIR-V
to NIR approach.

This fixes:
dEQP-GLES3.functional.shaders.indexing.moredynamic.with_value_from_indexing_expression_fragment
Piglit's glsl-fs-vec4-indexing-8.shader_test

CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral <itoral@igalia.com>
2018-06-01 18:09:22 +02:00
Dylan Baker
4ad8e2ac82 doc: update calendar, add news and link release notes for 18.1.1 2018-06-01 08:39:17 -07:00
Dylan Baker
55ee53ea19 docs/relnotes: Add sha256 sums for mesa 18.1.1 2018-06-01 08:39:17 -07:00
Dylan Baker
423c4fe954 docs: Add release notes for 18.1.1 2018-06-01 08:39:17 -07:00
Plamena Manolova
939312702e i965: Add ARB_fragment_shader_interlock support.
Adds suppport for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-06-01 16:36:39 +01:00
Plamena Manolova
60e843c4d5 mesa: Add GL/GLSL plumbing for ARB_fragment_shader_interlock.
This extension provides new GLSL built-in functions
beginInvocationInterlockARB() and endInvocationInterlockARB()
that delimit a critical section of fragment shader code. For
pairs of shader invocations with "overlapping" coverage in a
given pixel, the OpenGL implementation will guarantee that the
critical section of the fragment shader will be executed for
only one fragment at a time.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-06-01 16:36:36 +01:00
Martin Pelikán
53719f818c compiler/spirv: reject invalid shader code properly
After bebe3d626e, b->fail_jump is prepared after vtn_create_builder
which can longjmp(3) to it through its vtx_assert()s.  This corrupts
the stack and creates confusing core dumps, so we need to avoid it.

While there, I decided to print the offending values for debugability.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-01 08:09:35 -07:00
Juan A. Suarez Romero
360bfb619f docs: change release manager for 18.1
Dylan will replace Emil as the release manager for 18.1.x series.

CC: Emil Velikov <emil.l.velikov@gmail.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-01 15:24:02 +02:00
Gert Wollny
ef3a6e3d98 virgl: Always assume that ORIGIN_UPPER_LEFT and PIXEL_CENTER* are supported
The driver must support at least one of

  PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT
  PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT

and one of

  PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER
  PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER

otherwise glsl_to_tgsi will fire an assert.

ORIGIN_UPPER_LEFT is the default convention, and is supported by
all mesa drivers, hence it seems reasonable to always report the caps
to be enabled.  On gles ORIGIN_LOWER_LEFT is generally not supported,
so we rely on the caps reported by the host that depend on whether we
run on an GL or an EGL host.

For PIXEL_CENTER it is completely host driver dependend on what is
supported, and since we do not report the actual host driver capabilities
it is best to mark both as supported, this is how it works for a GL
host too.

Fixes:
   dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_xyz
   dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
   dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2

Reviewed-by: Gurchetan Singh <gurcetansingh@chromium.org>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-06-01 12:04:21 +01:00
Alex Smith
01a2414045 radeonsi: Fix crash on shaders using MSAA image load/store
The value returned by tgsi_util_get_texture_coord_dim() does not
account for the sample index. This means image_fetch_coords() will not
fetch it, leading to a null deref in ac_build_image_opcode() which
expects it to be present (the return value of ac_num_coords() *does*
include the sample index).

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-06-01 08:53:38 +01:00
Alex Smith
dfff9fb6f8 radv: Handle GFX9 merged shaders in radv_flush_constants()
This was not previously handled correctly. For example,
push_constant_stages might only contain MESA_SHADER_VERTEX because
only that stage was changed by CmdPushConstants or
CmdBindDescriptorSets.

In that case, if vertex has been merged with tess control, then the
push constant address wouldn't be updated since
pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.

Use radv_get_shader() instead of getting the shader directly so that
we get the right shader if merged. Also, skip emitting the address
redundantly - if two merged stages are set in push_constant_stages
this change would have made the address get emitted twice.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:34 +01:00
Alex Smith
7ca0167ae9 radv: Consolidate GFX9 merged shader lookup logic
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:31 +01:00
Alex Smith
0fa51bfdbe radv: Set active_stages the same whether or not shaders were cached
With GFX9 merged shaders, active_stages would be set to the original
stages specified if shaders were not cached, but to the stages still
present after merging if they were.

Be consistent and use the original stages.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:01 +01:00
Marek Olšák
9e61147ef6 st/mesa: relax requirements for ARB_ES3_compatibility
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106748

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-01 01:04:17 -04:00
Scott D Phillips
29a139b308 anv/blorp: Write relocated values into surface states
v2 (Jason Ekstrand):
 - Split the blorp bit into it's own patch and re-order a bit
 - Use anv_address helpers

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-31 16:51:47 -07:00
Jason Ekstrand
bf34ef16ac anv: Use an address for each anv_image plane
This is better than having BO and offset fields.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
1f2328c3b7 anv/cmd_buffer: Rework surface relocation helpers
This commit renames add_surface_state_reloc to add_surface_reloc and
makes it takes an address.  We also rename add_image_view_relocs to
add_surface_state_relocs because it takes an anv_surface_state and
doesn't really care about the image view anymore.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
f270a09737 anv: Use an anv_address in anv_buffer
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
8a8bd39d5e anv/cmd_buffer: Use anv_address for handling indirect parameters
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
1029458ee3 anv: Use an anv_address in anv_buffer_view
Instead of storing a BO and offset separately, use an anv_address.  This
changes anv_fill_buffer_surface_state to use anv_address and we now call
anv_address_physical and pass that into ISL.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
de1c5c1b50 anv: Use full anv_addresses in anv_surface_state
This refactors surface state filling to work entirely in terms of
anv_addresses instead of offsets.  This should make things simpler for
when we go to soft-pin image buffers.  Among other things,
add_image_view_relocs now only cares about the addresses in the surface
state and doesn't really need the image view anymore.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
94081ffc80 anv: Add some anv_address helpers
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Scott D Phillips
aaea46242d anv: Add vma_heap allocators in anv_device
These will be used to assign virtual addresses to soft pinned
buffers in a later patch.

Two allocators are added for separate 'low' and 'high' virtual
memory areas. Another alternative would have been to add a
double-sided allocator, which wasn't done here just because it
didn't appear to give any code complexity advantages.

v2 (Scott Phillips):
 - rename has_exec_softpin to use_softpin (Jason)
 - Only remove bottom one page and top 4 GiB from virt (Jason)
 - refer to comment in anv_allocator about state address + size
   overflowing 48 bits (Jason)
 - Mention hi/lo allocators vs double-sided allocator in
   commit message (Chris)
 - assign state pool memory ranges statically (Jason)

v3 (Jason Ekstrand):
 - Use (LOW|HIGH)_HEAP_(MIN|MAX)_ADDRESS rather than (1 << 31) for
   determining which heap to use in anv_vma_free
 - Only return de-canonicalized addresses to the heap

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:46 -07:00
Jason Ekstrand
6e4672f881 intel/common: Add an address de-canonicalization helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-31 16:51:45 -07:00
Scott D Phillips
943fecc569 util: Add a randomized test for the virtual memory allocator
The test pseudo-randomly makes allocations and deallocations with
the virtual memory allocator and checks that the results are
consistent. Specifically, we test that:

 * no result from the allocator overlaps an already allocated range
 * allocated memory fulfills the stated alignment requirement
 * a failed result from the allocator could not have been fulfilled
 * memory freed to the allocator can later be allocated again

v2: - fix if() in test() to actually run fill()
v3: - add c++11 build flag (Jason)
    - test the full 64-bit range (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-31 16:51:35 -07:00
Jason Ekstrand
f19ad5d31f util: Add a virtual memory allocator
This is simple linear-walk first-fit allocator roughly based on the
allocator in the radeon winsys code.  This allocator has two primary
functional differences:

 1) It cleanly returns 0 on allocation failure

 2) It allocates addresses top-down instead of bottom-up.

The second one is needed for Intel because high addresses (with bit 47
set) need to be canonicalized in order to work properly.  If we allocate
bottom-up, then high addresses will be very rare (if they ever happen).
We'd rather always have high addresses so that the canonicalization code
gets better testing.

v2: - [scott-ph] remove _heap_validate() if NDEBUG is defined (Jordan)

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Tested-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-31 16:17:35 -07:00
Bas Nieuwenhuizen
b9fb2c266a radv: Add startup debug option.
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.

A common question end users have is why the direver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.

This adds some info messages so we can easily get a some useful
output from end users.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen
38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen
729f7373de radv: Make the sem_info allocate/free functions static.
They are only used in 1 file.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Samuel Pitoiset
70f9e2589e nir: optimize iand(ieq(a, 0), ieq(b, 0)) to ieq(ior(a, b), 0)
Totals from affected shaders:
SGPRS: 80 -> 80 (0.00 %)
VGPRS: 48 -> 48 (0.00 %)
Code Size: 2120 -> 2096 (-1.13 %) bytes
Max Waves: 16 -> 16 (0.00 %)

Only two Rise of Tomb Raider shaders are affected on my side.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-31 10:57:16 +02:00
Tapani Pälli
c983c6abaf mesa: don't call Driver.TexEnv with invalid arguments
Patch skips useless and possibly dangerous calls down to the driver
in case invalid arguments were given. I noticed this would be happening
with demo of Darwinia game. AFAIK this does not fix anything but makes
this path safer and more like how other API functions are implemented.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-31 09:24:17 +03:00
Vinson Lee
d511bba2f9 v3d: Fix automake linking error.
CXXLD    gallium_dri.la
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_dump_packet':
src/broadcom/clif/clif_dump.c:87: undefined reference to `v3d33_clif_dump_packet'
src/broadcom/clif/clif_dump.c:85: undefined reference to `v3d41_clif_dump_packet'
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_process_worklist':
src/broadcom/clif/clif_dump.c:140: undefined reference to `v3d41_clif_dump_gl_shader_state_record'
src/broadcom/clif/clif_dump.c:144: undefined reference to `v3d33_clif_dump_gl_shader_state_record'

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-30 11:55:09 -07:00
Jakob Bornecrantz
d6cee5a162 virgl: Update virgl_hw.h
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:07:26 +01:00
Dave Airlie
e2b6d830b2 virgl: add ARB_transform_feedback_overflow_query support
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:02:55 +01:00
Dave Airlie
22b072c194 virgl: add polygon offset clamp
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:02:51 +01:00
Dave Airlie
49204ff8ad virgl: add derivative control support
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:02:47 +01:00
Dave Airlie
46fe349af2 virgl: add ARB_conditional_render_inverted support
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:02:40 +01:00
Dave Airlie
f9eb7e8b76 virgl: update caps bitset to latest version.
This makes this use all 32 bits, so future sets need to be
defined in a new struct.

Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
2018-05-30 17:02:19 +01:00
Timothy Arceri
e8b368ad1c nir: add unsigned comparison simplifications
This avoids loop unrolling regressions in Wolfenstein II on DXVK
with an upcoming optimisation series from Samuel.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-30 22:48:37 +10:00
Bas Nieuwenhuizen
c2799574eb radv: Only expose subgroup shuffles on VI+.
The current implementation depends on bpermute, which
is VI+.

Fixes: f2c6a55061 "radv: enable subgroup capabilities"
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-30 13:49:46 +02:00
Samuel Pitoiset
02c7916298 radv: fix emitting descriptor pointers with LLVM < 7
This was terribly wrong, I forced use of 32-bit pointers when
emitting shader descriptor pointers. This fixes GPU hangs with
LLVM 5&6 because 32-bit pointers are only supported with LLVM 7.

Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-30 11:38:54 +02:00
Ilia Mirkin
04fff21c62 nv30: add a couple of missed shader caps
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-30 02:06:28 -04:00
Ilia Mirkin
30918b77ac nv30: ensure that displayable formats are marked accordingly
Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-30 02:06:28 -04:00
Marek Olšák
858ac8942d mesa: expose ARB_tessellation_shader in the compatibility profile
Gallium drivers don't expose this yet due to:
    "st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
16ac832392 mesa: expose AMD_vertex_shader_layer in the compatibility profile
This requires layered FBOs from GL 3.2.

Gallium drivers don't expose this yet due to:
    "st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
518d8065ce mesa: expose ARB_gpu_shader5 in the compatibility profile
Gallium drivers don't expose this yet due to:
    "st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
dd93bc4f34 st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
34ea55d820 gallium: add PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
e453fc76e7 mesa: update fixed-func state constants for TCS, TES, GS
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
27a9f27310 mesa: print Compatibility Profile in the version string
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
d3a87537dd glsl: parse #version XXX compatibility
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-29 20:13:24 -04:00
Marek Olšák
a7d0c53ab8 st/mesa: fix assertion failures with GL_UNSIGNED_INT64_ARB (v2)
Bindless texture handles can be passed via vertex attribs using this type.
They use the double codepath, so don't use st_pipe_vertex_format.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-29 20:09:00 -04:00
Marek Olšák
a8e1413876 mesa: handle GL_UNSIGNED_INT64_ARB properly (v2)
Bindless texture handles can be passed via vertex attribs using this type.
This fixes a bunch of bindless piglit tests on radeonsi.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-29 20:09:00 -04:00
Timothy Arceri
1f7a3a1102 mesa: add display list support for glPatchParameter{i,fv}()
This is required for tessellation shader Compat profile support.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-30 09:37:35 +10:00
Dave Airlie
d3ff478732 glx/drisw: make the shm/non-shm loader extensions separately.
I disliked removing the const here, function tables are meant
to be const just to avoid having to think about them,
make a second table for the shm vs non-shm paths to use.

Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
33ce3aa512 drisw/glx: implement getImageShm
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
17b27725fe drisw: use getImageShm() if available
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
9feaf33371 drisw: learn to query shmid handle type
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
bcd80be49a drisw/glx: use XShm if possible
Implements putImageShm from DRIswrastLoaderExtension.

If XShm extension is not available, or fails, it will fallback on
regular XPutImage().

Tested on Linux only with 16bpp and 32bpp visual.

(airlied: tested on 24bpp as well)

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
cf54bd5e83 drisw: use shared memory when possible
If drisw_loader_funcs implements put_image_shm, allocates display
target data with shared memory and display with put_image_shm().

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:54 +10:00
Marc-André Lureau
63c427fa71 drisw: use putImageShm if available
If the DRIswrastLoaderExtension implements putImageShm, bind it to
drisw_loader_funcs.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:53 +10:00
Marc-André Lureau
de8085e649 dri: add putImageShm and getImageShm to swrastLoader
Add new API to put and get an image using shared memory. Instead of only
passing the data pointer, 3 arguments are given: the shmid, the data
offset and the shmaddr.

Bump interface version.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2018-05-30 09:11:53 +10:00
Dave Airlie
b7ac0779e0 gallium/winsys: rename DRM_API_HANDLE_* to WINSYS_HANDLE_*
This just renames this as we want to add an shm handle which
isn't really drm related.

Originally by: Marc-André Lureau <marcandre.lureau@gmail.com>
(airlied: I used this sed script instead)
This was generated with:
 git grep -l 'DRM_API_' | xargs sed -i 's/DRM_API_/WINSYS_/g'

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-30 09:11:53 +10:00
Marc-André Lureau
d2eaff33d0 gallium: move winsys handle to it's own file.
This will be used in the drisw interface later, which isn't
drm specific.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-30 09:11:53 +10:00
Francisco Jerez
4bd2047dee intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT.  When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
d3cd6b7215 intel/fs: Replace the CINTERP opcode with a simple MOV
The only reason it was it's own opcode was so that we could detect it
and adjust the source register based on the payload setup.  Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.

v2 (Jason Ekstrand):
 - Break the bit which removes the CINTERP opcode into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
39de901a96 intel/fs: Use the ATTR file for FS inputs
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.

v2 (Jason Ekstrand):
 - Break into multiple patches
 - Change the units of the FS ATTR to be in logical scalars

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
4bfa2ac2ea intel/fs: Rename a local variable so it doesn't shadow component()
v2 (Jason Ekstrand):
 - Break the refactor into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Francisco Jerez
11c71f0e75 intel/eu: Remove brw_codegen::compressed_stack.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Jason Ekstrand
71a86d1fc6 intel/fs: Use groups for SIMD16 LINTERP on gen11+
This is better than compression control because it naturally extends to
SIMD32.

v2:
 - Push/pop instruction state around adjusted codegen (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Jason Ekstrand
a1a850cd34 intel/fs: Assert that the gen4-6 plane restrictions are followed
The fall-back does not work correctly in SIMD16 mode and the register
allocator should ensure that we never hit this case anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Jan Vesely
ed834aefa2 travis: Add clover llvm-6.0 build
v2: Don't force build using gcc-4.8
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
2018-05-29 17:36:16 -04:00
Jan Vesely
41b878e1bd clover: Cleanup compat code for llvm < 3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
2018-05-29 17:36:16 -04:00
Jan Vesely
d424be0fed clover: Fix build after llvm r332881.
v2: fix whitespace and indentation

r332881 added an extra parameter to the emit function.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106619
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
Tested-By: Aaron Watry <awatry@gmail.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
2018-05-29 17:36:16 -04:00
Chris Wilson
3ac5fbadfd i965: Only emit VF cache invalidations when the high bits changes
Commit 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit
addressing bugs with softpin.") tried to only emit the VF invalidate if
the high bits changed, but it accidentally always set need_invalidate to
true; causing it to emit unconditionally emit the pipe control before
every primitive.

Fixes: 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106708
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-29 12:16:26 -07:00
Eric Engestrom
e4fe2fd3bb vulkan: don't free uninitialised memory
The modifiers array hasn't been initialised by then, much less with data
that would need freeing.
Move the label after the loop to fix this.

Fixes: c80c08e226 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-29 17:44:13 +01:00
Eric Engestrom
51a17e7fee dri: replace two-way switch case with a table lookup
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
---
v2: rebased on top of 432df741e0 "dri_util: Add
R10G10B10{A,X}2 translation between DRI and mesa_format."
2018-05-29 17:44:13 +01:00
Eric Engestrom
d3ca7bd452 dri: fix error value returned by driGLFormatToImageFormat()
0 is not a valid value for the __DRI_IMAGE_FORMAT_* enum.
It is, however, the value of MESA_FORMAT_NONE, which two of the callers
(i915 & i965) checked for.

The other callers (that check for errors, ie. st/dri) already check for
__DRI_IMAGE_FORMAT_NONE.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-29 17:44:13 +01:00
Eric Engestrom
1945231b48 egl/x11: fix build with DRI3 disabled
Fixes: 473af0b541 "egl/x11: deduplicate depth-to-format logic"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Yogesh Marathe <yogesh.marathe@intel.com>
2018-05-29 17:01:21 +01:00
Emil Velikov
63b95fb291 meson: require shared glapi when using DRI based libGL
Just like we do in the autotools build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 16:56:19 +01:00
Emil Velikov
728d1da159 meson: remove unreachable with_glx == 'auto' check
Cannot happen since, props to the autodetection further up.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 16:31:46 +01:00
Thierry Reding
9e539012df tegra: Treat resources with modifiers as scanout
Resources created with modifiers are treated as scanout because there is
no way for applications to specify the usage (though that capability may
be useful to have in the future). Currently all the resources created by
applications with modifiers are for scanout, so make sure they have bind
flags set accordingly.

This is necessary in order to properly export buffers for such resources
so that they can be shared with scanout hardware.

Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
2018-05-29 16:48:37 +02:00
Thierry Reding
9603d81df0 tegra: Fix scanout resources without modifiers
Resources created for scanout but without modifiers need to be treated
as pitch-linear. This is because applications that don't use modifiers
to create resources must be assumed to not understand modifiers and in
turn won't be able to create a DRM framebuffer and passing along which
modifiers were picked by the implementation.

Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
2018-05-29 16:48:34 +02:00
Thierry Reding
bd3e97e5aa tegra: Remove usage of non-stable UAPI
This code path is no longer required with framebuffer modifier support.

Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
2018-05-29 16:47:45 +02:00
Eric Engestrom
f736be86bb docs: add favicon to the website
favicon.png is just gears.png resized to 64x64, and favicon.ico is
generated using this command, adapted from the ImageMagick example [1]:

  $ convert favicon.png -background black \
      \( -clone 0 -resize 16x16 \) \
      \( -clone 0 -resize 32x32 \) \
      \( -clone 0 -resize 48x48 \) \
      \( -clone 0 -resize 64x64 \) \
      -delete 0 -alpha off -colors 256 favicon.ico

We could edit every html page to add `<link rel="icon" href="favicon.ico" />`,
but there's not much point as pretty much every browser will pick it up
automatically if the file is named `favicon.ico` and is in the root folder.

[1] http://www.imagemagick.org/Usage/thumbnails/#favicon

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 14:48:21 +01:00
Eric Engestrom
e6a1aca0b2 docs: add missing html closing tag
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 14:48:21 +01:00
Eric Engestrom
3b5376330f docs: add missing html tag
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 14:48:21 +01:00
Karol Herbst
56792a0876 nir/print: fix printing of 8/16 bit constant variables
v2 (Jose Maria Casanova Crespo <jmcasanova@igalia.com>): add float16 support

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2018-05-29 13:43:49 +02:00
Pierre Moreau
f0e80e123c nv50/ir: Extend ImmediateValue::applyLog2 to 64-bit integers
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-29 13:37:45 +02:00
Pierre Moreau
03f592a164 util/u_math: Implement a logbase2 function for unsigned long
v2 (Karol Herbst <kherbst@redhat.com>):
* removed unneeded ll
* ll -> ull

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-29 13:37:45 +02:00
Eric Engestrom
539aa604a0 docs: trivial typo fix
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-29 12:10:14 +01:00
Samuel Pitoiset
88d1ed0f81 radv: emit shader descriptor pointers consecutively
This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.

We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:18 +02:00
Samuel Pitoiset
21baf33a94 radv: allow radv_emit_shader_pointer_head() to emit more pointers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:16 +02:00
Samuel Pitoiset
288fe7ec71 radv: split radv_emit_shader_pointer()
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:13 +02:00
Rhys Perry
57e721a456 gm107/ir: prevent WaW hazards in instruction scheduling
Previously, findFirstUse() only considered reads "uses". This fixes that
by making it check both an instruction's sources and definitions. It
also shortens both findFistUse() and findFirstDef() along the way.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-28 13:59:56 -04:00
Bas Nieuwenhuizen
a29bc043ae radv: Implement VK_KHR_draw_indirect_count.
Literally the same as the AMD ext.

Passes *indirect_draw_count* CTS tests.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:08:26 +02:00
Bas Nieuwenhuizen
b0002e4e05 vulkan: Update header+vk.xml to 1.1.76
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-28 12:08:20 +02:00
Bas Nieuwenhuizen
6914d5a2c0 radv: Implement alternate GFX9 scissor workaround.
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)

It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.

I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:04:25 +02:00
Eric Anholt
3b6dfcf7ae Revert "st/nir: use NIR for asm programs"
This reverts commit 5c33e8c772.  It broke
fixed function vertex programs on vc4 and v3d, and apparently caused
trouble for radeonsi's NIR paths as well.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
https://bugs.freedesktop.org/show_bug.cgi?id=106673
2018-05-28 14:41:03 +10:00
Scott D Phillips
4714784dae anv: move canonical_address calculation into a separate function
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.

v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-27 19:24:33 -07:00
Gert Wollny
1aec4a07d4 r600: Fix SSG when not all components are written
Make sure only those components are written to that are specified in the
write mask.

Fixes:
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_fragment
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_fragment
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_fragment
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_fragment
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_fragment
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_vertex
  dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_fragment
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-28 02:57:46 +01:00
Gert Wollny
42cd2810aa r600: Correct IDIV if DST and SRC use the same temporary
In cases like

  IDIV TEMP[0].xy TEMP[0].xx TEMP[1].yy

the result will be written to the same register that is also a source register.
Since the components are evaluated one by one, this may result in overwriting
the source value for a later operation. Work around this by adding another
temporary to store the result if the destination temporary index is equal to
one of the source temporary indices.

Fixes:
  dEQP-GLES2.functional.shaders.operator.binary_operator.div.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-28 02:57:46 +01:00
Kenneth Graunke
58fb613a51 i965: Revert recent tiled memcpy changes.
This reverts commit 79fe00efb4.
This reverts commit f5e8b13f78.
This reverts commit d21c086d81.

They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
2018-05-26 16:25:50 -07:00
Scott D Phillips
79fe00efb4 i965/miptree: Use cpu tiling/detiling when mapping
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.

Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.

v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)

v3: Add units to parameter names of tile_extents (Nanley Chery)
    Use _mesa_align_malloc for the shadow copy (Nanley)
    Continue using gtt maps on gen4 (Nanley)

v4: Use streaming_load_memcpy when detiling

v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
    takes precedence.  Add intel_miptree_access_raw, needed after
    rebasing on commit b499b85b0f.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-25 21:35:50 -07:00
Chris Wilson
f5e8b13f78 i915: Fix streaming loads for intel_tiled_memcpy
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.

Fixes: d21c086d81 (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-25 21:35:50 -07:00
Marek Olšák
18c50498db radeonsi: remove unused variable addr_vec
trivial
2018-05-25 18:37:57 -04:00
Jason Ekstrand
ae514ca695 intel/blorp: Support blits and clears on surfaces with offsets
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice.  Most of i965 is fine with this but it breaks blorp.  This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path.  This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-25 14:01:44 -07:00
Marek Olšák
2f65c67043 radeonsi: fix passing gl_ClipVertex for GS and tess
Also add the fprintf call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák
a7d61c0753 radeonsi: fix color inputs/outputs for GS and tess
GS is tested, tessellation is untested.

Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák
92ea9329e5 radeonsi: fix incorrect parentheses around VS-PS varying elimination
I don't know if it caused issues.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák
a4ba7cd6a2 st/mesa: simplify lastLevel determination in st_finalize_texture
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.

Roughly based on i965 commit 845ad2667a
which does a similar thing but for a different reason.

This fixes GL CTS assertion failures introduced by Ilia.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:31:36 -04:00
Scott D Phillips
d21c086d81 i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
The reference for MOVNTDQA says:

    For WC memory type, the nontemporal hint may be implemented by
    loading a temporary internal buffer with the equivalent of an
    aligned cache line without filling this data to the cache.
    [...] Subsequent MOVNTDQA reads to unread portions of the WC
    cache line will receive data from the temporary internal
    buffer if data is available.

This hidden cache line sized temporary buffer can improve the
read performance from wc maps.

v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-25 11:05:46 -07:00
Alok Hota
fb20ae0374 swr/rast: Adjusted avx512 primitive assembly for msvc codegen
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:57:02 -05:00
Alok Hota
b3360f5c8b swr/rast: Moved memory init out of core swr init
Added two new files for a wrapper function for initialization

v2: added missing include for single architecture builds

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:56:55 -05:00
Alok Hota
b6b114c1ae swr/rast: Removed superfluous JitManager argument from passes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:56:49 -05:00
Alok Hota
98d0201577 swr/rast: Renamed MetaData calls
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:56:43 -05:00
Alok Hota
14b5cac0be swr/rast: Use metadata to communicate between passes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:56:37 -05:00
Alok Hota
f09636e2e1 swr/rast: Check gCoreBuckets/CORE_BUCKETS equal length at compile time
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:56:01 -05:00
Alok Hota
cfe75cc7b5 swr/rast: Added in-place building to SCATTERPS
SCATTERPS previously assumed it was being used with an existing basic
block

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-25 10:55:37 -05:00
Samuel Pitoiset
45eb24fedf radv: run the EarlyCSEMemSSA LLVM pass
It's recommended by the instruction combining pass, and
RadeonSI also runs it.

This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.

Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)

Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 14:24:14 +02:00
Samuel Pitoiset
66e38654c9 radv: fix dumping compute shader on the graphics queue
The graphics pipeline can be NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:07 +02:00
Samuel Pitoiset
de06dfa9ea radv: add radv_dump_pipeline_state() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:05 +02:00
Samuel Pitoiset
6f0530ecfe radv: rework how shaders are dumped when generating a hang report
Use a flag for the active stages instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:58:03 +02:00
Samuel Pitoiset
8c406f0b4d radv: remove unused parameter in radv_dump_annotated_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 11:57:59 +02:00
Jose Dapena Paz
6c61c31dc2 mesa: do not leak ctx->Shader.ReferencedProgram references
When glUseProgram is used, references to the included shaders are
added in ctx->Shader.ReferencedProgram. But those references are not
decreased when the shader data is deallocated. Thus, those shaders
are leaked.

Explicitely remove the pending references to these shaders.

Fixes: e6506b3cd2 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-25 10:38:09 +10:00
Marek Olšák
508b423dd6 radeonsi: set DB_EQAA.MAX_ANCHOR_SAMPLES correctly
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:57 -04:00
Marek Olšák
07e02c8617 radeonsi: round ps_iter_samples in set_min_samples
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:57 -04:00
Marek Olšák
510c88f9d1 radeonsi: remove redundant ps_iter_samples clamp
si_get_ps_iter_samples already does this.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Marek Olšák
25cdf754e4 radeonsi: remove some old gfx 9.x registers
Leftover from bring up.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Marek Olšák
b936f9aa32 radeonsi: disable primitive binning for all blitter ops
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Marek Olšák
8c1c451a90 ac/surface/gfx6: don't overallocate mipmapped HTILE
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-24 13:41:56 -04:00
Eric Engestrom
473af0b541 egl/x11: deduplicate depth-to-format logic
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-24 18:01:45 +01:00
Tapani Pälli
7b54404c9d i965: enable OES_texture_view for gen8+
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-24 12:53:07 +03:00
Tapani Pälli
3ddcdcf94d mesa: changes to expose OES_texture_view extension
Functionality already covered by ARB_texture_view, patch also
adds missing 'gles guard' for enums (added in f1563e6392).

Tested via arb_texture_view.*_gles3 tests and individual app
utilizing texture view with ETC2.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-24 12:53:07 +03:00
Juan A. Suarez Romero
046b2b651e docs: update release calendar for 18.1 series
v2: extend 18.1 series (Andres)
v3: fix copy/paste typo (Engestrom)

CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-05-24 11:47:47 +02:00
Samuel Pitoiset
38a8c5903b radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS
Do not lower FS inputs because this moves all load_var
instructions at beginning of shaders and because
interp_var_at_sample (and friends) seem broken. That might
be eventually enabled later on if we really want to preload
all FS inputs at beginning.

Polaris10:
Totals from affected shaders:
SGPRS: 54072 -> 54264 (0.36 %)
VGPRS: 38580 -> 38124 (-1.18 %)
Spilled SGPRs: 652 -> 652 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2128116 -> 2127380 (-0.03 %) bytes
Max Waves: 8048 -> 8086 (0.47 %)

Vega10:
Totals from affected shaders:
SGPRS: 52616 -> 52656 (0.08 %)
VGPRS: 37536 -> 37116 (-1.12 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Code Size: 2043756 -> 2042672 (-0.05 %) bytes
Max Waves: 9176 -> 9254 (0.85 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:57 +02:00
Samuel Pitoiset
ded1509587 radv: call nir_split_var_copies() before nir_lower_var_copies()
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:54 +02:00
Francisco Jerez
936cd3c87a i965: Use intel_bufferobj_buffer() wrapper in image surface state setup.
Instead of directly using intel_obj->buffer.  Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take.  Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <nanleychery@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-05-23 16:21:34 -07:00
Francisco Jerez
e989acb03b i965: Handle non-zero texture buffer offsets in buffer object range calculation.
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer.  Found by
inspection.

v2: Protect against out-of-range BufferOffset (Nanley).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-05-23 16:21:28 -07:00
Francisco Jerez
156d2c6e62 i965: Move buffer texture size calculation into a common helper function.
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way.  E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly.  It's easier to fix it all in one
place.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-05-23 16:21:09 -07:00
Francisco Jerez
5a68147803 Revert "mesa: simplify _mesa_is_image_unit_valid for buffers"
This reverts commit c0ed52f614.  It was
preventing the image format validation from being done on buffer
textures, which is required to ensure that the application doesn't
attempt to bind a buffer texture with an internal format incompatible
with the image unit format (e.g. of different texel size), which is
not allowed by the spec (it's not allowed for *any* texture target,
whether or not there is spec wording restricting this behavior
specifically for buffer textures) and will cause the driver to
calculate texel bounds incorrectly and potentially crash instead of
the expected behavior.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106465
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-05-23 16:21:09 -07:00
Bas Nieuwenhuizen
699e1f5aac ac: Use DPP for build_ddxy where possible.
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.

This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.

v2: Use ac_build_quad_swizzle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-23 21:02:45 +02:00
Miguel Casas
b73b340c37 i965: add {X,A}BGR2101010 to 'intel_image_formats'
This patch adds {X,A}BGR2101010 entries to the list of supported
'intel_image_formats'.

Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-23 10:19:04 -07:00
Miguel Casas
432df741e0 dri_util: Add R10G10B10{A,X}2 translation between DRI and mesa_format.
Add R10G10B10{A,X}2 translation between mesa_format and DRI format
to driGLFormatToImageFormat() and driImageFormatToGLFormat().

Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-23 10:17:45 -07:00
Dylan Baker
c8acfd5ab2 bin/get-pick-listh.sh: force git --pretty=medium
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-05-23 09:54:17 -07:00
Dylan Baker
5a639bdb81 bin/bugzilla_mesa.sh: explicitly set the --pretty argument
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-05-23 09:54:00 -07:00
Eric Engestrom
ec986241f3 docs: drop unnecessary out-of-frame target
I'm guessing an earlier version of the website used to have the page
contents in <frames>, but this isn't the case anymore so just drop the
unnecessary `target="_main"` :)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-23 16:52:23 +01:00
Eric Engestrom
09a6cb7be6 docs: fix various html tags mistakes
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-23 16:52:23 +01:00
Eric Engestrom
8034f5f623 docs: fix < & > used in html code
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-23 16:52:23 +01:00
Juan A. Suarez Romero
6db0660d08 docs: add news notes to 18.1.0
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-05-23 13:06:55 +02:00
Dave Airlie
f2f464de57 tgsi/scan: add hw atomic to the list of memory accessing files
This fixes 4 out of 5 cases in:
arb_framebuffer_no_attachments-atomic on cayman.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
2018-05-23 03:51:40 +01:00
Roland Scheidegger
7b89fcec41 llvmpipe: improve rasterization discard logic
This unifies the explicit rasterization discard as well as the implicit
rasterization disabled logic (which we need for another state tracker),
which really should do the exact same thing.
We'll now toss out the prims early on in setup with (implicit or
explicit) discard, rather than do setup and binning with them, which
was entirely pointless.
(We should eventually get rid of implicit discard, which should also
enable us to discard stuff already in draw, hence draw would be
able to skip the pointless clip and fallback stages in this case.)
We still need separate logic for only null ps - this is not the same
as rasterization discard. But simplify the logic there and don't count
primitives simply when there's an empty fs, regardless of depth/stencil
tests, which seems perfectly acceptable by d3d10.
While here, also fix statistics for primitives if face culling is
enabled.
No piglit changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-05-23 04:23:32 +02:00
Bas Nieuwenhuizen
047438287c ac/surface/gfx6: Don't force a tile index for fmask.
The bpe of the fmask often differs from the bpe of the main
surface. On SI that means it has to get a different tile
index.

addrlib is capable of figuring this out itself, so just pass
-1 instead to let it know that it is not preset.

Fixes: 9bf3570fed "ac/surface/gfx6: compute FMASK together with the color surface"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-23 02:23:03 +02:00
Jason Ekstrand
a347a5a12c i965: Remove ring switching entirely
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:39 -07:00
Jason Ekstrand
b499b85b0f i965/miptree: Move the access_raw call to the individual map functions
The only function that doesn't need to call access_raw is map_blit.  If
it takes the blitter path, it will happen as part of intel_miptree_copy.
If map_blit takes the blorp path, brw_blorp_copy_miptrees will handle
doing whatever resolves are needed.  This should save us resolves in
quite a few cases and will probably help performance a bit.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:37 -07:00
Jason Ekstrand
f566a1264c i965: Remove support for the BLT ring
We still support the blitter on gen4-5 but it's on the same ring as 3D.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:35 -07:00
Jason Ekstrand
33affda8bf i965/miptree: Use blorp for blit maps on gen6+
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:34 -07:00
Jason Ekstrand
0eedb0fca9 i965/miptree: Use blorp for validation tex copies on gen6+
It's faster than the blitter and can handle things like stencil properly
so it doesn't require software fallbacks.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:32 -07:00
Jason Ekstrand
80fc3896f3 i965: Delete the blitter path for CopyTexSubImage
The blorp path (called first) can do anything the blitter path can do so
it's just dead code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:31 -07:00
Jason Ekstrand
8162256b01 i965: Don't fall back to the blitter in BlitFramebuffer
On gen4-5, we try the blitter before we even try blorp.  On newer
platforms, blorp can do everything the blitter can so there's no point
in even having the blitter fall-back path.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:29 -07:00
Jason Ekstrand
e596563b08 i965: Remove some unused includes of intel_blit.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:27 -07:00
Jason Ekstrand
a9499374a9 i965/blit: Delete intel_emit_linear_blit
This function is no longer used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:25 -07:00
Jason Ekstrand
7fd962093f i965: Use meta for pixel ops on gen6+
Using meta for anything is fairly aweful and definitely has more CPU
overhead.  However, it also uses the 3D pipe and is therefore likely
faster in terms of GPU time than the blitter.  Also, the blitter code
has so many early returns that it's probably not buying us that much.
We may as well just use meta all the time instead of working over-time
to find the tiny case where we can use the blitter.  We keep gen4-5
using the old blit paths to avoid perturbing old hardware too much.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-22 15:46:20 -07:00
Kenneth Graunke
92f01fc5f9 i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
We'd like to start using soft-pin to assign BO addresses up front, and
never move them again.  Our previous plan for dealing with 48-bit VF
cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
have addresses that alias in the low 32 bits.  But that requires moving
buffers dynamically.

This patch tracks the last seen BO address for each vertex/index buffer,
and emits a VF cache invalidate if the high bits change.  (Ideally, we
won't hit this case very often.)  This should work for the soft-pin
case, but unfortunately won't work in the relocation case, as we don't
actually know the addresses.  So, we have to use both methods.

v2: Mention that the cache uses a <VertexBufferIndex, Address> tuple
    more explicitly (suggested by Scott).  Mention "single batch" too
    (suggested by Chris).

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-22 10:02:28 -07:00
Kenneth Graunke
c7259259d4 i965: Introduce a "memory zone" concept on BO allocation.
We're planning to start managing the PPGTT in userspace in the near
future, rather than relying on the kernel to assign addresses.  While
most buffers can go anywhere, some need to be restricted to within 4GB
of a base address.

This commit adds a "memory zone" parameter to the BO allocation
functions, which lets the caller specify which base address the BO will
be associated with, or BRW_MEMZONE_OTHER for the full 48-bit VMA.

Eventually, I hope to create a 4GB memory zone corresponding to each
state base address.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-05-22 10:01:09 -07:00
Jason Ekstrand
417b9e5770 intel/eu: Set EXECUTE_1 when setting the rounding mode in cr0
Fixes: d6cd14f213 "i965/fs: Define new shader opcode to..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2018-05-22 09:53:23 -07:00
Michel Dänzer
fe2edb25dd dri3: Stricter SBC wraparound handling
Prevents corrupting the upper 32 bits of draw->recv_sbc when
draw->send_sbc resets to 0 (which currently happens when the window is
unbound from a context and bound to one again), which in turn caused
loader_dri3_swap_buffers_msc to calculate target_msc with corrupted
upper 32 bits. This resulted in hangs with the Xorg modesetting driver
as of xserver 1.20 (older versions and other drivers ignored the upper
32 bits of the target MSC, which is why this wasn't noticed earlier).

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/106351
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2018-05-22 17:59:53 +02:00
Samuel Pitoiset
75e919c045 radv: fix computation of user sgprs for 32-bit pointers
With 32-bit pointers we only need one user SGPR per desc set.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:29 +02:00
Samuel Pitoiset
c5536fc813 radv: drop user_sgpr_info::sgpr_count
It's only used inside allocate_user_sgprs().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:26 +02:00
Samuel Pitoiset
36a4d6d081 radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.

Vega10:
Totals from affected shaders:
SGPRS: 1008722 -> 1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size: 34819208 -> 35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)

Polaris10:
Totals from affected shaders:
SGPRS: 1029849 -> 1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size: 34915336 -> 35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:22 +02:00
Samuel Pitoiset
b654ef5808 radv: add set_loc_shader_ptr() helper
This helper will hep for switching to 32-bit GPU pointers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:20 +02:00
Samuel Pitoiset
14a7547c08 radv: allocate descriptor BOs in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:18 +02:00
Samuel Pitoiset
0d1406ad12 radv: allocate the upload BO in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:17 +02:00
Samuel Pitoiset
d8a61d3232 radv: set amdgpu-32bit-address-high-bits LLVM attribute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:15 +02:00
Samuel Pitoiset
fe2649d3ad radv/winsys: allow to allocate BOs in the 32-bit addr space
This introduces a new flag called RADEON_FLAG_32BIT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:13 +02:00
Samuel Pitoiset
b60e0ee789 radv/winsys: request high address
This is needed for 32-bit GPU pointers. Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:09 +02:00
Anuj Phogat
0748383a60 i965/glk: Add l3 banks count for 2x6 configuration
2x6 configuration with pci-id 0x3185 has same number of
banks (2) as 3x6 configuration (pci-id 0x3184).

Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eb23be1d97 "i965: Add and initialize l3_banks field for gen7+"
Cc: Francisco Jerez <currojerez@riseup.net>
2018-05-21 16:43:26 -07:00
Vinson Lee
85f61197df v3d: Include v3d_drm.h path.
Fix build error.

  CC       v3d_blit.lo
In file included from v3d_blit.c:27:0:
v3d_context.h:39:10: fatal error: v3d_drm.h: No such file or directory
 #include "v3d_drm.h"
          ^~~~~~~~~~~

Fixes: 8a793d42f1 ("v3d: Switch the vc5 driver to using the finalized V3D UABI.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-21 11:15:47 -07:00
Samuel Pitoiset
73df16dcee radv: fix centroid interpolation
It's legal to set the centroid and sample interpolation modes
when MSAA disabled. So, we have to initialize the centroid
inputs because the hardware doesn't.

This fixes rendering issues with DXVK and The Witness, World of
Warcraft, Trackmania and probably more games.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-21 13:57:46 +02:00
Bas Nieuwenhuizen
f26b008e28 radv: Cleanup unused prime blit path.
Since we have the common WSI code, we use vkCmdCopyImageToBuffer
instead.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-21 10:33:41 +02:00
Bas Nieuwenhuizen
a63a0960e3 radv: Fix SRGB compute copies.
SRGB stores are broken. We had compensation code in the
resolve path but none in the copy path. Since we don't
want any conversion and it does not matter for DCC,
just make everything UNORM instead.

This happened to cause wrong colors for the PRIME path, as
that uses image->buffer copies which always use the compute
path.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106587
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-21 10:33:41 +02:00
Tapani Pälli
63525ba730 android: enable VK_ANDROID_native_buffer
Patch changes entrypoints generator to not skip this extension even
though it is set as disabled in the xml. We also need compilation
flag VK_USE_PLATFORM_ANDROID_KHR to be enabled.

It looks like this extension got disabled in commit 69f447553c.

v2: just remove the whole 'supported' attrib check + remove
    vk_icd.h compilation fix (fix in VulkanHeaders instead)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-21 09:26:50 +03:00
Tapani Pälli
437acae704 vulkan: update vk_icd.h to current upstream
Import from commit eb0c1fd on branch 'master'
of https://github.com/KhronosGroup/Vulkan-Headers.git.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-21 09:26:50 +03:00
Dave Airlie
bfa74bb44d virgl: set texture buffer offset alignment to disable ARB_texture_buffer_range.
The host side hasn't got support for this feature yet, so don't enable it
unless we get the caps from the host.

This makes the texture buffer range piglit tests skip now.

Fixes: fe0647df5a (virgl: add offset alignment values to to v2 caps struct)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-05-21 12:44:55 +10:00
Timothy Arceri
2e6c987a85 mesa: stop hiding query parameters from OpenGL compat
Just let the extension detection do its job as we will be adding
compat profile support in future, also we want these to work
with compat profile version overrides.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-21 09:39:03 +10:00
Christoph Haag
549e54270b radv: fix VK_EXT_descriptor_indexing
GetPhysicalDeviceProperties2KHR() was crashing because features was null

Fixes: 0e10790558 "radv: Enable VK_EXT_descriptor_indexing."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-20 13:36:07 +02:00
Bas Nieuwenhuizen
a1c87235a9 ac/surface: Only align linear power of two fmt textures.
We're not sharing 32_32_32 formats between different GPUs, so we
do not have to align for vega on pre-vega cards.

Fixes: e361970ed7 "radv: Add support for IMG_DATA_FORMAT_32_32_32."
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-20 11:57:59 +02:00
Bas Nieuwenhuizen
62e0e089d7 amd/addrlib: Use defines in autotools build.
Otherwise stuff like NDEBUG would not be passed through.

CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106479
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-20 11:57:59 +02:00
Aaron Watry
cfe582f9dc r600/compute: Mark several functions as static
They're not used anywhere else, so keep them private

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2018-05-19 10:22:16 -05:00
Aaron Watry
d21e64c626 r600/compute: Remove unused compute_memory_pool functions
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2018-05-19 10:21:57 -05:00
Roland Scheidegger
6f558fb0f7 draw: get rid of special logic to not emit null tris
I've confirmed after 77554d220d we no
longer need this to pass some tests from another api (as we no longer
generate the bogus extra null tris in the first place).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-05-19 02:49:58 +02:00
Dylan Baker
c86e9a5fe5 docs: Add sha sums for release 2018-05-18 16:44:50 -07:00
Dylan Baker
1d46852830 docs: Add release notes for 18.1.0 2018-05-18 16:44:43 -07:00
Alyssa Rosenzweig
5d85a0a55b nir: Implement optional b2f->iand lowering
This pass is required by the Midgard compiler; our instruction set uses
NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
Normally, this lowering pass would be implemented in a backend-specific
algebraic pass, but this conflicts with the existing iand->b2f pass in
nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
existing pass optional (default on -- all other backends should remain
unaffected), adding an optional pass for lowering the opposite
direction.

v2: Defer lowering until late algebraic optimisations to allow
optimising the b2f instruction itself.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-05-18 22:44:09 +02:00
Jan Vesely
8ed2cabd04 travis: Adapt to radeonsi dropping support for LLVM 4
meson Vulkan, Clover, and autotools Vulkan need to be switched to llvm 5

Fixes: f9eb1ef870
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-18 13:59:37 -04:00
Marek Olšák
3d64ed5785 radeonsi: skip ES output stores for undefined output components
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-18 13:38:07 -04:00
Nanley Chery
0ab25f05ab i965: isl: Move the MCS gen7+ assertion into ISL
This is useful for every user of ISL. Drop the comment along the way to
match similar functions in ISL.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-18 09:53:06 -07:00
Nanley Chery
f88caf2321 i965/miptree: Remove format assertion in alloc_aux
intel_miptree_supports_{ccs,mcs,hiz} ensures the format is valid for the
color or depth miptree before the miptree is assigned an aux_usage.
alloc_aux switches on the aux_usage so don't assert that the format is
valid.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-18 09:53:06 -07:00
Nanley Chery
8007b2d78b i965/miptree: Simplify the switch in supports_ccs
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-18 09:53:06 -07:00
Nanley Chery
da98441fef i965: Make get_ccs_surf succeed in alloc_aux
Synchronize the requirements listed in isl_surf_get_ccs_surf with
intel_miptree_supports_ccs by importing a restriction from ISL. Some
implications:
* We successfully create every aux_surf in alloc_aux
* We only return false from alloc_aux if we run out of memory

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-18 09:53:06 -07:00
Brian Paul
42aee8f4f6 llvmpipe: fix check for a no-op shader
The tgsi_info.num_tokens fix broke llvmpipe's detection of no-op shaders.
Fix the code to check for num_instructions <= 1 instead.

Fixes: 8fde9429c3 ("tgsi: fix incorrect tgsi_shader_info::num_tokens
computation")
Tested-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-05-18 09:09:41 -06:00
Samuel Pitoiset
03c4816093 radv: pass radv_nir_compiler_options directly to create_llvm_function()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-18 11:07:01 +02:00
Christian Gmeiner
2eb3f794d9 st/mesa: only define GLSL 1.4 for compat if driver supports it
Currently GLSL 1.4 is defined for all gallium drivers even only
GLSL 1.2 is supported as seen on etnaviv.

v1 -> v2:
 - use _min(..) as suggested by Lucas Stach and Michel Dänzer

Fixes: 4560aad780 ("mesa: add GLSLVersionCompat constant")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-18 10:46:24 +02:00
Dave Airlie
48e28ab961 vbo: remove MaxVertexAttribStride assert check.
Some drivers (virgl) don't support GL4.4 or GLES3.1 yet,
so never fill in this const.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-05-18 14:58:15 +10:00
Timothy Arceri
c0c69bd8dd mesa: drop GL_EXT_polygon_offset support
glPolygonOffset() has been part of the GL standard since 1.1. Also
niether AMD or Nvidia support this in their binary drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61761
2018-05-18 09:21:24 +10:00
Brian Paul
8fde9429c3 tgsi: fix incorrect tgsi_shader_info::num_tokens computation
We were incrementing num_tokens in each loop iteration while parsing
the shader.  But each call to tgsi_parse_token() can consume more than
one token (and often does).  Instead, just call the tgsi_num_tokens()
function.

Luckily, this issue doesn't seem to effect any current users of this
field (llvmpipe just checks for <= 1, for example).

Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-05-17 15:02:05 -06:00
Samuel Pitoiset
fcba3934fc radv: add radv_emit_shader_pointer() helper
For future work (support for 32-bit GPU pointers).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:59 +02:00
Samuel Pitoiset
9b2c310a70 radv: add some helpers for cleaning up radv_get_preamble_cs()
Because this function looks a bit ugly to me.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:57 +02:00
Marek Olšák
f9eb1ef870 amd: remove support for LLVM 4.0
It doesn't support GFX9.

Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-17 14:54:41 -04:00
Juan A. Suarez Romero
11a0d5563f docs: update calendar, add news and link release notes to 18.0.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2018-05-17 18:45:26 +00:00
Juan A. Suarez Romero
042e21976a docs: add sha256 checksums for 18.0.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 69ef6e4a75)
2018-05-17 18:40:53 +00:00
Juan A. Suarez Romero
bb7750e8da docs: add release notes for 18.0.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 3b49ab6219)
2018-05-17 18:40:51 +00:00
Mathias Fröhlich
6fac626193 mesa: The glArrayElement api is independent of the current program.
All the shader program dependent handling is done on the level
of the gl_Context::Array._DrawVAO/_DrawVAOEnabledAttribs.
So, skip array element invalidation on _NEW_PROGRAM.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-17 20:13:40 +02:00
Mathias Fröhlich
984cb4e512 mesa: Flag _NEW_ARRAY only if we are changing ctx->Array.VAO.
For the VAO internal helper functions that may be called
with a non current VAO, flag the _NEW_ARRAY state only
if it is the current ctx->Array.VAO.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-17 20:13:39 +02:00
Mathias Fröhlich
5c7e3a90ed mesa: Remove flush_vertices argument from VAO methods.
The flush_vertices argument is now unused, remove it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-17 20:13:39 +02:00
Mathias Fröhlich
9c7be67968 mesa: Remove FLUSH_VERTICES from VAO state changes.
Pending draw calls on immediate mode or display list calls do
not depend on changes of the VAO state. So, remove calls to
FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-17 20:13:39 +02:00
Juan A. Suarez Romero
0a2c947556 docs: add 18.0.5 in the release calendar
Mesa 18.1 series has not been released yet, so let's extend 18.0 lifetime.

v2: Add missing closing TR tags (Eric Engestrom)

CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2018-05-17 19:01:19 +02:00
Alok Hota
936ce75285 swr/rast: Added FEClipRectangles event
and also added some comments

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-17 10:53:14 -05:00
Alok Hota
a33d376133 swr/rast: Whitespace and tab-to-spaces changes
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-17 10:53:10 -05:00
Alok Hota
7970fcff25 swr/rast: fix VCVTPD2PS generation for AVX512
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-17 10:53:06 -05:00
Alok Hota
a0dddac1cb swr/rast: Rectlist support for GS
Add rectlist as an option for GS.  Needed to support some driver
optimizations.

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-17 10:53:01 -05:00
Alok Hota
7926d18fa5 swr/rast: Remove unneeded virtual from methods
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-17 10:52:21 -05:00
Stefan Schake
b0acc3a562 broadcom/vc4: Native fence fd support
With the syncobj support in place, lets use it to implement the
EGL_ANDROID_native_fence_sync extension. This mostly follows previous
implementations in freedreno and etnaviv.

v2: Drop the flags (Eric)
    Handle in_fence_fd already in job_submit (Eric)
    Drop extra vc4_fence_context_init (Eric)
    Dup fds with CLOEXEC (Eric)
    Mention exact extension name (Eric)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:30 +01:00
Stefan Schake
44036c354d broadcom/vc4: Store job fence in syncobj
This gives us access to the fence created for the render job.

v2: Drop flag (Eric)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:28 +01:00
Stefan Schake
9ed05e2520 broadcom/vc4: Detect syncobj support
We need to know if the kernel supports syncobj submission since otherwise
all the DRM syncobj calls fail.

v2: Use drmGetCap to detect syncobj support (Eric)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:26 +01:00
Stefan Schake
4fc0ebdff5 broadcom/vc4: Bump libdrm requirement
Require a version of libdrm with syncobj support.

v2: Don't require a libdrm_vc4, just bump core libdrm if vc4 enabled (by
    anholt)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:24 +01:00
Stefan Schake
580d1f4c60 drm-uapi: Update vc4 header with syncobj submit support
v2: Synchronized with kernel v2
v3: Update for the finalized kernel ABI (pad2 field)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:21 +01:00
Stefan Schake
1ec01a911b broadcom/vc4: Drop libdrm_vc4 requirement
This was missed in the move back to the local uapi copy.
libdrm_vc4 only seems to consist of headers that also exist in the
Mesa tree.

Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-17 16:04:12 +01:00
Eric Anholt
97894b1267 v3d: Add support for glSampleMask / glSampleCoverage. 2018-05-17 15:09:46 +01:00
Eric Anholt
9bbc3f8cf1 v3d: Enable NaN propagation in the VS and CS as well.
Fixes piglit vs-isnan-*.shader_test at the expense of gl-1.0-spot-light.
2018-05-17 15:09:12 +01:00
Nanley Chery
edfb57c0a0 i965/blorp: Disable BLORP clear color updates
With the previous patches, we now update the indirect clear color buffer
every time the clear color changes. Avoid redundant updates.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:42 -07:00
Nanley Chery
02f5512fed intel/blorp: Add a NO_UPDATE_CLEAR_COLOR batch flag
Allow callers to handle updating the indirect clear color buffer
themselves. This can reduce the number of clear color updates in the
case where a caller performs multiple fast clears with the same clear
color.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:42 -07:00
Nanley Chery
f8ac11d69f i965/blorp: Also skip the fast clear if the clear color differs
If the aux state is CLEAR and clear color value has changed, only the
surface state must be updated. The bit-pattern in the aux buffer is
exactly the same.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:42 -07:00
Nanley Chery
43616404be i965/clear: Drop a stale comment in fast_clear_depth
This comment made more sense when it was above the calls to
intel_miptree_slice_set_needs_depth_resolve(). We stopped using these
functions at commit 554f7d6d02
("i965: Move depth to the new resolve functions").

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
82849fb6d5 i965: Update the indirect buffer in set_clear_color
For depth buffers, we avoid fast-clearing if the aux_state is already
CLEAR. We do the same for color buffers only if the clear color
doesn't change. We require that the clear colors match because, in
that case, we don't update the indirect clear color outside of BLORP.

Update the indirect clear color for color buffers as well. We'll
enable the same depth buffer optimization for color buffers in a
later patch.

Note that we're now actually updating the indirect clear color twice
in the case where we use BLORP to perform the fast-clear. This is
only temporary. In later patches, we'll prevent BLORP from performing
the update.

v2: Add more context to the commit message (Topi).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-17 07:06:41 -07:00
Nanley Chery
5b315f3ad1 i965/clear: Remove an early return in fast_clear_depth
Reduce complexity and allow the next patch to delete some code. With
this change, clear operations will still be skipped and setting the
aux_state will cause no side-effects.

Remove the associated comment which implies an early return.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
6f609ca609 i965: Use set_clear_color for depth miptrees
Reduce code duplication now and prevent it in the following commits.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
92a0a87b6f Revert "i965: Make the miptree clear color setter take a gl_color_union"
This reverts commit 1d94aa1987.

The next patch will make depth miptrees use the clear color setter that
was originally being used for color miptrees. Go back to using the
isl_color_value parameter because it's the same type as the
fast_clear_color field used by color and depth miptrees.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
bb18af82c3 i965/miptree: Unify aux buffer allocation
There isn't much that changes between the aux allocation functions.
Remove the duplicated code.

v2: Inline the switch statement (Jason).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
6c41a2ef3b i965: Prepare to delete intel_miptree_alloc_ccs()
We're going to delete intel_miptree_alloc_ccs() in the next commit. With
that in mind, replace the use of this function in
do_single_blorp_clear() with intel_miptree_alloc_aux() and move the
delayed allocation logic to it's callers.

v2: Duplicate the delayed allocation comment (Topi Pohjolainen).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
beed9c4550 i965/miptree: Drop the mt param from alloc_aux_buffer
Drop an unused parameter.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
6b1836aabe i965/miptree: Drop the alloc_flags param from alloc_aux_buffer
We have enough information to determine the optimal flags internally.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
3dd7f600e0 i965/miptree: Drop the name param from alloc_aux_buffer
A name of "aux-miptree" should be sufficient.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
58d99a21f1 i965/miptree: Initialize the indirect clear color to zero
The indirect clear color isn't correctly tracked in
intel_miptree::fast_clear_color. The initial value of ::fast_clear_color
is zero, while that of the indirect clear color is undefined.

Topi Pohjolainen discovered this issue with MCS buffers. This issue is
apparent when fast-clearing an MCS buffer for the first time with
glClearColor = {0.0,}. Although the indirect clear color is undefined,
the initial aux state of the MCS is CLEAR and the tracked clear color is
zero, so we avoid updating the indirect clear color with {0.0,}.

Make the indirect clear color match the initial value of
::fast_clear_color.

Note: although we only have to drop HiZ's BO_ALLOC_BUSY flag for gen10+,
we also drop it pre-gen10 to keep things simple. We add this flag back
for pre-gen10 in a later patch.

v2: Add a note about dropping HiZ's BO_ALLOC_BUSY flag (Topi).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
b58675e93f i965/miptree: Add and use a memset option in alloc_aux_buffer
Add infrastructure for initializing the clear color BO.
intel_miptree_init_mcs is no longer needed with change.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
8a9491058d i965/miptree: Zero-initialize CCS_D buffers
Before this patch, the aux_state was actually AUX_INVALID because the BO
was never defined. This was fine on single slice miptrees because we
would fast-clear the resource right after creation. For multi-slice
miptrees on SKL+ however, this results in undefined behavior when
accessing a non-base slice. Here's a specific example:

1) Fast clear level 0
   * Undefined CCS_D buffer allocated in "PASS_THROUGH" state.
   * Level 0 transitions to the CLEAR state.
2) Render to level 1
   * Level 1 may have a 2-bit pattern of 2's.
   * Rendering with a 2 in the CCS is undefined.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Nanley Chery
816f2dc67d i965/miptree: Fix handling of uninitialized MCS buffers
Before this patch, if we failed to initialize an MCS buffer, we'd
end up in a state in which the miptree thinks it has an MCS buffer,
but doesn't. We also leaked the clear_color_bo if it existed.

With this patch, we now free the miptree aux buffer resources and let
intel_miptree_alloc_mcs() know that the MCS buffer no longer exists.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-17 07:06:41 -07:00
Samuel Pitoiset
1fba2e10b3 radv: only declare the ESGS rings for pre GFX9 chips
GFX9 uses LDS instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:20 +02:00
Samuel Pitoiset
d349d4bd24 radv: allow to print GPU info with RADV_DEBUG=info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:17 +02:00
Samuel Pitoiset
56d53ed1d6 radv: do not emit unnecessary ES output stores
GFX9:
Totals from affected shaders:
SGPRS: 472 -> 464 (-1.69 %)
VGPRS: 576 -> 584 (1.39 %)
Code Size: 45432 -> 44324 (-2.44 %) bytes
Max Waves: 40 -> 40 (0.00 %)

VI:
SGPRS: 720 -> 720 (0.00 %)
VGPRS: 728 -> 728 (0.00 %)
Code Size: 45348 -> 43992 (-2.99 %) bytes
Max Waves: 120 -> 120 (0.00 %)

This affects Rise of Tomb Raider and the three Vulkan demos
that use a geometry shader (geometryshader, deferredshadows
and viewportarray).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:13 +02:00
Samuel Pitoiset
a6e44d1271 radv: do not emit unnecessary GS output stores
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:11 +02:00
Samuel Pitoiset
507402ada6 radv: only pass the global BO list at submit time if enabled
That way the winsys might use a faster path when the global
BO list is NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:27 +02:00
Samuel Pitoiset
6211799aff radv: remove the radv_finishme() when compiling shaders
Having an entrypoint different than "main" doesn't mean we
have multiple shaders per module.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:24 +02:00
Samuel Pitoiset
1e86eaf7d8 radv: remove radv_device::llvm_supports_spill
It's always true.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:21 +02:00
Timothy Arceri
f71714022b mesa: add glUniform*ui{v} support to display lists
Fixes: a017c7ecb7 "mesa: display list support for uint uniforms"

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78097
2018-05-17 13:07:48 +10:00
Dieter Nützel
7f1dc93357 radeonsi: create .gitignore
Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-05-16 21:48:17 -04:00
Dave Airlie
eba4cf797c ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic
Drop the use of the old intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-17 11:46:53 +10:00
Eric Anholt
b2e7c32703 v3d: Fix wiring filters to NEAREST for 32-bit texture returns.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104626
2018-05-16 21:19:07 +01:00
Eric Anholt
795488d2bf v3d: Enable the driver by default.
Now that we have a stabilized ABI and a fairly conformant driver, turn it
on.
2018-05-16 21:19:07 +01:00
Eric Anholt
01ae6a9181 v3d: Rename driver functions from vc5 to v3d.
This is the final step of the driver rename.
2018-05-16 21:19:07 +01:00
Eric Anholt
8c47ebbd23 v3d: Rename the driver files from "vc5" to "v3d". 2018-05-16 21:19:07 +01:00
Eric Anholt
c4c488a2ae v3d: Rename the vc5_dri.so driver to v3d_dri.so.
This allows the driver to load against the merged kernel DRM driver.  In
the process, rename most of the build system variables and gallium
plumbing functions.
2018-05-16 21:19:07 +01:00
Eric Anholt
8a793d42f1 v3d: Switch the vc5 driver to using the finalized V3D UABI.
In the process of merging to the kernel, I renamed the driver to the
general product line's name (since we have both vc5 and vc6 supported
already).  Since the ABI is finalized, move the header to include/drm-uapi.
2018-05-16 21:19:07 +01:00
Charmaine Lee
33a86acd78 svga: fix incompatible bind flags at buffer validation time
At buffer resource validation time, if the resource handle is not yet
created and if the initial buffer bind flags and the tobind flags are
incompatible, just use the tobind flags to create the resource handle.
On the other hand, if the bind flags are compatible, we can combine
the bind flags for the resource handle creation.

Fixes piglit gl-3.1-buffer-bindings crash.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-05-16 13:04:16 -06:00
jenny.q.cao
1261b34cd5 mesa: cast the GLenum16 to GLint to avoid compile warning on android
Cast the enum to GLint to avoid the compile warning:
/src/mesa/main/get.c:3005:19:
warning: comparison of constant -32768 with expression of type
'GLenum16' (aka 'unsigned short') is always false
-Wtautologicalia-constant-out-of-range-compare

Tests: compilation without this warning
Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-05-16 13:02:43 -06:00
Stuart Young
f806cc9eb6 etnaviv: Fix missing rnndb file in tarballs
Seems that when the rnndb files for etniviv were updated/included back
in Nov 2017, hw/texdesc_3d.xml.h was missed from Makefile.sources and
meson.build. This was all during the conversion to meson, so it apears
to have slipped through the cracks. As such, this file has been missing
from the official tarballs since inclusion in Mesa, so the git trees
and tarballs differ.

Found due to lintian errors in the Debian packages.

Fixes: f1e1c60ff6 ("etnaviv: Update from rnndb")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-05-16 19:36:10 +02:00
Matthias Groß
71892fbe19 gallium/hud: add frametime graph (v2)
Thanks for your comment. This version has an additional boolean in the
fps_info struct to distinguish between fps and frame time calculation.
The struct is initialised in the respecting install functions for this
purpose.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-05-15 19:30:12 -04:00
Jan Vesely
f3521ce2c4 eg/compute: Use reference counting to handle compute memory pool.
Use pipe_reference to release old RAT surfaces.
RAT surface adds a reference to pool bo, so use reference counting for pool->bo
as well.

v2: Use the same pattern for both defrag paths
    Drop confusing comment

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-05-15 19:01:47 -04:00
Roland Scheidegger
e01af38d6f gallivm: Use alloca_undef with array type instead of alloca_array
Use a single allocation of array type instead of the old-style array
allocation for the temp and immediate arrays.
Probably only makes a difference if they aren't used indirectly (so,
if we used them solely because there's too many temps or immediates).
In this case the sroa and early-cse passes can sometimes do some
optimizations which they otherwise cannot.
(As a side note, for the temp reg array, we actually really should
use one allocation per array id, not just one for everything.)
Note that the instcombine pass would actually promote such
allocations to single alloc of array type as well, but it's too late
for some artificial shaders we've seen to help (we don't want to run
instcombine at the beginning due to its cost, hence would need
another sroa/cse pass after instcombine). sroa/early-cse help there
because they can actually eliminate all of the huge shader, reducing
it to a single const output (don't ask...).
(Interestingly, instcombine also removes all the bitcasts we do on that
allocation for single-value gathering, and in the end directly indexes
into the single vector elements, which according to spec is only
semi-valid, but this happens regardless. Another thing instcombine also
does is use inbound GEPs, which is probably something we should do
manually as well - for indirectly indexed reg files llvm may not be
able to figure it out on its own, but we should be able to guarantee
all pointers are always inbound. In any case, by the looks of it
using single allocation with array type seems to be the right thing
to do even for ordinary shaders.)
No piglit change.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-05-16 00:04:48 +02:00
Dieter Nützel
bd0b6b9f17 radv: add generated files to .gitignore(s)
Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-15 22:53:55 +02:00
Samuel Pitoiset
6bde8c5608 spirv: fix visiting inner loops with same break/continue block
We should stop walking through the CFG when the inner loop's
break block ends up as the same block as the outer loop's
continue block because we are already going to visit it.

This fixes the following assertion which ends up by crashing
in RADV or ANV:

SPIR-V parsing FAILED:
In file ../src/compiler/spirv/vtn_cfg.c:381
block->node.link.next == NULL
0 bytes into the SPIR-V binary

This also fixes a crash with a camera shader from SteamVR.

v2: make use of vtn_get_branch_type() and add an assertion

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106090
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106504
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-15 21:38:19 +02:00
Rob Clark
d89f58a6b8 mesa/st: handle vert_attrib_mask in nir case too
Note, actually fixes 9987a072cb, but the problems don't show up until
19a91841c3.

Fixes: 19a91841c3 st/mesa: Use Array._DrawVAO in st_atom_array.c.
Fixes: 9987a072cb st/mesa: Make the input_to_index array available.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2018-05-15 15:15:33 -04:00
Marek Olšák
3e27b377f2 cso: check count == 0 in cso_set_vertex_buffers
The code didn't expect that, leading to crashes.

Fixes: 86d63b53a2 "gallium: remove aux_vertex_buffer_slot code"

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2018-05-15 12:36:27 -04:00
Rob Clark
dace607245 vc5: use util_copy_framebuffer_state
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-15 08:48:13 -04:00
Rob Clark
dae4c98dd7 vc4: use util_copy_framebuffer_state
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-15 08:47:35 -04:00
Rob Clark
f897b67dc1 freedreno/a5xx: remove fd5_shader_stateobj
Extra level of indirection that serves no purpose.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-05-15 08:46:46 -04:00
Rob Clark
d48a2404a2 freedreno/a4xx: remove fd4_shader_stateobj
Extra level of indirection that serves no purpose.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-05-15 08:46:46 -04:00
Rob Clark
2c40f2ba32 freedreno/a3xx: remove fd3_shader_stateobj
Extra level of indirection that serves no purpose.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-05-15 08:46:46 -04:00
Rob Clark
273f7d8404 freedreno: fence should hold a ref to pipe
Since the fence can outlive the context, and all it really needs to wait
on a fence is the pipe, use the new fd_pipe reference counting to hold a
ref to the pipe and drop the ctx pointer.

This fixes a crash seen with (for example) glmark2:

  #0  fd_pipe_wait_timeout (pipe=0xbf48678b3cd7b32b, timestamp=0, timeout=18446744073709551615) at freedreno_pipe.c:101
  #1  0x0000ffffbdf75914 in fd_fence_finish (pscreen=0x561110, ctx=0x0, fence=0xc55c10, timeout=18446744073709551615) at ../src/gallium/drivers/freedreno/freedreno_fence.c:96
  #2  0x0000ffffbde154e4 in dri_flush (cPriv=0xb1ff80, dPriv=0x556660, flags=3, reason=__DRI2_THROTTLE_SWAPBUFFER) at ../src/gallium/state_trackers/dri/dri_drawable.c:569
  #3  0x0000ffffbecd8b44 in loader_dri3_flush (draw=0x558a28, flags=3, throttle_reason=__DRI2_THROTTLE_SWAPBUFFER) at ../src/loader/loader_dri3_helper.c:656
  #4  0x0000ffffbecbc36c in glx_dri3_flush_drawable (draw=0x558a28, flags=3) at ../src/glx/dri3_glx.c:132
  #5  0x0000ffffbecd91e8 in loader_dri3_swap_buffers_msc (draw=0x558a28, target_msc=0, divisor=0, remainder=0, flush_flags=3, force_copy=false) at ../src/loader/loader_dri3_helper.c:827
  #6  0x0000ffffbecbcfc4 in dri3_swap_buffers (pdraw=0x5589f0, target_msc=0, divisor=0, remainder=0, flush=1) at ../src/glx/dri3_glx.c:587
  #7  0x0000ffffbec98218 in glXSwapBuffers (dpy=0x502bb0, drawable=2097154) at ../src/glx/glxcmds.c:840
  #8  0x000000000040994c in CanvasGeneric::update (this=0xfffffffff400) at ../src/canvas-generic.cpp:114
  #9  0x0000000000411594 in MainLoop::step (this=this@entry=0x5728f0) at ../src/main-loop.cpp:108
  #10 0x0000000000409498 in do_benchmark (canvas=...) at ../src/main.cpp:117
  #11 0x00000000004071b0 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:210

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-05-15 08:46:46 -04:00
Rob Clark
a8c0daa172 freedreno: batch cache doesn't hold a ref to batch
The cache doesn't hold a (strong) reference to the batch.  So we
shouldn't be trying to drop a reference, as that leads to:

   #0  0x0000ffffbecb37a0 in raise () from /lib64/libc.so.6
   #1  0x0000ffffbeca159c in abort () from /lib64/libc.so.6
   #2  0x0000ffffbecacf48 in __assert_fail_base () from /lib64/libc.so.6
   #3  0x0000ffffbecacfa8 in __assert_fail () from /lib64/libc.so.6
   #4  0x0000ffffbd28def0 in pipe_reference_described (ptr=0x4f47130, reference=0x0, get_desc=0xffffbd2e0f08 <__fd_batch_describe>) at ../src/gallium/auxiliary/util/u_inlines.h:88
   #5  0x0000ffffbd28e188 in fd_batch_reference_locked (ptr=0x4f40de0, batch=0x0) at ../src/gallium/drivers/freedreno/freedreno_batch.h:258
   #6  0x0000ffffbd28e9a8 in fd_bc_invalidate_resource (rsc=0x4f40ca0, destroy=true) at ../src/gallium/drivers/freedreno/freedreno_batch_cache.c:244
   #7  0x0000ffffbd293778 in fd_resource_destroy (pscreen=0xedc170, prsc=0x4f40ca0) at ../src/gallium/drivers/freedreno/freedreno_resource.c:644
   #8  0x0000ffffbd922674 in u_transfer_helper_resource_destroy (pscreen=0xedc170, prsc=0x4f40ca0) at ../src/gallium/auxiliary/util/u_transfer_helper.c:144
   #9  0x0000ffffbd29527c in pipe_resource_reference (ptr=0x4f455d8, tex=0x0) at ../src/gallium/auxiliary/util/u_inlines.h:144
   #10 0x0000ffffbd29548c in fd_surface_destroy (pctx=0x1012720, psurf=0x4f455d0) at ../src/gallium/drivers/freedreno/freedreno_surface.c:78
   #11 0x0000ffffbd1f9c48 in pipe_surface_reference (ptr=0x4f471d0, surf=0x0) at ../src/gallium/auxiliary/util/u_inlines.h:113
   #12 0x0000ffffbd1f9ef4 in util_copy_framebuffer_state (dst=0x4f471c8, src=0x0) at ../src/gallium/auxiliary/util/u_framebuffer.c:114
   #13 0x0000ffffbd2e0e30 in __fd_batch_destroy (batch=0x4f47130) at ../src/gallium/drivers/freedreno/freedreno_batch.c:225
   #14 0x0000ffffbd28e1b0 in fd_batch_reference_locked (ptr=0xfffffffff010, batch=0x0) at ../src/gallium/drivers/freedreno/freedreno_batch.h:262
   #15 0x0000ffffbd28e6b0 in fd_bc_invalidate_context (ctx=0x1012720) at ../src/gallium/drivers/freedreno/freedreno_batch_cache.c:190
   #16 0x0000ffffbd2e2b6c in fd_context_destroy (pctx=0x1012720) at ../src/gallium/drivers/freedreno/freedreno_context.c:139
   #17 0x0000ffffbd2c3280 in fd5_context_destroy (pctx=0x1012720) at ../src/gallium/drivers/freedreno/a5xx/fd5_context.c:56
   #18 0x0000ffffbd5b7a8c in st_destroy_context_priv (st=0xfd72f0, destroy_pipe=true) at ../src/mesa/state_tracker/st_context.c:281

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-05-15 08:46:26 -04:00
Eric Engestrom
37d44e2608 docs/meson: mark code/commands as <code>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:33:39 +01:00
Eric Engestrom
5829f616ec docs/meson: replace plaintext url with a link
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:33:36 +01:00
Eric Engestrom
67c550708a docs/meson: fix various html issues
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:33:34 +01:00
Eric Engestrom
dc2dc1fa30 docs/meson: fix various typos
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:33:28 +01:00
Eric Engestrom
6c5df78d8b meson: fix copyright symbol
Fixes: bd68f1013c "autotools, meson: add tileset.h"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:31:46 +01:00
Juan A. Suarez Romero
bd68f1013c autotools, meson: add tileset.h
Fixes: 4e52cb51b5 ("swr/rast: Thread locked tiles improvement")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-15 10:00:11 +02:00
Thomas Hellstrom
3d0b4979ee st/xa: Bump minor
Bump xa minor to signal that the underlying mesa version is suitable for dri3.

This is a bit ugly since it doesn't relate to a specific xa interface change.
Recently there has been a number of fixes in mesa that helps enabling dri3
without any significant regressions in automated testing and common desktop
usage latency. However, the xf86-video-vmware driver has no other way to tell
but inspecting the xa version.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-05-15 09:27:46 +02:00
Dave Airlie
9585e70206 virgl: enable vertex streams when glsl level is high enough.
This enabled the vertex streams out when the host supports
GL4.0.
2018-05-15 14:56:57 +10:00
Kai Wasserbäch
b691d9192c opencl: autotools: Fix linking order for OpenCL target
Otherwise the build fails with an undefined reference to
clang::FrontendTimesIsEnabled.

Bugzilla: https://bugs.freedesktop.org/106209
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: Aaron Watry <awatry@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-05-14 22:45:01 -04:00
Samuel Pitoiset
97b179570c radv: reduce the number of parameters export by the GS copy shader
By using the geometry shader output usage mask.

This improves all Vulkan demos that use a geometry shader
(ie. geometryshader, deferredshadows, viewportarray).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:23 +02:00
Samuel Pitoiset
560bd9eb67 radv: scan the geometry shader output usage mask
For reducing the number of parameters that are exported by
the GS copy shader.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:21 +02:00
Samuel Pitoiset
ea43d935ab radv: run the shader info pass before emitting the GS copy shader
For further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:19 +02:00
Samuel Pitoiset
7cbc6f2621 radv: check that layout isn't NULL in radv_nir_shader_info_pass()
An upcoming patch will run the shader info pass on the
geometry shader just before emitting the GS copy shader.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:17 +02:00
Jason Ekstrand
18f8200a99 intel/blorp: Use linear formats for CCS_E clear colors in copies
It's clear that the original code meant to do this and there is even a
10-line comment explaining why.  Originally, we had a simple function
for packing the clear colors which was unaware of sRGB.  However, in
a6b66a7b26, when we started using ISL to do the packing, the wrong
format was used.

Fixes: a6b66a7b26 "intel/blorp: Use ISL instead of bitcast_color..."
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-14 10:41:26 -07:00
Bas Nieuwenhuizen
f944a59996 radv: Disable texel buffers with A2 SNORM/SSCALED/SINT for pre-vega.
The hardware always interprets the alpha as unsigned and fixing it
in the shader is going to add unacceptable overheads.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 18:58:30 +02:00
Bas Nieuwenhuizen
3d4d388e39 radv: Fix up 2_10_10_10 alpha sign.
Pre-Vega HW always interprets the alpha for this format as unsigned,
so we have to implement a fixup to do the sign correctly for signed
formats.

v2: Improve indexing mess.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 18:58:20 +02:00
Bas Nieuwenhuizen
e361970ed7 radv: Add support for IMG_DATA_FORMAT_32_32_32.
Basic sampling support for linear tiling.

No CTS regressions, but it seems the blitting coverage is not very
extensive.

https://bugs.freedesktop.org/show_bug.cgi?id=106331
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 18:58:12 +02:00
Bas Nieuwenhuizen
dd102405de radv: Translate logic ops.
radeonsi could pass them through but the enum changed between
Gallium and Vulkan, so we have to translate.

In progress I made the register defines a bit more readable.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100430
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 16:49:06 +02:00
Bas Nieuwenhuizen
62f50df7b7 radv: Fix multiview queries.
This moves the extra queries to after the main query ended, instead
of doing it after the begin and hence doing nesting.

We also emit only (view count - 1) extra queries, as the main query
is already there for the first view.

This fixes the CTS occasionally getting stuck in
dEQP-VK.multiview.queries* waiting on results.

Fixes: 32b4f3c38d "radv/query: handle multiview queries properly. (v3)"
CC: 18.1 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 16:49:06 +02:00
Eric Engestrom
f0cdc39b13 meson: remove dependency antipattern
`dep_valgrind != []` now (0.45) produces a warning that is quite explicit:
  WARNING: Trying to compare values of different types (DependencyHolder, list) using !=.
  The result of this is undefined and will become a hard error in a future Meson release.

`dep_valgrind = []` used to be the recommended way to deal with
non-existant dependency, but these don't work with `.found()`, so now
the recommended way is to declare a impossible dependency, which
null_dep does for us in Mesa.

In short, we don't need and shouldn't check for `!= []` anywhere anymore.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2018-05-14 14:55:36 +01:00
Samuel Pitoiset
ece398277c radv: remove useless check in radv_create_shaders()
radv_can_dump_shader() already handles if module is NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-14 12:38:01 +02:00
Samuel Pitoiset
8ade3e4684 radv: allow to dump the GS copy shader with RADV_DEBUG="shaders"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-14 12:38:00 +02:00
Samuel Pitoiset
553418af1e radv: move {load,store}_var intrinsics scanning in different functions
These are going to be crazy and we are probably going to add
more scan stuff in the future. Also use switch cases instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-14 12:37:58 +02:00
jenny.q.cao
ff7521c9ba android: change include "cutils/log.h" to "log/log.h" on Android API >=26
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning

Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-14 08:08:31 +03:00
Roland Scheidegger
cf3fb42fb5 llvmpipe: Fix random number generation for unit tests
We were never producing negative numbers for signed types.
Also fix only producing half the valid range for uint32, and
properly clamp signed values.

Because this now also properly tests snorm with actually negative
values, need to increase eps for such conversions. I believe these
cannot actually be hit in ordinary operation (e.g. if a snorm texture
is sampled and output to snorm RT, it will still go through snorm->float
and float->snorm conversion), so don't bother to do anything to fix
the bad accuracy (might be quite complex).
Basically, the issue is for something like snorm16->snorm8 that in the
end this will just use a 8 bit arithmetic right shift.
But the math behind it says we should actually do a division by 32767 / 127, which
is ~258, not 256. So the result can be one bit off (values have too large
magnitude), and furthermore, the shift has incorrect rounding (always rounds
down). For positive numbers, these errors have different direction, but
for negative ones they have the same, hence for some values the error will
be 2 bit in the end.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=106232
2018-05-14 03:14:00 +02:00
Dave Airlie
5978d54a09 radv: use compute path for multi-layer images.
I don't think the hw resolve path can't handle multi-layer images.

This fixes all the:
dEQP-VK.renderpass.multisample_resolve.layers_*
tests on my VI card.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
2018-05-14 08:57:54 +10:00
Dave Airlie
98dbaa445a radv: resolve all layers in compute resolve path.
This path should iterate across all layers, I've some ideas
for doing this in a single pass, but this is simpler for now.

This passes the tests because we don't use the fragment path
unless we have DCC, and we don't have DCC on layered images.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
2018-05-14 08:57:27 +10:00
Dave Airlie
b16fc6cda1 radv/resolve: do fmask decompress on all layers.
For a multi-layer subpass resolve we want to make sure we flush all
the layers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
2018-05-14 08:56:47 +10:00
Rhys Perry
8f6cbb8c7d nvc0: fix setting of subpixel precision during conservative rasterization
Fixes: 07dac3e040 ("nvc0: add conservative rasterization support")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-13 13:21:41 -04:00
Rhys Perry
c879011c72 anv,nir: add generated files to .gitignore(s)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-12 20:14:49 -07:00
Marek Olšák
86d63b53a2 gallium: remove aux_vertex_buffer_slot code
The slot index is always 0, and is pretty unlikely to change in the future.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-05-12 21:08:09 -04:00
Timothy Arceri
ce188813bf radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.

We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.

With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.

V2: add bit to radv_pipeline_key

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
2018-05-13 09:58:33 +10:00
Vinson Lee
26ddc4f9e1 scons: Add PROGRAM_NIR_FILES.
Fix SCons build error.

  Linking build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1.5 ...
build/linux-x86_64-debug/mesa/libmesa.a(st_program.os): In function `st_translate_prog_to_nir':
src/mesa/state_tracker/st_program.c:392: undefined reference to `prog_to_nir'

Fixes: 5c33e8c772 ("st/nir: use NIR for asm programs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2018-05-12 00:50:05 -07:00
Timothy Arceri
5c33e8c772 st/nir: use NIR for asm programs
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-12 14:48:21 +10:00
Timothy Arceri
0b3e9564bd st/nir: make st_nir_opts() available externally
The following patch will make use of this for asm style programs.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-05-12 14:48:21 +10:00
Boyuan Zhang
0907d3ab9c radeon/vce: add firmware support for ver 53 and up
All vce firmwares with major version greater than or equal to 53 are supported

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2018-05-11 14:59:00 -04:00
Rob Clark
a7c81a7f67 etnaviv: remove pipe_fence_handle::ctx
A fence can outlive the ctx it was created from (see glmark2).. etnaviv
doesn't actually need fence->ctx so lets remove it before someone makes
the mistake of assuming it is a valid pointer.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-05-11 18:42:13 +02:00
George Kyriazis
4e52cb51b5 swr/rast: Thread locked tiles improvement
- Change tilemgr TILE_ID encoding to use Morton-order (Z-order).
- Change locked tiles set to bitset.  Makes clear, set, get much faster.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:26:35 -05:00
George Kyriazis
8238c791dc swr/rast: Add Builder::GetVectorType()
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:25:47 -05:00
George Kyriazis
8cb55dae2e swr/rast: Prepend the console output with a newline
It can get jumbled with output from other threads.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:25:24 -05:00
George Kyriazis
db25fcfcde swr/rast: Add ConcatLists()
for concatenating lists

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:22:57 -05:00
George Kyriazis
dcaca3c7b3 swr/rast: Add constant initializer for uint64_t
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:22:17 -05:00
George Kyriazis
70f0a28b83 swr/rast: Use binner topology to assemble backend attributes
Previously was using the draw topology, which may change if GS or Tess
are active. Only affected attributes marked with constant interpolation,
which limited the impact.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:21:52 -05:00
George Kyriazis
b3b0f0e0ec swr/rast: Change formatting
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-05-11 11:21:22 -05:00
Ville Syrjälä
659910eda0 meson: Fix build for egl platform_x11 with dri3
platform_x11 with dri3 needs inc_loader.

In file included from ../src/egl/drivers/dri2/platform_x11_dri3.c:35:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory
In file included from ../src/egl/drivers/dri2/platform_x11.c:46:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory
In file included from ../src/egl/drivers/dri2/egl_dri2.c:61:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory

Cc: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2018-05-11 17:41:57 +03:00
Samuel Pitoiset
efc10949cc radv: move ac_build_if_state on top of radv_nir_to_llvm.c
These helpers will be needed for future work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-11 12:35:07 +02:00
Samuel Pitoiset
3a410f0afc radv: minor cleanups in radv_fill_shader_variant()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-11 12:35:05 +02:00
Jan Vesely
58272c1ad7 winsys/amdgpu: Destroy dev_hash table when the last winsys is removed.
Fixes memory leak on module unload.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-10 23:23:50 -04:00
Marek Olšák
a2e9d9b4c1 ac/gpu_info: add has_read_registers_query
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:11 -04:00
Marek Olšák
9b1fdfc541 ac/gpu_info: add has_2d_tiling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:10 -04:00
Marek Olšák
d26696283d ac/gpu_info: add has_sparse_vm_mappings
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:08 -04:00
Marek Olšák
125adc92ad ac/gpu_info: add has_unaligned_shader_loads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:07 -04:00
Marek Olšák
8b9694da4b radeonsi: expose ARB_query_buffer_object on ancient kernels too
It doesn't use indirect dispatches.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:04 -04:00
Marek Olšák
e9c08bc658 ac/gpu_info: add has_indirect_compute_dispatch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:03 -04:00
Marek Olšák
64265ac8d5 ac/gpu_info: add kernel_flushes_tc_l2_after_ib
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:01 -04:00
Marek Olšák
14c5a93bfa ac/gpu_info: add has_format_bc1_through_bc7
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:40:00 -04:00
Marek Olšák
2bd2c173e8 ac/gpu_info: add has_eqaa_surface_allocator
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:58 -04:00
Marek Olšák
e720cb6135 radeonsi: clean up the reset status query implementation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:57 -04:00
Marek Olšák
3060f62340 ac/gpu_info: add has_bo_metadata
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:56 -04:00
Marek Olšák
09f1bab483 ac/gpu_info: add si_TA_CS_BC_BASE_ADDR_allowed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:54 -04:00
Marek Olšák
8b58a14ef7 ac/gpu_info: add htile_cmask_support_1d_tiling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:53 -04:00
Marek Olšák
b81149e258 ac/gpu_info: add kernel_flushes_hdp_before_ib
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:39:47 -04:00
Marek Olšák
a969f184cf radeonsi: add an environment variable that forces EQAA for MSAA allocations
This is for testing and experiments.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:37 -04:00
Marek Olšák
2309cedf44 radeonsi: set up EQAA image descriptors properly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:36 -04:00
Marek Olšák
7ac4ef097d radeonsi: add EQAA SC,DB,CB register programming
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:34 -04:00
Marek Olšák
9d00580e75 radeonsi: support creating EQAA color textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:32 -04:00
Marek Olšák
912b0163dc ac/surface: add EQAA support
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:34:31 -04:00
Marek Olšák
ee31762ef5 radeonsi: use better sample locations for 8x EQAA
Verified with the piglit MSAA accuracy test.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:32:57 -04:00
Marek Olšák
4b6df225f7 radeonsi: improve quality of 16 sample locations
This results in better 16x and 8x quality when using these locations.
Verified with the piglit MSAA accuracy test.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:29:02 -04:00
Marek Olšák
01fd543c82 radeonsi: use better sample locations for 4x MSAA
Discovered by luck. Verified with the piglit MSAA accuracy test.
It also shows that the worst case EQAA 16s4f results in very good 4x MSAA
in the worst case.

Nine might not like these positions, but they are prettier to the eye and
GL doesn't care.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:28:12 -04:00
Marek Olšák
8d8b71ccfa radeonsi: reorder sample locations as required by EQAA
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:27:46 -04:00
Marek Olšák
5769a5ec01 radeonsi: simplify si_get_sample_position
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
9f456b3a3c radeonsi: simplify arrays of sample locations
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
3d70b5beae radeonsi: set DB_EQAA the same as Vulkan
These never change, but they only affect EQAA, which isn't implemented.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
b5ed039325 radeonsi: remove CM_ prefixes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
656fd607be radeonsi: don't update clear color registers if they don't change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
835095973d radeonsi: remove r600_fmask_info
radeon_surf contains almost everything.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
bdc3e410f7 ac/surface: unify common legacy and gfx9 fmask fields
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
9bf3570fed ac/surface/gfx6: compute FMASK together with the color surface
instead of invoking FMASK computation separately.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:33 -04:00
Marek Olšák
276acda835 ac/surface/gfx9: fix a typo in CMASK RB/pipe alignment
No change in behavior because it's always aligned.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
6841845b00 ac: set correct LLVM processor names for Raven & Vega12
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
6f7f10d285 ac: sort raster configs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
e7b82a9978 ac: remove 1 RB raster config for Iceland
Iceland always reports 2 RBs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
cb0f5cddcc ac: move the Fiji kernel workaround for raster config out of the switch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
ce954ac6f3 ac: enable both RBs on Kaveri
This can result in 2x increase in performance on non-harvested Kaveris.

v2: don't do it on radeon

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Marek Olšák
597b9e8810 radeonsi/gfx9: work around a GPU hang due to broken indirect indexing in LLVM
Fixes: 6d19120da8 "radeonsi/gfx9: workaround for INTERP with indirect indexing"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 18:26:32 -04:00
Jason Ekstrand
b784561c1a intel/isl/storage: Don't lower most UNORM formats on gen11+
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-05-10 14:13:24 -07:00
Jason Ekstrand
399962e7c6 intel/isl: Several UNORM formats support typed writes on gen11+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-05-10 14:12:55 -07:00
Brian Paul
e4211b36bb mesa: revert GL_[SECONDARY_]COLOR_ARRAY_SIZE glGet type to TYPE_INT
Since size can be 3, 4 or GL_BGRA we need to keep these glGet types
as TYPE_INT, not TYPE_UBYTE.

Fixes: d07466fe18 ("mesa: fix glGetInteger/Float/etc queries for
vertex arrays attribs")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106462
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2018-05-10 09:49:40 -06:00
Andres Rodriguez
34e9e4023f radv: disable DCC for shareable images on GFX9+
This seems to be broken at the moment for opengl interop.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-10 11:27:12 -04:00
Thomas Petazzoni
54bbe600ec configure.ac: rework -latomic check
The configure.ac logic added in commit
2ef7f23820 ("configure: check if
-latomic is needed for __atomic_*") makes the assumption that if a
64-bit atomic intrinsic test program fails to link without -latomic,
it is because we must use -latomic.

Unfortunately, this is not completely correct: libatomic only appeared
in gcc 4.8, and therefore gcc versions before that will not have
libatomic, and therefore don't provide atomic intrinsics for all
architectures. This issue was for example encountered on PowerPC with
a gcc 4.7 toolchain, where the build fails with:

powerpc-ctng_e500v2-linux-gnuspe/bin/ld: cannot find -latomic

This commit aims at fixing that, by not assuming -latomic is
available. The commit re-organizes the atomic intrinsics detection as
follows:

 (1) Test if a program using 64-bit atomic intrinsics links properly,
     without -latomic. If this is the case, we have atomic intrinsics,
     and we're good to go.

 (2) If (1) has failed, then test to link the same program, but this
     time with -latomic in LDFLAGS. If this is the case, then we have
     atomic intrinsics, provided we link with -latomic.

This has been tested in three situations:

 - On x86-64, where atomic instrinsics are all built-in, with no need
   for libatomic. In this case, config.log contains:

   GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE='#'
   GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE=''
   LIBATOMIC_LIBS=''

   This means: atomic intrinsics are available, and we don't need to
   link with libatomic.

 - On NIOS2, where atomic intrinsics are available, but some of them
   (64-bit ones) require using libatomic. In this case, config.log
   contains:

   GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE='#'
   GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE=''
   LIBATOMIC_LIBS='-latomic'

   This means: atomic intrinsics are available, and we need to link
   with libatomic.

 - On PowerPC with an old gcc 4.7 toolchain, where 32-bit atomic
   instrinsics are available, but not 64-bit atomic instrinsics, and
   there is no libatomic. In this case, config.log contains:

   GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE=''
   GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE='#'

   With means that atomic intrinsics are not usable.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2018-05-10 08:13:57 -07:00
Brian Paul
d07466fe18 mesa: fix glGetInteger/Float/etc queries for vertex arrays attribs
The vertex array Size and Stride attributes are now ubyte and short,
respectively.  The glGet code needed to be updated to handle those
types, but wasn't.

Fixes the new piglit test gl-1.5-get-array-attribs test.

v2: fix inadvertant whitespace change, change COLOR_ARRAY_SIZE to UBYTE,
misc fixes suggested by Justin

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106450
Fixes: d5f42f96e1 ("mesa: shrink size of gl_array_attributes (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2018-05-10 08:08:11 -06:00
Jan Vesely
45dfa6f4e7 winsys/radeon: Destroy fd_hash table when the last winsys is removed.
Fixes memory leak on module unload.
v2: Use util_hash_table helper function

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
2018-05-10 05:12:48 -04:00
Jan Vesely
d146768d13 gallium/auxiliary: Add helper function to count the number of entries in hash table
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
2018-05-10 05:12:43 -04:00
Samuel Pitoiset
0defc55547 radv: move handling nosisched option in a better place
It's a per-application optimization, so it makes more sense
to do that in radv_handle_per_app_options().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-10 10:57:41 +02:00
Grazvydas Ignotas
4fdce205dd radv: assorted typo fixes
Trivial.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-10 11:50:46 +03:00
Mathias Fröhlich
f660683027 mesa/vbo/tnl: Move gl_vertex_array related stuff to tnl.
The only remaining users of gl_vertex_array are tnl based
drivers. So move everything related to that into tnl and
rename it accordingly.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
881d2fcafa mesa: Remove Array._DrawArrays.
Only tnl based drivers still use this array. So remove it
from core mesa and use Array._DrawVAO instead.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
899476b6b1 i965: Remove the now unused gl_vertex_array.
Was meant to be temporary in i965.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
0fabd55306 i965: Remove the gl_vertex_array indirection.
For now store binding and attrib in brw_vertex_element.
The i965 driver still provides lots of opportunity to make use
of the unique binding information in the VAO which is currently not
taken from the VAO.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
172c9a908f i965: Implement all_varyings_in_vbos in terms of Array._DrawVAO.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
79eb6ab7b6 st/mesa: Remove the now unused gl_vertex_array.
Was meant to be temporary in gallium.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
4c77f0d065 st/mesa: Make feedback draw and rasterpos use _DrawVAO.
Instead of playing with Array._DrawArrays, make the feedback draw
path use Array._DrawVAO. Also st_RasterPos needs to use the VAO then.

v2: Use helper methods to get the offset values for array and binding.
    Update comments.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:16 +02:00
Mathias Fröhlich
19a91841c3 st/mesa: Use Array._DrawVAO in st_atom_array.c.
Finally make use of the binding information in the VAO when
setting up arrays for draw.

v2: Emit less relocations also for interleaved userspace arrays.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:15 +02:00
Mathias Fröhlich
9987a072cb st/mesa: Make the input_to_index array available.
The input_to_index array is already available internally
when preparing vertex programs. Store the map in
struct st_vertex_program.
Also store the bitmask of mesa vertex processing inputs in
struct st_vp_variant.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:15 +02:00
Mathias Fröhlich
f24bf45210 st/mesa: Use _DrawVAO for edgeflag enabled check.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:15 +02:00
Mathias Fröhlich
d1698d4311 mesa: Compute effective buffer bindings in the vao.
Compute VAO buffer binding information past the position/generic0 mapping.
Scan for duplicate buffer bindings and collapse them into derived
effective buffer binding index and effective attribute mask variables.
Provide a set of helper functions to access the distilled
information in the VAO. All of them prefixed with _mesa_draw_...
to indicate that they are meant to query draw information.

v2: Also group user space arrays containing interleaved arrays.
    Add _Eff*Offset to be copied on attribute and binding copy.
    Update comments.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2018-05-10 07:06:15 +02:00
Gert Wollny
fb4011ace9 virgl: Add support for passing GL_ANY_SAMPLES_PASSED_CONSERVATIVE
This is needed for fixing CTS:
   dEQP-GLES3.functional.occlusion_query.conservative*

Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
2018-05-10 12:26:57 +10:00
Dave Airlie
ce027ac5c7 r600: fix constant buffer bounds.
If you have an indirect access to a constant buffer on r600/eg
use a vertex fetch in the shader. However apps have expected
behaviour on those out of bounds accessess (even if illegal).

If the constants were being uploaded as part of a larger
upload buffer, we'd set the range of allowed access to a lot
larger than required so apps would get values back from
other parts of the upload buffer instead of the expected out
of bounds access.

This fixes rendering bugs in Trine and Witcher 1, thanks
to iive for nagging me effectively until I figured it out :-)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91808
Cc: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-05-10 02:14:32 +01:00
Jason Ekstrand
a8a740f272 i965,anv: Set the CS stall bit on the ISP disable PIPE_CONTROL
From the bspec docs for "Indirect State Pointers Disable":

    "At the completion of the post-sync operation associated with this
    pipe control packet, the indirect state pointers in the hardware are
    considered invalid"

So the ISP disable is a post-sync type of operation which means that it
should be combined with a CS stall.  Without this, the simulator throws
an error.

Fixes: 766d801ca "anv: emit pixel scoreboard stall before ISP disable"
Fixes: f536097f6 "i965: require pixel scoreboard stall prior to ISP disable"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-09 18:03:28 -07:00
Dave Airlie
56766b8515 radv: handle arrays in the fmask descriptor.
This fixes the fmask descriptor generation to handle 2d ms arrays.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-10 10:42:49 +10:00
Matt Turner
0f959215c3 gallium/tests: Fix assignment of EXTRA_DIST
Fixes: 6754c2e83d ("autotools: Include new meson files")
2018-05-09 16:38:47 -07:00
Matt Turner
0097940223 configure.ac: Check for grep with AC_PROG_GREP
Perhaps with a new version of autoconf, I began seeing:

| checking the name lister (/usr/bin/nm -B) interface... ./configure: line 6973: External.*some_variable: command not found
| BSD nm

This is because AC_PROG_NM expands to

	...
	if $GREP 'External.*some_variable' conftest.out > /dev/null; then
	    lt_cv_nm_interface="MS dumpbin"
	fi
	...

I'm not sure if it's a bug in AC_PROG_NM that it doesn't call
AC_PROG_GREP, but it's easy enough for us to do it.
2018-05-09 16:38:47 -07:00
Xiong, James
0ab266dc1b main: fail texture_storage() call if the size is not okay
Signed-off-by: Xiong, James <james.xiong@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-10 09:34:31 +10:00
Xiong, James
08c1444c95 main: return 0 length when the queried program object's not linked
Signed-off-by: Xiong, James <james.xiong@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-10 09:34:19 +10:00
Kenneth Graunke
a83face48a i965: Shut up unused variable warnings.
These are only used in assertions.
2018-05-09 16:20:50 -07:00
Ross Burton
1755654d9f src/intel/Makefile.vulkan.am: add missing MKDIR_GEN
Out of tree builds can try to write into a directory that doesn't exist yet:

| Traceback (most recent call last):
|   File "../../../mesa-18.0.2/src/intel/vulkan/anv_icd.py", line 46, in <module>
|     with open(args.out, 'w') as f:
| IOError: [Errno 2] No such file or directory: 'vulkan/intel_icd.x86_64.json'
| Makefile:4882: recipe for target 'vulkan/intel_icd.x86_64.json' failed

Add missing MKDIR_GEN calls to solve this.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-09 16:08:52 -07:00
Rhys Perry
5ac16ed047 mesa: fix error handling in get_framebuffer_parameteriv
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-09 14:32:40 -07:00
Lionel Landwerlin
766d801ca3 anv: emit pixel scoreboard stall before ISP disable
We want to make sure that all indirect state data has been loaded into
the EUs before disable the pointers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes: 78c125af39 ("anv/gen10: Ignore push constant packets during context restore.")
2018-05-09 20:11:57 +01:00
Lionel Landwerlin
f536097f67 i965: require pixel scoreboard stall prior to ISP disable
Invalidating the indirect state pointers might affect a previously
scheduled & still running 3DPRIMITIVE (causing page fault). So stall
on pixel scoreboard before that.

v2: Fix compile issue :(

v3: Stall on pixel scoreboard

v4: Drop the post sync operation (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes: ca19ee33d7 ("i965/gen10: Ignore push constant packets during context restore.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106243
2018-05-09 20:11:51 +01:00
Jason Ekstrand
561348caa1 intel/isl: Allow CCS_E on 1010102 formats
On CNL and above, CCS_E supports 1010102 formats and R11G11B10F.  We had
shut them off during early enabling because blorp_copy couldn't handle
them.  Now it can handle 1010102 formats so we can turn them back on.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
ccb44b8a94 intel/blorp: Allow CCS copies of 1010102 formats
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
1978de66f7 intel/blorp: Add support for more format bitcasting
nir_format_bitcast_uint_vec_unmasked can only be used to cast between
formats with uniform channel sizes.  In particular, it cannot handle
10_10_10_2 formats.  By making use of the NIR helper for uint vector
casts, we should now be able to bitcast between any two uint formats so
long as their channels are in RGBA order (possibly with channels
missing).  In order to do this we need to rework the key a bit to pass
the actual formats instead of just the number of bits in each.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
7998fe268e intel/blorp: Use nir_format_bitcast_uint_vec_unmasked
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
047e68389f nir/format_convert: Add code for bitcasting vectors
This is a fairly direct port from blorp.  The only real change is that
the nir_format_convert version doesn't assume that everything is a vec4.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
a6b66a7b26 intel/blorp: Use ISL instead of bitcast_color_value_to_uint
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
09ced65420 intel/isl: Add format conversion code
This adds helpers to ISL to convert an isl_color_value to and from
binary data encoded with a given isl_format.  The conversion is done
using ISL's built-in format introspection so it's fairly slow as format
conversions go but it should be fine for a single pixel value.  In
particular, we can use this to convert clear colors.

As a side-effect, we now rely on the sRGB helpers in libmesautil so we
need to tweak the build system a bit.  All prior uses of src/util in ISL
were header-only.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
8152c60e01 intel/isl/format: Get rid of the ALPHA colorspace
Alpha-only formats are just linear.  There's no need to specially
deliminate them as being in their own colorspace.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
8ab73790ef intel/isl/format: Add field locations informations to channel_layout
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
96598fbc02 intel/isl/format: Add a column for channel order to the table
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
d08d6a3da8 i965/blorp: Remove a pile of blorp_blit restrictions
Previously, blorp could only blit into something that was renderable.
Thanks to recent additions to blorp, it can now blit into basically
anything so long as it isn't compressed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
465d8566cd i965/blorp: Allow blorp blits for 16x MSAA
BLORP has supported 16x MSAA for quite a while now, we just never
bothered to enable it for CopyTexSubImage.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
09eede9c9d anv: Allow blitting to/from any supported format
Now that blorp handles all the cases, why not?  The only real change we
have to make is to stop using anv_swizzle_for_render() in blorp_blit
because it doesn't work for B4G4R4A4 and blorp now natively handles that.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
8ce31c9cc5 intel/blorp: Support the RGB workaround on more formats
Previously we only supported UINT formats because that's what blorp_copy
required.  If we want to use it in blorp_blit, however, we need to
support everything.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
4e26e3dea9 intel/blorp: Silently convert RGBX destination formats to RGBA
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
08cd834996 intel/isl: Add some helpers for working with RGBX formats
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
804856fa57 intel/blorp: Handle more exotic destination formats
This commit adds support for the following formats as destination
formats even though the hardware does not support rendering to them:

 - ISL_FORMAT_R24_UNORM_X8_TYPELESS
 - ISL_FORMAT_A4B4G4R4_UNORM
 - ISL_FORMAT_L8_UNORM_SRGB
 - ISL_FORMAT_R9G9B9E5_SHAREDEXP

This is done by using a different format and emitting shader code to
fake it the rest of the way.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
9e492bb92e intel/blorp: Include nir_format_convert.h in blorp_blit.c
nir_mask_shift_or is now defined in nir_format_convert.h so we can
delete the copy in blorp_blit.c.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
9981709d8f nir/format_convert: Add a function to pack RGB9_E5 formats
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
4e337b42f9 nir/format_convert: Add pack/unpack for R11F_G11F_B10F
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
98156b0019 nir/format_convert: Add linear <-> sRGB helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
2fdd966e3d nir: Add the start of a format conversion helper header
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
906c32ce87 intel/blorp: Add swizzle support for all hardware
This commit makes blorp capable of swizzling anything even on hardware
that doesn't support texture swizzle.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
1ef4f5aff1 intel/isl: Add a helper for inverting swizzles
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
242f6f7492 intel/isl: Add a helper for composing swizzles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
dad67cc245 intel/isl: Add an isl_swizzle_supports_rendering helper
This helper encodes more details, specifically about Haswell, than the
previous asserts in isl_surface_state.c.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
23d703de1f i965/surface_state: Use an identity swizzle pre-Haswell
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-09 11:16:33 -07:00
Jason Ekstrand
293b8de161 blorp: Handle the RGB workaround more like other workarounds
The previous version was sort-of strapped on in that it just adjusted
the blit rectangle and trusted in the fact that we would use texelFetch
and round to the nearest integer to ensure that the component positions
matched.  This new version, while slightly more complicated, is more
accurate because all three components end up with exactly the same
dst_pos and so they will get interpolated and sampled at the same
texture coordinate.  This makes the workaround suitable for using with
scaled blits.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-05-09 11:16:33 -07:00
Lionel Landwerlin
3853f1c6f4 i965: silence unused variable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2dc29e095f ("i965: Don't leak blorp on Gen4-5.")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-05-09 18:12:10 +01:00
Lionel Landwerlin
11d36c373a intel: devinfo: silence coverity warning
It's just not possible to have a device with no subslices.

CID: 1433511
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-05-09 15:21:01 +01:00
Michel Dänzer
6f81e07ecb dri3: Only update number of back buffers in loader_dri3_get_buffers
And only free no longer needed back buffers there as well.

We want to stick to the same back buffer throughout a frame, otherwise
we can run into various issues.

Bugzilla: https://bugs.freedesktop.org/105906
Bugzilla: https://bugs.freedesktop.org/106399
Fixes: 3160cb86aa "egl/x11: Re-allocate buffers if format is suboptimal"
Reported-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2018-05-09 15:40:41 +02:00
Samuel Iglesias Gonsálvez
2cf64fdb46 anv: ignore pColorBlendState if all color attachments of the subpass are unused
According to Vulkan spec:

  "pColorBlendState is a pointer to an instance of the
   VkPipelineColorBlendStateCreateInfo structure, and is ignored if the
   pipeline has rasterization disabled or if the subpass of the render pass the
   pipeline is created against does not use any color attachments."

Fixes tests from CL#2505:

   dEQP-VK.renderpass.*.simple.color_unused_omit_blend_state

v2:
- Check that blend is not NULL before usage.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-09 07:01:10 +02:00
Timothy Arceri
e7a7b712fe mesa: remove hard-coded OpenGL 3.2 compat limit
Just let validate_context_version() do it instead. This fixes
MESA_GL_VERSION_OVERRIDE for compat, it will also allow us to
enable new compat versions on a per driver bases in future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-09 14:24:43 +10:00
Timothy Arceri
4560aad780 mesa: add GLSLVersionCompat constant
This allows drivers to define what version of GLSL they support
in compat. This will be needed in order to support compat 3.2
without breaking drivers that wont support it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-09 14:24:36 +10:00
Timothy Arceri
be3ee9d141 mesa: dont call _mesa_override_glsl_version() in _mesa_init_constants()
All drivers that support GLSL will later set their default GLSL versions
overriding this override call. They currently all call
 _mesa_override_glsl_version() again later in order to support overrides.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-09 14:24:29 +10:00
Timothy Arceri
2a621acc8d mesa: dont set GLSLVersion in _mesa_init_constants()
Just leave it as 0 and let the drivers set it (as they already do)
to avoid redundantly initialising it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-09 14:24:22 +10:00
Jan Vesely
0783399d79 pipe-loader: Free driver_name in error path
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-08 21:35:07 -04:00
Brian Paul
901db25d5b glsl: change ast_type_qualifier bitset size to work around GCC 5.4 bug
Change the size of the bitset from 128 bits to 96.  This works around an
apparent GCC 5.4 bug in which bad SSE code is generated, leading to a
crash in ast_type_qualifier::validate_in_qualifier() (ast_type.cpp:654).

This can be repro'd with the Piglit test tests/spec/glsl-1.50/execution/
varying-struct-basic-gs-fs.shader_test

Bugzilla:https://bugs.freedesktop.org/show_bug.cgi?id=105497
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-08 19:06:09 -06:00
Kenneth Graunke
20f06bc72b i965: Dump validation list on INTEL_DEBUG=bat,submit.
This is really useful when debugging any sort of buffer management
issues, so just printing it during INTEL_DEBUG=bat,submit seems
reasonable.  With bat, we're already spamming so much output that
it doesn't really hurt.  With submit, it's still easy to grep for
the older information, and the new information is nice too.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-08 10:08:16 -07:00
Jason Ekstrand
06d3841882 i965/miptree: Remove redundant fields from intel_miptree_aux_buffer
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:46 -07:00
Jason Ekstrand
4f4779b367 i965: Simplify brw_emit_depthbuffer and brw_emit_depth_stencil_hiz
Now that we're using ISL, a good chunk of brw_emit_depthstencil is
pointless checks which ISL will do for us anyway.  Since we only have
one manual depth buffer emit function, move the useful bits into it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:45 -07:00
Jason Ekstrand
96f01501d7 i965: Move brw_emit_depth_stencil_hiz higher up in the file
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:45 -07:00
Jason Ekstrand
bdbb527a65 i965: Use ISL for emitting depth/stencil/hiz state on gen6+
We leave gen4-5 alone because the ISL code hasn't really been well-
tested on gen4-5 or with combined depth-stencil because we don't use
BLORP for depth operations on gen4-5.  Also, the gen4-5 code has to deal
with intratile offsets for LOD hacks and ISL doesn't handle those yet.
We could make ISL handle gen4-5 capable or we could just not bother.

Among other things, this should make future platform enabling easier
because it means we don't have to update multiple (or hand-rolled!)
depth stencil emit paths.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:44 -07:00
Jason Ekstrand
ccd3dce3c0 i965: Use the brw_depthbuffer atom on all gens
The only reason why we had two atoms was that the one we used for gen7+
depended on _NEW_DEPTH and _NEW_STENCIL as well as _NEW_BUFFERS.  Since
this is no longer true, we can combine them into one atom.  We do add a
dependence on BRW_NEW_AUX_STATE but that should never get set on gen4-5
so adding it is a no-op for those platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:44 -07:00
Jason Ekstrand
514bb6f41e i965: Always set depth/stencil write enables on gen7+
The hardware will AND these fields with the corresponding fields in
DEPTH_STENCIL_STATE so there's no real reason to toggle them on and off
based on state bits.  This removes our reliance on the _NEW_DEPTH and
_NEW_STENCIL state bits and better matches what ISL does.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:43 -07:00
Jason Ekstrand
c4d00da7b7 i965: Re-order depth/stencil/hiz/clear packets to match ISL
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:27:42 -07:00
Jason Ekstrand
6fc3404911 i965: Re-emit depth/stencil/hiz on BRW_NEW_AUX_STATE
Certain things can change the aux usage or fast clear color of a depth
surface and we want to re-emit if that happens.  For instance, if you do
a fast depth clear of an already clear depth surface, we will just set
the clear color and not do anything else.  In that case, we could fail
to re-emit 3DSTATE_CLEAR_PARAMS and not get the new fast-clear color.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 08:23:55 -07:00
Lionel Landwerlin
3cdf1bf97d intel: devinfo: fix assertion on devices with odd number of EUs
I forgot to change the assert in the second helper function in a
previous change.

This hit the assert() on a Broadwell platform with 1 slice, 3
subslices but all EUs disabled in subslice 1 & 2.

Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-08 15:15:54 +01:00
Bas Nieuwenhuizen
b17cfb08a3 vulkan/wsi: Only use LINEAR modifier for prime if supported.
This was setting the LINEAR modifier if neither the
X server nor the driver supported modifiers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106180
Fixes: c80c08e226 "vulkan/wsi/x11: Add support for DRI3 v1.2"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Abel Garcia Dorta <mercuriete@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-08 15:47:16 +02:00
Jan Vesely
a9e4be9212 eg/compute: Drop reference to kernel_param bo in destructor
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-08 09:02:38 -04:00
Jan Vesely
a1e8fcce3e r600: Cleanup constant buffers on context destruction
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-08 09:02:30 -04:00
Alejandro Piñeiro
b6648798cf mesa/formatquery: remove online compression check on is_resource_supported
is_resource_supported returns if the combination of
target/internalformat is supported in at least one operation. Online
compression is only mandatory for glTexImage2D. Some formats doesn't
support online compression, but can be used in any case, with
glCompressed*D methods.

Without this commit, ETC2 internalformats were returning FALSE, even
for the drivers supporting it. So any other query (like
TEXTURE_COMPRESSED) was returning FALSE/NONE instead of the proper
value.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-08 08:19:38 +02:00
Kenneth Graunke
e6fb8196ce intel/genxml: Assert that genxml field start and ends are sane.
Chris recently fixed a bunch of genxml end < start bugs, as well as
booleans that are wider than a bit.  These are way too easy to write, so
asserting that the fields are sane is a good plan.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-07 23:06:52 -07:00
Kenneth Graunke
f83fd929b7 intel/genxml: Fix some more fake booleans in genxml.
None of these are actually booleans.  Tile Parameter is a tiling mode
enum.  Display pipes take plane numbers.  Predicate Enable has some
operations (and the default value of 6 was particular bogus).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-07 23:06:52 -07:00
Kenneth Graunke
33906eeaca intel/genxml: Make assert in gen_pack_header print a message.
Python's assert can take both a condition and a string, which will cause
it to print the string if the assertion trips.  (You can't use parens as
that creates a tuple.)  Doing "condition and string" works in C, but
doesn't have the desired effect in Python.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-07 23:06:52 -07:00
Kenneth Graunke
2dc29e095f i965: Don't leak blorp on Gen4-5.
We used to only initialize BLORP on Gen6+.  When we added it on Gen4-5,
we forgot to destroy it unconditionally.

Fixes: 752d7af77a (i965: Add blorp support for gen4-5)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-07 23:05:59 -07:00
Matt Turner
ed5af94373 nir: Transform discard_if(true) into discard
Noticed while reviewing Tim Arceri's NIR inlining series.

Without his series:

instructions in affected programs: 16 -> 14 (-12.50%)
helped: 2

With his series:

instructions in affected programs: 196 -> 174 (-11.22%)
helped: 22

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-07 13:50:23 -07:00
Jan Vesely
ea1fff4416 eg/compute: Drop reference on code_bo in destructor.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-07 15:04:03 -04:00
Nicolas Boichat
54ba73ef10 configure.ac/meson.build: Fix -latomic test
When compiling with LLVM 6.0 on x86 (32-bit) for Android, the test
fails to detect that -latomic is actually required, as the atomic
call is inlined.

In the code itself (src/util/disk_cache.c), we see this pattern:
p_atomic_add(cache->size, - (uint64_t)size);
where cache->size is an uint64_t *, and results in the following
link time error without -latomic:
src/util/disk_cache.c:628: error: undefined reference to '__atomic_fetch_add_8'

Fix the configure/meson test to replicate this pattern, which then
correctly realizes the need for -latomic.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
2018-05-07 10:14:53 -07:00
Scott D Phillips
8b519075ea anv: remove unused field anv_queue::pool
The last use of the field was removed in 2015's ("48a87f4ba06
anv/queue: Get rid of the serial")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-07 09:03:46 -07:00
Kenneth Graunke
0b1cfd01ff i965: Set initial kflags on BO creation.
This simplifies kflag initialization, by creating a bufmgr-wide setting
for initial kflags, and just applying it whenever we create a new BO.

This also properly allows 48-bit addresses for imported BOs (via prime
or flink), which I had missed in my earlier 48-bit support series.

This will be useful when adding softpin support, as we'll want to add
EXEC_OBJECT_PINNED to initial_kflags as well.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-05-07 08:47:21 -07:00
Juan A. Suarez Romero
7ee54fc33d docs: update calendar, add news and link release notes to 18.0.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2018-05-07 11:25:54 +00:00
Juan A. Suarez Romero
78e103da8b docs: add sha256 checksums for 18.0.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit ae12c5e990)
2018-05-07 11:19:36 +00:00
Juan A. Suarez Romero
6c06d4e17b docs: add sha256 checksums for 18.0.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 6dc2658fd6)
2018-05-07 11:19:34 +00:00
Chris Wilson
cf440d85db intel/genxml: Fix a few invalid field widths
A couple of typos found by inspecting field.end - field.start, revealed
a few wide integers declared as bool and some that ended before they
started.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-07 11:34:13 +01:00
Vinson Lee
cd5319a64f swr/rast: Fix include for createInstructionCombiningPass with llvm-7.0.
Fix build error after llvm-7.0.0svn r330669 ("InstCombine: Fix layering
by not including Scalar.h in InstCombine").

  CXX      rasterizer/jitter/libmesaswr_la-blend_jit.lo
rasterizer/jitter/blend_jit.cpp:816:20: error: use of undeclared identifier 'createInstructionCombiningPass'; did you mean 'createInstructionSimplifierPass'?
        passes.add(createInstructionCombiningPass());
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   createInstructionSimplifierPass

Suggested-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-05 13:20:53 -07:00
Jan Vesely
2f1ad72ac1 clover: Add explicit virtual destructor to argument class
It is needed to destroy the v vector in scalar_argument
Fixes memory leaks on parameter set/bind.

v2: Drop redundant sclara_argument destructor

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-05-05 13:17:08 -04:00
Iago Toral Quiroga
e4c667b9e8 anv/device: expose shaderInt16 support in gen8+
This rollbacks the revert of this patch introduced with
commit 7cf284f18e.

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:41:14 +02:00
Iago Toral Quiroga
5a12bdac09 i965/compiler: handle conversion to smaller type in the lowering pass for that
This rollbacks the revert of this same patch introduced in
commit 7b9c15628a.

And also squahes the following patch to prevent a piglit regression caused
by this change:

intel/compiler: Fix lower_conversions for 8-bit types.
Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com>

For 8-bit types the execution type is word. A byte raw MOV has 16-bit
execution type and 8-bit destination and it shouldn't be considered
a conversion case. So there is no need to change alignment and enter
in lower_conversions for these instructions.

Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem is caused because there is already a case
in the driver that injects Byte instructions like this:

mov(8)          g127<1>UB       g2<32,8,4>UB

And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.

v2: (Jason Ekstrand)
   - Simplify is_byte_raw_mov, include reference to PRM and not
   consider B <-> UB conversions as raw movs.

v3: (Matt Turner)
   - Indentation style fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:41:02 +02:00
Iago Toral Quiroga
a75f967388 intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms
These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit.  We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.

Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-05 12:26:37 +02:00
Caio Marcelo de Oliveira Filho
9d1ff2261c intel/genxml: recognize 0x, 0o and 0b when setting default value
Remove the need of converting values that are documented in
hexadecimal. This patch would allow writing

    <field name="3D Command Sub Opcode" ... default="0x1B"/>

instead of

    <field name="3D Command Sub Opcode" ... default="27"/>

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-04 23:58:10 +01:00
Ian Romanick
9a10a2fd5f r200: Enable NV_fog_distance
With the previous fixes in place, it appears to just work.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-04 15:29:30 -07:00
Ian Romanick
9d0bf720ed i965: Enable NV_fog_distance
With the previous fixes in place, it appears to just work.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-04 15:29:28 -07:00
Ian Romanick
df80ffa4aa ffvertex: Don't try to read output registers in fog calculation
Gallium drivers use _mesa_remove_output_reads() via st_program to lower
output reads away.  It seems better to just generate the right thing in
the first place.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-04 15:27:50 -07:00
Ian Romanick
f2db3be620 mesa: Add missing support for glFogiv(GL_FOG_DISTANCE_MODE_NV)
Found by inspection, so I made a piglit test too.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-04 15:27:44 -07:00
Ian Romanick
d350276b03 mesa: Silence an unused parameter warning
main/framebuffer.c: In function ‘update_color_draw_buffers’:
main/framebuffer.c:629:46: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 update_color_draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb)
                                              ^~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-04 15:27:40 -07:00
Gert Wollny
e695a35f40 mesa/main/readpix: Correct handling of packed floating point values
Make sure that clamping in the pixel transfer operations is enabled/disabled
for packed floating point values just like it is done for single normal and
half precision floating point values.

This fixes a series of CTS tests with virgl that use r11f_g11f_b10f
buffers as target, and where virglrenderer reads these surfaces back
using the format GL_UNSIGNED_INT_10F_11F_11F_REV.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-04 10:47:46 -07:00
Scott D Phillips
5c075b0855 util/set: add a set_clear function
Clear a set back to the state of having zero entries.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-04 10:13:33 -07:00
Tapani Pälli
affe63b1da egl: add EGL_BAD_MATCH error case for surfaceless and android
Just like is done for other backends when suitable config is not
found (added in fd4eba4929).

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2018-05-04 14:04:03 +03:00
Nicolai Hähnle
c0acb596f4 amd/common: use llvm.amdgcn.wqm for explicit derivatives
To comply with an upcoming change in LLVM, see
https://reviews.llvm.org/D46051

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-04 11:02:48 +02:00
Rhys Perry
b30949a9c2 nv50/ir: fix printing of pixld
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-05-03 22:57:46 -04:00
Drew Davenport
4373dd3215 st/va: Support YUV formats in vaCreateSurfaces
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2018-05-03 15:48:35 -07:00
Mark Janes
7cf284f18e Revert "anv/device: expose shaderInt16 support in gen8+"
This reverts commit 0ba0ac815e.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-03 15:26:59 -07:00
Mark Janes
7b9c15628a Revert "i965/compiler: handle conversion to smaller type in the lowering pass for that"
This reverts commit 96b5153790.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2018-05-03 15:26:59 -07:00
Vinson Lee
589622a2fe swr/rast: Fix WriteBitcodeToFile usage with llvm-7.0.
Fix build error after llvm-7.0svn r325155 ("Pass a reference to a module
to the bitcode writer.").

  CXX      rasterizer/jitter/libmesaswr_la-JitManager.lo
rasterizer/jitter/JitManager.cpp:548:30: error: reference to type 'const llvm::Module' could not bind to an lvalue of type 'const llvm::Module *'
    llvm::WriteBitcodeToFile(M, bitcodeStream);
                             ^

Suggested-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2018-05-03 14:06:09 -07:00
Deepak Rawat
9a21c96126 egl/x11: Send invalidate to driver on copy_region path in swap_buffer
Similar to swap_available path send invalidate to the driver because
egl/X11 is not watching for for server's invalidate events. The
dri2_copy_region path is trigerred when server supports DRI2 version
minor 1.

Tested with piglit egl tests for regression.

V2: Move invalidate from dri2_copy_region to swap_buffer common.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2018-05-03 13:55:58 +02:00
Juan A. Suarez Romero
fd4eba4929 egl: check if colorspace/surface type is supported
According to EGL 1.4 spec, section 3.5.1 ("Creating On-Screen Rendering
Surfaces"), if config does not support the colorspace or alpha format
attributes specified in attrib_list (as defined for
eglCreateWindowSurface), an EGL_BAD_MATCH error is generated.

This fixes dEQP-EGL.functional.wide_color.*_888_colorspace_srgb (still
not merged,
https://android-review.googlesource.com/c/platform/external/deqp/+/667322),
which is crashing when trying to create a windows surface with RGB888
configuration and sRGB colorspace.

v2: Handle the fix in other backends (Tapani)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-05-03 12:26:12 +02:00
Iago Toral Quiroga
0ba0ac815e anv/device: expose shaderInt16 support in gen8+
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
002cb6f2b3 anv/pipeline: support SpvCapabilityInt16 in gen8+
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
f07c05576f compiler/spirv: add implementation to check for SpvCapabilityInt16 support
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
dd41630d9a intel/compiler: implement 16-bit pack/unpack opcodes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
1dacb56279 compiler/spirv: implement 16-bit bitcasts
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
2d648e5ba3 compiler/lower_64bit_packing: rename the pass to be more generic
It can do 32-bit packing too now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
d2564af842 nir/lower_64bit_packing: extend the pass to handle packing from / to 16-bit.
With 16-bit support we can now do 32-bit packing, a follow-up patch will
rename the pass to something more generic.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
c9653cc14c nir: add opcodes for 16-bit packing and unpacking
Noitice that we don't need 'split' versions of the 64-bit to / from
16-bit opcodes which we require during pack lowering to implement these
operations. This is because these operations can be expressed as a
collection of 32-bit from / to 16-bit and 64-bit to / from 32-bit
operations, so we don't need new opcodes specifically for them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Iago Toral Quiroga
6318808a05 intel/compiler: fix 16-bit comparisons
NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.

In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.

v2 (Jason): use the type of the source for the temporary and use
            brw_reg_type_from_bit_size for the conversion to 32-bit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
b11e9425df intel/compiler: lower some 16-bit integer operations to 32-bit
These are not supported in hardware for 16-bit integers.

We do the lowering pass after the optimization loop to ensure that we
lower ALU operations injected by algebraic optimizations too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
b9a3d8c23e compiler/nir: add a lowering pass to convert the bit size of ALU operations
Not all bit-sizes may be supported natively in hardware for all operations.
This pass allows drivers to lower such operations to a bit-size that is
actually supported and then converts the result back to the original
bit-size.

Compiler backends control which operations and wich bit-sizes require
the lowering through a callback function.

v2: generalize this pass and make it available in NIR core (Rob, Jason)
v3: remove some temporaries and reduce nesting in instruction loop using
    a continue statement (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
f575277f7e intel/compiler: support negate and abs of half float immediates
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
f0e6dacee5 intel/compiler: fix brw_imm_w for negative 16-bit integers
16-bit immediates need to replicate the 16-bit immediate value
in both words of the 32-bit value. This needs to be careful
to avoid sign-extension, which the previous implementation was
not handling properly.

For example, with the previous implementation, storing the value
-3 would generate imm.d = 0xfffffffd due to signed integer sign
extension, which is not correct. Instead, we should cast to
uint16_t, which gives us the correct result: imm.ud = 0xfffdfffd.

We only had a couple of cases hitting this path in the driver
until now, one with value -1, which would work since all bits are
one in this case, and another with value -2 in brw_clip_tri(),
which would hit the aforementioned issue (this case only affects
gen4 although we are not aware of whether this was causing an
actual bug somewhere).

v2: Make explicit uint32_t casting for left shift (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
2a76f03c90 intel/compiler: fix 16-bit int brw_negate_immediate and brw_abs_immediate
From Intel Skylake PRM, vol 07, "Immediate" section (page 768):

"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."

This fixes the int16/uint16 negate and abs immediates that weren't
taking into account the replication in lower and upper words.

v2: Integer cases are different to Float cases. (Jason Ekstrand)
    Included reference to PRM (Jose Maria Casanova)
v3: Make explicit uint32_t casting for left shift (Jason Ekstrand)
    Split half float implementation. (Jason Ekstrand)
    Fix brw_abs_immediate (Jose Maria Casanova)

Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Jose Maria Casanova Crespo
e5fc3c0717 intel/compiler: implement nir_instr_type_load_const for 16-bit constants
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
939501c8ed intel/compiler: implement conversions from 16-bit int/float to bool
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
d5a419176f intel/compiler: implement conversion between float/int 16-bit types
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
96b5153790 i965/compiler: handle conversion to smaller type in the lowering pass for that
The lowering pass was specialized to act on 64-bit to 32-bit conversions only,
but the implementation is valid for other cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Iago Toral Quiroga
5361a87ee7 intel/compiler: fix isign for 16-bit integers
We need to use 16-bit constants with 16-bit instructions,
otherwise we get the following validation error:

"Destination stride must be equal to the ratio of the sizes of
 the execution data type to the destination type"

Because the execution data type is 4B due to the 32-bit integer
constant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:25 +02:00
Chris Wilson
b5e266765a i965: Always try to create a logical context
Always enable use of HW logical contexts to preserve GPU state between
batches when the kernel supports such constructs, continuing to enforce
the required support for gen6+.

At runtime, this effectively removes the BRW_NEW_CONTEXT flag (and the
upload of invariant state) from the start of every batch for any kernel
supporting contexts. So long as the older atoms are correctly listening
to the right flag (NEW_CONTEXT rather than NEW_BATCH) this should
eliminate a few redundant state uploads for the older platforms.

No piglits were harmed on ctg and ilk, both with and without logical
contexts.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-03 01:39:33 -07:00
Neil Roberts
e17d0ccbbd spirv: Apply OriginUpperLeft to FragCoord
This behaviour was changed in 1e5b09f42f. The commit message
for that says it is just a “tidy up” so my assumption is that the
behaviour change was a mistake. It’s a little hard to decipher looking
at the diff, but the previous code before that patch was:

  if (builtin == SpvBuiltInFragCoord || builtin == SpvBuiltInSamplePosition)
     nir_var->data.origin_upper_left = b->origin_upper_left;

  if (builtin == SpvBuiltInFragCoord)
     nir_var->data.pixel_center_integer = b->pixel_center_integer;

After the patch the code was:

  case SpvBuiltInSamplePosition:
     nir_var->data.origin_upper_left = b->origin_upper_left;
     /* fallthrough */
  case SpvBuiltInFragCoord:
     nir_var->data.pixel_center_integer = b->pixel_center_integer;
     break;

Before the patch origin_upper_left affected both builtins and
pixel_center_integer only affected FragCoord. After the patch
origin_upper_left only affects SamplePosition and pixel_center_integer
affects both variables.

This patch tries to restore the previous behaviour by changing the
code to:

  case SpvBuiltInFragCoord:
     nir_var->data.pixel_center_integer = b->pixel_center_integer;
     /* fallthrough */
  case SpvBuiltInSamplePosition:
     nir_var->data.origin_upper_left = b->origin_upper_left;
     break;

This change will be important for ARB_gl_spirv which is meant to
support OriginLowerLeft.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 1e5b09f42f "spirv: Tidy some repeated if checks..."
2018-05-03 10:08:42 +02:00
Samuel Iglesias Gonsálvez
b291a3a4a3 spirv: convert some operands for bitwise shift and bitwise ops to uint32
SPIR-V allows to define the shift, offset and count operands for
shift and bitfield opcodes with a bit-size different than 32 bits,
but in NIR the opcodes have that limitation. As agreed in the
mailing list, this patch adds a conversion to 32 bits to fix this.

For more info, see:

https://lists.freedesktop.org/archives/mesa-dev/2018-April/193026.html

v2:
- src_bit_size will have zero value for variable bit-size operands (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 07:07:24 +02:00
Timothy Arceri
58c05ede96 mesa: enable geom shaders in OpenGL 3.2 Compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-03 12:08:21 +10:00
Bas Nieuwenhuizen
ffa15861ef radv: UseEnumerateInstanceVersion for the default version.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-02 21:57:08 +02:00
Bas Nieuwenhuizen
467c562a29 radv: Don't check the incoming apiVersion on CreateInstance.
This fixes

dEQP-VK.api.device_init.create_instance_invalid_api_version

CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-02 21:57:08 +02:00
Bas Nieuwenhuizen
9267ff9883 radv: Allow vkEnumerateInstanceVersion ProcAddr without instance.
Apparently the somewhere between 1.1.70 and 1.1.73 the loader started
depending on this. The loader then creates a 1.0 instance, which gets
into funny situation because we have a 1.1 device.

No idea how to do line wrapping in Mako though, my random guesses
did not work.

CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 21:57:08 +02:00
Lionel Landwerlin
336decd67e intel: aubinator: add an option to limit the number of decoded VBO lines
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 19:46:47 +01:00
Lionel Landwerlin
000452aebc intel: decoder: limit to the number decoded lines from VBO
By default we set no limit, but the debug batch decoder in i965 sets
it to 100.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 19:46:47 +01:00
Jason Ekstrand
bd35345e85 anv: Advertise variableMultisampleRate
Initially, I didn't understand this feature.  Turns out that all it
means is that you can switch multisample rates in the middle of a
zero-attachment subpass.  We've been able to do this since forever.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-05-02 10:59:03 -07:00
Rob Clark
28e410f6a5 nir: add missing dependency in meson.build
nir_builder_opcodes.h also depends on nir_intrinsics.py for generating
the system-value builders.

Reported-by: Christoph Haag <haagch@frickel.club>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 13:57:51 -04:00
Matthew Nicholls
97d57ef917 radv: fix multisample image copies
Previously before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that multisample image copies wouldn't copy anything
from my observations.

This fixes some copy_and_blit CTS tests.

v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
    - updated commit description (Samuel)

Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 19:32:00 +02:00
Kenneth Graunke
169d8e011a intel: Fix 3DSTATE_CONSTANT buffer decoding.
First, this was iterating over the 3DSTATE_CONSTANT_* instruction
but trying to process fields of the 3DSTATE_CONSTANT_BODY substructure.

Secondly, the fields have been called Buffer[0] and Read Length[0],
for a while now, and we were not handling the subscripts correctly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 10:09:28 -07:00
Lionel Landwerlin
cf1d587879 intel: fix aubinator include
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7c22c150c4 ("intel: Move batch decoder/disassembler from tools/ to common/")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:54:29 +01:00
Kenneth Graunke
0ab423388c i965: Reuse batch decoder infrastructure rather than open coding it.
With the new callback, Jason's newer batch decoder infrastructure
should be able to do just as well as the old open coded INTEL_DEBUG=bat
handling, with much less code.  If there are any limitations, we'd like
to improve the common code rather than doing one-off hacks here.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 09:27:56 -07:00
Kenneth Graunke
bf91b81a0b intel: Give the batch decoder a callback to ask about state size.
Given an arbitrary batch, we don't always know what the size of certain
things are, such as how many entries are in a binding table.  But it's
easy for the driver to track that information, so with a simple callback
we can calculate this correctly for INTEL_DEBUG=bat.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 09:27:56 -07:00
Kenneth Graunke
7c22c150c4 intel: Move batch decoder/disassembler from tools/ to common/
Making these part of libintel_common allows us to use them in the DRI
driver.  The standalone tool binaries already link against the common
library, too, so it's no harder for them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 09:27:56 -07:00
Kenneth Graunke
5c04971831 i965: Allocate shadow batches to explicitly be the BO size.
This unfortunately makes it malloc/realloc on every new batch, rather
than once at startup.  But it ensures that the shadow buffer's size will
absolutely match the BO size.  Otherwise, as we tune BATCH_SZ/STATE_SZ
or bufmgr cache bucket sizes, we may get a BO size that's rounded up,
and fail to allocate the shadow buffer large enough.

This doesn't fix any bugs today, as BATCH_SZ/STATE_SZ are the size of
a cache bucket, but it's better to be safe than sorry.

Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 09:26:55 -07:00
Lionel Landwerlin
ec5df73803 intel: batch-decoder: iterate VERTEX_BUFFER_STATE fields
The gen_field_iterator only iterates the fields of a given gen_group.
If we want to iterate the fields of another gen_group contained as
field, we need to do it manually.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:11:28 +01:00
Lionel Landwerlin
acbce2ac57 intel: decoder: fix starting dword of struct fields
Struct fields might span several dwords, but iter_dword is incremented
up to the last dword of the current field before we print out the
struct's fields. We can't use iter_dword for computing the offset into
the pointer of data to decode.

v2: Fix displayed offset number (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:11:28 +01:00
Lionel Landwerlin
467430ddcc intel: decoder: document when fields should be used
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:10:37 +01:00
Lionel Landwerlin
4f128f7850 intel: decoder: identify groups with fixed length
<register> & <struct> elements always have fixed length. The
get_length() method implies that we're dealing with an instruction in
which the length is encoded into the variable data but the field
iterator uses it without checking what kind of gen_group it is dealing
with.

Let's make get_length() report the correct length regardless of the
gen_group (register, struct or instruction).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:10:37 +01:00
Lionel Landwerlin
3c416a50d8 intel: decoder: make the field iterator use more natural
while (iter_next()) { ... }

instead of

do { ... } while (iter_next());

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-02 17:10:37 +01:00
Vlad Golovkin
967aabca06 nv50: Extract needed value bits without shifting them before calling bitcount
This can save one instruction since bitcount doesn't care about specific
bits' positions.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-05-02 15:12:48 +02:00
Antia Puentes
3a1df14a7b intel: activate the gl_BaseVertex lowering
Surplus code related to the basevertex is removed.

The Vertex Elements contain now:
* VE 1: <firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is_indexed_draw, 0, 0>

Also fixes unreachable message.

Fixes OpenGL CTS tests:
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters

Fixes Piglit tests:
* arb_shader_draw_parameters-drawid-indirect baseinstance
* arb_shader_draw_parameters-basevertex

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
2018-05-02 11:24:46 +02:00
Antia Puentes
0fb204fac1 compiler/nir: Add conditional lowering for gl_BaseVertex
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:24:31 +02:00
Antia Puentes
0cbf29fa55 intel: emit is_indexed_draw in the same VE than gl_DrawID
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>

VE1 is it kept as it was before, VE2 additionally contains the new
system value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:23:34 +02:00
Antia Puentes
6ba9088d9c intel/compiler: Add uses_is_indexed_draw flag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:20:48 +02:00
Antia Puentes
9e6b886cf2 compiler: Add SYSTEM_VALUE_IS_INDEXED_DRAW and instrinsics
This VS system value contains if the draw command used to start the
rendering was an indexed draw command or a non-indexed one
(~0/0 respectively). Useful to calculate the gl_BaseVertex as:
(SYSTEM_VALUE_IS_INDEXED_DRAW & SYSTEM_VALUE_FIRST_VERTEX).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:20:40 +02:00
Samuel Pitoiset
0737c1e3a6 radv: enable out-of-order rasterization by default
As the implementation is conservative, we can now enable it
by default. It can be disabled with RADV_DEBUG=nooutoforder.

Don't expect much more than 1% of improvements, but the gain
seems consistent.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 10:33:24 +02:00
Samuel Pitoiset
1d766b0196 radv: only disable out-of-order rast for perfect occlusion queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 10:33:22 +02:00
Kenneth Graunke
1122fb2d98 i965: Drop unused gen5 sampler default color struct.
Trivial.
2018-05-01 23:09:25 -07:00
Kenneth Graunke
9f6082f6c7 i965: Make brw_vs_outputs_written static.
Drop a prototype.  Trivial.
2018-05-01 23:09:16 -07:00
Nanley Chery
3e56e4642f i965/tex_image: Avoid the ASTC LDR workaround on gen9lp
Both the internal documentation and the results of testing this in the
CI suggest that this is unnecessary. Add the fixes tag because this
reduces an internal benchmark's startup time by about 17 seconds
(reported by Eero).

Fixes: 710b1d2e66 "i965/tex_image: Flush certain subnormal ASTC channel values"
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-01 16:47:39 -07:00
Eric Anholt
800be7f277 freedreno: Fix ir3_cmdline.c build.
Fixes: 6487e7a30c ("nir: move GL specific passes to src/compiler/glsl")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2018-05-01 16:38:37 -07:00
Jason Ekstrand
d216ffc604 anv: Allow lookup of vkEnumerateInstanceVersion without an instance
Fixes: cbab2d1da5
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-01 14:45:51 -07:00
Jason Ekstrand
d5a0787f03 anv: Don't advertise Float64 or Int64 on HW without 64-bit types
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-05-01 14:45:50 -07:00
Samuel Pitoiset
d8db5986ce radv: compute the number of subpass attachments correctly
Only count color attachments twice if resolves are used, also
account for the depth stencil attachment if present.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-01 22:18:03 +02:00
Dave Airlie
e66f64c285 radv: set fmask_surf_index on fmask surfaces.
This is needed for gfx9 and later for all fmask surface index.

(Mentioned by Marek on irc)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-02 06:01:42 +10:00
Brian Paul
f298ed93d9 gallium/i915: fix PIPE_CAPF_MIN_CONSERVATIVE_RASTER_DILATE typo
Fixes: fffe5e2d14 ("gallium: add initial support for conservative
rasterization")
Trivial.
2018-05-01 09:52:22 -06:00
Rhys Perry
07dac3e040 nvc0: add conservative rasterization support
Subpixel precision bias, dilation and the post-snap mode are supported on
GM200 and newer. The pre-snap mode is supported for triangle primitives on
GP100.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-04-30 21:13:53 -06:00
Rhys Perry
97f5f399ef st/mesa: add support for nvidia conservative rasterization extensions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-30 21:13:53 -06:00
Rhys Perry
fffe5e2d14 gallium: add initial support for conservative rasterization
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-30 21:13:53 -06:00
Rhys Perry
4580617509 mesa: add support for nvidia conservative rasterization extensions
Although the specs are written against compatibility GL 4.3 and allows core
profile and GLES2+, it is exposed for GL 1.0+ and GLES1 and GLES2+.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-30 21:13:53 -06:00
Brian Paul
31ab0427a7 glsl/tests: add GLSL_TYPE_UINT8, GLSL_TYPE_INT8 cases to switch statements
To silence warnings about unhandled switch values.
Untested otherwise.

v2: move the INT/UINT8 cases after the INT/UINT16 cases, per Eric.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-30 21:13:53 -06:00
Brian Paul
efec712d51 tgsi: use enums instead of unsigned in ureg code
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-04-30 21:13:53 -06:00
Timothy Arceri
6487e7a30c nir: move GL specific passes to src/compiler/glsl
With this we should have no passes in src/compiler/nir with any
dependencies on headers from core GL Mesa.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-05-01 12:39:33 +10:00
Andres Rodriguez
f56e22e496 radv/winsys: fix leaking resources from bo's imported by fd
A bo's ref_count was not being initialized when imported from an fd.
Therefore, we would fail to free the resource during VkFreeMemory().

This patch fixes applications like hifi VR in threaded mode, which
perform frequent imports/releases of IPC shared memory.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
2018-04-30 18:20:30 -04:00
Scott D Phillips
2a08ae3c7c i965/tiled_memcpy: ytiled_to_linear a cache line at a time
Similar to the transformation applied to linear_to_ytiled, also align
each readback from the ytiled source to a cacheline (i.e. transfer a
whole cacheline from the source before moving on to the next column).
This will allow us to utilize movntqda (_mm_stream_si128) in a
subsequent patch to obtain near WB readback performance when accessing
the uncached ytiled memory, an order of magnitude improvement.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-30 15:18:36 -07:00
Chris Wilson
682bdaa658 i965: Record mipmap resolver for unmapping
When mapping a region of the mipmap_tree, record which complementary
method to use to unmap it afterwards. By doing so we can avoid
duplicating the decision tree used when mapping and thereby eliminate
trivial errors that can be introduced if the two if-chains become out of
sync.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
5367295e1a i965: Move unmap_depthstencil before map_depthstencil
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
ab2825c898 i965: Move unmap_etc before map_etc
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
9e7e88049f i965: Move unmap_s8 before map_s8
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
b3ad6f5ca6 i965: Move unmap_movntdqa before map_movntdqa
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
f348d07a62 i965: Move unmap_blit before map_blit
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Chris Wilson
359624142d i965: Move unmap_gtt before map_gtt
Reorder code to avoid a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 14:06:23 -07:00
Dave Airlie
8d3529872c ac/nir: expand 64-bit vec3 loads to fix shuffling.
If loading 64-bit vec3 values, a 4 component load would be followed
by a 2 component load and the resulting shuffle would fail as it
requires 2 4 components. This just expands the second results
vector out to 4 components.

This fixes 100 CTS tests:
dEQP-VK.spirv_assembly.type.vec3.*64*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-01 05:58:14 +10:00
Kenneth Graunke
bde12f75e1 i965: Don't stomp initial kflags for program cache.
We want to flag EXEC_OBJECT_CAPTURE, but we ought to preserve any
existing kflags.  Today, there are none (as the program cache doesn't
support 48-bit addressing), but once we start using softpin, we'll
need to preserve EXEC_OBJECT_PINNED.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-04-30 11:34:19 -07:00
Kenneth Graunke
0cc98522f9 i965: Let batchbuffers be placed anywhere in the 48-bit address space.
We were trying to mark batch buffers with EXEC_OBJECT_CAPTURE, and
accidentally stomped EXEC_OBJECT_SUPPORTS_48B_ADDRESS in the process.

There's no reason to restrict batch buffers to the lower 4GB.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-04-30 11:34:19 -07:00
Scott D Phillips
8ffc6ee251 intel: fix check for 48b ppgtt support
The previous logic of the supports_48b_addresses wasn't actually
checking if i915.ko was running with full_48bit_ppgtt. The ENOENT
it was checking for was actually coming from the invalid context
id provided in the test execbuffer.  There is no path in the
kernel driver where the presence of
EXEC_OBJECT_SUPPORTS_48B_ADDRESS leads to an error.

Instead, check the default context's GTT_SIZE param for a value
greater than 4 GiB

v2 (Ken): Fix in i965 as well.
v3 Check GTT_SIZE instead of HAS_ALIASING_PPGTT (Chris Wilson)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-30 11:34:19 -07:00
Leo Liu
1c5f4f4e17 st/omx/enc: fix blit setup for YUV LoadImage
The blit here involves scaling since it's copying from I8 format to R8G8 format.
Half of source will be filtered out with PIPE_TEX_FILTER_NEAREST instruction, it
looks that GPU always uses the second half as source. Currently we use "1" as
the start point of x for R, then causing 1 source pixel of U component shift to
right. So "-1" should be the start point for U component.

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-30 11:55:36 -04:00
Juan A. Suarez Romero
4d449c94e4 autotools, meson: bump up required VA version
Due using a new VP9 config we use, required VA API 0.39

Fixes: 413c5ca372 ("travis: update libva required version")
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-04-30 13:59:37 +02:00
Juan A. Suarez Romero
96ed3714fc docs: update calendar, add news and link release notes to 18.0.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2018-04-28 17:01:48 +00:00
Juan A. Suarez Romero
8f1159bf9a docs: add sha256 checksums for 18.0.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit b3eed3ad03)
2018-04-28 16:58:39 +00:00
Juan A. Suarez Romero
14f85260de docs: add release notes for 18.0.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit d38da7bd2d)
2018-04-28 16:58:36 +00:00
Marek Olšák
8b7358fe43 radeonsi: increase the number of compiler threads depending on the CPU
The compiler queue was limited to 3 threads, so shader-db running
on a 16-thread CPU would have a bottleneck on the 3-thread queue.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
3f0eaaf6d9 radeonsi: avoid a crash in gallivm_dispose_target_library_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
e75fc8d033 radeonsi: move data_layout into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
797d673c9a radeonsi: move passmgr into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
c1823ff661 radeonsi: move target_library_info into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
5a94f15aa7 radeonsi: use si_compiler::triple in si_llvm_optimize_module
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
43f0a10051 radeonsi: add triple into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
87eb597758 radeonsi: add struct si_compiler containing LLVMTargetMachineRef
It will contain more variables.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
788d66553a radeonsi: rename r600_texture::resource to buffer
r600_resource could be renamed to si_buffer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
6fadfc01c6 radeonsi: use r600_resource() typecast helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
3160ee876a radeonsi: remove unused atom parameter from si_atom::emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
de344209ad radeonsi: inline 2 trivial state structures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
e395475096 radeonsi: remove function si_init_atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
ccebcba893 radeonsi: remove si_atom::id
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
639b673fc3 radeonsi: don't use an indirect table for state atoms
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
9054799b39 radeonsi: rename r600_atom -> si_atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
a8abbbb172 radeonsi: remove r600_pipe_common.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
6d19120da8 radeonsi/gfx9: workaround for INTERP with indirect indexing
and clean up the conditions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
2018-04-27 17:56:04 -04:00
Marek Olšák
2d69b485f5 radeonsi: rewrite DCC format compatibility checking code
It might be better to use a slow compressed clear when clearing to 1.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
c732d069b3 radeonsi: implement DCC fast clear swizzle constraints more accurately
Reduce swizzle constraints to the ALPHA_IS_ON_MSB constraint and the clear
value of 1.

This significantly changes the DCC fast clear code, and fixes fast clear
for RGB formats without alpha.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
9ef423f720 radeonsi: rename variables and document stuff around DCC fast clear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
1cc2e0cc6b radeonsi: fully enable 2x DCC MSAA for array and non-array textures
The clear code is exactly the same as for 1 sample buffers -
just clear the whole thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
ca33d961a4 radeonsi: enable fast color clear for level 0 of mipmapped textures on <= VI
GFX9 is more complicated and needs a compute shader that we should just
copy from amdvlk.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
174e11c3f5 ac/surface: handle DCC subresource fast clear restriction on VI
v2: require the previous level to be clearable for determining whether
    the last unaligned level is clearable

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
George Kyriazis
838f15650e swr/rast: No need to export GetSimdValidIndicesGfx
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
7caeee3432 swr/rast: Small editorial changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
f276517ebf swr/rast: Use new processor detection mechanism
Use specific avx512 selection mechanism based on avx512er bit instead of
getHostCPUName().  LLVM 6.0.0 has a bug that reports wrong string for KNL
(fixed in 6.0.1).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
8ace547e8d swr/rast: Output rasterizer dir to console since it's process specific
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
c328c5d0f4 swr/rast: Add TranslateGfxAddress for shader
Also add GFX_MEM_CLIENT_SHADER

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
edc41f73b8 swr/rast: jit PRINT improvements.
Sign-extend integer types to 32bit when specifying "%d" and add new %u
which zero-extends to 32bit. Improves  printing of sub 32bit integer types
(i1 specifically).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
5d403178e6 swr/rast: Fix regressions.
Bump jit cache revision number to force recompile.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
577af2bed4 swr/rast: Cleanup old cruft.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
aeab9db50a swr/rast: Package events.proto with core output
However only if the file exists in DEBUG_OUTPUT_DIR. The expectation is
that AR rasterizerLauncher will start placing it there when launching
a workload (which is in a subsequent checkin)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
b97bb0ea6d swr/rast: Fix init in EventHandlerWorkerStats
Make sure we initialize variables.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
9a72d4c03e swr/rast: Fix return type of VCVTPS2PH.
expecting <8xi16> return.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
3f008c5505 swr/rast: WIP Translation handling
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
7986519d50 swr/rast: Use different handing for stream masks
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
6b1c852ebc swr/rast: Silence warnings
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
e6daa62a48 swr/rast: Add support for TexelMask evaluation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
cec1b52cac swr/rast: Internal core change
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
7b343a215e swr/rast: Fix x86 lowering 64-bit float handling
- 64-bit cvt-to-float needs to be explicitly handled
- gathers need the right parameter types to work with doubles

Fixes draw-vertices piglit tests

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
fa4ab7910e swr/rast: Add some SIMD_T utility functors
VecEqual and VecHash

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
18c9cb85d1 swr/rast: Fix wrong type allocation
ALLOCA pointer elements, not pointers.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
1cdbce8805 swr: touch generated files to update timestamp
previous change in generators necessitates this change

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
George Kyriazis
9ceeb671a3 swr/rast: Fix byte offset for non-indexed draws
for the case when USE_SIMD16_SHADERS == FALSE

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-04-27 14:36:41 -05:00
Marek Olšák
7083ac7290 util/u_queue: fix a deadlock in util_queue_finish
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 13:28:17 -04:00
Dylan Baker
7772de5283 meson: fix race condition revealed by using 0.44
Previously there was a special target that blocked for the generation of
anv_entrypoints.h, with meson 0.44 we don't need this, we can use a new
language feature instead. The problem is that previously that blocking
target would hide a race condition for the generation of another header,
anv_extensions.h. Now the build sometimes fails when anv_extensions.h is
not generated in time.

v2: - clarify the race condition in the commit message (Emil)

CC: Mark Janes <mark.a.janes@intel.com>
Fixes: 92550d9b16
       ("meson: remove workaround for custom target creating .h and .c files")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-04-27 10:24:51 -07:00
Dylan Baker
0c23bd76d1 bin: force git show to use default pretty setting
I have pretty default to short, which breaks this script.

v2: - Fix both places that don't define a --pretty (Emil)

cc: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Andres Gomez <agomez@igalia.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-04-27 10:19:55 -07:00
Tapani Pälli
b3ad4b6971 mesa: add TBO support for GL_EXT_texture_norm16
Earlier plumbing missed interaction with texture buffer objects.

Fixes: 7f467d4f73 "mesa: GL_EXT_texture_norm16 extension plumbing"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-27 14:34:43 +03:00
Samuel Pitoiset
d38425ce87 ac: fix texture query LOD for 1D textures on GFX9
1D textures are allocated as 2D which means we only need
one coordinate for texture query LOD.

Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 11:15:35 +02:00
Christian Gmeiner
3e69127939 etnaviv: remove not needed includes
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2018-04-27 09:04:56 +02:00
Christian Gmeiner
2ba587aac7 etnaviv: remove redundant include
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2018-04-27 09:04:53 +02:00
Timothy Arceri
79b0556f29 glsl: replace some asserts with unreachable when processing the ast
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-04-27 10:18:47 +10:00
Timothy Arceri
410f901bee mesa: drop the buffer mode param from the DrawBuffer driver function
No drivers used it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-27 10:09:10 +10:00
Anuj Phogat
b695a7bd8e anv/icl: Enable Vulkan on Ice Lake
This patch enables the Vulkan driver on Ice Lake h/w
with added warning about preliminary support.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2018-04-26 16:31:27 -07:00
Caio Marcelo de Oliveira Filho
c9bdc7f7e2 anv: enable VK_EXT_shader_viewport_index_layer
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-26 15:32:05 -07:00
Jason Ekstrand
3db93f9128 anv/allocator: Don't shrink either end of the block pool
Previously, we only tried to ensure that we didn't shrink either end
below what was already handed out.  However, due to the way we handle
relocations with block pools, we can't shrink the back end at all.  It's
probably best to not shrink in either direction.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105374
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106147
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2018-04-26 13:17:14 -07:00
Eric Anholt
76ee9edcb4 broadcom/vc5: Add support for centroid varyings.
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros.  We need to refactor this to shader record setup at compile time,
anyway.

Fixes ext_framebuffer_multisample-interpolation * centroid-*
2018-04-26 11:30:22 -07:00
Eric Anholt
e2f3317801 broadcom/vc5: Add an assert about GFXH-1559.
Our TF outputs always start at 6 or 7 currently, so we don't hit the
broken 8 case.  Let's make sure that doesn't change somehow.
2018-04-26 11:30:22 -07:00
Eric Anholt
77b4f30bae broadcom/vc5: Add validation that we don't violate GFXH-1633 requirements.
We don't use ldunifa yet, but we will eventually for UBOs.
2018-04-26 11:30:22 -07:00
Eric Anholt
089c32eefd broadcom/vc5: Add validation that we don't violate GFXH-1625 requirements.
We don't use TMUWT yet, but we will once we do SSBOs.
2018-04-26 11:30:22 -07:00
Eric Anholt
57ceb95c84 broadcom/vc5: Implement GFXH-1742 workaround (emit 2 dummy stores on 4.x).
This should fix help with intermittent GPU hangs in tests switching
formats while rendering small frames.  Unfortunately, it didn't help with
the tests I'm having troubles with.
2018-04-26 11:30:22 -07:00
Eric Anholt
dc4cb04ee5 broadcom/vc5: Add QPU validation for register writes after thrend.
The next shader gets to start writing the register file during these
slots, so make sure we don't stomp over them.

The only case of hitting this that I could imagine would be dead writes.
2018-04-26 11:30:22 -07:00
Eric Anholt
8adf813f83 st: Choose a 2101010 format for GL_RGB/GL_RGBA with a 2_10_10_10 type.
GLES's GL_EXT_texture_type_2_10_10_10_REV allows uploading this type to an
unsized internalformat, and it should be non-color-renderable.
fbobject.c's implementation of the check for color-renderable is checks
that the texture has a 2101010 mesa format, so make sure that we have
chosen a 2101010 format so that check can do what it meant to.

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgb on vc5.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-26 11:30:22 -07:00
Charmaine Lee
8aef7fccb7 st/mesa: fix missing setting of _ElementSize in new_draw_rasterpos_stage
With this patch, _ElementSize is initialized along with the rest
of the vertex array attributes in new_draw_rasterpos_stage().
This fixes a crash in st_pipe_vertex_format() when running
topogun-1.06-orc-84k-resize trace file with VMware svga driver.

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-26 10:29:02 -07:00
Drew Davenport
e923e8151d st/va: Fix typos
s/attibute/attribute/
s/suface/surface/

v2: rebased(Leo)

Reviewed-by: Leo Liu <leo.liu@amd.com>
2018-04-26 11:16:05 -04:00
Drew Davenport
893808006a st/va: Fix potential buffer overread
VASurfaceAttribExternalBuffers.pitches is indexed by
plane. Current implementation only supports single plane layout.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2018-04-26 11:16:05 -04:00
Boyuan Zhang
deba56accf radeon/vcn: fix mpeg4 msg buffer settings
Previous bit-fields assignments are incorrect and will result certain mpeg4
decode failed due to wrong flag values. This patch fixes these assignments.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2018-04-26 11:16:05 -04:00
Ian Romanick
bf5e0276b6 radeon: Drop broken front_buffer_reading/drawing optimization
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-26 09:38:51 -04:00
Ian Romanick
0b3231966f radeon: Use _mesa_is_front_buffer_drawing
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-26 09:38:51 -04:00
Samuel Pitoiset
d7ffe3b384 radv: set ac_surf_info::num_channels correctly
num_channels has been introduced since "ac/surface: don't set
the display flag for obviously unsupported cases".

Based on RadeonSI.

Fixes: e29facff31 ("ac/surface: don't set the display flag for obviously unsupported cases (v2)")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-26 15:34:14 +02:00
Samuel Pitoiset
a6fbefa67b radv: fix DCC enablement since partial MSAA implementation
dcc_msaa_allowed is always false on GFX9+ and only true on VI
if RADV_PERFTEST=dccmsaa is set. This means DCC was disabled
in some situations where it should not.

This is likely going to fix a performance regression.

Fixes: 2f63b3dd09 ("radv: enable DCC for MSAA 2x textures on VI under an option")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-26 15:34:11 +02:00
Karol Herbst
227b1af866 nir/opt_constant_folding: fix folding of 8 and 16 bit ints
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-26 11:16:15 +02:00
Karol Herbst
14943add44 nir: print 8 and 16 bit constants correctly
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-26 11:16:15 +02:00
Karol Herbst
543a8c66a7 nir: support converting to 8-bit integers in nir_type_conversion_op
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-26 11:16:15 +02:00
Neil Roberts
c4ab1bdcc9 spirv: Don’t check for NaN for most OpFOrd* comparisons
For all of the OpFOrd* comparisons except OpFOrdNotEqual the hardware
should probably already return false if one of the operands is NaN so
we don’t need to have an explicit check for it. This seems to at least
work on Intel hardware. This should reduce the number of instructions
generated for the most common comparisons.

For what it’s worth, the original code to handle this was added in
e062eb6415. The commit message for that says that it was to fix
some CTS tests for OpFUnord* opcodes. Even if the hardware doesn’t
handle NaNs this patch shouldn’t affect those tests. At any rate they
have since been moved out of the mustpass list. Incidentally those
tests fail on the nvidia proprietary driver so it doesn’t seem like
handling NaNs correctly is a priority.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-26 10:08:14 +02:00
Matt Atwood
3ba5a646e5 Intel: Add a Kaby Lake PCI ID
v2: Branding changed

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-04-25 13:31:55 -07:00
Eric Anholt
069c409f43 gallium/util: Fix incorrect refcounting of separate stencil.
The driver may have a reference on the separate stencil buffer for some
reason (like an unflushed job using it), so we can't directly free the
resource and should instead just decrement the refcount that we own.
Fixes double-free in KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8
on vc5.

Fixes: e94eb5e600 ("gallium/util: add u_transfer_helper")
Reviewed-by: Rob Clark <robdclark@gmail.com>
2018-04-25 12:14:33 -07:00
Eric Anholt
0d4ce00d70 broadcom/vc5: Fix reloads of separate stencil buffers.
Like for stores, we need to emit a separate load_general packet.
2018-04-25 09:21:54 -07:00
Eric Anholt
9f3f4284c0 broadcom/vc5: Fix cpp of MSAA surfaces on 4.x.
The internal-type-bpp path is for surfaces that get stored in the raw TLB
format.  For 4.x, we're storing MSAA as just 2x width/height at the
original format.
2018-04-25 09:21:54 -07:00
Eric Anholt
ac207acb97 broadcom/vc5: Implement stencil blits using RGBA.
Fixes piglit fbo-depthstencil blit default_fb
2018-04-25 09:21:54 -07:00
Eric Anholt
503716fa86 broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key. 2018-04-25 09:21:54 -07:00
Eric Anholt
5710532e9e broadcom/vc5: Fix tile load/store of MSAA surfaces on 4.x.
For single-sample we have to always program SAMPLE_0, but for multisample
we want to store all the samples.
2018-04-25 09:21:54 -07:00
Juan A. Suarez Romero
413c5ca372 travis: update libva required version
Commit fa328456e8 added VP9 config support, but this needs a newer
libva version, 1.7.0 or above.

Fixes: fa328456e8 ("st/va: add VP9 config to enable profile2")
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-04-25 16:09:20 +02:00
Tapani Pälli
7f467d4f73 mesa: GL_EXT_texture_norm16 extension plumbing
Patch enables use of short and unsigned short data for texture uploads,
rendering and reading of framebuffers within the restrictions specified
in GL_EXT_texture_norm16 spec.

Patch also enables those 16bit format layout qualifiers listed in
GL_NV_image_formats that depend on EXT_texture_norm16.

v2: expose extension with dummy_true
    fix layout qualifier map changes (Ilia Mirkin)

v3: use _mesa_has_EXT_texture_norm16, other fixes
    and cleanup (Ilia Mirkin)

v4: fix rest of the issues found

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-25 14:26:20 +03:00
Jordan Justen
b0c5774027 meson: Fix with_intel_vk and with_amd_vk variables
Fixes: 5608d0a2ce "meson: use array type options"
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-04-24 23:12:42 -07:00
Roland Scheidegger
77554d220d draw: fix different sign logic when clipping
The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when
the sign is the same but both numbers are sufficiently small
(if the product is smaller than 2^-128).
This could apparently lead to emitting a sufficient amount of
additional bogus vertices to overflow the allocated array for them,
hitting an assertion (still safe with release builds since we just
aborted clipping after the assertion in this case - I'm however unsure
if this is now really no longer possible, so that code stays).
Not sure if the additional vertices could cause other grief, I didn't
see anything wrong even when hitting the assertion.

Essentially, both +-0 are treated as positive (the vertex is considered
to be inside the clip volume for this plane), so integrate the logic
determining different sign into the branch there.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-25 04:50:20 +02:00
Roland Scheidegger
98578df27b draw: simplify clip null tri logic
Simplifies the logic when to emit null tris (albeit the reasons why we
have to do this remain unclear).
This is strictly just logic simplification, the behavior doesn't change
at all.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-25 04:50:20 +02:00
Ilia Mirkin
c17ddcb4b4 nvc0/ir: all short immediates are sign-extended, adjust LIMM test
Some analysis suggests that all short immediates are sign-extended. The
insnCanLoad logic already accounted for this, but we could still pick
the wrong form when emitting actual instructions that support both short
and long immediates (with the long form usually having additional
restrictions that insnCanLoad should be aware of).

This also reverses a bunch of commits that had previously "worked
around" this issue in various emitters:

9c63224540: gm107/ir: make use of ADD32I for all immediates
83a4f28dc2: gm107/ir: make use of LOP32I for all immediates
b84c97587b: gm107/ir: make use of IMUL32I for all immediates
d30768025a: gk110/ir: make use of IMUL32I for all immediates

as well as the original import for UMUL in the nvc0 emitter.

Reported-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
2018-04-24 21:37:44 -04:00
Boyan Ding
6695f9d5c5 mesa: call DrawBufferAllocate driver hook in update_framebuffer for windows-system FB
When draw buffers are changed on a bound framebuffer, DrawBufferAllocate()
hook should be called. However, it is missing in update_framebuffer with
window-system framebuffer, in which FB's draw buffer state should match
context state, potentially resulting in a change.

Note: This is needed because gallium delays creating the front buffer,
      i965 works fine without this change.

V2 (Timothy Arceri):
 - Rebased on merged/simplified DrawBuffer driver function
 - Move DrawBuffer call outside fb->ColorDrawBuffer[0] !=
   ctx->Color.DrawBuffer[0] check to make piglit pass.

v3 (Timothy Arceri):
 - Call new DrawBuffaerAllocate() driver function.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v2)
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99116
2018-04-25 09:08:26 +10:00
Timothy Arceri
6ca09f3a60 st/mesa: add new driver function DrawBufferAllocate
Unlike some of the classic drivers the st was only using DrawBuffer()
to allocated some buffers on-demand. Creating a separate function
will allow us to call it from update_framebuffer() in the following
patch without regressing some of the older classic drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-25 09:08:26 +10:00
Timothy Arceri
2554b8cb00 mesa: some C99 tidy ups for framebuffer.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-04-25 09:08:26 +10:00
Dylan Baker
1d01b52d76 meson: Fix no-rtti in llvm detection
Because I clearly wasn't thinking and clearly didn't do a good job
testing. Sigh

Fixes: c5a97d658e
       ("meson: fix builds against LLVM built without rtti")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-24 15:26:51 -07:00
Dylan Baker
be0a2cfc65 meson: use new warning function
Instead of emulating it with message.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
5608d0a2ce meson: use array type options
This option type is nice since it involves less converting strings into
lists, and because it validates the values that are provided.

v2: - Set with_any_vk to true if any vulkan driver is built (Eric)

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
c5a97d658e meson: fix builds against LLVM built without rtti
Building without rtti is a frought with peril, but it's something that
autotools supports so we need to support it too.

Since we've moved to version 0.44 as a whole we can use the meson
functionality for accessing random llvm-config options we can check for
rtti and add -fno-rtti to all C++ code accordingly.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-04-24 14:08:15 -07:00
Dylan Baker
595021bf1a meson: remove dummy_cpp
meson has gotten pretty smart about tracking C and C++ dependencies
(internal and external), and using the right linker. This wasn't always
the case and we created empty c++ files to force the use of the c++
linker. We don't need that any more.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
db90c8627c meson: allow empty sources when using link_whole
meson used to get grumpy if the sources list was empty, even when using
--whole-archive (link_whole). In more recent versions that's not true,
so remove the workaround.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
92550d9b16 meson: remove workaround for custom target creating .h and .c files
In more modern versions of meson a custom_target returns an index-able
object. This allows us to create accurate dependency models for targets
that rely only on the header and not on the code from anv_entrypoints.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
5a670d08c0 meson: raise required version to 0.44.1
We have already required 0.44 for building clover and swr, so it was
already partially required. This just makes it required across the board
instead of just for clover and swr.

There is a bug in 0.44 which makes it impossible to build mesa in some
configurations, so require 0.44.1 which fixes this.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
1546f76a39 meson: fix graw-xlib after auxiliary consolidation
This one's completely my fault, I didn't do good enough testing after
rebasing and this got missed.

Fixes: d28c246501
       ("meson: build graw tests")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
c73abb4f82 meson: only build mesa_st tests when build-tests is true
Since we have an option to turn test building on and off, we should
honor that.

Fixes: 34cb4d0ebc
       ("meson: build tests for gallium mesa state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Dylan Baker
aaab624245 meson: don't build classic mesa tests without dri_drivers
Since mesa_classic is build-on-demand the tests will create a demand and
add a bunch of extra compilation.

Fixes: 43a6e84927
       ("meson: build mesa test.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-04-24 14:08:15 -07:00
Nanley Chery
0e8b16e0a2 i965/meta_util: Re-enable sRGB-encoded fast-clears on CNL
The paths which sample with the clear color are now using a getter which
performs the sRGB decode needed to enable this fast clear.

This path can be exercised by fast-clearing a texture, then performing
an operation which requires sRGB decoding. Test coverage for this
feature is provided with the following tests:

* Shader texture calls:
  - spec@ext_texture_srgb@tex-srgb

* Shader texelfetch calls:
  - spec@arb_framebuffer_srgb@fbo-fast-clear
  - spec@arb_framebuffer_srgb@msaa-fast-clear

* Blending:
  - spec@arb_framebuffer_srgb@arb_framebuffer_srgb-fast-clear-blend

* Blitting:
  - spec@arb_framebuffer_srgb@blit texture srgb msaa enabled clear

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Nanley Chery
129ad66dd5 i965/miptree: Extend the sRGB-blending WA to future platforms
The blending issue seems to be present on CNL as well.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Nanley Chery
7ea013c6d3 i965: Add and use a getter for the clear color
It returns both the inline clear color and a clear address which points
to the indirect clear color buffer (or NULL if unused/non-existent).
This getter allows CNL to sample from fast-cleared sRGB textures
correctly by doing the needed sRGB-decode on the clear color (inline)
and making the indirect clear color buffer unused.

v2 (Rafael):
* Have a more detailed commit message.
* Add a comment on the sRGB conversion process.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Jason Ekstrand
b55077a8bc util/srgb: Add a float sRGB -> linear helper
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Nanley Chery
cd5ce363e3 i965/wm_surface_state: Use the clear address if clear_bo is non-NULL
We want to add and use a getter that turns off the indirect path by
returning zero for the clear color bo and offset.

v2: Fix usage of "clear address" in commit message (Jason).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Nanley Chery
af4e9295fe i965: Add and use a single miptree aux_buf field
We want to add and use a function that accesses the auxiliary buffer's
clear_color_bo and doesn't care if it has an MCS or HiZ buffer
specifically.

v2 (Jason Ekstrand):
* Drop intel_miptree_get_aux_buffer().
* Mention CCS in the aux_buf field.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 13:41:14 -07:00
Nanley Chery
5503b65103 i965: Add and use a getter for the miptree aux buffer
Make the next patch easier to read by eliminating most of the would-be
duplicate field accesses now.

v2: Update the HiZ comment instead of deleting it (Rafael).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-04-24 13:41:14 -07:00
Karol Herbst
e4f675dc42 gm107/ir/lib: fix sched in div u32 builtin
Imad needs to set a read barrier.

With significant big work groups I was getting wrong results for div u32. Turns
out the issue was with the sched opcodes.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-24 22:31:59 +02:00
Ian Romanick
0d5ce25c1c intel/compiler: Add scheduler deps for instructions that implicitly read g0
Otherwise the scheduler can move the writes after the reads.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95009
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95012
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: Clayton A Craft <clayton.a.craft@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2018-04-24 14:31:21 -04:00
Ian Romanick
cd32a4e5f4 intel/compiler: Silence unused parameter warnings in empty vec4_instruction_scheduler methods
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::count_reads_remaining(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:764:72: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::count_reads_remaining(backend_instruction *be)
                                                                        ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::setup_liveness(cfg_t*)’:
src/intel/compiler/brw_schedule_instructions.cpp:769:51: warning: unused parameter ‘cfg’ [-Wunused-parameter]
 vec4_instruction_scheduler::setup_liveness(cfg_t *cfg)
                                                   ^~~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::update_register_pressure(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:774:75: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::update_register_pressure(backend_instruction *be)
                                                                           ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:779:80: warning: unused parameter ‘be’ [-Wunused-parameter]
 vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be)
                                                                                ^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::issue_time(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:1550:61: warning: unused parameter ‘inst’ [-Wunused-parameter]
 vec4_instruction_scheduler::issue_time(backend_instruction *inst)
                                                             ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 14:31:21 -04:00
Ian Romanick
bdb15c2344 intel/compiler: Silence unused parameter warning in compile_cs_to_nir
src/intel/compiler/brw_fs.cpp: In function ‘nir_shader* compile_cs_to_nir(const brw_compiler*, void*, const brw_cs_prog_key*, brw_cs_prog_data*, const nir_shader*, unsigned int)’:
src/intel/compiler/brw_fs.cpp:7205:44: warning: unused parameter ‘prog_data’ [-Wunused-parameter]
                   struct brw_cs_prog_data *prog_data,
                                            ^~~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 14:31:21 -04:00
Ian Romanick
d84b2ed1d7 intel/compiler: Silence unused parameter warnings in generate_foo methods
Since all of the fs_generator::generate_foo methods take a fs_inst * as
the first parameter, just remove the name to quiet the compiler.

src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst*, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
                                         ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst*)’:
src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_discard_jump(fs_inst *inst)
                                              ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
                                                      ^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter]
 fs_generator::generate_shader_time_add(fs_inst *inst,
                                                 ^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen*, brw::vec4_instruction*, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter]
                                  vec4_instruction *inst,
                                                    ^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen*, brw::vec4_instruction*, brw_reg, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
                       vec4_instruction *inst,
                                         ^~~~
src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter]
                       struct brw_reg indirect, struct brw_reg length)
                                                               ^~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-04-24 14:31:21 -04:00
Eric Anholt
3d21fc193e broadcom/vc5: Set up internal_format for imported resources.
Without this, we'd assertion fail in u_transfer_helper when mapping an
imported resource.
2018-04-24 10:37:29 -07:00
Eric Anholt
f08f477a93 broadcom/vc5: Assert that created BOs have offset != 0.
The kernel shouldn't return a bo at NULL, and the HW special-cases NULL
address values for things like OQs.
2018-04-24 10:37:29 -07:00
Eric Anholt
482f2e24b5 broadcom/vc5: Don't allocate simulator BOs at offset 0.
The kernel won't return us BOs at offset 0 (because things like OQs
wouldn't work there), so we shouldn't in the simulator either.
2018-04-24 10:37:29 -07:00
Eric Anholt
82cdb801fd broadcom/vc5: Add sim support for the GET_BO_OFFSET ioctl.
Otherwise we'd crash immediately upon importing a BO through EGL
interfaces.
2018-04-24 10:37:29 -07:00
Eric Anholt
3cdd055ed2 broadcom/vc5: Treat imports of DRM_FORMAT_MOD_INVALID BOs as linear.
We don't have any kernel metadata about BO tiling, so this probably is all
we should do for the moment.
2018-04-24 10:37:29 -07:00
Tapani Pälli
c2e159d050 i965: expose MESA_FORMAT_R8G8B8A8_SRGB visual
Exposing the visual makes following dEQP tests pass on Android:

   dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
   dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb

Visual is exposed only when DRI_LOADER_CAP_RGBA_ORDERING is set.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-24 14:55:18 +03:00
Tapani Pälli
fa4d4d97f3 dri: Add __DRI_IMAGE_FORMAT_SABGR8
Add format definition and required plumbing to create images.
Note that there is no match to drm_fourcc definition, just like
with existing _DRI_IMAGE_FOURCC_SARGB8888.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-24 14:55:18 +03:00
Marek Olšák
4559aefb5c Revert "st/dri: Fix dangling pointer to a destroyed dri_drawable"
This reverts commit dab02dea34.

It causes crashes of qtcreator and firefox.

Fixes: dab02de "st/dri: Fix dangling pointer to a destroyed dri_drawable"

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
2018-04-24 00:00:20 -04:00
Roland Scheidegger
e8e1d287a3 gallivm: dump bitcode before optimization
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc) that also
gives very useful information in which passes all the time was spent,
and which passes are really run along with the order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-24 04:49:39 +02:00
Roland Scheidegger
e89cf59c27 gallivm: (trivial) do division by 1000 with int64
Conversion to int can otherwise overflow if compile times are over
~71min. (Yes this can happen...)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-24 04:49:39 +02:00
Roland Scheidegger
45b8f620a5 gallivm: remove LICM pass
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to move things when it previously couldn't,
which causes noticeable compile time increases.
There's more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-24 04:49:39 +02:00
Roland Scheidegger
8b9ab674b9 gallivm: add early cse pass
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
It also in particular helps some crazy shader used for driver
verification (don't ask...) a lot (about factor of 6 faster in compile
time) (due to simplfiying the ir before LICM is run).
While here, also move licm behind simplifycfg. For some shaders there
seems to be very significant compile time gains (we've seen a factor
of 10000 albeit that was a really crazy shader you'd certainly never
see in a real app), beause LICM is quite expensive and there's cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-04-24 04:49:39 +02:00
Vlad Golovkin
1ff1dc1c63 glsl/glcpp: Handle hex constants with 0X prefix
GLSL 4.6 spec describes hex constant as:

hexadecimal-constant:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-constant hexadecimal-digit

Right now if you have a shader with the following structure:

    #if 0X1 // or any hex number with the 0X prefix
    // some code
    #endif

the code between #if and #endif gets removed because the checking is performed
only for "0x" prefix which results in strtoll being called with the base 8 and
after encountering the 'X' char the strtoll returns 0. Letting strtoll detect
the base makes this limitation go away and also makes code easier to read.

From the strtoll Linux man page:

"If base is zero or 16, the string may then include a "0x" prefix, and the
number will be read in base 16; otherwise, a zero base is taken as 10 (decimal)
unless the next character is '0', in which case it is taken as 8 (octal)."

This matches the behaviour in the GLSL spec.

This patch also adds a test for uppercase hex prefix.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-24 09:55:05 +10:00
Timothy Arceri
295f57e09a mesa: rename api_validate.{c,h} -> draw_validate.{c,h}
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65422
2018-04-24 09:23:30 +10:00
Dave Airlie
a90c9f33cf ac/radv/radeonsi: refactor harvest config register getters.
This refactors the code out to share it between radv and radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-24 09:08:34 +10:00
Dave Airlie
8e4d54505a radv: only set raster_config_1 outside the index registers.
This follows what radeonsi does.

Ported from radeonsi:
    radeonsi: emit PA_SC_RASTER_CONFIG_1 only once

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-24 09:08:34 +10:00
Dave Airlie
f77caa7411 ac/radv/radeonsi: refactor max simd waves into common code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:08:33 +10:00
Dave Airlie
899df55ee0 ac/radv/radeonsi: refactor raster_config default values getters.
This just makes this common code between the two drivers.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:07:51 +10:00
Dave Airlie
8de7ff91be radeonsi: use common gs_table_depth code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:05:43 +10:00
Dave Airlie
9afe9c0fe2 radv: use common gs_table_depth code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:05:43 +10:00
Dave Airlie
5e2ef28390 ac/info: move gs table depth to common code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:05:38 +10:00
Dave Airlie
b25f6cde89 radeonsi: don't runtime check gs table info
We can just unreachable here, this aligns with radv code, makes
it easier to move to common code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:05:29 +10:00
Dave Airlie
40783a7fa3 radv/gfx9: don't use gs_table_depth on gfx9.
Missed this on initial radeonsi port, we shouldn't use this value
on gfx9, but also in gfx8 only for when we have a geom shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-04-24 09:04:42 +10:00
Jason Ekstrand
de1f22d595 i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*
They are send messages and this makes size_read() and mlen agree.  For
both of these opcodes, the payload is just a dummy so mlen == 1 and this
should decrease register pressure a bit.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
2018-04-23 14:04:42 -07:00
Samuel Pitoiset
d136a5fad9 ac: fix the number of coordinates for ac_image_get_lod and arrays
This fixes crashes for the following CTS:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*

Cubemaps are the same as 2D arrays.

Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-23 21:48:38 +02:00
Lionel Landwerlin
2964e16e51 i965: perf: enable GPA query statistics
The combinaison of GPA/MDAPI components expects a particular name &
layout for their pipeline statistics query.

v2: Limit the query GPA/MDAPI statistics to gen7->9 (Lionel)

v3: Add curly braces (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-23 18:30:10 +01:00
Lionel Landwerlin
2e3025c817 i965: perf: add support for raw queries
The INTEL_performance_query extension provides a list of queries that
a user can select to monitor a particular workload. Each query reports
different sets of counters (roughly looking at different parts of the
hardware, i.e. caches/fixed functions/etc...).

Each query has an associated configuration that we need to program
into the hardware before using the query. Up to now, we provided
predefined queries. This change allows the user to build its own query
(and associated configuration) externally, and have the i965 driver
use that configuration through a new query named :

   Intel_Raw_Hardware_Counters_Set_0_Query

When this query is selected, the i965 driver will report raw counters
deltas (meaning their values need to be interpreted by the user, as
opposed to existing queries that provide human readable values).

This change is also useful for debug purposes for building new
pre-defined queries and verifying the underlying numbers make sense
before writing equations for user readable output.

This change's purpose is also to enable GPA. GPA uses a library called
MDAPI that processes raw counter data. MDAPI expects raw data to have
a certain layout (per generation which is a bit unfortunate...). This
change also embeds the expected data layouts.

v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel)

v3: Don't assert on cherryview for gen7... (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-23 18:30:10 +01:00
Lionel Landwerlin
c61d445a5a i965: perf: read slice/unslice frequencies from OA reports
v2: Add comment breaking down where the frequency values come from (Ken)

v3: More documentation (Ken/Lionel)
    Adjust clock ratio multiplier to reflect the divider's behavior (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-23 18:30:10 +01:00
Lionel Landwerlin
43fcb72d2c i965: perf: snapshot RPSTAT register
This register contains the current/previous frequency of the GT, it's
one of the value GPA would like to have as part of their queries.

v2: Don't use this register on baytrail/cherryview (Ken)
    Use GET_FIELD() macro (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-23 18:30:10 +01:00
Lionel Landwerlin
d71b442416 i965: perf: extract utility functions
We would like to reuse a number of the functions and structures in
another file in a future commit.

We also move the previous content of brw_performance_query.h into
brw_performance_query_metrics.h to be included by generated metrics
files.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-23 18:30:10 +01:00
Samuel Pitoiset
e37e643589 ac: teach get_ac_sampler_dim() about subpass attachments
Suggested by Nicolai.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-23 19:10:56 +02:00
Samuel Pitoiset
84fef802fb ac/nir: add missing round_slice for 1D arrays
This fixes a bunch of CTS fails with 1D arrays:

dEQP-VK.glsl.texture_functions.texture*.sampler1darray_*

Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-23 19:10:52 +02:00
Dylan Baker
10e4290524 bin/install_megadrivers: rename a few variables to make things clearer
Originally the "each" variable was just a part of the "drivers"
variable. It's not anymore so it's a bit ambiguous.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-04-23 09:57:35 -07:00
Dylan Baker
ae3f45c11e bin/install_megadrivers: fix DESTDIR and -D*-path
This fixes -Ddri-drivers-path, -Dvdpau-libs-path, etc. with DESTDIR when
those paths are absolute. Currently due to the way python's os.path.join
handles absolute paths these will ignore DESTDIR, which is bad. This
fixes them to be relative to DESTDIR if that is set.

Fixes: 3218056e0e
       ("meson: Build i965 and dri stack")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-04-23 09:57:35 -07:00
Dylan Baker
dbf5b772b3 compiler/glsl: close fd's in glcpp_test.py
I would have thought falling out of scope would allow the gc to collect
these, but apparently it doesn't, and this hits an fd limit on macos.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106133
Fixes: db8cd8e367
       ("glcpp/tests: Convert shell scripts to a python script")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2018-04-23 09:55:17 -07:00
Bas Nieuwenhuizen
0e945fdf23 nir: Do not use progress for unreachable code in return lowering.
We seem to use progress for two cases:
1) When we lowered some returns.
2) When we remove unreachable code.

If just case 2 happens we assert as state->return_flag has not
been allocated yet, but we are still trying to do insert all
predicates based on it.

This splits the concerns. We only use progress internally for case 1
and then keep track of 2 in a separate variable to indicate progress
in the return value of the pass.

This is slightly better than transforming the assert into
if (!state->return_flag) return, as the solution in this patch avoids
inserting predicates even if some other part of the might need them.

Fixes: 6e22ad6edc "nir: return early when lowering a return at the end of a function"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106174
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-23 16:55:15 +02:00
Józef Kucia
8328c64eb1 radv: advertise 8 bits of subpixel precision for viewports
This is what radeonsi does.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-23 11:16:11 +02:00
Johan Klokkhammer Helsing
dab02dea34 st/dri: Fix dangling pointer to a destroyed dri_drawable
If an EGLSurface is created, made current and destroyed, and then a second
EGLSurface is created. Then the second malloc in driCreateNewDrawable may
return the same pointer address the first surface's drawable had.
Consequently, when dri_make_current later tries to determine if it should
update the texture_stamp it compares the surface's drawable pointer against
the drawable in the last call to dri_make_current and assumes it's the same
surface (which it isn't).

When texture_stamp is left unset, then dri_st_framebuffer_validate thinks
it has already called update_drawable_info for that drawable, leaving it
unvalidated and this is when bad things starts to happen. In my case it
manifested itself by the width and height of the surface being unset.

This is fixed this by setting the pointer to NULL before freeing the
surface.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
Signed-off-by: Johan Klokkhammer Helsing <johan.helsing@qt.io>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
2018-04-23 04:25:40 -04:00
Ilia Mirkin
5428066f5e nv50/ir: make a copy of tex src if it's referenced multiple times
For nv50 we coalesce the srcs and defs into a single node. As such, we
can end up with impossible constraints if the source is referenced
after the tex operation (which, due to the coalescing of values, will
have overwritten it).

This logic already exists for inserting moves for MERGE/UNION sources.
It's the exact same idea here, so leverage that code, which also
includes a few optimizations around not extending live ranges
unnecessarily.

Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-04-22 23:03:16 -04:00
Lepton Wu
6c5abb68c7 virgl: disable virgl when no 3D for virtio gpu.
If users are running mesa under old version of qemu or have turned off
GL at runtime, virtio gpu driver actually doesn't work. Adds a detection
here so mesa can fall back to software rendering.

v2:
 - move detection from loader to virgl (Ilia, Emil)

Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-04-23 12:35:29 +10:00
Dave Airlie
a8420e2530 radv: mark const structs as extern in header file to avoid lto damage
The copr repo from che was using LTO and he reported radv broke
recently with it. When testing with lto builds here I noticed
that we weren't seeing any instance extensions reported.

It appears LTO was treating the const without extern as an empty
struct, this is possibly a gcc bug, but we can work around it
just by marking these with extern.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-04-23 05:55:22 +10:00
Dylan Baker
f8c4716854 Bump version after 18.1
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-04-22 09:35:56 -07:00
Ilia Mirkin
3f1cad48b8 gallium/tests/trivial: fix viewport depth transform
These were getting mapped off into outer space, which would cause nv50
and nvc0 to clip the primitives (as depth_clip was enabled).

These drivers are configured to clip everything outside the [0, 1]
range, even though the hardware supports other view settings.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-04-21 23:31:48 -04:00
Ilia Mirkin
fe8b6d7e1f trace: allow image resource to be null
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-04-21 23:29:39 -04:00
Karol Herbst
63572091b5 nv50/ir/ra: prefer def == src2 for fma with immediates on nvc0
This helps with the PostRALoadPropagation pass moving long immediates into
FMA/MAD instructions.

changes in shader-db:
total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
total gprs used in shared programs    : 666558 -> 666563 (0.00%)
total shared used in shared programs  : 520416 -> 520416 (0.00%)
total local used in shared programs   : 53524 -> 53524 (0.00%)
total bytes used in shared programs   : 54006744 -> 53932472 (-0.14%)

                local     shared        gpr       inst      bytes
    helped           0           0           2        4192        4192
      hurt           0           0           7           9           9

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: minor edits to separate nv50 and nvc0+ cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-04-21 10:53:59 -04:00
Rhys Perry
cc35b76e99 docs/features: mark GL_ARB_post_depth_coverage as DONE for nvc0
This was done a while ago but never marked on features.txt. Note that
this is only supported on GM200+.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-04-21 10:02:55 -04:00
912 changed files with 53354 additions and 36157 deletions

View File

@@ -21,8 +21,8 @@ env:
- LIBXCB_VERSION=libxcb-1.13
- LIBXSHMFENCE_VERSION=libxshmfence-1.2
- LIBVDPAU_VERSION=libvdpau-1.1
- LIBVA_VERSION=libva-1.6.2
- LIBWAYLAND_VERSION=wayland-1.11.1
- LIBVA_VERSION=libva-1.7.0
- LIBWAYLAND_VERSION=wayland-1.15.0
- WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig:$HOME/prefix/share/pkgconfig
- LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"
@@ -33,18 +33,18 @@ matrix:
- env:
- LABEL="meson Vulkan"
- BUILD=meson
- MESON_OPTIONS="-Ddri-drivers= -Dgallium-drivers="
- LLVM_VERSION=4.0
- MESON_OPTIONS="-Ddri-drivers=[] -Dgallium-drivers=[]"
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
sources:
- llvm-toolchain-trusty-4.0
- llvm-toolchain-trusty-5.0
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- llvm-4.0-dev
- llvm-5.0-dev
# Common
- xz-utils
- libexpat1-dev
@@ -53,7 +53,7 @@ matrix:
- env:
- LABEL="meson loaders/classic DRI"
- BUILD=meson
- MESON_OPTIONS="-Dvulkan-drivers= -Dgallium-drivers="
- MESON_OPTIONS="-Dvulkan-drivers=[] -Dgallium-drivers=[]"
addons:
apt:
packages:
@@ -123,7 +123,7 @@ matrix:
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=4.0
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
@@ -134,12 +134,12 @@ matrix:
addons:
apt:
sources:
- llvm-toolchain-trusty-4.0
- llvm-toolchain-trusty-5.0
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- llvm-4.0-dev
- llvm-5.0-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
@@ -159,7 +159,7 @@ matrix:
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,freedreno,svga,swrast,vc4,virgl,etnaviv,imx"
- GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv,imx"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
@@ -231,7 +231,7 @@ matrix:
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600,radeonsi"
- GALLIUM_DRIVERS="r600"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
@@ -290,6 +290,39 @@ matrix:
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
# NOTE: Analogous to SWR above, building Clover is quite slow.
- LABEL="make Gallium ST Clover LLVM-6.0"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600,radeonsi"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
- llvm-toolchain-trusty-6.0
# llvm-6 depends on gcc-4.9 which is not in main repo
- ubuntu-toolchain-r-test
packages:
- libclc-dev
# From sources above
- llvm-6.0-dev
- clang-6.0
- libclang-6.0-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
- LABEL="make Gallium ST Other"
- BUILD=make
@@ -331,7 +364,7 @@ matrix:
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="make -C src/gtest check && make -C src/intel check"
- LLVM_VERSION=4.0
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl --with-platforms=x11,wayland"
- DRI_DRIVERS=""
@@ -342,12 +375,12 @@ matrix:
addons:
apt:
sources:
- llvm-toolchain-trusty-4.0
- llvm-toolchain-trusty-5.0
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- llvm-4.0-dev
- llvm-5.0-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
@@ -558,7 +591,9 @@ script:
export CFLAGS="$CFLAGS -isystem`pwd`";
./autogen.sh --enable-debug
mkdir build &&
cd build &&
../autogen.sh --enable-debug
$LIBUNWIND_FLAGS
$DRI_LOADERS
--with-dri-drivers=$DRI_DRIVERS

View File

@@ -73,6 +73,7 @@ LOCAL_CFLAGS += \
-DHAVE_ENDIAN_H \
-DHAVE_ZLIB \
-DMAJOR_IN_SYSMACROS \
-DVK_USE_PLATFORM_ANDROID_KHR \
-fvisibility=hidden \
-Wno-sign-compare

View File

@@ -77,6 +77,7 @@ noinst_HEADERS = \
include/drm-uapi/drm_mode.h \
include/drm-uapi/i915_drm.h \
include/drm-uapi/tegra_drm.h \
include/drm-uapi/v3d_drm.h \
include/drm-uapi/vc4_drm.h \
include/D3D9 \
include/GL/wglext.h \

10
PRESUBMIT.cfg Normal file
View File

@@ -0,0 +1,10 @@
# This sample config file disables all of the ChromiumOS source style checks.
# Comment out the disable-flags for any checks you want to leave enabled.
[Hook Overrides]
stray_whitespace_check: false
long_line_check: false
cros_license_check: false
tab_check: false
bug_field_check: false
test_field_check: false

79
README.rst Normal file
View File

@@ -0,0 +1,79 @@
`Mesa <https://mesa3d.org>`_ - The 3D Graphics Library
======================================================
Source
------
This repository lives at https://gitlab.freedesktop.org/mesa/mesa.
Other repositories are likely forks, and code found there is not supported.
Build status
------------
Travis:
.. image:: https://travis-ci.org/mesa3d/mesa.svg?branch=master
:target: https://travis-ci.org/mesa3d/mesa
Appveyor:
.. image:: https://img.shields.io/appveyor/ci/mesa3d/mesa.svg
:target: https://ci.appveyor.com/project/mesa3d/mesa
Coverity:
.. image:: https://scan.coverity.com/projects/139/badge.svg?flat=1
:target: https://scan.coverity.com/projects/mesa
Build & install
---------------
You can find more information in our documentation (`docs/install.html
<https://mesa3d.org/install.html>`_), but the recommended way is to use
Meson (`docs/meson.html <https://mesa3d.org/meson.html>`_):
.. code-block:: sh
$ mkdir build
$ cd build
$ meson ..
$ sudo ninja install
Support
-------
Many Mesa devs hang on IRC; if you're not sure which channel is
appropriate, you should ask your question on `Freenode's #dri-devel
<irc://chat.freenode.net#dri-devel>`_, someone will redirect you if
necessary.
Remember that not everyone is in the same timezone as you, so it might
take a while before someone qualified sees your question.
To figure out who you're talking to, or which nick to ping for your
question, check out `Who's Who on IRC
<https://dri.freedesktop.org/wiki/WhosWho/>`_.
The next best option is to ask your question in an email to the
mailing lists: `mesa-dev\@lists.freedesktop.org
<https://lists.freedesktop.org/mailman/listinfo/mesa-dev>`_
Bug reports
-----------
If you think something isn't working properly, please file a bug report
(`docs/bugs.html <https://mesa3d.org/bugs.html>`_).
Contributing
------------
Contributions are welcome, and step-by-step instructions can be found in our
documentation (`docs/submittingpatches.html
<https://mesa3d.org/submittingpatches.html>`_).
Note that Mesa uses email mailing-lists for patches submission, review and
discussions.

View File

@@ -116,6 +116,7 @@ MESON BUILD
R: Dylan Baker <dylan@pnwbakers.com>
R: Eric Engestrom <eric@engestrom.ch>
F: */meson.build
F: meson.build
F: meson_options.txt
ANDROID EGL SUPPORT

View File

@@ -1 +1 @@
18.1.0-devel
18.2.0-devel

View File

@@ -35,13 +35,13 @@ clone_depth: 100
cache:
- win_flex_bison-2.5.9.zip
- llvm-3.3.1-msvc2015-mtd.7z
- llvm-5.0.1-msvc2015-mtd.7z
os: Visual Studio 2015
environment:
WINFLEXBISON_ARCHIVE: win_flex_bison-2.5.9.zip
LLVM_ARCHIVE: llvm-3.3.1-msvc2015-mtd.7z
LLVM_ARCHIVE: llvm-5.0.1-msvc2015-mtd.7z
install:
# Check pip

View File

@@ -23,7 +23,7 @@ echo "<ul>"
echo ""
# extract fdo urls from commit log
git log $* | grep 'bugs.freedesktop.org/show_bug' | sed -e $trim_before | sort -n -u | sed -e $use_after |\
git log --pretty=medium $* | grep 'bugs.freedesktop.org/show_bug' | sed -e $trim_before | sort -n -u | sed -e $use_after |\
while read url
do
id=$(echo $url | cut -d'=' -f2)

View File

@@ -16,7 +16,7 @@ latest_branchpoint=`git merge-base origin/master HEAD`
git log --reverse --pretty=%H $latest_branchpoint > already_landed
# ... and the ones cherry-picked.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
git log --reverse --pretty=medium --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
@@ -38,7 +38,7 @@ do
# Place every "fixes:" tag on its own line and join with the next word
# on its line or a later one.
fixes=`git show -s $sha | tr -d "\n" | sed -e 's/fixes:[[:space:]]*/\nfixes:/Ig' | grep "fixes:" | sed -e 's/\(fixes:[a-zA-Z0-9]*\).*$/\1/'`
fixes=`git show --pretty=medium -s $sha | tr -d "\n" | sed -e 's/fixes:[[:space:]]*/\nfixes:/Ig' | grep "fixes:" | sed -e 's/\(fixes:[a-zA-Z0-9]*\).*$/\1/'`
# For each one try to extract the tag
fixes_count=`echo "$fixes" | wc -l`

View File

@@ -12,7 +12,7 @@
latest_branchpoint=`git merge-base origin/master HEAD`
# Grep for commits with "cherry picked from commit" in the commit message.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
git log --reverse --pretty=medium --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked

View File

@@ -1,6 +1,6 @@
#!/usr/bin/env python
# encoding=utf-8
# Copyright © 2017 Intel Corporation
# Copyright © 2017-2018 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -35,30 +35,34 @@ def main():
parser.add_argument('drivers', nargs='+')
args = parser.parse_args()
to = os.path.join(os.environ.get('MESON_INSTALL_DESTDIR_PREFIX'), args.libdir)
if os.path.isabs(args.libdir):
to = os.path.join(os.environ.get('DESTDIR', '/'), args.libdir[1:])
else:
to = os.path.join(os.environ['MESON_INSTALL_DESTDIR_PREFIX'], args.libdir)
master = os.path.join(to, os.path.basename(args.megadriver))
if not os.path.exists(to):
os.makedirs(to)
shutil.copy(args.megadriver, master)
for each in args.drivers:
driver = os.path.join(to, each)
for driver in args.drivers:
abs_driver = os.path.join(to, driver)
if os.path.exists(driver):
os.unlink(driver)
print('installing {} to {}'.format(args.megadriver, driver))
os.link(master, driver)
if os.path.exists(abs_driver):
os.unlink(abs_driver)
print('installing {} to {}'.format(args.megadriver, abs_driver))
os.link(master, abs_driver)
try:
ret = os.getcwd()
os.chdir(to)
name, ext = os.path.splitext(each)
name, ext = os.path.splitext(driver)
while ext != '.so':
if os.path.exists(name):
os.unlink(name)
os.symlink(each, name)
os.symlink(driver, name)
name, ext = os.path.splitext(name)
finally:
os.chdir(ret)

View File

@@ -78,17 +78,19 @@ LIBDRM_AMDGPU_REQUIRED=2.4.91
LIBDRM_INTEL_REQUIRED=2.4.75
LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66
LIBDRM_FREEDRENO_REQUIRED=2.4.91
LIBDRM_FREEDRENO_REQUIRED=2.4.92
LIBDRM_ETNAVIV_REQUIRED=2.4.89
LIBDRM_VC4_REQUIRED=2.4.89
dnl Versions for external dependencies
DRI2PROTO_REQUIRED=2.8
GLPROTO_REQUIRED=1.4.14
LIBOMXIL_BELLAGIO_REQUIRED=0.0
LIBOMXIL_TIZONIA_REQUIRED=0.10.0
LIBVA_REQUIRED=0.38.0
LIBVA_REQUIRED=0.39.0
VDPAU_REQUIRED=1.1
WAYLAND_REQUIRED=1.11
WAYLAND_EGL_BACKEND_REQUIRED=3
WAYLAND_PROTOCOLS_REQUIRED=1.8
XCB_REQUIRED=1.9.3
XCBDRI2_REQUIRED=1.8
@@ -106,8 +108,8 @@ dnl LLVM versions
LLVM_REQUIRED_GALLIUM=3.3.0
LLVM_REQUIRED_OPENCL=3.9.0
LLVM_REQUIRED_R600=3.9.0
LLVM_REQUIRED_RADEONSI=4.0.0
LLVM_REQUIRED_RADV=4.0.0
LLVM_REQUIRED_RADEONSI=5.0.0
LLVM_REQUIRED_RADV=5.0.0
LLVM_REQUIRED_SWR=4.0.0
dnl Check for progs
@@ -119,6 +121,7 @@ dnl other CC/CXX flags related help
AC_ARG_VAR([CXX11_CXXFLAGS], [Compiler flag to enable C++11 support (only needed if not
enabled by default and different from -std=c++11)])
AM_PROG_CC_C_O
AC_PROG_GREP
AC_PROG_NM
AM_PROG_AS
AX_CHECK_GNU_MAKE
@@ -433,26 +436,40 @@ fi
AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
AC_SUBST([SSE41_CFLAGS], $SSE41_CFLAGS)
dnl Check for new-style atomic builtins
AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
dnl Check for new-style atomic builtins. We first check without linking to
dnl -latomic.
AC_MSG_CHECKING(whether __atomic_load_n is supported)
AC_LINK_IFELSE([AC_LANG_SOURCE([[
#include <stdint.h>
int main() {
int n;
return __atomic_load_n(&n, __ATOMIC_ACQUIRE);
}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=1)
if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" = x1; then
struct {
uint64_t *v;
} x;
return (int)__atomic_load_n(x.v, __ATOMIC_ACQUIRE) &
(int)__atomic_add_fetch(x.v, (uint64_t)1, __ATOMIC_ACQ_REL);
}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=yes, GCC_ATOMIC_BUILTINS_SUPPORTED=no)
dnl If that didn't work, we try linking with -latomic, which is needed on some
dnl platforms.
if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" != xyes; then
save_LDFLAGS=$LDFLAGS
LDFLAGS="$LDFLAGS -latomic"
AC_LINK_IFELSE([AC_LANG_SOURCE([[
#include <stdint.h>
int main() {
struct {
uint64_t *v;
} x;
return (int)__atomic_load_n(x.v, __ATOMIC_ACQUIRE) &
(int)__atomic_add_fetch(x.v, (uint64_t)1, __ATOMIC_ACQ_REL);
}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=yes LIBATOMIC_LIBS="-latomic",
GCC_ATOMIC_BUILTINS_SUPPORTED=no)
LDFLAGS=$save_LDFLAGS
fi
AC_MSG_RESULT($GCC_ATOMIC_BUILTINS_SUPPORTED)
if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" = xyes; then
DEFINES="$DEFINES -DUSE_GCC_ATOMIC_BUILTINS"
dnl On some platforms, new-style atomics need a helper library
AC_MSG_CHECKING(whether -latomic is needed)
AC_LINK_IFELSE([AC_LANG_SOURCE([[
#include <stdint.h>
uint64_t v;
int main() {
return (int)__atomic_load_n(&v, __ATOMIC_ACQUIRE);
}]])], GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC=no, GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC=yes)
AC_MSG_RESULT($GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC)
if test "x$GCC_ATOMIC_BUILTINS_NEED_LIBATOMIC" = xyes; then
LIBATOMIC_LIBS="-latomic"
fi
fi
AC_SUBST([LIBATOMIC_LIBS])
@@ -746,21 +763,6 @@ esac
AC_SUBST([LIB_EXT])
dnl
dnl potentially-infringing-but-nobody-knows-for-sure stuff
dnl
AC_ARG_ENABLE([texture-float],
[AS_HELP_STRING([--enable-texture-float],
[enable floating-point textures and renderbuffers @<:@default=disabled@:>@])],
[enable_texture_float="$enableval"],
[enable_texture_float=no]
)
if test "x$enable_texture_float" = xyes; then
AC_MSG_WARN([Floating-point textures enabled.])
AC_MSG_WARN([Please consult docs/patents.txt with your lawyer before building Mesa.])
DEFINES="$DEFINES -DTEXTURE_FLOAT_ENABLED"
fi
dnl
dnl Arch/platform-specific settings
dnl
@@ -1359,7 +1361,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
AC_ARG_WITH([gallium-drivers],
[AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
[comma delimited Gallium drivers list, e.g.
"i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,tegra,vc4,vc5,virgl,etnaviv,imx"
"i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,tegra,v3d,vc4,virgl,etnaviv,imx"
@<:@default=r300,r600,svga,swrast@:>@])],
[with_gallium_drivers="$withval"],
[with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
@@ -1794,6 +1796,9 @@ for plat in $platforms; do
PKG_CHECK_MODULES([WAYLAND_CLIENT], [wayland-client >= $WAYLAND_REQUIRED])
PKG_CHECK_MODULES([WAYLAND_SERVER], [wayland-server >= $WAYLAND_REQUIRED])
PKG_CHECK_MODULES([WAYLAND_PROTOCOLS], [wayland-protocols >= $WAYLAND_PROTOCOLS_REQUIRED])
if test "x$enable_egl" = xyes; then
PKG_CHECK_MODULES([WAYLAND_EGL], [wayland-egl-backend >= $WAYLAND_EGL_BACKEND_REQUIRED])
fi
WAYLAND_PROTOCOLS_DATADIR=`$PKG_CONFIG --variable=pkgdatadir wayland-protocols`
PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland-scanner],
@@ -1826,6 +1831,9 @@ for plat in $platforms; do
android)
PKG_CHECK_MODULES([ANDROID], [cutils hardware sync])
if test -n "$with_gallium_drivers"; then
PKG_CHECK_MODULES([BACKTRACE], [backtrace])
fi
DEFINES="$DEFINES -DHAVE_ANDROID_PLATFORM"
;;
@@ -2085,6 +2093,9 @@ if test -n "$with_vulkan_drivers"; then
PKG_CHECK_MODULES([AMDGPU], [libdrm >= $LIBDRM_AMDGPU_REQUIRED libdrm_amdgpu >= $LIBDRM_AMDGPU_REQUIRED])
radeon_llvm_check $LLVM_REQUIRED_RADV "radv"
require_x11_dri3 "radv"
if test "x$acv_mako_found" = xno; then
AC_MSG_ERROR([Python mako module v$PYTHON_MAKO_REQUIRED or higher not found])
fi
HAVE_RADEON_VULKAN=yes
;;
*)
@@ -2714,20 +2725,20 @@ if test -n "$with_gallium_drivers"; then
;;
xvc4)
HAVE_GALLIUM_VC4=yes
require_libdrm "vc4"
PKG_CHECK_MODULES([VC4], [libdrm >= $LIBDRM_VC4_REQUIRED])
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes;
DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
[USE_VC4_SIMULATOR=no])
;;
xvc5)
HAVE_GALLIUM_VC5=yes
xv3d)
HAVE_GALLIUM_V3D=yes
PKG_CHECK_MODULES([VC5_SIMULATOR], [v3dv3],
[USE_VC5_SIMULATOR=yes;
DEFINES="$DEFINES -DUSE_VC5_SIMULATOR"],
[AC_MSG_ERROR([vc5 requires the simulator])])
PKG_CHECK_MODULES([V3D_SIMULATOR], [v3dv3],
[USE_V3D_SIMULATOR=yes;
DEFINES="$DEFINES -DUSE_V3D_SIMULATOR"],
[USE_V3D_SIMULATOR=no])
;;
xpl111)
HAVE_GALLIUM_PL111=yes
@@ -2879,8 +2890,8 @@ AM_CONDITIONAL(HAVE_GALLIUM_SWR, test "x$HAVE_GALLIUM_SWR" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SWRAST, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes -o \
"x$HAVE_GALLIUM_LLVMPIPE" = xyes -o \
"x$HAVE_GALLIUM_SWR" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_V3D, test "x$HAVE_GALLIUM_V3D" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VC5, test "x$HAVE_GALLIUM_VC5" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VIRGL, test "x$HAVE_GALLIUM_VIRGL" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test "x$enable_shared_pipe_drivers" = xno)
@@ -2908,7 +2919,7 @@ AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_RADEONSI" = xyes -o \
"x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_BROADCOM_DRIVERS, test "x$HAVE_GALLIUM_VC4" = xyes -o \
"x$HAVE_GALLIUM_VC5" = xyes)
"x$HAVE_GALLIUM_V3D" = xyes)
AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \
"x$HAVE_I965_DRI" = xyes)
@@ -2919,8 +2930,8 @@ AM_CONDITIONAL(NEED_RADEON_DRM_WINSYS, test "x$HAVE_GALLIUM_R300" = xyes -o \
AM_CONDITIONAL(NEED_WINSYS_XLIB, test "x$enable_glx" = xgallium-xlib)
AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_LLVM, test "x$enable_llvm" = xyes)
AM_CONDITIONAL(USE_V3D_SIMULATOR, test x$USE_V3D_SIMULATOR = xyes)
AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
AM_CONDITIONAL(USE_VC5_SIMULATOR, test x$USE_VC5_SIMULATOR = xyes)
AM_CONDITIONAL(HAVE_LIBDRM, test "x$have_libdrm" = xyes)
AM_CONDITIONAL(HAVE_OSMESA, test "x$enable_osmesa" = xyes)
@@ -2956,7 +2967,7 @@ AC_SUBST([XVMC_MAJOR], 1)
AC_SUBST([XVMC_MINOR], 0)
AC_SUBST([XA_MAJOR], 2)
AC_SUBST([XA_MINOR], 3)
AC_SUBST([XA_MINOR], 4)
AC_SUBST([XA_PATCH], 0)
AC_SUBST([XA_VERSION], "$XA_MAJOR.$XA_MINOR.$XA_PATCH")
@@ -3005,8 +3016,6 @@ AC_CONFIG_FILES([Makefile
src/egl/Makefile
src/egl/main/egl.pc
src/egl/wayland/wayland-drm/Makefile
src/egl/wayland/wayland-egl/Makefile
src/egl/wayland/wayland-egl/wayland-egl.pc
src/gallium/Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
@@ -3024,8 +3033,8 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/tegra/Makefile
src/gallium/drivers/etnaviv/Makefile
src/gallium/drivers/imx/Makefile
src/gallium/drivers/v3d/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/vc5/Makefile
src/gallium/drivers/virgl/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
@@ -3072,8 +3081,8 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
src/gallium/winsys/tegra/drm/Makefile
src/gallium/winsys/v3d/drm/Makefile
src/gallium/winsys/vc4/drm/Makefile
src/gallium/winsys/vc5/drm/Makefile
src/gallium/winsys/virgl/drm/Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
@@ -3109,6 +3118,7 @@ AC_CONFIG_FILES([Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile
src/util/tests/string_buffer/Makefile
src/util/tests/vma/Makefile
src/util/xmlpool/Makefile
src/vulkan/Makefile])

View File

@@ -83,7 +83,7 @@ We try to quote the OpenGL specification where prudent:
* "An INVALID_OPERATION error is generated for any of the following
* conditions:
*
* * <length> is zero."
* * &lt;length&gt; is zero."
*
* Additionally, page 94 of the PDF of the OpenGL 4.5 core spec
* (30.10.2014) also says this, so it's no longer allowed for desktop GL,
@@ -94,7 +94,7 @@ Function comment example:
<pre>
/**
* Create and initialize a new buffer object. Called via the
* ctx->Driver.CreateObject() driver callback function.
* ctx-&gt;Driver.CreateObject() driver callback function.
* \param name integer name of the object
* \param type one of GL_FOO, GL_BAR, etc.
* \return pointer to new object or NULL if error

View File

@@ -168,6 +168,7 @@ the X server directly using (XCB-)DRI2 protocol.</p>
<p>This driver can share DRI drivers with <code>libGL</code>.</p>
</dd>
</dl>
<h2>Packaging</h2>

BIN
docs/favicon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

BIN
docs/favicon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.9 KiB

View File

@@ -36,7 +36,7 @@ context as extensions.
Feature Status
------------------------------------------------------- ------------------------
GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr, virgl
glBindFragDataLocation, glGetFragDataLocation DONE
GL_NV_conditional_render (Conditional rendering) DONE ()
@@ -68,7 +68,7 @@ GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llv
(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr, virgl
Forward compatible context support/deprecations DONE ()
GL_ARB_draw_instanced (Instanced drawing) DONE ()
@@ -81,7 +81,7 @@ GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llv
GL_EXT_texture_snorm (Signed normalized textures) DONE ()
GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr, virgl
Core/compatibility profiles DONE
Geometry shaders DONE ()
@@ -96,7 +96,7 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
GLX_ARB_create_context_profile DONE
GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, virgl
GL_ARB_blend_func_extended DONE (freedreno/a3xx, swr)
GL_ARB_explicit_attrib_location DONE (all drivers that support GLSL)
@@ -110,7 +110,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
GL_ARB_vertex_type_2_10_10_10_rev DONE (freedreno, swr)
GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
GL_ARB_draw_buffers_blend DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr)
@@ -139,7 +139,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL_ARB_transform_feedback3 DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
GL_ARB_ES2_compatibility DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_get_program_binary DONE (0 or 1 binary formats)
@@ -151,17 +151,17 @@ GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL 4.2, GLSL 4.20 -- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL_ARB_texture_compression_bptc DONE (freedreno, i965)
GL_ARB_texture_compression_bptc DONE (freedreno, i965, virgl)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (freedreno/a5xx, i965, softpipe)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback_instanced DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_base_instance DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_shader_image_load_store DONE (freedreno/a5xx, i965, softpipe)
GL_ARB_conservative_depth DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_420pack DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_internalformat_query DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_internalformat_query DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_map_buffer_alignment DONE (all drivers)
@@ -174,17 +174,17 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL_ARB_copy_image DONE (i965, nv50, softpipe, llvmpipe)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, llvmpipe, softpipe)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, llvmpipe, softpipe, virgl)
GL_ARB_framebuffer_no_attachments DONE (freedreno, i965, softpipe)
GL_ARB_internalformat_query2 DONE (all drivers)
GL_ARB_invalidate_subdata DONE (all drivers)
GL_ARB_multi_draw_indirect DONE (freedreno, i965, llvmpipe, softpipe, swr)
GL_ARB_multi_draw_indirect DONE (freedreno, i965, llvmpipe, softpipe, swr, virgl)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior DONE (i965)
GL_ARB_shader_image_size DONE (freedreno/a5xx, i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx, i965, softpipe)
GL_ARB_stencil_texturing DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (freedreno, nv50, i965, llvmpipe)
GL_ARB_stencil_texturing DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_texture_buffer_range DONE (freedreno, nv50, i965, llvmpipe, virgl)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
@@ -205,17 +205,17 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
- input/output block locations DONE
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object DONE (i965/hsw+)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_stencil8 DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_texture_stencil8 DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi
GL_ARB_ES3_1_compatibility DONE (i965/hsw+, r600)
GL_ARB_clip_control DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_cull_distance DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_derivative_control DONE (i965, nv50, r600)
GL_ARB_conditional_render_inverted DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
GL_ARB_cull_distance DONE (i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
GL_ARB_derivative_control DONE (i965, nv50, r600, virgl)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, r600)
@@ -229,13 +229,13 @@ GL 4.6, GLSL 4.60
GL_ARB_gl_spirv in progress (Nicolai Hähnle, Ian Romanick)
GL_ARB_indirect_parameters DONE (i965/gen7+, nvc0, radeonsi)
GL_ARB_pipeline_statistics_query DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_polygon_offset_clamp DONE (freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, swr)
GL_ARB_polygon_offset_clamp DONE (freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, swr, virgl)
GL_ARB_shader_atomic_counter_ops DONE (freedreno/a5xx, i965/gen7+, nvc0, r600, radeonsi, softpipe)
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (i965, nvc0, radeonsi)
GL_ARB_spirv_extensions in progress (Nicolai Hähnle, Ian Romanick)
GL_ARB_texture_filter_anisotropic DONE (freedreno, i965, nv50, nvc0, r600, radeonsi, softpipe (*), llvmpipe (*))
GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+, nvc0, radeonsi, llvmpipe, softpipe)
GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+, nvc0, radeonsi, llvmpipe, softpipe, virgl)
GL_KHR_no_error DONE (all drivers)
(*) softpipe and llvmpipe advertise 16x anisotropy but simply ignore the setting
@@ -245,7 +245,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, r600, radeonsi
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_compute_shader DONE (freedreno/a5xx, i965/gen7+, softpipe)
GL_ARB_draw_indirect DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr, virgl)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (freedreno, i965/gen7+, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
@@ -255,12 +255,12 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, r600, radeonsi
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx, i965/gen7+, softpipe)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (freedreno, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, nv50, llvmpipe, softpipe)
GL_ARB_stencil_texturing DONE (freedreno, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, nv50, llvmpipe, softpipe, virgl)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GS5 Enhanced textureGather DONE (freedreno, i965/gen7+,)
GS5 Packing/bitfield/conversion functions DONE (i965/gen6+)
GS5 Enhanced textureGather DONE (freedreno, i965/gen7+,virgl)
GS5 Packing/bitfield/conversion functions DONE (i965/gen6+,virgl)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functionality not covered above:
@@ -300,16 +300,16 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_cl_event not started
GL_ARB_compute_variable_group_size DONE (nvc0, radeonsi)
GL_ARB_ES3_2_compatibility DONE (i965/gen8+)
GL_ARB_fragment_shader_interlock not started
GL_ARB_fragment_shader_interlock DONE (i965)
GL_ARB_gpu_shader_int64 DONE (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
GL_ARB_parallel_shader_compile not started, but Chia-I Wu did some related work in 2014
GL_ARB_post_depth_coverage DONE (i965)
GL_ARB_post_depth_coverage DONE (i965, nvc0)
GL_ARB_robustness_isolation not started
GL_ARB_sample_locations not started
GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr)
GL_ARB_sample_locations DONE (nvc0)
GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
GL_ARB_shader_ballot DONE (i965/gen8+, nvc0, radeonsi)
GL_ARB_shader_clock DONE (i965/gen7+, nv50, nvc0, r600, radeonsi)
GL_ARB_shader_stencil_export DONE (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_stencil_export DONE (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr, virgl)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, nvc0, radeonsi)
GL_ARB_sparse_buffer DONE (radeonsi/CIK+)
GL_ARB_sparse_texture not started

View File

@@ -16,6 +16,47 @@
<h1>News</h1>
<h2>June 3, 2018</h2>
<p>
<a href="relnotes/18.0.5.html">Mesa 18.0.5</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 18.0.5 will be the final release in the
18.0 series. Users of 18.0 are encouraged to migrate to the 18.1
series in order to obtain future fixes.
</p>
<h2>June 1, 2018</h2>
<p>
<a href="relnotes/18.1.1.html">Mesa 18.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>May 18, 2018</h2>
<p>
<a href="relnotes/18.1.0.html">Mesa 18.1.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>May 17, 2018</h2>
<p>
<a href="relnotes/18.0.4.html">Mesa 18.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>May 7, 2018</h2>
<p>
<a href="relnotes/18.0.3.html">Mesa 18.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>April 28, 2018</h2>
<p>
<a href="relnotes/18.0.2.html">Mesa 18.0.2</a> is released.
This is a bug-fix release.
</p>
<h2>April 18, 2018</h2>
<p>
<a href="relnotes/18.0.1.html">Mesa 18.0.1</a> is released.

View File

@@ -24,10 +24,7 @@ for production</strong></p>
<p>The meson build is tested on on Linux, macOS, Cygwin and Haiku, it should
work on FreeBSD, DragonflyBSD, NetBSD, and OpenBSD.</p>
<p><strong>Mesa requires Meson >= 0.42.0 to build in general.</strong>
Additionaly, to build the Clover OpenCL state tracker or the OpenSWR driver
meson 0.44.0 or greater is required.
<p><strong>Mesa requires Meson >= 0.44.1 to build.</strong>
Some older versions of meson do not check that they are too old and will error
out in odd ways.
@@ -36,7 +33,7 @@ out in odd ways.
<p>
The meson program is used to configure the source directory and generates
either a ninja build file or Visual Studio® build files. The latter must
be enabled via the --backend switch, as ninja is the default backend on all
be enabled via the <code>--backend</code> switch, as ninja is the default backend on all
operating systems. Meson only supports out-of-tree builds, and must be passed a
directory to put built and generated sources into. We'll call that directory
"build" for examples.
@@ -52,7 +49,7 @@ along with a build directory to view the selected options for. This will show
your meson global arguments and project arguments, along with their defaults
and your local settings.
Moes does not currently support listing options before configure a build
Meson does not currently support listing options before configure a build
directory, but this feature is being discussed upstream.
</p>
@@ -63,13 +60,21 @@ directory, but this feature is being discussed upstream.
<p>
With additional arguments <code>meson configure</code> is used to change
options on already configured build directory. All options passed to this
command are in the form -D "command"="value".
command are in the form <code>-D "command"="value"</code>.
</p>
<pre>
meson configure build/ -Dprefix=/tmp/install -Dglx=true
</pre>
<p>
Note that options taking lists (such as <code>platforms</code>) are
<a href="http://mesonbuild.com/Build-options.html#using-build-options">a bit
more complicated</a>, but the simplest form compatible with Mesa options
is to use a comma to separate values (<code>-D platforms=drm,wayland</code>)
and brackets to represent an empty list (<code>-D platforms=[]</code>).
</p>
<p>
Once you've run the initial <code>meson</code> command successfully you can use
your configured backend to build the project. With ninja, the -C option can be
@@ -85,13 +90,14 @@ Without arguments, it will produce libGL.so and/or several other libraries
depending on the options you have chosen. Later, if you want to rebuild for a
different configuration, you should run <code>ninja clean</code> before
changing the configuration, or create a new out of tree build directory for
each configuration you want to build.
http://mesonbuild.com/Using-multiple-build-directories.html
each configuration you want to build
<a href="http://mesonbuild.com/Using-multiple-build-directories.html">as
recommended in the documentation</a>
</p>
<dl>
<dt><code>Environment Variables</code></dt>
<dd><p>Meson supports the standard CC and CXX envrionment variables for
<dd><p>Meson supports the standard CC and CXX environment variables for
changing the default compiler, and CFLAGS, CXXFLAGS, and LDFLAGS for setting
options to the compiler and linker.
@@ -102,9 +108,9 @@ the popular compilers, a complete list is available
These arguments are consumed and stored by meson when it is initialized or
re-initialized. Therefore passing them to meson configure will not do anything,
and passing them to ninja will only do something if ninja decides to
re-initialze meson, for example, if a meson.build file has been changed.
re-initialize meson, for example, if a meson.build file has been changed.
Changing these variables will not cause all targets to be rebuilt, so running
ninja clean is recomended when changing CFLAGS or CXXFLAGS. meson will never
ninja clean is recommended when changing CFLAGS or CXXFLAGS. Meson will never
change compiler in a configured build directory.
</p>
@@ -116,14 +122,13 @@ change compiler in a configured build directory.
CFLAGS=-Wno-typedef-redefinition ninja -C build-clang
</pre>
<p>Meson also honors DESTDIR for installs</p>
<p>Meson also honors <code>DESTDIR</code> for installs</p>
</dd>
<dl>
<dt><code>LLVM</code></dt>
<dd><p>Meson includes upstream logic to wrap llvm-config using it's standard
dependncy interface. It will search $PATH (or %PATH% on windows) for
dependency interface. It will search <code>$PATH</code> (or <code>%PATH%</code> on windows) for
llvm-config, so using an LLVM from a non-standard path is as easy as
<code>PATH=/path/with/llvm-config:$PATH meson build</code>.
</p></dd>
@@ -146,7 +151,7 @@ One of the oddities of meson is that some options are different when passed to
the <code>meson</code> than to <code>meson configure</code>. These options are
passed as --option=foo to <code>meson</code>, but -Doption=foo to <code>meson
configure</code>. Mesa defined options are always passed as -Doption=foo.
<p>
</p>
<p>For those coming from autotools be aware of the following:</p>
@@ -155,13 +160,13 @@ configure</code>. Mesa defined options are always passed as -Doption=foo.
<dd><p>This option will set the compiler debug/optimisation levels to aid
debugging the Mesa libraries.</p>
<p>Note that in meson this defaults to "debugoptimized", and not setting it to
"release" will yield non-optimal performance and binary size. Not using "debug"
may interfer with debbugging as some code and validation will be optimized
away.
<p>Note that in meson this defaults to <code>debugoptimized</code>, and
not setting it to <code>release</code> will yield non-optimal
performance and binary size. Not using <code>debug</code> may interfere
with debugging as some code and validation will be optimized away.
</p>
<p> For those wishing to pass their own optimization flags, use the "plain"
<p> For those wishing to pass their own optimization flags, use the <code>plain</code>
buildtype, which causes meson to inject no additional compiler arguments, only
those in the C/CXXFLAGS and those that mesa itself defines.</p>
</dd>
@@ -169,10 +174,14 @@ those in the C/CXXFLAGS and those that mesa itself defines.</p>
<dl>
<dt><code>-Db_ndebug</code></dt>
<dd><p>This option controls assertions in meson projects. When set to false
<dd><p>This option controls assertions in meson projects. When set to <code>false</code>
(the default) assertions are enabled, when set to true they are disabled. This
is unrelated to the <code>buildtype</code>; setting the latter to
<code>release</code> will not turn off assertions.
</p>
</dd>
</dl>
</div>
</body>
</html>

View File

@@ -1,31 +0,0 @@
ARB_texture_float:
Silicon Graphics, Inc. owns US Patent #6,650,327, issued November 18,
2003 [1].
SGI believes this patent contains necessary IP for graphics systems
implementing floating point rasterization and floating point
framebuffer capabilities described in ARB_texture_float extension, and
will discuss licensing on RAND terms, on an individual basis with
companies wishing to use this IP in the context of conformant OpenGL
implementations [2].
The source code to implement ARB_texture_float extension is included
and can be toggled on at compile time, for those who purchased a
license from SGI, or are in a country where the patent does not apply,
etc.
The software is provided "as is", without warranty of any kind, express
or implied, including but not limited to the warranties of
merchantability, fitness for a particular purpose and noninfringement.
In no event shall the authors or copyright holders be liable for any
claim, damages or other liability, whether in an action of contract,
tort or otherwise, arising from, out of or in connection with the
software or the use or other dealings in the software.
You should contact a lawyer or SGI's legal department if you want to
enable this extension.
[1] https://patents.google.com/patent/US6650327B1
[2] https://www.opengl.org/registry/specs/ARB/texture_float.txt

View File

@@ -24,10 +24,12 @@ Some Linux distributions closely follow the latest Mesa releases. On others one
has to use unofficial channels.
<br>
There are some general directions:
<ul>
<li>Debian/Ubuntu based distros - PPA: xorg-edgers, oibaf and padoka</li>
<li>Fedora - Corp: erp and che</li>
<li>OpenSuse/SLES - OBS: X11:XOrg and pontostroy:X11</li>
<li>Gentoo/Archlinux - officially provided/supported</li>
</ul>
</p>
</div>

View File

@@ -37,73 +37,36 @@ if you'd like to nominate a patch in the next stable release.
<th>Release</th>
<th>Release manager</th>
<th>Notes</th>
<tr>
<td rowspan="3">18.0</td>
<td>2018-04-20</td>
<td>18.0.2</td>
<td>Juan A. Suarez Romero</td>
<td></td>
</tr>
<tr>
<td>2018-05-04</td>
<td>18.0.3</td>
<td>Juan A. Suarez Romero</td>
<td></td>
</tr>
<tr>
<td>2018-05-18</td>
<td>18.0.4</td>
<td>Juan A. Suarez Romero</td>
<td>Last planned 18.0.x release</td>
</tr>
<tr>
<td rowspan="8">18.1</td>
<td>2018-04-20</td>
<td>18.1.0rc1</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-04-27</td>
<td>18.1.0rc2</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-05-04</td>
<td>18.1.0rc3</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-05-11</td>
<td>18.1.0rc4</td>
<td>Dylan Baker</td>
<td>Last planned RC/Final release</td>
</tr>
<tr>
<td>TBD</td>
<td>18.1.1</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>TBD</td>
<td>18.1.2</td>
<td>Emil Velikov</td>
<td></td>
</tr>
<tr>
<td>TBD</td>
<td>2018-06-29</td>
<td>18.1.3</td>
<td>Emil Velikov</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>TBD</td>
<td>2018-07-13</td>
<td>18.1.4</td>
<td>Emil Velikov</td>
<td>Last planned RC/Final release</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-07-27</td>
<td>18.1.5</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-08-10</td>
<td>18.1.6</td>
<td>Dylan Baker</td>
<td></td>
</tr>
<tr>
<td>2018-08-24</td>
<td>18.1.7</td>
<td>Dylan Baker</td>
<td>Last planned 18.1.x release</td>
</tr>
<tr>
<td rowspan="4">18.2</td>

View File

@@ -21,6 +21,13 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/18.1.2.html">18.1.2 release notes</a>
<li><a href="relnotes/18.0.5.html">18.0.5 release notes</a>
<li><a href="relnotes/18.1.1.html">18.1.1 release notes</a>
<li><a href="relnotes/18.1.0.html">18.1.0 release notes</a>
<li><a href="relnotes/18.0.4.html">18.0.4 release notes</a>
<li><a href="relnotes/18.0.3.html">18.0.3 release notes</a>
<li><a href="relnotes/18.0.2.html">18.0.2 release notes</a>
<li><a href="relnotes/18.0.1.html">18.0.1 release notes</a>
<li><a href="relnotes/17.3.9.html">17.3.9 release notes</a>
<li><a href="relnotes/17.3.8.html">17.3.8 release notes</a>

View File

@@ -48,8 +48,8 @@ Note: some of the new features are only available with certain drivers.
<li>Disk shader cache support for i965 when MESA_GLSL_CACHE_DISABLE environment variable is set to "0" or "false"</li>
<li>GL_ARB_shader_atomic_counters and GL_ARB_shader_atomic_counter_ops on r600/evergreen+</li>
<li>GL_ARB_shader_image_load_store and GL_ARB_shader_image_size on r600/evergreen+</li>
<li>GL_ARB_shader_storage_buffer_object on r600/evergreen+<li>
<li>GL_ARB_compute_shader on r600/evergreen+<li>
<li>GL_ARB_shader_storage_buffer_object on r600/evergreen+</li>
<li>GL_ARB_compute_shader on r600/evergreen+</li>
<li>GL_ARB_cull_distance on r600/evergreen+</li>
<li>GL_ARB_enhanced_layouts on r600/evergreen+</li>
<li>GL_ARB_bindless_texture on nvc0/kepler</li>

144
docs/relnotes/18.0.2.html Normal file
View File

@@ -0,0 +1,144 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.0.2 Release Notes / April 28, 2018</h1>
<p>
Mesa 18.0.2 is a bug fix release which fixes bugs found since the 18.0.1 release.
</p>
<p>
Mesa 18.0.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
SHA256: ffd8dfe3337b474a3baa085f0e7ef1a32c7cdc3bed1ad810b2633919a9324840 mesa-18.0.2.tar.gz
SHA256: 98fa159768482dc568b9f8bf0f36c7acb823fa47428ffd650b40784f16b9e7b3 mesa-18.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95009">Bug 95009</a> - [SNB] amd_shader_trinary_minmax.execution.built-in-functions.gs-mid3-ivec2-ivec2-ivec2 intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95012">Bug 95012</a> - [SNB] glsl-1_50.execution.built-in-functions.gs-op tests intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98281">Bug 98281</a> - 'message's in ctx-&gt;Debug.LogMessages[] seem to leak.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105320">Bug 105320</a> - Storage texel buffer access produces wrong results (RX Vega)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105775">Bug 105775</a> - SI reaches the maximum IB size in dwords and fail to submit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105994">Bug 105994</a> - surface state leak when creating and destroying image views with aspectMask depth and stencil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106074">Bug 106074</a> - radv: si_scissor_from_viewport returns incorrect result when using half-pixel viewport offset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106126">Bug 106126</a> - eglMakeCurrent does not always ensure dri_drawable-&gt;update_drawable_info has been called for a new EGLSurface if another has been created and destroyed first</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>ac/nir: Make the GFX9 buffer size fix apply to image loads/atomics too.</li>
<li>radv: Mark GTT memory as device local for APUs.</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>bin/install_megadrivers: fix DESTDIR and -D*-path</li>
<li>meson: don't build classic mesa tests without dri_drivers</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>intel/compiler: Add scheduler deps for instructions that implicitly read g0</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*</li>
</ul>
<p>Johan Klokkhammer Helsing (1):</p>
<ul>
<li>st/dri: Fix dangling pointer to a destroyed dri_drawable</li>
</ul>
<p>Juan A. Suarez Romero (4):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.1</li>
<li>travis: radv needs LLVM 4.0</li>
<li>cherry-ignore: add explicit 18.1 only nominations</li>
<li>Update version to 18.0.2</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fix shadow batches to be the same size as the real BO.</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix number of planes for depth &amp; stencil</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: fix texture_format_needs_swiz</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>radeonsi/gfx9: fix a hang with an empty first IB</li>
<li>glsl_to_tgsi: try harder to lower unsupported ir_binop_vector_extract</li>
<li>Revert "st/dri: Fix dangling pointer to a destroyed dri_drawable"</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: fix scissor computation when using half-pixel viewport offset</li>
<li>radv/winsys: allow to submit up to 4 IBs for chips without chaining</li>
</ul>
<p>Thomas Hellstrom (1):</p>
<ul>
<li>svga: Fix incorrect advertizing of EGL_KHR_gl_colorspace</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: free debug messages when destroying the debug state</li>
</ul>
</div>
</body>
</html>

107
docs/relnotes/18.0.3.html Normal file
View File

@@ -0,0 +1,107 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.0.3 Release Notes / May 7, 2018</h1>
<p>
Mesa 18.0.3 is a bug fix release which fixes bugs found since the 18.0.2 release.
</p>
<p>
Mesa 18.0.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
58cc5c5b1ab2a44e6e47f18ef6c29836ad06f95450adce635ce3c317507a171b mesa-18.0.3.tar.gz
099d9667327a76a61741a533f95067d76ea71a656e66b91507b3c0caf1d49e30 mesa-18.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105374">Bug 105374</a> - texture3d, a SaschaWillems demo, assert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106147">Bug 106147</a> - SIGBUS in write_reloc() when Sacha Willems' &quot;texture3d&quot; Vulkan demo starts</li>
</ul>
<h2>Changes</h2>
<p>Andres Rodriguez (1):</p>
<ul>
<li>radv/winsys: fix leaking resources from bo's imported by fd</li>
</ul>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/vcn: fix mpeg4 msg buffer settings</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>gallium/util: Fix incorrect refcounting of separate stencil.</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>anv/allocator: Don't shrink either end of the block pool</li>
</ul>
<p>Juan A. Suarez Romero (3):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.2</li>
<li>cherry-ignore: add explicit 18.1 only nominations</li>
<li>Update version to 18.0.3</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>st/omx/enc: fix blit setup for YUV LoadImage</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>util/u_queue: fix a deadlock in util_queue_finish</li>
<li>radeonsi/gfx9: workaround for INTERP with indirect indexing</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>i965/tex_image: Avoid the ASTC LDR workaround on gen9lp</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: compute the number of subpass attachments correctly</li>
</ul>
</div>
</body>
</html>

157
docs/relnotes/18.0.4.html Normal file
View File

@@ -0,0 +1,157 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.0.4 Release Notes / May 17, 2018</h1>
<p>
Mesa 18.0.4 is a bug fix release which fixes bugs found since the 18.0.3 release.
</p>
<p>
Mesa 18.0.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
d1dc3469faccdd73439479426952d71a9e8f684e8d03b6687063c12b13430801 mesa-18.0.4.tar.gz
1f3bcfe7cef0a5c20dae2b41df5d7e0a985e06be0183fa4d43b6068fcba2920f mesa-18.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91808">Bug 91808</a> - trine1 misrender r600g</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100430">Bug 100430</a> - [radv] graphical glitches on dolphin emulator</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106243">Bug 106243</a> - [kbl] GPU HANG: 9:0:0x85dffffb, in Cinnamon</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106480">Bug 106480</a> - A2B10G10R10_SNORM vertex attribute doesn't work.</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Translate logic ops.</li>
<li>radv: Fix up 2_10_10_10 alpha sign.</li>
<li>radv: Disable texel buffers with A2 SNORM/SSCALED/SINT for pre-vega.</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>r600: fix constant buffer bounds.</li>
<li>radv: resolve all layers in compute resolve path.</li>
<li>radv: use compute path for multi-layer images.</li>
</ul>
<p>Deepak Rawat (1):</p>
<ul>
<li>egl/x11: Send invalidate to driver on copy_region path in swap_buffer</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>mesa: Add missing support for glFogiv(GL_FOG_DISTANCE_MODE_NV)</li>
</ul>
<p>Jan Vesely (8):</p>
<ul>
<li>clover: Add explicit virtual destructor to argument class</li>
<li>eg/compute: Drop reference on code_bo in destructor.</li>
<li>r600: Cleanup constant buffers on context destruction</li>
<li>eg/compute: Drop reference to kernel_param bo in destructor</li>
<li>pipe-loader: Free driver_name in error path</li>
<li>gallium/auxiliary: Add helper function to count the number of entries in hash table</li>
<li>winsys/radeon: Destroy fd_hash table when the last winsys is removed.</li>
<li>winsys/amdgpu: Destroy dev_hash table when the last winsys is removed.</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965,anv: Set the CS stall bit on the ISP disable PIPE_CONTROL</li>
</ul>
<p>Jose Maria Casanova Crespo (2):</p>
<ul>
<li>intel/compiler: fix 16-bit int brw_negate_immediate and brw_abs_immediate</li>
<li>intel/compiler: fix brw_imm_w for negative 16-bit integers</li>
</ul>
<p>Juan A. Suarez Romero (7):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.3</li>
<li>cherry-ignore: add explicit 18.1 only nominations</li>
<li>cherry-ignore: glsl: change ast_type_qualifier bitset size to work around GCC 5.4 bug</li>
<li>cherry-ignore: mesa: fix glGetInteger/Float/etc queries for vertex arrays attribs</li>
<li>cherry-ignore: mesa: revert GL_[SECONDARY_]COLOR_ARRAY_SIZE glGet type to TYPE_INT</li>
<li>cherry-ignore: radv/resolve: do fmask decompress on all layers.</li>
<li>Update version to 18.0.4</li>
</ul>
<p>Kai Wasserbäch (1):</p>
<ul>
<li>opencl: autotools: Fix linking order for OpenCL target</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Don't leak blorp on Gen4-5.</li>
</ul>
<p>Lionel Landwerlin (2):</p>
<ul>
<li>i965: require pixel scoreboard stall prior to ISP disable</li>
<li>anv: emit pixel scoreboard stall before ISP disable</li>
</ul>
<p>Matthew Nicholls (1):</p>
<ul>
<li>radv: fix multisample image copies</li>
</ul>
<p>Neil Roberts (1):</p>
<ul>
<li>spirv: Apply OriginUpperLeft to FragCoord</li>
</ul>
<p>Rhys Perry (1):</p>
<ul>
<li>mesa: fix error handling in get_framebuffer_parameteriv</li>
</ul>
<p>Ross Burton (1):</p>
<ul>
<li>src/intel/Makefile.vulkan.am: add missing MKDIR_GEN</li>
</ul>
</div>
</body>
</html>

162
docs/relnotes/18.0.5.html Normal file
View File

@@ -0,0 +1,162 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.0.5 Release Notes / June 3, 2018</h1>
<p>
Mesa 18.0.5 is a bug fix release which fixes bugs found since the 18.0.4 release.
</p>
<p>
Mesa 18.0.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
ea3e00329cea899b1e32db812fd2f426832be37e4baa2e2fd9288a3480f30531 mesa-18.0.5.tar.gz
5187bba8d72aea78f2062d134ec6079a508e8216062dce9ec9048b5eb2c4fc6b mesa-18.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=78097">Bug 78097</a> - glUniform1ui and friends not supported by display lists</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102390">Bug 102390</a> - centroid interpolation causes broken attribute values</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105351">Bug 105351</a> - [Gen6+] piglit's arb_shader_image_load_store-host-mem-barrier fails with a glGetTexSubImage fallback path</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106090">Bug 106090</a> - Compiling compute shader crashes RADV</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106315">Bug 106315</a> - The witness + dxvk suffers flickering garbage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106465">Bug 106465</a> - No test for Image Load/Store on format-incompatible texture buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106479">Bug 106479</a> - NDEBUG not defined for libamdgpu_addrlib</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106481">Bug 106481</a> - No test for Image Load/Store on texture buffer sized greater than MAX_TEXTURE_BUFFER_SIZE_ARB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106504">Bug 106504</a> - vulkan SPIR-V parsing failed at ../src/compiler/spirv/vtn_cfg.c:381</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106587">Bug 106587</a> - Dota2 is very dark when using vulkan render on a Intel &lt;&lt; AMD prime setup</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106629">Bug 106629</a> - [SNB,IVB,HSW,BDW] dEQP-EGL.functional.image.create.gles2_cubemap_negative_z_rgb_read_pixels</li>
</ul>
<h2>Changes</h2>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965/glk: Add l3 banks count for 2x6 configuration</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>amd/addrlib: Use defines in autotools build.</li>
<li>radv: Fix SRGB compute copies.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>tgsi/scan: add hw atomic to the list of memory accessing files</li>
</ul>
<p>Francisco Jerez (4):</p>
<ul>
<li>Revert "mesa: simplify _mesa_is_image_unit_valid for buffers"</li>
<li>i965: Move buffer texture size calculation into a common helper function.</li>
<li>i965: Handle non-zero texture buffer offsets in buffer object range calculation.</li>
<li>i965: Use intel_bufferobj_buffer() wrapper in image surface state setup.</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>eg/compute: Use reference counting to handle compute memory pool.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>intel/eu: Set EXECUTE_1 when setting the rounding mode in cr0</li>
<li>intel/blorp: Support blits and clears on surfaces with offsets</li>
</ul>
<p>Jose Dapena Paz (1):</p>
<ul>
<li>mesa: do not leak ctx-&gt;Shader.ReferencedProgram references</li>
</ul>
<p>Juan A. Suarez Romero (8):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.4</li>
<li>cherry-ignore: i965/miptree: Fix handling of uninitialized MCS buffers</li>
<li>cherry-ignore: add explicit 18.1 only nominations</li>
<li>cherry-ignore: mesa/st: handle vert_attrib_mask in nir case too</li>
<li>cherry-ignore: Tegra is not supported</li>
<li>cherry-ignore: st/mesa: fix assertion failures with GL_UNSIGNED_INT64_ARB (v2)</li>
<li>cherry-ignore: nv30: ensure that displayable formats are marked accordingly</li>
<li>Update version to 18.0.5</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: simplify lastLevel determination in st_finalize_texture</li>
<li>radeonsi: fix incorrect parentheses around VS-PS varying elimination</li>
<li>mesa: handle GL_UNSIGNED_INT64_ARB properly (v2)</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>dri3: Stricter SBC wraparound handling</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>i965/miptree: Zero-initialize CCS_D buffers</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>spirv: fix visiting inner loops with same break/continue block</li>
<li>radv: fix centroid interpolation</li>
</ul>
<p>Stuart Young (1):</p>
<ul>
<li>etnaviv: Fix missing rnndb file in tarballs</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: add glUniform*ui{v} support to display lists</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.1.0 Release Notes / TBD</h1>
<h1>Mesa 18.1.0 Release Notes / May 18 2018</h1>
<p>
Mesa 18.1.0 is a new development release. People who are concerned
@@ -33,7 +33,8 @@ Compatibility contexts may report a lower version depending on each driver.
<h2>SHA256 checksums</h2>
<pre>
TBD.
b1c1dbb42597190503d3abc518b12de880623f097c6cb6c293ecf69ae87e6fbf mesa-18.1.0.tar.gz
c855c5b67ef993b7621f76d8b120769ec0415f1c3616eaff44ef7f7f300aceba mesa-18.1.0.tar.xz
</pre>
@@ -58,7 +59,201 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2>
<ul>
TBD
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90311">Bug 90311</a> - Fail to build libglx with clang at linking stage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91808">Bug 91808</a> - trine1 misrender r600g</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95009">Bug 95009</a> - [SNB] amd_shader_trinary_minmax.execution.built-in-functions.gs-mid3-ivec2-ivec2-ivec2 intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95012">Bug 95012</a> - [SNB] glsl-1_50.execution.built-in-functions.gs-op tests intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98281">Bug 98281</a> - 'message's in ctx-&gt;Debug.LogMessages[] seem to leak.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99549">Bug 99549</a> - pp: Failed to translate a shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100259">Bug 100259</a> - [EGL] [GBM] undefined reference to `gbm_bo_create_with_modifiers'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101408">Bug 101408</a> - [Gen8+] Xonotic fails to render one of the weapons</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101442">Bug 101442</a> - Piglit shaders&#64;ssa&#64;fs-if-def-else-break fails with sb but passes with R600_DEBUG=nosb</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102342">Bug 102342</a> - mesa-17.1.7/src/gallium/auxiliary/pipebuffer/pb_cache.c:169]: (style) Suspicious condition</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102542">Bug 102542</a> - mesa-17.2.0/src/gallium/state_trackers/nine/nine_ff.c:1938: bad assignment ?</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102905">Bug 102905</a> - [R600] Miscompilation of TGSI to VLIW causes artifacts in Gallium Nine with Crysis2 bump mapping</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103006">Bug 103006</a> - [OpenGL CTS] [HSW] KHR-GL45.vertex_attrib_binding.basic-inputL-case1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103142">Bug 103142</a> - R600g+sb: optimizer apparently stuck in an endless loop</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103626">Bug 103626</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=103746">Bug 103746</a> - [BDW BSW SKL KBL] dEQP-GLES31.functional.copy_image regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104302">Bug 104302</a> - Wolfenstein 2 (2017) under wine graphical artifacting on RADV</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104335">Bug 104335</a> - [OpenGL CTS][SKL,KBL] KHR-GL45.vertex_attrib_64bit.limits_test occasionally fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104625">Bug 104625</a> - semicolon after if</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104636">Bug 104636</a> - [BSW/HD400] Aztec Ruins GL version GPU hangs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104642">Bug 104642</a> - Android: NULL pointer dereference with i965 mesa-dev, seems build_id_length related</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104654">Bug 104654</a> - r600/sb: Alien Isolation GPU lock</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104668">Bug 104668</a> - dEQP-GLES31.functional.shaders.linkage.uniform.block.differing_precision regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104717">Bug 104717</a> - Rocket League: grass rendering broken with nir</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104732">Bug 104732</a> - [radv] Binding descriptor sets disturbs other pipeline bindings</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104741">Bug 104741</a> - Graphic corruption for Android apps Telegram and KineMaster</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104762">Bug 104762</a> - Various segfaults/problems in qt/plasma</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104777">Bug 104777</a> - Attaching multiple shader objects for the same stage to a GLSL program triggers a linker error</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104794">Bug 104794</a> - piglit.spec.arb_internalformat_query2.samples and num_sample_counts pname checks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104803">Bug 104803</a> - SIGSEGV in state_tracker/st_glsl_to_tgsi_temprename.cpp</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104863">Bug 104863</a> - 186 assertions in piglit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104884">Bug 104884</a> - memory leak with intel i965 mesa when running android container in Ubuntu</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104905">Bug 104905</a> - SpvOpFOrdEqual doesn't return correct results for NaNs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104908">Bug 104908</a> - Texture Compression Hint not converted to enum16</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104915">Bug 104915</a> - Indexed SHADING_LANGUAGE_VERSION query not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104923">Bug 104923</a> - anv: Dota2 rendering corruption</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=104989">Bug 104989</a> - [r600] [bisected] OpenGL applications can't render anything at all</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105013">Bug 105013</a> - [regression] GLX+VA-API+clutter-gst video playback is corrupt with Mesa 17.3 (but is fine with 17.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105026">Bug 105026</a> - glxgears asserts with pp_jimenezmlaa=1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105029">Bug 105029</a> - simdlib_512_avx512.inl:371:57: error: could not convert _mm512_mask_blend_epi32((__mmask16)(ImmT), a, b) from __m512i {aka __vector(8) long long int} to SIMDImpl::SIMD512Impl::Float</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105052">Bug 105052</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105065">Bug 105065</a> - Qt Programs occasionally fail to render with new Mesa (glGetProgramBinary)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105067">Bug 105067</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105088">Bug 105088</a> - brw_nir_uniforms.cpp:256:10: error: non-constant-expression cannot be narrowed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105098">Bug 105098</a> - [RADV] GPU freeze with simple Vulkan App</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105103">Bug 105103</a> - Wayland master causes Mesa to fail to compile</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105120">Bug 105120</a> - meson build broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105161">Bug 105161</a> - KHR_blend_equation_advanced doesn't work in GLSL 1.10-1.40 shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105183">Bug 105183</a> - Weird assertion in NIR linker</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105211">Bug 105211</a> - build failure after zwp_dmabuf commit if wayland-protocols is not installed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105224">Bug 105224</a> - Webgl Pointclouds flickers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105229">Bug 105229</a> - [KBL SKL BDW HSW] [Regression] KHR-GLES31.core.shader_image_load_store.advanced-sso-simple failures</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105238">Bug 105238</a> - ast.h:648:16: error: union member 'i' has a non-trivial constructor</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105255">Bug 105255</a> - Waiting for fences without waitAll is not implemented</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105262">Bug 105262</a> - [R600] [BISECTED] ttf fonts are invisible in many programs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105271">Bug 105271</a> - WebGL2 shader crashes i965_dri.so 17.3.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105274">Bug 105274</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105290">Bug 105290</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105292">Bug 105292</a> - vkGetQueryPoolResults returns incorrect query status for large query buffers (bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105317">Bug 105317</a> - The GPU Vega 56 was hang while try to pass #GraphicsFuzz shader15 test</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105320">Bug 105320</a> - Storage texel buffer access produces wrong results (RX Vega)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105374">Bug 105374</a> - texture3d, a SaschaWillems demo, assert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105436">Bug 105436</a> - Blinking textures in UT2004 [bisected]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105440">Bug 105440</a> - GEN7: rendering issue on citra</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105442">Bug 105442</a> - Hang when running nine ff lighting shader with radeonsi</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105444">Bug 105444</a> - Enable GL disk shader cache when transform feedback is enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105464">Bug 105464</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105471">Bug 105471</a> - [g33] [bisected] dEQP-GLES2.functional.shaders failures</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105497">Bug 105497</a> - shader-db crashes on 72 core system after ast_type_qualifier bitset change</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105529">Bug 105529</a> - u_debug_stack.c:268: error: #pragma GCC diagnostic not allowed inside functions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105567">Bug 105567</a> - meson/ninja: 1. mesa/vdpau incorrect symlinks in DESTDIR and 2. Ddri-drivers-path Dvdpau-libs-path overrides DESTDIR</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105621">Bug 105621</a> - Build failure on GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105634">Bug 105634</a> - Android build test fails when building brw_oa_metrics.c</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105670">Bug 105670</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105704">Bug 105704</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105717">Bug 105717</a> - [bisected] Mesa build tests fails: BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105737">Bug 105737</a> - st_tests_common.cpp:140:42: error: no matching function for call to 'tgsi_get_opcode_info'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105738">Bug 105738</a> - commit f7ffa504a065dc2631fd38cc5fe885b277f4e7e7 causes artifacting in radv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105740">Bug 105740</a> - glsl_types.cpp(524): error: a dynamically-initialized local static variable is not allowed inside of a statement expression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105775">Bug 105775</a> - SI reaches the maximum IB size in dwords and fail to submit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105807">Bug 105807</a> - [Regression, bisected]: 3D Rendering not working correctly in Warhammer 40k: Dawn of War II</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105817">Bug 105817</a> - scons build broken by glSpecializeShaderARB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105820">Bug 105820</a> - [m32] piglit regressions relinking program without shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105942">Bug 105942</a> - Graphical artefacts after update to mesa 18.0.0-2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105952">Bug 105952</a> - radv causes GPU hang on SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105960">Bug 105960</a> - [bisected] meson build test fails with: undefined reference to `etna_pm_create_query'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=105994">Bug 105994</a> - surface state leak when creating and destroying image views with aspectMask depth and stencil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106074">Bug 106074</a> - radv: si_scissor_from_viewport returns incorrect result when using half-pixel viewport offset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106126">Bug 106126</a> - eglMakeCurrent does not always ensure dri_drawable-&gt;update_drawable_info has been called for a new EGLSurface if another has been created and destroyed first</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106131">Bug 106131</a> - meson/ninja build missing file gtest.h</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106133">Bug 106133</a> - make check &quot;OSError: [Errno 24] Too many open files&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106147">Bug 106147</a> - SIGBUS in write_reloc() when Sacha Willems' &quot;texture3d&quot; Vulkan demo starts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106174">Bug 106174</a> - vulkan dota2 broken (segfaulting), found bug commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106180">Bug 106180</a> - [bisected] radv vulkan smoke test black screen (Add support for DRI3 v1.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106243">Bug 106243</a> - [kbl] GPU HANG: 9:0:0x85dffffb, in Cinnamon</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106450">Bug 106450</a> - </li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=106462">Bug 106462</a> - piglit.spec.arb_vertex_array_bgra.get regression</li>
</ul>
<h2>Changes</h2>

168
docs/relnotes/18.1.1.html Normal file
View File

@@ -0,0 +1,168 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.1.1 Release Notes / June 1 2018</h1>
<p>
Mesa 18.1.1 is a bug fix release which fixes bugs found since the 18.1.0 release.
</p>
<p>
Mesa 18.1.1 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
366a35f7530a016f2a8284fb0ee5759eeb216b4d6fa47f0e96b89ad2e43faf96 mesa-18.1.1.tar.gz
d3312a2ede5aac14a47476b208b8e3a401367838330197c4588ab8ad420d7781 mesa-18.1.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>None<p>
<h2>Changes</h2>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965/glk: Add l3 banks count for 2x6 configuration</li>
</ul>
<p>Bas Nieuwenhuizen (7):</p>
<ul>
<li>radv: Fix multiview queries.</li>
<li>radv: Translate logic ops.</li>
<li>radv: Fix up 2_10_10_10 alpha sign.</li>
<li>radv: Disable texel buffers with A2 SNORM/SSCALED/SINT for pre-vega.</li>
<li>amd/addrlib: Use defines in autotools build.</li>
<li>radv: Fix SRGB compute copies.</li>
<li>radv: Only expose subgroup shuffles on VI+.</li>
</ul>
<p>Christoph Haag (1):</p>
<ul>
<li>radv: fix VK_EXT_descriptor_indexing</li>
</ul>
<p>Dave Airlie (5):</p>
<ul>
<li>radv/resolve: do fmask decompress on all layers.</li>
<li>radv: resolve all layers in compute resolve path.</li>
<li>radv: use compute path for multi-layer images.</li>
<li>virgl: set texture buffer offset alignment to disable ARB_texture_buffer_range.</li>
<li>tgsi/scan: add hw atomic to the list of memory accessing files</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>docs: Add sha sums for release</li>
<li>VERSION: bump to 18.1.1 for next release</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>vulkan: don't free uninitialised memory</li>
</ul>
<p>Francisco Jerez (4):</p>
<ul>
<li>Revert "mesa: simplify _mesa_is_image_unit_valid for buffers"</li>
<li>i965: Move buffer texture size calculation into a common helper function.</li>
<li>i965: Handle non-zero texture buffer offsets in buffer object range calculation.</li>
<li>i965: Use intel_bufferobj_buffer() wrapper in image surface state setup.</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nv30: ensure that displayable formats are marked accordingly</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>eg/compute: Use reference counting to handle compute memory pool.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>intel/eu: Set EXECUTE_1 when setting the rounding mode in cr0</li>
<li>intel/blorp: Support blits and clears on surfaces with offsets</li>
</ul>
<p>Jose Dapena Paz (1):</p>
<ul>
<li>mesa: do not leak ctx-&gt;Shader.ReferencedProgram references</li>
</ul>
<p>Kai Wasserbäch (1):</p>
<ul>
<li>opencl: autotools: Fix linking order for OpenCL target</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: simplify lastLevel determination in st_finalize_texture</li>
<li>radeonsi: fix incorrect parentheses around VS-PS varying elimination</li>
<li>mesa: handle GL_UNSIGNED_INT64_ARB properly (v2)</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>dri3: Stricter SBC wraparound handling</li>
</ul>
<p>Nanley Chery (4):</p>
<ul>
<li>i965: Add and use a getter for the miptree aux buffer</li>
<li>i965: Add and use a single miptree aux_buf field</li>
<li>i965/miptree: Fix handling of uninitialized MCS buffers</li>
<li>i965/miptree: Zero-initialize CCS_D buffers</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>spirv: fix visiting inner loops with same break/continue block</li>
<li>radv: fix centroid interpolation</li>
</ul>
<p>Stuart Young (1):</p>
<ul>
<li>etnaviv: Fix missing rnndb file in tarballs</li>
</ul>
<p>Thierry Reding (3):</p>
<ul>
<li>tegra: Treat resources with modifiers as scanout</li>
<li>tegra: Fix scanout resources without modifiers</li>
<li>tegra: Remove usage of non-stable UAPI</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: add glUniform*ui{v} support to display lists</li>
</ul>
</div>
</body>
</html>

170
docs/relnotes/18.1.2.html Normal file
View File

@@ -0,0 +1,170 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.1.2 Release Notes / June 15 2018</h1>
<p>
Mesa 18.1.2 is a bug fix release which fixes bugs found since the 18.1.1 release.
</p>
<p>
Mesa 18.1.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
a644df23937f4078a2bd9a54349f6315c1955f5e3a4ac272832da51dea4d3c11 mesa-18.1.1.tar.gz
070bf0648ba5b242d7303ceed32aed80842f4c0ba16e5acc1a650a46eadfb1f9 mesa-18.1.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>None<p>
<h2>Changes</h2>
<p>Alex Smith (4):</p>
<ul>
<li>radv: Consolidate GFX9 merged shader lookup logic</li>
<li>radv: Handle GFX9 merged shaders in radv_flush_constants()</li>
<li>radeonsi: Fix crash on shaders using MSAA image load/store</li>
<li>radv: Set active_stages the same whether or not shaders were cached</li>
</ul>
<p>Andrew Galante (2):</p>
<ul>
<li>meson: Test for __atomic_add_fetch in atomic checks</li>
<li>configure.ac: Test for __atomic_add_fetch in atomic checks</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.</li>
</ul>
<p>Cameron Kumar (1):</p>
<ul>
<li>vulkan/wsi: Destroy swapchain images after terminating FIFO queues</li>
</ul>
<p>Dylan Baker (6):</p>
<ul>
<li>docs/relnotes: Add sha256 sums for mesa 18.1.1</li>
<li>cherry-ignore: add commits not to pull</li>
<li>cherry-ignore: Add patches from Jason that he rebased on 18.1</li>
<li>meson: work around gentoo applying -m32 to host compiler in cross builds</li>
<li>cherry-ignore: Add another patch</li>
<li>version: bump version for 18.1.2 release</li>
</ul>
<p>Eric Engestrom (3):</p>
<ul>
<li>autotools: add missing android file to package</li>
<li>configure: radv depends on mako</li>
<li>i965: fix resource leak</li>
</ul>
<p>Jason Ekstrand (10):</p>
<ul>
<li>intel/eu: Add some brw_get_default_ helpers</li>
<li>intel/eu: Copy fields manually in brw_next_insn</li>
<li>intel/eu: Set flag [sub]register number differently for 3src</li>
<li>intel/blorp: Don't vertex fetch directly from clear values</li>
<li>intel/isl: Add bounds-checking assertions in isl_format_get_layout</li>
<li>intel/isl: Add bounds-checking assertions for the format_info table</li>
<li>i965/screen: Refactor query_dma_buf_formats</li>
<li>i965/screen: Use RGBA non-sRGB formats for images</li>
<li>anv: Set fence/semaphore types to NONE in impl_cleanup</li>
<li>i965/screen: Return false for unsupported formats in query_modifiers</li>
</ul>
<p>Jordan Justen (1):</p>
<ul>
<li>mesa/program_binary: add implicit UseProgram after successful ProgramBinary</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>glsl: Add ir_binop_vector_extract in NIR</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Fix batch-last mode to properly swap BOs.</li>
<li>anv: Disable __gen_validate_value if NDEBUG is set.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>r300g/swtcl: make pipe_context uploaders use malloc'd memory as before</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>meson: Fix -latomic check</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>glx: Fix number of property values to read in glXImportContextEXT</li>
</ul>
<p>Nicolas Boichat (1):</p>
<ul>
<li>configure.ac/meson.build: Fix -latomic test</li>
</ul>
<p>Philip Rebohle (1):</p>
<ul>
<li>radv: Use correct color format for fast clears</li>
</ul>
<p>Samuel Pitoiset (3):</p>
<ul>
<li>radv: fix a GPU hang when MRTs are sparse</li>
<li>radv: fix missing ZRANGE_PRECISION(1) for GFX9+</li>
<li>radv: add a workaround for DXVK hangs by setting amdgpu-skip-threshold</li>
</ul>
<p>Scott D Phillips (1):</p>
<ul>
<li>intel/tools: add intel_sanitize_gpu to EXTRA_DIST</li>
</ul>
<p>Thomas Petazzoni (1):</p>
<ul>
<li>configure.ac: rework -latomic check</li>
</ul>
<p>Timothy Arceri (2):</p>
<ul>
<li>ac: fix possible truncation of intrinsic name</li>
<li>radeonsi: fix possible truncation on renderer string</li>
</ul>
</div>
</body>
</html>

72
docs/relnotes/18.2.0.html Normal file
View File

@@ -0,0 +1,72 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.2.0 Release Notes / TBD</h1>
<p>
Mesa 18.2.0 is a new development release. People who are concerned
with stability and reliability should stick with a previous release or
wait for Mesa 18.2.1.
</p>
<p>
Mesa 18.2.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<p>
libwayland-egl is now distributed by Wayland (since 1.15,
<a href="https://lists.freedesktop.org/archives/wayland-devel/2018-April/037767.html">see announcement</a>),
and has been removed from Mesa in this release. Make sure you're using
an up-to-date version of Wayland to keep the functionality.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_fragment_shader_interlock on i965</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li>GL_ARB_sample_locations and GL_NV_sample_locations on nvc0 (GM200+)</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Removed GL_EXT_polygon_offset applications should use glPolygonOffset instead.</li>
<li>Removed libwayland-egl, now part of Wayland</li>
</ul>
</div>
</body>
</html>

View File

@@ -122,9 +122,9 @@ Please use common sense and do <strong>not</strong> blindly add everyone.
<pre>
$ scripts/get_reviewer.pl --help # to get the help screen
$ scripts/get_reviewer.pl -f src/egl/drivers/dri2/platform_android.c
Rob Herring <robh@kernel.org> (reviewer:ANDROID EGL SUPPORT,added_lines:188/700=27%,removed_lines:58/283=20%)
Tomasz Figa <tfiga@chromium.org> (reviewer:ANDROID EGL SUPPORT,authored:12/41=29%,added_lines:308/700=44%,removed_lines:115/283=41%)
Emil Velikov <emil.l.velikov@gmail.com> (authored:13/41=32%,removed_lines:76/283=27%)
Rob Herring &lt;robh@kernel.org&gt; (reviewer:ANDROID EGL SUPPORT,added_lines:188/700=27%,removed_lines:58/283=20%)
Tomasz Figa &lt;tfiga@chromium.org&gt; (reviewer:ANDROID EGL SUPPORT,authored:12/41=29%,added_lines:308/700=44%,removed_lines:115/283=41%)
Emil Velikov &lt;emil.l.velikov@gmail.com&gt; (authored:13/41=32%,removed_lines:76/283=27%)
</pre>
</ul>

View File

@@ -31,7 +31,7 @@
<dd>is a very useful tool for tracking down
memory-related problems in your code.</dd>
<dt><a href="https://scan.coverity.com/projects/mesa">Coverity</a><dt>
<dt><a href="https://scan.coverity.com/projects/mesa">Coverity</a></dt>
<dd>provides static code analysis of Mesa. If you create an account
you can see the results and try to fix outstanding issues.</dd>
</dl>

View File

@@ -18,8 +18,8 @@
<p>
This page lists known issues with
<a href="https://www.spec.org/gwpg/gpc.static/vp11info.html" target="_main">SPEC Viewperf 11</a>
and <a href="https://www.spec.org/gwpg/gpc.static/vp12info.html" target="_main">SPEC Viewperf 12</a>
<a href="https://www.spec.org/gwpg/gpc.static/vp11info.html">SPEC Viewperf 11</a>
and <a href="https://www.spec.org/gwpg/gpc.static/vp12info.html">SPEC Viewperf 12</a>
when running on Mesa-based drivers.
</p>
@@ -66,13 +66,10 @@ either in Viewperf or the Mesa driver.
<p>
These tests use features of the
<a href="https://www.opengl.org/registry/specs/NV/fragment_program2.txt"
target="_main">
GL_NV_fragment_program2</a> and
<a href="https://www.opengl.org/registry/specs/NV/vertex_program3.txt"
target="_main">
GL_NV_vertex_program3</a> extensions without checking if the driver supports
them.
<a href="https://www.opengl.org/registry/specs/NV/fragment_program2.txt">GL_NV_fragment_program2</a>
and
<a href="https://www.opengl.org/registry/specs/NV/vertex_program3.txt">GL_NV_vertex_program3</a>
extensions without checking if the driver supports them.
</p>
<p>
When Mesa tries to compile the vertex/fragment programs it generates errors
@@ -86,8 +83,8 @@ Subsequent drawing calls become no-ops and the rendering is incorrect.
<p>
These tests depend on the
<a href="https://www.opengl.org/registry/specs/NV/primitive_restart.txt"
target="_main">GL_NV_primitive_restart</a> extension.
<a href="https://www.opengl.org/registry/specs/NV/primitive_restart.txt">GL_NV_primitive_restart</a>
extension.
</p>
<p>
@@ -124,7 +121,7 @@ never specified.
<p>
A trace captured with
<a href="https://github.com/apitrace/apitrace" target="_main">API trace</a>
<a href="https://github.com/apitrace/apitrace">API trace</a>
shows this sequences of calls like this:
<pre>

View File

@@ -48,6 +48,7 @@ typedef unsigned int drm_drawable_t;
typedef struct drm_clip_rect drm_clip_rect_t;
#endif
#include <stdbool.h>
#include <stdint.h>
/**
@@ -589,7 +590,7 @@ struct __DRIdamageExtensionRec {
* SWRast Loader extension.
*/
#define __DRI_SWRAST_LOADER "DRI_SWRastLoader"
#define __DRI_SWRAST_LOADER_VERSION 3
#define __DRI_SWRAST_LOADER_VERSION 4
struct __DRIswrastLoaderExtensionRec {
__DRIextension base;
@@ -631,6 +632,24 @@ struct __DRIswrastLoaderExtensionRec {
void (*getImage2)(__DRIdrawable *readable,
int x, int y, int width, int height, int stride,
char *data, void *loaderPrivate);
/**
* Put shm image to drawable
*
* \since 4
*/
void (*putImageShm)(__DRIdrawable *drawable, int op,
int x, int y, int width, int height, int stride,
int shmid, char *shmaddr, unsigned offset,
void *loaderPrivate);
/**
* Get shm image from readable
*
* \since 4
*/
void (*getImageShm)(__DRIdrawable *readable,
int x, int y, int width, int height,
int shmid, void *loaderPrivate);
};
/**
@@ -728,7 +747,8 @@ struct __DRIuseInvalidateExtensionRec {
#define __DRI_ATTRIB_BIND_TO_TEXTURE_TARGETS 46
#define __DRI_ATTRIB_YINVERTED 47
#define __DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE 48
#define __DRI_ATTRIB_MAX (__DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE + 1)
#define __DRI_ATTRIB_MUTABLE_RENDER_BUFFER 49 /* EGL_MUTABLE_RENDER_BUFFER_BIT_KHR */
#define __DRI_ATTRIB_MAX 50
/* __DRI_ATTRIB_RENDER_TYPE */
#define __DRI_ATTRIB_RGBA_BIT 0x01
@@ -1253,6 +1273,7 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FORMAT_YUYV 0x100f
#define __DRI_IMAGE_FORMAT_XBGR2101010 0x1010
#define __DRI_IMAGE_FORMAT_ABGR2101010 0x1011
#define __DRI_IMAGE_FORMAT_SABGR8 0x1012
#define __DRI_IMAGE_USE_SHARE 0x0001
#define __DRI_IMAGE_USE_SCANOUT 0x0002
@@ -1289,6 +1310,7 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FOURCC_ABGR8888 0x34324241
#define __DRI_IMAGE_FOURCC_XBGR8888 0x34324258
#define __DRI_IMAGE_FOURCC_SARGB8888 0x83324258
#define __DRI_IMAGE_FOURCC_SABGR8888 0x84324258
#define __DRI_IMAGE_FOURCC_ARGB2101010 0x30335241
#define __DRI_IMAGE_FOURCC_XRGB2101010 0x30335258
#define __DRI_IMAGE_FOURCC_ABGR2101010 0x30334241
@@ -1868,9 +1890,57 @@ struct __DRI2rendererQueryExtensionRec {
* Image Loader extension. Drivers use this to allocate color buffers
*/
/**
* See __DRIimageLoaderExtensionRec::getBuffers::buffer_mask.
*/
enum __DRIimageBufferMask {
__DRI_IMAGE_BUFFER_BACK = (1 << 0),
__DRI_IMAGE_BUFFER_FRONT = (1 << 1)
__DRI_IMAGE_BUFFER_FRONT = (1 << 1),
/**
* A buffer shared between application and compositor. The buffer may be
* simultaneously accessed by each.
*
* A shared buffer is equivalent to an EGLSurface whose EGLConfig contains
* EGL_MUTABLE_RENDER_BUFFER_BIT_KHR and whose active EGL_RENDER_BUFFER (as
* opposed to any pending, requested change to EGL_RENDER_BUFFER) is
* EGL_SINGLE_BUFFER.
*
* If buffer_mask contains __DRI_IMAGE_BUFFER_SHARED, then must contains no
* other bits. As a corollary, a __DRIdrawable that has a "shared" buffer
* has no front nor back buffer.
*
* The loader returns __DRI_IMAGE_BUFFER_SHARED in buffer_mask if and only
* if:
* - The loader supports __DRI_MUTABLE_RENDER_BUFFER_LOADER.
* - The driver supports __DRI_MUTABLE_RENDER_BUFFER_DRIVER.
* - The EGLConfig of the drawable EGLSurface contains
* EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.
* - The EGLContext's EGL_RENDER_BUFFER is EGL_SINGLE_BUFFER.
* Equivalently, the EGLSurface's active EGL_RENDER_BUFFER (as
* opposed to any pending, requested change to EGL_RENDER_BUFFER) is
* EGL_SINGLE_BUFFER. (See the EGL 1.5 and
* EGL_KHR_mutable_render_buffer spec for details about "pending" vs
* "active" EGL_RENDER_BUFFER state).
*
* A shared buffer is similar to a front buffer in that all rendering to the
* buffer should appear promptly on the screen. It is different from
* a front buffer in that its behavior is independent from the
* GL_DRAW_BUFFER state. Specifically, if GL_DRAW_FRAMEBUFFER is 0 and the
* __DRIdrawable's buffer_mask is __DRI_IMAGE_BUFFER_SHARED, then all
* rendering should appear promptly on the screen if GL_DRAW_BUFFER is not
* GL_NONE.
*
* The difference between a shared buffer and a front buffer is motivated
* by the constraints of Android and OpenGL ES. OpenGL ES does not support
* front-buffer rendering. Android's SurfaceFlinger protocol provides the
* EGL driver only a back buffer and no front buffer. The shared buffer
* mode introduced by EGL_KHR_mutable_render_buffer is a backdoor though
* EGL that allows Android OpenGL ES applications to render to what is
* effectively the front buffer, a backdoor that required no change to the
* OpenGL ES API and little change to the SurfaceFlinger API.
*/
__DRI_IMAGE_BUFFER_SHARED = (1 << 2),
};
struct __DRIimageList {
@@ -1895,7 +1965,8 @@ struct __DRIimageLoaderExtensionRec {
* \param stamp Address of variable to be updated when
* getBuffers must be called again
* \param loaderPrivate The loaderPrivate for driDrawable
* \param buffer_mask Set of buffers to allocate
* \param buffer_mask Set of buffers to allocate. A bitmask of
* __DRIimageBufferMask.
* \param buffers Returned buffers
*/
int (*getBuffers)(__DRIdrawable *driDrawable,
@@ -2009,4 +2080,85 @@ struct __DRIbackgroundCallableExtensionRec {
GLboolean (*isThreadSafe)(void *loaderPrivate);
};
/**
* The driver portion of EGL_KHR_mutable_render_buffer.
*
* If the driver creates a __DRIconfig with
* __DRI_ATTRIB_MUTABLE_RENDER_BUFFER, then it must support this extension.
*
* To support this extension:
*
* - The driver should create at least one __DRIconfig with
* __DRI_ATTRIB_MUTABLE_RENDER_BUFFER. This is strongly recommended but
* not required.
*
* - The driver must be able to handle __DRI_IMAGE_BUFFER_SHARED if
* returned by __DRIimageLoaderExtension:getBuffers().
*
* - When rendering to __DRI_IMAGE_BUFFER_SHARED, it must call
* __DRImutableRenderBufferLoaderExtension::displaySharedBuffer() in
* response to glFlush and glFinish. (This requirement is not documented
* in EGL_KHR_mutable_render_buffer, but is a de-facto requirement in the
* Android ecosystem. Android applications expect that glFlush will
* immediately display the buffer when in shared buffer mode, and Android
* drivers comply with this expectation). It :may: call
* displaySharedBuffer() more often than required.
*
* - When rendering to __DRI_IMAGE_BUFFER_SHARED, it must ensure that the
* buffer is always in a format compatible for display because the
* display engine (usually SurfaceFlinger or hwcomposer) may display the
* image at any time, even concurrently with 3D rendering. For example,
* display hardware and the GL hardware may be able to access the buffer
* simultaneously. In particular, if the buffer is compressed then take
* care that SurfaceFlinger and hwcomposer can consume the compression
* format.
*
* \see __DRI_IMAGE_BUFFER_SHARED
* \see __DRI_ATTRIB_MUTABLE_RENDER_BUFFER
* \see __DRI_MUTABLE_RENDER_BUFFER_LOADER
*/
#define __DRI_MUTABLE_RENDER_BUFFER_DRIVER "DRI_MutableRenderBufferDriver"
#define __DRI_MUTABLE_RENDER_BUFFER_DRIVER_VERSION 1
typedef struct __DRImutableRenderBufferDriverExtensionRec __DRImutableRenderBufferDriverExtension;
struct __DRImutableRenderBufferDriverExtensionRec {
__DRIextension base;
};
/**
* The loader portion of EGL_KHR_mutable_render_buffer.
*
* Requires loader extension DRI_IMAGE_LOADER, through which the loader sends
* __DRI_IMAGE_BUFFER_SHARED to the driver.
*
* \see __DRI_MUTABLE_RENDER_BUFFER_DRIVER
*/
#define __DRI_MUTABLE_RENDER_BUFFER_LOADER "DRI_MutableRenderBufferLoader"
#define __DRI_MUTABLE_RENDER_BUFFER_LOADER_VERSION 1
typedef struct __DRImutableRenderBufferLoaderExtensionRec __DRImutableRenderBufferLoaderExtension;
struct __DRImutableRenderBufferLoaderExtensionRec {
__DRIextension base;
/**
* Inform the display engine (that is, SurfaceFlinger and/or hwcomposer)
* that the __DRIdrawable has new content.
*
* The display engine may ignore this call, for example, if it continually
* refreshes and displays the buffer on every frame, as in
* EGL_ANDROID_front_buffer_auto_refresh. On the other extreme, the display
* engine may refresh and display the buffer only in frames in which the
* driver calls this.
*
* If the fence_fd is not -1, then the display engine will display the
* buffer only after the fence signals.
*
* The drawable's current __DRIimageBufferMask, as returned by
* __DRIimageLoaderExtension::getBuffers(), must be
* __DRI_IMAGE_BUFFER_SHARED.
*/
void (*displaySharedBuffer)(__DRIdrawable *drawable, int fence_fd,
void *loaderPrivate);
};
#endif

View File

@@ -1,5 +1,5 @@
/*
* Copyright © 2014-2017 Broadcom
* Copyright © 2014-2018 Broadcom
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
@@ -21,8 +21,8 @@
* IN THE SOFTWARE.
*/
#ifndef _VC5_DRM_H_
#define _VC5_DRM_H_
#ifndef _V3D_DRM_H_
#define _V3D_DRM_H_
#include "drm.h"
@@ -30,28 +30,28 @@
extern "C" {
#endif
#define DRM_VC5_SUBMIT_CL 0x00
#define DRM_VC5_WAIT_BO 0x01
#define DRM_VC5_CREATE_BO 0x02
#define DRM_VC5_MMAP_BO 0x03
#define DRM_VC5_GET_PARAM 0x04
#define DRM_VC5_GET_BO_OFFSET 0x05
#define DRM_V3D_SUBMIT_CL 0x00
#define DRM_V3D_WAIT_BO 0x01
#define DRM_V3D_CREATE_BO 0x02
#define DRM_V3D_MMAP_BO 0x03
#define DRM_V3D_GET_PARAM 0x04
#define DRM_V3D_GET_BO_OFFSET 0x05
#define DRM_IOCTL_VC5_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_SUBMIT_CL, struct drm_vc5_submit_cl)
#define DRM_IOCTL_VC5_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_WAIT_BO, struct drm_vc5_wait_bo)
#define DRM_IOCTL_VC5_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_CREATE_BO, struct drm_vc5_create_bo)
#define DRM_IOCTL_VC5_MMAP_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_MMAP_BO, struct drm_vc5_mmap_bo)
#define DRM_IOCTL_VC5_GET_PARAM DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_GET_PARAM, struct drm_vc5_get_param)
#define DRM_IOCTL_VC5_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_VC5_GET_BO_OFFSET, struct drm_vc5_get_bo_offset)
#define DRM_IOCTL_V3D_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
#define DRM_IOCTL_V3D_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
#define DRM_IOCTL_V3D_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_CREATE_BO, struct drm_v3d_create_bo)
#define DRM_IOCTL_V3D_MMAP_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
#define DRM_IOCTL_V3D_GET_PARAM DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
#define DRM_IOCTL_V3D_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
/**
* struct drm_vc5_submit_cl - ioctl argument for submitting commands to the 3D
* struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
* engine.
*
* This asks the kernel to have the GPU execute an optional binner
* command list, and a render command list.
*/
struct drm_vc5_submit_cl {
struct drm_v3d_submit_cl {
/* Pointer to the binner command list.
*
* This is the first set of commands executed, which runs the
@@ -101,29 +101,32 @@ struct drm_vc5_submit_cl {
/* Number of BO handles passed in (size is that times 4). */
__u32 bo_handle_count;
/* Pad, must be zero-filled. */
__u32 pad;
};
/**
* struct drm_vc5_wait_bo - ioctl argument for waiting for
* completion of the last DRM_VC5_SUBMIT_CL on a BO.
* struct drm_v3d_wait_bo - ioctl argument for waiting for
* completion of the last DRM_V3D_SUBMIT_CL on a BO.
*
* This is useful for cases where multiple processes might be
* rendering to a BO and you want to wait for all rendering to be
* completed.
*/
struct drm_vc5_wait_bo {
struct drm_v3d_wait_bo {
__u32 handle;
__u32 pad;
__u64 timeout_ns;
};
/**
* struct drm_vc5_create_bo - ioctl argument for creating VC5 BOs.
* struct drm_v3d_create_bo - ioctl argument for creating V3D BOs.
*
* There are currently no values for the flags argument, but it may be
* used in a future extension.
*/
struct drm_vc5_create_bo {
struct drm_v3d_create_bo {
__u32 size;
__u32 flags;
/** Returned GEM handle for the BO. */
@@ -140,7 +143,7 @@ struct drm_vc5_create_bo {
};
/**
* struct drm_vc5_mmap_bo - ioctl argument for mapping VC5 BOs.
* struct drm_v3d_mmap_bo - ioctl argument for mapping V3D BOs.
*
* This doesn't actually perform an mmap. Instead, it returns the
* offset you need to use in an mmap on the DRM device node. This
@@ -150,7 +153,7 @@ struct drm_vc5_create_bo {
* There are currently no values for the flags argument, but it may be
* used in a future extension.
*/
struct drm_vc5_mmap_bo {
struct drm_v3d_mmap_bo {
/** Handle for the object being mapped. */
__u32 handle;
__u32 flags;
@@ -158,17 +161,17 @@ struct drm_vc5_mmap_bo {
__u64 offset;
};
enum drm_vc5_param {
DRM_VC5_PARAM_V3D_UIFCFG,
DRM_VC5_PARAM_V3D_HUB_IDENT1,
DRM_VC5_PARAM_V3D_HUB_IDENT2,
DRM_VC5_PARAM_V3D_HUB_IDENT3,
DRM_VC5_PARAM_V3D_CORE0_IDENT0,
DRM_VC5_PARAM_V3D_CORE0_IDENT1,
DRM_VC5_PARAM_V3D_CORE0_IDENT2,
enum drm_v3d_param {
DRM_V3D_PARAM_V3D_UIFCFG,
DRM_V3D_PARAM_V3D_HUB_IDENT1,
DRM_V3D_PARAM_V3D_HUB_IDENT2,
DRM_V3D_PARAM_V3D_HUB_IDENT3,
DRM_V3D_PARAM_V3D_CORE0_IDENT0,
DRM_V3D_PARAM_V3D_CORE0_IDENT1,
DRM_V3D_PARAM_V3D_CORE0_IDENT2,
};
struct drm_vc5_get_param {
struct drm_v3d_get_param {
__u32 param;
__u32 pad;
__u64 value;
@@ -176,10 +179,10 @@ struct drm_vc5_get_param {
/**
* Returns the offset for the BO in the V3D address space for this DRM fd.
* This is the same value returned by drm_vc5_create_bo, if that was called
* This is the same value returned by drm_v3d_create_bo, if that was called
* from this DRM fd.
*/
struct drm_vc5_get_bo_offset {
struct drm_v3d_get_bo_offset {
__u32 handle;
__u32 offset;
};
@@ -188,4 +191,4 @@ struct drm_vc5_get_bo_offset {
}
#endif
#endif /* _VC5_DRM_H_ */
#endif /* _V3D_DRM_H_ */

View File

@@ -183,10 +183,17 @@ struct drm_vc4_submit_cl {
/* ID of the perfmon to attach to this job. 0 means no perfmon. */
__u32 perfmonid;
/* Unused field to align this struct on 64 bits. Must be set to 0.
* If one ever needs to add an u32 field to this struct, this field
* can be used.
/* Syncobj handle to wait on. If set, processing of this render job
* will not start until the syncobj is signaled. 0 means ignore.
*/
__u32 in_sync;
/* Syncobj handle to export fence to. If set, the fence in the syncobj
* will be replaced with a fence that signals upon completion of this
* render job. 0 means ignore.
*/
__u32 out_sync;
__u32 pad2;
};

View File

@@ -156,6 +156,7 @@ CHIPSET(0x5912, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
CHIPSET(0x5916, kbl_gt2, "Intel(R) HD Graphics 620 (Kaby Lake GT2)")
CHIPSET(0x591A, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
CHIPSET(0x591B, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
CHIPSET(0x591C, kbl_gt2, "Intel(R) Kaby Lake GT2")
CHIPSET(0x591D, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
CHIPSET(0x591E, kbl_gt2, "Intel(R) HD Graphics 615 (Kaby Lake GT2)")
CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")

View File

@@ -24,13 +24,34 @@
#define VKICD_H
#include "vulkan.h"
#include <stdbool.h>
/*
* Loader-ICD version negotiation API
*/
#define CURRENT_LOADER_ICD_INTERFACE_VERSION 3
// Loader-ICD version negotiation API. Versions add the following features:
// Version 0 - Initial. Doesn't support vk_icdGetInstanceProcAddr
// or vk_icdNegotiateLoaderICDInterfaceVersion.
// Version 1 - Add support for vk_icdGetInstanceProcAddr.
// Version 2 - Add Loader/ICD Interface version negotiation
// via vk_icdNegotiateLoaderICDInterfaceVersion.
// Version 3 - Add ICD creation/destruction of KHR_surface objects.
// Version 4 - Add unknown physical device extension qyering via
// vk_icdGetPhysicalDeviceProcAddr.
// Version 5 - Tells ICDs that the loader is now paying attention to the
// application version of Vulkan passed into the ApplicationInfo
// structure during vkCreateInstance. This will tell the ICD
// that if the loader is older, it should automatically fail a
// call for any API version > 1.0. Otherwise, the loader will
// manually determine if it can support the expected version.
#define CURRENT_LOADER_ICD_INTERFACE_VERSION 5
#define MIN_SUPPORTED_LOADER_ICD_INTERFACE_VERSION 0
typedef VkResult (VKAPI_PTR *PFN_vkNegotiateLoaderICDInterfaceVersion)(uint32_t *pVersion);
#define MIN_PHYS_DEV_EXTENSION_ICD_INTERFACE_VERSION 4
typedef VkResult(VKAPI_PTR *PFN_vkNegotiateLoaderICDInterfaceVersion)(uint32_t *pVersion);
// This is defined in vk_layer.h which will be found by the loader, but if an ICD is building against this
// file directly, it won't be found.
#ifndef PFN_GetPhysicalDeviceProcAddr
typedef PFN_vkVoidFunction(VKAPI_PTR *PFN_GetPhysicalDeviceProcAddr)(VkInstance instance, const char *pName);
#endif
/*
* The ICD must reserve space for a pointer for the loader's dispatch
* table, at the start of <each object>.
@@ -64,6 +85,9 @@ typedef enum {
VK_ICD_WSI_PLATFORM_WIN32,
VK_ICD_WSI_PLATFORM_XCB,
VK_ICD_WSI_PLATFORM_XLIB,
VK_ICD_WSI_PLATFORM_ANDROID,
VK_ICD_WSI_PLATFORM_MACOS,
VK_ICD_WSI_PLATFORM_IOS,
VK_ICD_WSI_PLATFORM_DISPLAY
} VkIcdWsiPlatform;
@@ -77,7 +101,7 @@ typedef struct {
MirConnection *connection;
MirSurface *mirSurface;
} VkIcdSurfaceMir;
#endif // VK_USE_PLATFORM_MIR_KHR
#endif // VK_USE_PLATFORM_MIR_KHR
#ifdef VK_USE_PLATFORM_WAYLAND_KHR
typedef struct {
@@ -85,7 +109,7 @@ typedef struct {
struct wl_display *display;
struct wl_surface *surface;
} VkIcdSurfaceWayland;
#endif // VK_USE_PLATFORM_WAYLAND_KHR
#endif // VK_USE_PLATFORM_WAYLAND_KHR
#ifdef VK_USE_PLATFORM_WIN32_KHR
typedef struct {
@@ -93,7 +117,7 @@ typedef struct {
HINSTANCE hinstance;
HWND hwnd;
} VkIcdSurfaceWin32;
#endif // VK_USE_PLATFORM_WIN32_KHR
#endif // VK_USE_PLATFORM_WIN32_KHR
#ifdef VK_USE_PLATFORM_XCB_KHR
typedef struct {
@@ -101,7 +125,7 @@ typedef struct {
xcb_connection_t *connection;
xcb_window_t window;
} VkIcdSurfaceXcb;
#endif // VK_USE_PLATFORM_XCB_KHR
#endif // VK_USE_PLATFORM_XCB_KHR
#ifdef VK_USE_PLATFORM_XLIB_KHR
typedef struct {
@@ -109,13 +133,28 @@ typedef struct {
Display *dpy;
Window window;
} VkIcdSurfaceXlib;
#endif // VK_USE_PLATFORM_XLIB_KHR
#endif // VK_USE_PLATFORM_XLIB_KHR
#ifdef VK_USE_PLATFORM_ANDROID_KHR
typedef struct {
ANativeWindow* window;
VkIcdSurfaceBase base;
struct ANativeWindow *window;
} VkIcdSurfaceAndroid;
#endif //VK_USE_PLATFORM_ANDROID_KHR
#endif // VK_USE_PLATFORM_ANDROID_KHR
#ifdef VK_USE_PLATFORM_MACOS_MVK
typedef struct {
VkIcdSurfaceBase base;
const void *pView;
} VkIcdSurfaceMacOS;
#endif // VK_USE_PLATFORM_MACOS_MVK
#ifdef VK_USE_PLATFORM_IOS_MVK
typedef struct {
VkIcdSurfaceBase base;
const void *pView;
} VkIcdSurfaceIOS;
#endif // VK_USE_PLATFORM_IOS_MVK
typedef struct {
VkIcdSurfaceBase base;
@@ -128,4 +167,4 @@ typedef struct {
VkExtent2D imageExtent;
} VkIcdSurfaceDisplay;
#endif // VKICD_H
#endif // VKICD_H

View File

@@ -43,7 +43,7 @@ extern "C" {
#define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff)
#define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff)
// Version of this file
#define VK_HEADER_VERSION 73
#define VK_HEADER_VERSION 76
#define VK_NULL_HANDLE 0
@@ -350,6 +350,11 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SURFACE_INFO_2_KHR = 1000119000,
VK_STRUCTURE_TYPE_SURFACE_CAPABILITIES_2_KHR = 1000119001,
VK_STRUCTURE_TYPE_SURFACE_FORMAT_2_KHR = 1000119002,
VK_STRUCTURE_TYPE_DISPLAY_PROPERTIES_2_KHR = 1000121000,
VK_STRUCTURE_TYPE_DISPLAY_PLANE_PROPERTIES_2_KHR = 1000121001,
VK_STRUCTURE_TYPE_DISPLAY_MODE_PROPERTIES_2_KHR = 1000121002,
VK_STRUCTURE_TYPE_DISPLAY_PLANE_INFO_2_KHR = 1000121003,
VK_STRUCTURE_TYPE_DISPLAY_PLANE_CAPABILITIES_2_KHR = 1000121004,
VK_STRUCTURE_TYPE_IOS_SURFACE_CREATE_INFO_MVK = 1000122000,
VK_STRUCTURE_TYPE_MACOS_SURFACE_CREATE_INFO_MVK = 1000123000,
VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT = 1000128000,
@@ -2715,6 +2720,16 @@ typedef struct VkDrawIndirectCommand {
uint32_t firstInstance;
} VkDrawIndirectCommand;
typedef struct VkBaseOutStructure {
VkStructureType sType;
struct VkBaseOutStructure* pNext;
} VkBaseOutStructure;
typedef struct VkBaseInStructure {
VkStructureType sType;
const struct VkBaseInStructure* pNext;
} VkBaseInStructure;
typedef VkResult (VKAPI_PTR *PFN_vkCreateInstance)(const VkInstanceCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkInstance* pInstance);
typedef void (VKAPI_PTR *PFN_vkDestroyInstance)(VkInstance instance, const VkAllocationCallbacks* pAllocator);
@@ -5572,6 +5587,70 @@ typedef VkPhysicalDeviceVariablePointerFeatures VkPhysicalDeviceVariablePointerF
#define VK_KHR_get_display_properties2 1
#define VK_KHR_GET_DISPLAY_PROPERTIES_2_SPEC_VERSION 1
#define VK_KHR_GET_DISPLAY_PROPERTIES_2_EXTENSION_NAME "VK_KHR_get_display_properties2"
typedef struct VkDisplayProperties2KHR {
VkStructureType sType;
void* pNext;
VkDisplayPropertiesKHR displayProperties;
} VkDisplayProperties2KHR;
typedef struct VkDisplayPlaneProperties2KHR {
VkStructureType sType;
void* pNext;
VkDisplayPlanePropertiesKHR displayPlaneProperties;
} VkDisplayPlaneProperties2KHR;
typedef struct VkDisplayModeProperties2KHR {
VkStructureType sType;
void* pNext;
VkDisplayModePropertiesKHR displayModeProperties;
} VkDisplayModeProperties2KHR;
typedef struct VkDisplayPlaneInfo2KHR {
VkStructureType sType;
const void* pNext;
VkDisplayModeKHR mode;
uint32_t planeIndex;
} VkDisplayPlaneInfo2KHR;
typedef struct VkDisplayPlaneCapabilities2KHR {
VkStructureType sType;
void* pNext;
VkDisplayPlaneCapabilitiesKHR capabilities;
} VkDisplayPlaneCapabilities2KHR;
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceDisplayProperties2KHR)(VkPhysicalDevice physicalDevice, uint32_t* pPropertyCount, VkDisplayProperties2KHR* pProperties);
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceDisplayPlaneProperties2KHR)(VkPhysicalDevice physicalDevice, uint32_t* pPropertyCount, VkDisplayPlaneProperties2KHR* pProperties);
typedef VkResult (VKAPI_PTR *PFN_vkGetDisplayModeProperties2KHR)(VkPhysicalDevice physicalDevice, VkDisplayKHR display, uint32_t* pPropertyCount, VkDisplayModeProperties2KHR* pProperties);
typedef VkResult (VKAPI_PTR *PFN_vkGetDisplayPlaneCapabilities2KHR)(VkPhysicalDevice physicalDevice, const VkDisplayPlaneInfo2KHR* pDisplayPlaneInfo, VkDisplayPlaneCapabilities2KHR* pCapabilities);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceDisplayProperties2KHR(
VkPhysicalDevice physicalDevice,
uint32_t* pPropertyCount,
VkDisplayProperties2KHR* pProperties);
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceDisplayPlaneProperties2KHR(
VkPhysicalDevice physicalDevice,
uint32_t* pPropertyCount,
VkDisplayPlaneProperties2KHR* pProperties);
VKAPI_ATTR VkResult VKAPI_CALL vkGetDisplayModeProperties2KHR(
VkPhysicalDevice physicalDevice,
VkDisplayKHR display,
uint32_t* pPropertyCount,
VkDisplayModeProperties2KHR* pProperties);
VKAPI_ATTR VkResult VKAPI_CALL vkGetDisplayPlaneCapabilities2KHR(
VkPhysicalDevice physicalDevice,
const VkDisplayPlaneInfo2KHR* pDisplayPlaneInfo,
VkDisplayPlaneCapabilities2KHR* pCapabilities);
#endif
#define VK_KHR_dedicated_allocation 1
#define VK_KHR_DEDICATED_ALLOCATION_SPEC_VERSION 3
#define VK_KHR_DEDICATED_ALLOCATION_EXTENSION_NAME "VK_KHR_dedicated_allocation"
@@ -5727,6 +5806,33 @@ VKAPI_ATTR void VKAPI_CALL vkGetDescriptorSetLayoutSupportKHR(
VkDescriptorSetLayoutSupport* pSupport);
#endif
#define VK_KHR_draw_indirect_count 1
#define VK_KHR_DRAW_INDIRECT_COUNT_SPEC_VERSION 1
#define VK_KHR_DRAW_INDIRECT_COUNT_EXTENSION_NAME "VK_KHR_draw_indirect_count"
typedef void (VKAPI_PTR *PFN_vkCmdDrawIndirectCountKHR)(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, VkBuffer countBuffer, VkDeviceSize countBufferOffset, uint32_t maxDrawCount, uint32_t stride);
typedef void (VKAPI_PTR *PFN_vkCmdDrawIndexedIndirectCountKHR)(VkCommandBuffer commandBuffer, VkBuffer buffer, VkDeviceSize offset, VkBuffer countBuffer, VkDeviceSize countBufferOffset, uint32_t maxDrawCount, uint32_t stride);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndirectCountKHR(
VkCommandBuffer commandBuffer,
VkBuffer buffer,
VkDeviceSize offset,
VkBuffer countBuffer,
VkDeviceSize countBufferOffset,
uint32_t maxDrawCount,
uint32_t stride);
VKAPI_ATTR void VKAPI_CALL vkCmdDrawIndexedIndirectCountKHR(
VkCommandBuffer commandBuffer,
VkBuffer buffer,
VkDeviceSize offset,
VkBuffer countBuffer,
VkDeviceSize countBufferOffset,
uint32_t maxDrawCount,
uint32_t stride);
#endif
#define VK_EXT_debug_report 1
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkDebugReportCallbackEXT)

View File

@@ -25,10 +25,13 @@ project(
[find_program('python', 'python2', 'python3'), 'bin/meson_get_version.py']
).stdout(),
license : 'MIT',
meson_version : '>= 0.42',
meson_version : '>= 0.44.1',
default_options : ['buildtype=debugoptimized', 'c_std=c99', 'cpp_std=c++11']
)
cc = meson.get_compiler('c')
cpp = meson.get_compiler('cpp')
null_dep = dependency('', required : false)
system_has_kms_drm = ['openbsd', 'netbsd', 'freebsd', 'dragonfly', 'linux'].contains(host_machine.system())
@@ -50,16 +53,13 @@ with_tests = get_option('build-tests')
with_valgrind = get_option('valgrind')
with_libunwind = get_option('libunwind')
with_asm = get_option('asm')
with_glx_read_only_text = get_option('glx-read-only-text')
with_osmesa = get_option('osmesa')
with_swr_arches = get_option('swr-arches').split(',')
with_tools = get_option('tools').split(',')
with_swr_arches = get_option('swr-arches')
with_tools = get_option('tools')
if with_tools.contains('all')
with_tools = ['freedreno', 'glsl', 'intel', 'nir', 'nouveau']
endif
if get_option('texture-float')
pre_args += '-DTEXTURE_FLOAT_ENABLED'
message('WARNING: Floating-point texture enabled. Please consult docs/patents.txt and your lawyer before building mesa.')
endif
dri_drivers_path = get_option('dri-drivers-path')
if dri_drivers_path == ''
@@ -93,128 +93,102 @@ endif
system_has_kms_drm = ['openbsd', 'netbsd', 'freebsd', 'dragonfly', 'linux'].contains(host_machine.system())
with_dri = false
with_dri_i915 = false
with_dri_i965 = false
with_dri_r100 = false
with_dri_r200 = false
with_dri_nouveau = false
with_dri_swrast = false
_drivers = get_option('dri-drivers')
if _drivers == 'auto'
if _drivers.contains('auto')
if system_has_kms_drm
# TODO: PPC, Sparc
if ['x86', 'x86_64'].contains(host_machine.cpu_family())
_drivers = 'i915,i965,r100,r200,nouveau'
_drivers = ['i915', 'i965', 'r100', 'r200', 'nouveau']
elif ['arm', 'aarch64'].contains(host_machine.cpu_family())
_drivers = ''
_drivers = []
else
error('Unknown architecture. Please pass -Ddri-drivers to set driver options. Patches gladly accepted to fix this.')
endif
elif ['darwin', 'windows', 'cygwin', 'haiku'].contains(host_machine.system())
# only swrast would make sense here, but gallium swrast is a much better default
_drivers = ''
_drivers = []
else
error('Unknown OS. Please pass -Ddri-drivers to set driver options. Patches gladly accepted to fix this.')
endif
endif
if _drivers != ''
_split = _drivers.split(',')
with_dri_i915 = _split.contains('i915')
with_dri_i965 = _split.contains('i965')
with_dri_r100 = _split.contains('r100')
with_dri_r200 = _split.contains('r200')
with_dri_nouveau = _split.contains('nouveau')
with_dri_swrast = _split.contains('swrast')
with_dri = true
endif
with_gallium = false
with_gallium_pl111 = false
with_gallium_radeonsi = false
with_gallium_r300 = false
with_gallium_r600 = false
with_gallium_nouveau = false
with_gallium_freedreno = false
with_gallium_softpipe = false
with_gallium_vc4 = false
with_gallium_vc5 = false
with_gallium_etnaviv = false
with_gallium_imx = false
with_gallium_tegra = false
with_gallium_i915 = false
with_gallium_svga = false
with_gallium_virgl = false
with_gallium_swr = false
with_dri_i915 = _drivers.contains('i915')
with_dri_i965 = _drivers.contains('i965')
with_dri_r100 = _drivers.contains('r100')
with_dri_r200 = _drivers.contains('r200')
with_dri_nouveau = _drivers.contains('nouveau')
with_dri_swrast = _drivers.contains('swrast')
with_dri = _drivers.length() != 0 and _drivers != ['']
_drivers = get_option('gallium-drivers')
if _drivers == 'auto'
if _drivers.contains('auto')
if system_has_kms_drm
# TODO: PPC, Sparc
if ['x86', 'x86_64'].contains(host_machine.cpu_family())
_drivers = 'r300,r600,radeonsi,nouveau,virgl,svga,swrast'
_drivers = [
'r300', 'r600', 'radeonsi', 'nouveau', 'virgl', 'svga', 'swrast'
]
elif ['arm', 'aarch64'].contains(host_machine.cpu_family())
_drivers = 'pl111,vc4,vc5,freedreno,etnaviv,imx,nouveau,tegra,virgl,swrast'
_drivers = [
'pl111', 'v3d', 'vc4', 'freedreno', 'etnaviv', 'imx', 'nouveau',
'tegra', 'virgl', 'swrast',
]
else
error('Unknown architecture. Please pass -Dgallium-drivers to set driver options. Patches gladly accepted to fix this.')
endif
elif ['darwin', 'windows', 'cygwin', 'haiku'].contains(host_machine.system())
_drivers = 'swrast'
_drivers = ['swrast']
else
error('Unknown OS. Please pass -Dgallium-drivers to set driver options. Patches gladly accepted to fix this.')
endif
endif
if _drivers != ''
_split = _drivers.split(',')
with_gallium_pl111 = _split.contains('pl111')
with_gallium_radeonsi = _split.contains('radeonsi')
with_gallium_r300 = _split.contains('r300')
with_gallium_r600 = _split.contains('r600')
with_gallium_nouveau = _split.contains('nouveau')
with_gallium_freedreno = _split.contains('freedreno')
with_gallium_softpipe = _split.contains('swrast')
with_gallium_vc4 = _split.contains('vc4')
with_gallium_vc5 = _split.contains('vc5')
with_gallium_etnaviv = _split.contains('etnaviv')
with_gallium_imx = _split.contains('imx')
with_gallium_tegra = _split.contains('tegra')
with_gallium_i915 = _split.contains('i915')
with_gallium_svga = _split.contains('svga')
with_gallium_virgl = _split.contains('virgl')
with_gallium_swr = _split.contains('swr')
with_gallium = true
if system_has_kms_drm
_glx = get_option('glx')
_egl = get_option('egl')
if _glx == 'dri' or _egl == 'true' or (_glx == 'disabled' and _egl != 'false')
with_dri = true
endif
with_gallium_pl111 = _drivers.contains('pl111')
with_gallium_radeonsi = _drivers.contains('radeonsi')
with_gallium_r300 = _drivers.contains('r300')
with_gallium_r600 = _drivers.contains('r600')
with_gallium_nouveau = _drivers.contains('nouveau')
with_gallium_freedreno = _drivers.contains('freedreno')
with_gallium_softpipe = _drivers.contains('swrast')
with_gallium_vc4 = _drivers.contains('vc4')
with_gallium_v3d = _drivers.contains('v3d')
with_gallium_etnaviv = _drivers.contains('etnaviv')
with_gallium_imx = _drivers.contains('imx')
with_gallium_tegra = _drivers.contains('tegra')
with_gallium_i915 = _drivers.contains('i915')
with_gallium_svga = _drivers.contains('svga')
with_gallium_virgl = _drivers.contains('virgl')
with_gallium_swr = _drivers.contains('swr')
with_gallium = _drivers.length() != 0 and _drivers != ['']
if with_gallium and system_has_kms_drm
_glx = get_option('glx')
_egl = get_option('egl')
if _glx == 'dri' or _egl == 'true' or (_glx == 'disabled' and _egl != 'false')
with_dri = true
endif
endif
with_intel_vk = false
with_amd_vk = false
with_any_vk = false
_vulkan_drivers = get_option('vulkan-drivers')
if _vulkan_drivers == 'auto'
if _vulkan_drivers.contains('auto')
if system_has_kms_drm
if host_machine.cpu_family().startswith('x86')
_vulkan_drivers = 'amd,intel'
_vulkan_drivers = ['amd', 'intel']
else
error('Unknown architecture. Please pass -Dvulkan-drivers to set driver options. Patches gladly accepted to fix this.')
endif
elif ['darwin', 'windows', 'cygwin', 'haiku'].contains(host_machine.system())
# No vulkan driver supports windows or macOS currently
_vulkan_drivers = ''
_vulkan_drivers = []
else
error('Unknown OS. Please pass -Dvulkan-drivers to set driver options. Patches gladly accepted to fix this.')
endif
endif
if _vulkan_drivers != ''
_split = _vulkan_drivers.split(',')
with_intel_vk = _split.contains('intel')
with_amd_vk = _split.contains('amd')
with_any_vk = with_amd_vk or with_intel_vk
endif
with_intel_vk = _vulkan_drivers.contains('intel')
with_amd_vk = _vulkan_drivers.contains('amd')
with_any_vk = _vulkan_drivers.length() != 0 and _vulkan_drivers != ['']
if with_dri_swrast and (with_gallium_softpipe or with_gallium_swr)
error('Only one swrast provider can be built')
@@ -247,33 +221,30 @@ else
with_dri_platform = 'none'
endif
with_platform_android = false
with_platform_wayland = false
with_platform_x11 = false
with_platform_drm = false
with_platform_surfaceless = false
egl_native_platform = ''
_platforms = get_option('platforms')
if _platforms == 'auto'
if _platforms.contains('auto')
if system_has_kms_drm
_platforms = 'x11,wayland,drm,surfaceless'
_platforms = ['x11', 'wayland', 'drm', 'surfaceless']
elif ['darwin', 'windows', 'cygwin'].contains(host_machine.system())
_platforms = 'x11,surfaceless'
_platforms = ['x11', 'surfaceless']
elif ['haiku'].contains(host_machine.system())
_platforms = 'haiku'
_platforms = ['haiku']
else
error('Unknown OS. Please pass -Dplatforms to set platforms. Patches gladly accepted to fix this.')
endif
endif
if _platforms != ''
_split = _platforms.split(',')
with_platform_android = _split.contains('android')
with_platform_x11 = _split.contains('x11')
with_platform_wayland = _split.contains('wayland')
with_platform_drm = _split.contains('drm')
with_platform_haiku = _split.contains('haiku')
with_platform_surfaceless = _split.contains('surfaceless')
egl_native_platform = _split[0]
with_platform_android = _platforms.contains('android')
with_platform_x11 = _platforms.contains('x11')
with_platform_wayland = _platforms.contains('wayland')
with_platform_drm = _platforms.contains('drm')
with_platform_haiku = _platforms.contains('haiku')
with_platform_surfaceless = _platforms.contains('surfaceless')
with_platforms = false
if _platforms.length() != 0 and _platforms != ['']
with_platforms = true
egl_native_platform = _platforms[0]
endif
with_glx = get_option('glx')
@@ -315,13 +286,13 @@ endif
_egl = get_option('egl')
if _egl == 'auto'
with_egl = with_dri and with_shared_glapi and egl_native_platform != ''
with_egl = with_dri and with_shared_glapi and with_platforms
elif _egl == 'true'
if not with_dri
error('EGL requires dri')
elif not with_shared_glapi
error('EGL requires shared-glapi')
elif egl_native_platform == ''
elif not with_platforms
error('No platforms specified, consider -Dplatforms=drm,x11 at least')
elif not ['disabled', 'dri'].contains(with_glx)
error('EGL requires dri, but a GLX is being built without dri')
@@ -343,11 +314,7 @@ endif
pre_args += '-DGLX_USE_TLS'
if with_glx != 'disabled'
if not (with_platform_x11 and with_any_opengl)
if with_glx == 'auto'
with_glx = 'disabled'
else
error('Cannot build GLX support without X11 platform support and at least one OpenGL API')
endif
error('Cannot build GLX support without X11 platform support and at least one OpenGL API')
elif with_glx == 'gallium-xlib'
if not with_gallium
error('Gallium-xlib based GLX requires at least one gallium driver')
@@ -360,8 +327,12 @@ if with_glx != 'disabled'
if with_dri
error('xlib conflicts with any dri driver')
endif
elif with_glx == 'dri' and not with_dri
error('dri based GLX requires at least one DRI driver')
elif with_glx == 'dri'
if not with_dri
error('dri based GLX requires at least one DRI driver')
elif not with_shared_glapi
error('dri based GLX requires shared-glapi')
endif
endif
endif
@@ -584,7 +555,7 @@ endif
with_gallium_va = _va == 'true'
dep_va = null_dep
if with_gallium_va
dep_va = dependency('libva', version : '>= 0.38.0')
dep_va = dependency('libva', version : '>= 0.39.0')
dep_va_headers = declare_dependency(
compile_args : run_command(prog_pkgconfig, ['libva', '--cflags']).stdout().split()
)
@@ -630,13 +601,34 @@ if with_gallium_st_nine
endif
endif
if get_option('power8') != 'false'
if host_machine.cpu_family() == 'ppc64le'
if cc.get_id() == 'gcc' and cc.version().version_compare('< 4.8')
error('Altivec is not supported with gcc version < 4.8.')
endif
if cc.compiles('''
#include <altivec.h>
int main() {
vector unsigned char r;
vector unsigned int v = vec_splat_u32 (1);
r = __builtin_vec_vgbbd ((vector unsigned char) v);
return 0;
}''',
args : '-mpower8-vector',
name : 'POWER8 intrinsics')
pre_args += ['-D_ARCH_PWR8', '-mpower8-vector']
elif get_option('power8') == 'true'
error('POWER8 intrinsic support required but not found.')
endif
endif
endif
_opencl = get_option('gallium-opencl')
if _opencl != 'disabled'
if not with_gallium
error('OpenCL Clover implementation requires at least one gallium driver.')
endif
# TODO: alitvec?
dep_clc = dependency('libclc')
with_gallium_opencl = true
with_opencl_icd = _opencl == 'icd'
@@ -697,7 +689,6 @@ if has_mako.returncode() != 0
error('Python (2.x) mako module required to build mesa.')
endif
cc = meson.get_compiler('c')
if cc.get_id() == 'gcc' and cc.version().version_compare('< 4.4.6')
error('When using GCC, version 4.4.6 or later is required.')
endif
@@ -777,7 +768,6 @@ if cc.has_argument('-fvisibility=hidden')
endif
# Check for generic C++ arguments
cpp = meson.get_compiler('cpp')
cpp_args = []
foreach a : ['-Wall', '-fno-math-errno', '-fno-trapping-math',
'-Qunused-arguments']
@@ -836,7 +826,15 @@ endif
# Check for GCC style atomics
dep_atomic = null_dep
if cc.compiles('int main() { int n; return __atomic_load_n(&n, __ATOMIC_ACQUIRE); }',
if cc.compiles('''#include <stdint.h>
int main() {
struct {
uint64_t *v;
} x;
return (int)__atomic_load_n(x.v, __ATOMIC_ACQUIRE) &
(int)__atomic_add_fetch(x.v, (uint64_t)1, __ATOMIC_ACQ_REL);
}''',
name : 'GCC atomic builtins')
pre_args += '-DUSE_GCC_ATOMIC_BUILTINS'
@@ -848,8 +846,11 @@ if cc.compiles('int main() { int n; return __atomic_load_n(&n, __ATOMIC_ACQUIRE)
# as ARM.
if not cc.links('''#include <stdint.h>
int main() {
uint64_t n;
return (int)__atomic_load_n(&n, __ATOMIC_ACQUIRE);
struct {
uint64_t *v;
} x;
return (int)__atomic_load_n(x.v, __ATOMIC_ACQUIRE) &
(int)__atomic_add_fetch(x.v, (uint64_t)1, __ATOMIC_ACQ_REL);
}''',
name : 'GCC atomic builtins required -latomic')
dep_atomic = cc.find_library('atomic')
@@ -864,30 +865,44 @@ if not cc.links('''#include <stdint.h>
pre_args += '-DMISSING_64_BIT_ATOMICS'
endif
# TODO: endian
# TODO: powr8
# TODO: shared/static? Is this even worth doing?
# Building x86 assembly code requires running x86 binaries. It is possible for
# x86_64 OSes to run x86 binaries, so don't disable asm in those cases
# TODO: it should be possible to use an exe_wrapper to run the binary during
# the build.
# When cross compiling we generally need to turn off the use of assembly,
# because mesa's assembly relies on building an executable for the host system,
# and running it to get information about struct sizes. There is at least one
# case of cross compiling where we can use asm, and that's x86_64 -> x86 when
# host OS == build OS, since in that case the build machine can run the host's
# binaries.
if meson.is_cross_build()
if not (build_machine.cpu_family() == 'x86_64' and host_machine.cpu_family() == 'x86'
and build_machine.system() == host_machine.system())
message('Cross compiling to x86 from non-x86, disabling asm')
if build_machine.system() != host_machine.system()
# TODO: It may be possible to do this with an exe_wrapper (like wine).
message('Cross compiling from one OS to another, disabling assembly.')
with_asm = false
elif not (build_machine.cpu_family().startswith('x86') and host_machine.cpu_family() == 'x86')
# FIXME: Gentoo always sets -m32 for x86_64 -> x86 builds, resulting in an
# x86 -> x86 cross compile. We use startswith rather than == to handle this
# case.
# TODO: There may be other cases where the 64 bit version of the
# architecture can run 32 bit binaries (aarch64 and armv7 for example)
message('''
Cross compiling to different architectures, and the host cannot run
the build machine's binaries. Disabling assembly.
''')
with_asm = false
endif
endif
with_asm_arch = ''
if with_asm
# TODO: SPARC and PPC
if host_machine.cpu_family() == 'x86'
if system_has_kms_drm
with_asm_arch = 'x86'
pre_args += ['-DUSE_X86_ASM', '-DUSE_MMX_ASM', '-DUSE_3DNOW_ASM',
'-DUSE_SSE_ASM']
if with_glx_read_only_text
pre_args += ['-DGLX_X86_READONLY_TEXT']
endif
endif
elif host_machine.cpu_family() == 'x86_64'
if system_has_kms_drm
@@ -904,6 +919,16 @@ if with_asm
with_asm_arch = 'aarch64'
pre_args += ['-DUSE_AARCH64_ASM']
endif
elif host_machine.cpu_family() == 'sparc64'
if system_has_kms_drm
with_asm_arch = 'sparc'
pre_args += ['-DUSE_SPARC_ASM']
endif
elif host_machine.cpu_family() == 'ppc64le'
if system_has_kms_drm
with_asm_arch = 'ppc64le'
pre_args += ['-DUSE_PPC64LE_ASM']
endif
endif
endif
@@ -1038,7 +1063,7 @@ _drm_amdgpu_ver = '2.4.91'
_drm_radeon_ver = '2.4.71'
_drm_nouveau_ver = '2.4.66'
_drm_etnaviv_ver = '2.4.89'
_drm_freedreno_ver = '2.4.91'
_drm_freedreno_ver = '2.4.92'
_drm_intel_ver = '2.4.75'
_drm_ver = '2.4.75'
@@ -1052,6 +1077,12 @@ _libdrm_checks = [
['freedreno', with_gallium_freedreno],
]
# VC4 only needs core libdrm support of this version, not a libdrm_vc4
# library.
if with_gallium_vc4
_drm_ver = '2.4.89'
endif
# Loop over the enables versions and get the highest libdrm requirement for all
# active drivers.
foreach d : _libdrm_checks
@@ -1084,6 +1115,7 @@ if dep_libdrm.found()
endif
llvm_modules = ['bitwriter', 'engine', 'mcdisassembler', 'mcjit']
llvm_optional_modules = []
if with_amd_vk or with_gallium_radeonsi or with_gallium_r600
llvm_modules += ['amdgpu', 'bitreader', 'ipo']
if with_gallium_r600
@@ -1095,10 +1127,12 @@ if with_gallium_opencl
'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', 'irreader',
'lto', 'option', 'objcarcopts', 'profiledata',
]
# TODO: optional modules
llvm_optional_modules += ['coroutines', 'opencl']
endif
if with_amd_vk or with_gallium_radeonsi or with_gallium_swr
if with_amd_vk or with_gallium_radeonsi
_llvm_version = '>= 5.0.0'
elif with_gallium_swr
_llvm_version = '>= 4.0.0'
elif with_gallium_opencl or with_gallium_r600
_llvm_version = '>= 3.9.0'
@@ -1109,12 +1143,20 @@ endif
_llvm = get_option('llvm')
if _llvm == 'auto'
dep_llvm = dependency(
'llvm', version : _llvm_version, modules : llvm_modules,
'llvm',
version : _llvm_version,
modules : llvm_modules,
optional_modules : llvm_optional_modules,
required : with_amd_vk or with_gallium_radeonsi or with_gallium_swr or with_gallium_opencl,
)
with_llvm = dep_llvm.found()
elif _llvm == 'true'
dep_llvm = dependency('llvm', version : _llvm_version, modules : llvm_modules)
dep_llvm = dependency(
'llvm',
version : _llvm_version,
modules : llvm_modules,
optional_modules : llvm_optional_modules,
)
with_llvm = true
else
dep_llvm = null_dep
@@ -1143,6 +1185,13 @@ if with_llvm
'-DHAVE_LLVM=0x0@0@0@1@'.format(_llvm_version[0], _llvm_version[1]),
'-DMESA_LLVM_VERSION_PATCH=@0@'.format(_llvm_patch),
]
# LLVM can be built without rtti, turning off rtti changes the ABI of C++
# programs, so we need to build all C++ code in mesa without rtti as well to
# ensure that linking works.
if dep_llvm.get_configtool_variable('has-rtti') == 'NO'
cpp_args += '-fno-rtti'
endif
elif with_amd_vk or with_gallium_radeonsi or with_gallium_swr
error('The following drivers require LLVM: Radv, RadeonSI, SWR. One of these is enabled, but LLVM is disabled.')
endif
@@ -1173,8 +1222,6 @@ if get_option('selinux')
pre_args += '-DMESA_SELINUX'
endif
# TODO: llvm-prefix and llvm-shared-libs
if with_libunwind != 'false'
dep_unwind = dependency('libunwind', required : with_libunwind == 'true')
if dep_unwind.found()
@@ -1184,8 +1231,6 @@ else
dep_unwind = null_dep
endif
# TODO: gallium-hud
if with_osmesa != 'none'
if with_osmesa == 'classic' and not with_dri_swrast
error('OSMesa classic requires dri (classic) swrast.')
@@ -1213,6 +1258,11 @@ if with_platform_wayland
dep_wl_protocols = dependency('wayland-protocols', version : '>= 1.8')
dep_wayland_client = dependency('wayland-client', version : '>=1.11')
dep_wayland_server = dependency('wayland-server', version : '>=1.11')
if with_egl
dep_wayland_egl = dependency('wayland-egl-backend', version : '>= 3')
dep_wayland_egl_headers = declare_dependency(
compile_args : run_command(prog_pkgconfig, ['wayland-egl-backend', '--cflags']).stdout().split())
endif
wayland_dmabuf_xml = join_paths(
dep_wl_protocols.get_pkgconfig_variable('pkgdatadir'), 'unstable',
'linux-dmabuf', 'linux-dmabuf-unstable-v1.xml'
@@ -1305,18 +1355,6 @@ else
dep_lmsensors = null_dep
endif
# TODO: various libdirs
# TODO: gallium driver dirs
# FIXME: this is a workaround for #2326
prog_touch = find_program('touch')
dummy_cpp = custom_target(
'dummy_cpp',
output : 'dummy.cpp',
command : [prog_touch, '@OUTPUT@'],
)
foreach a : pre_args
add_project_arguments(a, language : ['c', 'cpp'])
endforeach
@@ -1329,18 +1367,24 @@ endforeach
inc_include = include_directories('include')
gl_priv_reqs = [
'x11', 'xext', 'xdamage >= 1.1', 'xfixes', 'x11-xcb', 'xcb',
'xcb-glx >= 1.8.1']
gl_priv_reqs = []
if with_glx == 'xlib' or with_glx == 'gallium-xlib'
gl_priv_reqs += ['x11', 'xext', 'xcb']
elif with_glx == 'dri'
gl_priv_reqs += [
'x11', 'xext', 'xdamage >= 1.1', 'xfixes', 'x11-xcb', 'xcb',
'xcb-glx >= 1.8.1']
if with_dri_platform == 'drm'
gl_priv_reqs += 'xcb-dri2 >= 1.8'
endif
endif
if dep_libdrm.found()
gl_priv_reqs += 'libdrm >= 2.4.75'
endif
if dep_xxf86vm.found()
gl_priv_reqs += 'xxf86vm'
endif
if with_dri_platform == 'drm'
gl_priv_reqs += 'xcb-dri2 >= 1.8'
endif
gl_priv_libs = []
if dep_thread.found()

View File

@@ -1,4 +1,4 @@
# Copyright © 2017 Intel Corporation
# Copyright © 2017-2018 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -20,8 +20,11 @@
option(
'platforms',
type : 'string',
value : 'auto',
type : 'array',
value : ['auto'],
choices : [
'', 'auto', 'x11', 'wayland', 'drm', 'surfaceless', 'haiku', 'android',
],
description : 'comma separated list of window systems to support. If this is set to auto all platforms applicable to the OS will be enabled.'
)
option(
@@ -33,9 +36,10 @@ option(
)
option(
'dri-drivers',
type : 'string',
value : 'auto',
description : 'comma separated list of dri drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
type : 'array',
value : ['auto'],
choices : ['', 'auto', 'i915', 'i965', 'r100', 'r200', 'nouveau', 'swrast'],
description : 'List of dri drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'dri-drivers-path',
@@ -51,9 +55,14 @@ option(
)
option(
'gallium-drivers',
type : 'string',
value : 'auto',
description : 'comma separated list of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
type : 'array',
value : ['auto'],
choices : [
'', 'auto', 'pl111', 'radeonsi', 'r300', 'r600', 'nouveau', 'freedreno',
'swrast', 'v3d', 'vc4', 'etnaviv', 'imx', 'tegra', 'i915', 'svga', 'virgl',
'swr',
],
description : 'List of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'gallium-extra-hud',
@@ -141,9 +150,10 @@ option(
)
option(
'vulkan-drivers',
type : 'string',
value : 'auto',
description : 'comma separated list of vulkan drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
type : 'array',
value : ['auto'],
choices : ['', 'auto', 'amd', 'intel'],
description : 'List of vulkan drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'shader-cache',
@@ -214,6 +224,12 @@ option(
value : true,
description : 'Build assembly code if possible'
)
option(
'glx-read-only-text',
type : 'boolean',
value : false,
description : 'Disable writable .text section on x86 (decreases performance)'
)
option(
'llvm',
type : 'combo',
@@ -248,12 +264,6 @@ option(
value : false,
description : 'Build unit tests. Currently this will build *all* unit tests, which may build more than expected.'
)
option(
'texture-float',
type : 'boolean',
value : false,
description : 'Enable floating point textures and renderbuffers. This option may be patent encumbered, please read docs/patents.txt and consult with your lawyer before turning this on.'
)
option(
'selinux',
type : 'boolean',
@@ -276,13 +286,22 @@ option(
)
option(
'swr-arches',
type : 'string',
value : 'avx,avx2',
description : 'Comma delemited swr architectures. choices : avx,avx2,knl,skx'
type : 'array',
value : ['avx', 'avx2'],
choices : ['avx', 'avx2', 'knl', 'skx'],
description : 'Architectures to build SWR support for.',
)
option(
'tools',
type : 'string',
value : '',
description : 'Comma delimited list of tools to build. choices : freedreno,glsl,intel,nir,nouveau or all'
type : 'array',
value : [],
choices : ['freedreno', 'glsl', 'intel', 'nir', 'nouveau', 'all'],
description : 'List of tools to build.',
)
option(
'power8',
type : 'combo',
value : 'auto',
choices : ['auto', 'true', 'false'],
description : 'Enable power8 optimizations.',
)

View File

@@ -392,10 +392,6 @@ def generate(env):
cppdefines += ['PIPE_SUBSYSTEM_WINDOWS_USER']
if env['embedded']:
cppdefines += ['PIPE_SUBSYSTEM_EMBEDDED']
if env['texture_float']:
print('warning: Floating-point textures enabled.')
print('warning: Please consult docs/patents.txt with your lawyer before building Mesa.')
cppdefines += ['TEXTURE_FLOAT_ENABLED']
env.Append(CPPDEFINES = cppdefines)
# C compiler options

View File

@@ -123,6 +123,10 @@ def generate(env):
'LLVMDemangle', 'LLVMGlobalISel', 'LLVMDebugInfoMSF',
'LLVMBinaryFormat',
])
if env['platform'] == 'windows' and env['crosscompile']:
# LLVM 5.0 requires MinGW w/ pthreads due to use of std::thread and friends.
assert env['gcc']
env['CXX'] = env['CXX'] + '-posix'
elif llvm_version >= distutils.version.LooseVersion('4.0'):
env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser',
@@ -211,8 +215,11 @@ def generate(env):
'imagehlp',
'psapi',
'shell32',
'advapi32'
'advapi32',
'ole32',
'uuid',
])
if env['msvc']:
# Some of the LLVM C headers use the inline keyword without
# defining it.

View File

@@ -95,11 +95,6 @@ if HAVE_GBM
SUBDIRS += gbm
endif
## Optionally required by EGL
if HAVE_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-egl
endif
if HAVE_EGL
SUBDIRS += egl
endif

View File

@@ -22,6 +22,7 @@
ADDRLIB_LIBS = addrlib/libamdgpu_addrlib.la
addrlib_libamdgpu_addrlib_la_CPPFLAGS = \
$(DEFINES) \
-I$(top_srcdir)/src/ \
-I$(srcdir)/common \
-I$(srcdir)/addrlib \

View File

@@ -97,7 +97,6 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct amdgpu_gpu_info *amdinfo)
{
struct amdgpu_buffer_size_alignments alignment_info = {};
struct amdgpu_heap_info vram, vram_vis, gtt;
struct drm_amdgpu_info_hw_ip dma = {}, compute = {}, uvd = {};
struct drm_amdgpu_info_hw_ip uvd_enc = {}, vce = {}, vcn_dec = {};
struct drm_amdgpu_info_hw_ip vcn_enc = {}, gfx = {};
@@ -131,26 +130,6 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM, 0, &vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
&vram_vis);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram_vis) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_GTT, 0, &gtt);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(gtt) failed.\n");
return false;
}
r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_DMA, 0, &dma);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(dma) failed.\n");
@@ -255,6 +234,60 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
return false;
}
if (info->drm_minor >= 9) {
struct drm_amdgpu_memory_info meminfo = {};
r = amdgpu_query_info(dev, AMDGPU_INFO_MEMORY, sizeof(meminfo), &meminfo);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_info(memory) failed.\n");
return false;
}
/* Note: usable_heap_size values can be random and can't be relied on. */
info->gart_size = meminfo.gtt.total_heap_size;
info->vram_size = meminfo.vram.total_heap_size;
info->vram_vis_size = meminfo.cpu_accessible_vram.total_heap_size;
info->max_alloc_size = MAX2(meminfo.vram.max_allocation,
meminfo.gtt.max_allocation);
} else {
/* This is a deprecated interface, which reports usable sizes
* (total minus pinned), but the pinned size computation is
* buggy, so the values returned from these functions can be
* random.
*/
struct amdgpu_heap_info vram, vram_vis, gtt;
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM, 0, &vram);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_VRAM,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
&vram_vis);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(vram_vis) failed.\n");
return false;
}
r = amdgpu_query_heap_info(dev, AMDGPU_GEM_DOMAIN_GTT, 0, &gtt);
if (r) {
fprintf(stderr, "amdgpu: amdgpu_query_heap_info(gtt) failed.\n");
return false;
}
info->gart_size = gtt.heap_size;
info->vram_size = vram.heap_size;
info->vram_vis_size = vram_vis.heap_size;
/* The kernel can split large buffers in VRAM but not in GTT, so large
* allocations can fail or cause buffer movement failures in the kernel.
*/
info->max_alloc_size = MAX2(info->vram_size * 0.9, info->gart_size * 0.7);
}
/* Set chip identification. */
info->pci_id = amdinfo->asic_id; /* TODO: is this correct? */
info->vce_harvest_config = amdinfo->vce_harvest_config;
@@ -287,15 +320,8 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
!(amdinfo->ids_flags & AMDGPU_IDS_FLAGS_FUSION);
/* Set hardware information. */
info->gart_size = gtt.heap_size;
info->vram_size = vram.heap_size;
info->vram_vis_size = vram_vis.heap_size;
info->gds_size = gds.gds_total_size;
info->gds_gfx_partition_size = gds.gds_gfx_partition_size;
/* The kernel can split large buffers in VRAM but not in GTT, so large
* allocations can fail or cause buffer movement failures in the kernel.
*/
info->max_alloc_size = MIN2(info->vram_size * 0.9, info->gart_size * 0.7);
/* convert the shader clock from KHz to MHz */
info->max_shader_clock = amdinfo->max_engine_clk / 1000;
info->max_se = amdinfo->num_shader_engines;
@@ -316,7 +342,35 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
/* TODO: Enable this once the kernel handles it efficiently. */
info->has_local_buffers = info->drm_minor >= 20 &&
!info->has_dedicated_vram;
info->kernel_flushes_hdp_before_ib = true;
info->htile_cmask_support_1d_tiling = true;
info->si_TA_CS_BC_BASE_ADDR_allowed = true;
info->has_bo_metadata = true;
info->has_gpu_reset_status_query = true;
info->has_gpu_reset_counter_query = false;
info->has_eqaa_surface_allocator = true;
info->has_format_bc1_through_bc7 = true;
/* DRM 3.1.0 doesn't flush TC for VI correctly. */
info->kernel_flushes_tc_l2_after_ib = info->chip_class != VI ||
info->drm_minor >= 2;
info->has_indirect_compute_dispatch = true;
/* SI doesn't support unaligned loads. */
info->has_unaligned_shader_loads = info->chip_class != SI;
/* Disable sparse mappings on SI due to VM faults in CP DMA. Enable them once
* these faults are mitigated in software.
* Disable sparse mappings on GFX9 due to hangs.
*/
info->has_sparse_vm_mappings =
info->chip_class >= CIK && info->chip_class <= VI &&
info->drm_minor >= 13;
info->has_2d_tiling = true;
info->has_read_registers_query = true;
info->num_render_backends = amdinfo->rb_pipes;
/* The value returned by the kernel driver was wrong. */
if (info->family == CHIP_KAVERI)
info->num_render_backends = 2;
info->clock_crystal_freq = amdinfo->gpu_counter_freq;
if (!info->clock_crystal_freq) {
fprintf(stderr, "amdgpu: clock crystal frequency is 0, timestamps will be wrong\n");
@@ -449,7 +503,7 @@ void ac_print_gpu_info(struct radeon_info *info)
printf(" vce_fw_version = %u\n", info->vce_fw_version);
printf(" vce_harvest_config = %i\n", info->vce_harvest_config);
printf("Kernel info:\n");
printf("Kernel & winsys capabilities:\n");
printf(" drm = %i.%i.%i\n", info->drm_major,
info->drm_minor, info->drm_patchlevel);
printf(" has_userptr = %i\n", info->has_userptr);
@@ -458,6 +512,20 @@ void ac_print_gpu_info(struct radeon_info *info)
printf(" has_fence_to_handle = %u\n", info->has_fence_to_handle);
printf(" has_ctx_priority = %u\n", info->has_ctx_priority);
printf(" has_local_buffers = %u\n", info->has_local_buffers);
printf(" kernel_flushes_hdp_before_ib = %u\n", info->kernel_flushes_hdp_before_ib);
printf(" htile_cmask_support_1d_tiling = %u\n", info->htile_cmask_support_1d_tiling);
printf(" si_TA_CS_BC_BASE_ADDR_allowed = %u\n", info->si_TA_CS_BC_BASE_ADDR_allowed);
printf(" has_bo_metadata = %u\n", info->has_bo_metadata);
printf(" has_gpu_reset_status_query = %u\n", info->has_gpu_reset_status_query);
printf(" has_gpu_reset_counter_query = %u\n", info->has_gpu_reset_counter_query);
printf(" has_eqaa_surface_allocator = %u\n", info->has_eqaa_surface_allocator);
printf(" has_format_bc1_through_bc7 = %u\n", info->has_format_bc1_through_bc7);
printf(" kernel_flushes_tc_l2_after_ib = %u\n", info->kernel_flushes_tc_l2_after_ib);
printf(" has_indirect_compute_dispatch = %u\n", info->has_indirect_compute_dispatch);
printf(" has_unaligned_shader_loads = %u\n", info->has_unaligned_shader_loads);
printf(" has_sparse_vm_mappings = %u\n", info->has_sparse_vm_mappings);
printf(" has_2d_tiling = %u\n", info->has_2d_tiling);
printf(" has_read_registers_query = %u\n", info->has_read_registers_query);
printf("Shader core info:\n");
printf(" max_shader_clock = %i\n", info->max_shader_clock);
@@ -521,3 +589,235 @@ void ac_print_gpu_info(struct radeon_info *info)
G_0098F8_NUM_LOWER_PIPES(info->gb_addr_config));
}
}
int
ac_get_gs_table_depth(enum chip_class chip_class, enum radeon_family family)
{
if (chip_class >= GFX9)
return -1;
switch (family) {
case CHIP_OLAND:
case CHIP_HAINAN:
case CHIP_KAVERI:
case CHIP_KABINI:
case CHIP_MULLINS:
case CHIP_ICELAND:
case CHIP_CARRIZO:
case CHIP_STONEY:
return 16;
case CHIP_TAHITI:
case CHIP_PITCAIRN:
case CHIP_VERDE:
case CHIP_BONAIRE:
case CHIP_HAWAII:
case CHIP_TONGA:
case CHIP_FIJI:
case CHIP_POLARIS10:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGAM:
return 32;
default:
unreachable("Unknown GPU");
}
}
void
ac_get_raster_config(struct radeon_info *info,
uint32_t *raster_config_p,
uint32_t *raster_config_1_p)
{
unsigned raster_config, raster_config_1;
switch (info->family) {
/* 1 SE / 1 RB */
case CHIP_HAINAN:
case CHIP_KABINI:
case CHIP_MULLINS:
case CHIP_STONEY:
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
/* 1 SE / 4 RBs */
case CHIP_VERDE:
raster_config = 0x0000124a;
raster_config_1 = 0x00000000;
break;
/* 1 SE / 2 RBs (Oland is special) */
case CHIP_OLAND:
raster_config = 0x00000082;
raster_config_1 = 0x00000000;
break;
/* 1 SE / 2 RBs */
case CHIP_KAVERI:
case CHIP_ICELAND:
case CHIP_CARRIZO:
raster_config = 0x00000002;
raster_config_1 = 0x00000000;
break;
/* 2 SEs / 4 RBs */
case CHIP_BONAIRE:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
raster_config = 0x16000012;
raster_config_1 = 0x00000000;
break;
/* 2 SEs / 8 RBs */
case CHIP_TAHITI:
case CHIP_PITCAIRN:
raster_config = 0x2a00126a;
raster_config_1 = 0x00000000;
break;
/* 4 SEs / 8 RBs */
case CHIP_TONGA:
case CHIP_POLARIS10:
raster_config = 0x16000012;
raster_config_1 = 0x0000002a;
break;
/* 4 SEs / 16 RBs */
case CHIP_HAWAII:
case CHIP_FIJI:
case CHIP_VEGAM:
raster_config = 0x3a00161a;
raster_config_1 = 0x0000002e;
break;
default:
fprintf(stderr,
"ac: Unknown GPU, using 0 for raster_config\n");
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
}
/* drm/radeon on Kaveri is buggy, so disable 1 RB to work around it.
* This decreases performance by up to 50% when the RB is the bottleneck.
*/
if (info->family == CHIP_KAVERI && info->drm_major == 2)
raster_config = 0x00000000;
/* Fiji: Old kernels have incorrect tiling config. This decreases
* RB performance by 25%. (it disables 1 RB in the second packer)
*/
if (info->family == CHIP_FIJI &&
info->cik_macrotile_mode_array[0] == 0x000000e8) {
raster_config = 0x16000012;
raster_config_1 = 0x0000002a;
}
*raster_config_p = raster_config;
*raster_config_1_p = raster_config_1;
}
void
ac_get_harvested_configs(struct radeon_info *info,
unsigned raster_config,
unsigned *cik_raster_config_1_p,
unsigned *raster_config_se)
{
unsigned sh_per_se = MAX2(info->max_sh_per_se, 1);
unsigned num_se = MAX2(info->max_se, 1);
unsigned rb_mask = info->enabled_rb_mask;
unsigned num_rb = MIN2(info->num_render_backends, 16);
unsigned rb_per_pkr = MIN2(num_rb / num_se / sh_per_se, 2);
unsigned rb_per_se = num_rb / num_se;
unsigned se_mask[4];
unsigned se;
se_mask[0] = ((1 << rb_per_se) - 1) & rb_mask;
se_mask[1] = (se_mask[0] << rb_per_se) & rb_mask;
se_mask[2] = (se_mask[1] << rb_per_se) & rb_mask;
se_mask[3] = (se_mask[2] << rb_per_se) & rb_mask;
assert(num_se == 1 || num_se == 2 || num_se == 4);
assert(sh_per_se == 1 || sh_per_se == 2);
assert(rb_per_pkr == 1 || rb_per_pkr == 2);
if (info->chip_class >= CIK) {
unsigned raster_config_1 = *cik_raster_config_1_p;
if ((num_se > 2) && ((!se_mask[0] && !se_mask[1]) ||
(!se_mask[2] && !se_mask[3]))) {
raster_config_1 &= C_028354_SE_PAIR_MAP;
if (!se_mask[0] && !se_mask[1]) {
raster_config_1 |=
S_028354_SE_PAIR_MAP(V_028354_RASTER_CONFIG_SE_PAIR_MAP_3);
} else {
raster_config_1 |=
S_028354_SE_PAIR_MAP(V_028354_RASTER_CONFIG_SE_PAIR_MAP_0);
}
*cik_raster_config_1_p = raster_config_1;
}
}
for (se = 0; se < num_se; se++) {
unsigned pkr0_mask = ((1 << rb_per_pkr) - 1) << (se * rb_per_se);
unsigned pkr1_mask = pkr0_mask << rb_per_pkr;
int idx = (se / 2) * 2;
raster_config_se[se] = raster_config;
if ((num_se > 1) && (!se_mask[idx] || !se_mask[idx + 1])) {
raster_config_se[se] &= C_028350_SE_MAP;
if (!se_mask[idx]) {
raster_config_se[se] |=
S_028350_SE_MAP(V_028350_RASTER_CONFIG_SE_MAP_3);
} else {
raster_config_se[se] |=
S_028350_SE_MAP(V_028350_RASTER_CONFIG_SE_MAP_0);
}
}
pkr0_mask &= rb_mask;
pkr1_mask &= rb_mask;
if (rb_per_se > 2 && (!pkr0_mask || !pkr1_mask)) {
raster_config_se[se] &= C_028350_PKR_MAP;
if (!pkr0_mask) {
raster_config_se[se] |=
S_028350_PKR_MAP(V_028350_RASTER_CONFIG_PKR_MAP_3);
} else {
raster_config_se[se] |=
S_028350_PKR_MAP(V_028350_RASTER_CONFIG_PKR_MAP_0);
}
}
if (rb_per_se >= 2) {
unsigned rb0_mask = 1 << (se * rb_per_se);
unsigned rb1_mask = rb0_mask << 1;
rb0_mask &= rb_mask;
rb1_mask &= rb_mask;
if (!rb0_mask || !rb1_mask) {
raster_config_se[se] &= C_028350_RB_MAP_PKR0;
if (!rb0_mask) {
raster_config_se[se] |=
S_028350_RB_MAP_PKR0(V_028350_RASTER_CONFIG_RB_MAP_3);
} else {
raster_config_se[se] |=
S_028350_RB_MAP_PKR0(V_028350_RASTER_CONFIG_RB_MAP_0);
}
}
if (rb_per_se > 2) {
rb0_mask = 1 << (se * rb_per_se + rb_per_pkr);
rb1_mask = rb0_mask << 1;
rb0_mask &= rb_mask;
rb1_mask &= rb_mask;
if (!rb0_mask || !rb1_mask) {
raster_config_se[se] &= C_028350_RB_MAP_PKR1;
if (!rb0_mask) {
raster_config_se[se] |=
S_028350_RB_MAP_PKR1(V_028350_RASTER_CONFIG_RB_MAP_3);
} else {
raster_config_se[se] |=
S_028350_RB_MAP_PKR1(V_028350_RASTER_CONFIG_RB_MAP_0);
}
}
}
}
}
}

View File

@@ -86,7 +86,7 @@ struct radeon_info {
uint32_t vce_fw_version;
uint32_t vce_harvest_config;
/* Kernel info. */
/* Kernel & winsys capabilities. */
uint32_t drm_major; /* version */
uint32_t drm_minor;
uint32_t drm_patchlevel;
@@ -96,6 +96,20 @@ struct radeon_info {
bool has_fence_to_handle;
bool has_ctx_priority;
bool has_local_buffers;
bool kernel_flushes_hdp_before_ib;
bool htile_cmask_support_1d_tiling;
bool si_TA_CS_BC_BASE_ADDR_allowed;
bool has_bo_metadata;
bool has_gpu_reset_status_query;
bool has_gpu_reset_counter_query;
bool has_eqaa_surface_allocator;
bool has_format_bc1_through_bc7;
bool kernel_flushes_tc_l2_after_ib;
bool has_indirect_compute_dispatch;
bool has_unaligned_shader_loads;
bool has_sparse_vm_mappings;
bool has_2d_tiling;
bool has_read_registers_query;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */
@@ -130,6 +144,29 @@ void ac_compute_driver_uuid(char *uuid, size_t size);
void ac_compute_device_uuid(struct radeon_info *info, char *uuid, size_t size);
void ac_print_gpu_info(struct radeon_info *info);
int ac_get_gs_table_depth(enum chip_class chip_class, enum radeon_family family);
void ac_get_raster_config(struct radeon_info *info,
uint32_t *raster_config_p,
uint32_t *raster_config_1_p);
void ac_get_harvested_configs(struct radeon_info *info,
unsigned raster_config,
unsigned *cik_raster_config_1_p,
unsigned *raster_config_se);
static inline unsigned ac_get_max_simd_waves(enum radeon_family family)
{
switch (family) {
/* These always have 8 waves: */
case CHIP_POLARIS10:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGAM:
return 8;
default:
return 10;
}
}
#ifdef __cplusplus
}

View File

@@ -888,37 +888,36 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
bool writeonly_memory,
bool swizzle_enable_hint)
{
/* Split 3 channel stores, becase LLVM doesn't support 3-channel
* intrinsics. */
if (num_channels == 3) {
LLVMValueRef v[3], v01;
for (int i = 0; i < 3; i++) {
v[i] = LLVMBuildExtractElement(ctx->builder, vdata,
LLVMConstInt(ctx->i32, i, 0), "");
}
v01 = ac_build_gather_values(ctx, v, 2);
ac_build_buffer_store_dword(ctx, rsrc, v01, 2, voffset,
soffset, inst_offset, glc, slc,
writeonly_memory, swizzle_enable_hint);
ac_build_buffer_store_dword(ctx, rsrc, v[2], 1, voffset,
soffset, inst_offset + 8,
glc, slc,
writeonly_memory, swizzle_enable_hint);
return;
}
/* SWIZZLE_ENABLE requires that soffset isn't folded into voffset
* (voffset is swizzled, but soffset isn't swizzled).
* llvm.amdgcn.buffer.store doesn't have a separate soffset parameter.
*/
if (!swizzle_enable_hint) {
/* Split 3 channel stores, becase LLVM doesn't support 3-channel
* intrinsics. */
if (num_channels == 3) {
LLVMValueRef v[3], v01;
for (int i = 0; i < 3; i++) {
v[i] = LLVMBuildExtractElement(ctx->builder, vdata,
LLVMConstInt(ctx->i32, i, 0), "");
}
v01 = ac_build_gather_values(ctx, v, 2);
ac_build_buffer_store_dword(ctx, rsrc, v01, 2, voffset,
soffset, inst_offset, glc, slc,
writeonly_memory, swizzle_enable_hint);
ac_build_buffer_store_dword(ctx, rsrc, v[2], 1, voffset,
soffset, inst_offset + 8,
glc, slc,
writeonly_memory, swizzle_enable_hint);
return;
}
unsigned func = CLAMP(num_channels, 1, 3) - 1;
static const char *types[] = {"f32", "v2f32", "v4f32"};
char name[256];
LLVMValueRef offset = soffset;
static const char *types[] = {"f32", "v2f32", "v4f32"};
if (inst_offset)
offset = LLVMBuildAdd(ctx->builder, offset,
LLVMConstInt(ctx->i32, inst_offset, 0), "");
@@ -934,53 +933,46 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
LLVMConstInt(ctx->i1, slc, 0),
};
char name[256];
snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.%s",
types[func]);
types[CLAMP(num_channels, 1, 3) - 1]);
ac_build_intrinsic(ctx, name, ctx->voidt,
args, ARRAY_SIZE(args),
writeonly_memory ?
AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY :
AC_FUNC_ATTR_WRITEONLY);
AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY :
AC_FUNC_ATTR_WRITEONLY);
return;
}
static unsigned dfmt[] = {
static const unsigned dfmt[] = {
V_008F0C_BUF_DATA_FORMAT_32,
V_008F0C_BUF_DATA_FORMAT_32_32,
V_008F0C_BUF_DATA_FORMAT_32_32_32,
V_008F0C_BUF_DATA_FORMAT_32_32_32_32
};
assert(num_channels >= 1 && num_channels <= 4);
static const char *types[] = {"i32", "v2i32", "v4i32"};
LLVMValueRef args[] = {
rsrc,
vdata,
LLVMConstInt(ctx->i32, num_channels, 0),
voffset ? voffset : LLVMGetUndef(ctx->i32),
LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""),
LLVMConstInt(ctx->i32, 0, 0),
voffset ? voffset : LLVMConstInt(ctx->i32, 0, 0),
soffset,
LLVMConstInt(ctx->i32, inst_offset, 0),
LLVMConstInt(ctx->i32, dfmt[num_channels - 1], 0),
LLVMConstInt(ctx->i32, V_008F0C_BUF_NUM_FORMAT_UINT, 0),
LLVMConstInt(ctx->i32, voffset != NULL, 0),
LLVMConstInt(ctx->i32, 0, 0), /* idxen */
LLVMConstInt(ctx->i32, glc, 0),
LLVMConstInt(ctx->i32, slc, 0),
LLVMConstInt(ctx->i32, 0, 0), /* tfe*/
LLVMConstInt(ctx->i1, glc, 0),
LLVMConstInt(ctx->i1, slc, 0),
};
/* The instruction offset field has 12 bits */
assert(voffset || inst_offset < (1 << 12));
/* The intrinsic is overloaded, we need to add a type suffix for overloading to work. */
unsigned func = CLAMP(num_channels, 1, 3) - 1;
const char *types[] = {"i32", "v2i32", "v4i32"};
char name[256];
snprintf(name, sizeof(name), "llvm.SI.tbuffer.store.%s", types[func]);
snprintf(name, sizeof(name), "llvm.amdgcn.tbuffer.store.%s",
types[CLAMP(num_channels, 1, 3) - 1]);
ac_build_intrinsic(ctx, name, ctx->voidt,
args, ARRAY_SIZE(args),
AC_FUNC_ATTR_LEGACY);
writeonly_memory ?
AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY :
AC_FUNC_ATTR_WRITEONLY);
}
static LLVMValueRef
@@ -1178,7 +1170,21 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
LLVMValueRef tl, trbl, args[2];
LLVMValueRef result;
if (ctx->chip_class >= VI) {
if (HAVE_LLVM >= 0x0700) {
unsigned tl_lanes[4], trbl_lanes[4];
for (unsigned i = 0; i < 4; ++i) {
tl_lanes[i] = i & mask;
trbl_lanes[i] = (i & mask) + idx;
}
tl = ac_build_quad_swizzle(ctx, val,
tl_lanes[0], tl_lanes[1],
tl_lanes[2], tl_lanes[3]);
trbl = ac_build_quad_swizzle(ctx, val,
trbl_lanes[0], trbl_lanes[1],
trbl_lanes[2], trbl_lanes[3]);
} else if (ctx->chip_class >= VI) {
LLVMValueRef thread_id, tl_tid, trbl_tid;
thread_id = ac_get_thread_id(ctx);
@@ -1248,6 +1254,13 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
tl = LLVMBuildBitCast(ctx->builder, tl, ctx->f32, "");
trbl = LLVMBuildBitCast(ctx->builder, trbl, ctx->f32, "");
result = LLVMBuildFSub(ctx->builder, trbl, tl, "");
if (HAVE_LLVM >= 0x0700) {
result = ac_build_intrinsic(ctx,
"llvm.amdgcn.wqm.f32", ctx->f32,
&result, 1, 0);
}
return result;
}
@@ -1367,66 +1380,41 @@ LLVMValueRef ac_build_umin(struct ac_llvm_context *ctx, LLVMValueRef a,
LLVMValueRef ac_build_clamp(struct ac_llvm_context *ctx, LLVMValueRef value)
{
if (HAVE_LLVM >= 0x0500) {
return ac_build_fmin(ctx, ac_build_fmax(ctx, value, ctx->f32_0),
ctx->f32_1);
}
LLVMValueRef args[3] = {
value,
LLVMConstReal(ctx->f32, 0),
LLVMConstReal(ctx->f32, 1),
};
return ac_build_intrinsic(ctx, "llvm.AMDGPU.clamp.", ctx->f32, args, 3,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
return ac_build_fmin(ctx, ac_build_fmax(ctx, value, ctx->f32_0),
ctx->f32_1);
}
void ac_build_export(struct ac_llvm_context *ctx, struct ac_export_args *a)
{
LLVMValueRef args[9];
if (HAVE_LLVM >= 0x0500) {
args[0] = LLVMConstInt(ctx->i32, a->target, 0);
args[1] = LLVMConstInt(ctx->i32, a->enabled_channels, 0);
args[0] = LLVMConstInt(ctx->i32, a->target, 0);
args[1] = LLVMConstInt(ctx->i32, a->enabled_channels, 0);
if (a->compr) {
LLVMTypeRef i16 = LLVMInt16TypeInContext(ctx->context);
LLVMTypeRef v2i16 = LLVMVectorType(i16, 2);
if (a->compr) {
LLVMTypeRef i16 = LLVMInt16TypeInContext(ctx->context);
LLVMTypeRef v2i16 = LLVMVectorType(i16, 2);
args[2] = LLVMBuildBitCast(ctx->builder, a->out[0],
v2i16, "");
args[3] = LLVMBuildBitCast(ctx->builder, a->out[1],
v2i16, "");
args[4] = LLVMConstInt(ctx->i1, a->done, 0);
args[5] = LLVMConstInt(ctx->i1, a->valid_mask, 0);
args[2] = LLVMBuildBitCast(ctx->builder, a->out[0],
v2i16, "");
args[3] = LLVMBuildBitCast(ctx->builder, a->out[1],
v2i16, "");
args[4] = LLVMConstInt(ctx->i1, a->done, 0);
args[5] = LLVMConstInt(ctx->i1, a->valid_mask, 0);
ac_build_intrinsic(ctx, "llvm.amdgcn.exp.compr.v2i16",
ctx->voidt, args, 6, 0);
} else {
args[2] = a->out[0];
args[3] = a->out[1];
args[4] = a->out[2];
args[5] = a->out[3];
args[6] = LLVMConstInt(ctx->i1, a->done, 0);
args[7] = LLVMConstInt(ctx->i1, a->valid_mask, 0);
ac_build_intrinsic(ctx, "llvm.amdgcn.exp.compr.v2i16",
ctx->voidt, args, 6, 0);
} else {
args[2] = a->out[0];
args[3] = a->out[1];
args[4] = a->out[2];
args[5] = a->out[3];
args[6] = LLVMConstInt(ctx->i1, a->done, 0);
args[7] = LLVMConstInt(ctx->i1, a->valid_mask, 0);
ac_build_intrinsic(ctx, "llvm.amdgcn.exp.f32",
ctx->voidt, args, 8, 0);
}
return;
ac_build_intrinsic(ctx, "llvm.amdgcn.exp.f32",
ctx->voidt, args, 8, 0);
}
args[0] = LLVMConstInt(ctx->i32, a->enabled_channels, 0);
args[1] = LLVMConstInt(ctx->i32, a->valid_mask, 0);
args[2] = LLVMConstInt(ctx->i32, a->done, 0);
args[3] = LLVMConstInt(ctx->i32, a->target, 0);
args[4] = LLVMConstInt(ctx->i32, a->compr, 0);
memcpy(args + 5, a->out, sizeof(a->out[0]) * 4);
ac_build_intrinsic(ctx, "llvm.SI.export", ctx->voidt, args, 9,
AC_FUNC_ATTR_LEGACY);
}
void ac_build_export_null(struct ac_llvm_context *ctx)
@@ -1485,8 +1473,26 @@ static unsigned ac_num_derivs(enum ac_image_dim dim)
}
}
LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
struct ac_image_args *a)
static const char *get_atomic_name(enum ac_atomic_op op)
{
switch (op) {
case ac_atomic_swap: return "swap";
case ac_atomic_add: return "add";
case ac_atomic_sub: return "sub";
case ac_atomic_smin: return "smin";
case ac_atomic_umin: return "umin";
case ac_atomic_smax: return "smax";
case ac_atomic_umax: return "umax";
case ac_atomic_and: return "and";
case ac_atomic_or: return "or";
case ac_atomic_xor: return "xor";
}
unreachable("bad atomic op");
}
/* LLVM 6 and older */
static LLVMValueRef ac_build_image_opcode_llvm6(struct ac_llvm_context *ctx,
struct ac_image_args *a)
{
LLVMValueRef args[16];
LLVMTypeRef retty = ctx->v4f32;
@@ -1494,16 +1500,6 @@ LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
const char *atomic_subop = "";
char intr_name[128], coords_type[64];
assert(!a->lod || a->lod == ctx->i32_0 || a->lod == ctx->f32_0 ||
!a->level_zero);
assert((a->opcode != ac_image_get_resinfo && a->opcode != ac_image_load_mip &&
a->opcode != ac_image_store_mip) ||
a->lod);
assert((a->bias ? 1 : 0) +
(a->lod ? 1 : 0) +
(a->level_zero ? 1 : 0) +
(a->derivs[0] ? 1 : 0) <= 1);
bool sample = a->opcode == ac_image_sample ||
a->opcode == ac_image_gather4 ||
a->opcode == ac_image_get_lod;
@@ -1521,6 +1517,20 @@ LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
LLVMValueRef addr;
unsigned num_addr = 0;
if (a->opcode == ac_image_get_lod) {
switch (a->dim) {
case ac_image_1darray:
num_coords = 1;
break;
case ac_image_2darray:
case ac_image_cube:
num_coords = 2;
break;
default:
break;
}
}
if (a->offset)
args[num_addr++] = ac_to_integer(ctx, a->offset);
if (a->bias)
@@ -1601,18 +1611,7 @@ LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
if (a->opcode == ac_image_atomic_cmpswap) {
atomic_subop = "cmpswap";
} else {
switch (a->atomic) {
case ac_atomic_swap: atomic_subop = "swap"; break;
case ac_atomic_add: atomic_subop = "add"; break;
case ac_atomic_sub: atomic_subop = "sub"; break;
case ac_atomic_smin: atomic_subop = "smin"; break;
case ac_atomic_umin: atomic_subop = "umin"; break;
case ac_atomic_smax: atomic_subop = "smax"; break;
case ac_atomic_umax: atomic_subop = "umax"; break;
case ac_atomic_and: atomic_subop = "and"; break;
case ac_atomic_or: atomic_subop = "or"; break;
case ac_atomic_xor: atomic_subop = "xor"; break;
}
atomic_subop = get_atomic_name(a->atomic);
}
break;
case ac_image_get_lod:
@@ -1656,22 +1655,175 @@ LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
return result;
}
LLVMValueRef ac_build_image_opcode(struct ac_llvm_context *ctx,
struct ac_image_args *a)
{
const char *overload[3] = { "", "", "" };
unsigned num_overloads = 0;
LLVMValueRef args[18];
unsigned num_args = 0;
enum ac_image_dim dim = a->dim;
assert(!a->lod || a->lod == ctx->i32_0 || a->lod == ctx->f32_0 ||
!a->level_zero);
assert((a->opcode != ac_image_get_resinfo && a->opcode != ac_image_load_mip &&
a->opcode != ac_image_store_mip) ||
a->lod);
assert(a->opcode == ac_image_sample || a->opcode == ac_image_gather4 ||
(!a->compare && !a->offset));
assert((a->opcode == ac_image_sample || a->opcode == ac_image_gather4 ||
a->opcode == ac_image_get_lod) ||
!a->bias);
assert((a->bias ? 1 : 0) +
(a->lod ? 1 : 0) +
(a->level_zero ? 1 : 0) +
(a->derivs[0] ? 1 : 0) <= 1);
if (HAVE_LLVM < 0x0700)
return ac_build_image_opcode_llvm6(ctx, a);
if (a->opcode == ac_image_get_lod) {
switch (dim) {
case ac_image_1darray:
dim = ac_image_1d;
break;
case ac_image_2darray:
case ac_image_cube:
dim = ac_image_2d;
break;
default:
break;
}
}
bool sample = a->opcode == ac_image_sample ||
a->opcode == ac_image_gather4 ||
a->opcode == ac_image_get_lod;
bool atomic = a->opcode == ac_image_atomic ||
a->opcode == ac_image_atomic_cmpswap;
LLVMTypeRef coord_type = sample ? ctx->f32 : ctx->i32;
if (atomic || a->opcode == ac_image_store || a->opcode == ac_image_store_mip) {
args[num_args++] = a->data[0];
if (a->opcode == ac_image_atomic_cmpswap)
args[num_args++] = a->data[1];
}
if (!atomic)
args[num_args++] = LLVMConstInt(ctx->i32, a->dmask, false);
if (a->offset)
args[num_args++] = ac_to_integer(ctx, a->offset);
if (a->bias) {
args[num_args++] = ac_to_float(ctx, a->bias);
overload[num_overloads++] = ".f32";
}
if (a->compare)
args[num_args++] = ac_to_float(ctx, a->compare);
if (a->derivs[0]) {
unsigned count = ac_num_derivs(dim);
for (unsigned i = 0; i < count; ++i)
args[num_args++] = ac_to_float(ctx, a->derivs[i]);
overload[num_overloads++] = ".f32";
}
unsigned num_coords =
a->opcode != ac_image_get_resinfo ? ac_num_coords(dim) : 0;
for (unsigned i = 0; i < num_coords; ++i)
args[num_args++] = LLVMBuildBitCast(ctx->builder, a->coords[i], coord_type, "");
if (a->lod)
args[num_args++] = LLVMBuildBitCast(ctx->builder, a->lod, coord_type, "");
overload[num_overloads++] = sample ? ".f32" : ".i32";
args[num_args++] = a->resource;
if (sample) {
args[num_args++] = a->sampler;
args[num_args++] = LLVMConstInt(ctx->i1, a->unorm, false);
}
args[num_args++] = ctx->i32_0; /* texfailctrl */
args[num_args++] = LLVMConstInt(ctx->i32, a->cache_policy, false);
const char *name;
const char *atomic_subop = "";
switch (a->opcode) {
case ac_image_sample: name = "sample"; break;
case ac_image_gather4: name = "gather4"; break;
case ac_image_load: name = "load"; break;
case ac_image_load_mip: name = "load.mip"; break;
case ac_image_store: name = "store"; break;
case ac_image_store_mip: name = "store.mip"; break;
case ac_image_atomic:
name = "atomic.";
atomic_subop = get_atomic_name(a->atomic);
break;
case ac_image_atomic_cmpswap:
name = "atomic.";
atomic_subop = "cmpswap";
break;
case ac_image_get_lod: name = "getlod"; break;
case ac_image_get_resinfo: name = "getresinfo"; break;
default: unreachable("invalid image opcode");
}
const char *dimname;
switch (dim) {
case ac_image_1d: dimname = "1d"; break;
case ac_image_2d: dimname = "2d"; break;
case ac_image_3d: dimname = "3d"; break;
case ac_image_cube: dimname = "cube"; break;
case ac_image_1darray: dimname = "1darray"; break;
case ac_image_2darray: dimname = "2darray"; break;
case ac_image_2dmsaa: dimname = "2dmsaa"; break;
case ac_image_2darraymsaa: dimname = "2darraymsaa"; break;
default: unreachable("invalid dim");
}
bool lod_suffix =
a->lod && (a->opcode == ac_image_sample || a->opcode == ac_image_gather4);
char intr_name[96];
snprintf(intr_name, sizeof(intr_name),
"llvm.amdgcn.image.%s%s" /* base name */
"%s%s%s" /* sample/gather modifiers */
".%s.%s%s%s%s", /* dimension and type overloads */
name, atomic_subop,
a->compare ? ".c" : "",
a->bias ? ".b" :
lod_suffix ? ".l" :
a->derivs[0] ? ".d" :
a->level_zero ? ".lz" : "",
a->offset ? ".o" : "",
dimname,
atomic ? "i32" : "v4f32",
overload[0], overload[1], overload[2]);
LLVMTypeRef retty;
if (atomic)
retty = ctx->i32;
else if (a->opcode == ac_image_store || a->opcode == ac_image_store_mip)
retty = ctx->voidt;
else
retty = ctx->v4f32;
LLVMValueRef result =
ac_build_intrinsic(ctx, intr_name, retty, args, num_args,
a->attributes);
if (!sample && retty == ctx->v4f32) {
result = LLVMBuildBitCast(ctx->builder, result,
ctx->v4i32, "");
}
return result;
}
LLVMValueRef ac_build_cvt_pkrtz_f16(struct ac_llvm_context *ctx,
LLVMValueRef args[2])
{
if (HAVE_LLVM >= 0x0500) {
LLVMTypeRef v2f16 =
LLVMVectorType(LLVMHalfTypeInContext(ctx->context), 2);
LLVMValueRef res =
ac_build_intrinsic(ctx, "llvm.amdgcn.cvt.pkrtz",
v2f16, args, 2,
AC_FUNC_ATTR_READNONE);
return LLVMBuildBitCast(ctx->builder, res, ctx->i32, "");
}
return ac_build_intrinsic(ctx, "llvm.SI.packf16", ctx->i32, args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
LLVMTypeRef v2f16 =
LLVMVectorType(LLVMHalfTypeInContext(ctx->context), 2);
LLVMValueRef res =
ac_build_intrinsic(ctx, "llvm.amdgcn.cvt.pkrtz",
v2f16, args, 2,
AC_FUNC_ATTR_READNONE);
return LLVMBuildBitCast(ctx->builder, res, ctx->i32, "");
}
/* Upper 16 bits must be zero. */
@@ -1855,20 +2007,11 @@ LLVMValueRef ac_build_bfe(struct ac_llvm_context *ctx, LLVMValueRef input,
width,
};
if (HAVE_LLVM >= 0x0500) {
return ac_build_intrinsic(ctx,
is_signed ? "llvm.amdgcn.sbfe.i32" :
"llvm.amdgcn.ubfe.i32",
ctx->i32, args, 3,
AC_FUNC_ATTR_READNONE);
}
return ac_build_intrinsic(ctx,
is_signed ? "llvm.AMDGPU.bfe.i32" :
"llvm.AMDGPU.bfe.u32",
is_signed ? "llvm.amdgcn.sbfe.i32" :
"llvm.amdgcn.ubfe.i32",
ctx->i32, args, 3,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_LEGACY);
AC_FUNC_ATTR_READNONE);
}
void ac_build_waitcnt(struct ac_llvm_context *ctx, unsigned simm16)
@@ -1948,9 +2091,9 @@ LLVMValueRef ac_build_fsign(struct ac_llvm_context *ctx, LLVMValueRef src0,
return val;
}
#define AC_EXP_TARGET (HAVE_LLVM >= 0x0500 ? 0 : 3)
#define AC_EXP_ENABLED_CHANNELS (HAVE_LLVM >= 0x0500 ? 1 : 0)
#define AC_EXP_OUT0 (HAVE_LLVM >= 0x0500 ? 2 : 5)
#define AC_EXP_TARGET 0
#define AC_EXP_ENABLED_CHANNELS 1
#define AC_EXP_OUT0 2
enum ac_ir_type {
AC_IR_UNDEF,
@@ -2604,11 +2747,13 @@ void ac_apply_fmask_to_sample(struct ac_llvm_context *ac, LLVMValueRef fmask,
final_sample = LLVMBuildMul(ac->builder, addr[sample_chan],
LLVMConstInt(ac->i32, 4, 0), "");
final_sample = LLVMBuildLShr(ac->builder, fmask_value, final_sample, "");
/* Mask the sample index by 0x7, because 0x8 means an unknown value
* with EQAA, so those will map to 0. */
final_sample = LLVMBuildAnd(ac->builder, final_sample,
LLVMConstInt(ac->i32, 0xF, 0), "");
LLVMConstInt(ac->i32, 0x7, 0), "");
/* Don't rewrite the sample index if WORD1.DATA_FORMAT of the FMASK
* resource descriptor is 0 (invalid),
* resource descriptor is 0 (invalid).
*/
LLVMValueRef tmp;
tmp = LLVMBuildBitCast(ac->builder, fmask, ac->v8i32, "");
@@ -2852,7 +2997,7 @@ static LLVMValueRef
ac_build_set_inactive(struct ac_llvm_context *ctx, LLVMValueRef src,
LLVMValueRef inactive)
{
char name[32], type[8];
char name[33], type[8];
LLVMTypeRef src_type = LLVMTypeOf(src);
src = ac_to_integer(ctx, src);
inactive = ac_to_integer(ctx, inactive);

View File

@@ -37,22 +37,10 @@
#include <llvm/IR/CallSite.h>
#include <llvm/IR/IRBuilder.h>
#if HAVE_LLVM < 0x0500
namespace llvm {
typedef AttributeSet AttributeList;
}
#endif
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
{
llvm::Argument *A = llvm::unwrap<llvm::Argument>(val);
#if HAVE_LLVM < 0x0500
llvm::AttrBuilder B;
B.addDereferenceableAttr(bytes);
A->addAttr(llvm::AttributeList::get(A->getContext(), A->getArgNo() + 1, B));
#else
A->addAttr(llvm::Attribute::getWithDereferenceableBytes(A->getContext(), bytes));
#endif
}
bool ac_is_sgpr_param(LLVMValueRef arg)

View File

@@ -115,15 +115,19 @@ const char *ac_get_llvm_processor_name(enum radeon_family family)
case CHIP_VEGAM:
return "polaris11";
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_RAVEN:
return "gfx900";
case CHIP_RAVEN:
return "gfx902";
case CHIP_VEGA12:
return HAVE_LLVM >= 0x0700 ? "gfx904" : "gfx902";
default:
return "";
}
}
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac_target_machine_options tm_options)
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
enum ac_target_machine_options tm_options,
const char **out_triple)
{
assert(family >= CHIP_TAHITI);
char features[256];
@@ -146,6 +150,8 @@ LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac
LLVMRelocDefault,
LLVMCodeModelDefault);
if (out_triple)
*out_triple = triple;
return tm;
}

View File

@@ -68,7 +68,9 @@ enum ac_float_mode {
};
const char *ac_get_llvm_processor_name(enum radeon_family family);
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac_target_machine_options tm_options);
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
enum ac_target_machine_options tm_options,
const char **out_triple);
LLVMTargetRef ac_get_llvm_target(const char *triple);
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes);

View File

@@ -87,7 +87,6 @@ get_ac_sampler_dim(const struct ac_llvm_context *ctx, enum glsl_sampler_dim dim,
return is_array ? ac_image_1darray : ac_image_1d;
case GLSL_SAMPLER_DIM_2D:
case GLSL_SAMPLER_DIM_RECT:
case GLSL_SAMPLER_DIM_SUBPASS:
case GLSL_SAMPLER_DIM_EXTERNAL:
return is_array ? ac_image_2darray : ac_image_2d;
case GLSL_SAMPLER_DIM_3D:
@@ -95,8 +94,11 @@ get_ac_sampler_dim(const struct ac_llvm_context *ctx, enum glsl_sampler_dim dim,
case GLSL_SAMPLER_DIM_CUBE:
return ac_image_cube;
case GLSL_SAMPLER_DIM_MS:
case GLSL_SAMPLER_DIM_SUBPASS_MS:
return is_array ? ac_image_2darraymsaa : ac_image_2dmsaa;
case GLSL_SAMPLER_DIM_SUBPASS:
return ac_image_2darray;
case GLSL_SAMPLER_DIM_SUBPASS_MS:
return ac_image_2darraymsaa;
default:
unreachable("bad sampler dim");
}
@@ -1307,6 +1309,14 @@ static LLVMValueRef build_tex_intrinsic(struct ac_nir_context *ctx,
}
}
/* Fixup for GFX9 which allocates 1D textures as 2D. */
if (instr->op == nir_texop_lod && ctx->ac.chip_class >= GFX9) {
if ((args->dim == ac_image_2darray ||
args->dim == ac_image_2d) && !args->coords[1]) {
args->coords[1] = ctx->ac.i32_0;
}
}
args->attributes = AC_FUNC_ATTR_READNONE;
return ac_build_image_opcode(&ctx->ac, args);
}
@@ -1562,6 +1572,11 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx,
LLVMConstInt(ctx->ac.i32, 6, false), LLVMConstInt(ctx->ac.i32, 7, false)
};
if (num_components == 6) {
/* we end up with a v4f32 and v2f32 but shuffle fails on that */
results[1] = ac_build_expand_to_vec4(&ctx->ac, results[1], 4);
}
LLVMValueRef swizzle = LLVMConstVector(masks, num_components);
ret = LLVMBuildShuffleVector(ctx->ac.builder, results[0],
results[num_components > 4 ? 1 : 0], swizzle, "");
@@ -2090,18 +2105,6 @@ static LLVMValueRef adjust_sample_index_using_fmask(struct ac_llvm_context *ctx,
return sample_index;
}
static bool
glsl_is_array_image(const struct glsl_type *type)
{
const enum glsl_sampler_dim dim = glsl_get_sampler_dim(type);
if (glsl_sampler_type_is_array(type))
return true;
return dim == GLSL_SAMPLER_DIM_SUBPASS ||
dim == GLSL_SAMPLER_DIM_SUBPASS_MS;
}
static void get_image_coords(struct ac_nir_context *ctx,
const nir_intrinsic_instr *instr,
struct ac_image_args *args)
@@ -2247,7 +2250,7 @@ static LLVMValueRef visit_image_load(struct ac_nir_context *ctx,
args.resource = get_sampler_desc(ctx, instr->variables[0],
AC_DESC_IMAGE, NULL, true, false);
args.dim = get_ac_image_dim(&ctx->ac, glsl_get_sampler_dim(type),
glsl_is_array_image(type));
glsl_sampler_type_is_array(type));
args.dmask = 15;
args.attributes = AC_FUNC_ATTR_READONLY;
if (var->data.image._volatile || var->data.image.coherent)
@@ -2290,7 +2293,7 @@ static void visit_image_store(struct ac_nir_context *ctx,
args.resource = get_sampler_desc(ctx, instr->variables[0],
AC_DESC_IMAGE, NULL, true, false);
args.dim = get_ac_image_dim(&ctx->ac, glsl_get_sampler_dim(type),
glsl_is_array_image(type));
glsl_sampler_type_is_array(type));
args.dmask = 15;
if (force_glc || var->data.image._volatile || var->data.image.coherent)
args.cache_policy |= ac_glc;
@@ -2381,7 +2384,7 @@ static LLVMValueRef visit_image_atomic(struct ac_nir_context *ctx,
args.resource = get_sampler_desc(ctx, instr->variables[0],
AC_DESC_IMAGE, NULL, true, false);
args.dim = get_ac_image_dim(&ctx->ac, glsl_get_sampler_dim(type),
glsl_is_array_image(type));
glsl_sampler_type_is_array(type));
return ac_build_image_opcode(&ctx->ac, &args);
}
@@ -3397,6 +3400,13 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr)
}
/* Texture coordinates fixups */
if (instr->coord_components > 1 &&
instr->sampler_dim == GLSL_SAMPLER_DIM_1D &&
instr->is_array &&
instr->op != nir_texop_txf) {
args.coords[1] = apply_round_slice(&ctx->ac, args.coords[1]);
}
if (instr->coord_components > 2 &&
(instr->sampler_dim == GLSL_SAMPLER_DIM_2D ||
instr->sampler_dim == GLSL_SAMPLER_DIM_MS ||

View File

@@ -227,8 +227,16 @@ ADDR_HANDLE amdgpu_addr_create(const struct radeon_info *info,
return addrCreateOutput.hLib;
}
static int surf_config_sanity(const struct ac_surf_config *config)
static int surf_config_sanity(const struct ac_surf_config *config,
unsigned flags)
{
/* FMASK is allocated together with the color surface and can't be
* allocated separately.
*/
assert(!(flags & RADEON_SURF_FMASK));
if (flags & RADEON_SURF_FMASK)
return -EINVAL;
/* all dimension must be at least 1 ! */
if (!config->info.width || !config->info.height || !config->info.depth ||
!config->info.array_size || !config->info.levels)
@@ -241,10 +249,27 @@ static int surf_config_sanity(const struct ac_surf_config *config)
case 4:
case 8:
break;
case 16:
if (flags & RADEON_SURF_Z_OR_SBUFFER)
return -EINVAL;
break;
default:
return -EINVAL;
}
if (!(flags & RADEON_SURF_Z_OR_SBUFFER)) {
switch (config->info.color_samples) {
case 0:
case 1:
case 2:
case 4:
case 8:
break;
default:
return -EINVAL;
}
}
if (config->is_3d && config->info.array_size > 1)
return -EINVAL;
if (config->is_cube && config->info.depth > 1)
@@ -276,10 +301,10 @@ static int gfx6_compute_level(ADDR_HANDLE addrlib,
*/
if (config->info.levels == 1 &&
AddrSurfInfoIn->tileMode == ADDR_TM_LINEAR_ALIGNED &&
AddrSurfInfoIn->bpp) {
AddrSurfInfoIn->bpp &&
util_is_power_of_two_or_zero(AddrSurfInfoIn->bpp)) {
unsigned alignment = 256 / (AddrSurfInfoIn->bpp / 8);
assert(util_is_power_of_two_or_zero(AddrSurfInfoIn->bpp));
AddrSurfInfoIn->width = align(AddrSurfInfoIn->width, alignment);
}
@@ -343,6 +368,9 @@ static int gfx6_compute_level(ADDR_HANDLE addrlib,
/* The previous level's flag tells us if we can use DCC for this level. */
if (AddrSurfInfoIn->flags.dccCompatible &&
(level == 0 || AddrDccOut->subLvlCompressible)) {
bool prev_level_clearable = level == 0 ||
AddrDccOut->dccRamSizeAligned;
AddrDccIn->colorSurfSize = AddrSurfInfoOut->surfSize;
AddrDccIn->tileMode = AddrSurfInfoOut->tileMode;
AddrDccIn->tileInfo = *AddrSurfInfoOut->pTileInfo;
@@ -355,10 +383,26 @@ static int gfx6_compute_level(ADDR_HANDLE addrlib,
if (ret == ADDR_OK) {
surf_level->dcc_offset = surf->dcc_size;
surf_level->dcc_fast_clear_size = AddrDccOut->dccFastClearSize;
surf->num_dcc_levels = level + 1;
surf->dcc_size = surf_level->dcc_offset + AddrDccOut->dccRamSize;
surf->dcc_alignment = MAX2(surf->dcc_alignment, AddrDccOut->dccRamBaseAlign);
/* If the DCC size of a subresource (1 mip level or 1 slice)
* is not aligned, the DCC memory layout is not contiguous for
* that subresource, which means we can't use fast clear.
*
* We only do fast clears for whole mipmap levels. If we did
* per-slice fast clears, the same restriction would apply.
* (i.e. only compute the slice size and see if it's aligned)
*
* The last level can be non-contiguous and still be clearable
* if it's interleaved with the next level that doesn't exist.
*/
if (AddrDccOut->dccRamSizeAligned ||
(prev_level_clearable && level == config->info.levels - 1))
surf_level->dcc_fast_clear_size = AddrDccOut->dccFastClearSize;
else
surf_level->dcc_fast_clear_size = 0;
}
}
@@ -426,7 +470,6 @@ static bool get_display_flag(const struct ac_surf_config *config,
unsigned bpe = surf->bpe;
if (surf->flags & RADEON_SURF_SCANOUT &&
!(surf->flags & RADEON_SURF_FMASK) &&
config->info.samples <= 1 &&
surf->blk_w <= 2 && surf->blk_h == 1) {
/* subsampled */
@@ -537,9 +580,8 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
compressed = surf->blk_w == 4 && surf->blk_h == 4;
/* MSAA and FMASK require 2D tiling. */
if (config->info.samples > 1 ||
(surf->flags & RADEON_SURF_FMASK))
/* MSAA requires 2D tiling. */
if (config->info.samples > 1)
mode = RADEON_SURF_MODE_2D;
/* DB doesn't support linear layouts. */
@@ -582,13 +624,18 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
}
AddrDccIn.numSamples = AddrSurfInfoIn.numSamples =
config->info.samples ? config->info.samples : 1;
MAX2(1, config->info.samples);
AddrSurfInfoIn.tileIndex = -1;
if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) {
AddrDccIn.numSamples = AddrSurfInfoIn.numFrags =
MAX2(1, config->info.color_samples);
}
/* Set the micro tile type. */
if (surf->flags & RADEON_SURF_SCANOUT)
AddrSurfInfoIn.tileType = ADDR_DISPLAYABLE;
else if (surf->flags & (RADEON_SURF_Z_OR_SBUFFER | RADEON_SURF_FMASK))
else if (surf->flags & RADEON_SURF_Z_OR_SBUFFER)
AddrSurfInfoIn.tileType = ADDR_DEPTH_SAMPLE_ORDER;
else
AddrSurfInfoIn.tileType = ADDR_NON_DISPLAYABLE;
@@ -596,7 +643,6 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.cube = config->is_cube;
AddrSurfInfoIn.flags.fmask = (surf->flags & RADEON_SURF_FMASK) != 0;
AddrSurfInfoIn.flags.display = get_display_flag(config, surf);
AddrSurfInfoIn.flags.pow2Pad = config->info.levels > 1;
AddrSurfInfoIn.flags.tcCompatible = (surf->flags & RADEON_SURF_TC_COMPATIBLE_HTILE) != 0;
@@ -624,7 +670,7 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
config->info.levels == 1);
AddrSurfInfoIn.flags.noStencil = (surf->flags & RADEON_SURF_SBUFFER) == 0;
AddrSurfInfoIn.flags.compressZ = AddrSurfInfoIn.flags.depth;
AddrSurfInfoIn.flags.compressZ = !!(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
/* On CI/VI, the DB uses the same pitch and tile mode (except tilesplit)
* for Z and stencil. This can cause a number of problems which we work
@@ -661,8 +707,6 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
if (AddrSurfInfoIn.tileMode >= ADDR_TM_2D_TILED_THIN1 &&
surf->u.legacy.bankw && surf->u.legacy.bankh &&
surf->u.legacy.mtilea && surf->u.legacy.tile_split) {
assert(!(surf->flags & RADEON_SURF_FMASK));
/* If any of these parameters are incorrect, the calculation
* will fail. */
AddrTileInfoIn.banks = surf->u.legacy.num_banks;
@@ -809,6 +853,67 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
}
}
/* Compute FMASK. */
if (config->info.samples >= 2 && AddrSurfInfoIn.flags.color) {
ADDR_COMPUTE_FMASK_INFO_INPUT fin = {0};
ADDR_COMPUTE_FMASK_INFO_OUTPUT fout = {0};
ADDR_TILEINFO fmask_tile_info = {};
fin.size = sizeof(fin);
fout.size = sizeof(fout);
fin.tileMode = AddrSurfInfoOut.tileMode;
fin.pitch = AddrSurfInfoOut.pitch;
fin.height = config->info.height;
fin.numSlices = AddrSurfInfoIn.numSlices;
fin.numSamples = AddrSurfInfoIn.numSamples;
fin.numFrags = AddrSurfInfoIn.numFrags;
fin.tileIndex = -1;
fout.pTileInfo = &fmask_tile_info;
r = AddrComputeFmaskInfo(addrlib, &fin, &fout);
if (r)
return r;
surf->fmask_size = fout.fmaskBytes;
surf->fmask_alignment = fout.baseAlign;
surf->fmask_tile_swizzle = 0;
surf->u.legacy.fmask.slice_tile_max =
(fout.pitch * fout.height) / 64;
if (surf->u.legacy.fmask.slice_tile_max)
surf->u.legacy.fmask.slice_tile_max -= 1;
surf->u.legacy.fmask.tiling_index = fout.tileIndex;
surf->u.legacy.fmask.bankh = fout.pTileInfo->bankHeight;
surf->u.legacy.fmask.pitch_in_pixels = fout.pitch;
/* Compute tile swizzle for FMASK. */
if (config->info.fmask_surf_index &&
!(surf->flags & RADEON_SURF_SHAREABLE)) {
ADDR_COMPUTE_BASE_SWIZZLE_INPUT xin = {0};
ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT xout = {0};
xin.size = sizeof(ADDR_COMPUTE_BASE_SWIZZLE_INPUT);
xout.size = sizeof(ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT);
/* This counter starts from 1 instead of 0. */
xin.surfIndex = p_atomic_inc_return(config->info.fmask_surf_index);
xin.tileIndex = fout.tileIndex;
xin.macroModeIndex = fout.macroModeIndex;
xin.pTileInfo = fout.pTileInfo;
xin.tileMode = fin.tileMode;
int r = AddrComputeBaseSwizzle(addrlib, &xin, &xout);
if (r != ADDR_OK)
return r;
assert(xout.tileSwizzle <=
u_bit_consecutive(0, sizeof(surf->tile_swizzle) * 8));
surf->fmask_tile_swizzle = xout.tileSwizzle;
}
}
/* Recalculate the whole DCC miptree size including disabled levels.
* This is what addrlib does, but calling addrlib would be a lot more
* complicated.
@@ -829,8 +934,17 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
/* Make sure HTILE covers the whole miptree, because the shader reads
* TC-compatible HTILE even for levels where it's disabled by DB.
*/
if (surf->htile_size && config->info.levels > 1)
surf->htile_size *= 2;
if (surf->htile_size && config->info.levels > 1 &&
surf->flags & RADEON_SURF_TC_COMPATIBLE_HTILE) {
/* MSAA can't occur with levels > 1, so ignore the sample count. */
const unsigned total_pixels = surf->surf_size / surf->bpe;
const unsigned htile_block_size = 8 * 8;
const unsigned htile_element_size = 4;
surf->htile_size = (total_pixels / htile_block_size) *
htile_element_size;
surf->htile_size = align(surf->htile_size, surf->htile_alignment);
}
surf->is_linear = surf->u.legacy.level[0].mode == RADEON_SURF_MODE_LINEAR_ALIGNED;
surf->is_displayable = surf->is_linear ||
@@ -1095,8 +1209,8 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
surf->u.gfx9.fmask.swizzle_mode = fin.swizzleMode;
surf->u.gfx9.fmask.epitch = fout.pitch - 1;
surf->u.gfx9.fmask_size = fout.fmaskBytes;
surf->u.gfx9.fmask_alignment = fout.baseAlign;
surf->fmask_size = fout.fmaskBytes;
surf->fmask_alignment = fout.baseAlign;
/* Compute tile swizzle for the FMASK surface. */
if (config->info.fmask_surf_index &&
@@ -1122,8 +1236,8 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
return ret;
assert(xout.pipeBankXor <=
u_bit_consecutive(0, sizeof(surf->u.gfx9.fmask_tile_swizzle) * 8));
surf->u.gfx9.fmask_tile_swizzle = xout.pipeBankXor;
u_bit_consecutive(0, sizeof(surf->fmask_tile_swizzle) * 8));
surf->fmask_tile_swizzle = xout.pipeBankXor;
}
}
@@ -1135,7 +1249,7 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
cin.size = sizeof(ADDR2_COMPUTE_CMASK_INFO_INPUT);
cout.size = sizeof(ADDR2_COMPUTE_CMASK_INFO_OUTPUT);
if (in->numSamples) {
if (in->numSamples > 1) {
/* FMASK is always aligned. */
cin.cMaskFlags.pipeAligned = 1;
cin.cMaskFlags.rbAligned = 1;
@@ -1178,8 +1292,6 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
ADDR2_COMPUTE_SURFACE_INFO_INPUT AddrSurfInfoIn = {0};
int r;
assert(!(surf->flags & RADEON_SURF_FMASK));
AddrSurfInfoIn.size = sizeof(ADDR2_COMPUTE_SURFACE_INFO_INPUT);
compressed = surf->blk_w == 4 && surf->blk_h == 4;
@@ -1217,6 +1329,10 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
AddrSurfInfoIn.format = ADDR_FMT_32_32;
break;
case 12:
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
AddrSurfInfoIn.format = ADDR_FMT_32_32_32;
break;
case 16:
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
AddrSurfInfoIn.format = ADDR_FMT_32_32_32_32;
@@ -1236,9 +1352,12 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
AddrSurfInfoIn.flags.opt4space = 1;
AddrSurfInfoIn.numMipLevels = config->info.levels;
AddrSurfInfoIn.numSamples = config->info.samples ? config->info.samples : 1;
AddrSurfInfoIn.numSamples = MAX2(1, config->info.samples);
AddrSurfInfoIn.numFrags = AddrSurfInfoIn.numSamples;
if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER))
AddrSurfInfoIn.numFrags = MAX2(1, config->info.color_samples);
/* GFX9 doesn't support 1D depth textures, so allocate all 1D textures
* as 2D to avoid having shader variants for 1D vs 2D, so all shaders
* must sample 1D textures as 2D. */
@@ -1291,12 +1410,12 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
surf->num_dcc_levels = 0;
surf->surf_size = 0;
surf->fmask_size = 0;
surf->dcc_size = 0;
surf->htile_size = 0;
surf->htile_slice_size = 0;
surf->u.gfx9.surf_offset = 0;
surf->u.gfx9.stencil_offset = 0;
surf->u.gfx9.fmask_size = 0;
surf->u.gfx9.cmask_size = 0;
/* Calculate texture layout information. */
@@ -1391,7 +1510,7 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
/* Temporary workaround to prevent VM faults and hangs. */
if (info->family == CHIP_VEGA12)
surf->u.gfx9.fmask_size *= 8;
surf->fmask_size *= 8;
return 0;
}
@@ -1403,7 +1522,7 @@ int ac_compute_surface(ADDR_HANDLE addrlib, const struct radeon_info *info,
{
int r;
r = surf_config_sanity(config);
r = surf_config_sanity(config, surf->flags);
if (r)
return r;

View File

@@ -79,6 +79,13 @@ struct legacy_surf_level {
enum radeon_surf_mode mode:2;
};
struct legacy_surf_fmask {
unsigned slice_tile_max; /* max 4M */
uint8_t tiling_index; /* max 31 */
uint8_t bankh; /* max 8 */
uint16_t pitch_in_pixels;
};
struct legacy_surf_layout {
unsigned bankw:4; /* max 8 */
unsigned bankh:4; /* max 8 */
@@ -101,6 +108,7 @@ struct legacy_surf_layout {
struct legacy_surf_level stencil_level[RADEON_SURF_MAX_LEVELS];
uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
uint8_t stencil_tiling_index[RADEON_SURF_MAX_LEVELS];
struct legacy_surf_fmask fmask;
};
/* Same as addrlib - AddrResourceType. */
@@ -142,13 +150,9 @@ struct gfx9_surf_layout {
uint16_t dcc_pitch_max; /* (mip chain pitch - 1) */
uint64_t stencil_offset; /* separate stencil */
uint64_t fmask_size;
uint64_t cmask_size;
uint32_t fmask_alignment;
uint32_t cmask_alignment;
uint8_t fmask_tile_swizzle;
};
struct radeon_surf {
@@ -188,8 +192,10 @@ struct radeon_surf {
* - depth/stencil if HTILE is not TC-compatible and if the gen is not GFX9
*/
uint8_t tile_swizzle;
uint8_t fmask_tile_swizzle;
uint64_t surf_size;
uint64_t fmask_size;
/* DCC and HTILE are very small. */
uint32_t dcc_size;
uint32_t htile_size;
@@ -197,6 +203,7 @@ struct radeon_surf {
uint32_t htile_slice_size;
uint32_t surf_alignment;
uint32_t fmask_alignment;
uint32_t dcc_alignment;
uint32_t htile_alignment;
@@ -217,12 +224,13 @@ struct ac_surf_info {
uint32_t width;
uint32_t height;
uint32_t depth;
uint8_t samples;
uint8_t samples; /* For Z/S: samples; For color: FMASK coverage samples */
uint8_t color_samples; /* For color: color samples */
uint8_t levels;
uint8_t num_channels; /* heuristic for displayability */
uint16_t array_size;
uint32_t *surf_index; /* Set a monotonic counter for tile swizzling. */
uint32_t *fmask_surf_index; /* GFX9+ */
uint32_t *fmask_surf_index;
};
struct ac_surf_config {

View File

@@ -1123,7 +1123,6 @@
#define S_030960_HW_USE_ONLY(x) (((unsigned)(x) & 0x1) << 23)
#define G_030960_HW_USE_ONLY(x) (((x) >> 23) & 0x1)
#define C_030960_HW_USE_ONLY 0xFF7FFFFF
#define R_030964_VGT_OBJECT_ID 0x030964
#define R_030968_VGT_INSTANCE_BASE_ID 0x030968
#define R_030A00_PA_SU_LINE_STIPPLE_VALUE 0x030A00
#define S_030A00_LINE_STIPPLE_VALUE(x) (((unsigned)(x) & 0xFFFFFF) << 0)
@@ -1195,19 +1194,6 @@
#define S_030E04_ADDRESS(x) (((unsigned)(x) & 0xFF) << 0)
#define G_030E04_ADDRESS(x) (((x) >> 0) & 0xFF)
#define C_030E04_ADDRESS 0xFFFFFF00
#define R_030E08_TA_GRAD_ADJ_UCONFIG 0x030E08
#define S_030E08_GRAD_ADJ_0(x) (((unsigned)(x) & 0xFF) << 0)
#define G_030E08_GRAD_ADJ_0(x) (((x) >> 0) & 0xFF)
#define C_030E08_GRAD_ADJ_0 0xFFFFFF00
#define S_030E08_GRAD_ADJ_1(x) (((unsigned)(x) & 0xFF) << 8)
#define G_030E08_GRAD_ADJ_1(x) (((x) >> 8) & 0xFF)
#define C_030E08_GRAD_ADJ_1 0xFFFF00FF
#define S_030E08_GRAD_ADJ_2(x) (((unsigned)(x) & 0xFF) << 16)
#define G_030E08_GRAD_ADJ_2(x) (((x) >> 16) & 0xFF)
#define C_030E08_GRAD_ADJ_2 0xFF00FFFF
#define S_030E08_GRAD_ADJ_3(x) (((unsigned)(x) & 0xFF) << 24)
#define G_030E08_GRAD_ADJ_3(x) (((x) >> 24) & 0xFF)
#define C_030E08_GRAD_ADJ_3 0x00FFFFFF
#define R_030F00_DB_OCCLUSION_COUNT0_LOW 0x030F00
#define R_008F00_SQ_BUF_RSRC_WORD0 0x008F00
#define R_030F04_DB_OCCLUSION_COUNT0_HI 0x030F04
@@ -4084,10 +4070,6 @@
#define S_028060_DISALLOW_OVERFLOW(x) (((unsigned)(x) & 0x1) << 3)
#define G_028060_DISALLOW_OVERFLOW(x) (((x) >> 3) & 0x1)
#define C_028060_DISALLOW_OVERFLOW 0xFFFFFFF7
#define R_028064_DB_RENDER_FILTER 0x028064
#define S_028064_PS_INVOKE_MASK(x) (((unsigned)(x) & 0xFFFF) << 0)
#define G_028064_PS_INVOKE_MASK(x) (((x) >> 0) & 0xFFFF)
#define C_028064_PS_INVOKE_MASK 0xFFFF0000
#define R_028068_DB_Z_INFO2 0x028068
#define S_028068_EPITCH(x) (((unsigned)(x) & 0xFFFF) << 0)
#define G_028068_EPITCH(x) (((x) >> 0) & 0xFFFF)
@@ -4417,9 +4399,6 @@
#define S_02835C_NUM_RB_PER_SE(x) (((unsigned)(x) & 0x03) << 5)
#define G_02835C_NUM_RB_PER_SE(x) (((x) >> 5) & 0x03)
#define C_02835C_NUM_RB_PER_SE 0xFFFFFF9F
#define S_02835C_DISABLE_SRBSL_DB_OPTIMIZED_PACKING(x) (((unsigned)(x) & 0x1) << 8)
#define G_02835C_DISABLE_SRBSL_DB_OPTIMIZED_PACKING(x) (((x) >> 8) & 0x1)
#define C_02835C_DISABLE_SRBSL_DB_OPTIMIZED_PACKING 0xFFFFFEFF
#define R_028360_CP_PERFMON_CNTX_CNTL 0x028360
#define S_028360_PERFMON_ENABLE(x) (((unsigned)(x) & 0x1) << 31)
#define G_028360_PERFMON_ENABLE(x) (((x) >> 31) & 0x1)
@@ -4463,26 +4442,6 @@
#define S_0283A8_BOT_QTR(x) (((unsigned)(x) & 0xFF) << 24)
#define G_0283A8_BOT_QTR(x) (((x) >> 24) & 0xFF)
#define C_0283A8_BOT_QTR 0x00FFFFFF
#define R_0283AC_PA_SC_FOV_WINDOW_LR 0x0283AC
#define S_0283AC_LEFT_EYE_FOV_LEFT(x) (((unsigned)(x) & 0xFF) << 0)
#define G_0283AC_LEFT_EYE_FOV_LEFT(x) (((x) >> 0) & 0xFF)
#define C_0283AC_LEFT_EYE_FOV_LEFT 0xFFFFFF00
#define S_0283AC_LEFT_EYE_FOV_RIGHT(x) (((unsigned)(x) & 0xFF) << 8)
#define G_0283AC_LEFT_EYE_FOV_RIGHT(x) (((x) >> 8) & 0xFF)
#define C_0283AC_LEFT_EYE_FOV_RIGHT 0xFFFF00FF
#define S_0283AC_RIGHT_EYE_FOV_LEFT(x) (((unsigned)(x) & 0xFF) << 16)
#define G_0283AC_RIGHT_EYE_FOV_LEFT(x) (((x) >> 16) & 0xFF)
#define C_0283AC_RIGHT_EYE_FOV_LEFT 0xFF00FFFF
#define S_0283AC_RIGHT_EYE_FOV_RIGHT(x) (((unsigned)(x) & 0xFF) << 24)
#define G_0283AC_RIGHT_EYE_FOV_RIGHT(x) (((x) >> 24) & 0xFF)
#define C_0283AC_RIGHT_EYE_FOV_RIGHT 0x00FFFFFF
#define R_0283B0_PA_SC_FOV_WINDOW_TB 0x0283B0
#define S_0283B0_FOV_TOP(x) (((unsigned)(x) & 0xFF) << 0)
#define G_0283B0_FOV_TOP(x) (((x) >> 0) & 0xFF)
#define C_0283B0_FOV_TOP 0xFFFFFF00
#define S_0283B0_FOV_BOT(x) (((unsigned)(x) & 0xFF) << 8)
#define G_0283B0_FOV_BOT(x) (((x) >> 8) & 0xFF)
#define C_0283B0_FOV_BOT 0xFFFF00FF
#define R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX 0x02840C
#define R_028414_CB_BLEND_RED 0x028414
#define R_028418_CB_BLEND_GREEN 0x028418
@@ -5772,9 +5731,6 @@
#define S_028830_RECTANGLE_FILTER_DISABLE(x) (((unsigned)(x) & 0x1) << 4)
#define G_028830_RECTANGLE_FILTER_DISABLE(x) (((x) >> 4) & 0x1)
#define C_028830_RECTANGLE_FILTER_DISABLE 0xFFFFFFEF
#define S_028830_SRBSL_ENABLE(x) (((unsigned)(x) & 0x1) << 5)
#define G_028830_SRBSL_ENABLE(x) (((x) >> 5) & 0x1)
#define C_028830_SRBSL_ENABLE 0xFFFFFFDF
#define R_028834_PA_CL_OBJPRIM_ID_CNTL 0x028834
#define S_028834_OBJ_ID_SEL(x) (((unsigned)(x) & 0x1) << 0)
#define G_028834_OBJ_ID_SEL(x) (((x) >> 0) & 0x1)
@@ -6273,10 +6229,6 @@
#define S_028A98_OBJECT_ID_INST_EN(x) (((unsigned)(x) & 0x1) << 3)
#define G_028A98_OBJECT_ID_INST_EN(x) (((x) >> 3) & 0x1)
#define C_028A98_OBJECT_ID_INST_EN 0xFFFFFFF7
#define R_028A9C_VGT_INDEX_PAYLOAD_CNTL 0x028A9C
#define S_028A9C_COMPOUND_INDEX_EN(x) (((unsigned)(x) & 0x1) << 0)
#define G_028A9C_COMPOUND_INDEX_EN(x) (((x) >> 0) & 0x1)
#define C_028A9C_COMPOUND_INDEX_EN 0xFFFFFFFE
#define R_028AA0_VGT_INSTANCE_STEP_RATE_0 0x028AA0
#define R_028AA4_VGT_INSTANCE_STEP_RATE_1 0x028AA4
#define R_028AAC_VGT_ESGS_RING_ITEMSIZE 0x028AAC

View File

@@ -6892,34 +6892,22 @@
#define S_028808_ROP3(x) (((unsigned)(x) & 0xFF) << 16)
#define G_028808_ROP3(x) (((x) >> 16) & 0xFF)
#define C_028808_ROP3 0xFF00FFFF
#define V_028808_X_0X00 0x00
#define V_028808_X_0X05 0x05
#define V_028808_X_0X0A 0x0A
#define V_028808_X_0X0F 0x0F
#define V_028808_X_0X11 0x11
#define V_028808_X_0X22 0x22
#define V_028808_X_0X33 0x33
#define V_028808_X_0X44 0x44
#define V_028808_X_0X50 0x50
#define V_028808_X_0X55 0x55
#define V_028808_X_0X5A 0x5A
#define V_028808_X_0X5F 0x5F
#define V_028808_X_0X66 0x66
#define V_028808_X_0X77 0x77
#define V_028808_X_0X88 0x88
#define V_028808_X_0X99 0x99
#define V_028808_X_0XA0 0xA0
#define V_028808_X_0XA5 0xA5
#define V_028808_X_0XAA 0xAA
#define V_028808_X_0XAF 0xAF
#define V_028808_X_0XBB 0xBB
#define V_028808_X_0XCC 0xCC
#define V_028808_X_0XDD 0xDD
#define V_028808_X_0XEE 0xEE
#define V_028808_X_0XF0 0xF0
#define V_028808_X_0XF5 0xF5
#define V_028808_X_0XFA 0xFA
#define V_028808_X_0XFF 0xFF
#define V_028808_ROP3_CLEAR 0x00
#define V_028808_ROP3_NOR 0x11
#define V_028808_ROP3_AND_INVERTED 0x22
#define V_028808_ROP3_COPY_INVERTED 0x33
#define V_028808_ROP3_AND_REVERSE 0x44
#define V_028808_ROP3_INVERT 0x55
#define V_028808_ROP3_XOR 0x66
#define V_028808_ROP3_NAND 0x77
#define V_028808_ROP3_AND 0x88
#define V_028808_ROP3_EQUIVALENT 0x99
#define V_028808_ROP3_NO_OP 0xaa
#define V_028808_ROP3_OR_INVERTED 0xbb
#define V_028808_ROP3_COPY 0xcc
#define V_028808_ROP3_OR_REVERSE 0xdd
#define V_028808_ROP3_OR 0xee
#define V_028808_ROP3_SET 0xff
#define R_02880C_DB_SHADER_CONTROL 0x02880C
#define S_02880C_Z_EXPORT_ENABLE(x) (((unsigned)(x) & 0x1) << 0)
#define G_02880C_Z_EXPORT_ENABLE(x) (((x) >> 0) & 0x1)

View File

@@ -2,6 +2,7 @@
/radv_entrypoints.c
/radv_entrypoints.h
/radv_extensions.c
/radv_extensions.h
/radv_timestamp.h
/dev_icd.json
/vk_format_table.c

View File

@@ -122,7 +122,7 @@ radv_image_from_gralloc(VkDevice device_h,
return result;
if (gralloc_info->handle->numFds != 1) {
return vk_errorf(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR,
return vk_errorf(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR,
"VkNativeBufferANDROID::handle::numFds is %d, "
"expected 1", gralloc_info->handle->numFds);
}
@@ -233,7 +233,7 @@ VkResult radv_GetSwapchainGrallocUsageANDROID(
result = radv_GetPhysicalDeviceImageFormatProperties2(phys_dev_h,
&image_format_info, &image_format_props);
if (result != VK_SUCCESS) {
return vk_errorf(result,
return vk_errorf(device->instance, result,
"radv_GetPhysicalDeviceImageFormatProperties2 failed "
"inside %s", __func__);
}
@@ -252,7 +252,7 @@ VkResult radv_GetSwapchainGrallocUsageANDROID(
* gralloc swapchains.
*/
if (imageUsage != 0) {
return vk_errorf(VK_ERROR_FORMAT_NOT_SUPPORTED,
return vk_errorf(device->instance, VK_ERROR_FORMAT_NOT_SUPPORTED,
"unsupported VkImageUsageFlags(0x%x) for gralloc "
"swapchain", imageUsage);
}

View File

@@ -226,7 +226,7 @@ static VkResult radv_create_cmd_buffer(
cmd_buffer = vk_zalloc(&pool->alloc, sizeof(*cmd_buffer), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (cmd_buffer == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
cmd_buffer->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
cmd_buffer->device = device;
@@ -238,7 +238,7 @@ static VkResult radv_create_cmd_buffer(
cmd_buffer->queue_family_index = pool->queue_family_index;
} else {
/* Init the pool_link so we can safefly call list_del when we destroy
/* Init the pool_link so we can safely call list_del when we destroy
* the command buffer
*/
list_inithead(&cmd_buffer->pool_link);
@@ -250,7 +250,7 @@ static VkResult radv_create_cmd_buffer(
cmd_buffer->cs = device->ws->cs_create(device->ws, ring);
if (!cmd_buffer->cs) {
vk_free(&cmd_buffer->pool->alloc, cmd_buffer);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
*pCommandBuffer = radv_cmd_buffer_to_handle(cmd_buffer);
@@ -347,7 +347,8 @@ radv_cmd_buffer_resize_upload_buf(struct radv_cmd_buffer *cmd_buffer,
new_size, 4096,
RADEON_DOMAIN_GTT,
RADEON_FLAG_CPU_ACCESS|
RADEON_FLAG_NO_INTERPROCESS_SHARING);
RADEON_FLAG_NO_INTERPROCESS_SHARING |
RADEON_FLAG_32BIT);
if (!bo) {
cmd_buffer->record_result = VK_ERROR_OUT_OF_DEVICE_MEMORY;
@@ -559,20 +560,8 @@ radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
gl_shader_stage stage,
int idx)
{
if (stage == MESA_SHADER_VERTEX) {
if (pipeline->shaders[MESA_SHADER_VERTEX])
return &pipeline->shaders[MESA_SHADER_VERTEX]->info.user_sgprs_locs.shader_data[idx];
if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
return &pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.user_sgprs_locs.shader_data[idx];
if (pipeline->shaders[MESA_SHADER_GEOMETRY])
return &pipeline->shaders[MESA_SHADER_GEOMETRY]->info.user_sgprs_locs.shader_data[idx];
} else if (stage == MESA_SHADER_TESS_EVAL) {
if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
return &pipeline->shaders[MESA_SHADER_TESS_EVAL]->info.user_sgprs_locs.shader_data[idx];
if (pipeline->shaders[MESA_SHADER_GEOMETRY])
return &pipeline->shaders[MESA_SHADER_GEOMETRY]->info.user_sgprs_locs.shader_data[idx];
}
return &pipeline->shaders[stage]->info.user_sgprs_locs.shader_data[idx];
struct radv_shader_variant *shader = radv_get_shader(pipeline, stage);
return &shader->info.user_sgprs_locs.shader_data[idx];
}
static void
@@ -585,11 +574,54 @@ radv_emit_userdata_address(struct radv_cmd_buffer *cmd_buffer,
uint32_t base_reg = pipeline->user_data_0[stage];
if (loc->sgpr_idx == -1)
return;
assert(loc->num_sgprs == 2);
assert(loc->num_sgprs == (HAVE_32BIT_POINTERS ? 1 : 2));
assert(!loc->indirect);
radeon_set_sh_reg_seq(cmd_buffer->cs, base_reg + loc->sgpr_idx * 4, 2);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radv_emit_shader_pointer(cmd_buffer->device, cmd_buffer->cs,
base_reg + loc->sgpr_idx * 4, va, false);
}
static void
radv_emit_descriptor_pointers(struct radv_cmd_buffer *cmd_buffer,
struct radv_pipeline *pipeline,
struct radv_descriptor_state *descriptors_state,
gl_shader_stage stage)
{
struct radv_device *device = cmd_buffer->device;
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint32_t sh_base = pipeline->user_data_0[stage];
struct radv_userdata_locations *locs =
&pipeline->shaders[stage]->info.user_sgprs_locs;
unsigned mask;
mask = descriptors_state->dirty & descriptors_state->valid;
for (int i = 0; i < MAX_SETS; i++) {
struct radv_userdata_info *loc = &locs->descriptor_sets[i];
if (loc->sgpr_idx != -1 && !loc->indirect)
continue;
mask &= ~(1 << i);
}
while (mask) {
int start, count;
u_bit_scan_consecutive_range(&mask, &start, &count);
struct radv_userdata_info *loc = &locs->descriptor_sets[start];
unsigned sh_offset = sh_base + loc->sgpr_idx * 4;
radv_emit_shader_pointer_head(cs, sh_offset, count,
HAVE_32BIT_POINTERS);
for (int i = 0; i < count; i++) {
struct radv_descriptor_set *set =
descriptors_state->sets[start + i];
radv_emit_shader_pointer_body(device, cs, set->va,
HAVE_32BIT_POINTERS);
}
}
}
static void
@@ -867,14 +899,6 @@ radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
{
uint32_t count = cmd_buffer->state.dynamic.scissor.count;
/* Vega10/Raven scissor bug workaround. This must be done before VPORT
* scissor registers are changed. There is also a more efficient but
* more involved alternative workaround.
*/
if (cmd_buffer->device->physical_device->has_scissor_bug) {
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_PS_PARTIAL_FLUSH;
si_emit_cache_flush(cmd_buffer);
}
si_write_scissors(cmd_buffer->cs, 0, count,
cmd_buffer->state.dynamic.scissor.scissors,
cmd_buffer->state.dynamic.viewport.viewports,
@@ -1020,6 +1044,68 @@ radv_emit_fb_color_state(struct radv_cmd_buffer *cmd_buffer,
}
}
static void
radv_update_zrange_precision(struct radv_cmd_buffer *cmd_buffer,
struct radv_ds_buffer_info *ds,
struct radv_image *image, VkImageLayout layout,
bool requires_cond_write)
{
uint32_t db_z_info = ds->db_z_info;
uint32_t db_z_info_reg;
if (!radv_image_is_tc_compat_htile(image))
return;
if (!radv_layout_has_htile(image, layout,
radv_image_queue_family_mask(image,
cmd_buffer->queue_family_index,
cmd_buffer->queue_family_index))) {
db_z_info &= C_028040_TILE_SURFACE_ENABLE;
}
db_z_info &= C_028040_ZRANGE_PRECISION;
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
db_z_info_reg = R_028038_DB_Z_INFO;
} else {
db_z_info_reg = R_028040_DB_Z_INFO;
}
/* When we don't know the last fast clear value we need to emit a
* conditional packet, otherwise we can update DB_Z_INFO directly.
*/
if (requires_cond_write) {
radeon_emit(cmd_buffer->cs, PKT3(PKT3_COND_WRITE, 7, 0));
const uint32_t write_space = 0 << 8; /* register */
const uint32_t poll_space = 1 << 4; /* memory */
const uint32_t function = 3 << 0; /* equal to the reference */
const uint32_t options = write_space | poll_space | function;
radeon_emit(cmd_buffer->cs, options);
/* poll address - location of the depth clear value */
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->clear_value_offset;
/* In presence of stencil format, we have to adjust the base
* address because the first value is the stencil clear value.
*/
if (vk_format_is_stencil(image->vk_format))
va += 4;
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cmd_buffer->cs, fui(0.0f)); /* reference value */
radeon_emit(cmd_buffer->cs, (uint32_t)-1); /* comparison mask */
radeon_emit(cmd_buffer->cs, db_z_info_reg >> 2); /* write address low */
radeon_emit(cmd_buffer->cs, 0u); /* write address high */
radeon_emit(cmd_buffer->cs, db_z_info);
} else {
radeon_set_context_reg(cmd_buffer->cs, db_z_info_reg, db_z_info);
}
}
static void
radv_emit_fb_ds_state(struct radv_cmd_buffer *cmd_buffer,
struct radv_ds_buffer_info *ds,
@@ -1078,20 +1164,71 @@ radv_emit_fb_ds_state(struct radv_cmd_buffer *cmd_buffer,
}
/* Update the ZRANGE_PRECISION value for the TC-compat bug. */
radv_update_zrange_precision(cmd_buffer, ds, image, layout, true);
radeon_set_context_reg(cmd_buffer->cs, R_028B78_PA_SU_POLY_OFFSET_DB_FMT_CNTL,
ds->pa_su_poly_offset_db_fmt_cntl);
}
void
radv_set_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkClearDepthStencilValue ds_clear_value,
VkImageAspectFlags aspects)
/**
* Update the fast clear depth/stencil values if the image is bound as a
* depth/stencil buffer.
*/
static void
radv_update_bound_fast_clear_ds(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkClearDepthStencilValue ds_clear_value,
VkImageAspectFlags aspects)
{
struct radv_framebuffer *framebuffer = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radeon_winsys_cs *cs = cmd_buffer->cs;
struct radv_attachment_info *att;
uint32_t att_idx;
if (!framebuffer || !subpass)
return;
att_idx = subpass->depth_stencil_attachment.attachment;
if (att_idx == VK_ATTACHMENT_UNUSED)
return;
att = &framebuffer->attachments[att_idx];
if (att->attachment->image != image)
return;
radeon_set_context_reg_seq(cs, R_028028_DB_STENCIL_CLEAR, 2);
radeon_emit(cs, ds_clear_value.stencil);
radeon_emit(cs, fui(ds_clear_value.depth));
/* Update the ZRANGE_PRECISION value for the TC-compat bug. This is
* only needed when clearing Z to 0.0.
*/
if ((aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
ds_clear_value.depth == 0.0) {
VkImageLayout layout = subpass->depth_stencil_attachment.layout;
radv_update_zrange_precision(cmd_buffer, &att->ds, image,
layout, false);
}
}
/**
* Set the clear depth/stencil values to the image's metadata.
*/
void
radv_set_ds_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkClearDepthStencilValue ds_clear_value,
VkImageAspectFlags aspects)
{
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->clear_value_offset;
unsigned reg_offset = 0, reg_count = 0;
va += image->offset + image->clear_value_offset;
assert(radv_image_has_htile(image));
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
@@ -1103,33 +1240,35 @@ radv_set_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
++reg_count;
radeon_emit(cmd_buffer->cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, 0));
radeon_emit(cmd_buffer->cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT)
radeon_emit(cmd_buffer->cs, ds_clear_value.stencil);
radeon_emit(cs, ds_clear_value.stencil);
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
radeon_emit(cmd_buffer->cs, fui(ds_clear_value.depth));
radeon_emit(cs, fui(ds_clear_value.depth));
radeon_set_context_reg_seq(cmd_buffer->cs, R_028028_DB_STENCIL_CLEAR + 4 * reg_offset, reg_count);
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT)
radeon_emit(cmd_buffer->cs, ds_clear_value.stencil); /* R_028028_DB_STENCIL_CLEAR */
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
radeon_emit(cmd_buffer->cs, fui(ds_clear_value.depth)); /* R_02802C_DB_DEPTH_CLEAR */
radv_update_bound_fast_clear_ds(cmd_buffer, image, ds_clear_value,
aspects);
}
/**
* Load the clear depth/stencil values from the image's metadata.
*/
static void
radv_load_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image)
radv_load_ds_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image)
{
struct radeon_winsys_cs *cs = cmd_buffer->cs;
VkImageAspectFlags aspects = vk_format_aspects(image->vk_format);
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->clear_value_offset;
unsigned reg_offset = 0, reg_count = 0;
va += image->offset + image->clear_value_offset;
if (!radv_image_has_htile(image))
return;
@@ -1142,21 +1281,21 @@ radv_load_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
++reg_count;
radeon_emit(cmd_buffer->cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cmd_buffer->cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG) |
(reg_count == 2 ? COPY_DATA_COUNT_SEL : 0));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cmd_buffer->cs, (R_028028_DB_STENCIL_CLEAR + 4 * reg_offset) >> 2);
radeon_emit(cmd_buffer->cs, 0);
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG) |
(reg_count == 2 ? COPY_DATA_COUNT_SEL : 0));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, (R_028028_DB_STENCIL_CLEAR + 4 * reg_offset) >> 2);
radeon_emit(cs, 0);
radeon_emit(cmd_buffer->cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cmd_buffer->cs, 0);
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
}
/*
*with DCC some colors don't require CMASK elimiation before being
* With DCC some colors don't require CMASK elimination before being
* used as a texture. This sets a predicate value to determine if the
* cmask eliminate is required.
*/
@@ -1181,55 +1320,95 @@ radv_set_dcc_need_cmask_elim_pred(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cmd_buffer->cs, pred_val >> 32);
}
void
radv_set_color_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int idx,
uint32_t color_values[2])
/**
* Update the fast clear color values if the image is bound as a color buffer.
*/
static void
radv_update_bound_fast_clear_color(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int cb_idx,
uint32_t color_values[2])
{
struct radv_framebuffer *framebuffer = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
struct radeon_winsys_cs *cs = cmd_buffer->cs;
struct radv_attachment_info *att;
uint32_t att_idx;
if (!framebuffer || !subpass)
return;
att_idx = subpass->color_attachments[cb_idx].attachment;
if (att_idx == VK_ATTACHMENT_UNUSED)
return;
att = &framebuffer->attachments[att_idx];
if (att->attachment->image != image)
return;
radeon_set_context_reg_seq(cs, R_028C8C_CB_COLOR0_CLEAR_WORD0 + cb_idx * 0x3c, 2);
radeon_emit(cs, color_values[0]);
radeon_emit(cs, color_values[1]);
}
/**
* Set the clear color values to the image's metadata.
*/
void
radv_set_color_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int cb_idx,
uint32_t color_values[2])
{
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->clear_value_offset;
assert(radv_image_has_cmask(image) || radv_image_has_dcc(image));
radeon_emit(cmd_buffer->cs, PKT3(PKT3_WRITE_DATA, 4, 0));
radeon_emit(cmd_buffer->cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cmd_buffer->cs, color_values[0]);
radeon_emit(cmd_buffer->cs, color_values[1]);
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, color_values[0]);
radeon_emit(cs, color_values[1]);
radeon_set_context_reg_seq(cmd_buffer->cs, R_028C8C_CB_COLOR0_CLEAR_WORD0 + idx * 0x3c, 2);
radeon_emit(cmd_buffer->cs, color_values[0]);
radeon_emit(cmd_buffer->cs, color_values[1]);
radv_update_bound_fast_clear_color(cmd_buffer, image, cb_idx,
color_values);
}
/**
* Load the clear color values from the image's metadata.
*/
static void
radv_load_color_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int idx)
radv_load_color_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int cb_idx)
{
struct radeon_winsys_cs *cs = cmd_buffer->cs;
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->clear_value_offset;
if (!radv_image_has_cmask(image) && !radv_image_has_dcc(image))
return;
uint32_t reg = R_028C8C_CB_COLOR0_CLEAR_WORD0 + idx * 0x3c;
uint32_t reg = R_028C8C_CB_COLOR0_CLEAR_WORD0 + cb_idx * 0x3c;
radeon_emit(cmd_buffer->cs, PKT3(PKT3_COPY_DATA, 4, cmd_buffer->state.predicating));
radeon_emit(cmd_buffer->cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG) |
COPY_DATA_COUNT_SEL);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cmd_buffer->cs, reg >> 2);
radeon_emit(cmd_buffer->cs, 0);
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, cmd_buffer->state.predicating));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG) |
COPY_DATA_COUNT_SEL);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, reg >> 2);
radeon_emit(cs, 0);
radeon_emit(cmd_buffer->cs, PKT3(PKT3_PFP_SYNC_ME, 0, cmd_buffer->state.predicating));
radeon_emit(cmd_buffer->cs, 0);
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, cmd_buffer->state.predicating));
radeon_emit(cs, 0);
}
static void
@@ -1260,7 +1439,7 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
assert(att->attachment->aspect_mask & VK_IMAGE_ASPECT_COLOR_BIT);
radv_emit_fb_color_state(cmd_buffer, i, att, image, layout);
radv_load_color_clear_regs(cmd_buffer, image, i);
radv_load_color_clear_metadata(cmd_buffer, image, i);
}
if(subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
@@ -1282,7 +1461,7 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
cmd_buffer->state.offset_scale = att->ds.offset_scale;
}
radv_load_depth_clear_regs(cmd_buffer, image);
radv_load_ds_clear_metadata(cmd_buffer, image);
} else {
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9)
radeon_set_context_reg_seq(cmd_buffer->cs, R_028038_DB_Z_INFO, 2);
@@ -1334,6 +1513,7 @@ radv_emit_index_buffer(struct radv_cmd_buffer *cmd_buffer)
void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer)
{
bool has_perfect_queries = cmd_buffer->state.perfect_occlusion_queries_enabled;
struct radv_pipeline *pipeline = cmd_buffer->state.pipeline;
uint32_t pa_sc_mode_cntl_1 =
pipeline ? pipeline->graphics.ms.pa_sc_mode_cntl_1 : 0;
@@ -1342,11 +1522,12 @@ void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer)
if(!cmd_buffer->state.active_occlusion_queries) {
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
if (G_028A4C_OUT_OF_ORDER_PRIMITIVE_ENABLE(pa_sc_mode_cntl_1) &&
pipeline->graphics.disable_out_of_order_rast_for_occlusion) {
pipeline->graphics.disable_out_of_order_rast_for_occlusion &&
has_perfect_queries) {
/* Re-enable out-of-order rasterization if the
* bound pipeline supports it and if it's has
* been disabled before starting occlusion
* queries.
* been disabled before starting any perfect
* occlusion queries.
*/
radeon_set_context_reg(cmd_buffer->cs,
R_028A4C_PA_SC_MODE_CNTL_1,
@@ -1359,22 +1540,22 @@ void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer)
} else {
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
uint32_t sample_rate = subpass ? util_logbase2(subpass->max_sample_count) : 0;
bool perfect = cmd_buffer->state.perfect_occlusion_queries_enabled;
if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
db_count_control =
S_028004_PERFECT_ZPASS_COUNTS(perfect) |
S_028004_PERFECT_ZPASS_COUNTS(has_perfect_queries) |
S_028004_SAMPLE_RATE(sample_rate) |
S_028004_ZPASS_ENABLE(1) |
S_028004_SLICE_EVEN_ENABLE(1) |
S_028004_SLICE_ODD_ENABLE(1);
if (G_028A4C_OUT_OF_ORDER_PRIMITIVE_ENABLE(pa_sc_mode_cntl_1) &&
pipeline->graphics.disable_out_of_order_rast_for_occlusion) {
pipeline->graphics.disable_out_of_order_rast_for_occlusion &&
has_perfect_queries) {
/* If the bound pipeline has enabled
* out-of-order rasterization, we should
* disable it before starting occlusion
* queries.
* disable it before starting any perfect
* occlusion queries.
*/
pa_sc_mode_cntl_1 &= C_028A4C_OUT_OF_ORDER_PRIMITIVE_ENABLE;
@@ -1399,7 +1580,8 @@ radv_cmd_buffer_flush_dynamic_state(struct radv_cmd_buffer *cmd_buffer)
if (states & (RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
radv_emit_viewport(cmd_buffer);
if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR | RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR | RADV_CMD_DIRTY_DYNAMIC_VIEWPORT) &&
!cmd_buffer->device->physical_device->has_scissor_bug)
radv_emit_scissor(cmd_buffer);
if (states & RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH)
@@ -1425,48 +1607,6 @@ radv_cmd_buffer_flush_dynamic_state(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->state.dirty &= ~states;
}
static void
emit_stage_descriptor_set_userdata(struct radv_cmd_buffer *cmd_buffer,
struct radv_pipeline *pipeline,
int idx,
uint64_t va,
gl_shader_stage stage)
{
struct radv_userdata_info *desc_set_loc = &pipeline->shaders[stage]->info.user_sgprs_locs.descriptor_sets[idx];
uint32_t base_reg = pipeline->user_data_0[stage];
if (desc_set_loc->sgpr_idx == -1 || desc_set_loc->indirect)
return;
assert(!desc_set_loc->indirect);
assert(desc_set_loc->num_sgprs == 2);
radeon_set_sh_reg_seq(cmd_buffer->cs,
base_reg + desc_set_loc->sgpr_idx * 4, 2);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
}
static void
radv_emit_descriptor_set_userdata(struct radv_cmd_buffer *cmd_buffer,
VkShaderStageFlags stages,
struct radv_descriptor_set *set,
unsigned idx)
{
if (cmd_buffer->state.pipeline) {
radv_foreach_stage(stage, stages) {
if (cmd_buffer->state.pipeline->shaders[stage])
emit_stage_descriptor_set_userdata(cmd_buffer, cmd_buffer->state.pipeline,
idx, set->va,
stage);
}
}
if (cmd_buffer->state.compute_pipeline && (stages & VK_SHADER_STAGE_COMPUTE_BIT))
emit_stage_descriptor_set_userdata(cmd_buffer, cmd_buffer->state.compute_pipeline,
idx, set->va,
MESA_SHADER_COMPUTE);
}
static void
radv_flush_push_descriptors(struct radv_cmd_buffer *cmd_buffer,
VkPipelineBindPoint bind_point)
@@ -1548,7 +1688,6 @@ radv_flush_descriptors(struct radv_cmd_buffer *cmd_buffer,
VK_PIPELINE_BIND_POINT_GRAPHICS;
struct radv_descriptor_state *descriptors_state =
radv_get_descriptors_state(cmd_buffer, bind_point);
unsigned i;
if (!descriptors_state->dirty)
return;
@@ -1565,13 +1704,25 @@ radv_flush_descriptors(struct radv_cmd_buffer *cmd_buffer,
cmd_buffer->cs,
MAX_SETS * MESA_SHADER_STAGES * 4);
for_each_bit(i, descriptors_state->dirty) {
struct radv_descriptor_set *set = descriptors_state->sets[i];
if (!(descriptors_state->valid & (1u << i)))
continue;
if (cmd_buffer->state.pipeline) {
radv_foreach_stage(stage, stages) {
if (!cmd_buffer->state.pipeline->shaders[stage])
continue;
radv_emit_descriptor_set_userdata(cmd_buffer, stages, set, i);
radv_emit_descriptor_pointers(cmd_buffer,
cmd_buffer->state.pipeline,
descriptors_state, stage);
}
}
if (cmd_buffer->state.compute_pipeline &&
(stages & VK_SHADER_STAGE_COMPUTE_BIT)) {
radv_emit_descriptor_pointers(cmd_buffer,
cmd_buffer->state.compute_pipeline,
descriptors_state,
MESA_SHADER_COMPUTE);
}
descriptors_state->dirty = 0;
descriptors_state->push_dirty = false;
@@ -1589,6 +1740,7 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
? cmd_buffer->state.compute_pipeline
: cmd_buffer->state.pipeline;
struct radv_pipeline_layout *layout = pipeline->layout;
struct radv_shader_variant *shader, *prev_shader;
unsigned offset;
void *ptr;
uint64_t va;
@@ -1613,10 +1765,16 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws,
cmd_buffer->cs, MESA_SHADER_STAGES * 4);
prev_shader = NULL;
radv_foreach_stage(stage, stages) {
if (pipeline->shaders[stage]) {
shader = radv_get_shader(pipeline, stage);
/* Avoid redundantly emitting the address for merged stages. */
if (shader && shader != prev_shader) {
radv_emit_userdata_address(cmd_buffer, pipeline, stage,
AC_UD_PUSH_CONSTANTS, va);
prev_shader = shader;
}
}
@@ -1631,7 +1789,7 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer *cmd_buffer,
if ((pipeline_is_dirty ||
(cmd_buffer->state.dirty & RADV_CMD_DIRTY_VERTEX_BUFFER)) &&
cmd_buffer->state.pipeline->vertex_elements.count &&
radv_get_vertex_shader(cmd_buffer->state.pipeline)->info.info.vs.has_vertex_buffers) {
radv_get_shader(cmd_buffer->state.pipeline, MESA_SHADER_VERTEX)->info.info.vs.has_vertex_buffers) {
struct radv_vertex_elements_info *velems = &cmd_buffer->state.pipeline->vertex_elements;
unsigned vb_offset;
void *vb_ptr;
@@ -2405,7 +2563,7 @@ VkResult radv_EndCommandBuffer(
vk_free(&cmd_buffer->pool->alloc, cmd_buffer->state.attachments);
if (!cmd_buffer->device->ws->cs_finalize(cmd_buffer->cs))
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(cmd_buffer->device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
cmd_buffer->status = RADV_CMD_BUFFER_STATUS_EXECUTABLE;
@@ -2519,18 +2677,6 @@ void radv_CmdSetViewport(
assert(firstViewport < MAX_VIEWPORTS);
assert(total_count >= 1 && total_count <= MAX_VIEWPORTS);
if (cmd_buffer->device->physical_device->has_scissor_bug) {
/* Try to skip unnecessary PS partial flushes when the viewports
* don't change.
*/
if (!(state->dirty & (RADV_CMD_DIRTY_DYNAMIC_VIEWPORT |
RADV_CMD_DIRTY_DYNAMIC_SCISSOR)) &&
!memcmp(state->dynamic.viewport.viewports + firstViewport,
pViewports, viewportCount * sizeof(*pViewports))) {
return;
}
}
memcpy(state->dynamic.viewport.viewports + firstViewport, pViewports,
viewportCount * sizeof(*pViewports));
@@ -2550,18 +2696,6 @@ void radv_CmdSetScissor(
assert(firstScissor < MAX_SCISSORS);
assert(total_count >= 1 && total_count <= MAX_SCISSORS);
if (cmd_buffer->device->physical_device->has_scissor_bug) {
/* Try to skip unnecessary PS partial flushes when the scissors
* don't change.
*/
if (!(state->dirty & (RADV_CMD_DIRTY_DYNAMIC_VIEWPORT |
RADV_CMD_DIRTY_DYNAMIC_SCISSOR)) &&
!memcmp(state->dynamic.scissor.scissors + firstScissor,
pScissors, scissorCount * sizeof(*pScissors))) {
return;
}
}
memcpy(state->dynamic.scissor.scissors + firstScissor, pScissors,
scissorCount * sizeof(*pScissors));
@@ -2783,7 +2917,7 @@ VkResult radv_CreateCommandPool(
pool = vk_alloc2(&device->alloc, pAllocator, sizeof(*pool), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pool == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
if (pAllocator)
pool->alloc = *pAllocator;
@@ -2956,7 +3090,7 @@ radv_cs_emit_indirect_draw_packet(struct radv_cmd_buffer *cmd_buffer,
struct radeon_winsys_cs *cs = cmd_buffer->cs;
unsigned di_src_sel = indexed ? V_0287F0_DI_SRC_SEL_DMA
: V_0287F0_DI_SRC_SEL_AUTO_INDEX;
bool draw_id_enable = radv_get_vertex_shader(cmd_buffer->state.pipeline)->info.info.vs.needs_draw_id;
bool draw_id_enable = radv_get_shader(cmd_buffer->state.pipeline, MESA_SHADER_VERTEX)->info.info.vs.needs_draw_id;
uint32_t base_reg = cmd_buffer->state.pipeline->graphics.vtx_base_sgpr;
assert(base_reg);
@@ -3141,10 +3275,55 @@ radv_emit_draw_packets(struct radv_cmd_buffer *cmd_buffer,
}
}
/*
* Vega and raven have a bug which triggers if there are multiple context
* register contexts active at the same time with different scissor values.
*
* There are two possible workarounds:
* 1) Wait for PS_PARTIAL_FLUSH every time the scissor is changed. That way
* there is only ever 1 active set of scissor values at the same time.
*
* 2) Whenever the hardware switches contexts we have to set the scissor
* registers again even if it is a noop. That way the new context gets
* the correct scissor values.
*
* This implements option 2. radv_need_late_scissor_emission needs to
* return true on affected HW if radv_emit_all_graphics_states sets
* any context registers.
*/
static bool radv_need_late_scissor_emission(struct radv_cmd_buffer *cmd_buffer,
bool indexed_draw)
{
struct radv_cmd_state *state = &cmd_buffer->state;
if (!cmd_buffer->device->physical_device->has_scissor_bug)
return false;
uint32_t used_states = cmd_buffer->state.pipeline->graphics.needed_dynamic_state | ~RADV_CMD_DIRTY_DYNAMIC_ALL;
/* Index & Vertex buffer don't change context regs, and pipeline is handled later. */
used_states &= ~(RADV_CMD_DIRTY_INDEX_BUFFER | RADV_CMD_DIRTY_VERTEX_BUFFER | RADV_CMD_DIRTY_PIPELINE);
/* Assume all state changes except these two can imply context rolls. */
if (cmd_buffer->state.dirty & used_states)
return true;
if (cmd_buffer->state.emitted_pipeline != cmd_buffer->state.pipeline)
return true;
if (indexed_draw && state->pipeline->graphics.prim_restart_enable &&
(state->index_type ? 0xffffffffu : 0xffffu) != state->last_primitive_reset_index)
return true;
return false;
}
static void
radv_emit_all_graphics_states(struct radv_cmd_buffer *cmd_buffer,
const struct radv_draw_info *info)
{
bool late_scissor_emission = radv_need_late_scissor_emission(cmd_buffer, info->indexed);
if ((cmd_buffer->state.dirty & RADV_CMD_DIRTY_FRAMEBUFFER) ||
cmd_buffer->state.emitted_pipeline != cmd_buffer->state.pipeline)
radv_emit_rbplus_state(cmd_buffer);
@@ -3174,6 +3353,9 @@ radv_emit_all_graphics_states(struct radv_cmd_buffer *cmd_buffer,
radv_emit_draw_registers(cmd_buffer, info->indexed,
info->instance_count > 1, info->indirect,
info->indirect ? 0 : info->count);
if (late_scissor_emission)
radv_emit_scissor(cmd_buffer);
}
static void
@@ -3381,6 +3563,55 @@ void radv_CmdDrawIndexedIndirectCountAMD(
radv_draw(cmd_buffer, &info);
}
void radv_CmdDrawIndirectCountKHR(
VkCommandBuffer commandBuffer,
VkBuffer _buffer,
VkDeviceSize offset,
VkBuffer _countBuffer,
VkDeviceSize countBufferOffset,
uint32_t maxDrawCount,
uint32_t stride)
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
RADV_FROM_HANDLE(radv_buffer, buffer, _buffer);
RADV_FROM_HANDLE(radv_buffer, count_buffer, _countBuffer);
struct radv_draw_info info = {};
info.count = maxDrawCount;
info.indirect = buffer;
info.indirect_offset = offset;
info.count_buffer = count_buffer;
info.count_buffer_offset = countBufferOffset;
info.stride = stride;
radv_draw(cmd_buffer, &info);
}
void radv_CmdDrawIndexedIndirectCountKHR(
VkCommandBuffer commandBuffer,
VkBuffer _buffer,
VkDeviceSize offset,
VkBuffer _countBuffer,
VkDeviceSize countBufferOffset,
uint32_t maxDrawCount,
uint32_t stride)
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
RADV_FROM_HANDLE(radv_buffer, buffer, _buffer);
RADV_FROM_HANDLE(radv_buffer, count_buffer, _countBuffer);
struct radv_draw_info info = {};
info.indexed = true;
info.count = maxDrawCount;
info.indirect = buffer;
info.indirect_offset = offset;
info.count_buffer = count_buffer;
info.count_buffer_offset = countBufferOffset;
info.stride = stride;
radv_draw(cmd_buffer, &info);
}
struct radv_dispatch_info {
/**
* Determine the layout of the grid (in block units) to be used.
@@ -3711,6 +3942,20 @@ static void radv_initialize_htile(struct radv_cmd_buffer *cmd_buffer,
size, clear_word);
state->flush_bits |= RADV_CMD_FLAG_FLUSH_AND_INV_DB_META;
/* Initialize the depth clear registers and update the ZRANGE_PRECISION
* value for the TC-compat bug (because ZRANGE_PRECISION is 1 by
* default). This is only needed whean clearing Z to 0.0f.
*/
if (radv_image_is_tc_compat_htile(image) && clear_word == 0) {
VkImageAspectFlags aspects = VK_IMAGE_ASPECT_DEPTH_BIT;
VkClearDepthStencilValue value = {};
if (vk_format_is_stencil(image->vk_format))
aspects |= VK_IMAGE_ASPECT_STENCIL_BIT;
radv_set_ds_clear_metadata(cmd_buffer, image, value, aspects);
}
}
static void radv_handle_depth_image_transition(struct radv_cmd_buffer *cmd_buffer,

View File

@@ -369,11 +369,9 @@ static void si_add_split_disasm(const char *disasm,
}
static void
radv_dump_annotated_shader(struct radv_pipeline *pipeline,
struct radv_shader_variant *shader,
gl_shader_stage stage,
struct ac_wave_info *waves, unsigned num_waves,
FILE *f)
radv_dump_annotated_shader(struct radv_shader_variant *shader,
gl_shader_stage stage, struct ac_wave_info *waves,
unsigned num_waves, FILE *f)
{
uint64_t start_addr, end_addr;
unsigned i;
@@ -444,28 +442,22 @@ radv_dump_annotated_shader(struct radv_pipeline *pipeline,
static void
radv_dump_annotated_shaders(struct radv_pipeline *pipeline,
struct radv_shader_variant *compute_shader,
FILE *f)
VkShaderStageFlagBits active_stages, FILE *f)
{
struct ac_wave_info waves[AC_MAX_WAVES_PER_CHIP];
unsigned num_waves = ac_get_wave_info(waves);
unsigned mask;
fprintf(f, COLOR_CYAN "The number of active waves = %u" COLOR_RESET
"\n\n", num_waves);
/* Dump annotated active graphics shaders. */
mask = pipeline->active_stages;
while (mask) {
int stage = u_bit_scan(&mask);
while (active_stages) {
int stage = u_bit_scan(&active_stages);
radv_dump_annotated_shader(pipeline, pipeline->shaders[stage],
radv_dump_annotated_shader(pipeline->shaders[stage],
stage, waves, num_waves, f);
}
radv_dump_annotated_shader(pipeline, compute_shader,
MESA_SHADER_COMPUTE, waves, num_waves, f);
/* Print waves executing shaders that are not currently bound. */
unsigned i;
bool found = false;
@@ -523,48 +515,51 @@ radv_dump_shader(struct radv_pipeline *pipeline,
static void
radv_dump_shaders(struct radv_pipeline *pipeline,
struct radv_shader_variant *compute_shader, FILE *f)
VkShaderStageFlagBits active_stages, FILE *f)
{
unsigned mask;
/* Dump active graphics shaders. */
mask = pipeline->active_stages;
while (mask) {
int stage = u_bit_scan(&mask);
while (active_stages) {
int stage = u_bit_scan(&active_stages);
radv_dump_shader(pipeline, pipeline->shaders[stage], stage, f);
}
}
radv_dump_shader(pipeline, compute_shader, MESA_SHADER_COMPUTE, f);
static void
radv_dump_pipeline_state(struct radv_pipeline *pipeline,
VkShaderStageFlagBits active_stages, FILE *f)
{
radv_dump_shaders(pipeline, active_stages, f);
radv_dump_annotated_shaders(pipeline, active_stages, f);
radv_dump_descriptors(pipeline, f);
}
static void
radv_dump_graphics_state(struct radv_pipeline *graphics_pipeline,
struct radv_pipeline *compute_pipeline, FILE *f)
{
struct radv_shader_variant *compute_shader =
compute_pipeline ? compute_pipeline->shaders[MESA_SHADER_COMPUTE] : NULL;
VkShaderStageFlagBits active_stages;
if (!graphics_pipeline)
return;
if (graphics_pipeline) {
active_stages = graphics_pipeline->active_stages;
radv_dump_pipeline_state(graphics_pipeline, active_stages, f);
}
radv_dump_shaders(graphics_pipeline, compute_shader, f);
radv_dump_annotated_shaders(graphics_pipeline, compute_shader, f);
radv_dump_descriptors(graphics_pipeline, f);
if (compute_pipeline) {
active_stages = VK_SHADER_STAGE_COMPUTE_BIT;
radv_dump_pipeline_state(compute_pipeline, active_stages, f);
}
}
static void
radv_dump_compute_state(struct radv_pipeline *compute_pipeline, FILE *f)
{
VkShaderStageFlagBits active_stages = VK_SHADER_STAGE_COMPUTE_BIT;
if (!compute_pipeline)
return;
radv_dump_shaders(compute_pipeline,
compute_pipeline->shaders[MESA_SHADER_COMPUTE], f);
radv_dump_annotated_shaders(compute_pipeline,
compute_pipeline->shaders[MESA_SHADER_COMPUTE],
f);
radv_dump_descriptors(compute_pipeline, f);
radv_dump_pipeline_state(compute_pipeline, active_stages, f);
}
static struct radv_pipeline *
@@ -643,11 +638,9 @@ radv_dump_device_name(struct radv_device *device, FILE *f)
snprintf(kernel_version, sizeof(kernel_version),
" / %s", uname_data.release);
if (HAVE_LLVM > 0) {
snprintf(llvm_string, sizeof(llvm_string),
", LLVM %i.%i.%i", (HAVE_LLVM >> 8) & 0xff,
HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
}
snprintf(llvm_string, sizeof(llvm_string),
", LLVM %i.%i.%i", (HAVE_LLVM >> 8) & 0xff,
HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
fprintf(f, "Device name: %s (%s DRM %i.%i.%i%s%s)\n\n",
chip_name, device->physical_device->name,

View File

@@ -44,6 +44,11 @@ enum {
RADV_DEBUG_NO_SISCHED = 0x4000,
RADV_DEBUG_PREOPTIR = 0x8000,
RADV_DEBUG_NO_DYNAMIC_BOUNDS = 0x10000,
RADV_DEBUG_NO_OUT_OF_ORDER = 0x20000,
RADV_DEBUG_INFO = 0x40000,
RADV_DEBUG_ERRORS = 0x80000,
RADV_DEBUG_STARTUP = 0x100000,
RADV_DEBUG_CHECKIR = 0x200000,
};
enum {

View File

@@ -95,7 +95,7 @@ VkResult radv_CreateDescriptorSetLayout(
set_layout = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set_layout)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
set_layout->flags = pCreateInfo->flags;
@@ -106,7 +106,7 @@ VkResult radv_CreateDescriptorSetLayout(
pCreateInfo->bindingCount);
if (!bindings) {
vk_free2(&device->alloc, pAllocator, set_layout);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
set_layout->binding_count = max_binding + 1;
@@ -322,7 +322,7 @@ void radv_GetDescriptorSetLayoutSupport(VkDevice device,
/*
* Pipeline layouts. These have nothing to do with the pipeline. They are
* just muttiple descriptor set layouts pasted together
* just multiple descriptor set layouts pasted together.
*/
VkResult radv_CreatePipelineLayout(
@@ -340,7 +340,7 @@ VkResult radv_CreatePipelineLayout(
layout = vk_alloc2(&device->alloc, pAllocator, sizeof(*layout), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (layout == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
layout->num_sets = pCreateInfo->setLayoutCount;
@@ -412,7 +412,7 @@ radv_descriptor_set_create(struct radv_device *device,
if (pool->host_memory_base) {
if (pool->host_memory_end - pool->host_memory_ptr < mem_size)
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
return vk_error(device->instance, VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
set = (struct radv_descriptor_set*)pool->host_memory_ptr;
pool->host_memory_ptr += mem_size;
@@ -421,7 +421,7 @@ radv_descriptor_set_create(struct radv_device *device,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!set)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
memset(set, 0, mem_size);
@@ -437,7 +437,7 @@ radv_descriptor_set_create(struct radv_device *device,
if (!pool->host_memory_base && pool->entry_count == pool->max_entry_count) {
vk_free2(&device->alloc, NULL, set);
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
return vk_error(device->instance, VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
}
/* try to allocate linearly first, so that we don't spend
@@ -466,7 +466,7 @@ radv_descriptor_set_create(struct radv_device *device,
if (pool->size - offset < layout_size) {
vk_free2(&device->alloc, NULL, set);
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
return vk_error(device->instance, VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
}
set->bo = pool->bo;
set->mapped_ptr = (uint32_t*)(pool->mapped_ptr + offset);
@@ -478,7 +478,7 @@ radv_descriptor_set_create(struct radv_device *device,
pool->entries[index].set = set;
pool->entry_count++;
} else
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
return vk_error(device->instance, VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
}
if (layout->has_immutable_samplers) {
@@ -580,7 +580,7 @@ VkResult radv_CreateDescriptorPool(
pool = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pool, 0, sizeof(*pool));
@@ -594,7 +594,8 @@ VkResult radv_CreateDescriptorPool(
pool->bo = device->ws->buffer_create(device->ws, bo_size, 32,
RADEON_DOMAIN_VRAM,
RADEON_FLAG_NO_INTERPROCESS_SHARING |
RADEON_FLAG_READ_ONLY);
RADEON_FLAG_READ_ONLY |
RADEON_FLAG_32BIT);
pool->mapped_ptr = (uint8_t*)device->ws->buffer_map(pool->bo);
}
pool->size = bo_size;
@@ -995,7 +996,7 @@ VkResult radv_CreateDescriptorUpdateTemplate(VkDevice _device,
templ = vk_alloc2(&device->alloc, pAllocator, size, 8, VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!templ)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
templ->entry_count = entry_count;
templ->bind_point = pCreateInfo->pipelineBindPoint;

View File

@@ -108,12 +108,9 @@ radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
default: chip_string = "AMD RADV unknown"; break;
}
if (HAVE_LLVM > 0) {
snprintf(llvm_string, sizeof(llvm_string),
" (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
}
snprintf(llvm_string, sizeof(llvm_string),
" (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
snprintf(name, name_len, "%s%s", chip_string, llvm_string);
}
@@ -230,23 +227,38 @@ radv_physical_device_init(struct radv_physical_device *device,
int fd;
fd = open(path, O_RDWR | O_CLOEXEC);
if (fd < 0)
return vk_error(VK_ERROR_INCOMPATIBLE_DRIVER);
if (fd < 0) {
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Could not open device '%s'", path);
return vk_error(instance, VK_ERROR_INCOMPATIBLE_DRIVER);
}
version = drmGetVersion(fd);
if (!version) {
close(fd);
return vk_errorf(VK_ERROR_INCOMPATIBLE_DRIVER,
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Could not get the kernel driver version for device '%s'", path);
return vk_errorf(instance, VK_ERROR_INCOMPATIBLE_DRIVER,
"failed to get version %s: %m", path);
}
if (strcmp(version->name, "amdgpu")) {
drmFreeVersion(version);
close(fd);
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Device '%s' is not using the amdgpu kernel driver.", path);
return VK_ERROR_INCOMPATIBLE_DRIVER;
}
drmFreeVersion(version);
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Found compatible device '%s'.", path);
device->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
device->instance = instance;
assert(strlen(path) < ARRAY_SIZE(device->path));
@@ -255,7 +267,7 @@ radv_physical_device_init(struct radv_physical_device *device,
device->ws = radv_amdgpu_winsys_create(fd, instance->debug_flags,
instance->perftest_flags);
if (!device->ws) {
result = VK_ERROR_INCOMPATIBLE_DRIVER;
result = vk_error(instance, VK_ERROR_INCOMPATIBLE_DRIVER);
goto fail;
}
@@ -268,7 +280,7 @@ radv_physical_device_init(struct radv_physical_device *device,
if (radv_device_get_cache_uuid(device->rad_info.family, device->cache_uuid)) {
device->ws->destroy(device->ws);
result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
result = vk_errorf(instance, VK_ERROR_INITIALIZATION_FAILED,
"cannot generate UUID");
goto fail;
}
@@ -278,7 +290,7 @@ radv_physical_device_init(struct radv_physical_device *device,
(device->instance->perftest_flags & RADV_PERFTEST_SISCHED ? 0x1 : 0) |
(device->instance->debug_flags & RADV_DEBUG_UNSAFE_MATH ? 0x2 : 0);
/* The gpu id is already embeded in the uuid so we just pass "radv"
/* The gpu id is already embedded in the uuid so we just pass "radv"
* when creating the cache.
*/
char buf[VK_UUID_SIZE * 2 + 1];
@@ -300,7 +312,7 @@ radv_physical_device_init(struct radv_physical_device *device,
device->rad_info.family == CHIP_RAVEN;
}
/* The mere presense of CLEAR_STATE in the IB causes random GPU hangs
/* The mere presence of CLEAR_STATE in the IB causes random GPU hangs
* on SI.
*/
device->has_clear_state = device->rad_info.chip_class >= CIK;
@@ -315,10 +327,10 @@ radv_physical_device_init(struct radv_physical_device *device,
device->has_out_of_order_rast = device->rad_info.chip_class >= VI &&
device->rad_info.max_se >= 2;
device->out_of_order_rast_allowed = device->has_out_of_order_rast &&
(device->instance->perftest_flags & RADV_PERFTEST_OUT_OF_ORDER);
!(device->instance->debug_flags & RADV_DEBUG_NO_OUT_OF_ORDER);
device->dcc_msaa_allowed = device->rad_info.chip_class == VI &&
(device->instance->perftest_flags & RADV_PERFTEST_DCC_MSAA);
device->dcc_msaa_allowed =
(device->instance->perftest_flags & RADV_PERFTEST_DCC_MSAA);
radv_physical_device_init_mem_types(device);
radv_fill_device_extension_table(device, &device->supported_extensions);
@@ -326,9 +338,13 @@ radv_physical_device_init(struct radv_physical_device *device,
result = radv_init_wsi(device);
if (result != VK_SUCCESS) {
device->ws->destroy(device->ws);
vk_error(instance, result);
goto fail;
}
if ((device->instance->debug_flags & RADV_DEBUG_INFO))
ac_print_gpu_info(&device->rad_info);
return VK_SUCCESS;
fail:
@@ -390,6 +406,11 @@ static const struct debug_control radv_debug_options[] = {
{"nosisched", RADV_DEBUG_NO_SISCHED},
{"preoptir", RADV_DEBUG_PREOPTIR},
{"nodynamicbounds", RADV_DEBUG_NO_DYNAMIC_BOUNDS},
{"nooutoforder", RADV_DEBUG_NO_OUT_OF_ORDER},
{"info", RADV_DEBUG_INFO},
{"errors", RADV_DEBUG_ERRORS},
{"startup", RADV_DEBUG_STARTUP},
{"checkir", RADV_DEBUG_CHECKIR},
{NULL, 0}
};
@@ -405,7 +426,6 @@ static const struct debug_control radv_perftest_options[] = {
{"sisched", RADV_PERFTEST_SISCHED},
{"localbos", RADV_PERFTEST_LOCAL_BOS},
{"binning", RADV_PERFTEST_BINNING},
{"outoforderrast", RADV_PERFTEST_OUT_OF_ORDER},
{"dccmsaa", RADV_PERFTEST_DCC_MSAA},
{NULL, 0}
};
@@ -428,10 +448,12 @@ radv_handle_per_app_options(struct radv_instance *instance,
if (!strcmp(name, "Talos - Linux - 32bit") ||
!strcmp(name, "Talos - Linux - 64bit")) {
/* Force enable LLVM sisched for Talos because it looks safe
* and it gives few more FPS.
*/
instance->perftest_flags |= RADV_PERFTEST_SISCHED;
if (!(instance->debug_flags & RADV_DEBUG_NO_SISCHED)) {
/* Force enable LLVM sisched for Talos because it looks
* safe and it gives few more FPS.
*/
instance->perftest_flags |= RADV_PERFTEST_SISCHED;
}
}
}
@@ -460,22 +482,13 @@ VkResult radv_CreateInstance(
pCreateInfo->pApplicationInfo->apiVersion != 0) {
client_version = pCreateInfo->pApplicationInfo->apiVersion;
} else {
client_version = VK_MAKE_VERSION(1, 0, 0);
}
if (VK_MAKE_VERSION(1, 0, 0) > client_version ||
client_version > VK_MAKE_VERSION(1, 1, 0xfff)) {
return vk_errorf(VK_ERROR_INCOMPATIBLE_DRIVER,
"Client requested version %d.%d.%d",
VK_VERSION_MAJOR(client_version),
VK_VERSION_MINOR(client_version),
VK_VERSION_PATCH(client_version));
radv_EnumerateInstanceVersion(&client_version);
}
instance = vk_zalloc2(&default_alloc, pAllocator, sizeof(*instance), 8,
VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
if (!instance)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(NULL, VK_ERROR_OUT_OF_HOST_MEMORY);
instance->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
@@ -487,13 +500,23 @@ VkResult radv_CreateInstance(
instance->apiVersion = client_version;
instance->physicalDeviceCount = -1;
instance->debug_flags = parse_debug_string(getenv("RADV_DEBUG"),
radv_debug_options);
instance->perftest_flags = parse_debug_string(getenv("RADV_PERFTEST"),
radv_perftest_options);
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Created an instance");
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
const char *ext_name = pCreateInfo->ppEnabledExtensionNames[i];
int index = radv_get_instance_extension_index(ext_name);
if (index < 0 || !radv_supported_instance_extensions.extensions[index]) {
vk_free2(&default_alloc, pAllocator, instance);
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
return vk_error(instance, VK_ERROR_EXTENSION_NOT_PRESENT);
}
instance->enabled_extensions.extensions[index] = true;
@@ -502,29 +525,15 @@ VkResult radv_CreateInstance(
result = vk_debug_report_instance_init(&instance->debug_report_callbacks);
if (result != VK_SUCCESS) {
vk_free2(&default_alloc, pAllocator, instance);
return vk_error(result);
return vk_error(instance, result);
}
_mesa_locale_init();
VG(VALGRIND_CREATE_MEMPOOL(instance, 0, false));
instance->debug_flags = parse_debug_string(getenv("RADV_DEBUG"),
radv_debug_options);
instance->perftest_flags = parse_debug_string(getenv("RADV_PERFTEST"),
radv_perftest_options);
radv_handle_per_app_options(instance, pCreateInfo->pApplicationInfo);
if (instance->debug_flags & RADV_DEBUG_NO_SISCHED) {
/* Disable sisched when the user requests it, this is mostly
* useful when the driver force-enable sisched for the given
* application.
*/
instance->perftest_flags &= ~RADV_PERFTEST_SISCHED;
}
*pInstance = radv_instance_to_handle(instance);
return VK_SUCCESS;
@@ -563,8 +572,12 @@ radv_enumerate_devices(struct radv_instance *instance)
instance->physicalDeviceCount = 0;
max_devices = drmGetDevices2(0, devices, ARRAY_SIZE(devices));
if (instance->debug_flags & RADV_DEBUG_STARTUP)
radv_logi("Found %d drm nodes", max_devices);
if (max_devices < 1)
return vk_error(VK_ERROR_INCOMPATIBLE_DRIVER);
return vk_error(instance, VK_ERROR_INCOMPATIBLE_DRIVER);
for (unsigned i = 0; i < (unsigned)max_devices; i++) {
if (devices[i]->available_nodes & 1 << DRM_NODE_RENDER &&
@@ -745,7 +758,7 @@ void radv_GetPhysicalDeviceFeatures2(
}
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT: {
VkPhysicalDeviceDescriptorIndexingFeaturesEXT *features =
(VkPhysicalDeviceDescriptorIndexingFeaturesEXT*)features;
(VkPhysicalDeviceDescriptorIndexingFeaturesEXT*)ext;
features->shaderInputAttachmentArrayDynamicIndexing = true;
features->shaderUniformTexelBufferArrayDynamicIndexing = true;
features->shaderStorageTexelBufferArrayDynamicIndexing = true;
@@ -865,7 +878,7 @@ void radv_GetPhysicalDeviceProperties(
.maxViewports = MAX_VIEWPORTS,
.maxViewportDimensions = { (1 << 14), (1 << 14) },
.viewportBoundsRange = { INT16_MIN, INT16_MAX },
.viewportSubPixelBits = 13, /* We take a float? */
.viewportSubPixelBits = 8,
.minMemoryMapAlignment = 4096, /* A page */
.minTexelBufferOffsetAlignment = 1,
.minUniformBufferOffsetAlignment = 4,
@@ -977,9 +990,12 @@ void radv_GetPhysicalDeviceProperties2(
VK_SUBGROUP_FEATURE_BASIC_BIT |
VK_SUBGROUP_FEATURE_BALLOT_BIT |
VK_SUBGROUP_FEATURE_QUAD_BIT |
VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
VK_SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT |
VK_SUBGROUP_FEATURE_VOTE_BIT;
if (pdevice->rad_info.chip_class >= VI) {
properties->supportedOperations |=
VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
VK_SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT;
}
properties->quadOperationsInAllStages = true;
break;
}
@@ -1258,7 +1274,7 @@ radv_queue_init(struct radv_device *device, struct radv_queue *queue,
queue->hw_ctx = device->ws->ctx_create(device->ws, queue->priority);
if (!queue->hw_ctx)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
return VK_SUCCESS;
}
@@ -1353,36 +1369,8 @@ static void radv_bo_list_remove(struct radv_device *device,
static void
radv_device_init_gs_info(struct radv_device *device)
{
switch (device->physical_device->rad_info.family) {
case CHIP_OLAND:
case CHIP_HAINAN:
case CHIP_KAVERI:
case CHIP_KABINI:
case CHIP_MULLINS:
case CHIP_ICELAND:
case CHIP_CARRIZO:
case CHIP_STONEY:
device->gs_table_depth = 16;
return;
case CHIP_TAHITI:
case CHIP_PITCAIRN:
case CHIP_VERDE:
case CHIP_BONAIRE:
case CHIP_HAWAII:
case CHIP_TONGA:
case CHIP_FIJI:
case CHIP_POLARIS10:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGAM:
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_RAVEN:
device->gs_table_depth = 32;
return;
default:
unreachable("unknown GPU");
}
device->gs_table_depth = ac_get_gs_table_depth(device->physical_device->rad_info.chip_class,
device->physical_device->rad_info.family);
}
static int radv_get_device_extension_index(const char *name)
@@ -1415,7 +1403,7 @@ VkResult radv_CreateDevice(
unsigned num_features = sizeof(VkPhysicalDeviceFeatures) / sizeof(VkBool32);
for (uint32_t i = 0; i < num_features; i++) {
if (enabled_feature[i] && !supported_feature[i])
return vk_error(VK_ERROR_FEATURE_NOT_PRESENT);
return vk_error(physical_device->instance, VK_ERROR_FEATURE_NOT_PRESENT);
}
}
@@ -1423,7 +1411,7 @@ VkResult radv_CreateDevice(
sizeof(*device), 8,
VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
if (!device)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(physical_device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
device->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
device->instance = physical_device->instance;
@@ -1440,7 +1428,7 @@ VkResult radv_CreateDevice(
int index = radv_get_device_extension_index(ext_name);
if (index < 0 || !physical_device->supported_extensions.extensions[index]) {
vk_free(&device->alloc, device);
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
return vk_error(physical_device->instance, VK_ERROR_EXTENSION_NOT_PRESENT);
}
device->enabled_extensions.extensions[index] = true;
@@ -1497,13 +1485,11 @@ VkResult radv_CreateDevice(
device->always_use_syncobj = device->physical_device->rad_info.has_syncobj_wait_for_submit;
#endif
device->llvm_supports_spill = true;
/* The maximum number of scratch waves. Scratch space isn't divided
* evenly between CUs. The number is only a function of the number of CUs.
* We can decrease the constant to decrease the scratch buffer size.
*
* sctx->scratch_waves must be >= the maximum posible size of
* sctx->scratch_waves must be >= the maximum possible size of
* 1 threadgroup, so that the hw doesn't hang from being unable
* to start any.
*
@@ -1654,7 +1640,7 @@ VkResult radv_EnumerateInstanceLayerProperties(
}
/* None supported at this time */
return vk_error(VK_ERROR_LAYER_NOT_PRESENT);
return vk_error(NULL, VK_ERROR_LAYER_NOT_PRESENT);
}
VkResult radv_EnumerateDeviceLayerProperties(
@@ -1668,7 +1654,7 @@ VkResult radv_EnumerateDeviceLayerProperties(
}
/* None supported at this time */
return vk_error(VK_ERROR_LAYER_NOT_PRESENT);
return vk_error(NULL, VK_ERROR_LAYER_NOT_PRESENT);
}
void radv_GetDeviceQueue2(
@@ -1905,6 +1891,126 @@ radv_get_hs_offchip_param(struct radv_device *device, uint32_t *max_offchip_buff
return hs_offchip_param;
}
static void
radv_emit_gs_ring_sizes(struct radv_queue *queue, struct radeon_winsys_cs *cs,
struct radeon_winsys_bo *esgs_ring_bo,
uint32_t esgs_ring_size,
struct radeon_winsys_bo *gsvs_ring_bo,
uint32_t gsvs_ring_size)
{
if (!esgs_ring_bo && !gsvs_ring_bo)
return;
if (esgs_ring_bo)
radv_cs_add_buffer(queue->device->ws, cs, esgs_ring_bo, 8);
if (gsvs_ring_bo)
radv_cs_add_buffer(queue->device->ws, cs, gsvs_ring_bo, 8);
if (queue->device->physical_device->rad_info.chip_class >= CIK) {
radeon_set_uconfig_reg_seq(cs, R_030900_VGT_ESGS_RING_SIZE, 2);
radeon_emit(cs, esgs_ring_size >> 8);
radeon_emit(cs, gsvs_ring_size >> 8);
} else {
radeon_set_config_reg_seq(cs, R_0088C8_VGT_ESGS_RING_SIZE, 2);
radeon_emit(cs, esgs_ring_size >> 8);
radeon_emit(cs, gsvs_ring_size >> 8);
}
}
static void
radv_emit_tess_factor_ring(struct radv_queue *queue, struct radeon_winsys_cs *cs,
unsigned hs_offchip_param, unsigned tf_ring_size,
struct radeon_winsys_bo *tess_rings_bo)
{
uint64_t tf_va;
if (!tess_rings_bo)
return;
tf_va = radv_buffer_get_va(tess_rings_bo);
radv_cs_add_buffer(queue->device->ws, cs, tess_rings_bo, 8);
if (queue->device->physical_device->rad_info.chip_class >= CIK) {
radeon_set_uconfig_reg(cs, R_030938_VGT_TF_RING_SIZE,
S_030938_SIZE(tf_ring_size / 4));
radeon_set_uconfig_reg(cs, R_030940_VGT_TF_MEMORY_BASE,
tf_va >> 8);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg(cs, R_030944_VGT_TF_MEMORY_BASE_HI,
S_030944_BASE_HI(tf_va >> 40));
}
radeon_set_uconfig_reg(cs, R_03093C_VGT_HS_OFFCHIP_PARAM,
hs_offchip_param);
} else {
radeon_set_config_reg(cs, R_008988_VGT_TF_RING_SIZE,
S_008988_SIZE(tf_ring_size / 4));
radeon_set_config_reg(cs, R_0089B8_VGT_TF_MEMORY_BASE,
tf_va >> 8);
radeon_set_config_reg(cs, R_0089B0_VGT_HS_OFFCHIP_PARAM,
hs_offchip_param);
}
}
static void
radv_emit_compute_scratch(struct radv_queue *queue, struct radeon_winsys_cs *cs,
struct radeon_winsys_bo *compute_scratch_bo)
{
uint64_t scratch_va;
if (!compute_scratch_bo)
return;
scratch_va = radv_buffer_get_va(compute_scratch_bo);
radv_cs_add_buffer(queue->device->ws, cs, compute_scratch_bo, 8);
radeon_set_sh_reg_seq(cs, R_00B900_COMPUTE_USER_DATA_0, 2);
radeon_emit(cs, scratch_va);
radeon_emit(cs, S_008F04_BASE_ADDRESS_HI(scratch_va >> 32) |
S_008F04_SWIZZLE_ENABLE(1));
}
static void
radv_emit_global_shader_pointers(struct radv_queue *queue,
struct radeon_winsys_cs *cs,
struct radeon_winsys_bo *descriptor_bo)
{
uint64_t va;
if (!descriptor_bo)
return;
va = radv_buffer_get_va(descriptor_bo);
radv_cs_add_buffer(queue->device->ws, cs, descriptor_bo, 8);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
uint32_t regs[] = {R_00B030_SPI_SHADER_USER_DATA_PS_0,
R_00B130_SPI_SHADER_USER_DATA_VS_0,
R_00B208_SPI_SHADER_USER_DATA_ADDR_LO_GS,
R_00B408_SPI_SHADER_USER_DATA_ADDR_LO_HS};
for (int i = 0; i < ARRAY_SIZE(regs); ++i) {
radv_emit_shader_pointer(queue->device, cs, regs[i],
va, true);
}
} else {
uint32_t regs[] = {R_00B030_SPI_SHADER_USER_DATA_PS_0,
R_00B130_SPI_SHADER_USER_DATA_VS_0,
R_00B230_SPI_SHADER_USER_DATA_GS_0,
R_00B330_SPI_SHADER_USER_DATA_ES_0,
R_00B430_SPI_SHADER_USER_DATA_HS_0,
R_00B530_SPI_SHADER_USER_DATA_LS_0};
for (int i = 0; i < ARRAY_SIZE(regs); ++i) {
radv_emit_shader_pointer(queue->device, cs, regs[i],
va, true);
}
}
}
static VkResult
radv_get_preamble_cs(struct radv_queue *queue,
uint32_t scratch_size,
@@ -2059,18 +2165,6 @@ radv_get_preamble_cs(struct radv_queue *queue,
if (scratch_bo)
radv_cs_add_buffer(queue->device->ws, cs, scratch_bo, 8);
if (esgs_ring_bo)
radv_cs_add_buffer(queue->device->ws, cs, esgs_ring_bo, 8);
if (gsvs_ring_bo)
radv_cs_add_buffer(queue->device->ws, cs, gsvs_ring_bo, 8);
if (tess_rings_bo)
radv_cs_add_buffer(queue->device->ws, cs, tess_rings_bo, 8);
if (descriptor_bo)
radv_cs_add_buffer(queue->device->ws, cs, descriptor_bo, 8);
if (descriptor_bo != queue->descriptor_bo) {
uint32_t *map = (uint32_t*)queue->device->ws->buffer_map(descriptor_bo);
@@ -2102,80 +2196,12 @@ radv_get_preamble_cs(struct radv_queue *queue,
radeon_emit(cs, EVENT_TYPE(V_028A90_VGT_FLUSH) | EVENT_INDEX(0));
}
if (esgs_ring_bo || gsvs_ring_bo) {
if (queue->device->physical_device->rad_info.chip_class >= CIK) {
radeon_set_uconfig_reg_seq(cs, R_030900_VGT_ESGS_RING_SIZE, 2);
radeon_emit(cs, esgs_ring_size >> 8);
radeon_emit(cs, gsvs_ring_size >> 8);
} else {
radeon_set_config_reg_seq(cs, R_0088C8_VGT_ESGS_RING_SIZE, 2);
radeon_emit(cs, esgs_ring_size >> 8);
radeon_emit(cs, gsvs_ring_size >> 8);
}
}
if (tess_rings_bo) {
uint64_t tf_va = radv_buffer_get_va(tess_rings_bo);
if (queue->device->physical_device->rad_info.chip_class >= CIK) {
radeon_set_uconfig_reg(cs, R_030938_VGT_TF_RING_SIZE,
S_030938_SIZE(tess_factor_ring_size / 4));
radeon_set_uconfig_reg(cs, R_030940_VGT_TF_MEMORY_BASE,
tf_va >> 8);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_uconfig_reg(cs, R_030944_VGT_TF_MEMORY_BASE_HI,
S_030944_BASE_HI(tf_va >> 40));
}
radeon_set_uconfig_reg(cs, R_03093C_VGT_HS_OFFCHIP_PARAM, hs_offchip_param);
} else {
radeon_set_config_reg(cs, R_008988_VGT_TF_RING_SIZE,
S_008988_SIZE(tess_factor_ring_size / 4));
radeon_set_config_reg(cs, R_0089B8_VGT_TF_MEMORY_BASE,
tf_va >> 8);
radeon_set_config_reg(cs, R_0089B0_VGT_HS_OFFCHIP_PARAM,
hs_offchip_param);
}
}
if (descriptor_bo) {
uint64_t va = radv_buffer_get_va(descriptor_bo);
if (queue->device->physical_device->rad_info.chip_class >= GFX9) {
uint32_t regs[] = {R_00B030_SPI_SHADER_USER_DATA_PS_0,
R_00B130_SPI_SHADER_USER_DATA_VS_0,
R_00B208_SPI_SHADER_USER_DATA_ADDR_LO_GS,
R_00B408_SPI_SHADER_USER_DATA_ADDR_LO_HS};
for (int i = 0; i < ARRAY_SIZE(regs); ++i) {
radeon_set_sh_reg_seq(cs, regs[i], 2);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
}
} else {
uint32_t regs[] = {R_00B030_SPI_SHADER_USER_DATA_PS_0,
R_00B130_SPI_SHADER_USER_DATA_VS_0,
R_00B230_SPI_SHADER_USER_DATA_GS_0,
R_00B330_SPI_SHADER_USER_DATA_ES_0,
R_00B430_SPI_SHADER_USER_DATA_HS_0,
R_00B530_SPI_SHADER_USER_DATA_LS_0};
for (int i = 0; i < ARRAY_SIZE(regs); ++i) {
radeon_set_sh_reg_seq(cs, regs[i], 2);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
}
}
}
if (compute_scratch_bo) {
uint64_t scratch_va = radv_buffer_get_va(compute_scratch_bo);
uint32_t rsrc1 = S_008F04_BASE_ADDRESS_HI(scratch_va >> 32) |
S_008F04_SWIZZLE_ENABLE(1);
radv_cs_add_buffer(queue->device->ws, cs, compute_scratch_bo, 8);
radeon_set_sh_reg_seq(cs, R_00B900_COMPUTE_USER_DATA_0, 2);
radeon_emit(cs, scratch_va);
radeon_emit(cs, rsrc1);
}
radv_emit_gs_ring_sizes(queue, cs, esgs_ring_bo, esgs_ring_size,
gsvs_ring_bo, gsvs_ring_size);
radv_emit_tess_factor_ring(queue, cs, hs_offchip_param,
tess_factor_ring_size, tess_rings_bo);
radv_emit_global_shader_pointers(queue, cs, descriptor_bo);
radv_emit_compute_scratch(queue, cs, compute_scratch_bo);
if (i == 0) {
si_cs_emit_cache_flush(cs,
@@ -2282,10 +2308,11 @@ fail:
queue->device->ws->buffer_destroy(gsvs_ring_bo);
if (tess_rings_bo && tess_rings_bo != queue->tess_rings_bo)
queue->device->ws->buffer_destroy(tess_rings_bo);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(queue->device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
static VkResult radv_alloc_sem_counts(struct radv_winsys_sem_counts *counts,
static VkResult radv_alloc_sem_counts(struct radv_instance *instance,
struct radv_winsys_sem_counts *counts,
int num_sems,
const VkSemaphore *sems,
VkFence _fence,
@@ -2314,14 +2341,14 @@ static VkResult radv_alloc_sem_counts(struct radv_winsys_sem_counts *counts,
if (counts->syncobj_count) {
counts->syncobj = (uint32_t *)malloc(sizeof(uint32_t) * counts->syncobj_count);
if (!counts->syncobj)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
if (counts->sem_count) {
counts->sem = (struct radeon_winsys_sem **)malloc(sizeof(struct radeon_winsys_sem *) * counts->sem_count);
if (!counts->sem) {
free(counts->syncobj);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
}
@@ -2350,7 +2377,8 @@ static VkResult radv_alloc_sem_counts(struct radv_winsys_sem_counts *counts,
return VK_SUCCESS;
}
void radv_free_sem_info(struct radv_winsys_sem_info *sem_info)
static void
radv_free_sem_info(struct radv_winsys_sem_info *sem_info)
{
free(sem_info->wait.syncobj);
free(sem_info->wait.sem);
@@ -2373,20 +2401,22 @@ static void radv_free_temp_syncobjs(struct radv_device *device,
}
}
VkResult radv_alloc_sem_info(struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
int num_signal_sems,
const VkSemaphore *signal_sems,
VkFence fence)
static VkResult
radv_alloc_sem_info(struct radv_instance *instance,
struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
int num_signal_sems,
const VkSemaphore *signal_sems,
VkFence fence)
{
VkResult ret;
memset(sem_info, 0, sizeof(*sem_info));
ret = radv_alloc_sem_counts(&sem_info->wait, num_wait_sems, wait_sems, VK_NULL_HANDLE, true);
ret = radv_alloc_sem_counts(instance, &sem_info->wait, num_wait_sems, wait_sems, VK_NULL_HANDLE, true);
if (ret)
return ret;
ret = radv_alloc_sem_counts(&sem_info->signal, num_signal_sems, signal_sems, fence, false);
ret = radv_alloc_sem_counts(instance, &sem_info->signal, num_signal_sems, signal_sems, fence, false);
if (ret)
radv_free_sem_info(sem_info);
@@ -2404,7 +2434,7 @@ static VkResult radv_signal_fence(struct radv_queue *queue,
VkResult result;
struct radv_winsys_sem_info sem_info;
result = radv_alloc_sem_info(&sem_info, 0, NULL, 0, NULL,
result = radv_alloc_sem_info(queue->device->instance, &sem_info, 0, NULL, 0, NULL,
radv_fence_to_handle(fence));
if (result != VK_SUCCESS)
return result;
@@ -2417,7 +2447,7 @@ static VkResult radv_signal_fence(struct radv_queue *queue,
/* TODO: find a better error */
if (ret)
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(queue->device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
return VK_SUCCESS;
}
@@ -2474,7 +2504,8 @@ VkResult radv_QueueSubmit(
uint32_t advance;
struct radv_winsys_sem_info sem_info;
result = radv_alloc_sem_info(&sem_info,
result = radv_alloc_sem_info(queue->device->instance,
&sem_info,
pSubmits[i].waitSemaphoreCount,
pSubmits[i].pWaitSemaphores,
pSubmits[i].signalSemaphoreCount,
@@ -2517,6 +2548,8 @@ VkResult radv_QueueSubmit(
for (uint32_t j = 0; j < pSubmits[i].commandBufferCount; j += advance) {
struct radeon_winsys_cs *initial_preamble = (do_flush && !j) ? initial_flush_preamble_cs : initial_preamble_cs;
const struct radv_winsys_bo_list *bo_list = NULL;
advance = MIN2(max_cs_submission,
pSubmits[i].commandBufferCount - j);
@@ -2526,12 +2559,14 @@ VkResult radv_QueueSubmit(
sem_info.cs_emit_wait = j == 0;
sem_info.cs_emit_signal = j + advance == pSubmits[i].commandBufferCount;
if (unlikely(queue->device->use_global_bo_list))
if (unlikely(queue->device->use_global_bo_list)) {
pthread_mutex_lock(&queue->device->bo_list.mutex);
bo_list = &queue->device->bo_list.list;
}
ret = queue->device->ws->cs_submit(ctx, queue->queue_idx, cs_array + j,
advance, initial_preamble, continue_preamble_cs,
&sem_info, &queue->device->bo_list.list,
&sem_info, bo_list,
can_patch, base_fence);
if (unlikely(queue->device->use_global_bo_list))
@@ -2715,7 +2750,7 @@ static VkResult radv_alloc_memory(struct radv_device *device,
mem = vk_alloc2(&device->alloc, pAllocator, sizeof(*mem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (mem == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
if (wsi_info && wsi_info->implicit_sync)
flags |= RADEON_FLAG_IMPLICIT_SYNC;
@@ -2853,7 +2888,7 @@ VkResult radv_MapMemory(
return VK_SUCCESS;
}
return vk_error(VK_ERROR_MEMORY_MAP_FAILED);
return vk_error(device->instance, VK_ERROR_MEMORY_MAP_FAILED);
}
void radv_UnmapMemory(
@@ -3127,7 +3162,8 @@ radv_sparse_image_opaque_bind_memory(struct radv_device *device,
}
VkResult result;
result = radv_alloc_sem_info(&sem_info,
result = radv_alloc_sem_info(queue->device->instance,
&sem_info,
pBindInfo[i].waitSemaphoreCount,
pBindInfo[i].pWaitSemaphores,
pBindInfo[i].signalSemaphoreCount,
@@ -3178,7 +3214,7 @@ VkResult radv_CreateFence(
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!fence)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
fence->submitted = false;
fence->signalled = !!(pCreateInfo->flags & VK_FENCE_CREATE_SIGNALED_BIT);
@@ -3187,7 +3223,7 @@ VkResult radv_CreateFence(
int ret = device->ws->create_syncobj(device->ws, &fence->syncobj);
if (ret) {
vk_free2(&device->alloc, pAllocator, fence);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
if (pCreateInfo->flags & VK_FENCE_CREATE_SIGNALED_BIT) {
device->ws->signal_syncobj(device->ws, fence->syncobj);
@@ -3197,7 +3233,7 @@ VkResult radv_CreateFence(
fence->fence = device->ws->create_fence();
if (!fence->fence) {
vk_free2(&device->alloc, pAllocator, fence);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
fence->syncobj = 0;
}
@@ -3268,7 +3304,7 @@ VkResult radv_WaitForFences(
if (device->always_use_syncobj) {
uint32_t *handles = malloc(sizeof(uint32_t) * fenceCount);
if (!handles)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
for (uint32_t i = 0; i < fenceCount; ++i) {
RADV_FROM_HANDLE(radv_fence, fence, pFences[i]);
@@ -3287,7 +3323,7 @@ VkResult radv_WaitForFences(
uint32_t wait_count = 0;
struct radeon_winsys_fence **fences = malloc(sizeof(struct radeon_winsys_fence *) * fenceCount);
if (!fences)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
for (uint32_t i = 0; i < fenceCount; ++i) {
RADV_FROM_HANDLE(radv_fence, fence, pFences[i]);
@@ -3426,7 +3462,7 @@ VkResult radv_CreateSemaphore(
sizeof(*sem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!sem)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
sem->temp_syncobj = 0;
/* create a syncobject if we are going to export this semaphore */
@@ -3435,14 +3471,14 @@ VkResult radv_CreateSemaphore(
int ret = device->ws->create_syncobj(device->ws, &sem->syncobj);
if (ret) {
vk_free2(&device->alloc, pAllocator, sem);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
sem->sem = NULL;
} else {
sem->sem = device->ws->create_sem(device->ws);
if (!sem->sem) {
vk_free2(&device->alloc, pAllocator, sem);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
sem->syncobj = 0;
}
@@ -3480,14 +3516,14 @@ VkResult radv_CreateEvent(
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!event)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
event->bo = device->ws->buffer_create(device->ws, 8, 8,
RADEON_DOMAIN_GTT,
RADEON_FLAG_VA_UNCACHED | RADEON_FLAG_CPU_ACCESS | RADEON_FLAG_NO_INTERPROCESS_SHARING);
if (!event->bo) {
vk_free2(&device->alloc, pAllocator, event);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
event->map = (uint64_t*)device->ws->buffer_map(event->bo);
@@ -3556,7 +3592,7 @@ VkResult radv_CreateBuffer(
buffer = vk_alloc2(&device->alloc, pAllocator, sizeof(*buffer), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (buffer == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
buffer->size = pCreateInfo->size;
buffer->usage = pCreateInfo->usage;
@@ -3573,7 +3609,7 @@ VkResult radv_CreateBuffer(
4096, 0, RADEON_FLAG_VIRTUAL);
if (!buffer->bo) {
vk_free2(&device->alloc, pAllocator, buffer);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
}
@@ -3934,7 +3970,8 @@ radv_initialise_ds_surface(struct radv_device *device,
ds->db_z_info = S_028038_FORMAT(format) |
S_028038_NUM_SAMPLES(util_logbase2(iview->image->info.samples)) |
S_028038_SW_MODE(iview->image->surface.u.gfx9.surf.swizzle_mode) |
S_028038_MAXMIP(iview->image->info.levels - 1);
S_028038_MAXMIP(iview->image->info.levels - 1) |
S_028038_ZRANGE_PRECISION(1);
ds->db_stencil_info = S_02803C_FORMAT(stencil_format) |
S_02803C_SW_MODE(iview->image->surface.u.gfx9.stencil.swizzle_mode);
@@ -4060,7 +4097,7 @@ VkResult radv_CreateFramebuffer(
framebuffer = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (framebuffer == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
framebuffer->attachment_count = pCreateInfo->attachmentCount;
framebuffer->width = pCreateInfo->width;
@@ -4278,7 +4315,7 @@ VkResult radv_CreateSampler(
sampler = vk_alloc2(&device->alloc, pAllocator, sizeof(*sampler), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!sampler)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
radv_init_sampler(device, sampler, pCreateInfo);
*pSampler = radv_sampler_to_handle(sampler);
@@ -4360,7 +4397,7 @@ VkResult radv_GetMemoryFdKHR(VkDevice _device,
bool ret = radv_get_memory_fd(device, memory, pFD);
if (ret == false)
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
return VK_SUCCESS;
}
@@ -4369,6 +4406,8 @@ VkResult radv_GetMemoryFdPropertiesKHR(VkDevice _device,
int fd,
VkMemoryFdPropertiesKHR *pMemoryFdProperties)
{
RADV_FROM_HANDLE(radv_device, device, _device);
switch (handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT:
pMemoryFdProperties->memoryTypeBits = (1 << RADV_MEM_TYPE_COUNT) - 1;
@@ -4382,7 +4421,7 @@ VkResult radv_GetMemoryFdPropertiesKHR(VkDevice _device,
*
* So opaque handle types fall into the default "unsupported" case.
*/
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
}
}
@@ -4393,7 +4432,7 @@ static VkResult radv_import_opaque_fd(struct radv_device *device,
uint32_t syncobj_handle = 0;
int ret = device->ws->import_syncobj(device->ws, fd, &syncobj_handle);
if (ret != 0)
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
if (*syncobj)
device->ws->destroy_syncobj(device->ws, *syncobj);
@@ -4414,7 +4453,7 @@ static VkResult radv_import_sync_fd(struct radv_device *device,
if (!syncobj_handle) {
int ret = device->ws->create_syncobj(device->ws, &syncobj_handle);
if (ret) {
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
}
}
@@ -4423,7 +4462,7 @@ static VkResult radv_import_sync_fd(struct radv_device *device,
} else {
int ret = device->ws->import_syncobj_from_sync_file(device->ws, syncobj_handle, fd);
if (ret != 0)
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
}
*syncobj = syncobj_handle;
@@ -4490,7 +4529,7 @@ VkResult radv_GetSemaphoreFdKHR(VkDevice _device,
}
if (ret)
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return VK_SUCCESS;
}
@@ -4579,7 +4618,7 @@ VkResult radv_GetFenceFdKHR(VkDevice _device,
}
if (ret)
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return vk_error(device->instance, VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return VK_SUCCESS;
}

View File

@@ -116,7 +116,7 @@ struct string_map_entry {
uint32_t num;
};
/* We use a big string constant to avoid lots of reloctions from the entry
/* We use a big string constant to avoid lots of relocations from the entry
* point table to lots of little strings. The entries in the entry point table
* store the index into this big string.
*/
@@ -205,7 +205,7 @@ radv_entrypoint_is_enabled(int index, uint32_t core_version,
% if not e.device_command:
if (device) return false;
% endif
% if e.name == 'vkCreateInstance' or e.name == 'vkEnumerateInstanceExtensionProperties' or e.name == 'vkEnumerateInstanceLayerProperties':
% if e.name == 'vkCreateInstance' or e.name == 'vkEnumerateInstanceExtensionProperties' or e.name == 'vkEnumerateInstanceLayerProperties' or e.name == 'vkEnumerateInstanceVersion':
return !device;
% elif e.core_version:
return instance && ${e.core_version.c_vk_version()} <= core_version;

View File

@@ -56,6 +56,7 @@ EXTENSIONS = [
Extension('VK_KHR_descriptor_update_template', 1, True),
Extension('VK_KHR_device_group', 1, True),
Extension('VK_KHR_device_group_creation', 1, True),
Extension('VK_KHR_draw_indirect_count', 1, True),
Extension('VK_KHR_external_fence', 1, 'device->rad_info.has_syncobj_wait_for_submit'),
Extension('VK_KHR_external_fence_capabilities', 1, True),
Extension('VK_KHR_external_fence_fd', 1, 'device->rad_info.has_syncobj_wait_for_submit'),
@@ -195,9 +196,9 @@ struct radv_device_extension_table {
};
};
const VkExtensionProperties radv_instance_extensions[RADV_INSTANCE_EXTENSION_COUNT];
const VkExtensionProperties radv_device_extensions[RADV_DEVICE_EXTENSION_COUNT];
const struct radv_instance_extension_table radv_supported_instance_extensions;
extern const VkExtensionProperties radv_instance_extensions[RADV_INSTANCE_EXTENSION_COUNT];
extern const VkExtensionProperties radv_device_extensions[RADV_DEVICE_EXTENSION_COUNT];
extern const struct radv_instance_extension_table radv_supported_instance_extensions;
struct radv_physical_device;

View File

@@ -321,10 +321,8 @@ uint32_t radv_translate_tex_dataformat(VkFormat format,
return V_008F14_IMG_DATA_FORMAT_32;
case 2:
return V_008F14_IMG_DATA_FORMAT_32_32;
#if 0 /* Not supported for render targets */
case 3:
return V_008F14_IMG_DATA_FORMAT_32_32_32;
#endif
case 4:
return V_008F14_IMG_DATA_FORMAT_32_32_32_32;
}
@@ -638,13 +636,17 @@ radv_physical_device_get_format_properties(struct radv_physical_device *physical
tiled |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT;
}
}
if (tiled && util_is_power_of_two_or_zero(vk_format_get_blocksize(format)) && !scaled) {
if (tiled && !scaled) {
tiled |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
}
/* Tiled formatting does not support NPOT pixel sizes */
if (!util_is_power_of_two_or_zero(vk_format_get_blocksize(format)))
tiled = 0;
}
if (linear && util_is_power_of_two_or_zero(vk_format_get_blocksize(format)) && !scaled) {
if (linear && !scaled) {
linear |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
}
@@ -655,6 +657,25 @@ radv_physical_device_get_format_properties(struct radv_physical_device *physical
tiled |= VK_FORMAT_FEATURE_STORAGE_IMAGE_ATOMIC_BIT;
}
switch(format) {
case VK_FORMAT_A2R10G10B10_SNORM_PACK32:
case VK_FORMAT_A2B10G10R10_SNORM_PACK32:
case VK_FORMAT_A2R10G10B10_SSCALED_PACK32:
case VK_FORMAT_A2B10G10R10_SSCALED_PACK32:
case VK_FORMAT_A2R10G10B10_SINT_PACK32:
case VK_FORMAT_A2B10G10R10_SINT_PACK32:
if (physical_device->rad_info.chip_class <= VI &&
physical_device->rad_info.family != CHIP_STONEY) {
buffer &= ~(VK_FORMAT_FEATURE_UNIFORM_TEXEL_BUFFER_BIT |
VK_FORMAT_FEATURE_STORAGE_TEXEL_BUFFER_BIT);
linear = 0;
tiled = 0;
}
break;
default:
break;
}
out_properties->linearTilingFeatures = linear;
out_properties->optimalTilingFeatures = tiled;
out_properties->bufferFeatures = buffer;
@@ -859,194 +880,87 @@ bool radv_format_pack_clear_color(VkFormat format,
uint32_t clear_vals[2],
VkClearColorValue *value)
{
uint8_t r = 0, g = 0, b = 0, a = 0;
const struct vk_format_description *desc = vk_format_description(format);
if (vk_format_get_component_bits(format, VK_FORMAT_COLORSPACE_RGB, 0) <= 8) {
if (desc->colorspace == VK_FORMAT_COLORSPACE_RGB) {
r = float_to_ubyte(value->float32[0]);
g = float_to_ubyte(value->float32[1]);
b = float_to_ubyte(value->float32[2]);
a = float_to_ubyte(value->float32[3]);
} else if (desc->colorspace == VK_FORMAT_COLORSPACE_SRGB) {
r = util_format_linear_float_to_srgb_8unorm(value->float32[0]);
g = util_format_linear_float_to_srgb_8unorm(value->float32[1]);
b = util_format_linear_float_to_srgb_8unorm(value->float32[2]);
a = float_to_ubyte(value->float32[3]);
}
}
switch (format) {
case VK_FORMAT_R8_UNORM:
case VK_FORMAT_R8_SRGB:
clear_vals[0] = r;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8_UNORM:
case VK_FORMAT_R8G8_SRGB:
clear_vals[0] = r | g << 8;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8B8A8_SRGB:
case VK_FORMAT_R8G8B8A8_UNORM:
clear_vals[0] = r | g << 8 | b << 16 | a << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_B8G8R8A8_SRGB:
case VK_FORMAT_B8G8R8A8_UNORM:
clear_vals[0] = b | g << 8 | r << 16 | a << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_A8B8G8R8_UNORM_PACK32:
case VK_FORMAT_A8B8G8R8_SRGB_PACK32:
clear_vals[0] = r | g << 8 | b << 16 | a << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8_UINT:
clear_vals[0] = value->uint32[0] & 0xff;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8_SINT:
clear_vals[0] = value->int32[0] & 0xff;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16_UINT:
clear_vals[0] = value->uint32[0] & 0xffff;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8_UINT:
clear_vals[0] = value->uint32[0] & 0xff;
clear_vals[0] |= (value->uint32[1] & 0xff) << 8;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8_SINT:
clear_vals[0] = value->int32[0] & 0xff;
clear_vals[0] |= (value->int32[1] & 0xff) << 8;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8B8A8_UINT:
clear_vals[0] = value->uint32[0] & 0xff;
clear_vals[0] |= (value->uint32[1] & 0xff) << 8;
clear_vals[0] |= (value->uint32[2] & 0xff) << 16;
clear_vals[0] |= (value->uint32[3] & 0xff) << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_R8G8B8A8_SINT:
clear_vals[0] = value->int32[0] & 0xff;
clear_vals[0] |= (value->int32[1] & 0xff) << 8;
clear_vals[0] |= (value->int32[2] & 0xff) << 16;
clear_vals[0] |= (value->int32[3] & 0xff) << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_A8B8G8R8_UINT_PACK32:
clear_vals[0] = value->uint32[0] & 0xff;
clear_vals[0] |= (value->uint32[1] & 0xff) << 8;
clear_vals[0] |= (value->uint32[2] & 0xff) << 16;
clear_vals[0] |= (value->uint32[3] & 0xff) << 24;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16_UINT:
clear_vals[0] = value->uint32[0] & 0xffff;
clear_vals[0] |= (value->uint32[1] & 0xffff) << 16;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16B16A16_UINT:
clear_vals[0] = value->uint32[0] & 0xffff;
clear_vals[0] |= (value->uint32[1] & 0xffff) << 16;
clear_vals[1] = value->uint32[2] & 0xffff;
clear_vals[1] |= (value->uint32[3] & 0xffff) << 16;
break;
case VK_FORMAT_R32_UINT:
clear_vals[0] = value->uint32[0];
clear_vals[1] = 0;
break;
case VK_FORMAT_R32G32_UINT:
clear_vals[0] = value->uint32[0];
clear_vals[1] = value->uint32[1];
break;
case VK_FORMAT_R32_SINT:
clear_vals[0] = value->int32[0];
clear_vals[1] = 0;
break;
case VK_FORMAT_R16_SFLOAT:
clear_vals[0] = util_float_to_half(value->float32[0]);
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16_SFLOAT:
clear_vals[0] = util_float_to_half(value->float32[0]);
clear_vals[0] |= (uint32_t)util_float_to_half(value->float32[1]) << 16;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16B16A16_SFLOAT:
clear_vals[0] = util_float_to_half(value->float32[0]);
clear_vals[0] |= (uint32_t)util_float_to_half(value->float32[1]) << 16;
clear_vals[1] = util_float_to_half(value->float32[2]);
clear_vals[1] |= (uint32_t)util_float_to_half(value->float32[3]) << 16;
break;
case VK_FORMAT_R16_UNORM:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 0.0f, 1.0f) * 0xffff)) & 0xffff;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16_UNORM:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 0.0f, 1.0f) * 0xffff)) & 0xffff;
clear_vals[0] |= ((uint16_t)util_iround(CLAMP(value->float32[1], 0.0f, 1.0f) * 0xffff)) << 16;
clear_vals[1] = 0;
break;
case VK_FORMAT_R16G16B16A16_UNORM:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 0.0f, 1.0f) * 0xffff)) & 0xffff;
clear_vals[0] |= ((uint16_t)util_iround(CLAMP(value->float32[1], 0.0f, 1.0f) * 0xffff)) << 16;
clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], 0.0f, 1.0f) * 0xffff)) & 0xffff;
clear_vals[1] |= ((uint16_t)util_iround(CLAMP(value->float32[3], 0.0f, 1.0f) * 0xffff)) << 16;
break;
case VK_FORMAT_R16G16B16A16_SNORM:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], -1.0f, 1.0f) * 0x7fff)) & 0xffff;
clear_vals[0] |= ((uint16_t)util_iround(CLAMP(value->float32[1], -1.0f, 1.0f) * 0x7fff)) << 16;
clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], -1.0f, 1.0f) * 0x7fff)) & 0xffff;
clear_vals[1] |= ((uint16_t)util_iround(CLAMP(value->float32[3], -1.0f, 1.0f) * 0x7fff)) << 16;
break;
case VK_FORMAT_A2B10G10R10_UNORM_PACK32:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 0.0f, 1.0f) * 0x3ff)) & 0x3ff;
clear_vals[0] |= (((uint16_t)util_iround(CLAMP(value->float32[1], 0.0f, 1.0f) * 0x3ff)) & 0x3ff) << 10;
clear_vals[0] |= (((uint16_t)util_iround(CLAMP(value->float32[2], 0.0f, 1.0f) * 0x3ff)) & 0x3ff) << 20;
clear_vals[0] |= (((uint16_t)util_iround(CLAMP(value->float32[3], 0.0f, 1.0f) * 0x3)) & 0x3) << 30;
clear_vals[1] = 0;
return true;
case VK_FORMAT_R32G32_SFLOAT:
clear_vals[0] = fui(value->float32[0]);
clear_vals[1] = fui(value->float32[1]);
break;
case VK_FORMAT_R32_SFLOAT:
clear_vals[1] = 0;
clear_vals[0] = fui(value->float32[0]);
break;
case VK_FORMAT_B10G11R11_UFLOAT_PACK32:
if (format == VK_FORMAT_B10G11R11_UFLOAT_PACK32) {
clear_vals[0] = float3_to_r11g11b10f(value->float32);
clear_vals[1] = 0;
break;
case VK_FORMAT_R32G32B32A32_SFLOAT:
if (value->float32[0] != value->float32[1] ||
value->float32[0] != value->float32[2])
return false;
clear_vals[0] = fui(value->float32[0]);
clear_vals[1] = fui(value->float32[3]);
break;
case VK_FORMAT_R32G32B32A32_UINT:
if (value->uint32[0] != value->uint32[1] ||
value->uint32[0] != value->uint32[2])
return false;
clear_vals[0] = value->uint32[0];
clear_vals[1] = value->uint32[3];
break;
case VK_FORMAT_R32G32B32A32_SINT:
if (value->int32[0] != value->int32[1] ||
value->int32[0] != value->int32[2])
return false;
clear_vals[0] = value->int32[0];
clear_vals[1] = value->int32[3];
break;
default:
fprintf(stderr, "failed to fast clear %d\n", format);
return true;
}
if (desc->layout != VK_FORMAT_LAYOUT_PLAIN) {
fprintf(stderr, "failed to fast clear for non-plain format %d\n", format);
return false;
}
if (!util_is_power_of_two_or_zero(desc->block.bits)) {
fprintf(stderr, "failed to fast clear for NPOT format %d\n", format);
return false;
}
if (desc->block.bits > 64) {
/*
* We have a 128 bits format, check if the first 3 components are the same.
* Every elements has to be 32 bits since we don't support 64-bit formats,
* and we can skip swizzling checks as alpha always comes last for these and
* we do not care about the rest as they have to be the same.
*/
if (desc->channel[0].type == VK_FORMAT_TYPE_FLOAT) {
if (value->float32[0] != value->float32[1] ||
value->float32[0] != value->float32[2])
return false;
} else {
if (value->uint32[0] != value->uint32[1] ||
value->uint32[0] != value->uint32[2])
return false;
}
clear_vals[0] = value->uint32[0];
clear_vals[1] = value->uint32[3];
return true;
}
uint64_t clear_val = 0;
for (unsigned c = 0; c < 4; ++c) {
if (desc->swizzle[c] >= 4)
continue;
const struct vk_format_channel_description *channel = &desc->channel[desc->swizzle[c]];
assert(channel->size);
uint64_t v = 0;
if (channel->pure_integer) {
v = value->uint32[c] & ((1ULL << channel->size) - 1);
} else if (channel->normalized) {
if (channel->type == VK_FORMAT_TYPE_UNSIGNED &&
desc->swizzle[c] < 3 &&
desc->colorspace == VK_FORMAT_COLORSPACE_SRGB) {
assert(channel->size == 8);
v = util_format_linear_float_to_srgb_8unorm(value->float32[c]);
} else if (channel->type == VK_FORMAT_TYPE_UNSIGNED) {
v = MAX2(MIN2(value->float32[c], 1.0f), 0.0f) * ((1ULL << channel->size) - 1);
} else {
v = MAX2(MIN2(value->float32[c], 1.0f), -1.0f) * ((1ULL << (channel->size - 1)) - 1);
}
} else if (channel->type == VK_FORMAT_TYPE_FLOAT) {
if (channel->size == 32) {
memcpy(&v, &value->float32[c], 4);
} else if(channel->size == 16) {
v = util_float_to_half(value->float32[c]);
} else {
fprintf(stderr, "failed to fast clear for unhandled float size in format %d\n", format);
return false;
}
} else {
fprintf(stderr, "failed to fast clear for unhandled component type in format %d\n", format);
return false;
}
clear_val |= (v & ((1ULL << channel->size) - 1)) << channel->shift;
}
clear_vals[0] = clear_val;
clear_vals[1] = clear_val >> 32;
return true;
}
@@ -1306,7 +1220,7 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2(
* vkGetPhysicalDeviceImageFormatProperties2KHR returns
* VK_ERROR_FORMAT_NOT_SUPPORTED.
*/
result = vk_errorf(VK_ERROR_FORMAT_NOT_SUPPORTED,
result = vk_errorf(physical_device->instance, VK_ERROR_FORMAT_NOT_SUPPORTED,
"unsupported VkExternalMemoryTypeFlagBitsKHR 0x%x",
external_info->handleType);
goto fail;

View File

@@ -110,6 +110,8 @@ radv_use_dcc_for_image(struct radv_device *device,
{
bool dcc_compatible_formats;
bool blendable;
bool shareable = vk_find_struct_const(pCreateInfo->pNext,
EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHR) != NULL;
/* DCC (Delta Color Compression) is only available for GFX8+. */
if (device->physical_device->rad_info.chip_class < VI)
@@ -118,6 +120,11 @@ radv_use_dcc_for_image(struct radv_device *device,
if (device->instance->debug_flags & RADV_DEBUG_NO_DCC)
return false;
/* FIXME: DCC is broken for shareable images starting with GFX9 */
if (device->physical_device->rad_info.chip_class >= GFX9 &&
shareable)
return false;
/* TODO: Enable DCC for storage images. */
if ((pCreateInfo->usage & VK_IMAGE_USAGE_STORAGE_BIT) ||
(pCreateInfo->flags & VK_IMAGE_CREATE_EXTENDED_USAGE_BIT_KHR))
@@ -133,12 +140,12 @@ radv_use_dcc_for_image(struct radv_device *device,
if (create_info->scanout)
return false;
/* FIXME: DCC for MSAA with 4x and 8x samples doesn't work yet. */
if (pCreateInfo->samples > 2)
return false;
/* TODO: Enable DCC for MSAA textures. */
if (!device->physical_device->dcc_msaa_allowed)
/* FIXME: DCC for MSAA with 4x and 8x samples doesn't work yet, while
* 2x can be enabled with an option.
*/
if (pCreateInfo->samples > 2 ||
(pCreateInfo->samples == 2 &&
!device->physical_device->dcc_msaa_allowed))
return false;
/* Determine if the formats are DCC compatible. */
@@ -414,7 +421,7 @@ static unsigned radv_tex_dim(VkImageType image_type, VkImageViewType view_type,
else
return V_008F1C_SQ_RSRC_IMG_2D_ARRAY;
default:
unreachable("illegale image type");
unreachable("illegal image type");
}
}
@@ -534,7 +541,7 @@ si_make_texture_descriptor(struct radv_device *device,
if (device->physical_device->rad_info.chip_class >= GFX9) {
unsigned bc_swizzle = gfx9_border_color_swizzle(swizzle);
/* Depth is the the last accessible layer on Gfx9.
/* Depth is the last accessible layer on Gfx9.
* The hw doesn't need to know the total number of layers.
*/
if (type == V_008F1C_SQ_RSRC_IMG_3D)
@@ -619,7 +626,7 @@ si_make_texture_descriptor(struct radv_device *device,
S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) |
S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) |
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, 1, 0, false, false));
S_008F1C_TYPE(radv_tex_dim(image->type, view_type, image->info.array_size, 0, false, false));
fmask_state[4] = 0;
fmask_state[5] = S_008F24_BASE_ARRAY(first_layer);
fmask_state[6] = 0;
@@ -726,56 +733,20 @@ radv_image_get_fmask_info(struct radv_device *device,
unsigned nr_samples,
struct radv_fmask_info *out)
{
/* FMASK is allocated like an ordinary texture. */
struct radeon_surf fmask = {};
struct ac_surf_info info = image->info;
memset(out, 0, sizeof(*out));
if (device->physical_device->rad_info.chip_class >= GFX9) {
out->alignment = image->surface.u.gfx9.fmask_alignment;
out->size = image->surface.u.gfx9.fmask_size;
out->alignment = image->surface.fmask_alignment;
out->size = image->surface.fmask_size;
out->tile_swizzle = image->surface.fmask_tile_swizzle;
return;
}
fmask.blk_w = image->surface.blk_w;
fmask.blk_h = image->surface.blk_h;
info.samples = 1;
fmask.flags = image->surface.flags | RADEON_SURF_FMASK;
if (!image->shareable)
info.surf_index = &device->fmask_mrt_offset_counter;
/* Force 2D tiling if it wasn't set. This may occur when creating
* FMASK for MSAA resolve on R6xx. On R6xx, the single-sample
* destination buffer must have an FMASK too. */
fmask.flags = RADEON_SURF_CLR(fmask.flags, MODE);
fmask.flags |= RADEON_SURF_SET(RADEON_SURF_MODE_2D, MODE);
switch (nr_samples) {
case 2:
case 4:
fmask.bpe = 1;
break;
case 8:
fmask.bpe = 4;
break;
default:
return;
}
device->ws->surface_init(device->ws, &info, &fmask);
assert(fmask.u.legacy.level[0].mode == RADEON_SURF_MODE_2D);
out->slice_tile_max = (fmask.u.legacy.level[0].nblk_x * fmask.u.legacy.level[0].nblk_y) / 64;
if (out->slice_tile_max)
out->slice_tile_max -= 1;
out->tile_mode_index = fmask.u.legacy.tiling_index[0];
out->pitch_in_pixels = fmask.u.legacy.level[0].nblk_x;
out->bank_height = fmask.u.legacy.bankh;
out->tile_swizzle = fmask.tile_swizzle;
out->alignment = MAX2(256, fmask.surf_alignment);
out->size = fmask.surf_size;
out->slice_tile_max = image->surface.u.legacy.fmask.slice_tile_max;
out->tile_mode_index = image->surface.u.legacy.fmask.tiling_index;
out->pitch_in_pixels = image->surface.u.legacy.fmask.pitch_in_pixels;
out->bank_height = image->surface.u.legacy.fmask.bankh;
out->tile_swizzle = image->surface.fmask_tile_swizzle;
out->alignment = image->surface.fmask_alignment;
out->size = image->surface.fmask_size;
assert(!out->tile_swizzle || !image->shareable);
}
@@ -959,16 +930,17 @@ radv_image_create(VkDevice _device,
image = vk_zalloc2(&device->alloc, alloc, sizeof(*image), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!image)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
image->type = pCreateInfo->imageType;
image->info.width = pCreateInfo->extent.width;
image->info.height = pCreateInfo->extent.height;
image->info.depth = pCreateInfo->extent.depth;
image->info.samples = pCreateInfo->samples;
image->info.color_samples = pCreateInfo->samples;
image->info.array_size = pCreateInfo->arrayLayers;
image->info.levels = pCreateInfo->mipLevels;
image->info.num_channels = 4; /* TODO: set this correctly */
image->info.num_channels = vk_format_get_nr_components(pCreateInfo->format);
image->vk_format = pCreateInfo->format;
image->tiling = pCreateInfo->tiling;
@@ -1043,7 +1015,7 @@ radv_image_create(VkDevice _device,
0, RADEON_FLAG_VIRTUAL);
if (!image->bo) {
vk_free2(&device->alloc, alloc, image);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
}
@@ -1358,7 +1330,7 @@ radv_CreateImageView(VkDevice _device,
view = vk_alloc2(&device->alloc, pAllocator, sizeof(*view), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (view == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
radv_image_view_init(view, device, pCreateInfo);
@@ -1406,7 +1378,7 @@ radv_CreateBufferView(VkDevice _device,
view = vk_alloc2(&device->alloc, pAllocator, sizeof(*view), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!view)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
radv_buffer_view_init(view, device, pCreateInfo);

View File

@@ -199,10 +199,6 @@ void radv_decompress_resolve_src(struct radv_cmd_buffer *cmd_buffer,
uint32_t region_count,
const VkImageResolve *regions);
void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
struct radv_image *linear_image);
uint32_t radv_clear_cmask(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image, uint32_t value);
uint32_t radv_clear_dcc(struct radv_cmd_buffer *cmd_buffer,

View File

@@ -100,7 +100,8 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_buffer *src_buf,
struct blit2d_src_temps *tmp,
enum blit2d_src_type src_type, VkFormat depth_format,
VkImageAspectFlagBits aspects)
VkImageAspectFlagBits aspects,
uint32_t log2_samples)
{
struct radv_device *device = cmd_buffer->device;
@@ -108,7 +109,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
create_bview(cmd_buffer, src_buf, &tmp->bview, depth_format);
radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
device->meta_state.blit2d.p_layouts[src_type],
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
0, /* set */
1, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
@@ -123,7 +124,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
});
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
&src_buf->pitch);
} else {
@@ -131,12 +132,12 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
if (src_type == BLIT2D_SRC_TYPE_IMAGE_3D)
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
&src_img->layer);
radv_meta_push_descriptor_set(cmd_buffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
device->meta_state.blit2d.p_layouts[src_type],
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
0, /* set */
1, /* descriptorWriteCount */
(VkWriteDescriptorSet[]) {
@@ -190,10 +191,11 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
static void
bind_pipeline(struct radv_cmd_buffer *cmd_buffer,
enum blit2d_src_type src_type, unsigned fs_key)
enum blit2d_src_type src_type, unsigned fs_key,
uint32_t log2_samples)
{
VkPipeline pipeline =
cmd_buffer->device->meta_state.blit2d.pipelines[src_type][fs_key];
cmd_buffer->device->meta_state.blit2d[log2_samples].pipelines[src_type][fs_key];
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
@@ -201,10 +203,11 @@ bind_pipeline(struct radv_cmd_buffer *cmd_buffer,
static void
bind_depth_pipeline(struct radv_cmd_buffer *cmd_buffer,
enum blit2d_src_type src_type)
enum blit2d_src_type src_type,
uint32_t log2_samples)
{
VkPipeline pipeline =
cmd_buffer->device->meta_state.blit2d.depth_only_pipeline[src_type];
cmd_buffer->device->meta_state.blit2d[log2_samples].depth_only_pipeline[src_type];
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
@@ -212,10 +215,11 @@ bind_depth_pipeline(struct radv_cmd_buffer *cmd_buffer,
static void
bind_stencil_pipeline(struct radv_cmd_buffer *cmd_buffer,
enum blit2d_src_type src_type)
enum blit2d_src_type src_type,
uint32_t log2_samples)
{
VkPipeline pipeline =
cmd_buffer->device->meta_state.blit2d.stencil_only_pipeline[src_type];
cmd_buffer->device->meta_state.blit2d[log2_samples].stencil_only_pipeline[src_type];
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
@@ -227,7 +231,8 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_blit2d_buffer *src_buf,
struct radv_meta_blit2d_surf *dst,
unsigned num_rects,
struct radv_meta_blit2d_rect *rects, enum blit2d_src_type src_type)
struct radv_meta_blit2d_rect *rects, enum blit2d_src_type src_type,
uint32_t log2_samples)
{
struct radv_device *device = cmd_buffer->device;
@@ -241,7 +246,7 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
else if (aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT)
depth_format = vk_format_depth_only(dst->image->vk_format);
struct blit2d_src_temps src_temps;
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format, aspect_mask);
blit2d_bind_src(cmd_buffer, src_img, src_buf, &src_temps, src_type, depth_format, aspect_mask, log2_samples);
struct blit2d_dst_temps dst_temps;
blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + rects[r].width,
@@ -255,7 +260,7 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
};
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d.p_layouts[src_type],
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
VK_SHADER_STAGE_VERTEX_BIT, 0, 16,
vertex_push_constants);
@@ -266,7 +271,7 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.render_passes[fs_key][dst_layout],
.renderPass = device->meta_state.blit2d_render_passes[fs_key][dst_layout],
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
@@ -277,13 +282,13 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_pipeline(cmd_buffer, src_type, fs_key);
bind_pipeline(cmd_buffer, src_type, fs_key, log2_samples);
} else if (aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT) {
enum radv_blit_ds_layout ds_layout = radv_meta_blit_ds_to_type(dst->current_layout);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.depth_only_rp[ds_layout],
.renderPass = device->meta_state.blit2d_depth_only_rp[ds_layout],
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
@@ -294,14 +299,14 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_depth_pipeline(cmd_buffer, src_type);
bind_depth_pipeline(cmd_buffer, src_type, log2_samples);
} else if (aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT) {
enum radv_blit_ds_layout ds_layout = radv_meta_blit_ds_to_type(dst->current_layout);
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
&(VkRenderPassBeginInfo) {
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.renderPass = device->meta_state.blit2d.stencil_only_rp[ds_layout],
.renderPass = device->meta_state.blit2d_stencil_only_rp[ds_layout],
.framebuffer = dst_temps.fb,
.renderArea = {
.offset = { rects[r].dst_x, rects[r].dst_y, },
@@ -312,7 +317,7 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
}, VK_SUBPASS_CONTENTS_INLINE);
bind_stencil_pipeline(cmd_buffer, src_type);
bind_stencil_pipeline(cmd_buffer, src_type, log2_samples);
} else
unreachable("Processing blit2d with multiple aspects.");
@@ -332,7 +337,24 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer *cmd_buffer,
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
if (log2_samples > 0) {
for (uint32_t sample = 0; sample < src_img->image->info.samples; sample++) {
uint32_t sample_mask = 1 << sample;
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 20, 4,
&sample);
radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
device->meta_state.blit2d[log2_samples].p_layouts[src_type],
VK_SHADER_STAGE_FRAGMENT_BIT, 24, 4,
&sample_mask);
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
}
}
else
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
radv_CmdEndRenderPass(radv_cmd_buffer_to_handle(cmd_buffer));
/* At the point where we emit the draw call, all data from the
@@ -358,7 +380,8 @@ radv_meta_blit2d(struct radv_cmd_buffer *cmd_buffer,
enum blit2d_src_type src_type = src_buf ? BLIT2D_SRC_TYPE_BUFFER :
use_3d ? BLIT2D_SRC_TYPE_IMAGE_3D : BLIT2D_SRC_TYPE_IMAGE;
radv_meta_blit2d_normal_dst(cmd_buffer, src_img, src_buf, dst,
num_rects, rects, src_type);
num_rects, rects, src_type,
src_img ? util_logbase2(src_img->image->info.samples) : 0);
}
static nir_shader *
@@ -421,13 +444,14 @@ build_nir_vertex_shader(void)
typedef nir_ssa_def* (*texel_fetch_build_func)(struct nir_builder *,
struct radv_device *,
nir_ssa_def *, bool);
nir_ssa_def *, bool, bool);
static nir_ssa_def *
build_nir_texel_fetch(struct nir_builder *b, struct radv_device *device,
nir_ssa_def *tex_pos, bool is_3d)
nir_ssa_def *tex_pos, bool is_3d, bool is_multisampled)
{
enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D : GLSL_SAMPLER_DIM_2D;
enum glsl_sampler_dim dim =
is_3d ? GLSL_SAMPLER_DIM_3D : is_multisampled ? GLSL_SAMPLER_DIM_MS : GLSL_SAMPLER_DIM_2D;
const struct glsl_type *sampler_type =
glsl_sampler_type(dim, false, false, GLSL_TYPE_UINT);
nir_variable *sampler = nir_variable_create(b->shader, nir_var_uniform,
@@ -436,6 +460,7 @@ build_nir_texel_fetch(struct nir_builder *b, struct radv_device *device,
sampler->data.binding = 0;
nir_ssa_def *tex_pos_3d = NULL;
nir_intrinsic_instr *sample_idx = NULL;
if (is_3d) {
nir_intrinsic_instr *layer = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(layer, 16);
@@ -451,13 +476,26 @@ build_nir_texel_fetch(struct nir_builder *b, struct radv_device *device,
chans[2] = &layer->dest.ssa;
tex_pos_3d = nir_vec(b, chans, 3);
}
nir_tex_instr *tex = nir_tex_instr_create(b->shader, 2);
if (is_multisampled) {
sample_idx = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(sample_idx, 20);
nir_intrinsic_set_range(sample_idx, 4);
sample_idx->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
sample_idx->num_components = 1;
nir_ssa_dest_init(&sample_idx->instr, &sample_idx->dest, 1, 32, "sample_idx");
nir_builder_instr_insert(b, &sample_idx->instr);
}
nir_tex_instr *tex = nir_tex_instr_create(b->shader, is_multisampled ? 3 : 2);
tex->sampler_dim = dim;
tex->op = nir_texop_txf;
tex->op = is_multisampled ? nir_texop_txf_ms : nir_texop_txf;
tex->src[0].src_type = nir_tex_src_coord;
tex->src[0].src = nir_src_for_ssa(is_3d ? tex_pos_3d : tex_pos);
tex->src[1].src_type = nir_tex_src_lod;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, 0));
tex->src[1].src_type = is_multisampled ? nir_tex_src_ms_index : nir_tex_src_lod;
tex->src[1].src = nir_src_for_ssa(is_multisampled ? &sample_idx->dest.ssa : nir_imm_int(b, 0));
if (is_multisampled) {
tex->src[2].src_type = nir_tex_src_lod;
tex->src[2].src = nir_src_for_ssa(nir_imm_int(b, 0));
}
tex->dest_type = nir_type_uint;
tex->is_array = false;
tex->coord_components = is_3d ? 3 : 2;
@@ -473,7 +511,7 @@ build_nir_texel_fetch(struct nir_builder *b, struct radv_device *device,
static nir_ssa_def *
build_nir_buffer_fetch(struct nir_builder *b, struct radv_device *device,
nir_ssa_def *tex_pos, bool is_3d)
nir_ssa_def *tex_pos, bool is_3d, bool is_multisampled)
{
const struct glsl_type *sampler_type =
glsl_sampler_type(GLSL_SAMPLER_DIM_BUF, false, false, GLSL_TYPE_UINT);
@@ -519,9 +557,31 @@ static const VkPipelineVertexInputStateCreateInfo normal_vi_create_info = {
.vertexAttributeDescriptionCount = 0,
};
static void
build_nir_store_sample_mask(struct nir_builder *b)
{
nir_intrinsic_instr *sample_mask = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(sample_mask, 24);
nir_intrinsic_set_range(sample_mask, 4);
sample_mask->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
sample_mask->num_components = 1;
nir_ssa_dest_init(&sample_mask->instr, &sample_mask->dest, 1, 32, "sample_mask");
nir_builder_instr_insert(b, &sample_mask->instr);
const struct glsl_type *sample_mask_out_type = glsl_uint_type();
nir_variable *sample_mask_out =
nir_variable_create(b->shader, nir_var_shader_out,
sample_mask_out_type, "sample_mask_out");
sample_mask_out->data.location = FRAG_RESULT_SAMPLE_MASK;
nir_store_var(b, sample_mask_out, &sample_mask->dest.ssa, 0x1);
}
static nir_shader *
build_nir_copy_fragment_shader(struct radv_device *device,
texel_fetch_build_func txf_func, const char* name, bool is_3d)
texel_fetch_build_func txf_func, const char* name, bool is_3d,
bool is_multisampled)
{
const struct glsl_type *vec4 = glsl_vec4_type();
const struct glsl_type *vec2 = glsl_vector_type(GLSL_TYPE_FLOAT, 2);
@@ -538,11 +598,15 @@ build_nir_copy_fragment_shader(struct radv_device *device,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_DATA0;
if (is_multisampled) {
build_nir_store_sample_mask(&b);
}
nir_ssa_def *pos_int = nir_f2i32(&b, nir_load_var(&b, tex_pos_in));
unsigned swiz[4] = { 0, 1 };
nir_ssa_def *tex_pos = nir_swizzle(&b, pos_int, swiz, 2, false);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d, is_multisampled);
nir_store_var(&b, color_out, color, 0xf);
return b.shader;
@@ -550,7 +614,8 @@ build_nir_copy_fragment_shader(struct radv_device *device,
static nir_shader *
build_nir_copy_fragment_shader_depth(struct radv_device *device,
texel_fetch_build_func txf_func, const char* name, bool is_3d)
texel_fetch_build_func txf_func, const char* name, bool is_3d,
bool is_multisampled)
{
const struct glsl_type *vec4 = glsl_vec4_type();
const struct glsl_type *vec2 = glsl_vector_type(GLSL_TYPE_FLOAT, 2);
@@ -567,11 +632,15 @@ build_nir_copy_fragment_shader_depth(struct radv_device *device,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_DEPTH;
if (is_multisampled) {
build_nir_store_sample_mask(&b);
}
nir_ssa_def *pos_int = nir_f2i32(&b, nir_load_var(&b, tex_pos_in));
unsigned swiz[4] = { 0, 1 };
nir_ssa_def *tex_pos = nir_swizzle(&b, pos_int, swiz, 2, false);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d, is_multisampled);
nir_store_var(&b, color_out, color, 0x1);
return b.shader;
@@ -579,7 +648,8 @@ build_nir_copy_fragment_shader_depth(struct radv_device *device,
static nir_shader *
build_nir_copy_fragment_shader_stencil(struct radv_device *device,
texel_fetch_build_func txf_func, const char* name, bool is_3d)
texel_fetch_build_func txf_func, const char* name, bool is_3d,
bool is_multisampled)
{
const struct glsl_type *vec4 = glsl_vec4_type();
const struct glsl_type *vec2 = glsl_vector_type(GLSL_TYPE_FLOAT, 2);
@@ -596,11 +666,15 @@ build_nir_copy_fragment_shader_stencil(struct radv_device *device,
vec4, "f_color");
color_out->data.location = FRAG_RESULT_STENCIL;
if (is_multisampled) {
build_nir_store_sample_mask(&b);
}
nir_ssa_def *pos_int = nir_f2i32(&b, nir_load_var(&b, tex_pos_in));
unsigned swiz[4] = { 0, 1 };
nir_ssa_def *tex_pos = nir_swizzle(&b, pos_int, swiz, 2, false);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d);
nir_ssa_def *color = txf_func(&b, device, tex_pos, is_3d, is_multisampled);
nir_store_var(&b, color_out, color, 0x1);
return b.shader;
@@ -614,45 +688,48 @@ radv_device_finish_meta_blit2d_state(struct radv_device *device)
for(unsigned j = 0; j < NUM_META_FS_KEYS; ++j) {
for (unsigned k = 0; k < RADV_META_DST_LAYOUT_COUNT; ++k) {
radv_DestroyRenderPass(radv_device_to_handle(device),
state->blit2d.render_passes[j][k],
&state->alloc);
state->blit2d_render_passes[j][k],
&state->alloc);
}
}
for (enum radv_blit_ds_layout j = RADV_BLIT_DS_LAYOUT_TILE_ENABLE; j < RADV_BLIT_DS_LAYOUT_COUNT; j++) {
radv_DestroyRenderPass(radv_device_to_handle(device),
state->blit2d.depth_only_rp[j], &state->alloc);
state->blit2d_depth_only_rp[j], &state->alloc);
radv_DestroyRenderPass(radv_device_to_handle(device),
state->blit2d.stencil_only_rp[j], &state->alloc);
state->blit2d_stencil_only_rp[j], &state->alloc);
}
for (unsigned src = 0; src < BLIT2D_NUM_SRC_TYPES; src++) {
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->blit2d.p_layouts[src],
&state->alloc);
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
state->blit2d.ds_layouts[src],
&state->alloc);
for (unsigned log2_samples = 0; log2_samples < 1 + MAX_SAMPLES_LOG2; ++log2_samples) {
for (unsigned src = 0; src < BLIT2D_NUM_SRC_TYPES; src++) {
radv_DestroyPipelineLayout(radv_device_to_handle(device),
state->blit2d[log2_samples].p_layouts[src],
&state->alloc);
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
state->blit2d[log2_samples].ds_layouts[src],
&state->alloc);
for (unsigned j = 0; j < NUM_META_FS_KEYS; ++j) {
radv_DestroyPipeline(radv_device_to_handle(device),
state->blit2d[log2_samples].pipelines[src][j],
&state->alloc);
}
for (unsigned j = 0; j < NUM_META_FS_KEYS; ++j) {
radv_DestroyPipeline(radv_device_to_handle(device),
state->blit2d.pipelines[src][j],
state->blit2d[log2_samples].depth_only_pipeline[src],
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->blit2d[log2_samples].stencil_only_pipeline[src],
&state->alloc);
}
radv_DestroyPipeline(radv_device_to_handle(device),
state->blit2d.depth_only_pipeline[src],
&state->alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
state->blit2d.stencil_only_pipeline[src],
&state->alloc);
}
}
static VkResult
blit2d_init_color_pipeline(struct radv_device *device,
enum blit2d_src_type src_type,
VkFormat format)
VkFormat format,
uint32_t log2_samples)
{
VkResult result;
unsigned fs_key = radv_format_meta_fs_key(format);
@@ -681,7 +758,7 @@ blit2d_init_color_pipeline(struct radv_device *device,
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_nir_copy_fragment_shader(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D);
fs.nir = build_nir_copy_fragment_shader(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D, log2_samples > 0);
vi_create_info = &normal_vi_create_info;
struct radv_shader_module vs = {
@@ -705,7 +782,7 @@ blit2d_init_color_pipeline(struct radv_device *device,
};
for (unsigned dst_layout = 0; dst_layout < RADV_META_DST_LAYOUT_COUNT; ++dst_layout) {
if (!device->meta_state.blit2d.render_passes[fs_key][dst_layout]) {
if (!device->meta_state.blit2d_render_passes[fs_key][dst_layout]) {
VkImageLayout layout = radv_meta_dst_layout_to_layout(dst_layout);
result = radv_CreateRenderPass(radv_device_to_handle(device),
@@ -737,7 +814,7 @@ blit2d_init_color_pipeline(struct radv_device *device,
.pPreserveAttachments = (uint32_t[]) { 0 },
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d.render_passes[fs_key][dst_layout]);
}, &device->meta_state.alloc, &device->meta_state.blit2d_render_passes[fs_key][dst_layout]);
}
}
@@ -765,7 +842,7 @@ blit2d_init_color_pipeline(struct radv_device *device,
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.rasterizationSamples = 1 << log2_samples,
.sampleShadingEnable = false,
.pSampleMask = (VkSampleMask[]) { UINT32_MAX },
},
@@ -796,8 +873,8 @@ blit2d_init_color_pipeline(struct radv_device *device,
},
},
.flags = 0,
.layout = device->meta_state.blit2d.p_layouts[src_type],
.renderPass = device->meta_state.blit2d.render_passes[fs_key][0],
.layout = device->meta_state.blit2d[log2_samples].p_layouts[src_type],
.renderPass = device->meta_state.blit2d_render_passes[fs_key][0],
.subpass = 0,
};
@@ -809,7 +886,7 @@ blit2d_init_color_pipeline(struct radv_device *device,
radv_pipeline_cache_to_handle(&device->meta_state.cache),
&vk_pipeline_info, &radv_pipeline_info,
&device->meta_state.alloc,
&device->meta_state.blit2d.pipelines[src_type][fs_key]);
&device->meta_state.blit2d[log2_samples].pipelines[src_type][fs_key]);
ralloc_free(vs.nir);
@@ -820,7 +897,8 @@ blit2d_init_color_pipeline(struct radv_device *device,
static VkResult
blit2d_init_depth_only_pipeline(struct radv_device *device,
enum blit2d_src_type src_type)
enum blit2d_src_type src_type,
uint32_t log2_samples)
{
VkResult result;
const char *name;
@@ -847,7 +925,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo *vi_create_info;
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_nir_copy_fragment_shader_depth(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D);
fs.nir = build_nir_copy_fragment_shader_depth(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D, log2_samples > 0);
vi_create_info = &normal_vi_create_info;
struct radv_shader_module vs = {
@@ -871,7 +949,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
};
for (enum radv_blit_ds_layout ds_layout = RADV_BLIT_DS_LAYOUT_TILE_ENABLE; ds_layout < RADV_BLIT_DS_LAYOUT_COUNT; ds_layout++) {
if (!device->meta_state.blit2d.depth_only_rp[ds_layout]) {
if (!device->meta_state.blit2d_depth_only_rp[ds_layout]) {
VkImageLayout layout = radv_meta_blit_ds_to_layout(ds_layout);
result = radv_CreateRenderPass(radv_device_to_handle(device),
&(VkRenderPassCreateInfo) {
@@ -899,7 +977,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.pPreserveAttachments = (uint32_t[]) { 0 },
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d.depth_only_rp[ds_layout]);
}, &device->meta_state.alloc, &device->meta_state.blit2d_depth_only_rp[ds_layout]);
}
}
@@ -927,7 +1005,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.rasterizationSamples = 1 << log2_samples,
.sampleShadingEnable = false,
.pSampleMask = (VkSampleMask[]) { UINT32_MAX },
},
@@ -958,8 +1036,8 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
},
},
.flags = 0,
.layout = device->meta_state.blit2d.p_layouts[src_type],
.renderPass = device->meta_state.blit2d.depth_only_rp[0],
.layout = device->meta_state.blit2d[log2_samples].p_layouts[src_type],
.renderPass = device->meta_state.blit2d_depth_only_rp[0],
.subpass = 0,
};
@@ -971,7 +1049,7 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
radv_pipeline_cache_to_handle(&device->meta_state.cache),
&vk_pipeline_info, &radv_pipeline_info,
&device->meta_state.alloc,
&device->meta_state.blit2d.depth_only_pipeline[src_type]);
&device->meta_state.blit2d[log2_samples].depth_only_pipeline[src_type]);
ralloc_free(vs.nir);
@@ -982,7 +1060,8 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
static VkResult
blit2d_init_stencil_only_pipeline(struct radv_device *device,
enum blit2d_src_type src_type)
enum blit2d_src_type src_type,
uint32_t log2_samples)
{
VkResult result;
const char *name;
@@ -1009,7 +1088,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
const VkPipelineVertexInputStateCreateInfo *vi_create_info;
struct radv_shader_module fs = { .nir = NULL };
fs.nir = build_nir_copy_fragment_shader_stencil(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D);
fs.nir = build_nir_copy_fragment_shader_stencil(device, src_func, name, src_type == BLIT2D_SRC_TYPE_IMAGE_3D, log2_samples > 0);
vi_create_info = &normal_vi_create_info;
struct radv_shader_module vs = {
@@ -1033,7 +1112,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
};
for (enum radv_blit_ds_layout ds_layout = RADV_BLIT_DS_LAYOUT_TILE_ENABLE; ds_layout < RADV_BLIT_DS_LAYOUT_COUNT; ds_layout++) {
if (!device->meta_state.blit2d.stencil_only_rp[ds_layout]) {
if (!device->meta_state.blit2d_stencil_only_rp[ds_layout]) {
VkImageLayout layout = radv_meta_blit_ds_to_layout(ds_layout);
result = radv_CreateRenderPass(radv_device_to_handle(device),
&(VkRenderPassCreateInfo) {
@@ -1061,7 +1140,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.pPreserveAttachments = (uint32_t[]) { 0 },
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d.stencil_only_rp[ds_layout]);
}, &device->meta_state.alloc, &device->meta_state.blit2d_stencil_only_rp[ds_layout]);
}
}
@@ -1089,7 +1168,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
},
.pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
.rasterizationSamples = 1,
.rasterizationSamples = 1 << log2_samples,
.sampleShadingEnable = false,
.pSampleMask = (VkSampleMask[]) { UINT32_MAX },
},
@@ -1136,8 +1215,8 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
},
},
.flags = 0,
.layout = device->meta_state.blit2d.p_layouts[src_type],
.renderPass = device->meta_state.blit2d.stencil_only_rp[0],
.layout = device->meta_state.blit2d[log2_samples].p_layouts[src_type],
.renderPass = device->meta_state.blit2d_stencil_only_rp[0],
.subpass = 0,
};
@@ -1149,7 +1228,7 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
radv_pipeline_cache_to_handle(&device->meta_state.cache),
&vk_pipeline_info, &radv_pipeline_info,
&device->meta_state.alloc,
&device->meta_state.blit2d.stencil_only_pipeline[src_type]);
&device->meta_state.blit2d[log2_samples].stencil_only_pipeline[src_type]);
ralloc_free(vs.nir);
@@ -1175,15 +1254,16 @@ static VkFormat pipeline_formats[] = {
static VkResult
meta_blit2d_create_pipe_layout(struct radv_device *device,
int idx)
int idx,
uint32_t log2_samples)
{
VkResult result;
VkDescriptorType desc_type = (idx == BLIT2D_SRC_TYPE_BUFFER) ? VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER : VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
const VkPushConstantRange push_constant_ranges[] = {
{VK_SHADER_STAGE_VERTEX_BIT, 0, 16},
{VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4},
{VK_SHADER_STAGE_FRAGMENT_BIT, 16, 12},
};
int num_push_constant_range = (idx != BLIT2D_SRC_TYPE_IMAGE) ? 2 : 1;
int num_push_constant_range = (idx != BLIT2D_SRC_TYPE_IMAGE || log2_samples > 0) ? 2 : 1;
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&(VkDescriptorSetLayoutCreateInfo) {
@@ -1199,7 +1279,7 @@ meta_blit2d_create_pipe_layout(struct radv_device *device,
.pImmutableSamplers = NULL
},
}
}, &device->meta_state.alloc, &device->meta_state.blit2d.ds_layouts[idx]);
}, &device->meta_state.alloc, &device->meta_state.blit2d[log2_samples].ds_layouts[idx]);
if (result != VK_SUCCESS)
goto fail;
@@ -1207,11 +1287,11 @@ meta_blit2d_create_pipe_layout(struct radv_device *device,
&(VkPipelineLayoutCreateInfo) {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.blit2d.ds_layouts[idx],
.pSetLayouts = &device->meta_state.blit2d[log2_samples].ds_layouts[idx],
.pushConstantRangeCount = num_push_constant_range,
.pPushConstantRanges = push_constant_ranges,
},
&device->meta_state.alloc, &device->meta_state.blit2d.p_layouts[idx]);
&device->meta_state.alloc, &device->meta_state.blit2d[log2_samples].p_layouts[idx]);
if (result != VK_SUCCESS)
goto fail;
return VK_SUCCESS;
@@ -1225,27 +1305,33 @@ radv_device_init_meta_blit2d_state(struct radv_device *device)
VkResult result;
bool create_3d = device->physical_device->rad_info.chip_class >= GFX9;
for (unsigned src = 0; src < BLIT2D_NUM_SRC_TYPES; src++) {
if (src == BLIT2D_SRC_TYPE_IMAGE_3D && !create_3d)
continue;
for (unsigned log2_samples = 0; log2_samples < 1 + MAX_SAMPLES_LOG2; log2_samples++) {
for (unsigned src = 0; src < BLIT2D_NUM_SRC_TYPES; src++) {
if (src == BLIT2D_SRC_TYPE_IMAGE_3D && !create_3d)
continue;
result = meta_blit2d_create_pipe_layout(device, src);
if (result != VK_SUCCESS)
goto fail;
/* Don't need to handle copies between buffers and multisample images. */
if (src == BLIT2D_SRC_TYPE_BUFFER && log2_samples > 0)
continue;
for (unsigned j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
result = blit2d_init_color_pipeline(device, src, pipeline_formats[j]);
result = meta_blit2d_create_pipe_layout(device, src, log2_samples);
if (result != VK_SUCCESS)
goto fail;
for (unsigned j = 0; j < ARRAY_SIZE(pipeline_formats); ++j) {
result = blit2d_init_color_pipeline(device, src, pipeline_formats[j], log2_samples);
if (result != VK_SUCCESS)
goto fail;
}
result = blit2d_init_depth_only_pipeline(device, src, log2_samples);
if (result != VK_SUCCESS)
goto fail;
result = blit2d_init_stencil_only_pipeline(device, src, log2_samples);
if (result != VK_SUCCESS)
goto fail;
}
result = blit2d_init_depth_only_pipeline(device, src);
if (result != VK_SUCCESS)
goto fail;
result = blit2d_init_stencil_only_pipeline(device, src);
if (result != VK_SUCCESS)
goto fail;
}
return VK_SUCCESS;

View File

@@ -645,7 +645,8 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
if (depth_view_can_fast_clear(cmd_buffer, iview, aspects,
subpass->depth_stencil_attachment.layout,
clear_rect, clear_value))
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
radv_set_ds_clear_metadata(cmd_buffer, iview->image,
clear_value, aspects);
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &(VkViewport) {
.x = clear_rect->rect.offset.x,
@@ -717,6 +718,14 @@ emit_fast_htile_clear(struct radv_cmd_buffer *cmd_buffer,
if ((clear_value.depth != 0.0 && clear_value.depth != 1.0) || !(aspects & VK_IMAGE_ASPECT_DEPTH_BIT))
goto fail;
/* GFX8 only supports 32-bit depth surfaces but we can enable TC-compat
* HTILE for 16-bit surfaces if no Z planes are compressed. Though,
* fast HTILE clears don't seem to work.
*/
if (cmd_buffer->device->physical_device->rad_info.chip_class == VI &&
iview->image->vk_format == VK_FORMAT_D16_UNORM)
goto fail;
if (vk_format_aspects(iview->image->vk_format) & VK_IMAGE_ASPECT_STENCIL_BIT) {
if (clear_value.stencil != 0 || !(aspects & VK_IMAGE_ASPECT_STENCIL_BIT))
goto fail;
@@ -736,7 +745,7 @@ emit_fast_htile_clear(struct radv_cmd_buffer *cmd_buffer,
iview->image->offset + iview->image->htile_offset,
iview->image->surface.htile_size, clear_word);
radv_set_depth_clear_regs(cmd_buffer, iview->image, clear_value, aspects);
radv_set_ds_clear_metadata(cmd_buffer, iview->image, clear_value, aspects);
if (post_flush) {
*post_flush |= flush_bits;
} else {
@@ -1011,8 +1020,6 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
if (iview->image->info.levels > 1)
goto fail;
if (iview->image->surface.is_linear)
goto fail;
if (!radv_image_extent_compare(iview->image, &iview->extent))
goto fail;
@@ -1035,7 +1042,7 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
goto fail;
/* DCC */
ret = radv_format_pack_clear_color(iview->image->vk_format,
ret = radv_format_pack_clear_color(iview->vk_format,
clear_color, &clear_value);
if (ret == false)
goto fail;
@@ -1056,7 +1063,7 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
bool can_avoid_fast_clear_elim;
bool need_decompress_pass = false;
vi_get_fast_clear_parameters(iview->image->vk_format,
vi_get_fast_clear_parameters(iview->vk_format,
&clear_value, &reset_value,
&can_avoid_fast_clear_elim);
@@ -1097,7 +1104,8 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
cmd_buffer->state.flush_bits |= flush_bits;
}
radv_set_color_clear_regs(cmd_buffer, iview->image, subpass_att, clear_color);
radv_set_color_clear_metadata(cmd_buffer, iview->image, subpass_att,
clear_color);
return true;
fail:

View File

@@ -72,6 +72,7 @@ vk_format_for_size(int bs)
case 2: return VK_FORMAT_R8G8_UINT;
case 4: return VK_FORMAT_R8G8B8A8_UINT;
case 8: return VK_FORMAT_R16G16B16A16_UINT;
case 12: return VK_FORMAT_R32G32B32_UINT;
case 16: return VK_FORMAT_R32G32B32A32_UINT;
default:
unreachable("Invalid format block size");
@@ -93,6 +94,8 @@ blit_surf_for_image_level_layer(struct radv_image *image,
!(radv_image_is_tc_compat_htile(image)))
format = vk_format_for_size(vk_format_get_blocksize(format));
format = vk_format_no_srgb(format);
return (struct radv_meta_blit2d_surf) {
.format = format,
.bs = vk_format_get_blocksize(format),
@@ -483,24 +486,3 @@ void radv_CmdCopyImage(
dest_image, destImageLayout,
regionCount, pRegions);
}
void radv_blit_to_prime_linear(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
struct radv_image *linear_image)
{
struct VkImageCopy image_copy = { 0 };
image_copy.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
image_copy.srcSubresource.layerCount = 1;
image_copy.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
image_copy.dstSubresource.layerCount = 1;
image_copy.extent.width = image->info.width;
image_copy.extent.height = image->info.height;
image_copy.extent.depth = 1;
meta_copy_image(cmd_buffer, image, VK_IMAGE_LAYOUT_GENERAL, linear_image,
VK_IMAGE_LAYOUT_GENERAL,
1, &image_copy);
}

View File

@@ -358,6 +358,8 @@ static void radv_pick_resolve_method_images(struct radv_image *src_image,
*method = RESOLVE_COMPUTE;
else if (vk_format_is_int(src_image->vk_format))
*method = RESOLVE_COMPUTE;
else if (src_image->info.array_size > 1)
*method = RESOLVE_COMPUTE;
if (radv_layout_dcc_compressed(dest_image, dest_image_layout, queue_mask)) {
*method = RESOLVE_FRAGMENT;
@@ -695,7 +697,7 @@ radv_decompress_resolve_subpass_src(struct radv_cmd_buffer *cmd_buffer)
VkImageResolve region = {};
region.srcSubresource.baseArrayLayer = 0;
region.srcSubresource.mipLevel = 0;
region.srcSubresource.layerCount = 1;
region.srcSubresource.layerCount = src_image->info.array_size;
radv_decompress_resolve_src(cmd_buffer, src_image,
src_att.layout, 1, &region);

View File

@@ -508,12 +508,48 @@ radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
emit_resolve(cmd_buffer,
src_iview,
dst_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
struct radv_image *src_image = src_iview->image;
struct radv_image *dst_image = dst_iview->image;
for (uint32_t layer = 0; layer < src_image->info.array_size; layer++) {
struct radv_image_view tsrc_iview;
radv_image_view_init(&tsrc_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(src_image),
.viewType = radv_meta_get_view_type(src_image),
.format = src_image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = src_iview->base_mip,
.levelCount = 1,
.baseArrayLayer = layer,
.layerCount = 1,
},
});
struct radv_image_view tdst_iview;
radv_image_view_init(&tdst_iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(dst_image),
.viewType = radv_meta_get_view_type(dst_image),
.format = vk_to_non_srgb_format(dst_image->vk_format),
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = dst_iview->base_mip,
.levelCount = 1,
.baseArrayLayer = layer,
.layerCount = 1,
},
});
emit_resolve(cmd_buffer,
&tsrc_iview,
&tdst_iview,
&(VkOffset2D) { 0, 0 },
&(VkOffset2D) { 0, 0 },
&(VkExtent2D) { fb->width, fb->height });
}
}
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH |

View File

@@ -124,6 +124,98 @@ radv_shader_context_from_abi(struct ac_shader_abi *abi)
return container_of(abi, ctx, abi);
}
struct ac_build_if_state
{
struct radv_shader_context *ctx;
LLVMValueRef condition;
LLVMBasicBlockRef entry_block;
LLVMBasicBlockRef true_block;
LLVMBasicBlockRef false_block;
LLVMBasicBlockRef merge_block;
};
static LLVMBasicBlockRef
ac_build_insert_new_block(struct radv_shader_context *ctx, const char *name)
{
LLVMBasicBlockRef current_block;
LLVMBasicBlockRef next_block;
LLVMBasicBlockRef new_block;
/* get current basic block */
current_block = LLVMGetInsertBlock(ctx->ac.builder);
/* chqeck if there's another block after this one */
next_block = LLVMGetNextBasicBlock(current_block);
if (next_block) {
/* insert the new block before the next block */
new_block = LLVMInsertBasicBlockInContext(ctx->context, next_block, name);
}
else {
/* append new block after current block */
LLVMValueRef function = LLVMGetBasicBlockParent(current_block);
new_block = LLVMAppendBasicBlockInContext(ctx->context, function, name);
}
return new_block;
}
static void
ac_nir_build_if(struct ac_build_if_state *ifthen,
struct radv_shader_context *ctx,
LLVMValueRef condition)
{
LLVMBasicBlockRef block = LLVMGetInsertBlock(ctx->ac.builder);
memset(ifthen, 0, sizeof *ifthen);
ifthen->ctx = ctx;
ifthen->condition = condition;
ifthen->entry_block = block;
/* create endif/merge basic block for the phi functions */
ifthen->merge_block = ac_build_insert_new_block(ctx, "endif-block");
/* create/insert true_block before merge_block */
ifthen->true_block =
LLVMInsertBasicBlockInContext(ctx->context,
ifthen->merge_block,
"if-true-block");
/* successive code goes into the true block */
LLVMPositionBuilderAtEnd(ctx->ac.builder, ifthen->true_block);
}
/**
* End a conditional.
*/
static void
ac_nir_build_endif(struct ac_build_if_state *ifthen)
{
LLVMBuilderRef builder = ifthen->ctx->ac.builder;
/* Insert branch to the merge block from current block */
LLVMBuildBr(builder, ifthen->merge_block);
/*
* Now patch in the various branch instructions.
*/
/* Insert the conditional branch instruction at the end of entry_block */
LLVMPositionBuilderAtEnd(builder, ifthen->entry_block);
if (ifthen->false_block) {
/* we have an else clause */
LLVMBuildCondBr(builder, ifthen->condition,
ifthen->true_block, ifthen->false_block);
}
else {
/* no else clause */
LLVMBuildCondBr(builder, ifthen->condition,
ifthen->true_block, ifthen->merge_block);
}
/* Resume building code at end of the ifthen->merge_block */
LLVMPositionBuilderAtEnd(builder, ifthen->merge_block);
}
static LLVMValueRef get_rel_patch_id(struct radv_shader_context *ctx)
{
switch (ctx->stage) {
@@ -388,7 +480,7 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module,
unsigned num_return_elems,
struct arg_info *args,
unsigned max_workgroup_size,
bool unsafe_math)
const struct radv_nir_compiler_options *options)
{
LLVMTypeRef main_function_type, ret_type;
LLVMBasicBlockRef main_function_body;
@@ -419,12 +511,18 @@ create_llvm_function(LLVMContextRef ctx, LLVMModuleRef module,
}
}
if (options->address32_hi) {
ac_llvm_add_target_dep_function_attr(main_function,
"amdgpu-32bit-address-high-bits",
options->address32_hi);
}
if (max_workgroup_size) {
ac_llvm_add_target_dep_function_attr(main_function,
"amdgpu-max-work-group-size",
max_workgroup_size);
}
if (unsafe_math) {
if (options->unsafe_math) {
/* These were copied from some LLVM test. */
LLVMAddTargetDependentFunctionAttr(main_function,
"less-precise-fpmad",
@@ -468,6 +566,15 @@ set_loc_shader(struct radv_shader_context *ctx, int idx, uint8_t *sgpr_idx,
set_loc(ud_info, sgpr_idx, num_sgprs, 0);
}
static void
set_loc_shader_ptr(struct radv_shader_context *ctx, int idx, uint8_t *sgpr_idx)
{
bool use_32bit_pointers = HAVE_32BIT_POINTERS &&
idx != AC_UD_SCRATCH_RING_OFFSETS;
set_loc_shader(ctx, idx, sgpr_idx, use_32bit_pointers ? 1 : 2);
}
static void
set_loc_desc(struct radv_shader_context *ctx, int idx, uint8_t *sgpr_idx,
uint32_t indirect_offset)
@@ -476,12 +583,11 @@ set_loc_desc(struct radv_shader_context *ctx, int idx, uint8_t *sgpr_idx,
&ctx->shader_info->user_sgprs_locs.descriptor_sets[idx];
assert(ud_info);
set_loc(ud_info, sgpr_idx, 2, indirect_offset);
set_loc(ud_info, sgpr_idx, HAVE_32BIT_POINTERS ? 1 : 2, indirect_offset);
}
struct user_sgpr_info {
bool need_ring_offsets;
uint8_t sgpr_count;
bool indirect_all_descriptor_sets;
};
@@ -514,7 +620,8 @@ count_vs_user_sgprs(struct radv_shader_context *ctx)
{
uint8_t count = 0;
count += ctx->shader_info->info.vs.has_vertex_buffers ? 2 : 0;
if (ctx->shader_info->info.vs.has_vertex_buffers)
count += HAVE_32BIT_POINTERS ? 1 : 2;
count += ctx->shader_info->info.vs.needs_draw_id ? 3 : 2;
return count;
@@ -527,6 +634,8 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
bool needs_view_index,
struct user_sgpr_info *user_sgpr_info)
{
uint8_t user_sgpr_count = 0;
memset(user_sgpr_info, 0, sizeof(struct user_sgpr_info));
/* until we sort out scratch/global buffers always assign ring offsets for gs/vs/es */
@@ -543,25 +652,25 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
/* 2 user sgprs will nearly always be allocated for scratch/rings */
if (ctx->options->supports_spill || user_sgpr_info->need_ring_offsets) {
user_sgpr_info->sgpr_count += 2;
user_sgpr_count += 2;
}
switch (stage) {
case MESA_SHADER_COMPUTE:
if (ctx->shader_info->info.cs.uses_grid_size)
user_sgpr_info->sgpr_count += 3;
user_sgpr_count += 3;
break;
case MESA_SHADER_FRAGMENT:
user_sgpr_info->sgpr_count += ctx->shader_info->info.ps.needs_sample_positions;
user_sgpr_count += ctx->shader_info->info.ps.needs_sample_positions;
break;
case MESA_SHADER_VERTEX:
if (!ctx->is_gs_copy_shader)
user_sgpr_info->sgpr_count += count_vs_user_sgprs(ctx);
user_sgpr_count += count_vs_user_sgprs(ctx);
break;
case MESA_SHADER_TESS_CTRL:
if (has_previous_stage) {
if (previous_stage == MESA_SHADER_VERTEX)
user_sgpr_info->sgpr_count += count_vs_user_sgprs(ctx);
user_sgpr_count += count_vs_user_sgprs(ctx);
}
break;
case MESA_SHADER_TESS_EVAL:
@@ -569,7 +678,7 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
case MESA_SHADER_GEOMETRY:
if (has_previous_stage) {
if (previous_stage == MESA_SHADER_VERTEX) {
user_sgpr_info->sgpr_count += count_vs_user_sgprs(ctx);
user_sgpr_count += count_vs_user_sgprs(ctx);
}
}
break;
@@ -578,19 +687,18 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
}
if (needs_view_index)
user_sgpr_info->sgpr_count++;
user_sgpr_count++;
if (ctx->shader_info->info.loads_push_constants)
user_sgpr_info->sgpr_count += 2;
user_sgpr_count += HAVE_32BIT_POINTERS ? 1 : 2;
uint32_t available_sgprs = ctx->options->chip_class >= GFX9 ? 32 : 16;
uint32_t remaining_sgprs = available_sgprs - user_sgpr_info->sgpr_count;
uint32_t remaining_sgprs = available_sgprs - user_sgpr_count;
uint32_t num_desc_set =
util_bitcount(ctx->shader_info->info.desc_set_used_mask);
if (remaining_sgprs / 2 < util_bitcount(ctx->shader_info->info.desc_set_used_mask)) {
user_sgpr_info->sgpr_count += 2;
if (remaining_sgprs / (HAVE_32BIT_POINTERS ? 1 : 2) < num_desc_set) {
user_sgpr_info->indirect_all_descriptor_sets = true;
} else {
user_sgpr_info->sgpr_count += util_bitcount(ctx->shader_info->info.desc_set_used_mask) * 2;
}
}
@@ -603,7 +711,7 @@ declare_global_input_sgprs(struct radv_shader_context *ctx,
struct arg_info *args,
LLVMValueRef *desc_sets)
{
LLVMTypeRef type = ac_array_in_const_addr_space(ctx->ac.i8);
LLVMTypeRef type = ac_array_in_const32_addr_space(ctx->ac.i8);
unsigned num_sets = ctx->options->layout ?
ctx->options->layout->num_sets : 0;
unsigned stage_mask = 1 << stage;
@@ -621,7 +729,7 @@ declare_global_input_sgprs(struct radv_shader_context *ctx,
}
}
} else {
add_array_arg(args, ac_array_in_const_addr_space(type), desc_sets);
add_array_arg(args, ac_array_in_const32_addr_space(type), desc_sets);
}
if (ctx->shader_info->info.loads_push_constants) {
@@ -641,7 +749,8 @@ declare_vs_specific_input_sgprs(struct radv_shader_context *ctx,
(stage == MESA_SHADER_VERTEX ||
(has_previous_stage && previous_stage == MESA_SHADER_VERTEX))) {
if (ctx->shader_info->info.vs.has_vertex_buffers) {
add_arg(args, ARG_SGPR, ac_array_in_const_addr_space(ctx->ac.v4i32),
add_arg(args, ARG_SGPR,
ac_array_in_const32_addr_space(ctx->ac.v4i32),
&ctx->vertex_buffers);
}
add_arg(args, ARG_SGPR, ctx->ac.i32, &ctx->abi.base_vertex);
@@ -699,8 +808,8 @@ set_global_input_locs(struct radv_shader_context *ctx, gl_shader_stage stage,
ctx->descriptor_sets[i] = NULL;
}
} else {
set_loc_shader(ctx, AC_UD_INDIRECT_DESCRIPTOR_SETS,
user_sgpr_idx, 2);
set_loc_shader_ptr(ctx, AC_UD_INDIRECT_DESCRIPTOR_SETS,
user_sgpr_idx);
for (unsigned i = 0; i < num_sets; ++i) {
if ((ctx->shader_info->info.desc_set_used_mask & (1 << i)) &&
@@ -718,7 +827,7 @@ set_global_input_locs(struct radv_shader_context *ctx, gl_shader_stage stage,
}
if (ctx->shader_info->info.loads_push_constants) {
set_loc_shader(ctx, AC_UD_PUSH_CONSTANTS, user_sgpr_idx, 2);
set_loc_shader_ptr(ctx, AC_UD_PUSH_CONSTANTS, user_sgpr_idx);
}
}
@@ -732,8 +841,8 @@ set_vs_specific_input_locs(struct radv_shader_context *ctx,
(stage == MESA_SHADER_VERTEX ||
(has_previous_stage && previous_stage == MESA_SHADER_VERTEX))) {
if (ctx->shader_info->info.vs.has_vertex_buffers) {
set_loc_shader(ctx, AC_UD_VS_VERTEX_BUFFERS,
user_sgpr_idx, 2);
set_loc_shader_ptr(ctx, AC_UD_VS_VERTEX_BUFFERS,
user_sgpr_idx);
}
unsigned vs_num = 2;
@@ -759,7 +868,7 @@ static void set_llvm_calling_convention(LLVMValueRef func,
calling_conv = RADEON_LLVM_AMDGPU_GS;
break;
case MESA_SHADER_TESS_CTRL:
calling_conv = HAVE_LLVM >= 0x0500 ? RADEON_LLVM_AMDGPU_HS : RADEON_LLVM_AMDGPU_VS;
calling_conv = RADEON_LLVM_AMDGPU_HS;
break;
case MESA_SHADER_FRAGMENT:
calling_conv = RADEON_LLVM_AMDGPU_PS;
@@ -1014,8 +1123,7 @@ static void create_function(struct radv_shader_context *ctx,
ctx->main_function = create_llvm_function(
ctx->context, ctx->ac.module, ctx->ac.builder, NULL, 0, &args,
ctx->max_workgroup_size,
ctx->options->unsafe_math);
ctx->max_workgroup_size, ctx->options);
set_llvm_calling_convention(ctx->main_function, stage);
@@ -1032,8 +1140,8 @@ static void create_function(struct radv_shader_context *ctx,
user_sgpr_idx = 0;
if (ctx->options->supports_spill || user_sgpr_info.need_ring_offsets) {
set_loc_shader(ctx, AC_UD_SCRATCH_RING_OFFSETS,
&user_sgpr_idx, 2);
set_loc_shader_ptr(ctx, AC_UD_SCRATCH_RING_OFFSETS,
&user_sgpr_idx);
if (ctx->options->supports_spill) {
ctx->ring_offsets = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.implicit.buffer.ptr",
LLVMPointerType(ctx->ac.i8, AC_CONST_ADDR_SPACE),
@@ -1592,6 +1700,8 @@ visit_emit_vertex(struct ac_shader_abi *abi, unsigned stream, LLVMValueRef *addr
/* loop num outputs */
idx = 0;
for (unsigned i = 0; i < AC_LLVM_MAX_OUTPUTS; ++i) {
unsigned output_usage_mask =
ctx->shader_info->info.gs.output_usage_mask[i];
LLVMValueRef *out_ptr = &addrs[i * 4];
int length = 4;
int slot = idx;
@@ -1605,8 +1715,13 @@ visit_emit_vertex(struct ac_shader_abi *abi, unsigned stream, LLVMValueRef *addr
length = ctx->num_output_clips + ctx->num_output_culls;
if (length > 4)
slot_inc = 2;
output_usage_mask = (1 << length) - 1;
}
for (unsigned j = 0; j < length; j++) {
if (!(output_usage_mask & (1 << j)))
continue;
LLVMValueRef out_val = LLVMBuildLoad(ctx->ac.builder,
out_ptr[j], "");
LLVMValueRef voffset = LLVMConstInt(ctx->ac.i32, (slot * 4 + j) * ctx->gs_max_out_vertices, false);
@@ -1768,11 +1883,53 @@ static LLVMValueRef radv_get_sampler_desc(struct ac_shader_abi *abi,
index = LLVMBuildMul(builder, index, LLVMConstInt(ctx->ac.i32, stride / type_size, 0), "");
list = ac_build_gep0(&ctx->ac, list, LLVMConstInt(ctx->ac.i32, offset, 0));
list = LLVMBuildPointerCast(builder, list, ac_array_in_const_addr_space(type), "");
list = LLVMBuildPointerCast(builder, list,
ac_array_in_const32_addr_space(type), "");
return ac_build_load_to_sgpr(&ctx->ac, list, index);
}
/* For 2_10_10_10 formats the alpha is handled as unsigned by pre-vega HW.
* so we may need to fix it up. */
static LLVMValueRef
adjust_vertex_fetch_alpha(struct radv_shader_context *ctx,
unsigned adjustment,
LLVMValueRef alpha)
{
if (adjustment == RADV_ALPHA_ADJUST_NONE)
return alpha;
LLVMValueRef c30 = LLVMConstInt(ctx->ac.i32, 30, 0);
if (adjustment == RADV_ALPHA_ADJUST_SSCALED)
alpha = LLVMBuildFPToUI(ctx->ac.builder, alpha, ctx->ac.i32, "");
else
alpha = ac_to_integer(&ctx->ac, alpha);
/* For the integer-like cases, do a natural sign extension.
*
* For the SNORM case, the values are 0.0, 0.333, 0.666, 1.0
* and happen to contain 0, 1, 2, 3 as the two LSBs of the
* exponent.
*/
alpha = LLVMBuildShl(ctx->ac.builder, alpha,
adjustment == RADV_ALPHA_ADJUST_SNORM ?
LLVMConstInt(ctx->ac.i32, 7, 0) : c30, "");
alpha = LLVMBuildAShr(ctx->ac.builder, alpha, c30, "");
/* Convert back to the right type. */
if (adjustment == RADV_ALPHA_ADJUST_SNORM) {
LLVMValueRef clamp;
LLVMValueRef neg_one = LLVMConstReal(ctx->ac.f32, -1.0);
alpha = LLVMBuildSIToFP(ctx->ac.builder, alpha, ctx->ac.f32, "");
clamp = LLVMBuildFCmp(ctx->ac.builder, LLVMRealULT, alpha, neg_one, "");
alpha = LLVMBuildSelect(ctx->ac.builder, clamp, neg_one, alpha, "");
} else if (adjustment == RADV_ALPHA_ADJUST_SSCALED) {
alpha = LLVMBuildSIToFP(ctx->ac.builder, alpha, ctx->ac.f32, "");
}
return alpha;
}
static void
handle_vs_input_decl(struct radv_shader_context *ctx,
@@ -1783,18 +1940,19 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
LLVMValueRef t_list;
LLVMValueRef input;
LLVMValueRef buffer_index;
int index = variable->data.location - VERT_ATTRIB_GENERIC0;
int idx = variable->data.location;
unsigned attrib_count = glsl_count_attribute_slots(variable->type, true);
uint8_t input_usage_mask =
ctx->shader_info->info.vs.input_usage_mask[variable->data.location];
unsigned num_channels = util_last_bit(input_usage_mask);
variable->data.driver_location = idx * 4;
variable->data.driver_location = variable->data.location * 4;
for (unsigned i = 0; i < attrib_count; ++i, ++idx) {
if (ctx->options->key.vs.instance_rate_inputs & (1u << (index + i))) {
uint32_t divisor = ctx->options->key.vs.instance_rate_divisors[index + i];
for (unsigned i = 0; i < attrib_count; ++i) {
LLVMValueRef output[4];
unsigned attrib_index = variable->data.location + i - VERT_ATTRIB_GENERIC0;
if (ctx->options->key.vs.instance_rate_inputs & (1u << attrib_index)) {
uint32_t divisor = ctx->options->key.vs.instance_rate_divisors[attrib_index];
if (divisor) {
buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.instance_id,
@@ -1818,7 +1976,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
} else
buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.vertex_id,
ctx->abi.base_vertex, "");
t_offset = LLVMConstInt(ctx->ac.i32, index + i, false);
t_offset = LLVMConstInt(ctx->ac.i32, attrib_index, false);
t_list = ac_build_load_to_sgpr(&ctx->ac, t_list_ptr, t_offset);
@@ -1831,9 +1989,15 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
for (unsigned chan = 0; chan < 4; chan++) {
LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, false);
ctx->inputs[ac_llvm_reg_index_soa(idx, chan)] =
ac_to_integer(&ctx->ac, LLVMBuildExtractElement(ctx->ac.builder,
input, llvm_chan, ""));
output[chan] = LLVMBuildExtractElement(ctx->ac.builder, input, llvm_chan, "");
}
unsigned alpha_adjust = (ctx->options->key.vs.alpha_adjust >> (attrib_index * 2)) & 3;
output[3] = adjust_vertex_fetch_alpha(ctx, alpha_adjust, output[3]);
for (unsigned chan = 0; chan < 4; chan++) {
ctx->inputs[ac_llvm_reg_index_soa(variable->data.location + i, chan)] =
ac_to_integer(&ctx->ac, output[chan]);
}
}
}
@@ -1929,9 +2093,6 @@ static void
prepare_interp_optimize(struct radv_shader_context *ctx,
struct nir_shader *nir)
{
if (!ctx->options->key.fs.multisample)
return;
bool uses_center = false;
bool uses_centroid = false;
nir_foreach_variable(variable, &nir->inputs) {
@@ -2353,10 +2514,9 @@ handle_vs_outputs_post(struct radv_shader_context *ctx,
output_usage_mask =
ctx->shader_info->info.tes.output_usage_mask[i];
} else {
/* Enable all channels for the GS copy shader because
* we don't know the output usage mask currently.
*/
output_usage_mask = 0xf;
assert(ctx->is_gs_copy_shader);
output_usage_mask =
ctx->shader_info->info.gs.output_usage_mask[i];
}
radv_export_param(ctx, param_count, values, output_usage_mask);
@@ -2436,14 +2596,26 @@ handle_es_outputs_post(struct radv_shader_context *ctx,
for (unsigned i = 0; i < AC_LLVM_MAX_OUTPUTS; ++i) {
LLVMValueRef dw_addr = NULL;
LLVMValueRef *out_ptr = &ctx->abi.outputs[i * 4];
unsigned output_usage_mask;
int param_index;
int length = 4;
if (!(ctx->output_mask & (1ull << i)))
continue;
if (i == VARYING_SLOT_CLIP_DIST0)
if (ctx->stage == MESA_SHADER_VERTEX) {
output_usage_mask =
ctx->shader_info->info.vs.output_usage_mask[i];
} else {
assert(ctx->stage == MESA_SHADER_TESS_EVAL);
output_usage_mask =
ctx->shader_info->info.tes.output_usage_mask[i];
}
if (i == VARYING_SLOT_CLIP_DIST0) {
length = ctx->num_output_clips + ctx->num_output_culls;
output_usage_mask = (1 << length) - 1;
}
param_index = shader_io_get_unique_index(i);
@@ -2452,14 +2624,22 @@ handle_es_outputs_post(struct radv_shader_context *ctx,
LLVMConstInt(ctx->ac.i32, param_index * 4, false),
"");
}
for (j = 0; j < length; j++) {
if (!(output_usage_mask & (1 << j)))
continue;
LLVMValueRef out_val = LLVMBuildLoad(ctx->ac.builder, out_ptr[j], "");
out_val = LLVMBuildBitCast(ctx->ac.builder, out_val, ctx->ac.i32, "");
if (ctx->ac.chip_class >= GFX9) {
ac_lds_store(&ctx->ac, dw_addr,
LLVMValueRef dw_addr_offset =
LLVMBuildAdd(ctx->ac.builder, dw_addr,
LLVMConstInt(ctx->ac.i32,
j, false), "");
ac_lds_store(&ctx->ac, dw_addr_offset,
LLVMBuildLoad(ctx->ac.builder, out_ptr[j], ""));
dw_addr = LLVMBuildAdd(ctx->ac.builder, dw_addr, ctx->ac.i32_1, "");
} else {
ac_build_buffer_store_dword(&ctx->ac,
ctx->esgs_ring,
@@ -2502,97 +2682,6 @@ handle_ls_outputs_post(struct radv_shader_context *ctx)
}
}
struct ac_build_if_state
{
struct radv_shader_context *ctx;
LLVMValueRef condition;
LLVMBasicBlockRef entry_block;
LLVMBasicBlockRef true_block;
LLVMBasicBlockRef false_block;
LLVMBasicBlockRef merge_block;
};
static LLVMBasicBlockRef
ac_build_insert_new_block(struct radv_shader_context *ctx, const char *name)
{
LLVMBasicBlockRef current_block;
LLVMBasicBlockRef next_block;
LLVMBasicBlockRef new_block;
/* get current basic block */
current_block = LLVMGetInsertBlock(ctx->ac.builder);
/* chqeck if there's another block after this one */
next_block = LLVMGetNextBasicBlock(current_block);
if (next_block) {
/* insert the new block before the next block */
new_block = LLVMInsertBasicBlockInContext(ctx->context, next_block, name);
}
else {
/* append new block after current block */
LLVMValueRef function = LLVMGetBasicBlockParent(current_block);
new_block = LLVMAppendBasicBlockInContext(ctx->context, function, name);
}
return new_block;
}
static void
ac_nir_build_if(struct ac_build_if_state *ifthen,
struct radv_shader_context *ctx,
LLVMValueRef condition)
{
LLVMBasicBlockRef block = LLVMGetInsertBlock(ctx->ac.builder);
memset(ifthen, 0, sizeof *ifthen);
ifthen->ctx = ctx;
ifthen->condition = condition;
ifthen->entry_block = block;
/* create endif/merge basic block for the phi functions */
ifthen->merge_block = ac_build_insert_new_block(ctx, "endif-block");
/* create/insert true_block before merge_block */
ifthen->true_block =
LLVMInsertBasicBlockInContext(ctx->context,
ifthen->merge_block,
"if-true-block");
/* successive code goes into the true block */
LLVMPositionBuilderAtEnd(ctx->ac.builder, ifthen->true_block);
}
/**
* End a conditional.
*/
static void
ac_nir_build_endif(struct ac_build_if_state *ifthen)
{
LLVMBuilderRef builder = ifthen->ctx->ac.builder;
/* Insert branch to the merge block from current block */
LLVMBuildBr(builder, ifthen->merge_block);
/*
* Now patch in the various branch instructions.
*/
/* Insert the conditional branch instruction at the end of entry_block */
LLVMPositionBuilderAtEnd(builder, ifthen->entry_block);
if (ifthen->false_block) {
/* we have an else clause */
LLVMBuildCondBr(builder, ifthen->condition,
ifthen->true_block, ifthen->false_block);
}
else {
/* no else clause */
LLVMBuildCondBr(builder, ifthen->condition,
ifthen->true_block, ifthen->merge_block);
}
/* Resume building code at end of the ifthen->merge_block */
LLVMPositionBuilderAtEnd(builder, ifthen->merge_block);
}
static void
write_tess_factors(struct radv_shader_context *ctx)
{
@@ -2647,7 +2736,7 @@ write_tess_factors(struct radv_shader_context *ctx)
outer[i] = LLVMGetUndef(ctx->ac.i32);
}
// LINES reverseal
// LINES reversal
if (ctx->options->key.tcs.primitive_mode == GL_ISOLINES) {
outer[0] = out[1] = ac_lds_load(&ctx->ac, lds_outer);
lds_outer = LLVMBuildAdd(ctx->ac.builder, lds_outer,
@@ -2878,13 +2967,17 @@ handle_shader_outputs_post(struct ac_shader_abi *abi, unsigned max_outputs,
}
}
static void ac_llvm_finalize_module(struct radv_shader_context *ctx)
static void ac_llvm_finalize_module(struct radv_shader_context *ctx,
const struct radv_nir_compiler_options *options)
{
LLVMPassManagerRef passmgr;
/* Create the pass manager */
passmgr = LLVMCreateFunctionPassManagerForModule(
ctx->ac.module);
if (options->check_ir)
LLVMAddVerifierPass(passmgr);
/* This pass should eliminate all the load and store instructions */
LLVMAddPromoteMemoryToRegisterPass(passmgr);
@@ -2893,6 +2986,8 @@ static void ac_llvm_finalize_module(struct radv_shader_context *ctx)
LLVMAddLICMPass(passmgr);
LLVMAddAggressiveDCEPass(passmgr);
LLVMAddCFGSimplificationPass(passmgr);
/* This is recommended by the instruction combining pass. */
LLVMAddEarlyCSEMemSSAPass(passmgr);
LLVMAddInstructionCombiningPass(passmgr);
/* Run the pass */
@@ -2942,9 +3037,16 @@ ac_nir_eliminate_const_vs_outputs(struct radv_shader_context *ctx)
static void
ac_setup_rings(struct radv_shader_context *ctx)
{
if ((ctx->stage == MESA_SHADER_VERTEX && ctx->options->key.vs.as_es) ||
(ctx->stage == MESA_SHADER_TESS_EVAL && ctx->options->key.tes.as_es)) {
ctx->esgs_ring = ac_build_load_to_sgpr(&ctx->ac, ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_ESGS_VS, false));
if (ctx->options->chip_class <= VI &&
(ctx->stage == MESA_SHADER_GEOMETRY ||
ctx->options->key.vs.as_es || ctx->options->key.tes.as_es)) {
unsigned ring = ctx->stage == MESA_SHADER_GEOMETRY ? RING_ESGS_GS
: RING_ESGS_VS;
LLVMValueRef offset = LLVMConstInt(ctx->ac.i32, ring, false);
ctx->esgs_ring = ac_build_load_to_sgpr(&ctx->ac,
ctx->ring_offsets,
offset);
}
if (ctx->is_gs_copy_shader) {
@@ -2955,7 +3057,6 @@ ac_setup_rings(struct radv_shader_context *ctx)
uint32_t num_entries = 64;
LLVMValueRef gsvs_ring_stride = LLVMConstInt(ctx->ac.i32, ctx->max_gsvs_emit_size, false);
LLVMValueRef gsvs_ring_desc = LLVMConstInt(ctx->ac.i32, ctx->max_gsvs_emit_size << 16, false);
ctx->esgs_ring = ac_build_load_to_sgpr(&ctx->ac, ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_ESGS_GS, false));
ctx->gsvs_ring = ac_build_load_to_sgpr(&ctx->ac, ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_GSVS_GS, false));
ctx->gsvs_ring = LLVMBuildBitCast(ctx->ac.builder, ctx->gsvs_ring, ctx->ac.v4i32, "");
@@ -3006,7 +3107,6 @@ static void ac_nir_fixup_ls_hs_input_vgprs(struct radv_shader_context *ctx)
LLVMValueRef hs_empty = LLVMBuildICmp(ctx->ac.builder, LLVMIntEQ, count,
ctx->ac.i32_0, "");
ctx->abi.instance_id = LLVMBuildSelect(ctx->ac.builder, hs_empty, ctx->rel_auto_id, ctx->abi.instance_id, "");
ctx->vs_prim_id = LLVMBuildSelect(ctx->ac.builder, hs_empty, ctx->abi.vertex_id, ctx->vs_prim_id, "");
ctx->rel_auto_id = LLVMBuildSelect(ctx->ac.builder, hs_empty, ctx->abi.tcs_rel_ids, ctx->rel_auto_id, "");
ctx->abi.vertex_id = LLVMBuildSelect(ctx->ac.builder, hs_empty, ctx->abi.tcs_patch_id, ctx->abi.vertex_id, "");
}
@@ -3202,7 +3302,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
if (options->dump_preoptir)
ac_dump_module(ctx.ac.module);
ac_llvm_finalize_module(&ctx);
ac_llvm_finalize_module(&ctx, options);
if (shader_count == 1)
ac_nir_eliminate_const_vs_outputs(&ctx);
@@ -3500,6 +3600,8 @@ radv_compile_gs_copy_shader(LLVMTargetMachineRef tm,
ctx.ac.builder = ac_create_builder(ctx.context, float_mode);
ctx.stage = MESA_SHADER_VERTEX;
radv_nir_shader_info_pass(geom_shader, options, &shader_info->info);
create_function(&ctx, MESA_SHADER_VERTEX, false, MESA_SHADER_VERTEX);
ctx.gs_max_out_vertices = geom_shader->info.gs.vertices_out;
@@ -3518,7 +3620,7 @@ radv_compile_gs_copy_shader(LLVMTargetMachineRef tm,
LLVMBuildRetVoid(ctx.ac.builder);
ac_llvm_finalize_module(&ctx);
ac_llvm_finalize_module(&ctx, options);
ac_compile_llvm_module(tm, ctx.ac.module, binary, config, shader_info,
MESA_SHADER_VERTEX, options);

View File

@@ -50,7 +50,7 @@ VkResult radv_CreateRenderPass(
pass = vk_alloc2(&device->alloc, pAllocator, size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pass == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
memset(pass, 0, size);
pass->attachment_count = pCreateInfo->attachmentCount;
@@ -87,8 +87,8 @@ VkResult radv_CreateRenderPass(
subpass_attachment_count +=
desc->inputAttachmentCount +
desc->colorAttachmentCount +
/* Count colorAttachmentCount again for resolve_attachments */
desc->colorAttachmentCount;
(desc->pResolveAttachments ? desc->colorAttachmentCount : 0) +
(desc->pDepthStencilAttachment != NULL);
}
if (subpass_attachment_count) {
@@ -98,7 +98,7 @@ VkResult radv_CreateRenderPass(
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pass->subpass_attachments == NULL) {
vk_free2(&device->alloc, pAllocator, pass);
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
}
} else
pass->subpass_attachments = NULL;

View File

@@ -174,13 +174,54 @@ radv_pipeline_scratch_init(struct radv_device *device,
if (scratch_bytes_per_wave && max_waves < min_waves) {
/* Not really true at this moment, but will be true on first
* execution. Avoid having hanging shaders. */
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
pipeline->scratch_bytes_per_wave = scratch_bytes_per_wave;
pipeline->max_waves = max_waves;
return VK_SUCCESS;
}
static uint32_t si_translate_blend_logic_op(VkLogicOp op)
{
switch (op) {
case VK_LOGIC_OP_CLEAR:
return V_028808_ROP3_CLEAR;
case VK_LOGIC_OP_AND:
return V_028808_ROP3_AND;
case VK_LOGIC_OP_AND_REVERSE:
return V_028808_ROP3_AND_REVERSE;
case VK_LOGIC_OP_COPY:
return V_028808_ROP3_COPY;
case VK_LOGIC_OP_AND_INVERTED:
return V_028808_ROP3_AND_INVERTED;
case VK_LOGIC_OP_NO_OP:
return V_028808_ROP3_NO_OP;
case VK_LOGIC_OP_XOR:
return V_028808_ROP3_XOR;
case VK_LOGIC_OP_OR:
return V_028808_ROP3_OR;
case VK_LOGIC_OP_NOR:
return V_028808_ROP3_NOR;
case VK_LOGIC_OP_EQUIVALENT:
return V_028808_ROP3_EQUIVALENT;
case VK_LOGIC_OP_INVERT:
return V_028808_ROP3_INVERT;
case VK_LOGIC_OP_OR_REVERSE:
return V_028808_ROP3_OR_REVERSE;
case VK_LOGIC_OP_COPY_INVERTED:
return V_028808_ROP3_COPY_INVERTED;
case VK_LOGIC_OP_OR_INVERTED:
return V_028808_ROP3_OR_INVERTED;
case VK_LOGIC_OP_NAND:
return V_028808_ROP3_NAND;
case VK_LOGIC_OP_SET:
return V_028808_ROP3_SET;
default:
unreachable("Unhandled logic op");
}
}
static uint32_t si_translate_blend_function(VkBlendOp op)
{
switch (op) {
@@ -463,6 +504,7 @@ radv_pipeline_compute_spi_color_formats(struct radv_pipeline *pipeline,
RADV_FROM_HANDLE(radv_render_pass, pass, pCreateInfo->renderPass);
struct radv_subpass *subpass = pass->subpasses + pCreateInfo->subpass;
unsigned col_format = 0;
unsigned num_targets;
for (unsigned i = 0; i < (blend->single_cb_enable ? 1 : subpass->color_count); ++i) {
unsigned cf;
@@ -482,6 +524,16 @@ radv_pipeline_compute_spi_color_formats(struct radv_pipeline *pipeline,
col_format |= cf << (4 * i);
}
/* If the i-th target format is set, all previous target formats must
* be non-zero to avoid hangs.
*/
num_targets = (util_last_bit(col_format) + 3) / 4;
for (unsigned i = 0; i < num_targets; i++) {
if (!(col_format & (0xf << (i * 4)))) {
col_format |= V_028714_SPI_SHADER_32_R << (i * 4);
}
}
blend->cb_shader_mask = ac_get_cb_shader_mask(col_format);
if (blend->mrt0_is_dual_src)
@@ -570,7 +622,7 @@ radv_blend_check_commutativity(struct radv_blend_state *blend,
(1u << VK_BLEND_FACTOR_ONE_MINUS_SRC1_ALPHA);
if (dst == VK_BLEND_FACTOR_ONE &&
(src_allowed && (1u << src))) {
(src_allowed & (1u << src))) {
/* Addition is commutative, but floating point addition isn't
* associative: subtle changes can be introduced via different
* rounding. Be conservative, only enable for min and max.
@@ -600,9 +652,9 @@ radv_pipeline_init_blend_state(struct radv_pipeline *pipeline,
}
blend.cb_color_control = 0;
if (vkblend->logicOpEnable)
blend.cb_color_control |= S_028808_ROP3(vkblend->logicOp | (vkblend->logicOp << 4));
blend.cb_color_control |= S_028808_ROP3(si_translate_blend_logic_op(vkblend->logicOp));
else
blend.cb_color_control |= S_028808_ROP3(0xcc);
blend.cb_color_control |= S_028808_ROP3(V_028808_ROP3_COPY);
blend.db_alpha_to_mask = S_028B70_ALPHA_TO_MASK_OFFSET0(2) |
S_028B70_ALPHA_TO_MASK_OFFSET1(2) |
@@ -1542,21 +1594,25 @@ static void si_multiwave_lds_size_workaround(struct radv_device *device,
}
struct radv_shader_variant *
radv_get_vertex_shader(struct radv_pipeline *pipeline)
radv_get_shader(struct radv_pipeline *pipeline,
gl_shader_stage stage)
{
if (pipeline->shaders[MESA_SHADER_VERTEX])
return pipeline->shaders[MESA_SHADER_VERTEX];
if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
return pipeline->shaders[MESA_SHADER_TESS_CTRL];
return pipeline->shaders[MESA_SHADER_GEOMETRY];
}
static struct radv_shader_variant *
radv_get_tess_eval_shader(struct radv_pipeline *pipeline)
{
if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
return pipeline->shaders[MESA_SHADER_TESS_EVAL];
return pipeline->shaders[MESA_SHADER_GEOMETRY];
if (stage == MESA_SHADER_VERTEX) {
if (pipeline->shaders[MESA_SHADER_VERTEX])
return pipeline->shaders[MESA_SHADER_VERTEX];
if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
return pipeline->shaders[MESA_SHADER_TESS_CTRL];
if (pipeline->shaders[MESA_SHADER_GEOMETRY])
return pipeline->shaders[MESA_SHADER_GEOMETRY];
} else if (stage == MESA_SHADER_TESS_EVAL) {
if (!radv_pipeline_has_tess(pipeline))
return NULL;
if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
return pipeline->shaders[MESA_SHADER_TESS_EVAL];
if (pipeline->shaders[MESA_SHADER_GEOMETRY])
return pipeline->shaders[MESA_SHADER_GEOMETRY];
}
return pipeline->shaders[stage];
}
static struct radv_tessellation_state
@@ -1591,7 +1647,7 @@ calculate_tess_state(struct radv_pipeline *pipeline,
S_028B58_HS_NUM_OUTPUT_CP(num_tcs_output_cp);
tess.num_patches = num_patches;
struct radv_shader_variant *tes = radv_get_tess_eval_shader(pipeline);
struct radv_shader_variant *tes = radv_get_shader(pipeline, MESA_SHADER_TESS_EVAL);
unsigned type = 0, partitioning = 0, topology = 0, distribution_mode = 0;
switch (tes->info.tes.primitive_mode) {
@@ -1724,13 +1780,13 @@ radv_link_shaders(struct radv_pipeline *pipeline, nir_shader **shaders)
ac_lower_indirect_derefs(ordered_shaders[i],
pipeline->device->physical_device->rad_info.chip_class);
}
radv_optimize_nir(ordered_shaders[i]);
radv_optimize_nir(ordered_shaders[i], false);
if (nir_lower_global_vars_to_local(ordered_shaders[i - 1])) {
ac_lower_indirect_derefs(ordered_shaders[i - 1],
pipeline->device->physical_device->rad_info.chip_class);
}
radv_optimize_nir(ordered_shaders[i - 1]);
radv_optimize_nir(ordered_shaders[i - 1], false);
}
}
}
@@ -1750,6 +1806,9 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
struct radv_pipeline_key key;
memset(&key, 0, sizeof(key));
if (pCreateInfo->flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT)
key.optimisations_disabled = 1;
key.has_multiview_view_index = has_view_index;
uint32_t binding_input_rate = 0;
@@ -1769,13 +1828,36 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
}
for (unsigned i = 0; i < input_state->vertexAttributeDescriptionCount; ++i) {
unsigned binding;
binding = input_state->pVertexAttributeDescriptions[i].binding;
unsigned location = input_state->pVertexAttributeDescriptions[i].location;
unsigned binding = input_state->pVertexAttributeDescriptions[i].binding;
if (binding_input_rate & (1u << binding)) {
unsigned location = input_state->pVertexAttributeDescriptions[i].location;
key.instance_rate_inputs |= 1u << location;
key.instance_rate_divisors[location] = instance_rate_divisors[binding];
}
if (pipeline->device->physical_device->rad_info.chip_class <= VI &&
pipeline->device->physical_device->rad_info.family != CHIP_STONEY) {
VkFormat format = input_state->pVertexAttributeDescriptions[i].format;
uint64_t adjust;
switch(format) {
case VK_FORMAT_A2R10G10B10_SNORM_PACK32:
case VK_FORMAT_A2B10G10R10_SNORM_PACK32:
adjust = RADV_ALPHA_ADJUST_SNORM;
break;
case VK_FORMAT_A2R10G10B10_SSCALED_PACK32:
case VK_FORMAT_A2B10G10R10_SSCALED_PACK32:
adjust = RADV_ALPHA_ADJUST_SSCALED;
break;
case VK_FORMAT_A2R10G10B10_SINT_PACK32:
case VK_FORMAT_A2B10G10R10_SINT_PACK32:
adjust = RADV_ALPHA_ADJUST_SINT;
break;
default:
adjust = 0;
break;
}
key.vertex_alpha_adjust |= adjust << (2 * location);
}
}
if (pCreateInfo->pTessellationState)
@@ -1786,7 +1868,6 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
pCreateInfo->pMultisampleState->rasterizationSamples > 1) {
uint32_t num_samples = pCreateInfo->pMultisampleState->rasterizationSamples;
uint32_t ps_iter_samples = radv_pipeline_get_ps_iter_samples(pCreateInfo->pMultisampleState);
key.multisample = true;
key.log2_num_samples = util_logbase2(num_samples);
key.log2_ps_iter_samples = util_logbase2(ps_iter_samples);
}
@@ -1804,6 +1885,7 @@ radv_fill_shader_keys(struct radv_shader_variant_key *keys,
nir_shader **nir)
{
keys[MESA_SHADER_VERTEX].vs.instance_rate_inputs = key->instance_rate_inputs;
keys[MESA_SHADER_VERTEX].vs.alpha_adjust = key->vertex_alpha_adjust;
for (unsigned i = 0; i < MAX_VERTEX_ATTRIBS; ++i)
keys[MESA_SHADER_VERTEX].vs.instance_rate_divisors[i] = key->instance_rate_divisors[i];
@@ -1826,7 +1908,6 @@ radv_fill_shader_keys(struct radv_shader_variant_key *keys,
for(int i = 0; i < MESA_SHADER_STAGES; ++i)
keys[i].has_multiview_view_index = key->has_multiview_view_index;
keys[MESA_SHADER_FRAGMENT].fs.multisample = key->multisample;
keys[MESA_SHADER_FRAGMENT].fs.col_format = key->col_format;
keys[MESA_SHADER_FRAGMENT].fs.is_int8 = key->is_int8;
keys[MESA_SHADER_FRAGMENT].fs.is_int10 = key->is_int10;
@@ -1878,7 +1959,8 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
struct radv_device *device,
struct radv_pipeline_cache *cache,
struct radv_pipeline_key key,
const VkPipelineShaderStageCreateInfo **pStages)
const VkPipelineShaderStageCreateInfo **pStages,
const VkPipelineCreateFlags flags)
{
struct radv_shader_module fs_m = {0};
struct radv_shader_module *modules[MESA_SHADER_STAGES] = { 0, };
@@ -1895,6 +1977,8 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
_mesa_sha1_compute(modules[i]->nir->info.name,
strlen(modules[i]->nir->info.name),
modules[i]->sha1);
pipeline->active_stages |= mesa_to_vk_shader_stage(i);
}
}
@@ -1910,10 +1994,6 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
if (radv_create_shader_variants_from_pipeline_cache(device, cache, hash, pipeline->shaders) &&
(!modules[MESA_SHADER_GEOMETRY] || pipeline->gs_copy_shader)) {
for (unsigned i = 0; i < MESA_SHADER_STAGES; ++i) {
if (pipeline->shaders[i])
pipeline->active_stages |= mesa_to_vk_shader_stage(i);
}
return;
}
@@ -1944,8 +2024,8 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
nir[i] = radv_shader_compile_to_nir(device, modules[i],
stage ? stage->pName : "main", i,
stage ? stage->pSpecializationInfo : NULL);
pipeline->active_stages |= mesa_to_vk_shader_stage(i);
stage ? stage->pSpecializationInfo : NULL,
flags);
/* We don't want to alter meta shaders IR directly so clone it
* first.
@@ -1963,8 +2043,10 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
if (i != last)
mask = mask | nir_var_shader_out;
nir_lower_io_to_scalar_early(nir[i], mask);
radv_optimize_nir(nir[i]);
if (!(flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT)) {
nir_lower_io_to_scalar_early(nir[i], mask);
radv_optimize_nir(nir[i], false);
}
}
}
@@ -1973,10 +2055,11 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
merge_tess_info(&nir[MESA_SHADER_TESS_EVAL]->info, &nir[MESA_SHADER_TESS_CTRL]->info);
}
radv_link_shaders(pipeline, nir);
if (!(flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT))
radv_link_shaders(pipeline, nir);
for (int i = 0; i < MESA_SHADER_STAGES; ++i) {
if (modules[i] && radv_can_dump_shader(device, modules[i]))
if (radv_can_dump_shader(device, modules[i], false))
nir_print_shader(nir[i], stderr);
}
@@ -3076,7 +3159,7 @@ radv_pipeline_generate_vgt_vertex_reuse(struct radeon_winsys_cs *cs,
unsigned vtx_reuse_depth = 30;
if (radv_pipeline_has_tess(pipeline) &&
radv_get_tess_eval_shader(pipeline)->info.tes.spacing == TESS_SPACING_FRACTIONAL_ODD) {
radv_get_shader(pipeline, MESA_SHADER_TESS_EVAL)->info.tes.spacing == TESS_SPACING_FRACTIONAL_ODD) {
vtx_reuse_depth = 14;
}
radeon_set_context_reg(cs, R_028C58_VGT_VERTEX_REUSE_BLOCK_CNTL,
@@ -3206,8 +3289,9 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
}
}
/* GS requirement. */
if (SI_GS_PER_ES / ia_multi_vgt_param.primgroup_size >= pipeline->device->gs_table_depth - 3)
ia_multi_vgt_param.partial_es_wave = true;
if (radv_pipeline_has_gs(pipeline) && device->physical_device->rad_info.chip_class <= VI)
if (SI_GS_PER_ES / ia_multi_vgt_param.primgroup_size >= pipeline->device->gs_table_depth - 3)
ia_multi_vgt_param.partial_es_wave = true;
ia_multi_vgt_param.wd_switch_on_eop = false;
if (device->physical_device->rad_info.chip_class >= CIK) {
@@ -3236,7 +3320,7 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
if (radv_pipeline_has_tess(pipeline)) {
/* SWITCH_ON_EOI must be set if PrimID is used. */
if (pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.info.uses_prim_id ||
radv_get_tess_eval_shader(pipeline)->info.info.uses_prim_id)
radv_get_shader(pipeline, MESA_SHADER_TESS_EVAL)->info.info.uses_prim_id)
ia_multi_vgt_param.ia_switch_on_eoi = true;
}
@@ -3348,7 +3432,7 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
radv_create_shaders(pipeline, device, cache,
radv_generate_graphics_pipeline_key(pipeline, pCreateInfo, &blend, has_view_index),
pStages);
pStages, pCreateInfo->flags);
pipeline->graphics.spi_baryc_cntl = S_0286E0_FRONT_FACE_ALL_BITS(1);
radv_pipeline_init_multisample_state(pipeline, &blend, pCreateInfo);
@@ -3426,7 +3510,7 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
if (loc->sgpr_idx != -1) {
pipeline->graphics.vtx_base_sgpr = pipeline->user_data_0[MESA_SHADER_VERTEX];
pipeline->graphics.vtx_base_sgpr += loc->sgpr_idx * 4;
if (radv_get_vertex_shader(pipeline)->info.info.vs.needs_draw_id)
if (radv_get_shader(pipeline, MESA_SHADER_VERTEX)->info.info.vs.needs_draw_id)
pipeline->graphics.vtx_emit_num = 3;
else
pipeline->graphics.vtx_emit_num = 2;
@@ -3455,7 +3539,7 @@ radv_graphics_pipeline_create(
pipeline = vk_zalloc2(&device->alloc, pAllocator, sizeof(*pipeline), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pipeline == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
result = radv_pipeline_init(pipeline, device, cache,
pCreateInfo, extra, pAllocator);
@@ -3574,14 +3658,14 @@ static VkResult radv_compute_pipeline_create(
pipeline = vk_zalloc2(&device->alloc, pAllocator, sizeof(*pipeline), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (pipeline == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
pipeline->device = device;
pipeline->layout = radv_pipeline_layout_from_handle(pCreateInfo->layout);
assert(pipeline->layout);
pStages[MESA_SHADER_COMPUTE] = &pCreateInfo->stage;
radv_create_shaders(pipeline, device, cache, (struct radv_pipeline_key) {0}, pStages);
radv_create_shaders(pipeline, device, cache, (struct radv_pipeline_key) {0}, pStages, pCreateInfo->flags);
pipeline->user_data_0[MESA_SHADER_COMPUTE] = radv_pipeline_stage_to_user_data_0(pipeline, MESA_SHADER_COMPUTE, device->physical_device->rad_info.chip_class);
pipeline->need_indirect_descriptor_sets |= pipeline->shaders[MESA_SHADER_COMPUTE]->info.need_indirect_descriptor_sets;

View File

@@ -206,7 +206,7 @@ radv_pipeline_cache_grow(struct radv_pipeline_cache *cache)
table = malloc(byte_size);
if (table == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(cache->device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
cache->hash_table = table;
cache->table_size = table_size;
@@ -515,7 +515,7 @@ VkResult radv_CreatePipelineCache(
sizeof(*cache), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (cache == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
if (pAllocator)
cache->alloc = *pAllocator;

View File

@@ -57,8 +57,10 @@
#include "ac_nir_to_llvm.h"
#include "ac_gpu_info.h"
#include "ac_surface.h"
#include "ac_llvm_build.h"
#include "radv_descriptor_set.h"
#include "radv_extensions.h"
#include "radv_cs.h"
#include <llvm-c/TargetMachine.h>
@@ -215,20 +217,19 @@ radv_clear_mask(uint32_t *inout_mask, uint32_t clear_mask)
* propagating errors. Might be useful to plug in a stack trace here.
*/
VkResult __vk_errorf(VkResult error, const char *file, int line, const char *format, ...);
struct radv_instance;
#ifdef DEBUG
#define vk_error(error) __vk_errorf(error, __FILE__, __LINE__, NULL);
#define vk_errorf(error, format, ...) __vk_errorf(error, __FILE__, __LINE__, format, ## __VA_ARGS__);
#else
#define vk_error(error) error
#define vk_errorf(error, format, ...) error
#endif
VkResult __vk_errorf(struct radv_instance *instance, VkResult error, const char *file, int line, const char *format, ...);
#define vk_error(instance, error) __vk_errorf(instance, error, __FILE__, __LINE__, NULL);
#define vk_errorf(instance, error, format, ...) __vk_errorf(instance, error, __FILE__, __LINE__, format, ## __VA_ARGS__);
void __radv_finishme(const char *file, int line, const char *format, ...)
radv_printflike(3, 4);
void radv_loge(const char *format, ...) radv_printflike(1, 2);
void radv_loge_v(const char *format, va_list va);
void radv_logi(const char *format, ...) radv_printflike(1, 2);
void radv_logi_v(const char *format, va_list va);
/**
* Print a FINISHME message, including its source location.
@@ -352,14 +353,15 @@ struct radv_pipeline_cache {
struct radv_pipeline_key {
uint32_t instance_rate_inputs;
uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];
uint64_t vertex_alpha_adjust;
unsigned tess_input_vertices;
uint32_t col_format;
uint32_t is_int8;
uint32_t is_int10;
uint8_t log2_ps_iter_samples;
uint8_t log2_num_samples;
uint32_t multisample : 1;
uint32_t has_multiview_view_index : 1;
uint32_t optimisations_disabled : 1;
};
void
@@ -465,18 +467,18 @@ struct radv_meta_state {
} blit;
struct {
VkRenderPass render_passes[NUM_META_FS_KEYS][RADV_META_DST_LAYOUT_COUNT];
VkPipelineLayout p_layouts[5];
VkDescriptorSetLayout ds_layouts[5];
VkPipeline pipelines[5][NUM_META_FS_KEYS];
VkPipelineLayout p_layouts[3];
VkDescriptorSetLayout ds_layouts[3];
VkPipeline pipelines[3][NUM_META_FS_KEYS];
VkPipeline depth_only_pipeline[5];
VkRenderPass depth_only_rp[RADV_BLIT_DS_LAYOUT_COUNT];
VkPipeline depth_only_pipeline[3];
VkPipeline stencil_only_pipeline[5];
} blit2d[1 + MAX_SAMPLES_LOG2];
VkRenderPass stencil_only_rp[RADV_BLIT_DS_LAYOUT_COUNT];
VkPipeline stencil_only_pipeline[3];
} blit2d;
VkRenderPass blit2d_render_passes[NUM_META_FS_KEYS][RADV_META_DST_LAYOUT_COUNT];
VkRenderPass blit2d_depth_only_rp[RADV_BLIT_DS_LAYOUT_COUNT];
VkRenderPass blit2d_stencil_only_rp[RADV_BLIT_DS_LAYOUT_COUNT];
struct {
VkPipelineLayout img_p_layout;
@@ -622,7 +624,6 @@ struct radv_device {
struct radeon_winsys_cs *empty_cs[RADV_MAX_QUEUE_FAMILIES];
bool always_use_syncobj;
bool llvm_supports_spill;
bool has_distributed_tess;
bool pbb_allowed;
bool dfsm_allowed;
@@ -1108,14 +1109,17 @@ void radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer);
void radv_cayman_emit_msaa_sample_locs(struct radeon_winsys_cs *cs, int nr_samples);
unsigned radv_cayman_get_maxdist(int log_samples);
void radv_device_init_msaa(struct radv_device *device);
void radv_set_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkClearDepthStencilValue ds_clear_value,
VkImageAspectFlags aspects);
void radv_set_color_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int idx,
uint32_t color_values[2]);
void radv_set_ds_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
VkClearDepthStencilValue ds_clear_value,
VkImageAspectFlags aspects);
void radv_set_color_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int cb_idx,
uint32_t color_values[2]);
void radv_set_dcc_need_cmask_elim_pred(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
bool value);
@@ -1127,6 +1131,41 @@ bool radv_get_memory_fd(struct radv_device *device,
struct radv_device_memory *memory,
int *pFD);
static inline void
radv_emit_shader_pointer_head(struct radeon_winsys_cs *cs,
unsigned sh_offset, unsigned pointer_count,
bool use_32bit_pointers)
{
radeon_emit(cs, PKT3(PKT3_SET_SH_REG, pointer_count * (use_32bit_pointers ? 1 : 2), 0));
radeon_emit(cs, (sh_offset - SI_SH_REG_OFFSET) >> 2);
}
static inline void
radv_emit_shader_pointer_body(struct radv_device *device,
struct radeon_winsys_cs *cs,
uint64_t va, bool use_32bit_pointers)
{
radeon_emit(cs, va);
if (use_32bit_pointers) {
assert(va == 0 ||
(va >> 32) == device->physical_device->rad_info.address32_hi);
} else {
radeon_emit(cs, va >> 32);
}
}
static inline void
radv_emit_shader_pointer(struct radv_device *device,
struct radeon_winsys_cs *cs,
uint32_t sh_offset, uint64_t va, bool global)
{
bool use_32bit_pointers = HAVE_32BIT_POINTERS && !global;
radv_emit_shader_pointer_head(cs, sh_offset, 1, use_32bit_pointers);
radv_emit_shader_pointer_body(device, cs, va, use_32bit_pointers);
}
static inline struct radv_descriptor_state *
radv_get_descriptors_state(struct radv_cmd_buffer *cmd_buffer,
VkPipelineBindPoint bind_point)
@@ -1279,7 +1318,8 @@ struct radv_userdata_info *radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
gl_shader_stage stage,
int idx);
struct radv_shader_variant *radv_get_vertex_shader(struct radv_pipeline *pipeline);
struct radv_shader_variant *radv_get_shader(struct radv_pipeline *pipeline,
gl_shader_stage stage);
struct radv_graphics_pipeline_create_info {
bool use_rectlist;
@@ -1706,14 +1746,6 @@ struct radv_semaphore {
uint32_t temp_syncobj;
};
VkResult radv_alloc_sem_info(struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
int num_signal_sems,
const VkSemaphore *signal_sems,
VkFence fence);
void radv_free_sem_info(struct radv_winsys_sem_info *sem_info);
void radv_set_descriptor_set(struct radv_cmd_buffer *cmd_buffer,
VkPipelineBindPoint bind_point,
struct radv_descriptor_set *set,

View File

@@ -753,7 +753,7 @@ VkResult radv_CreateQueryPool(
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!pool)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
switch(pCreateInfo->queryType) {
@@ -783,7 +783,7 @@ VkResult radv_CreateQueryPool(
if (!pool->bo) {
vk_free2(&device->alloc, pAllocator, pool);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
pool->ptr = device->ws->buffer_map(pool->bo);
@@ -791,7 +791,7 @@ VkResult radv_CreateQueryPool(
if (!pool->ptr) {
device->ws->buffer_destroy(pool->bo);
vk_free2(&device->alloc, pAllocator, pool);
return vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_DEVICE_MEMORY);
}
memset(pool->ptr, 0, pool->size);
@@ -1140,12 +1140,12 @@ static void emit_end_query(struct radv_cmd_buffer *cmd_buffer,
cmd_buffer->state.active_occlusion_queries--;
if (cmd_buffer->state.active_occlusion_queries == 0) {
radv_set_db_count_control(cmd_buffer);
/* Reset the perfect occlusion queries hint now that no
* queries are active.
*/
cmd_buffer->state.perfect_occlusion_queries_enabled = false;
radv_set_db_count_control(cmd_buffer);
}
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
@@ -1204,25 +1204,6 @@ void radv_CmdBeginQuery(
va += pool->stride * query;
emit_begin_query(cmd_buffer, va, pool->type, flags);
/*
* For multiview we have to emit a query for each bit in the mask,
* however the first query we emit will get the totals for all the
* operations, so we don't want to get a real value in the other
* queries. This emits a fake begin/end sequence so the waiting
* code gets a completed query value and doesn't hang, but the
* query returns 0.
*/
if (cmd_buffer->state.subpass && cmd_buffer->state.subpass->view_mask) {
uint64_t avail_va = va + pool->availability_offset + 4 * query;
for (unsigned i = 0; i < util_bitcount(cmd_buffer->state.subpass->view_mask); i++) {
va += pool->stride;
avail_va += 4;
emit_begin_query(cmd_buffer, va, pool->type, flags);
emit_end_query(cmd_buffer, va, avail_va, pool->type);
}
}
}
@@ -1241,6 +1222,26 @@ void radv_CmdEndQuery(
* currently be active, which means the BO is already in the list.
*/
emit_end_query(cmd_buffer, va, avail_va, pool->type);
/*
* For multiview we have to emit a query for each bit in the mask,
* however the first query we emit will get the totals for all the
* operations, so we don't want to get a real value in the other
* queries. This emits a fake begin/end sequence so the waiting
* code gets a completed query value and doesn't hang, but the
* query returns 0.
*/
if (cmd_buffer->state.subpass && cmd_buffer->state.subpass->view_mask) {
uint64_t avail_va = va + pool->availability_offset + 4 * query;
for (unsigned i = 1; i < util_bitcount(cmd_buffer->state.subpass->view_mask); i++) {
va += pool->stride;
avail_va += 4;
emit_begin_query(cmd_buffer, va, pool->type, 0);
emit_end_query(cmd_buffer, va, avail_va, pool->type);
}
}
}
void radv_CmdWriteTimestamp(

View File

@@ -57,6 +57,7 @@ enum radeon_bo_flag { /* bitfield */
RADEON_FLAG_IMPLICIT_SYNC = (1 << 5),
RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 6),
RADEON_FLAG_READ_ONLY = (1 << 7),
RADEON_FLAG_32BIT = (1 << 8),
};
enum radeon_bo_usage { /* bitfield */

View File

@@ -36,6 +36,7 @@
#include <llvm-c/Core.h>
#include <llvm-c/TargetMachine.h>
#include <llvm-c/Support.h>
#include "sid.h"
#include "gfx9d.h"
@@ -89,7 +90,7 @@ VkResult radv_CreateShaderModule(
sizeof(*module) + pCreateInfo->codeSize, 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (module == NULL)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
module->nir = NULL;
module->size = pCreateInfo->codeSize;
@@ -117,7 +118,7 @@ void radv_DestroyShaderModule(
}
void
radv_optimize_nir(struct nir_shader *shader)
radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively)
{
bool progress;
@@ -125,7 +126,7 @@ radv_optimize_nir(struct nir_shader *shader)
progress = false;
NIR_PASS_V(shader, nir_lower_vars_to_ssa);
NIR_PASS_V(shader, nir_lower_64bit_pack);
NIR_PASS_V(shader, nir_lower_pack);
NIR_PASS_V(shader, nir_lower_alu_to_scalar);
NIR_PASS_V(shader, nir_lower_phis_to_scalar);
@@ -149,7 +150,7 @@ radv_optimize_nir(struct nir_shader *shader)
if (shader->options->max_unroll_iterations) {
NIR_PASS(progress, shader, nir_opt_loop_unroll, 0);
}
} while (progress);
} while (progress && !optimize_conservatively);
NIR_PASS(progress, shader, nir_opt_shrink_load);
NIR_PASS(progress, shader, nir_opt_move_load_ubo);
@@ -160,12 +161,9 @@ radv_shader_compile_to_nir(struct radv_device *device,
struct radv_shader_module *module,
const char *entrypoint_name,
gl_shader_stage stage,
const VkSpecializationInfo *spec_info)
const VkSpecializationInfo *spec_info,
const VkPipelineCreateFlags flags)
{
if (strcmp(entrypoint_name, "main") != 0) {
radv_finishme("Multiple shaders per module not really supported");
}
nir_shader *nir;
nir_function *entry_point;
if (module->nir) {
@@ -280,7 +278,20 @@ radv_shader_compile_to_nir(struct radv_device *device,
nir_lower_tex(nir, &tex_options);
nir_lower_vars_to_ssa(nir);
if (nir->info.stage == MESA_SHADER_VERTEX ||
nir->info.stage == MESA_SHADER_GEOMETRY) {
NIR_PASS_V(nir, nir_lower_io_to_temporaries,
nir_shader_get_entrypoint(nir), true, true);
} else if (nir->info.stage == MESA_SHADER_TESS_EVAL||
nir->info.stage == MESA_SHADER_FRAGMENT) {
NIR_PASS_V(nir, nir_lower_io_to_temporaries,
nir_shader_get_entrypoint(nir), true, false);
}
nir_split_var_copies(nir);
nir_lower_var_copies(nir);
nir_lower_global_vars_to_local(nir);
nir_remove_dead_variables(nir, nir_var_local);
nir_lower_subgroups(nir, &(struct nir_lower_subgroups_options) {
@@ -293,7 +304,8 @@ radv_shader_compile_to_nir(struct radv_device *device,
.lower_vote_eq_to_ballot = 1,
});
radv_optimize_nir(nir);
if (!(flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT))
radv_optimize_nir(nir, false);
/* Indirect lowering must be called after the radv_optimize_nir() loop
* has been called at least once. Otherwise indirect lowering can
@@ -301,7 +313,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
* considered too large for unrolling.
*/
ac_lower_indirect_derefs(nir, device->physical_device->rad_info.chip_class);
radv_optimize_nir(nir);
radv_optimize_nir(nir, flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT);
return nir;
}
@@ -371,16 +383,14 @@ radv_fill_shader_variant(struct radv_device *device,
gl_shader_stage stage)
{
bool scratch_enabled = variant->config.scratch_bytes_per_wave > 0;
struct radv_shader_info *info = &variant->info.info;
unsigned vgpr_comp_cnt = 0;
if (scratch_enabled && !device->llvm_supports_spill)
radv_finishme("shader scratch support only available with LLVM 4.0");
variant->code_size = binary->code_size;
variant->rsrc2 = S_00B12C_USER_SGPR(variant->info.num_user_sgprs) |
S_00B12C_SCRATCH_EN(scratch_enabled);
S_00B12C_SCRATCH_EN(scratch_enabled);
variant->rsrc1 = S_00B848_VGPRS((variant->config.num_vgprs - 1) / 4) |
variant->rsrc1 = S_00B848_VGPRS((variant->config.num_vgprs - 1) / 4) |
S_00B848_SGPRS((variant->config.num_sgprs - 1) / 8) |
S_00B848_DX10_CLAMP(1) |
S_00B848_FLOAT_MODE(variant->config.float_mode);
@@ -391,10 +401,11 @@ radv_fill_shader_variant(struct radv_device *device,
variant->rsrc2 |= S_00B12C_OC_LDS_EN(1);
break;
case MESA_SHADER_TESS_CTRL:
if (device->physical_device->rad_info.chip_class >= GFX9)
if (device->physical_device->rad_info.chip_class >= GFX9) {
vgpr_comp_cnt = variant->info.vs.vgpr_comp_cnt;
else
} else {
variant->rsrc2 |= S_00B12C_OC_LDS_EN(1);
}
break;
case MESA_SHADER_VERTEX:
case MESA_SHADER_GEOMETRY:
@@ -402,8 +413,7 @@ radv_fill_shader_variant(struct radv_device *device,
break;
case MESA_SHADER_FRAGMENT:
break;
case MESA_SHADER_COMPUTE: {
struct radv_shader_info *info = &variant->info.info;
case MESA_SHADER_COMPUTE:
variant->rsrc2 |=
S_00B84C_TGID_X_EN(info->cs.uses_block_id[0]) |
S_00B84C_TGID_Y_EN(info->cs.uses_block_id[1]) |
@@ -413,7 +423,6 @@ radv_fill_shader_variant(struct radv_device *device,
S_00B84C_TG_SIZE_EN(info->cs.uses_local_invocation_idx) |
S_00B84C_LDS_SIZE(variant->config.lds_size);
break;
}
default:
unreachable("unsupported shader type");
break;
@@ -421,7 +430,6 @@ radv_fill_shader_variant(struct radv_device *device,
if (device->physical_device->rad_info.chip_class >= GFX9 &&
stage == MESA_SHADER_GEOMETRY) {
struct radv_shader_info *info = &variant->info.info;
unsigned es_type = variant->info.gs.es_type;
unsigned gs_vgpr_comp_cnt, es_vgpr_comp_cnt;
@@ -436,28 +444,106 @@ radv_fill_shader_variant(struct radv_device *device,
/* If offsets 4, 5 are used, GS_VGPR_COMP_CNT is ignored and
* VGPR[0:4] are always loaded.
*/
if (info->uses_invocation_id)
if (info->uses_invocation_id) {
gs_vgpr_comp_cnt = 3; /* VGPR3 contains InvocationID. */
else if (info->uses_prim_id)
} else if (info->uses_prim_id) {
gs_vgpr_comp_cnt = 2; /* VGPR2 contains PrimitiveID. */
else if (variant->info.gs.vertices_in >= 3)
} else if (variant->info.gs.vertices_in >= 3) {
gs_vgpr_comp_cnt = 1; /* VGPR1 contains offsets 2, 3 */
else
} else {
gs_vgpr_comp_cnt = 0; /* VGPR0 contains offsets 0, 1 */
}
variant->rsrc1 |= S_00B228_GS_VGPR_COMP_CNT(gs_vgpr_comp_cnt);
variant->rsrc2 |= S_00B22C_ES_VGPR_COMP_CNT(es_vgpr_comp_cnt) |
S_00B22C_OC_LDS_EN(es_type == MESA_SHADER_TESS_EVAL);
} else if (device->physical_device->rad_info.chip_class >= GFX9 &&
stage == MESA_SHADER_TESS_CTRL)
stage == MESA_SHADER_TESS_CTRL) {
variant->rsrc1 |= S_00B428_LS_VGPR_COMP_CNT(vgpr_comp_cnt);
else
} else {
variant->rsrc1 |= S_00B128_VGPR_COMP_CNT(vgpr_comp_cnt);
}
void *ptr = radv_alloc_shader_memory(device, variant);
memcpy(ptr, binary->code, binary->code_size);
}
static void radv_init_llvm_target()
{
LLVMInitializeAMDGPUTargetInfo();
LLVMInitializeAMDGPUTarget();
LLVMInitializeAMDGPUTargetMC();
LLVMInitializeAMDGPUAsmPrinter();
/* For inline assembly. */
LLVMInitializeAMDGPUAsmParser();
/* Workaround for bug in llvm 4.0 that causes image intrinsics
* to disappear.
* https://reviews.llvm.org/D26348
*
* Workaround for bug in llvm that causes the GPU to hang in presence
* of nested loops because there is an exec mask issue. The proper
* solution is to fix LLVM but this might require a bunch of work.
* https://bugs.llvm.org/show_bug.cgi?id=37744
*
* "mesa" is the prefix for error messages.
*/
const char *argv[3] = { "mesa", "-simplifycfg-sink-common=false",
"-amdgpu-skip-threshold=1" };
LLVMParseCommandLineOptions(3, argv, NULL);
}
static once_flag radv_init_llvm_target_once_flag = ONCE_FLAG_INIT;
static LLVMTargetRef radv_get_llvm_target(const char *triple)
{
LLVMTargetRef target = NULL;
char *err_message = NULL;
call_once(&radv_init_llvm_target_once_flag, radv_init_llvm_target);
if (LLVMGetTargetFromTriple(triple, &target, &err_message)) {
fprintf(stderr, "Cannot find target for triple %s ", triple);
if (err_message) {
fprintf(stderr, "%s\n", err_message);
}
LLVMDisposeMessage(err_message);
return NULL;
}
return target;
}
static LLVMTargetMachineRef radv_create_target_machine(enum radeon_family family,
enum ac_target_machine_options tm_options,
const char **out_triple)
{
assert(family >= CHIP_TAHITI);
char features[256];
const char *triple = (tm_options & AC_TM_SUPPORTS_SPILL) ? "amdgcn-mesa-mesa3d" : "amdgcn--";
LLVMTargetRef target = radv_get_llvm_target(triple);
snprintf(features, sizeof(features),
"+DumpCode,+vgpr-spilling,-fp32-denormals,+fp64-denormals%s%s%s%s",
tm_options & AC_TM_SISCHED ? ",+si-scheduler" : "",
tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "");
LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
target,
triple,
ac_get_llvm_processor_name(family),
features,
LLVMCodeGenLevelDefault,
LLVMRelocDefault,
LLVMCodeModelDefault);
if (out_triple)
*out_triple = triple;
return tm;
}
static struct radv_shader_variant *
shader_variant_create(struct radv_device *device,
struct radv_shader_module *module,
@@ -481,17 +567,19 @@ shader_variant_create(struct radv_device *device,
options->family = chip_family;
options->chip_class = device->physical_device->rad_info.chip_class;
options->dump_shader = radv_can_dump_shader(device, module);
options->dump_shader = radv_can_dump_shader(device, module, gs_copy_shader);
options->dump_preoptir = options->dump_shader &&
device->instance->debug_flags & RADV_DEBUG_PREOPTIR;
options->record_llvm_ir = device->keep_shader_info;
options->check_ir = device->instance->debug_flags & RADV_DEBUG_CHECKIR;
options->tess_offchip_block_dw_size = device->tess_offchip_block_dw_size;
options->address32_hi = device->physical_device->rad_info.address32_hi;
if (options->supports_spill)
tm_options |= AC_TM_SUPPORTS_SPILL;
if (device->instance->perftest_flags & RADV_PERFTEST_SISCHED)
tm_options |= AC_TM_SISCHED;
tm = ac_create_target_machine(chip_family, tm_options);
tm = radv_create_target_machine(chip_family, tm_options, NULL);
if (gs_copy_shader) {
assert(shader_count == 1);
@@ -551,7 +639,7 @@ radv_shader_variant_create(struct radv_device *device,
options.key = *key;
options.unsafe_math = !!(device->instance->debug_flags & RADV_DEBUG_UNSAFE_MATH);
options.supports_spill = device->llvm_supports_spill;
options.supports_spill = true;
return shader_variant_create(device, module, shaders, shader_count, shaders[shader_count - 1]->info.stage,
&options, false, code_out, code_size_out);
@@ -615,17 +703,7 @@ generate_shader_stats(struct radv_device *device,
unsigned max_simd_waves;
unsigned lds_per_wave = 0;
switch (device->physical_device->rad_info.family) {
/* These always have 8 waves: */
case CHIP_POLARIS10:
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_VEGAM:
max_simd_waves = 8;
break;
default:
max_simd_waves = 10;
}
max_simd_waves = ac_get_max_simd_waves(device->physical_device->rad_info.family);
conf = &variant->config;
@@ -710,7 +788,7 @@ radv_GetShaderInfoAMD(VkDevice _device,
/* Spec doesn't indicate what to do if the stage is invalid, so just
* return no info for this. */
if (!variant)
return vk_error(VK_ERROR_FEATURE_NOT_PRESENT);
return vk_error(device->instance, VK_ERROR_FEATURE_NOT_PRESENT);
switch (infoType) {
case VK_SHADER_INFO_TYPE_STATISTICS_AMD:
@@ -731,7 +809,7 @@ radv_GetShaderInfoAMD(VkDevice _device,
unsigned workgroup_size = local_size[0] * local_size[1] * local_size[2];
statistics.numAvailableVgprs = statistics.numPhysicalVgprs /
ceil(workgroup_size / statistics.numPhysicalVgprs);
ceil((double)workgroup_size / statistics.numPhysicalVgprs);
statistics.computeWorkGroupSize[0] = local_size[0];
statistics.computeWorkGroupSize[1] = local_size[1];

View File

@@ -55,9 +55,21 @@ struct radv_shader_module {
char data[0];
};
enum {
RADV_ALPHA_ADJUST_NONE = 0,
RADV_ALPHA_ADJUST_SNORM = 1,
RADV_ALPHA_ADJUST_SINT = 2,
RADV_ALPHA_ADJUST_SSCALED = 3,
};
struct radv_vs_variant_key {
uint32_t instance_rate_inputs;
uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];
/* For 2_10_10_10 formats the alpha is handled as unsigned by pre-vega HW.
* so we may need to fix it up. */
uint64_t alpha_adjust;
uint32_t as_es:1;
uint32_t as_ls:1;
uint32_t export_prim_id:1;
@@ -86,7 +98,6 @@ struct radv_fs_variant_key {
uint8_t log2_num_samples;
uint32_t is_int8;
uint32_t is_int10;
uint32_t multisample : 1;
};
struct radv_shader_variant_key {
@@ -108,9 +119,11 @@ struct radv_nir_compiler_options {
bool dump_shader;
bool dump_preoptir;
bool record_llvm_ir;
bool check_ir;
enum radeon_family family;
enum chip_class chip_class;
uint32_t tess_offchip_block_dw_size;
uint32_t address32_hi;
};
enum radv_ud_index {
@@ -145,6 +158,9 @@ struct radv_shader_info {
bool needs_draw_id;
bool needs_instance_id;
} vs;
struct {
uint8_t output_usage_mask[VARYING_SLOT_VAR31 + 1];
} gs;
struct {
uint8_t output_usage_mask[VARYING_SLOT_VAR31 + 1];
} tes;
@@ -282,14 +298,15 @@ struct radv_shader_slab {
};
void
radv_optimize_nir(struct nir_shader *shader);
radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively);
nir_shader *
radv_shader_compile_to_nir(struct radv_device *device,
struct radv_shader_module *module,
const char *entrypoint_name,
gl_shader_stage stage,
const VkSpecializationInfo *spec_info);
const VkSpecializationInfo *spec_info,
const VkPipelineCreateFlags flags);
void *
radv_alloc_shader_memory(struct radv_device *device,
@@ -328,11 +345,14 @@ radv_shader_dump_stats(struct radv_device *device,
static inline bool
radv_can_dump_shader(struct radv_device *device,
struct radv_shader_module *module)
struct radv_shader_module *module,
bool is_gs_copy_shader)
{
if (!(device->instance->debug_flags & RADV_DEBUG_DUMP_SHADERS))
return false;
/* Only dump non-meta shaders, useful for debugging purposes. */
return device->instance->debug_flags & RADV_DEBUG_DUMP_SHADERS &&
module && !module->nir;
return (module && !module->nir) || is_gs_copy_shader;
}
static inline bool

View File

@@ -87,6 +87,89 @@ static void get_deref_offset(nir_deref_var *deref, unsigned *const_out)
*const_out = const_offset;
}
static void
gather_intrinsic_load_var_info(const nir_shader *nir,
const nir_intrinsic_instr *instr,
struct radv_shader_info *info)
{
switch (nir->info.stage) {
case MESA_SHADER_VERTEX: {
nir_deref_var *dvar = instr->variables[0];
nir_variable *var = dvar->var;
if (var->data.mode == nir_var_shader_in) {
unsigned idx = var->data.location;
uint8_t mask = nir_ssa_def_components_read(&instr->dest.ssa);
info->vs.input_usage_mask[idx] |=
mask << var->data.location_frac;
}
break;
}
default:
break;
}
}
static void
gather_intrinsic_store_var_info(const nir_shader *nir,
const nir_intrinsic_instr *instr,
struct radv_shader_info *info)
{
nir_deref_var *dvar = instr->variables[0];
nir_variable *var = dvar->var;
if (var->data.mode == nir_var_shader_out) {
unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
get_deref_offset(dvar, &const_offset);
switch (nir->info.stage) {
case MESA_SHADER_VERTEX:
for (unsigned i = 0; i < attrib_count; i++) {
info->vs.output_usage_mask[idx + i + const_offset] |=
instr->const_index[0] << comp;
}
break;
case MESA_SHADER_GEOMETRY:
for (unsigned i = 0; i < attrib_count; i++) {
info->gs.output_usage_mask[idx + i + const_offset] |=
instr->const_index[0] << comp;
}
break;
case MESA_SHADER_TESS_EVAL:
for (unsigned i = 0; i < attrib_count; i++) {
info->tes.output_usage_mask[idx + i + const_offset] |=
instr->const_index[0] << comp;
}
break;
case MESA_SHADER_TESS_CTRL: {
unsigned param = shader_io_get_unique_index(idx);
const struct glsl_type *type = var->type;
if (!var->data.patch)
type = glsl_get_array_element(var->type);
unsigned slots =
var->data.compact ? DIV_ROUND_UP(glsl_get_length(type), 4)
: glsl_count_attribute_slots(type, false);
if (idx == VARYING_SLOT_CLIP_DIST0)
slots = (nir->info.clip_distance_array_size +
nir->info.cull_distance_array_size > 4) ? 2 : 1;
mark_tess_output(info, var->data.patch, param, slots);
break;
}
default:
break;
}
}
}
static void
gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
struct radv_shader_info *info)
@@ -197,55 +280,11 @@ gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
info->ps.writes_memory = true;
break;
case nir_intrinsic_load_var:
if (nir->info.stage == MESA_SHADER_VERTEX) {
nir_deref_var *dvar = instr->variables[0];
nir_variable *var = dvar->var;
if (var->data.mode == nir_var_shader_in) {
unsigned idx = var->data.location;
uint8_t mask =
nir_ssa_def_components_read(&instr->dest.ssa) << var->data.location_frac;
info->vs.input_usage_mask[idx] |= mask;
}
}
gather_intrinsic_load_var_info(nir, instr, info);
break;
case nir_intrinsic_store_var: {
nir_deref_var *dvar = instr->variables[0];
nir_variable *var = dvar->var;
if (var->data.mode == nir_var_shader_out) {
unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
get_deref_offset(dvar, &const_offset);
if (nir->info.stage == MESA_SHADER_VERTEX) {
for (unsigned i = 0; i < attrib_count; i++) {
info->vs.output_usage_mask[idx + i + const_offset] |=
instr->const_index[0] << comp;
}
} else if (nir->info.stage == MESA_SHADER_TESS_EVAL) {
for (unsigned i = 0; i < attrib_count; i++) {
info->tes.output_usage_mask[idx + i + const_offset] |=
instr->const_index[0] << comp;
}
} else if (nir->info.stage == MESA_SHADER_TESS_CTRL) {
unsigned param = shader_io_get_unique_index(idx);
const struct glsl_type *type = var->type;
if (!var->data.patch)
type = glsl_get_array_element(var->type);
unsigned slots =
var->data.compact ? DIV_ROUND_UP(glsl_get_length(type), 4)
: glsl_count_attribute_slots(type, false);
if (idx == VARYING_SLOT_CLIP_DIST0)
slots = (nir->info.clip_distance_array_size + nir->info.cull_distance_array_size > 4) ? 2 : 1;
mark_tess_output(info, var->data.patch, param, slots);
}
}
case nir_intrinsic_store_var:
gather_intrinsic_store_var_info(nir, instr, info);
break;
}
default:
break;
}
@@ -391,7 +430,7 @@ radv_nir_shader_info_pass(const struct nir_shader *nir,
struct nir_function *func =
(struct nir_function *)exec_list_get_head_const(&nir->functions);
if (options->layout->dynamic_offset_count)
if (options->layout && options->layout->dynamic_offset_count)
info->loads_push_constants = true;
nir_foreach_variable(variable, &nir->inputs)

View File

@@ -29,6 +29,7 @@
#include <assert.h>
#include "radv_private.h"
#include "radv_debug.h"
#include "vk_enum_to_str.h"
#include "util/u_math.h"
@@ -53,6 +54,26 @@ radv_loge_v(const char *format, va_list va)
fprintf(stderr, "\n");
}
/** Log an error message. */
void radv_printflike(1, 2)
radv_logi(const char *format, ...)
{
va_list va;
va_start(va, format);
radv_logi_v(format, va);
va_end(va);
}
/** \see radv_logi() */
void
radv_logi_v(const char *format, va_list va)
{
fprintf(stderr, "radv: info: ");
vfprintf(stderr, format, va);
fprintf(stderr, "\n");
}
void radv_printflike(3, 4)
__radv_finishme(const char *file, int line, const char *format, ...)
{
@@ -67,13 +88,19 @@ void radv_printflike(3, 4)
}
VkResult
__vk_errorf(VkResult error, const char *file, int line, const char *format, ...)
__vk_errorf(struct radv_instance *instance, VkResult error, const char *file,
int line, const char *format, ...)
{
va_list ap;
char buffer[256];
const char *error_str = vk_Result_to_str(error);
#ifndef DEBUG
if (instance && !(instance->debug_flags & RADV_DEBUG_ERRORS))
return error;
#endif
if (format) {
va_start(ap, format);
vsnprintf(buffer, sizeof(buffer), format, ap);

View File

@@ -41,110 +41,16 @@ si_write_harvested_raster_configs(struct radv_physical_device *physical_device,
unsigned raster_config,
unsigned raster_config_1)
{
unsigned sh_per_se = MAX2(physical_device->rad_info.max_sh_per_se, 1);
unsigned num_se = MAX2(physical_device->rad_info.max_se, 1);
unsigned rb_mask = physical_device->rad_info.enabled_rb_mask;
unsigned num_rb = MIN2(physical_device->rad_info.num_render_backends, 16);
unsigned rb_per_pkr = MIN2(num_rb / num_se / sh_per_se, 2);
unsigned rb_per_se = num_rb / num_se;
unsigned se_mask[4];
unsigned raster_config_se[4];
unsigned se;
se_mask[0] = ((1 << rb_per_se) - 1) & rb_mask;
se_mask[1] = (se_mask[0] << rb_per_se) & rb_mask;
se_mask[2] = (se_mask[1] << rb_per_se) & rb_mask;
se_mask[3] = (se_mask[2] << rb_per_se) & rb_mask;
assert(num_se == 1 || num_se == 2 || num_se == 4);
assert(sh_per_se == 1 || sh_per_se == 2);
assert(rb_per_pkr == 1 || rb_per_pkr == 2);
/* XXX: I can't figure out what the *_XSEL and *_YSEL
* fields are for, so I'm leaving them as their default
* values. */
if ((num_se > 2) && ((!se_mask[0] && !se_mask[1]) ||
(!se_mask[2] && !se_mask[3]))) {
raster_config_1 &= C_028354_SE_PAIR_MAP;
if (!se_mask[0] && !se_mask[1]) {
raster_config_1 |=
S_028354_SE_PAIR_MAP(V_028354_RASTER_CONFIG_SE_PAIR_MAP_3);
} else {
raster_config_1 |=
S_028354_SE_PAIR_MAP(V_028354_RASTER_CONFIG_SE_PAIR_MAP_0);
}
}
ac_get_harvested_configs(&physical_device->rad_info,
raster_config,
&raster_config_1,
raster_config_se);
for (se = 0; se < num_se; se++) {
unsigned raster_config_se = raster_config;
unsigned pkr0_mask = ((1 << rb_per_pkr) - 1) << (se * rb_per_se);
unsigned pkr1_mask = pkr0_mask << rb_per_pkr;
int idx = (se / 2) * 2;
if ((num_se > 1) && (!se_mask[idx] || !se_mask[idx + 1])) {
raster_config_se &= C_028350_SE_MAP;
if (!se_mask[idx]) {
raster_config_se |=
S_028350_SE_MAP(V_028350_RASTER_CONFIG_SE_MAP_3);
} else {
raster_config_se |=
S_028350_SE_MAP(V_028350_RASTER_CONFIG_SE_MAP_0);
}
}
pkr0_mask &= rb_mask;
pkr1_mask &= rb_mask;
if (rb_per_se > 2 && (!pkr0_mask || !pkr1_mask)) {
raster_config_se &= C_028350_PKR_MAP;
if (!pkr0_mask) {
raster_config_se |=
S_028350_PKR_MAP(V_028350_RASTER_CONFIG_PKR_MAP_3);
} else {
raster_config_se |=
S_028350_PKR_MAP(V_028350_RASTER_CONFIG_PKR_MAP_0);
}
}
if (rb_per_se >= 2) {
unsigned rb0_mask = 1 << (se * rb_per_se);
unsigned rb1_mask = rb0_mask << 1;
rb0_mask &= rb_mask;
rb1_mask &= rb_mask;
if (!rb0_mask || !rb1_mask) {
raster_config_se &= C_028350_RB_MAP_PKR0;
if (!rb0_mask) {
raster_config_se |=
S_028350_RB_MAP_PKR0(V_028350_RASTER_CONFIG_RB_MAP_3);
} else {
raster_config_se |=
S_028350_RB_MAP_PKR0(V_028350_RASTER_CONFIG_RB_MAP_0);
}
}
if (rb_per_se > 2) {
rb0_mask = 1 << (se * rb_per_se + rb_per_pkr);
rb1_mask = rb0_mask << 1;
rb0_mask &= rb_mask;
rb1_mask &= rb_mask;
if (!rb0_mask || !rb1_mask) {
raster_config_se &= C_028350_RB_MAP_PKR1;
if (!rb0_mask) {
raster_config_se |=
S_028350_RB_MAP_PKR1(V_028350_RASTER_CONFIG_RB_MAP_3);
} else {
raster_config_se |=
S_028350_RB_MAP_PKR1(V_028350_RASTER_CONFIG_RB_MAP_0);
}
}
}
}
/* GRBM_GFX_INDEX has a different offset on SI and CI+ */
if (physical_device->rad_info.chip_class < CIK)
radeon_set_config_reg(cs, R_00802C_GRBM_GFX_INDEX,
@@ -155,9 +61,7 @@ si_write_harvested_raster_configs(struct radv_physical_device *physical_device,
radeon_set_uconfig_reg(cs, R_030800_GRBM_GFX_INDEX,
S_030800_SE_INDEX(se) | S_030800_SH_BROADCAST_WRITES(1) |
S_030800_INSTANCE_BROADCAST_WRITES(1));
radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG, raster_config_se);
if (physical_device->rad_info.chip_class >= CIK)
radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1, raster_config_1);
radeon_set_context_reg(cs, R_028350_PA_SC_RASTER_CONFIG, raster_config_se[se]);
}
/* GRBM_GFX_INDEX has a different offset on SI and CI+ */
@@ -170,6 +74,9 @@ si_write_harvested_raster_configs(struct radv_physical_device *physical_device,
radeon_set_uconfig_reg(cs, R_030800_GRBM_GFX_INDEX,
S_030800_SE_BROADCAST_WRITES(1) | S_030800_SH_BROADCAST_WRITES(1) |
S_030800_INSTANCE_BROADCAST_WRITES(1));
if (physical_device->rad_info.chip_class >= CIK)
radeon_set_context_reg(cs, R_028354_PA_SC_RASTER_CONFIG_1, raster_config_1);
}
static void
@@ -234,88 +141,9 @@ si_set_raster_config(struct radv_physical_device *physical_device,
unsigned rb_mask = physical_device->rad_info.enabled_rb_mask;
unsigned raster_config, raster_config_1;
switch (physical_device->rad_info.family) {
case CHIP_TAHITI:
case CHIP_PITCAIRN:
raster_config = 0x2a00126a;
raster_config_1 = 0x00000000;
break;
case CHIP_VERDE:
raster_config = 0x0000124a;
raster_config_1 = 0x00000000;
break;
case CHIP_OLAND:
raster_config = 0x00000082;
raster_config_1 = 0x00000000;
break;
case CHIP_HAINAN:
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
case CHIP_BONAIRE:
raster_config = 0x16000012;
raster_config_1 = 0x00000000;
break;
case CHIP_HAWAII:
raster_config = 0x3a00161a;
raster_config_1 = 0x0000002e;
break;
case CHIP_FIJI:
if (physical_device->rad_info.cik_macrotile_mode_array[0] == 0x000000e8) {
/* old kernels with old tiling config */
raster_config = 0x16000012;
raster_config_1 = 0x0000002a;
} else {
raster_config = 0x3a00161a;
raster_config_1 = 0x0000002e;
}
break;
case CHIP_POLARIS10:
raster_config = 0x16000012;
raster_config_1 = 0x0000002a;
break;
case CHIP_POLARIS11:
case CHIP_POLARIS12:
raster_config = 0x16000012;
raster_config_1 = 0x00000000;
break;
case CHIP_VEGAM:
raster_config = 0x3a00161a;
raster_config_1 = 0x0000002e;
break;
case CHIP_TONGA:
raster_config = 0x16000012;
raster_config_1 = 0x0000002a;
break;
case CHIP_ICELAND:
if (num_rb == 1)
raster_config = 0x00000000;
else
raster_config = 0x00000002;
raster_config_1 = 0x00000000;
break;
case CHIP_CARRIZO:
raster_config = 0x00000002;
raster_config_1 = 0x00000000;
break;
case CHIP_KAVERI:
/* KV should be 0x00000002, but that causes problems with radeon */
raster_config = 0x00000000; /* 0x00000002 */
raster_config_1 = 0x00000000;
break;
case CHIP_KABINI:
case CHIP_MULLINS:
case CHIP_STONEY:
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
default:
fprintf(stderr,
"radv: Unknown GPU, using 0 for raster_config\n");
raster_config = 0x00000000;
raster_config_1 = 0x00000000;
break;
}
ac_get_raster_config(&physical_device->rad_info,
&raster_config,
&raster_config_1);
/* Always use the default config when all backends are enabled
* (or when we failed to determine the enabled backends).

View File

@@ -416,6 +416,46 @@ vk_format_is_srgb(VkFormat format)
return desc->colorspace == VK_FORMAT_COLORSPACE_SRGB;
}
static inline VkFormat
vk_format_no_srgb(VkFormat format)
{
switch(format) {
case VK_FORMAT_R8_SRGB:
return VK_FORMAT_R8_UNORM;
case VK_FORMAT_R8G8_SRGB:
return VK_FORMAT_R8G8_UNORM;
case VK_FORMAT_R8G8B8_SRGB:
return VK_FORMAT_R8G8B8_UNORM;
case VK_FORMAT_B8G8R8_SRGB:
return VK_FORMAT_B8G8R8_UNORM;
case VK_FORMAT_R8G8B8A8_SRGB:
return VK_FORMAT_R8G8B8A8_UNORM;
case VK_FORMAT_B8G8R8A8_SRGB:
return VK_FORMAT_B8G8R8A8_UNORM;
case VK_FORMAT_A8B8G8R8_SRGB_PACK32:
return VK_FORMAT_A8B8G8R8_UNORM_PACK32;
case VK_FORMAT_BC1_RGB_SRGB_BLOCK:
return VK_FORMAT_BC1_RGB_UNORM_BLOCK;
case VK_FORMAT_BC1_RGBA_SRGB_BLOCK:
return VK_FORMAT_BC1_RGBA_UNORM_BLOCK;
case VK_FORMAT_BC2_SRGB_BLOCK:
return VK_FORMAT_BC2_UNORM_BLOCK;
case VK_FORMAT_BC3_SRGB_BLOCK:
return VK_FORMAT_BC3_UNORM_BLOCK;
case VK_FORMAT_BC7_SRGB_BLOCK:
return VK_FORMAT_BC7_UNORM_BLOCK;
case VK_FORMAT_ETC2_R8G8B8_SRGB_BLOCK:
return VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK;
case VK_FORMAT_ETC2_R8G8B8A1_SRGB_BLOCK:
return VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK;
case VK_FORMAT_ETC2_R8G8B8A8_SRGB_BLOCK:
return VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK;
default:
assert(!vk_format_is_srgb(format));
return format;
}
}
static inline VkFormat
vk_format_stencil_only(VkFormat format)
{
@@ -488,4 +528,11 @@ vk_to_non_srgb_format(VkFormat format)
}
}
static inline unsigned
vk_format_get_nr_components(VkFormat format)
{
const struct vk_format_description *desc = vk_format_description(format);
return desc->nr_channels;
}
#endif /* VK_FORMAT_H */

View File

@@ -38,7 +38,6 @@
#include "util/u_atomic.h"
static void radv_amdgpu_winsys_bo_destroy(struct radeon_winsys_bo *_bo);
static int
@@ -306,7 +305,9 @@ radv_amdgpu_winsys_bo_create(struct radeon_winsys *_ws,
}
r = amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
size, alignment, 0, &va, &va_handle, 0);
size, alignment, 0, &va, &va_handle,
(flags & RADEON_FLAG_32BIT ? AMDGPU_VA_RANGE_32_BIT : 0) |
AMDGPU_VA_RANGE_HIGH);
if (r)
goto error_va_alloc;
@@ -424,7 +425,8 @@ radv_amdgpu_winsys_bo_from_ptr(struct radeon_winsys *_ws,
goto error;
if (amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
size, 1 << 12, 0, &va, &va_handle, 0))
size, 1 << 12, 0, &va, &va_handle,
AMDGPU_VA_RANGE_HIGH))
goto error_va_alloc;
if (amdgpu_bo_va_op(buf_handle, 0, size, va, 0, AMDGPU_VA_OP_MAP))
@@ -480,7 +482,8 @@ radv_amdgpu_winsys_bo_from_fd(struct radeon_winsys *_ws,
goto error_query;
r = amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
result.alloc_size, 1 << 20, 0, &va, &va_handle, 0);
result.alloc_size, 1 << 20, 0, &va, &va_handle,
AMDGPU_VA_RANGE_HIGH);
if (r)
goto error_query;
@@ -501,6 +504,7 @@ radv_amdgpu_winsys_bo_from_fd(struct radeon_winsys *_ws,
bo->size = result.alloc_size;
bo->is_shared = true;
bo->ws = ws;
bo->ref_count = 1;
radv_amdgpu_add_buffer_to_global_list(bo);
return (struct radeon_winsys_bo *)bo;
error_va_map:

View File

@@ -45,13 +45,6 @@ do_winsys_init(struct radv_amdgpu_winsys *ws, int fd)
if (!ac_query_gpu_info(fd, ws->dev, &ws->info, &ws->amdinfo))
return false;
/* LLVM 5.0 is required for GFX9. */
if (ws->info.chip_class >= GFX9 && HAVE_LLVM < 0x0500) {
fprintf(stderr, "amdgpu: LLVM 5.0 is required, got LLVM %i.%i\n",
HAVE_LLVM >> 8, HAVE_LLVM & 255);
return false;
}
ws->addrlib = amdgpu_addr_create(&ws->info, &ws->amdinfo, &ws->info.max_alignment);
if (!ws->addrlib) {
fprintf(stderr, "amdgpu: Cannot create addrlib.\n");

View File

@@ -60,6 +60,6 @@ PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
include Makefile.genxml.am
include Makefile.cle.am
include Makefile.vc5.am
include Makefile.v3d.am
CLEANFILES += $(BUILT_SOURCES)

View File

@@ -3,9 +3,9 @@ noinst_LTLIBRARIES += libbroadcom_v33.la
noinst_LTLIBRARIES += libbroadcom_v41.la
noinst_LTLIBRARIES += libbroadcom_v42.la
if USE_VC5_SIMULATOR
AM_CFLAGS += $(VC5_SIMULATOR_CFLAGS)
libbroadcom_la_LDFLAGS = $(VC5_SIMULATOR_LIBS)
if USE_V3D_SIMULATOR
AM_CFLAGS += $(V3D_SIMULATOR_CFLAGS)
libbroadcom_la_LDFLAGS = $(V3D_SIMULATOR_LIBS)
endif
libbroadcom_v33_la_SOURCES = $(BROADCOM_PER_VERSION_SOURCES)

View File

@@ -43,7 +43,7 @@ pack_header = """%(license)s
#ifndef %(guard)s
#define %(guard)s
#include "v3d_packet_helpers.h"
#include "cle/v3d_packet_helpers.h"
"""

View File

@@ -702,15 +702,17 @@
<field name="Fragment Shader Code Address" size="29" start="99" type="address"/>
<field name="Fragment Shader 2-way threadable" size="1" start="96" type="bool"/>
<field name="Fragment Shader 4-way threadable" size="1" start="97" type="bool"/>
<field name="Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Uniforms Address" size="32" start="16b" type="address"/>
<field name="Vertex Shader Code Address" size="32" start="20b" type="address"/>
<field name="Vertex Shader 2-way threadable" size="1" start="160" type="bool"/>
<field name="Vertex Shader 4-way threadable" size="1" start="161" type="bool"/>
<field name="Vertex Shader Propagate NaNs" size="1" start="162" type="bool"/>
<field name="Vertex Shader Uniforms Address" size="32" start="24b" type="address"/>
<field name="Coordinate Shader Code Address" size="32" start="28b" type="address"/>
<field name="Coordinate Shader 2-way threadable" size="1" start="224" type="bool"/>
<field name="Coordinate Shader 4-way threadable" size="1" start="225" type="bool"/>
<field name="Coordinate Shader Propagate NaNs" size="1" start="226" type="bool"/>
<field name="Coordinate Shader Uniforms Address" size="32" start="32b" type="address"/>
</struct>

View File

@@ -277,7 +277,7 @@
<field name="Clear buffer being stored" size="1" start="18" type="bool"/>
<field name="Output Image Format" size="6" start="12" type="Output Image Format"/>
<field name="Decimate" size="2" start="10" type="Decimate Mode"/>
<field name="Decimate mode" size="2" start="10" type="Decimate Mode"/>
<field name="A dithered" size="1" start="9" type="bool"/>
<field name="BGR dithered" size="1" start="8" type="bool"/>
@@ -311,7 +311,7 @@
<field name="Input Image Format" size="6" start="12" type="Output Image Format"/>
<field name="Decimate" size="2" start="10" type="Decimate Mode"/>
<field name="Decimate mode" size="2" start="10" type="Decimate Mode"/>
<field name="Flip Y" size="1" start="7" type="bool"/>
@@ -475,6 +475,11 @@
<field name="Varying offset V0" size="4" start="0" type="uint"/>
</packet>
<packet code="91" name="Sample State">
<field name="Coverage" size="16" start="16" type="uint"/> <!-- float-1-8-7 -->
<field name="Mask" size="4" start="0" type="uint"/>
</packet>
<packet code="92" name="Occlusion Query Counter">
<field name="address" size="32" start="0" type="address"/>
</packet>
@@ -781,17 +786,19 @@
<field name="Fragment Shader Code Address" size="32" start="12b" type="address"/>
<field name="Fragment Shader 4-way threadable" size="1" start="96" type="bool"/>
<field name="Fragment Shader start in final thread section" size="1" start="97" type="bool"/>
<field name="Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Uniforms Address" size="32" start="16b" type="address"/>
<field name="Vertex Shader Code Address" size="32" start="20b" type="address"/>
<field name="Vertex Shader 4-way threadable" size="1" start="160" type="bool"/>
<field name="Vertex Shader start in final thread section" size="1" start="161" type="bool"/>
<field name="Vertex Shader Propagate NaNs" size="1" start="162" type="bool"/>
<field name="Vertex Shader Uniforms Address" size="32" start="24b" type="address"/>
<field name="Coordinate Shader Code Address" size="32" start="28b" type="address"/>
<field name="Coordinate Shader 4-way threadable" size="1" start="224" type="bool"/>
<field name="Coordinate Shader start in final thread section" size="1" start="225" type="bool"/>
<field name="Coordinate Shader Propagate NaNs" size="1" start="226" type="bool"/>
<field name="Coordinate Shader Uniforms Address" size="32" start="32b" type="address"/>
</struct>

View File

@@ -278,7 +278,7 @@
<field name="Clear buffer being stored" size="1" start="18" type="bool"/>
<field name="Output Image Format" size="6" start="12" type="Output Image Format"/>
<field name="Decimate" size="2" start="10" type="Decimate Mode"/>
<field name="Decimate mode" size="2" start="10" type="Decimate Mode"/>
<field name="A dithered" size="1" start="9" type="bool"/>
<field name="BGR dithered" size="1" start="8" type="bool"/>
@@ -312,7 +312,7 @@
<field name="Input Image Format" size="6" start="12" type="Output Image Format"/>
<field name="Decimate" size="2" start="10" type="Decimate Mode"/>
<field name="Decimate mode" size="2" start="10" type="Decimate Mode"/>
<field name="Flip Y" size="1" start="7" type="bool"/>
@@ -476,6 +476,11 @@
<field name="Varying offset V0" size="4" start="0" type="uint"/>
</packet>
<packet code="91" name="Sample State">
<field name="Coverage" size="16" start="16" type="uint"/> <!-- float-1-8-7 -->
<field name="Mask" size="4" start="0" type="uint"/>
</packet>
<packet code="92" name="Occlusion Query Counter">
<field name="address" size="32" start="0" type="address"/>
</packet>
@@ -782,17 +787,19 @@
<field name="Fragment Shader Code Address" size="32" start="12b" type="address"/>
<field name="Fragment Shader 4-way threadable" size="1" start="96" type="bool"/>
<field name="Fragment Shader start in final thread section" size="1" start="97" type="bool"/>
<field name="Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Propagate NaNs" size="1" start="98" type="bool"/>
<field name="Fragment Shader Uniforms Address" size="32" start="16b" type="address"/>
<field name="Vertex Shader Code Address" size="32" start="20b" type="address"/>
<field name="Vertex Shader 4-way threadable" size="1" start="160" type="bool"/>
<field name="Vertex Shader start in final thread section" size="1" start="161" type="bool"/>
<field name="Vertex Shader Propagate NaNs" size="1" start="162" type="bool"/>
<field name="Vertex Shader Uniforms Address" size="32" start="24b" type="address"/>
<field name="Coordinate Shader Code Address" size="32" start="28b" type="address"/>
<field name="Coordinate Shader 4-way threadable" size="1" start="224" type="bool"/>
<field name="Coordinate Shader start in final thread section" size="1" start="225" type="bool"/>
<field name="Coordinate Shader Propagate NaNs" size="1" start="226" type="bool"/>
<field name="Coordinate Shader Uniforms Address" size="32" start="32b" type="address"/>
</struct>

View File

@@ -57,7 +57,11 @@ extern uint32_t V3D_DEBUG;
#ifdef HAVE_ANDROID_PLATFORM
#define LOG_TAG "BROADCOM-MESA"
#if ANDROID_API_LEVEL >= 26
#include <log/log.h>
#else
#include <cutils/log.h>
#endif /* use log/log.h start from android 8 major version */
#ifndef ALOGW
#define ALOGW LOGW
#endif

View File

@@ -436,6 +436,7 @@ emit_fragment_varying(struct v3d_compile *c, nir_variable *var,
/* FALLTHROUGH */
case INTERP_MODE_SMOOTH:
if (var->data.centroid) {
BITSET_SET(c->centroid_flags, i);
return vir_FADD(c, vir_FMUL(c, vary,
c->payload_w_centroid), r5);
} else {
@@ -754,6 +755,10 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
result = vir_NOT(c, src[0]);
break;
case nir_op_ufind_msb:
result = vir_SUB(c, vir_uniform_ui(c, 31), vir_CLZ(c, src[0]));
break;
case nir_op_imul:
result = vir_UMUL(c, src[0], src[1]);
break;
@@ -852,6 +857,13 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
result = vir_FDY(c, src[0]);
break;
case nir_op_uadd_carry:
vir_PF(c, vir_ADD(c, src[0], src[1]), V3D_QPU_PF_PUSHC);
result = vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFA,
vir_uniform_ui(c, ~0),
vir_uniform_ui(c, 0)));
break;
default:
fprintf(stderr, "unknown NIR ALU inst: ");
nir_print_instr(&instr->instr, stderr);
@@ -957,6 +969,9 @@ emit_frag_end(struct v3d_compile *c)
conf |= TLB_SAMPLE_MODE_PER_PIXEL;
conf |= (7 - rt) << TLB_RENDER_TARGET_SHIFT;
if (c->fs_key->swap_color_rb & (1 << rt))
num_components = MAX2(num_components, 3);
assert(num_components != 0);
switch (glsl_get_base_type(var->type)) {
case GLSL_TYPE_UINT:
@@ -985,7 +1000,7 @@ emit_frag_end(struct v3d_compile *c)
struct qreg b = color[2];
struct qreg a = color[3];
if (c->fs_key->f32_color_rb) {
if (c->fs_key->f32_color_rb & (1 << rt)) {
conf |= TLB_TYPE_F32_COLOR;
conf |= ((num_components - 1) <<
TLB_VEC_SIZE_MINUS_1_SHIFT);
@@ -1348,7 +1363,7 @@ ntq_setup_outputs(struct v3d_compile *c)
assert(array_len == 1);
(void)array_len;
for (int i = 0; i < glsl_get_vector_elements(var->type); i++) {
for (int i = 0; i < 4; i++) {
add_output(c, loc + var->data.location_frac + i,
var->data.location,
var->data.location_frac + i);
@@ -1893,8 +1908,11 @@ const nir_shader_compiler_options v3d_nir_options = {
.lower_all_io_to_temps = true,
.lower_extract_byte = true,
.lower_extract_word = true,
.lower_bitfield_insert = true,
.lower_bitfield_extract = true,
.lower_bfm = true,
.lower_bitfield_insert_to_shifts = true,
.lower_bitfield_extract_to_shifts = true,
.lower_bitfield_reverse = true,
.lower_bit_count = true,
.lower_pack_unorm_2x16 = true,
.lower_pack_snorm_2x16 = true,
.lower_pack_unorm_4x8 = true,
@@ -1902,12 +1920,15 @@ const nir_shader_compiler_options v3d_nir_options = {
.lower_unpack_unorm_4x8 = true,
.lower_unpack_snorm_4x8 = true,
.lower_fdiv = true,
.lower_find_lsb = true,
.lower_ffma = true,
.lower_flrp32 = true,
.lower_fpow = true,
.lower_fsat = true,
.lower_fsqrt = true,
.lower_ifind_msb = true,
.lower_ldexp = true,
.lower_mul_high = true,
.native_integers = true,
};
@@ -1985,6 +2006,29 @@ vir_emit_last_thrsw(struct v3d_compile *c)
c->last_thrsw->is_last_thrsw = true;
}
/* There's a flag in the shader for "center W is needed for reasons other than
* non-centroid varyings", so we just walk the program after VIR optimization
* to see if it's used. It should be harmless to set even if we only use
* center W for varyings.
*/
static void
vir_check_payload_w(struct v3d_compile *c)
{
if (c->s->info.stage != MESA_SHADER_FRAGMENT)
return;
vir_for_each_inst_inorder(inst, c) {
for (int i = 0; i < vir_get_nsrc(inst); i++) {
if (inst->src[i].file == QFILE_REG &&
inst->src[i].index == 0) {
c->uses_center_w = true;
return;
}
}
}
}
void
v3d_nir_to_vir(struct v3d_compile *c)
{
@@ -2024,6 +2068,8 @@ v3d_nir_to_vir(struct v3d_compile *c)
vir_optimize(c);
vir_lower_uniforms(c);
vir_check_payload_w(c);
/* XXX: vir_schedule_instructions(c); */
if (V3D_DEBUG & (V3D_DEBUG_VIR |

View File

@@ -41,7 +41,15 @@ struct v3d_qpu_validate_state {
int last_sfu_write;
int last_branch_ip;
int last_thrsw_ip;
/* Set when we've found the last-THRSW signal, or if we were started
* in single-segment mode.
*/
bool last_thrsw_found;
/* Set when we've found the THRSW after the last THRSW */
bool thrend_found;
int thrsw_count;
};
@@ -116,6 +124,19 @@ qpu_validate_inst(struct v3d_qpu_validate_state *state, struct qinst *qinst)
fail_instr(state, "LDUNIF after a LDVARY");
}
/* GFXH-1633 */
bool last_reads_ldunif = (state->last && (state->last->sig.ldunif ||
state->last->sig.ldunifrf));
bool last_reads_ldunifa = (state->last && (state->last->sig.ldunifa ||
state->last->sig.ldunifarf));
bool reads_ldunif = inst->sig.ldunif || inst->sig.ldunifrf;
bool reads_ldunifa = inst->sig.ldunifa || inst->sig.ldunifarf;
if ((last_reads_ldunif && reads_ldunifa) ||
(last_reads_ldunifa && reads_ldunif)) {
fail_instr(state,
"LDUNIF and LDUNIFA can't be next to each other");
}
int tmu_writes = 0;
int sfu_writes = 0;
int vpm_writes = 0;
@@ -204,6 +225,9 @@ qpu_validate_inst(struct v3d_qpu_validate_state *state, struct qinst *qinst)
if (in_branch_delay_slots(state))
fail_instr(state, "THRSW in a branch delay slot.");
if (state->last_thrsw_found)
state->thrend_found = true;
if (state->last_thrsw_ip == state->ip - 1) {
/* If it's the second THRSW in a row, then it's just a
* last-thrsw signal.
@@ -221,6 +245,28 @@ qpu_validate_inst(struct v3d_qpu_validate_state *state, struct qinst *qinst)
}
}
if (state->thrend_found &&
state->last_thrsw_ip - state->ip <= 2 &&
inst->type == V3D_QPU_INSTR_TYPE_ALU) {
if ((inst->alu.add.op != V3D_QPU_A_NOP &&
!inst->alu.add.magic_write)) {
fail_instr(state, "RF write after THREND");
}
if ((inst->alu.mul.op != V3D_QPU_M_NOP &&
!inst->alu.mul.magic_write)) {
fail_instr(state, "RF write after THREND");
}
if (v3d_qpu_sig_writes_address(devinfo, &inst->sig))
fail_instr(state, "RF write after THREND");
/* GFXH-1625: No TMUWT in the last instruction */
if (state->last_thrsw_ip - state->ip == 2 &&
inst->alu.add.op == V3D_QPU_A_TMUWT)
fail_instr(state, "TMUWT in last instruction");
}
if (inst->type == V3D_QPU_INSTR_TYPE_BRANCH) {
if (in_branch_delay_slots(state))
fail_instr(state, "branch in a branch delay slot.");
@@ -262,6 +308,8 @@ qpu_validate(struct v3d_compile *c)
.last_thrsw_ip = -10,
.last_branch_ip = -10,
.ip = 0,
.last_thrsw_found = !c->last_thrsw,
};
vir_for_each_block(block, c) {
@@ -273,8 +321,6 @@ qpu_validate(struct v3d_compile *c)
"thread switch found without last-THRSW in program");
}
if (state.thrsw_count == 0 ||
(state.last_thrsw_found && state.thrsw_count == 1)) {
if (!state.thrend_found)
fail_instr(&state, "No program-end THRSW found");
}
}

Some files were not shown because too many files have changed in this diff Show More