Compare commits

42 Commits

Author SHA1 Message Date
Andres Gomez
dcd3786e6e Update version to 18.2.0-rc3
Signed-off-by: Andres Gomez <agomez@igalia.com>
2018-08-15 14:53:50 +03:00
Bas Nieuwenhuizen
d82c36a4c7 radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9.
Follow radeonsi.

Fixes: 3665f66ef2 "radv: Add support for ETC2 textures."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4bb6c49375)
2018-08-14 23:52:11 +03:00
Bas Nieuwenhuizen
8061ee5883 radv: Update to new VK_EXT_vertex_attribute_divisor to version 2.
Behavior wrt firstInstance got changed, and a divisor of 0 has been
disallowed.

The new version of the ext got published in specification 1.1.81.

Sending to stable since the only known user is DXVK, which needs
this for correctness.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 66e12451ac)
2018-08-14 23:51:14 +03:00
Bas Nieuwenhuizen
bbd95de921 radv: Fix missing Android platform define.
CC: <mesa-stable@lists.freedesktop.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit bf33ca7512)
2018-08-14 23:50:05 +03:00
Kenneth Graunke
b696ab172c blorp: Properly handle Z24X8 blits.
One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS
destinations were broken was that an earlier layer was swapping it
out for B8G8R8A8_UNORM.  That made Z24X8 -> Z24X8 blits work.

However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken.
The old code only considered one format at a time, without thinking
that format conversion may need to occur.

This patch moves the translation out to a place where it can consider
both formats.  If both are Z24X8, we continue using B8G8R8A8_UNORM to
avoid having to do shader math workarounds.  If we have a Z24X8
destination, but a non-matching source, we use our shader hacks to
actually render to it properly.

Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit de57926dc9)
2018-08-13 12:45:28 +03:00
Kenneth Graunke
f7e8bc0f23 blorp: Don't try to use R32_UNORM for R24_UNORM_X8_TYPELESS rendering.
The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so
Jason decided to fake it with a bit of shader math and R32_UNORM RTs.

The only problem is that R32_UNORM isn't renderable either...so we've
just traded one bad format for another.

This patch makes us use R32_UINT instead.

Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 8a29086285)
2018-08-13 12:44:46 +03:00
Jason Ekstrand
90278c7f95 intel: Switch the order of the 2x MSAA sample positions
The Vulkan 1.1.82 spec flipped the order to better match D3D.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a9f7bcfdf9)
2018-08-13 12:43:03 +03:00
Gert Wollny
0c1832765f mesa/st: ETC2 now uses R8G8B8A8_SRGB as fallback
The check for ETC2 compatibility was not updated when the fallback
format was changed.

Fixes: 71867a0a61
   st/mesa: Fall back to R8G8B8A8_SRGB for ETC2

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e94095ec30)
2018-08-10 15:31:40 +03:00
Eric Anholt
94da454726 egl: Fix leak of X11 pixmaps backing pbuffers in DRI3.
This is basically copied from the DRI2 destroy path.  Without this,
Raspberry Pi would quickly run out of CMA during the EGL tests in the CTS
due to all the pixmaps lying around.

Fixes: f35198bade ("egl/x11: Implement dri3 support with loader's dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit b618d7ea59)
2018-08-10 15:31:08 +03:00
Kenneth Graunke
dadc50add5 intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal: instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
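
The addressing quirk described above can be illustrated with a small
Python sketch. The function names are hypothetical, and the model only
reproduces the (nr | 1) versus (nr + 1) arithmetic quoted in this commit
message; it is not a model of the actual hardware.

```python
def second_half_expected(nr):
    """Register the second mov(8) is meant to read: the one after nr."""
    return nr + 1


def second_half_hardware(nr):
    """Register the hardware reads according to the commit message."""
    return nr | 1


# Even-aligned start (g4): both formulas agree on g5.
assert second_half_hardware(4) == second_half_expected(4) == 5

# Odd start (g5, the unaligned destination depth): the hardware stays on
# g5 instead of moving on to g6, so g5 would be read twice.
assert second_half_expected(5) == 6
assert second_half_hardware(5) == 5
```

For an even register number the two expressions are identical, which is
why even-aligning the source is enough to avoid the problem.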

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
(cherry picked from commit 08a5c395ab)
2018-08-10 15:30:10 +03:00
Adam Jackson
e91782ed55 glx: GLX_MESA_multithread_makecurrent is direct-only
This extension is not defined for indirect contexts. Marking it as
"client only", as the old code did here, would make the extension
available in indirect contexts, even though the server would certainly
not have it in its extension list.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 63a6b719d9)
2018-08-10 15:29:09 +03:00
Tapani Pälli
9df3460724 glsl: handle error case with ast_post_inc, ast_post_dec
Return ir_rvalue::error_value with ast_post_inc, ast_post_dec if
parser error was emitted previously. This way process_array_size
won't see bogus IR generated like with commit 9c676a6427.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98699
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 03a5acec68)
2018-08-10 15:26:21 +03:00
vadym.shovkoplias
8be5985e65 drirc: Allow extension midshader for Metro Redux
This fixes both Metro 2033 Redux and Metro Last Light Redux

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99730
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit e0de26eacc)
2018-08-10 15:24:45 +03:00
Eric Engestrom
6606cacd3d intel/tools: add missing variable initialisation
Fixes: 6a60beba40 "intel/tools: Add an error state to aub translator"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit aac80f7597)
2018-08-09 15:19:00 +03:00
Eric Anholt
1378f33142 vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels.
We won't have an FD if we're just having the server wait on a fence
created by eglCreateSyncKHR().  Our seqno fences will happen in order, so
server-side waits are no-ops in that case.  Fixes
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete

Fixes: b0acc3a562 ("broadcom/vc4: Native fence fd support")
(cherry picked from commit cfe69d0aaa)
2018-08-09 03:48:49 +03:00
Emil Velikov
9dacf10ca8 swr: don't export swr_create_screen_internal
With earlier rework the user and provider of the symbol are within the
same binary. Thus there's no point in exporting the function.

Spotted while reviewing patch from Chuck, that nearly added another
unneeded PUBLIC function.

Cc: Chuck Atkins <chuck.atkins@kitware.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Fixes: f50aa21456 ("swr: build driver proper separate from rasterizer")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
(cherry picked from commit 54d844897f)
2018-08-09 03:48:12 +03:00
Juan A. Suarez Romero
7af6be8864 wayland/egl: update surface size on window resize
According to EGL 1.5 spec, section 3.10.1.1 ("Native Window Resizing"):

  "If the native window corresponding to _surface_ has been resized
   prior to the swap, _surface_ must be resized to match. _surface_ will
   normally be resized by the EGL implementation at the time the native
   window is resized. If the implementation cannot do this transparently
   to the client, then *eglSwapBuffers* must detect the change and
   resize surface prior to copying its pixels to the native window."

So far, Mesa interpreted a native window resize in Wayland/EGL as a
request to resize, which is not executed until the first draw call, so
the surface size is not updated until then. As a result, querying the
surface size with eglQuerySurface() after a window resize still returns
the old values.

This commit updates the surface size values as soon as the resize is
requested, even though the real resize still happens in the draw call.
Any native window resize request therefore takes effect immediately
from the user's point of view, and eglQuerySurface() returns the new,
resized values.

v2: update surface size if there isn't a back surface (Daniel)

CC: Daniel Stone <daniel@fooishbar.org>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit a9fb331ea7)
2018-08-09 03:47:42 +03:00
Juan A. Suarez Romero
9ad14f71e6 wayland/egl: initialize window surface size to window size
When creating a window surface with eglCreateWindowSurface(), the
width and height returned by eglQuerySurface(EGL_{WIDTH,HEIGHT}) are
invalid until buffers are updated (for example by calling glClear()).

But according to EGL 1.5 spec, section 3.5.6 ("Surface Attributes"):

  "Querying EGL_WIDTH and EGL_HEIGHT returns respectively the width and
   height, in pixels, of the surface. For a window or pixmap surface,
   these values are initially equal to the width and height of the
   native window or pixmap with respect to which the surface was
   created"

This fixes dEQP-EGL.functional.color_clears.* CTS tests

v2:
- Do not modify attached_{width,height} (Daniel)
- Do not update size on resizing window (Brendan)

CC: Daniel Stone <daniel@fooishbar.org>
CC: Brendan King <brendan.king@imgtec.com>
CC: mesa-stable@lists.freedesktop.org
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 1fe7cbdf05)
2018-08-09 03:47:21 +03:00
Emil Velikov
6ae0a639ec autotools: use correct gl.pc LIBS when using glvnd
This is more of a hack, since glvnd itself should be providing the file.
Until that happens, ensure the libs entry is correctly set to -lGL.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 315c46cfdc)
2018-08-09 03:46:50 +03:00
Emil Velikov
c709206977 autotools: error out when building with mangling and glvnd
It's not a thing that can work, nor is it a wise idea to attempt.

v2: Tweak error message (Dylan)

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
(cherry picked from commit 25a9450a44)
2018-08-09 03:46:23 +03:00
Emil Velikov
33ac5fb678 autotools: error out when using the broken --with-{gl, osmesa}-lib-name
The toggles were broken with the introduction of --enable-mangling.
Fixing that up might be possible, but it's not worth the complexity
since one can rename the libraries at any point.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit d5ac236471)
2018-08-09 03:45:51 +03:00
Emil Velikov
f0ae95492a automake: require shared glapi when using DRI based libGL
This has been a requirement for ages, yet it seems like we never
explicitly errored out during configure.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit a7ea7511ba)
2018-08-09 03:45:20 +03:00
Eric Anholt
a42afc8504 vc4: Ignore samplers for finding uniform offsets.
Fixes:
dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 69158c452b)
2018-08-09 03:44:48 +03:00
Eric Anholt
adfbf1fe84 vc4: Respect a sampler view's first_layer field.
Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 9ab6912a00)
2018-08-09 03:43:44 +03:00
Andres Gomez
4a25d8b623 Update version to 18.2.0-rc2
Signed-off-by: Andres Gomez <agomez@igalia.com>
2018-08-09 02:29:47 +03:00
Jon Turney
4a769c8850 meson: use correct keyword to fix a meson warning
With a sufficiently recent meson, the following warning is produced:

WARNING: Passed invalid keyword argument "extra_args".
WARNING: This will become a hard error in the future.

It seems that compiler.links(args:) is meant here.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit a48c0659e1)
2018-08-07 20:59:51 +03:00
Eric Anholt
d39fb6d157 vc4: Fix a leak of the no-vertex-elements workaround BO.
Fixes: bd1925562a ("vc4: Convert the driver to emitting the shader record using pack macros.")
(cherry picked from commit 9507e03699)
2018-08-07 20:57:27 +03:00
Eric Anholt
ed117c27e1 vc4: Fix context creation when syncobjs aren't supported.
Noticed when trying to run current Mesa on rpi's downstream kernel.

Fixes: b0acc3a562 ("broadcom/vc4: Native fence fd support")
(cherry picked from commit 86095e9bb1)
2018-08-07 20:57:01 +03:00
Chad Versace
fdbbe4c50c drisw: Fix build on Android Nougat, which lacks shm (v2)
In commit cf54bd5e8, dri_sw_winsys.c began using <sys/shm.h> to support
the new functions putImageShm, getImageShm in DRI_SWRastLoader. But
Android began supporting System V shared memory only in Oreo. Nougat has
no shm headers.

Fix the build by ifdef'ing out the shm code on Nougat.

Fixes: cf54bd5e8 "drisw: use shared memory when possible"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@gmail.com>
(cherry picked from commit aaa41cd297)
2018-08-07 20:56:30 +03:00
Gert Wollny
3c3589a0ba meson, install_megadrivers: Also remove stale symlinks
os.path.exists doesn't return True for stale symlinks, but they are in
the way later, when a link/file with the same name is to be created.
For instance it is conceivable that the pointed to file is replaced by
a file with a new name, and then the symlink is dead.

To handle this, check explicitly for any existing symlink, stale or
not, and remove it. (This bugged me for some time with a link
libXvMCr600.so always being in the way of installing this file.)
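
The difference between the two checks can be reproduced with a short,
self-contained Python sketch. The file names and the helper
remove_if_present() are hypothetical and are not taken from the install
script; only os.path.exists() and os.path.lexists() come from the
commit. The helper handles files and links only, not directories.

```python
import os
import tempfile


def remove_if_present(path):
    """Remove path if anything is there, including a dangling symlink."""
    if os.path.lexists(path):
        os.unlink(path)
        return True
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "libexample.so.1")  # hypothetical name
        link = os.path.join(tmp, "libexample.so")
        open(target, "w").close()
        os.symlink(target, link)
        os.unlink(target)  # the link now points at nothing

        follows_link = os.path.exists(link)   # follows the link to its target
        sees_link = os.path.lexists(link)     # looks at the link itself
        removed = remove_if_present(link)
        return follows_link, sees_link, removed, os.path.lexists(link)


if __name__ == "__main__":
    print(demo())
```

With a check based on os.path.exists(), the stale link would be skipped
and a later os.symlink() to the same name would fail because the name is
still taken.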

v2: use only os.path.lexists and replace all instances of os.path.exists (Dylan Baker)

v3: handle directory check correctly (Eric Engestrom)

Fixes: f7f1b30f81
       ("meson: extend install_megadrivers script to handle symmlinking")

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>(v2 minus dir check)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 7a46b2d641)
2018-08-07 20:55:56 +03:00
Eric Anholt
37fa81f631 v3d: Emit the VCM_CACHE_SIZE packet.
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1561e4984e)
2018-08-07 20:55:09 +03:00
Eric Anholt
71aa72d695 v3d: Avoid spilling that breaks the r5 usage after a ldvary.
Fixes bad rendering when forcing 2 spills in glxgears.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 50a8713d4f)
2018-08-07 20:54:41 +03:00
Eric Anholt
c8d41bc58d v3d: Make sure that QPU instruction-has-a-dest matches VIR.
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment.  In
fact, we were likely to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f2c0d310d6)
2018-08-07 20:54:10 +03:00
Eric Anholt
c3b1a6d7fa v3d: Wait for TMU writes to complete before continuing after a spill.
The simulator complained that we had write responses outstanding at shader
end.  It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3f9cb2eb05)
2018-08-07 20:53:42 +03:00
Eric Anholt
cce78368df v3d: Make sure we don't emit a thrsw before the last one finished.
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ccbe33af5b)
2018-08-07 20:52:48 +03:00
Lionel Landwerlin
b6e9ef1556 intel: aubinator: fix read the context/ring
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.

Fixes: 144b40db54 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 35955afa7a)
2018-08-06 16:43:31 +03:00
Karol Herbst
c18ed873a5 nvc0/ir: return 0 in imageLoad on incomplete textures
We already guarded all OP_SULDP against out of bound accesses, but we
ended up just reusing whatever value was stored in the dest registers.

Fixes CTS test shader_image_load_store.incomplete_textures

v2: fix for loads not ending up with predicates (bindless_texture)
v3: fix replacing the def

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit c3325097be)
2018-08-06 16:42:47 +03:00
Marek Olšák
88c36f4379 gallium/u_vbuf: handle indirect multidraws correctly and efficiently (v3)
v2: need to do MAX{start+count} instead of MAX{count}
    added piglit tests
v3: use malloc

Cc: 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 0f79b2015b)
2018-08-06 15:46:19 +03:00
Mauro Rossi
bbeb78620c android: radv: build vulkan.radv conditionally to radeonsi
A build problem was reported for arm and arm64 targets, caused by a
missing libLLVM shared library dependency in AOSP. To avoid it,
vulkan.radv is built only when radeonsi is in BOARD_GPU_DRIVERS.

Fixes: 0ca153f869 ("android: radv: enable build of vulkan.radv HAL module")

Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1c7a2433b2)
2018-08-06 15:44:06 +03:00
Andres Gomez
9ddff68f6f intel/tools: add error2aub creation into autotools
Tarball distribution is done through "make distcheck". We include the
meson targets also into autotools so they won't fail when building
from the tarball.

Fixes: 6a60beba40 ("intel/tools: Add an error state to aub translator")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 2d4d139877)
2018-08-02 21:21:22 +03:00
Vlad Golovkin
2e903df72f swr: Remove unnecessary memset call
Zeroing memory after calloc is not necessary. Removing the call also
avoids a possible crash when allocation fails, because memset was called
before checking screen for NULL.

Fixes: a29d63ecf7 "swr: refactor swr_create_screen to allow
                              for proper cleanup on error"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 9d3a2394e4)
2018-08-02 21:20:52 +03:00
Andres Gomez
cb542ac550 Update version to 18.2.0-rc1
Signed-off-by: Andres Gomez <agomez@igalia.com>
2018-08-02 18:28:04 +03:00
45 changed files with 493 additions and 106 deletions

View File

@@ -1 +1 @@
-18.2.0-devel
+18.2.0-rc3

View File

@@ -43,13 +43,15 @@ def main():
     master = os.path.join(to, os.path.basename(args.megadriver))
     if not os.path.exists(to):
+        if os.path.lexists(to):
+            os.unlink(to)
         os.makedirs(to)
     shutil.copy(args.megadriver, master)
     for driver in args.drivers:
         abs_driver = os.path.join(to, driver)
-        if os.path.exists(abs_driver):
+        if os.path.lexists(abs_driver):
             os.unlink(abs_driver)
         print('installing {} to {}'.format(args.megadriver, abs_driver))
         os.link(master, abs_driver)
@@ -60,7 +62,7 @@ def main():
             name, ext = os.path.splitext(driver)
             while ext != '.so':
-                if os.path.exists(name):
+                if os.path.lexists(name):
                     os.unlink(name)
                 os.symlink(driver, name)
                 name, ext = os.path.splitext(name)

View File

@@ -1503,15 +1503,15 @@ fi
 AC_ARG_WITH([gl-lib-name],
   [AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
     [specify GL library name @<:@default=GL@:>@])],
-  [GL_LIB=$withval],
-  [GL_LIB="$DEFAULT_GL_LIB_NAME"])
+  [AC_MSG_ERROR([--with-gl-lib-name is no longer supported. Rename the library manually if needed.])],
+  [])
 AC_ARG_WITH([osmesa-lib-name],
   [AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
     [specify OSMesa library name @<:@default=OSMesa@:>@])],
-  [OSMESA_LIB=$withval],
-  [OSMESA_LIB=OSMesa])
-AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
-AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
+  [AC_MSG_ERROR([--with-osmesa-lib-name is no longer supported. Rename the library manually if needed.])],
+  [])
+GL_LIB="$DEFAULT_GL_LIB_NAME"
+OSMESA_LIB=OSMesa
 dnl
 dnl Mangled Mesa support
@@ -1523,6 +1523,9 @@ AC_ARG_ENABLE([mangling],
   [enable_mangling=no]
 )
 if test "x${enable_mangling}" = "xyes" ; then
+  if test "x$enable_libglvnd" = xyes; then
+    AC_MSG_ERROR([Conflicting options --enable-mangling and --enable-libglvnd.])
+  fi
   DEFINES="${DEFINES} -DUSE_MGL_NAMESPACE"
   GL_LIB="Mangled${GL_LIB}"
   OSMESA_LIB="Mangled${OSMESA_LIB}"
@@ -1530,6 +1533,15 @@ fi
 AC_SUBST([GL_LIB])
 AC_SUBST([OSMESA_LIB])
+dnl HACK when building glx + glvnd we ship gl.pc, despite that glvnd should do it
+dnl Thus we need to use GL as a DSO name.
+if test "x$enable_libglvnd" = xyes -a "x$enable_glx" != xno; then
+  GL_PKGCONF_LIB="GL"
+else
+  GL_PKGCONF_LIB="$GL_LIB"
+fi
+AC_SUBST([GL_PKGCONF_LIB])
 # Check for libdrm
 PKG_CHECK_MODULES([LIBDRM], [libdrm >= $LIBDRM_REQUIRED],
                   [have_libdrm=yes], [have_libdrm=no])
@@ -1658,6 +1670,8 @@ xxlib | xgallium-xlib)
 xdri)
     # DRI-based GLX
+    require_dri_shared_libs_and_glapi "GLX"
     # find the DRI deps for libGL
     dri_modules="x11 xext xdamage >= $XDAMAGE_REQUIRED xfixes x11-xcb xcb xcb-glx >= $XCBGLX_REQUIRED"

View File

@@ -989,7 +989,7 @@ if cc.links('''
       freelocale(loc);
       return 0;
     }''',
-    extra_args : pre_args,
+    args : pre_args,
     name : 'strtod has locale support')
   pre_args += '-DHAVE_STRTOD_L'
 endif

View File

@@ -27,4 +27,6 @@ include $(LOCAL_PATH)/Makefile.sources
 include $(LOCAL_PATH)/Android.addrlib.mk
 include $(LOCAL_PATH)/Android.common.mk
+ifneq ($(filter radeonsi,$(BOARD_GPU_DRIVERS)),)
 include $(LOCAL_PATH)/vulkan/Android.mk
+endif

View File

@@ -62,6 +62,7 @@ LOCAL_SRC_FILES := \
 	$(VULKAN_FILES)
 LOCAL_CFLAGS += -DFORCE_BUILD_AMDGPU # instructs LLVM to declare LLVMInitializeAMDGPU* functions
+LOCAL_CFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR
 $(call mesa-build-with-llvm)
@@ -140,6 +141,7 @@ LOCAL_SRC_FILES := \
 	$(VULKAN_ANDROID_FILES)
 LOCAL_CFLAGS += -DFORCE_BUILD_AMDGPU # instructs LLVM to declare LLVMInitializeAMDGPU* functions
+LOCAL_CFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR
 $(call mesa-build-with-llvm)

View File

@@ -124,7 +124,7 @@ VULKAN_LIB_DEPS += \
 endif
 if HAVE_PLATFORM_ANDROID
-AM_CPPFLAGS += $(ANDROID_CPPFLAGS)
+AM_CPPFLAGS += $(ANDROID_CPPFLAGS) -DVK_USE_PLATFORM_ANDROID_KHR
 AM_CFLAGS += $(ANDROID_CFLAGS)
 VULKAN_LIB_DEPS += $(ANDROID_LIBS)
 VULKAN_SOURCES += $(VULKAN_ANDROID_FILES)

View File

@@ -105,7 +105,7 @@ EXTENSIONS = [
     Extension('VK_EXT_sampler_filter_minmax', 1, 'device->rad_info.chip_class >= CIK'),
     Extension('VK_EXT_shader_viewport_index_layer', 1, True),
     Extension('VK_EXT_shader_stencil_export', 1, True),
-    Extension('VK_EXT_vertex_attribute_divisor', 1, True),
+    Extension('VK_EXT_vertex_attribute_divisor', 2, True),
     Extension('VK_AMD_draw_indirect_count', 1, True),
     Extension('VK_AMD_gcn_shader', 1, True),
     Extension('VK_AMD_rasterization_order', 1, 'device->has_out_of_order_rast'),

View File

@@ -612,7 +612,8 @@ radv_physical_device_get_format_properties(struct radv_physical_device *physical
 }
 if (desc->layout == VK_FORMAT_LAYOUT_ETC &&
-    physical_device->rad_info.chip_class < GFX9 &&
+    physical_device->rad_info.family != CHIP_VEGA10 &&
+    physical_device->rad_info.family != CHIP_RAVEN &&
     physical_device->rad_info.family != CHIP_STONEY) {
     out_properties->linearTilingFeatures = linear;
     out_properties->optimalTilingFeatures = tiled;

View File

@@ -1991,8 +1991,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
     uint32_t divisor = ctx->options->key.vs.instance_rate_divisors[attrib_index];
     if (divisor) {
-        buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.instance_id,
-                                    ctx->abi.start_instance, "");
+        buffer_index = ctx->abi.instance_id;
         if (divisor != 1) {
             buffer_index = LLVMBuildUDiv(ctx->ac.builder, buffer_index,
@@ -2007,8 +2006,10 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
                 MAX2(1, ctx->shader_info->vs.vgpr_comp_cnt);
         }
     } else {
-        buffer_index = ctx->ac.i32_0;
+        unreachable("Invalid vertex attribute divisor of 0.");
     }
+    buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.start_instance, buffer_index, "");
 } else
     buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.vertex_id,
                                 ctx->abi.base_vertex, "");
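
Read together, the two hunks above appear to change how the
per-instance buffer index is derived. The following Python sketch
restates that arithmetic under this reading. The function names are
hypothetical, and the code only models the integer math, not the LLVM
IR that radv actually builds.

```python
def buffer_index_old(instance_id, first_instance, divisor):
    """Old behavior as read from the removed lines."""
    if divisor == 0:
        return 0  # the old code used a constant zero index here
    # firstInstance was added before dividing by the divisor.
    return (instance_id + first_instance) // divisor


def buffer_index_new(instance_id, first_instance, divisor):
    """New behavior as read from the added lines."""
    if divisor == 0:
        # Mirrors unreachable("Invalid vertex attribute divisor of 0.")
        raise ValueError("Invalid vertex attribute divisor of 0.")
    # Only the instance id is divided; firstInstance is added afterwards.
    return first_instance + instance_id // divisor


# The two versions agree when firstInstance is 0 ...
assert buffer_index_old(3, 0, 2) == buffer_index_new(3, 0, 2) == 1
# ... and differ once firstInstance is non-zero.
assert buffer_index_old(3, 5, 2) == 4
assert buffer_index_new(3, 5, 2) == 6
```

This matches the commit message for 8061ee5883, which says that the
behavior with respect to firstInstance changed and that a divisor of 0
is no longer allowed.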

View File

@@ -528,6 +528,16 @@
     <field name="number of attribute arrays" size="5" start="0" type="uint"/>
   </packet>
+  <packet code="71" name="VCM Cache Size" min_ver="41">
+    <field name="Number of 16-vertex batches for rendering" size="4" start="4" type="uint"/>
+    <field name="Number of 16-vertex batches for binning" size="4" start="0" type="uint"/>
+  </packet>
+  <packet code="73" name="VCM Cache Size" max_ver="33">
+    <field name="Number of 16-vertex batches for rendering" size="4" start="4" type="uint"/>
+    <field name="Number of 16-vertex batches for binning" size="4" start="0" type="uint"/>
+  </packet>
   <packet code="73" name="Transform Feedback Buffer" min_ver="41">
     <field name="Buffer Address" size="32" start="32" type="address"/>
     <field name="Buffer Size in 32-bit words" size="30" start="2" type="uint"/>

View File

@@ -27,13 +27,14 @@
 #include <stdint.h>
 /**
- * Struct for tracking features of the V3D chip. This is where we'll store
- * boolean flags for features in a specific version, but for now it's just the
- * version
+ * Struct for tracking features of the V3D chip across driver and compiler.
  */
 struct v3d_device_info {
         /** Simple V3D version: major * 10 + minor */
         uint8_t ver;
+        /** Size of the VPM, in bytes. */
+        int vpm_size;
 };
 #endif

View File

@@ -462,6 +462,7 @@ struct choose_scoreboard {
         int last_magic_sfu_write_tick;
         int last_ldvary_tick;
         int last_uniforms_reset_tick;
+        int last_thrsw_tick;
         bool tlb_locked;
 };
@@ -1095,10 +1096,16 @@ qpu_instruction_valid_in_thrend_slot(struct v3d_compile *c,
 }
 static bool
-valid_thrsw_sequence(struct v3d_compile *c,
+valid_thrsw_sequence(struct v3d_compile *c, struct choose_scoreboard *scoreboard,
                      struct qinst *qinst, int instructions_in_sequence,
                      bool is_thrend)
 {
+        /* No emitting our thrsw while the previous thrsw hasn't happened yet. */
+        if (scoreboard->last_thrsw_tick + 3 >
+            scoreboard->tick - instructions_in_sequence) {
+                return false;
+        }
         for (int slot = 0; slot < instructions_in_sequence; slot++) {
                 /* No scheduling SFU when the result would land in the other
                  * thread. The simulator complains for safety, though it
@@ -1159,7 +1166,8 @@ emit_thrsw(struct v3d_compile *c,
                 if (!v3d_qpu_sig_pack(c->devinfo, &sig, &packed_sig))
                         break;
-                if (!valid_thrsw_sequence(c, prev_inst, slots_filled + 1,
+                if (!valid_thrsw_sequence(c, scoreboard,
+                                          prev_inst, slots_filled + 1,
                                           is_thrend)) {
                         break;
                 }
@@ -1173,7 +1181,9 @@ emit_thrsw(struct v3d_compile *c,
         if (merge_inst) {
                 merge_inst->qpu.sig.thrsw = true;
                 needs_free = true;
+                scoreboard->last_thrsw_tick = scoreboard->tick - slots_filled;
         } else {
+                scoreboard->last_thrsw_tick = scoreboard->tick;
                 insert_scheduled_instruction(c, block, scoreboard, inst);
                 time++;
                 slots_filled++;
@@ -1475,6 +1485,7 @@ v3d_qpu_schedule_instructions(struct v3d_compile *c)
         scoreboard.last_ldvary_tick = -10;
         scoreboard.last_magic_sfu_write_tick = -10;
         scoreboard.last_uniforms_reset_tick = -10;
+        scoreboard.last_thrsw_tick = -10;
         if (debug) {
                 fprintf(stderr, "Pre-schedule instructions\n");
View File

@@ -648,6 +648,9 @@ struct v3d_vs_prog_data {
         /* Total number of components written, for the shader state record. */
         uint32_t vpm_output_size;
+        /* Value to be programmed in VCM_CACHE_SIZE. */
+        uint8_t vcm_cache_size;
 };
 struct v3d_fs_prog_data {
@@ -928,7 +931,7 @@ VIR_A_ALU2(OR)
 VIR_A_ALU2(XOR)
 VIR_A_ALU2(VADD)
 VIR_A_ALU2(VSUB)
-VIR_A_ALU2(STVPMV)
+VIR_A_NODST_2(STVPMV)
 VIR_A_ALU1(NOT)
 VIR_A_ALU1(NEG)
 VIR_A_ALU1(FLAPUSH)

View File

@@ -452,6 +452,16 @@ vir_emit_def(struct v3d_compile *c, struct qinst *inst)
{
assert(inst->dst.file == QFILE_NULL);
/* If we're emitting an instruction that's a def, it had better be
* writing a register.
*/
if (inst->qpu.type == V3D_QPU_INSTR_TYPE_ALU) {
assert(inst->qpu.alu.add.op == V3D_QPU_A_NOP ||
v3d_qpu_add_op_has_dst(inst->qpu.alu.add.op));
assert(inst->qpu.alu.mul.op == V3D_QPU_M_NOP ||
v3d_qpu_mul_op_has_dst(inst->qpu.alu.mul.op));
}
inst->dst = vir_get_temp(c);
if (inst->dst.file == QFILE_TEMP)
@@ -746,10 +756,28 @@ uint64_t *v3d_compile_vs(const struct v3d_compiler *compiler,
if (prog_data->uses_iid)
prog_data->vpm_input_size++;
/* Input/output segment size are in 8x32-bit multiples. */
/* Input/output segment size are in sectors (8 rows of 32 bits per
* channel).
*/
prog_data->vpm_input_size = align(prog_data->vpm_input_size, 8) / 8;
prog_data->vpm_output_size = align(c->num_vpm_writes, 8) / 8;
/* Compute VCM cache size. We set up our program to take up less than
* half of the VPM, so that any set of bin and render programs won't
* run out of space. We need space for at least one input segment,
* and then allocate the rest to output segments (one for the current
* program, the rest to VCM). The valid range of the VCM cache size
* field is 1-4 16-vertex batches, but GFXH-1744 limits us to 2-4
* batches.
*/
assert(c->devinfo->vpm_size);
int sector_size = 16 * sizeof(uint32_t) * 8;
int vpm_size_in_sectors = c->devinfo->vpm_size / sector_size;
int half_vpm = vpm_size_in_sectors / 2;
int vpm_output_batches = half_vpm - prog_data->vpm_input_size;
assert(vpm_output_batches >= 2);
prog_data->vcm_cache_size = CLAMP(vpm_output_batches - 1, 2, 4);
return v3d_return_qpu_insts(c, final_assembly_size);
}
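
The VCM cache size arithmetic in the hunk above can be checked in isolation. The following is a minimal sketch, not the driver's code: `sketch_vcm_cache_size`, its parameters, and the `-1` error return are hypothetical names introduced here for illustration, and the input size is assumed to be given in 32-bit components before rounding to sectors.

```c
#include <stdint.h>

/* One VPM sector, as computed in the hunk: 16 * sizeof(uint32_t) * 8 bytes. */
#define SKETCH_SECTOR_SIZE (16u * (unsigned)sizeof(uint32_t) * 8u)

/* Round v up to the next multiple of a (a must be non-zero). */
static uint32_t sketch_align(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

static int sketch_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: returns the VCM cache size in 16-vertex batches,
 * or -1 where the original code would hit its assert (fewer than two
 * output batches fit in half of the VPM).
 */
int sketch_vcm_cache_size(uint32_t vpm_size_bytes, uint32_t input_components)
{
    /* Input segment size in sectors, rounded up from components. */
    int input_sectors = (int)(sketch_align(input_components, 8) / 8);

    int vpm_size_in_sectors = (int)(vpm_size_bytes / SKETCH_SECTOR_SIZE);
    int half_vpm = vpm_size_in_sectors / 2;

    /* Whatever is left of the half after the input segment goes to output. */
    int vpm_output_batches = half_vpm - input_sectors;
    if (vpm_output_batches < 2)
        return -1;

    /* One output segment belongs to the current program; the rest is VCM,
     * limited to the 2-4 range described in the comment.
     */
    return sketch_clamp(vpm_output_batches - 1, 2, 4);
}
```

For example, an 8192-byte VPM gives 16 sectors, so half is 8; with 10 input components (2 sectors) there are 6 output batches, and the result is clamped from 5 down to 4.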

View File

@@ -94,6 +94,15 @@ v3d_choose_spill_node(struct v3d_compile *c, struct ra_graph *g,
}
}
/* Refuse to spill a ldvary's dst, because that means
* that ldvary's r5 would end up being used across a
* thrsw.
*/
if (inst->qpu.sig.ldvary) {
assert(inst->dst.file == QFILE_TEMP);
BITSET_CLEAR(c->spillable, inst->dst.index);
}
if (inst->is_last_thrsw)
started_last_seg = true;
@@ -102,7 +111,7 @@ v3d_choose_spill_node(struct v3d_compile *c, struct ra_graph *g,
started_last_seg = true;
/* Track when we're in between a TMU setup and the
* final LDTMU from that TMU setup. We can't
* final LDTMU or TMUWT from that TMU setup. We can't
* spill/fill any temps during that time, because that
* involves inserting a new TMU setup/LDTMU sequence.
*/
@@ -110,6 +119,10 @@ v3d_choose_spill_node(struct v3d_compile *c, struct ra_graph *g,
is_last_ldtmu(inst, block))
in_tmu_operation = false;
if (inst->qpu.type == V3D_QPU_INSTR_TYPE_ALU &&
inst->qpu.alu.add.op == V3D_QPU_A_TMUWT)
in_tmu_operation = false;
if (v3d_qpu_writes_tmu(&inst->qpu))
in_tmu_operation = true;
}
@@ -206,6 +219,7 @@ v3d_spill_reg(struct v3d_compile *c, int spill_temp)
inst->dst);
v3d_emit_spill_tmua(c, spill_offset);
vir_emit_thrsw(c);
vir_TMUWT(c);
c->spills++;
}

View File

@@ -1928,6 +1928,11 @@ ast_expression::do_hir(exec_list *instructions,
error_emitted = op[0]->type->is_error() || op[1]->type->is_error();
if (error_emitted) {
result = ir_rvalue::error_value(ctx);
break;
}
type = arithmetic_result_type(op[0], op[1], false, state, & loc);
ir_rvalue *temp_rhs;

View File

@@ -201,6 +201,17 @@ resize_callback(struct wl_egl_window *wl_win, void *data)
struct dri2_egl_display *dri2_dpy =
dri2_egl_display(dri2_surf->base.Resource.Display);
/* Update the surface size as soon as native window is resized; from user
* pov, this makes the effect that resize is done immediately after native
* window resize, without requiring to wait until the first draw.
*
* A more detailed and lengthy explanation can be found at
* https://lists.freedesktop.org/archives/mesa-dev/2018-June/196474.html
*/
if (!dri2_surf->back) {
dri2_surf->base.Width = wl_win->width;
dri2_surf->base.Height = wl_win->height;
}
dri2_dpy->flush->invalidate(dri2_surf->dri_drawable);
}
@@ -258,6 +269,9 @@ dri2_wl_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
goto cleanup_surf;
}
dri2_surf->base.Width = window->width;
dri2_surf->base.Height = window->height;
visual_idx = dri2_wl_visual_idx_from_config(dri2_dpy, config);
assert(visual_idx != -1);
@@ -577,8 +591,8 @@ update_buffers(struct dri2_egl_surface *dri2_surf)
struct dri2_egl_display *dri2_dpy =
dri2_egl_display(dri2_surf->base.Resource.Display);
if (dri2_surf->base.Width != dri2_surf->wl_win->width ||
dri2_surf->base.Height != dri2_surf->wl_win->height) {
if (dri2_surf->base.Width != dri2_surf->wl_win->attached_width ||
dri2_surf->base.Height != dri2_surf->wl_win->attached_height) {
dri2_wl_release_buffers(dri2_surf);
@@ -1632,8 +1646,8 @@ swrast_update_buffers(struct dri2_egl_surface *dri2_surf)
if (dri2_surf->back)
return 0;
if (dri2_surf->base.Width != dri2_surf->wl_win->width ||
dri2_surf->base.Height != dri2_surf->wl_win->height) {
if (dri2_surf->base.Width != dri2_surf->wl_win->attached_width ||
dri2_surf->base.Height != dri2_surf->wl_win->attached_height) {
dri2_wl_release_buffers(dri2_surf);

View File

@@ -107,12 +107,17 @@ static const struct loader_dri3_vtable egl_dri3_vtable = {
static EGLBoolean
dri3_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
xcb_drawable_t drawable = dri3_surf->loader_drawable.drawable;
(void) drv;
loader_dri3_drawable_fini(&dri3_surf->loader_drawable);
if (surf->Type == EGL_PBUFFER_BIT)
xcb_free_pixmap (dri2_dpy->conn, drawable);
dri2_fini_surface(surf);
free(surf);

View File

@@ -1131,6 +1131,31 @@ static void u_vbuf_set_driver_vertex_buffers(struct u_vbuf *mgr)
mgr->dirty_real_vb_mask = 0;
}
static void
u_vbuf_split_indexed_multidraw(struct u_vbuf *mgr, struct pipe_draw_info *info,
unsigned *indirect_data, unsigned stride,
unsigned draw_count)
{
assert(info->index_size);
info->indirect = NULL;
for (unsigned i = 0; i < draw_count; i++) {
unsigned offset = i * stride / 4;
info->count = indirect_data[offset + 0];
info->instance_count = indirect_data[offset + 1];
if (!info->count || !info->instance_count)
continue;
info->start = indirect_data[offset + 2];
info->index_bias = indirect_data[offset + 3];
info->start_instance = indirect_data[offset + 4];
u_vbuf_draw_vbo(mgr, info);
}
}
void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info)
{
struct pipe_context *pipe = mgr->pipe;
@@ -1160,33 +1185,163 @@ void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info)
new_info = *info;
/* Fallback. We need to know all the parameters. */
/* Handle indirect (multi)draws. */
if (new_info.indirect) {
struct pipe_transfer *transfer = NULL;
int *data;
const struct pipe_draw_indirect_info *indirect = new_info.indirect;
unsigned draw_count = 0;
if (new_info.index_size) {
data = pipe_buffer_map_range(pipe, new_info.indirect->buffer,
new_info.indirect->offset, 20,
PIPE_TRANSFER_READ, &transfer);
new_info.index_bias = data[3];
new_info.start_instance = data[4];
}
else {
data = pipe_buffer_map_range(pipe, new_info.indirect->buffer,
new_info.indirect->offset, 16,
PIPE_TRANSFER_READ, &transfer);
new_info.start_instance = data[3];
/* Get the number of draws. */
if (indirect->indirect_draw_count) {
pipe_buffer_read(pipe, indirect->indirect_draw_count,
indirect->indirect_draw_count_offset,
4, &draw_count);
} else {
draw_count = indirect->draw_count;
}
new_info.count = data[0];
new_info.instance_count = data[1];
new_info.start = data[2];
pipe_buffer_unmap(pipe, transfer);
new_info.indirect = NULL;
if (!new_info.count)
if (!draw_count)
return;
unsigned data_size = (draw_count - 1) * indirect->stride +
(new_info.index_size ? 20 : 16);
unsigned *data = malloc(data_size);
if (!data)
return; /* report an error? */
/* Read the used buffer range only once, because the read can be
* uncached.
*/
pipe_buffer_read(pipe, indirect->buffer, indirect->offset, data_size,
data);
if (info->index_size) {
/* Indexed multidraw. */
unsigned index_bias0 = data[3];
bool index_bias_same = true;
/* If we invoke the translate path, we have to split the multidraw. */
if (incompatible_vb_mask ||
mgr->ve->incompatible_elem_mask) {
u_vbuf_split_indexed_multidraw(mgr, &new_info, data,
indirect->stride, draw_count);
free(data);
return;
}
/* See if index_bias is the same for all draws. */
for (unsigned i = 1; i < draw_count; i++) {
if (data[i * indirect->stride / 4 + 3] != index_bias0) {
index_bias_same = false;
break;
}
}
/* Split the multidraw if index_bias is different. */
if (!index_bias_same) {
u_vbuf_split_indexed_multidraw(mgr, &new_info, data,
indirect->stride, draw_count);
free(data);
return;
}
/* If we don't need to use the translate path and index_bias is
* the same, we can process the multidraw with the time complexity
* equal to 1 draw call (except for the index range computation).
* We only need to compute the index range covering all draw calls
* of the multidraw.
*
* The driver will not look at these values because indirect != NULL.
* These values determine the user buffer bounds to upload.
*/
new_info.index_bias = index_bias0;
new_info.min_index = ~0u;
new_info.max_index = 0;
new_info.start_instance = ~0u;
unsigned end_instance = 0;
struct pipe_transfer *transfer = NULL;
const uint8_t *indices;
if (info->has_user_indices) {
indices = (uint8_t*)info->index.user;
} else {
indices = (uint8_t*)pipe_buffer_map(pipe, info->index.resource,
PIPE_TRANSFER_READ, &transfer);
}
for (unsigned i = 0; i < draw_count; i++) {
unsigned offset = i * indirect->stride / 4;
unsigned start = data[offset + 2];
unsigned count = data[offset + 0];
unsigned start_instance = data[offset + 4];
unsigned instance_count = data[offset + 1];
if (!count || !instance_count)
continue;
/* Update the ranges of instances. */
new_info.start_instance = MIN2(new_info.start_instance,
start_instance);
end_instance = MAX2(end_instance, start_instance + instance_count);
/* Update the index range. */
unsigned min, max;
new_info.count = count; /* only used by get_minmax_index */
u_vbuf_get_minmax_index_mapped(&new_info,
indices +
new_info.index_size * start,
&min, &max);
new_info.min_index = MIN2(new_info.min_index, min);
new_info.max_index = MAX2(new_info.max_index, max);
}
free(data);
if (transfer)
pipe_buffer_unmap(pipe, transfer);
/* Set the final instance count. */
new_info.instance_count = end_instance - new_info.start_instance;
if (new_info.start_instance == ~0u || !new_info.instance_count)
return;
} else {
/* Non-indexed multidraw.
*
* Keep the draw call indirect and compute minimums & maximums,
* which will determine the user buffer bounds to upload, but
* the driver will not look at these values because indirect != NULL.
*
* This efficiently processes the multidraw with the time complexity
* equal to 1 draw call.
*/
new_info.start = ~0u;
new_info.start_instance = ~0u;
unsigned end_vertex = 0;
unsigned end_instance = 0;
for (unsigned i = 0; i < draw_count; i++) {
unsigned offset = i * indirect->stride / 4;
unsigned start = data[offset + 2];
unsigned count = data[offset + 0];
unsigned start_instance = data[offset + 3];
unsigned instance_count = data[offset + 1];
new_info.start = MIN2(new_info.start, start);
new_info.start_instance = MIN2(new_info.start_instance,
start_instance);
end_vertex = MAX2(end_vertex, start + count);
end_instance = MAX2(end_instance, start_instance + instance_count);
}
/* Set the final counts. */
new_info.count = end_vertex - new_info.start;
new_info.instance_count = end_instance - new_info.start_instance;
if (new_info.start == ~0u || !new_info.count || !new_info.instance_count)
return;
}
}
if (new_info.index_size) {
@@ -1211,7 +1366,8 @@ void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info)
* We would have to break this drawing operation into several ones. */
/* Use some heuristic to see if unrolling indices improves
* performance. */
if (!new_info.primitive_restart &&
if (!info->indirect &&
!new_info.primitive_restart &&
num_vertices > new_info.count*2 &&
num_vertices - new_info.count > 32 &&
!u_vbuf_mapping_vertex_buffer_blocks(mgr)) {
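
The non-indexed multidraw path above folds every indirect record into a single vertex range and a single instance range. Here is a minimal sketch of that reduction under stated assumptions: `struct sketch_bounds` and `sketch_merge_draws` are hypothetical names, and each record is assumed to hold `count`, `instance_count`, `start`, `start_instance` as consecutive 32-bit words, matching the offsets used in the hunk.

```c
#include <stdint.h>

/* Hypothetical result type: the merged ranges plus a validity flag. */
struct sketch_bounds {
    unsigned start, count;
    unsigned start_instance, instance_count;
    int valid; /* 0 when there is nothing to upload or draw */
};

static unsigned sketch_min(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned sketch_max(unsigned a, unsigned b) { return a > b ? a : b; }

/* data points at the indirect records already read back from the buffer;
 * stride_bytes is the distance between consecutive records.
 */
struct sketch_bounds sketch_merge_draws(const unsigned *data,
                                        unsigned stride_bytes,
                                        unsigned draw_count)
{
    struct sketch_bounds b = { ~0u, 0, ~0u, 0, 0 };
    unsigned end_vertex = 0, end_instance = 0;

    if (!draw_count)
        return b;

    for (unsigned i = 0; i < draw_count; i++) {
        unsigned offset = i * stride_bytes / 4;
        unsigned count = data[offset + 0];
        unsigned instance_count = data[offset + 1];
        unsigned start = data[offset + 2];
        unsigned start_instance = data[offset + 3];

        /* Grow both ranges so they cover this draw. */
        b.start = sketch_min(b.start, start);
        b.start_instance = sketch_min(b.start_instance, start_instance);
        end_vertex = sketch_max(end_vertex, start + count);
        end_instance = sketch_max(end_instance, start_instance + instance_count);
    }

    b.count = end_vertex - b.start;
    b.instance_count = end_instance - b.start_instance;
    b.valid = b.start != ~0u && b.count && b.instance_count;
    return b;
}
```

The merged values only bound what has to be uploaded; as the comments in the hunk note, the draw itself stays indirect and the driver does not read them.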

View File

@@ -2151,13 +2151,36 @@ NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su)
}
}
void
NVC0LoweringPass::insertOOBSurfaceOpResult(TexInstruction *su)
{
if (!su->getPredicate())
return;
bld.setPosition(su, true);
for (unsigned i = 0; su->defExists(i); ++i) {
ValueDef &def = su->def(i);
Instruction *mov = bld.mkMov(bld.getSSA(), bld.loadImm(NULL, 0));
assert(su->cc == CC_NOT_P);
mov->setPredicate(CC_P, su->getPredicate());
Instruction *uni = bld.mkOp2(OP_UNION, TYPE_U32, bld.getSSA(), NULL, mov->getDef(0));
def.replace(uni->getDef(0), false);
uni->setSrc(0, def.get());
}
}
void
NVC0LoweringPass::handleSurfaceOpNVE4(TexInstruction *su)
{
processSurfaceCoordsNVE4(su);
if (su->op == OP_SULDP)
if (su->op == OP_SULDP) {
convertSurfaceFormat(su);
insertOOBSurfaceOpResult(su);
}
if (su->op == OP_SUREDB || su->op == OP_SUREDP) {
assert(su->getPredicate());
@@ -2267,8 +2290,10 @@ NVC0LoweringPass::handleSurfaceOpNVC0(TexInstruction *su)
processSurfaceCoordsNVC0(su);
if (su->op == OP_SULDP)
if (su->op == OP_SULDP) {
convertSurfaceFormat(su);
insertOOBSurfaceOpResult(su);
}
if (su->op == OP_SUREDB || su->op == OP_SUREDP) {
const int dim = su->tex.target.getDim();
@@ -2370,8 +2395,10 @@ NVC0LoweringPass::handleSurfaceOpGM107(TexInstruction *su)
{
processSurfaceCoordsGM107(su);
if (su->op == OP_SULDP)
if (su->op == OP_SULDP) {
convertSurfaceFormat(su);
insertOOBSurfaceOpResult(su);
}
if (su->op == OP_SUREDP) {
Value *def = su->getDef(0);

View File

@@ -172,6 +172,7 @@ private:
void processSurfaceCoordsNVE4(TexInstruction *);
void processSurfaceCoordsNVC0(TexInstruction *);
void convertSurfaceFormat(TexInstruction *);
void insertOOBSurfaceOpResult(TexInstruction *);
Value *calculateSampleOffset(Value *sampleID);
protected:

View File

@@ -37,7 +37,7 @@ extern "C" {
struct pipe_screen *swr_create_screen(struct sw_winsys *winsys);
// arch-specific dll entry point
PUBLIC struct pipe_screen *swr_create_screen_internal(struct sw_winsys *winsys);
struct pipe_screen *swr_create_screen_internal(struct sw_winsys *winsys);
// cleanup for failed screen creation
void swr_destroy_screen_internal(struct swr_screen **screen);

View File

@@ -1143,12 +1143,10 @@ swr_validate_env_options(struct swr_screen *screen)
}
PUBLIC
struct pipe_screen *
swr_create_screen_internal(struct sw_winsys *winsys)
{
struct swr_screen *screen = CALLOC_STRUCT(swr_screen);
memset(screen, 0, sizeof(struct swr_screen));
if (!screen)
return NULL;

View File

@@ -585,6 +585,8 @@ v3d_get_device_info(struct v3d_screen *screen)
uint32_t minor = (ident1.value >> 0) & 0xf;
screen->devinfo.ver = major * 10 + minor;
screen->devinfo.vpm_size = (ident1.value >> 28 & 0xf) * 1024;
switch (screen->devinfo.ver) {
case 33:
case 41:

View File

@@ -306,6 +306,13 @@ v3d_emit_gl_shader_state(struct v3d_context *v3d,
}
}
cl_emit(&job->bcl, VCM_CACHE_SIZE, vcm) {
vcm.number_of_16_vertex_batches_for_binning =
v3d->prog.cs->prog_data.vs->vcm_cache_size;
vcm.number_of_16_vertex_batches_for_rendering =
v3d->prog.vs->prog_data.vs->vcm_cache_size;
}
cl_emit(&job->bcl, GL_SHADER_STATE, state) {
state.address = cl_address(job->indirect.bo, shader_rec_offset);
state.number_of_attribute_arrays = num_elements_to_emit;

View File

@@ -222,6 +222,8 @@ vc4_emit_gl_shader_state(struct vc4_context *vc4,
attr.coordinate_shader_vpm_offset = 0;
attr.vertex_shader_vpm_offset = 0;
}
vc4_bo_unreference(&bo);
}
cl_emit(&job->bcl, GL_SHADER_STATE, shader_state) {

View File

@@ -121,7 +121,8 @@ vc4_fence_server_sync(struct pipe_context *pctx,
struct vc4_context *vc4 = vc4_context(pctx);
struct vc4_fence *fence = vc4_fence(pfence);
sync_accumulate("vc4", &vc4->in_fence_fd, fence->fd);
if (fence->fd >= 0)
sync_accumulate("vc4", &vc4->in_fence_fd, fence->fd);
}
static int
@@ -142,8 +143,12 @@ vc4_fence_context_init(struct vc4_context *vc4)
/* Since we initialize the in_fence_fd to -1 (no wait necessary),
* we also need to initialize our in_syncobj as signaled.
*/
return drmSyncobjCreate(vc4->fd, DRM_SYNCOBJ_CREATE_SIGNALED,
&vc4->in_syncobj);
if (vc4->screen->has_syncobj) {
return drmSyncobjCreate(vc4->fd, DRM_SYNCOBJ_CREATE_SIGNALED,
&vc4->in_syncobj);
} else {
return 0;
}
}
void

View File

@@ -38,6 +38,7 @@
#include "vc4_context.h"
#include "vc4_qpu.h"
#include "vc4_qir.h"
#include "mesa/state_tracker/st_glsl_types.h"
static struct qreg
ntq_get_src(struct vc4_compile *c, nir_src src, int i);
@@ -50,6 +51,12 @@ type_size(const struct glsl_type *type)
return glsl_count_attribute_slots(type, false);
}
static int
uniforms_type_size(const struct glsl_type *type)
{
return st_glsl_storage_type_size(type, false);
}
static void
resize_qreg_array(struct vc4_compile *c,
struct qreg **regs,
@@ -1685,7 +1692,7 @@ static void
ntq_setup_uniforms(struct vc4_compile *c)
{
nir_foreach_variable(var, &c->s->uniforms) {
uint32_t vec4_count = type_size(var->type);
uint32_t vec4_count = uniforms_type_size(var->type);
unsigned vec4_size = 4 * sizeof(float);
declare_uniform_range(c, var->data.driver_location * vec4_size,
@@ -2469,9 +2476,13 @@ vc4_shader_state_create(struct pipe_context *pctx,
*/
s = cso->ir.nir;
NIR_PASS_V(s, nir_lower_io, nir_var_all, type_size,
NIR_PASS_V(s, nir_lower_io, nir_var_all & ~nir_var_uniform,
type_size,
(nir_lower_io_options)0);
} else {
NIR_PASS_V(s, nir_lower_io, nir_var_uniform,
uniforms_type_size,
(nir_lower_io_options)0);
} else {
assert(cso->type == PIPE_SHADER_IR_TGSI);
if (vc4_debug & VC4_DEBUG_TGSI) {

View File

@@ -614,7 +614,9 @@ vc4_create_sampler_view(struct pipe_context *pctx, struct pipe_resource *prsc,
}
so->texture_p0 =
(VC4_SET_FIELD(rsc->slices[0].offset >> 12, VC4_TEX_P0_OFFSET) |
(VC4_SET_FIELD((rsc->slices[0].offset +
cso->u.tex.first_layer *
rsc->cube_map_stride) >> 12, VC4_TEX_P0_OFFSET) |
VC4_SET_FIELD(rsc->vc4_format & 15, VC4_TEX_P0_TYPE) |
VC4_SET_FIELD(so->force_first_level ?
cso->u.tex.last_level :

View File

@@ -26,8 +26,12 @@
*
**************************************************************************/
#if !defined(ANDROID) || ANDROID_API_LEVEL >= 26
/* Android's libc began supporting shm in Oreo */
#define HAVE_SHM
#include <sys/ipc.h>
#include <sys/shm.h>
#endif
#include "pipe/p_compiler.h"
#include "pipe/p_format.h"
@@ -83,6 +87,7 @@ dri_sw_is_displaytarget_format_supported( struct sw_winsys *ws,
return TRUE;
}
#ifdef HAVE_SHM
static char *
alloc_shm(struct dri_sw_displaytarget *dri_sw_dt, unsigned size)
{
@@ -101,6 +106,7 @@ alloc_shm(struct dri_sw_displaytarget *dri_sw_dt, unsigned size)
return addr;
}
#endif
static struct sw_displaytarget *
dri_sw_displaytarget_create(struct sw_winsys *winsys,
@@ -131,8 +137,11 @@ dri_sw_displaytarget_create(struct sw_winsys *winsys,
size = dri_sw_dt->stride * nblocksy;
dri_sw_dt->shmid = -1;
#ifdef HAVE_SHM
if (ws->lf->put_image_shm)
dri_sw_dt->data = alloc_shm(dri_sw_dt, size);
#endif
if(!dri_sw_dt->data)
dri_sw_dt->data = align_malloc(size, alignment);
@@ -156,8 +165,10 @@ dri_sw_displaytarget_destroy(struct sw_winsys *ws,
struct dri_sw_displaytarget *dri_sw_dt = dri_sw_displaytarget(dt);
if (dri_sw_dt->shmid >= 0) {
#ifdef HAVE_SHM
shmdt(dri_sw_dt->data);
shmctl(dri_sw_dt->shmid, IPC_RMID, 0);
#endif
} else {
align_free(dri_sw_dt->data);
}

View File

@@ -19,9 +19,6 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
if HAVE_SHARED_GLAPI
SHARED_GLAPI_LIB = $(top_builddir)/src/mapi/shared-glapi/libglapi.la
endif
SUBDIRS =
@@ -181,7 +178,7 @@ GL_LIBS = \
$(LIBDRM_LIBS) \
libglx.la \
$(top_builddir)/src/mapi/glapi/libglapi.la \
$(SHARED_GLAPI_LIB) \
$(top_builddir)/src/mapi/shared-glapi/libglapi.la \
$(GL_LIB_DEPS)
GL_LDFLAGS = \

View File

@@ -152,7 +152,7 @@ static const struct extension_info known_glx_extensions[] = {
{ GLX(ATI_pixel_format_float), VER(0,0), N, N, N, N },
{ GLX(INTEL_swap_event), VER(0,0), Y, N, N, N },
{ GLX(MESA_copy_sub_buffer), VER(0,0), Y, N, N, N },
{ GLX(MESA_multithread_makecurrent),VER(0,0), Y, N, Y, N },
{ GLX(MESA_multithread_makecurrent),VER(0,0), Y, N, N, Y },
{ GLX(MESA_query_renderer), VER(0,0), Y, N, N, Y },
{ GLX(MESA_swap_control), VER(0,0), Y, N, N, Y },
{ GLX(NV_float_buffer), VER(0,0), N, N, N, N },

View File

@@ -21,7 +21,9 @@
noinst_PROGRAMS += \
tools/aubinator \
tools/aubinator_error_decode
tools/aubinator_error_decode \
tools/error2aub
tools_aubinator_SOURCES = \
tools/aubinator.c \
@@ -59,3 +61,23 @@ tools_aubinator_error_decode_LDADD = \
tools_aubinator_error_decode_CFLAGS = \
$(AM_CFLAGS) \
$(ZLIB_CFLAGS)
tools_error2aub_SOURCES = \
tools/gen_context.h \
tools/gen8_context.h \
tools/gen10_context.h \
tools/aub_write.h \
tools/aub_write.c \
tools/error2aub.c
tools_error2aub_CFLAGS = \
$(AM_CFLAGS) \
$(ZLIB_CFLAGS)
tools_error2aub_LDADD = \
dev/libintel_dev.la \
$(PTHREAD_LIBS) \
$(DLOPEN_LIBS) \
$(ZLIB_LIBS) \
-lm

View File

@@ -75,18 +75,6 @@ brw_blorp_surface_info_init(struct blorp_context *blorp,
if (format == ISL_FORMAT_UNSUPPORTED)
format = surf->surf->format;
if (format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
/* Unfortunately, ISL_FORMAT_R24_UNORM_X8_TYPELESS isn't supported as
* a render target, which would prevent us from blitting to 24-bit
* depth. The miptree consists of 32 bits per pixel, arranged as 24-bit
* depth values interleaved with 8 "don't care" bits. Since depth
* values don't require any blending, it doesn't matter how we interpret
* the bit pattern as long as we copy the right amount of data, so just
* map it as 8-bit BGRA.
*/
format = ISL_FORMAT_B8G8R8A8_UNORM;
}
info->surf = *surf->surf;
info->addr = surf->addr;

View File

@@ -776,6 +776,14 @@ blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,
* grid of samples within a pixel. Sample number layout shows the
* rectangular grid of samples roughly corresponding to the real sample
* locations within a pixel.
*
* In the case of 2x MSAA, the layout of sample indices is reversed from
* the layout of sample numbers:
*
* sample index layout : --------- sample number layout : ---------
* | 0 | 1 | | 1 | 0 |
* --------- ---------
*
* In case of 4x MSAA, layout of sample indices matches the layout of
* sample numbers:
* ---------
@@ -819,7 +827,9 @@ blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,
key->x_scale * key->y_scale));
sample = nir_f2i32(b, sample);
if (tex_samples == 8) {
if (tex_samples == 2) {
sample = nir_isub(b, nir_imm_int(b, 1), sample);
} else if (tex_samples == 8) {
sample = nir_iand(b, nir_ishr(b, nir_imm_int(b, 0x64210573),
nir_ishl(b, sample, nir_imm_int(b, 2))),
nir_imm_int(b, 0xf));
@@ -984,14 +994,14 @@ convert_color(struct nir_builder *b, nir_ssa_def *color,
nir_ssa_def *value;
if (key->dst_format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
/* The destination image is bound as R32_UNORM but the data needs to be
/* The destination image is bound as R32_UINT but the data needs to be
* in R24_UNORM_X8_TYPELESS. The bottom 24 are the actual data and the
* top 8 need to be zero. We can accomplish this by simply multiplying
* by a factor to scale things down.
*/
float factor = (float)((1 << 24) - 1) / (float)UINT32_MAX;
value = nir_fmul(b, nir_fsat(b, nir_channel(b, color, 0)),
nir_imm_float(b, factor));
unsigned factor = (1 << 24) - 1;
value = nir_fsat(b, nir_channel(b, color, 0));
value = nir_f2i32(b, nir_fmul(b, value, nir_imm_float(b, factor)));
} else if (key->dst_format == ISL_FORMAT_L8_UNORM_SRGB) {
value = nir_format_linear_to_srgb(b, nir_channel(b, color, 0));
} else if (key->dst_format == ISL_FORMAT_R8G8B8_UNORM_SRGB) {
@@ -1976,7 +1986,7 @@ try_blorp_blit(struct blorp_batch *batch,
isl_format_rgbx_to_rgba(params->dst.view.format);
} else if (params->dst.view.format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
wm_prog_key->dst_format = params->dst.view.format;
params->dst.view.format = ISL_FORMAT_R32_UNORM;
params->dst.view.format = ISL_FORMAT_R32_UINT;
} else if (params->dst.view.format == ISL_FORMAT_A4B4G4R4_UNORM) {
params->dst.view.swizzle =
isl_swizzle_compose(params->dst.view.swizzle,
@@ -2240,6 +2250,17 @@ blorp_blit(struct blorp_batch *batch,
}
}
/* ISL_FORMAT_R24_UNORM_X8_TYPELESS isn't supported as a render target,
* which requires shader math to render to it. Blitting Z24X8 to Z24X8
* is fairly common though, so we'd like to avoid it. Since we don't need
* to blend depth values, we can simply pick a renderable format with the
* right number of bits-per-pixel, like 8-bit BGRA.
*/
if (dst_surf->surf->format == ISL_FORMAT_R24_UNORM_X8_TYPELESS &&
src_surf->surf->format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
src_format = dst_format = ISL_FORMAT_B8G8R8A8_UNORM;
}
brw_blorp_surface_info_init(batch->blorp, &params.src, src_surf, src_level,
src_layer, src_format, false);
brw_blorp_surface_info_init(batch->blorp, &params.dst, dst_surf, dst_level,
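
The `convert_color` change earlier in this file binds a Z24X8 destination as R32_UINT and has the shader write an integer whose top 8 bits are zero. A CPU-side sketch of the same arithmetic may help when checking the scale factor. This is only an illustration: `sketch_pack_z24x8` is a hypothetical helper, and the saturate step is written by hand to mimic what `nir_fsat` is expected to do.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a depth value to [0, 1], scale it to the
 * 24-bit range and truncate to an integer, leaving the top 8 bits zero.
 */
uint32_t sketch_pack_z24x8(float depth)
{
    /* Saturate; a NaN input fails the first comparison and becomes 0. */
    float sat = depth > 0.0f ? (depth < 1.0f ? depth : 1.0f) : 0.0f;

    /* 2^24 - 1 is exactly representable as a 32-bit float. */
    const float factor = (float)((1u << 24) - 1);

    /* Float-to-int conversion truncates toward zero. */
    return (uint32_t)(int32_t)(sat * factor);
}
```

A depth of 1.0 maps to `0x00ffffff`, so the result never spills into the 8 "don't care" bits.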

View File

@@ -42,10 +42,10 @@ prefix##0YOffset = 0.5;
* c 1
*/
#define GEN_SAMPLE_POS_2X(prefix) \
prefix##0XOffset = 0.25; \
prefix##0YOffset = 0.25; \
prefix##1XOffset = 0.75; \
prefix##1YOffset = 0.75;
prefix##0XOffset = 0.75; \
prefix##0YOffset = 0.75; \
prefix##1XOffset = 0.25; \
prefix##1YOffset = 0.25;
/**
* Sample positions:

View File

@@ -5115,6 +5115,25 @@ get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
}
}
if (devinfo->gen < 6) {
/* From the G45 PRM, Volume 4 Page 361:
*
* "Operand Alignment Rule: With the exceptions listed below, a
* source/destination operand in general should be aligned to even
* 256-bit physical register with a region size equal to two 256-bit
* physical registers."
*
* Normally we enforce this by allocating virtual registers to the
* even-aligned class. But we need to handle payload registers.
*/
for (unsigned i = 0; i < inst->sources; i++) {
if (inst->src[i].file == FIXED_GRF && (inst->src[i].nr & 1) &&
inst->size_read(i) > REG_SIZE) {
max_width = MIN2(max_width, 8);
}
}
}
/* From the IVB PRMs:
* "When an instruction is SIMD32, the low 16 bits of the execution mask
* are applied for both halves of the SIMD32 instruction. If different
@@ -6321,6 +6340,7 @@ fs_visitor::optimize()
if (OPT(lower_load_payload)) {
split_virtual_grfs();
OPT(register_coalesce);
OPT(lower_simd_width);
OPT(compute_to_mrf);
OPT(dead_code_eliminate);
}

View File

@@ -590,7 +590,7 @@ handle_memtrace_reg_write(uint32_t *p)
uint32_t pphwsp_addr = context_descriptor & 0xfffff000;
struct gen_batch_decode_bo pphwsp_bo = get_ggtt_batch_bo(NULL, pphwsp_addr);
uint32_t *context = (uint32_t *)((uint8_t *)pphwsp_bo.map +
(pphwsp_bo.addr - pphwsp_addr) +
(pphwsp_addr - pphwsp_bo.addr) +
pphwsp_size);
uint32_t ring_buffer_head = context[5];
@@ -601,7 +601,7 @@ handle_memtrace_reg_write(uint32_t *p)
struct gen_batch_decode_bo ring_bo = get_ggtt_batch_bo(NULL,
ring_buffer_start);
assert(ring_bo.size > 0);
void *commands = (uint8_t *)ring_bo.map + (ring_bo.addr - ring_buffer_start);
void *commands = (uint8_t *)ring_bo.map + (ring_buffer_start - ring_bo.addr);
if (context_descriptor & 0x100 /* ppgtt */) {
batch_ctx.get_bo = get_ppgtt_batch_bo;
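
Both fixes in this file swap the operands of a subtraction: the byte offset into a mapped BO is the requested GPU address minus the BO's base address, not the other way round. A minimal sketch of that translation follows, assuming a hypothetical `struct sketch_bo` with `addr`, `size`, and `map` fields; the bounds check is an addition for illustration and is not in the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped buffer object. */
struct sketch_bo {
    uint64_t addr; /* GPU address of the first byte */
    uint64_t size; /* size of the mapping in bytes */
    void *map;     /* CPU pointer to the first byte */
};

/* Translate a GPU address inside the BO to a CPU pointer, or return NULL
 * if the address falls outside the BO.
 */
void *sketch_bo_ptr(const struct sketch_bo *bo, uint64_t gpu_addr)
{
    if (gpu_addr < bo->addr || gpu_addr - bo->addr >= bo->size)
        return NULL;

    /* Offset = requested address - base address. */
    return (uint8_t *)bo->map + (gpu_addr - bo->addr);
}
```

With the operands reversed, the unsigned subtraction wraps around whenever the requested address lies above the BO base, which yields a pointer far outside the mapping.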

View File

@@ -205,7 +205,7 @@ main(int argc, char *argv[])
BO_TYPE_UNKNOWN = 0,
BO_TYPE_BATCH,
BO_TYPE_USER,
} bo_type;
} bo_type = BO_TYPE_UNKNOWN;
uint64_t bo_addr;
char *line = NULL;

View File

@@ -38,13 +38,13 @@
/**
* 1x MSAA has a single sample at the center: (0.5, 0.5) -> (0x8, 0x8).
*
* 2x MSAA sample positions are (0.25, 0.25) and (0.75, 0.75):
* 2x MSAA sample positions are (0.75, 0.75) and (0.25, 0.25):
* 4 c
* 4 0
* c 1
* 4 1
* c 0
*/
static const uint32_t
brw_multisample_positions_1x_2x = 0x0088cc44;
brw_multisample_positions_1x_2x = 0x008844cc;
/**
* Sample positions:
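
The packed constant changed above can be rebuilt from the positions listed in the comment. The sketch below is an inference from the old and new values, not a statement of the hardware format: it assumes each position is stored as one byte of two 4-bit fixed-point fractions in 1/16 pixel units, with 2x sample 0 in the lowest byte, 2x sample 1 in the next, and the 1x center sample in the third. The nibble order (X high, Y low) is a guess that does not affect these values, since X equals Y for every sample here. `sketch_pos_byte` and `sketch_pack_1x_2x` are hypothetical names.

```c
#include <stdint.h>

/* Encode one sample position as a byte: X in the high nibble (assumed),
 * Y in the low nibble, both in 1/16 pixel units.
 */
static uint32_t sketch_pos_byte(float x, float y)
{
    return ((uint32_t)(x * 16.0f) << 4) | (uint32_t)(y * 16.0f);
}

/* Pack the 1x center sample and two 2x samples lying on the diagonal.
 * s0 and s1 are the shared X/Y coordinate of 2x samples 0 and 1.
 */
uint32_t sketch_pack_1x_2x(float s0, float s1)
{
    return (sketch_pos_byte(0.5f, 0.5f) << 16) | /* 1x: (0.5, 0.5) -> 0x88 */
           (sketch_pos_byte(s1, s1) << 8) |      /* 2x sample 1 */
           sketch_pos_byte(s0, s0);              /* 2x sample 0 */
}
```

Under these assumptions, `sketch_pack_1x_2x(0.75f, 0.25f)` reproduces the new constant and `sketch_pack_1x_2x(0.25f, 0.75f)` reproduces the old one.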

View File

@@ -68,10 +68,10 @@ gen6_get_sample_position(struct gl_context *ctx,
* index layout in case of 2X and 4x MSAA, but they are different in
* case of 8X MSAA.
*
* 2X MSAA sample index / number layout
* ---------
* | 0 | 1 |
* ---------
* 2X MSAA sample index layout 2x MSAA sample number layout
* --------- ---------
* | 0 | 1 | | 1 | 0 |
* --------- ---------
*
* 4X MSAA sample index / number layout
* ---------
@@ -107,7 +107,7 @@ gen6_get_sample_position(struct gl_context *ctx,
void
gen6_set_sample_maps(struct gl_context *ctx)
{
uint8_t map_2x[2] = {0, 1};
uint8_t map_2x[2] = {1, 0};
uint8_t map_4x[4] = {0, 1, 2, 3};
uint8_t map_8x[8] = {3, 7, 5, 0, 1, 2, 4, 6};
uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13,

View File

@@ -7,7 +7,7 @@ Name: gl
Description: Mesa OpenGL library
Requires.private: @GL_PC_REQ_PRIV@
Version: @PACKAGE_VERSION@
Libs: -L${libdir} -l@GL_LIB@
Libs: -L${libdir} -l@GL_PKGCONF_LIB@
Libs.private: @GL_PC_LIB_PRIV@
Cflags: -I${includedir} @GL_PC_CFLAGS@
glx_tls: @GLX_TLS@

View File

@@ -1229,7 +1229,7 @@ void st_init_extensions(struct pipe_screen *screen,
screen->is_format_supported(screen, PIPE_FORMAT_R8G8B8A8_UNORM,
PIPE_TEXTURE_2D, 0, 0,
PIPE_BIND_SAMPLER_VIEW) &&
screen->is_format_supported(screen, PIPE_FORMAT_B8G8R8A8_SRGB,
screen->is_format_supported(screen, PIPE_FORMAT_R8G8B8A8_SRGB,
PIPE_TEXTURE_2D, 0, 0,
PIPE_BIND_SAMPLER_VIEW) &&
screen->is_format_supported(screen, PIPE_FORMAT_R16_UNORM,

View File

@@ -120,6 +120,10 @@ TODO: document the other workarounds.
<option name="allow_glsl_extension_directive_midshader" value="true" />
</application>
<application name="Metro 2033 Redux / Metro Last Night Redux" executable="metro">
<option name="allow_glsl_extension_directive_midshader" value="true" />
</application>
<application name="Worms W.M.D" executable="Worms W.M.Dx64">
<option name="allow_higher_compat_version" value="true" />
</application>