Compare commits

...

3795 Commits

Author SHA1 Message Date
Emil Velikov
f623a8be3e Update version to 13.0.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-24 12:09:15 +01:00
Jonathan Gray
af81cdfec0 mapi: automake: set VISIBILITY_CFLAGS for shared glapi
shared glapi was previously built without setting CFLAGS for
AM_CFLAGS and VISIBILITY_CFLAGS.

This resulted in symbols being exported that shouldn't be.

The x86 and sparc assembly versions of the dispatch table partially
mitigated this by using .hidden.  Otherwise shared_dispatch_stub_*
were being exported.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.2 12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-24 11:32:13 +01:00
Emil Velikov
990f395e00 anv: automake: cleanup the generated json file during make clean
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8df581520a)

Conflicts:
	src/intel/vulkan/Makefile.am
2016-10-24 11:31:53 +01:00
Stencel, Joanna
19e8270fe0 egl/wayland: add missing destroy_window callback
The original patch by Joanna added the function pointer and callback yet
things got only partially applied - the infra was added, but the
implementation was missing.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Fixes: 690ead4a13 ("egl/wayland-egl: Fix for segfault in
dri2_wl_destroy_surface.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

(cherry picked from commit 2e0ab61e29)
2016-10-24 09:55:02 +01:00
Emil Velikov
cac49ee2cd automake: don't forget to pick wglext.h in the tarball
Earlier commit reworked the header install rules, to ensure that the
correct ones are installed only as needed.

By doing so it dropped a wildcard which was effectively including the
wglext.h header in the tarball.

Add the header to the top-level noinst_HEADERS, since the it is not
meant to be installed (autoconf is not used on Windows plaforms).

Fixes: a89faa2022 ("autoconf: Make header install distinct for various
APIs (v2)")
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

(cherry picked from commit 3511a86111)
2016-10-24 09:55:02 +01:00
Dave Airlie
0f8b7f90d1 radv: allow cmask transitions without fast clear
This fixes
dEQP-VK.pipeline.multisample.sampled_image*

These all render to multisampled image, and then
sample from it, so we must transition it correctly,
since we have a cmask and fmask this will cause
the correct transition.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a969548f59)
2016-10-24 09:55:02 +01:00
Jason Ekstrand
abf5327b86 anv: Suffix the intel_icd file with the host CPU
Vulkan has a multi-arch problem... The idea behind the Vulkan loader is
that you have a little json file on your disk that tells the loader where
to find drivers.  The loader looks for these json files in standard
locations, and then goes and loads the my_driver.so's that they specify.
This allows you as a driver implementer to put their driver wherever on the
disk they want so long as the ICD points in the right place.

For a multi-arch system, however, you may have multiple libvulkan_intel.so
files installed that the loader needs to pick depending on architecture.
Since the ICD file format does not specify any architecture information,
you can't tell the loader where to find the 32-bit version vs. the 64-bit
version.  The way that packagers have been dealing with this is to place
libvulkan_intel.so in the top level lib directory and provide just a name
(and no path) to the loader.  It will then use the regular system search
paths and find the correct driver.  While this solution works fine for
distro-installed Vulkan drivers, it doesn't work so well for user-installed
drivers because they may put it in /opt or $HOME/.local or some other more
exotic location.  In this case, you can't use an ICD json file with just a
library name because it doesn't know where to find it; you also have to add
that to your library lookup path via LD_LIBRARY_PATH or similar.

This patch handles both use-cases by taking advantage of the fact that the
loader dlopen()s each of the drivers and, if one dlopen() calls fails, it
silently continues on to open other drivers.  By suffixing the icd file, we
can provide two different json files: intel_icd.x86_64.json and
intel_icd.i686.json with different paths.  Since dlopen() will only succeed
on the libvulkan_intel.so of the right arch, the loader will happily ignore
the others and load that one.  This allows us to properly handle multi-arch
while still providing a full path so user installs will work fine.

I tested this on my Fedora 25 machine with 32 and 64-bit builds of our
Vulkan driver installed and 32 and 64-bit builds of crucible.  It seems to
work just fine.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d96345de98)

Squashed with commit:

anv: Always use the full driver path in the intel_icd.*.json

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7ea4ef8849)

Squashed with commit:

configure: Get rid of the --disable-vulkan-icd-full-driver-path flag

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3f05fc62f9)
2016-10-24 09:54:28 +01:00
Francisco Jerez
d0d3e721d0 Revert "Revert "mapi: export all GLES 3.2 functions in libGLESv2.so""
This reverts commit 85e9bbc14d.  The
previous commit should help with the scons build failure caused by the
original commit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 811eb7f178)
2016-10-24 09:10:01 +01:00
Francisco Jerez
293e458558 glapi: Move PrimitiveBoundingBox and BlendBarrier definitions into ES3.2 category.
These two GLES 3.2 entry points were being defined in the category of
the ARB_ES3_2_compatibility and KHR_blend_equation_advanced extensions
respectively instead of in the ES3.2 category.  Defining them in the
ES3.2 category makes sure that the gl_procs.py generator emits
declarations in the glprocs.h header file for the unsuffixed GLES-only
entry points that PrimitiveBoundingBoxARB and BlendBarrierKHR
respectively alias.  This should avoid a compilation failure during
scons builds in combination with "mapi: export all GLES 3.2 functions
in libGLESv2.so".

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 15a084a039)
2016-10-24 09:08:28 +01:00
Samuel Pitoiset
5798d602e0 nvc0: do not break 3D state by pushing MS coordinates on Fermi
Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.

I still don't have the faintest clue what is going on, but
this seems to only affect Fermi generation. A possible fix
could be to use two different channels, one for 3D and one
for CP.

This fixes a bunch of regressions pinpointed by piglit.

Fixes: "nvc0: fix up image support for allowing multiple samples"
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 42273edf79)
2016-10-24 09:07:21 +01:00
Nicolai Hähnle
039d1e6f11 radeonsi: fix 64-bit loads from LDS
Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among
others.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 4a2dbfff05)
2016-10-24 09:06:16 +01:00
Nicolai Hähnle
ba6efd48c3 st/mesa: only set primitive_restart when the restart index is in range
Even when enabled, primitive restart has no effect when the restart index
is larger than the representable values in the index buffer.

Fixes GL45-CTS.gtf31.GL3Tests.primitive_restart.primitive_restart_upconvert
for radeonsi VI.

v2: add an explanatory comment

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
(cherry picked from commit bfa50f88ce)
2016-10-24 09:05:18 +01:00
Nicolai Hähnle
13f685cf11 st/glsl_to_tgsi: sort input and output decls by TGSI index
Fixes a regression introduced by commit 777dcf81b.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98307
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3d9b57e493)
2016-10-24 09:04:17 +01:00
Nicolai Hähnle
8f807e914f st/glsl_to_tgsi: fix block copies of arrays of structs
Use a full writemask in this case. This is relevant e.g. when a function
has an inout argument which is an array of structs.

v2: use C-style comment (Timothy Arceri)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a1895685f8)
2016-10-24 09:03:16 +01:00
Nicolai Hähnle
3581e21d5b st/glsl_to_tgsi: fix block copies of arrays of doubles
Set the type of the left-hand side to the same as the right-hand side,
so that when the base type is double, the writemask of the MOV instruction
is properly fixed up.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ca592af880)
2016-10-24 09:02:14 +01:00
Ilia Mirkin
52df379d6b nv50/ir: process texture offset sources as regular sources
With ARB_gpu_shader5, texture offsets can be any source, including TEMPs
and IN's. Make sure to process them as regular sources so that we pick
up masks, etc.

This should fix some CTS tests that feed offsets directly to
textureGatherOffset, and we were not picking up the input use, thus not
advertising it in the shader header.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dave Airlie <airlied@redhat.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cd45d758ff)
2016-10-24 09:01:03 +01:00
Ilia Mirkin
05b89cf40e nv50,nvc0: avoid reading out of bounds when getting bogus so info
The state tracker tries to attach the info to the wrong shader. This is
easy enough to protect against.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 313fba5ee1)
2016-10-24 08:59:57 +01:00
Eric Engestrom
4768b7353f wsi/wayland: fix error path
Fixes: 1720bbd353 ("anv/wsi: split image alloc/free out to separate fns.")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8bf7717e1f)
2016-10-24 08:58:59 +01:00
Dave Airlie
554a99ebde radv: use emit_icmp for samples_identical
On a debug llvm build we'd assert on the next compare
when the return from samples_identical was i1 instead
of i32.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d842546ad1)
2016-10-24 08:51:57 +01:00
Emil Velikov
e45c4586c2 Update version to 13.0.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 19:12:42 +01:00
Emil Velikov
2ced8eb136 Revert Use absolute path in intel_icd.json and related patches.
This commit effectively reverts the following commits:

This reverts commit 0b6837a643.
This reverts commit 05f36435ef.
This reverts commit a2ae67aa47.

While the feature introduced is convinient for development it is not as
useful for distributions. Furthermore it even breaks things as one
wishes to have both 32 and 64 bit package installed on the same system.

Keep the functionality in development branch(es) and drop it from
distribution packages to avoid confusion and misuse.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 19:10:30 +01:00
Emil Velikov
3ef8d4288a docs: rename release notes to 13.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 19:10:16 +01:00
Marek Olšák
a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
54f8efeb02 st/mesa: disable alpha-test, alpha-to-coverage, alpha-to-one for integer FBs
v2: rebased

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
c64da9d499 mesa: remove gl_shader_compiler_options::EmitNoNoise
it's always true

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
2897cb3dba glsl_to_tgsi: remove code for fixing up TGSI labels
I don't know what this was supposed to do, but all TGSI labels were
always 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
ec35ff4e2b glsl_to_tgsi: remove subroutine support
Never used. The GLSL compiler doesn't even look at EmitNoFunctions.

v2: add back "return" support in "main"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
eacda2c080 mesa_to_tgsi: remove remnants of flow control and subroutine support
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
82f4c0126d mesa_to_tgsi: drop support for instructions that can't occur here
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
4e42898d9d glsl_to_tgsi: allocate glsl_to_tgsi_instruction::tex_offsets on demand
sizeof(glsl_to_tgsi_instruction): 384 -> 264

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
4d3d620f26 glsl_to_tgsi: merge buffer and sampler fields in glsl_to_tgsi_instruction
sizeof(glsl_to_tgsi_instruction): 416 -> 384

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
dbf64ea28b glsl_to_tgsi: reduce the size of glsl_to_tgsi_instruction using bitfields
sizeof(glsl_to_tgsi_instruction): 464 -> 416

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
9015cbb3a3 glsl_to_tgsi: reduce the size of st_dst_reg and st_src_reg
I noticed that glsl_to_tgsi_instruction is too huge.

sizeof(glsl_to_tgsi_instruction): 752 -> 464 (-38%)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
222c599b61 glsl_to_tgsi: remove unused st_translate::tex_offsets
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
0d95eeb79c glsl_to_tgsi: remove unused parameters from calc_deref_offsets
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Marek Olšák
6980480052 glsl_to_tgsi: use array_id for temp arrays instead of hacking high bits
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Adam Jackson
4276b5c16a reviewers: Throw myself on the GLX grenade
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-19 12:37:22 -04:00
Eric Engestrom
8acb79dfac egl: bring back the default glapi.so name
Earlier commit replaced the default platform specific libglapi.so name
with an #error.

This may have been overzealous since the name is the correct for the BSD
platforms, at least. Reinstate the hunk - bringing back OpenBSD, et al.
to a successful build state.

Fixes: 7a9c92d071 ("egl/dri2: non-shared glapi cleanups")
[Emil Velikov: format the patch from Eric, add commit message and tag.]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-10-19 15:09:26 +01:00
Iago Toral Quiroga
66d8bd3b7e i965: fix subnr overflow in suboffset()
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-19 11:48:21 +02:00
Dave Airlie
86c4575a81 radv: decompress fmask before reading using texture unit
Before we can read the fmask using the compute shader, we need
to decompress the fmask in place.

This fixes a bunch of remaining failure and hopefully multisampling
in Talos.
2016-10-19 17:39:47 +10:00
Dave Airlie
67c91ef2a2 radv: fix samples_identical return value.
This was returning an inversion, so not doing as it should have.

We need to compare the fmask value with 0, and return the result
from that.
2016-10-19 17:39:01 +10:00
Dave Airlie
93ba86c307 radv: fix wsi porting regression in swapchain destroy.
The code in anv is right, there's a pending patch to fix this up
different, but I'll sync the code for now.
2016-10-19 13:54:49 +10:00
Dave Airlie
63406b669e radv: fix fmask ptr issue
We were using the wrong descriptor in the fmask picking code.
2016-10-19 13:16:25 +10:00
Dave Airlie
db7ae14b60 radv: simplify fast clear shaders
There is no need for anything but a noop shader here.
2016-10-19 13:16:14 +10:00
Dave Airlie
1ec5e6e702 vulkan/wsi: fix out of tree build. 2016-10-19 10:54:42 +10:00
Dave Airlie
b0e11a153c radv: start using defines for the user sgpr offsets
This adds some comments and adds defines for the user sgprs,
so that we can move them around easier later and not have
to change/revalidate every one of these.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 10:17:48 +10:00
Dave Airlie
6c3bd1cdb3 radv: port to common wsi codebase
This drops all the radv WSI code in favour of using
the new shared code that was ported from anv

This regresses Talos for now, Jason has pointed out
the bug is in Talos and we should wait for them to fix it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
3f7ef24889 anv: move to using shared wsi code
This moves the shared code to a common subdirectory
and makes anv linked to that code instead of the copy
it was using.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
ec0bc14a70 anv/wsi: remove all anv references from WSI common code
the WSI code should be now be clean for sharing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
971523410f anv: move common wsi code to x11/wayland common files.
Next task is to rename all the anv_ out of this,
and move to a common location

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
e0d15fbe1d anv/wsi/wayland: add callback to get device format properties.
This avoids having to know the toplevel API name.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
4392de6771 anv/wsi/wl: stop using device in more places
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
507722b882 anv/wsi: split out surface creation to avoid instance API
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
954cd09e66 anv/wsi: move further away from passing anv displays around
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
1720bbd353 anv/wsi: split image alloc/free out to separate fns.
This moves these outside the wsi platform code, so we can reuse
that code

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:43 +10:00
Dave Airlie
828b8dbce4 anv/wsi: switch to using VkDevice in swapchain
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
6542001345 anv/wsi/x11: more refactoring to use generic handles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
340e72f056 anv/wsi/x11: start refactoring out the image allocation/free functionality
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
c264c272a5 anv/wsi: drop device from get format
Just use the wsi_device instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
467d161e6a anv/wsi: remove device from get_support interface
replace with wsi_device and allocator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
b8e7460563 anv/wsi/x11: abstract WSI interface from internals.
This allows the API and the internals to be split, and the
internals shared.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
36e6be2e0d anv/wsi/x11: push anv_device out of the init/finish routines
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
7c10258567 anv/wsi: abstract wsi interfaces away from device a bit more.
This is a step towards separating out the wsi code for sharing

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
be61fff6da anv/wsi/x11: push device out of x11 connection fns.
just pass the allocator/wsi_interface instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
e9cf7c4460 anv/wsi: drop device from get caps
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
0e4abc3e10 anv/wsi: drop get present modes device arg
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Dave Airlie
32d70c0d66 radv/anv/wsi: drop unneeded parameter
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 10:15:42 +10:00
Roland Scheidegger
aeceec54a8 draw: improve vertex fetch (v2)
The per-element fetch has quite some calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, it looks easier swapping the fetch loops (outer loop per attrib,
inner loop filling up the per vertex elements - this way the aos->soa
conversion also can be done per attrib and not just at the end though again
this doesn't really make much of a difference in the generated code). (This
would also make it possible to vectorize the calculations leading to the
fetches.)
There's also some minimal change simplifying the overflow math slightly.
All in all, the generated code seems to look slightly simpler (depending
on the actual vs), but more importantly I've seen a significant reduction
in compile times for some vs (albeit with old (3.3) llvm version, and the
time reduction is only really for the optimizations run on the IR).
v2: adapt to other draw change.

No changes with piglit.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger
0942fe548e draw: improved handling of undefined inputs
Previous attempts to zero initialize all inputs were not really optimal
(though no performance impact was measurable). In fact this is not really
necessary, since we know the max number of inputs used.
Instead, just generate fetch for up to max inputs used by the shader,
directly replacing inputs for which there was no vertex element by zero.
This also cleans up key generation, which previously would have stored
some garbage for these elements.
And also drop the assertion which indicates such bogus usage by a
debug_printf (the whole point of initializing the undefined inputs was to
make this case safe to handle).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger
d1b4a3451e gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger
6f2f0daeb4 gallivm: Use native packs and unpacks for the lerps
For the texturing packs, things looked pretty terrible. For every
lerp, we were repacking the values, and while those look sort of cheap
with 128bit, with 256bit we end up with 2 of them instead of just 1 but
worse, plus 2 extracts too (the unpack, however, works fine with a
single instruction, albeit only with llvm 3.8 - the vpmovzxbw).

Ideally we'd use more clever pack for llvmpipe backend conversion too
since we actually use the "wrong" shuffle (which is more work) when doing
the fs twiddle just so we end up with the wrong order for being able to
do native pack when converting from 2x8f -> 1x16b. But this requires some
refactoring, since the untwiddle is separate from conversion.

This is only used for avx2 256bit pack/unpack for now.

Improves openarena scores by 8% or so, though overall it's still pretty
disappointing how much faster 256bit vectors are even with avx2 (or
rather, aren't...). And, of course, eliminating the needless
packs/unpacks in the first place would eliminate most of that advantage
(not quite all) from this patch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Dave Airlie
7e1e06bc75 anv: drop pointless struct decl.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
e4df1830e4 radv: drop pointless struct decl.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
4450f40519 radv: move to using shared vk_alloc inlines.
This moves to the shared vk_alloc inlines for vulkan
memory allocations.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
1ae6ece980 anv: move to using vk_alloc helpers.
This moves all the alloc/free in anv to the generic helpers.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
0cfd428aef vulkan: add vk_alloc.h shared allocation inlines.
vulkan allocation allows for overriding the allocator used,
add some macros for anv/radv to share for this.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
2c6d8bff03 anv: drop local MIN/MAX macros.
Use the ones from mesa, most places already did.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:26 +10:00
Dave Airlie
c6f1077e0d radv: drop local MIN/MAX macros.
Use the ones in macros.h instead.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:25 +10:00
Dave Airlie
78bce52f9a util: move min/max/clamp macros to util macros.h
Although the vulkan drivers include mesa macros.h, for
radv I'd like to move away from that.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:25 +10:00
Dave Airlie
f5daaba0fd radv: make use of shared vector helper.
This removes the vector code from radv in favour of sharing
code with anv.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:25 +10:00
Dave Airlie
8df014c01a anv: port to using new u_vector shared helper.
This just removes the anv vector code and uses the new helper.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:25 +10:00
Dave Airlie
008f54f63a util: add vector util code.
This is ported from anv, both anv and radv can share this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-19 09:05:25 +10:00
Brian Paul
8b731b8b03 svga: minor code improvements in svga_validate_pipe_sampler_view()
Use the 'texture' local var in more places.
Rename 'pFormat' to 'viewFormat'.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-18 16:16:26 -06:00
Lionel Landwerlin
0ca134aa9f intel: genxml: add SAMPLER_BORDER_COLOR_STATE structures
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-18 22:43:41 +01:00
Boyuan Zhang
5567145d59 st/va: force to flush the last p frame in idr period
During dual instance encoding submission, if the second encode task and first
encode task have no reference dependency, e.g. p following with idr-frame,
there is a chance the second task will use for its reconstructed picture
buffer the same buffer used by first task for its reference/reconstructed
picture. In this case, buffer corruption may occur depending on encoding
speed. Fix is to force flush these two tasks separately to avoid race condition

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-18 15:16:34 -04:00
Chad Versace
52a6483e8a egl/surfaceless: Fix segfault in eglSwapBuffers
Since commit 63c5d5c6c4, the surfaceless
platform has allowed creation of pbuffer surfaces. But the vtable entry
for eglSwapBuffers has remained NULL.

Discovered by running a little pbuffer test.

Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-18 11:12:22 -07:00
Marek Olšák
21af69e753 radeonsi: rename prefixes from radeon to si
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:08 +02:00
Marek Olšák
6e475fefa1 radeonsi: merge radeon_llvm_context and si_shader_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:06 +02:00
Marek Olšák
5ab25bb4ba radeonsi: import all TGSI->LLVM code from gallium/radeon
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:04 +02:00
Marek Olšák
4967cacdfa gallium/radeon: simplify initialization of 64-bit gallivm builders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:03 +02:00
Marek Olšák
502dad4dca gallium/radeon: remove unused radeon_llvm_reg_index_soa
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:01 +02:00
Marek Olšák
4e5d076fcf radeonsi: move LLVM ALU codegen into radeonsi
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:40:59 +02:00
Jonathan Gray
41754f743f genxml: add generated headers to EXTRA_DIST
Building the Mesa 12.0.3 distfile failed on a system without python
as generated files were not included in the distfile.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-18 17:06:42 +01:00
Jonathan Gray
23392abf50 mesa: automake: include mesa_glinterop.h in distfile
Add mesa_glinterop.h to the list of headers that will get included
in the distfile as it is required to build Mesa itself.

Corrects a regression introduced in a89faa2022.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-18 17:06:42 +01:00
Jonathan Gray
2fc1374be6 egl: remove docs directory from EXTRA_DIST
The egl docs directory no longer exists as of
88b5c36fe1.

Remove it from EXTRA_DIST to unbreak 'make dist'

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-18 17:06:42 +01:00
Jonathan Gray
27572db46d genxml: avoid using a GNU make pattern rule
% pattern rules are a GNU extension.  Convert the use of one to a
inference rule to allow this to build on OpenBSD.

This is a related change to the one made in
e3d43dc5ea

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-18 17:06:42 +01:00
Emil Velikov
9898c60745 configure.ac: use a single require_libdrm helper
Rather than having 4-5 places which do the explicit check/message just
polish the gallium helper and use it everywhere.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:38 +01:00
Emil Velikov
3e079c3f86 configure.ac: remove no longer needed *_pci_id logic
Previously it was used to differentiate between the different codepaths
in the loader. Although strictly speaking the (core) of the loader is
only used when a hardware device is available. The latter of which in
itself requires libdrm (one of the codepaths available).

That said, all the configure toggles which relate to enabling/using hw
device should attribute and require libdrm, so there's no need to keep
this code around.

With this gallium_require_drm_loader becomes an empty stub, so nuke that
one as well.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:35 +01:00
Emil Velikov
47b5925d9b loader: cleanup copyright section
With previous patches nearly all the original code (as seen in the
various loaders) is gone.

Update the copyright/license section to reflect that.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:32 +01:00
Emil Velikov
af7abc512c loader: remove loader_get_driver_for_fd() driver_type
Reminiscent from the pre-loader days, were we had multiple instances of
the loader logic in separate places and one could build a "GALLIUM_ONLY"
version.

Since that is no longer the case and the loaders (glx/egl/gbm) do not
(and should not) require to know any classic/gallium specific we can
drop the argument and the related code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:29 +01:00
Emil Velikov
f9f7e44c94 loader: remove final sysfs codepath in loader_get_device_name_for_fd()
Effectively everyone with actual hardware and/or requesting the
"device_name" requires a working libdrm. Thus they could/should already
be using the (now only) codepath.

Apart from the code simplification, we can slim down our configure.ac
even further. But that will be done in separate patch(es).

Cc: Gary Wong <gtw@gnu.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:26 +01:00
Emil Velikov
4f1c33fd9d travis: remove no longer needed libudev-dev dependency
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:24 +01:00
Emil Velikov
cb23fba3f3 scons: remove all libudev references
Analogous to previous automake/autoconf commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:21 +01:00
Emil Velikov
4a183f4d06 scons: loader: use libdrm when available
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:18 +01:00
Emil Velikov
0607c5b1b0 gbm: remove superfluous/incorrect udev comment
The gbm_device_get_backend_name() provides an (somewhat) internal name
of the implementation/backend used. Is has nothing to do with the udev,
one cannot and should not attempt to derive the name from it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:15 +01:00
Emil Velikov
6b21fdaa8f automake: remove all the libudev references
As of last commit nothing in mesa depends on libudev.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:12 +01:00
Emil Velikov
1e2e625e30 loader: remove libudev_get_device_name_for_fd and related code
With this all the libudev related code is now gone.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:10 +01:00
Emil Velikov
fcdc02f512 loader: reimplement loader_get_user_preferred_fd via libdrm
Currently not everyone has libudev and with follow-up patches we'll
completely remove the divergent codepaths.

Use the libdrm drm device API to construct the required ID_PATH_TAG-like
string, to preserve the current functionality for libudev users and
allow others to benefit from it as well.

v2: Drop ranty comments, pick the correct device
v3: \n -> \0 in PCI_ID_PATH_TAG_LENGTH comment (Axel).
v4: Use snprintf (Nicolai)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-18 17:06:10 +01:00
Emil Velikov
8222100631 loader: annotate __driConfigOptionsLoader as static
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:07 +01:00
Emil Velikov
d561e064a8 loader: separate USE_DRICONF code into separate function
Improves readability and allows us to do further cleanups a lot easier.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:04 +01:00
Emil Velikov
be239326aa loader: slim down loader_get_pci_id_for_fd implementation(s)
Currently mesa has three code paths in the loader - libudev, manual
sysfs and drm ioctl one.

Considering the issues we had with libudev - strip those down in favour
of the libdrm drm device API. The latter can be implemented in any way
depending on the platform and can be reused by others.

v2: Use correct message on drmGetDevice failure. (Nicolai)

Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-18 17:06:04 +01:00
Emil Velikov
fd00aba5f4 configure.ac: mark libdrm as have_pci_id provider
With follow on work, we'll untangle and simplify all the different
codepaths in loader. Then again, we forget to set have_pci_id when
libdrm is present (one of the codepaths available).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:01 +01:00
Ilia Mirkin
8c78fdb328 gm107/ir: fix bit offset of tex lod setting for indirect texturing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-18 09:56:14 -04:00
Ilia Mirkin
ecea2f69ef gm107/ir: fix texturing with indirect samplers
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted by one argument position.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-18 09:56:14 -04:00
Marek Olšák
34099894c3 gallium/tgsi: add missing #include
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 11:20:57 +02:00
Julien Isorce
dbc8e18116 st/va: set default rt formats when calling vaCreateConfig
As specified in va.h, default value should be set on attributes
not present in the input list.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2016-10-18 08:44:14 +01:00
Kenneth Graunke
9f677d6541 i965: Fix gl_InvocationID in dual object GS where invocations == 1.
dEQP-GLES31.functional.geometry_shading.instanced.geometry_1_invocations
draws using a geometry shader that specifies

   layout(points, invocations = 1) in;

and then uses gl_InvocationID.  According to the Haswell PRM, the
"GS Instance ID 0" (and 1) thread payload fields are undefined in
dual object mode:

   "If 'dispatch mode' is DUAL_OBJECT this field is not valid."

But there's no point in using them - if there's only one invocation,
the ID will be 0.  So just load a constant.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 20:22:02 -07:00
Jason Ekstrand
52904ba85c anv: Get rid of anv_cmd_buffer_emit_state_base_address
All code that would have once called this can now call the gen-specific
version.  The switching version is no longer needed.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Jason Ekstrand
7998e37774 anv/cmd_buffer: Move descriptor flushing into genX_cmd_buffer.c
It really should have gone here all along.  We were trying a bit too hard
to make it gen-agnostic just because it didn't have any #if's.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Jason Ekstrand
eddaa237c0 anv/cmd_buffer: Expose ensure_push_constant_*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Jason Ekstrand
1f3e6468d2 anv/cmd_buffer: Unify flush_compute_state across gens
With one small genxml change, the two versions were basically identical.
The only differences were one #define for HSW+ and a field that is missing
on Haswell but exists everywhere else.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Jason Ekstrand
2314c9ed2e anv/cmd_buffer: Move Begin/End/Execute to genX_cmd_buffer.c
vkBeginCommandBuffer and vkCmdExecuteCommands both call into the
gen-specific emit_state_base_address function and vkEndCommandBuffer
belongs with begin.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Jason Ekstrand
ac0ca066de anv/cmd_buffer: Move state base address re-emit into ExecuteCommands
This has two primary advantages.  First, it means that the batch_chain code
knows less about the actual command buffer contents which is good because
improves separation.  Second, it means that it only gets re-emitted once
after all of the secondaries instead of once after each secondary which is
just wasteful.  It also has the advantage of cleaning the code up a bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-17 17:41:35 -07:00
Edward O'Callaghan
1c05f92590 doc/features.txt: factor out radeonsi as GL45 complete
V2. add i965/hsw+ to list
V3. rebased on master.
V4. 'DONE' -> 'DONE ()'.
V5. remove i965/hsw+ from list :/

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-18 09:55:45 +11:00
Ian Romanick
89e1436e2d i965: Silence unused parameter warnings
brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter]
                            gl_shader_stage shader_type,
                                            ^
brw_nir.c: In function ‘brw_nir_lower_vs_inputs’:
brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
                         const struct gen_device_info *devinfo,
                                                       ^
brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter]
                            uint32_t sampler,
                                     ^
brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter]
 vec4_visitor::gs_emit_vertex(int stream_id)
                                  ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-17 11:32:03 -07:00
Ian Romanick
7c0c3740f0 glsl: Remove unused function import_prototypes
Once upon a time, this was used to extract prototypes from the shader
containing GLSL built-in functions.  This was removed by f5692f45 in
November 2010 for Mesa 7.10.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-17 11:32:03 -07:00
Ian Romanick
5c025ea6fc glsl: Remove prototypes for nonexistent functions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-17 11:32:03 -07:00
Ian Romanick
fde48c1262 glsl: Replace assert with unreachable
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-17 11:32:03 -07:00
Lionel Landwerlin
696f5c1853 anv: replace , with ; in anv_batch_emit()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-17 18:16:38 +01:00
Lionel Landwerlin
6b17e3a6da intel: aubinator: use different colors to signal batch start/end
This makes the stream of commands a bit easier to read.

v2 (Ken): Use bold text on green headers for easier readability;
          swap the green and blue headers so the majority stay blue.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-17 18:16:38 +01:00
Nicolai Hähnle
c3ce0d22b4 st/glsl_to_tgsi: fix [ui]vec[34] conversion to double
The corresponding opcodes for integers need to be treated the same as F2D.

Fixes GL45-CTS.gpu_shader_fp64.conversions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:09:45 +02:00
Nicolai Hähnle
1dd99a15a4 st/glsl_to_tgsi: fix atomic counter addressing
When more than one atomic counter buffer is in use, UniformStorage[n].opaque
is set up to contain indices that are contiguous across all used buffers.

This appears to be used by i965 via NIR, but for TGSI we do not treat atomic
counter buffers as opaque, so using the data in the opaque array is incorrect.

Fixes GL45-CTS.compute_shader.resource-atomic-counter.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:09:42 +02:00
Nicolai Hähnle
9d6f82320c st/glsl_to_tgsi: fix a corner case of std140 layout in uniform buffers
See the comment in the code for an explanation. This fixes
GL45-CTS.buffer_storage.map_persistent_draw.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:09:39 +02:00
Nicolai Hähnle
57a1514203 st/mesa: fix fragment shader output mapping
Properly handle the case where there is a gap in the assigned output locations,
e.g. a fragment shader writes to color buffer 2 but not to color buffers 0 & 1.

Fixes GL45-CTS.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:09:37 +02:00
Nicolai Hähnle
e0213f36bb glsl: print non-zero bindings of variables
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:09:33 +02:00
Nicolai Hähnle
9160b4d981 radeonsi: unify the constant load paths
Remove the split between direct and indirect.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:08:45 +02:00
Nicolai Hähnle
51f9b38ce8 radeonsi: fix indirect loads of 64 bit constants
This fixes GL45-CTS.compute_shader.fp64-case3.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:08:36 +02:00
Eric Engestrom
e9864f93c6 gbm: add a couple missing includes
Needed for memset() and drmIoctl().

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-17 08:47:38 -07:00
Iago Toral Quiroga
8785a8ff89 glsl: fail compilation of compute shaders when unsupported
Generally, we only check for the presence of compute shaders during
parsing when we find any language (like layout qualifiers) that are
specific to compute shaders, however, it is possible to define an
empty compute shader does not use any language specific to compute
shaders at all and we should fail the compilation anyway. dEQP checks
this.

This patch adds a check for compute shader availability after we have
parsed the source code. At this point we know the effective GLSL version
and also extensions enabled in the shader.

Fixes a subcase of the following dEQP tests:
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader

The tests still fail because there is one more subcase that fails that needs
another fix.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-17 15:14:12 +02:00
Tapani Pälli
3d48353e29 egl/android: fix error in droid_add_configs_for_visuals()
This was some kind of leftover in commit acd35c8 and format_count
array variable (declared in outer scope) should be used instead.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: acd35c8758 ("egl/android: tweak droid_add_configs_for_visuals()")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-17 11:51:15 +01:00
Marek Olšák
74d145f4a8 radeonsi: shorten "shader->selector" to "sel" in si_shader_create
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-17 12:13:00 +02:00
Marek Olšák
2e74e8ead9 radeonsi: clear DB_RENDER_OVERRIDE
Vulkan doesn't set these fields even though it doesn't use HiS.
HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-17 12:13:00 +02:00
Kenneth Graunke
f30f48476f glsl: Disable textureOffset(sampler2DArrayShadow, ...) in GLSL ES.
This has apparently never existed in GLSL ES.

Fixes dEQP-GLES3.functional.shaders.texture_functions.invalid
.textureoffset_sampler2darrayshadow_vec4_ivec2_vertex and
.textureoffset_sampler2darrayshadow_vec4_ivec2_fragment

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98244
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-16 15:05:00 -07:00
Axel Davy
9baf4505fb st/nine: Fix multisample limit check
Fixes regression introduced by
b560305687

The regression prevents some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-17 00:02:52 +02:00
Eric Anholt
c61eb3c91c vc4: Fix fast clear color packing for 565.
Piglit didn't manage to cover this because fbo-clear-formats uses
scissors, so we don't get fast clearing.
2016-10-16 11:22:50 -07:00
Eric Anholt
46cd3bab93 state_tracker: Fix check for scissor enabled when < 0.
DEQP's clear tests like to give us x + w < 0 or y + h < 0.  Since we
were comparing to an unsigned, it would get promoted to unsigned and come
out as bignum >= width or height and we would clear the whole fb instead
of none of the fb.

Fixes 10 tests under deqp-gles2/functional/color_clear.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-16 11:22:50 -07:00
Chad Versace
07422bf32b egl/surfaceless: Fix comparison between pointer and integer
Fixes GCC warning:
  drivers/dri2/platform_surfaceless.c:196:18: warning: comparison
      between pointer and integer

Fixes: 4b8a55809e ("egl/surfaceless: tweak surfaceless_add_configs_for_visuals()")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-10-16 09:03:31 -07:00
Emil Velikov
d19b014b77 egl/surfaceless: use correct index when accesing the visual
i is used for the driver_configs, while j is for the visuals.

Fixes: 4b8a55809e ("egl/surfaceless: tweak surfaceless_add_configs_for_visuals()")
Reported-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-16 09:03:27 -07:00
Gustaw Smolarczyk
36cb5508e8 radv/winsys: Fail early on overgrown cs.
When !use_ib_bos, we can't easily chain ibs one to another. If the
required cs size grows over 1Mi - 8 dwords just fail the cs so that we
won't assert-fail in radv_amdgpu_winsys_cs_submit later on.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-10-16 12:38:53 +02:00
Kenneth Graunke
493237d4ee glsl: Drop the ES requirement that VS outputs must be flat qualified.
Several conformance tests violate this requirement:

ES31-CTS.core.tessellation_shader.max_patch_vertices
ES31-CTS.core.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through

I submitted a merge request to fix the conformance tests, but Khronos
opted to drop this GLSL ES specific requirement in favor of making flat
qualification of VS outputs optional, matching modern desktop GL.

Note that there were 7 Piglit tests which enforce this rule:
tests/spec/glsl-es-3.00/compiler/interpolation/qualifiers/*nonflat*
but these were deleted in Piglit commit acc0a2fabbd714bc704c16f1675e7c0.

Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=15465#c7
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-15 13:47:47 -07:00
Jason Ekstrand
6ef5a44a43 intel/genxml: Make some PIPE_CONTROL fields booleans
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:50 -07:00
Jason Ekstrand
f34de3e8b0 intel/genxml: Make "Predication enable" a boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:46 -07:00
Jason Ekstrand
468e1042cb intel/genxml; Make "Use Global GTT a boolean
We also remove the redundant zero defaults since everything without an
explicit default gets zeroed automatically.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:43 -07:00
Jason Ekstrand
ce86227175 intel/genxml; Make "Tiled Surface" a boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:39 -07:00
Jason Ekstrand
e6f9637d8a intel/genxml: Make "SO Buffer Enable" fields boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:36 -07:00
Jason Ekstrand
fa0285eaac intel/genxml: Make "Stencil Buffer Enable" a boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:30 -07:00
Jason Ekstrand
34826078f6 intel/genxml: Make a couple of STREAMOUT fields booleans
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:26 -07:00
Jason Ekstrand
6a064ad01d intel/genxml: Make "Include Vertex Handles" and "Include Primitive ID" booleans
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:23 -07:00
Jason Ekstrand
f21d3b4d01 intel/genxml: Make "Vector Mask Enable" a boolean
We also get rid of the "(VME)" a few places

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:19 -07:00
Jason Ekstrand
aee501c87e intel/genxml: Make "Single Program Flow" a boolean
We also get rid of the "(SPF)" a few places.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-15 12:20:14 -07:00
Tobias Klausmann
b7d9677de8 nv50/ir: constant fold OP_SPLIT
Split the source immediate value into new values and move them into the
original defs set by the split. Since we can only have up to 64-bit
immediates, this is largely beneficial for F64 (and, in the future, U64)
operations.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: always use U32, set newi for foldCount tracking]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-14 23:23:57 -04:00
Kenneth Graunke
75128d6ffd i965: Enable OpenGL 4.5.
Everything is in place.  There are still conformance issues to sort out,
but we may as well turn it on in master.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 17:35:13 -07:00
Jason Ekstrand
9d65595c06 anv/pipeline: Remove a meta hack from emit_ds_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
69b2e931d4 anv/image: Create views directly in VkCreate*View
Without meta, we no longer need the _init helpers and the ability to back
an image view with surface states allocated out of the command buffer.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
0a2c375af9 anv/image: Get rid of the usage hacks for meta
Now that meta is gone and we're using blorp, we don't need all of the usage
hacks.  Instead, the usage provided by the app is exactly the usage that we
want because the app is the only thing creating image views.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
8e1a8dd47e anv: Move Create*Pipelines into genX_cmd_buffer.c
Now that we don't have meta, we have no need for a gen-agnostic pipeline
create path.  We can, instead, just generate one Create*Pipelines function
per gen and be done with it.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
7df46b7533 anv/pipeline: Remove support for direct-from-nir shaders
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
6d557ae403 anv: Make entrypoint resolution take a gen_device_info
In order for things such as the ANV_CALL and the ifuncs to work, we used to
have a singleton gen_device_info structure that got assigned the first time
you create a device.  Given that the driver will never be used
simultaneously on two different generations of hardware, this was fairly
safe to do.  However, it has caused a few hickups and isn't, in general, a
good plan.  Now that the two primary reasons for this singleton are gone,
we can get rid of it and make things quite a bit safer.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
4c9dec80ed anv: Get rid of the ANV_CALL macro
This macro was needed by meta in order to make gen-specific calls from
gen-agnostic code.  Now that we don't have meta, the remaining two uses are
fairly trivial to get rid of.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
ac77528f7d anv: Get rid of graphics_pipeline_create_info_extra
Now that we no longer have meta, all pipelines get created via the normal
Vulkan pipeline creation mechanics.  There is no more need for this bit of
extra magic data that we've been passing around.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
dedc406ec8 anv: Get rid of meta
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:40:39 -07:00
Jason Ekstrand
d823f92970 anv: Use blorp for subpass clears
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
51faab487f anv: Use blorp for ClearAttachments
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
c9eaf12de2 anv/hiz: Perform HiZ resolves for all partial renders
If we don't, we can end up with corruption in the portion of the depth
buffer that lies outside the render area when we do a HiZ resolve at the
end.  The only reason we weren't seeing this before was that all of the
meta-based clears such as VkCmdClearDepthStencilImage were internally using
HiZ so the HiZ buffer never truly got out-of-sync.  If the CTS ever tested
a depth upload (which doesn't care about HiZ) and then a partial render we
would have seen problems.  Soon, we will be using blorp to do depth clears
and it won't bother with HiZ so we would get CTS regressions without this.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
58f2315c38 anv: Use blorp for ClearDepthStencilImage
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
29e289fa65 anv/image: Add an isl_view to anv_image_view
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
0340548c8e anv/image: Rework our handling of 3-D image array ranges
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
146ee31159 anv/blorp: Don't hand-roll flush_pipeline_select_3d
When I initially brought up Vulkan blorp, I completely missed that this
was already factored out.  There's no good reason for us to hand-roll it.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
d80c0307ea intel/blorp: Add a flag to make blorp not re-emit dept/stencil buffers
In Vulkan, we want to be able to use blorp to perform clears inside of a
render pass.  If blorp stomps the depth/stencil buffers packets then we'll
have to re-emit them.  This gets tricky when secondary command buffers get
involved.  Instead, we'll simply guarantee that the depth and stencil
buffers we pass to blorp (if any) match those already set in the hardware.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
0cabf93b80 intel/blorp: Add an entrypoint for clearing depth and stencil
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
82a2c49c5f intel/blorp: Emit a NULL render target for depth/stencil-only operations
This never mattered before because the only time we used blorp
depth/stencil only was to do HiZ operations on gen6-7.  It may have worked
in that case (and maybe it didn't) but slow depth clears actually do depth
rendering so they need a valid render target.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
b324c38ae3 intel/blorp: Allow for running without a PS on gen8+
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
81be7be119 intel/blorp: Add an "enabled" bit to surface_info
This gives a slightly smarter way to check whether or not a particular
surface exists than looking at the address.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
bc4bb5a7e3 intel/blorp: Emit more complete DEPTH_STENCIL state
This should now set the pipeline up properly for doing depth and/or stencil
clears by plumbing through depth/stencil test values.  We are now also
emitting color calculator state for blorp operations without an actual
shader because that is where the stencil reference value goes pre-SKL.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
7017742ad7 intel/blorp: Unify the DEPTH_STENCIL emit code across gens
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
cf2e3c3163 intel/blorp: Simplify depth/stencil config
The newly reworked depth/stencil config code can properly handle having
depth, stencil, both, or neither.  We no longer need to predicate it on
having depth or stencil.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
0414aaa133 intel/blorp: Set QPitch for depth and HiZ on gen8+ 2016-10-14 15:39:41 -07:00
Jason Ekstrand
563fa63bf2 intel/blorp: Add support for binding an actual stencil buffer
While we're here, we also make depth without HiZ work.

v2:
 - Use the correct surface type for 1-D on SKL+
 - Set QPitch on BDW+

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
f180faab79 intel/blorp: Move CLEAR_PARAMS setup into emit_depth_stencil_config
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
c1fcf1a957 intel/genxml: Add a uint MOCS field to 3DSTATE_STENCIL_BUFFER
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-14 15:39:41 -07:00
Jason Ekstrand
5dacd3caee intel/blorp: Make the Z component of the primitive adjustable
We want to be able to start doing slow depth clears with blorp.  This
allows us to adjust the depth we're clearing to.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-14 15:39:41 -07:00
Emil Velikov
7cb197c3a8 i915: workaround multiple intelFenceExtension definitions
Due to conflicting symbol names (between i915 and i965) in the
megadriver, we use a set of defines in i915/intel_screen.h.

With a recent commit we've introduced a symbol intelFenceExtension which
has different implementation for each driver, yet we forgot to add the
define.

Fixes: d11515ff1b ("i915/sync: Implement DRI2_Fence extension")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98264
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 19:22:16 +01:00
Chad Versace
cb836b673c docs/specs: Update allocated EGL enum values
Document the EGL enum ranges for Mesa and those values allocated by the
following extensions:

    EGL_MESA_drm_image
    EGL_MESA_platform_gbm
    EGL_MESA_platform_surfaceless
    EGL_WL_bind_wayland_display

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:41 -07:00
Chad Versace
0cfa34c102 doc/specs: Reference the Khronos registry XML
Years ago Khronos replaced the registry's spec files with newfangled XML
files.  Update the reference in doc/specs/enum.txt accordingly.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:40 -07:00
Chad Versace
88b5c36fe1 egl: Move old EGL_MESA_screen_surface spec
It was the lone file in src/egl/docs. Move it to where the other specs
live, in $MESA_TOP/docs/specs.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:40 -07:00
Chad Versace
a597c8ad5b egl: Implement EGL_MESA_platform_surfaceless
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:40 -07:00
Chad Versace
c177ef9d47 egl: Don't advertise unsupported platform extensions
Mesa's set of supported platform extensions depends on the autoconf
option --with-egl-platforms=foo,bar,baz. If --with-egl-platforms lacks
foo, then eglGetPlatformDisplay(EGL_PLATFORM_FOO, ...) unconditonally
fails.

So, if --with-egl-platforms lacks foo, then remove
EGL_VENDOR_platform_foo from the EGL client extension string.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:27 -07:00
Chad Versace
27f4e38173 docs: Add EGL_MESA_platform_surfaceless.txt (v2)
v2:
    - Assign enum values.
    - Define interactions with EGL_EXT_platform_base and EGL 1.4.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 11:19:13 -07:00
Ian Romanick
4246986dec i965: Sort some extension names
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-10-14 11:16:59 -07:00
Jose Fonseca
b12606b693 scons: Fix the Python dependency scanner.
modulefinder wasn't searching for dependencies in the script dir.

It's not capable of detecting the sys.path manipulations scripts do
internally neither.

This change fixes the first issue, and hacks around the second.

Honestly, I've come to the conclusion that automatic Python dependency it will always be
too brittle.   I think we should start manually typing the dependencies
like we do in automake.  At very least it will enable any person to
eyeball and spot/fix missing dependencies, without dig into SCons internals.
2016-10-14 16:52:13 +01:00
Jose Fonseca
c6d17701c8 pipe_loader_sw: Don't invoke Unix close() on Windows.
Trivial.
2016-10-14 16:29:04 +01:00
Emil Velikov
ebffa7b6af Revert "egl/dri2: rework dri2_make_current code flow"
This reverts commit 675719817e.
2016-10-14 16:07:33 +01:00
Mauro Rossi
6eacd69b6f i915: store reference to the context within struct intel_fence (v2)
Porting of the corresponding patch for i965.

Here follows the original commit message by Tomasz Figa:

"As the spec allows for {server,client}_wait_sync to be called without
currently bound context, while our implementation requires context
pointer.

v2: Add a mutex and acquire it for the duration of
    brw_fence_client_wait() and brw_fence_is_completed() as suggested
    by Chad."

NOTE: in i915 all references to 'brw' are replaced by 'intel'

Marshmallow-x86 boots ok with the following results of Android CTS.

Android CTS 6.0_r7 build:2906653
Session     Pass       Fail     Not Executed
0(EGL)      1410       24       0
1(GLES2)    13832      82       0

I get the same results as per i965GM.

[Emil Velikov: Include Mauro's test results]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 15:43:57 +01:00
Mauro Rossi
d11515ff1b i915/sync: Implement DRI2_Fence extension
Here is the porting of corresponding patch for i965,
i.e. commit c636284 i965/sync: Implement DRI2_Fence extension

Here follows part of original commit message by Chad Versace:

"This enables EGL_KHR_fence_sync and EGL_KHR_wait_sync."

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 15:43:53 +01:00
Mauro Rossi
19fa29a592 i915/sync: Replace prefix 'intel_sync' -> 'intel_gl_sync'
This is the porting of corresponding patch for i965,
i.e. commit 2516d83 i965/sync: Replace prefix 'intel_sync' -> 'intel_gl_sync'

The only difference compared to i965 one is that intel_check_sync() was renamed
to intel_gl_check_sync() here, as it is more appropriate.

Here follows original commit message by Chad Versace:

"I'm about to implement DRI2_Fenc in intel_syncobj.c.  To prevent
madness, we need to prefix functions for GL_ARB_sync with 'gl' and
functions for DRI2_Fence with 'dri'. Otherwise, the file will become
a jumble of similiarly named functions.

For example:
    old-name:      intel_client_wait_sync()
    new-name:      intel_gl_client_wait_sync()
    soon-to-come:  intel_dri_client_wait_sync()

I wrote this renaming commit separately from the commit that implements
DRI2_Fence because I wanted the latter diff to be reviewable."

[Emil Velikov: rename the outstanding intel_sync instances]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 15:43:22 +01:00
Emil Velikov
284795616a egl/drm: set eglError and provide an error message on failure
v2: Remove gratuitous newline/semicolon (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
d81ba763e3 egl/x11: attribute for dri2_add_config failure
... in dri2_x11_add_configs_for_visuals().

Currently the latter does not consider that, thus in such cases it adds
"empty" configs in the list.

Properly account for things and as we do that we can reuse count,
instead of calling _eglGetArraySize to determine if we've added any
configs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
0b2b719121 egl/wayland: introduce dri2_wl_add_configs_for_visuals() helper
Analogous to previous commits - with an extra bonus.

Current code, apart from not attributing the lack of 'per visual'
and overall configs also overwrites the newly added config.

Namely if the dpy supports two or more of the supported formats
(XRGB8888, ARGB8888 and RGB565) earlier configs will be overwritten
and the the final one will be stored, since the we use the same index
for all three in our dri2_add_config call.

v2: Use correct comparison in loop conditional (Eric)
    Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
4b8a55809e egl/surfaceless: tweak surfaceless_add_configs_for_visuals()
Analogous to previous commit.

v2: Use correct comparison in loop conditional (Eric)
    Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
acd35c8758 egl/android: tweak droid_add_configs_for_visuals()
Iterate over the driver_configs first in order to cut down the number of
getConfigAttrib() calls by a factor of 5.

While we're here, also drop the sentinel of the visuals array. We
already know its size so we can use that and save a few bytes.

v2: Use correct comparison in loop conditional (Eric)
    Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
36fe5900a4 egl/drm: introduce drm_add_configs_for_visuals() helper
Factor out and rework the existing code so that it prints a debug
message if we have zero configs for any visual.

As a nice side effect we now provide a correct (sequential ID) when
creating a config (via dri2_add_config).

v2: Use correct comparison in loop conditional (Eric)
    Use valid C initializer (Gurchetan)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
23ed073aa4 egl/surfaceless: print out a message on zero configs for given format
Currently we print a debug message if the total configs is non-zero only
to do the same (at an error level) as we return from the function.

Rework the message to print if we're missing a config for the given
format.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
98f5d0106a egl/dri2: set WL_bind_wayland_display in a consistent way
Introduce a helper and use it throughout the platform code. This allows
us to reduce the amount of ifdef(s) and (potentially) use
kms_swrast_dri.so for !drm platforms (namely wayland and x11).

Note: in the future as other platforms (android, surfaceless) support
the extension they can reuse the helper.

v2: Rebase, check for device_name.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:39 +01:00
Emil Velikov
637d001a97 egl/android: remove duplicate KHR_image_base set
The core egl/dri2 already sets the extension bit _only_ when possible -
which in Android's case is always.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:53:38 +01:00
Emil Velikov
9caacb39b9 loader/dri3: constify the loader_dri3_vtable
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:53:35 +01:00
Emil Velikov
fdd373acca egl/dri2: micro optimise dri2_bind_extensions()
Do not loop over all matches if we've already found one.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:46:09 +01:00
Emil Velikov
665cad1658 egl/dri2: annotate dri2_extension_match instances as const data
v2: Rebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:46:05 +01:00
Emil Velikov
3948ad82ce egl/dri2: use dri2_bind_extensions to manage the optional extensions
v2: dri2_bind_extensions() now takes optional as an argument.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:46:03 +01:00
Emil Velikov
d5342c6ff2 gbm: rename gbm_dri_device::{,loader_}extensions
To align with the name used in the EGL and GLX loaders.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:45:54 +01:00
Emil Velikov
38526bd468 egl/dri2: add support for optional extensions in dri2_bind_extensions()
Will allow us to reuse the function for optional extensions and fold a
bit of code.

v2: Make dri2_bind_extensions::optional flag an argument to
dri2_bind_extensions (Kristian).

Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:45:24 +01:00
Emil Velikov
ebc68e3849 egl/dri2: coding style cleanup
Consistently indent with space rather than a mix of tab and
spaces.

v2: Keep the structs properly aligned (Eric).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:43:57 +01:00
Emil Velikov
b10c05d4ff egl/x11: don't crash if dri2_dpy->conn is NULL
The dri3 version of commits 60e9c35b3a and 6de9a03bed.

While using xcb_connect() guarantees that we always get a non NULL
return value, XGetXCBConnection() does/can not.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:42:37 +01:00
Emil Velikov
f871946594 egl/dri2: rework dri2_egl_display::extensions storage
Remove the error prone fixed size array.
While we're here also rename to loader_extensions like in the GLX code.

v2: Rebase. Keep image_loader_extension within the wayland_drm
dri2_loader_extensions list.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:42:22 +01:00
Emil Velikov
f7b8108289 egl/dri2: remove unused dri2_egl_display::{dri2,swrast}_loader_extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:42:18 +01:00
Emil Velikov
e7fcf1b09b egl/x11: don't populate dri2_dpy->swrast_loader_extension
Analogous to earlier commits.

Note: the actual version of the extension is 1, since it does not
implement .putImage2.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:42:02 +01:00
Emil Velikov
2dbe14af1e egl/wayland: don't populate dri2_dpy->swrast_loader_extension
Similar to the dri2 one - the extension stored in struct
dri2_egl_display is unused.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:42:00 +01:00
Emil Velikov
3963a5fc94 egl/x11: don't populate dri2_dpy->dri2_loader_extension
Analogous to the earlier android and wayland patches. As we're here we
can drop exposing the old version of the extension.

Any dri loader/driver interface use lower bound checking thus exposing
dri2 loader v3 to a v2 capable driver is perfectly normal.

v2: Preserve compat with dri2_minor < 1. The driver does not know if
there is a protocol to manage getBuffersWithFormat(). It's up-to the
loader to expose the vfunc if there is one. (Kristian)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:41:56 +01:00
Emil Velikov
d2d579da7e egl/wayland: don't populate dri2_dpy->dri2_loader_extension
Analogous to the earlier android patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:41:51 +01:00
Emil Velikov
31ef5d4452 egl/surfaceless: trivial coding style fixes
Remove a few gratious blank lines and use the correct level of
indentation.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:41:48 +01:00
Emil Velikov
d0155bcbe8 egl/surfaceless: don't check the mask(s) prior to calling dri2_add_config
The latter already does it for us.

As we're here annotate the masks as const and use unsigned for the
index(es).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2016-10-14 12:41:43 +01:00
Emil Velikov
ff700f8c22 egl/surfaceless: remove unused dri2_loader_extension implementation
Earlier commit introduced support for image_loader and left the
dri2_loader code dangling/unused. Let's remove it.

Fixes: 63c5d5c6c4 ("Added pbuffer hooks for surfaceless platform")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2016-10-14 12:17:18 +01:00
Emil Velikov
6a8fe32430 egl/android: don't populate dri2_dpy->dri2_loader_extension
The extension stored in struct dri2_egl_display isn't used, thus we can
create a static const instance of the extension and point extensions[]
to it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:17:18 +01:00
Emil Velikov
675719817e egl/dri2: rework dri2_make_current code flow
Fold duplicate conditional blocks and add a few extra comments ;-)

v2: Bring back the explicit "unbind" logic (Eric), remove NULL derefs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:17:18 +01:00
Emil Velikov
07690a289a egl/dri2: drop NULL checks prior to dri2_destroy_surface
The function already have the respective check within.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 12:17:18 +01:00
Emil Velikov
8cf83f9c08 egl/dri2: call static functions directly, not via _EGLDriver::API
The indirection is meant to be used by the core EGL implementation in
main. Not in the drivers themselves.

Move the dri2_destroy_surface definition to avoid forward declaration of
the static function.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:08 +01:00
Emil Velikov
532ec2edd8 egl/dri2: use dri2_egl_display inline wrapper where possible
This way the only places that reference DriverData are the ones that
manipulate it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:07 +01:00
Emil Velikov
d6dcf3b4ca egl/dri2: bail out on NULL dpy in dri2_display_release()
Currently all callers are careful enough not to do that, yet that will
not be the case in the future.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:06 +01:00
Emil Velikov
8fb9ea413d egl/dri2: move surface refcounting out of the platform code
All the platforms are duplicating what should be a driver/dri2 thing -
refcounting. Just fold it accordingly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:05 +01:00
Emil Velikov
02f1158746 egl/dri2: coding style fix
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:04 +01:00
Emil Velikov
7a9c92d071 egl/dri2: non-shared glapi cleanups
For a while now we require shared glapi for EGL, thus we can drop a
few bits from the olden days. Namely - dlopen(NULL...) is not possible,
error out at build stage if so and drop the guard around dlclose().

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:03 +01:00
Emil Velikov
b349c11098 egl/dri2: glFlush is not optional, treat it as such
The documentation is clear - one must glFlush the old context on
eglMakeCurrent. Thus keeping it optional is not something we should be
doing. Furthermore if we cannot get the entry point we're likely having
a broken setup/stack.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-14 12:16:00 +01:00
Emil Velikov
13bf390657 aubinator: replace pragma once with ifndef guard
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Sirisha Gandikota<sirisha.gandikota@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:45 +01:00
Emil Velikov
ae6fb9c922 anv: error out if anv_genX.h is included by !anv_private.h
Update the comment to reflect the correct filename and add a guard to
catch incorrect inclusion of the header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:43 +01:00
Emil Velikov
08efa6a19f anv: use correct header guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:41 +01:00
Emil Velikov
76ae842366 intel/genxml: use correct header guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:39 +01:00
Emil Velikov
72e70c00f3 intel/common: use correct header guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:37 +01:00
Emil Velikov
0d86c92dcb intel/blorp: use correct header guards
Avoid the discouraged use of pragma once and a missing guard for
blorp_genX_exec.h.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:34 +01:00
Emil Velikov
3a98bffa59 isl: use ifndef header guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:32 +01:00
Emil Velikov
4c1c9d62a9 isl: make locally used functions static
Signed-off-by: Emil Velikov <emil.velikov@collabra.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:30 +01:00
Emil Velikov
4fe6e7f2bd isl: trivial include-what-you-want cleanups
Noticed while skimming through the files.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:28 +01:00
Emil Velikov
eac752e54b isl/gen7: remove unneeded ISL_DEV_GEN check
The function gen7_format_needs_valign2 has two callers - the gen7 only
gen7_choose_valign_el() and isl_gen6_filter_tiling(). The latter of
which already guarding the invocation appropriately.

To be extra cautious add a couple of asserts alongside the removal of the
runtime check.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:25 +01:00
Emil Velikov
5b1efb65ce isl: prefix non-static API with isl_
The rest of ISL already follows this approach. Be consistent and resolve
the final references.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:22 +01:00
Emil Velikov
84f9ef1de4 isl/gen6: correctly check msaa layout samples count
Samples == 1 is a valid value, so returning false is plain wrong.
Seeming copy/paste typo introduced since day 1.

Fixes: afdadec77f ("isl: Implement isl_surf_init() for gen4-gen9")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-14 11:53:15 +01:00
Emil Velikov
c572360c30 automake: add radv to the `make distcheck' hooks
Will allow us to catch issues (as fixed with previous patches) rather
than release a broken tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-10-14 11:09:00 +01:00
Emil Velikov
3fd0cafc1c radv: move AMDGPU_LIBS later in the link chain
At the moment (albeit unlikely) one could get link-time issues, since
libdrm_amdgpu.so is before it's users in the link chain.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-10-14 11:09:00 +01:00
Emil Velikov
a8a5f0a025 radv: correct variable name VISIBILITY_{, C}FLAGS
The letter C was missing, thus in turn all the internal symbols were
exported.

As a result we hide ~150 symbols and cut ~36K from libvulkan_radeon.so.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-10-14 11:09:00 +01:00
Emil Velikov
753a9c989f amd/addrlib: hide private symbols via VISIBILITY_CXXFLAGS
Private/internal symbols should not be exported. Using the CXXFLAGS cuts
~300 exported symbols and ~23K from libvulkan_radeon.so.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
72fa5ca06d intel: automake: replace direct basename $@ invokation with $(@F)
Use the shorthand make variable(s) as elsewhere in the build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
48267b730c gallium: annotate sw_driver_descriptor instance as const data
Already treated and handled as such.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
792148f16a gallium: annotate drm_driver_descriptor instance as const data
Already treated and handled as such.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
c079a206ad gallium: rename drm_driver_descriptor::{, driver_}name
Historically we use "device name" for the name of the kernel module and
"driver name" for the dri/other driver.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
9837cf13b1 gallium: remove unused drm_driver_descriptor::driver_name
Likely unused since day 1, although I've only checked back until the
st/dri unification with commit 29ca7d2c94 ("st/dri: merge dri/drm and
dri/sw backends")

Based on the comment, referencing drmOpenByName it's not something we
want to bring back.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
0f031dcf11 gallium: fix drm_driver_descriptor::name comment
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov
c85b34ffd0 mesa_glinterop: allow building without X and related headers
This commit effectively reverts c10dcb2ce8
and fixes the typedef redefinition which inspired it.

In order to prevent requiring X packages at build time earlier commit
forward declared the required X/GLX typedefs. Since that approach
introduced typedef redefinition (a C11 feature) it was reverted.

To avoid the redefinition while _not_ mandating X and related headers
forward declare the structs and use those through the header.

As anyone uses the mesa interop header they ensure that the X (or others
in terms of EGL) headers are included, which ensures that everything is
resolved within the compilation unit.

Cc: Vinson Lee <vlee@freedesktop.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Fixes: c10dcb2ce8 ("Revert "mesa_glinterop: remove inclusion of GLX
header"")
Fixes: 8472045b16 ("mesa_glinterop: remove inclusion of GLX header")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96770
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-10-14 11:08:59 +01:00
Mark Thompson
0b241b7717 st/va: Fix H.264 PicOrderCnt value
TopFieldPicOrderCnt is exactly the PicOrderCnt value for a frame - see
H.264 section 8.2.1.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:52 +02:00
Mark Thompson
1edaa33135 st/va: Baseline profile is not supported
Constrained baseline profile is supported, so use that instead.  This
matches what the encoder already does (constraint_set1_flag is always
set in the output bitstream).

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:48 +02:00
Mark Thompson
e0604eed9f st/va: Return surface formats depending on config chroma format
This makes the supported format actually match the configuration, and
allows the user to observe that NV12 is supported for video processing
where previously they couldn't (though it did always work if they
blindly tried to use it anyway).

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:44 +02:00
Mark Thompson
e7c7ef3625 st/va: Save surface chroma format in config
Both YUV420 and RGB32 configurations are supported, so we need to be
able to distinguish which is being used.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:40 +02:00
Mark Thompson
8a931c83ba st/va: Return more useful config attributes
The encoder attributes are needed for a user of the encoder to be
able to configure it sensibly without internal knowledge.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:25 +02:00
Mario Kleiner
0c94ed0987 glx: Perform check for valid fbconfig against proper X-Screen.
Commit cf804b4455
('glx: fix crash with bad fbconfig') introduced a check
in glXCreateNewContext() if the given config is a valid
fbconfig.

Unfortunately the check always checks the given config against
the fbconfigs of the DefaultScreen(dpy), instead of the
actual X-Screen specified in the config config->screen.

This leads to failure whenever a GL context is created
on a non-DefaultScreen(dpy), e.g., on X-Screen 1 of
a multi-x-screen setup, where the default screen is
typically 0.

Fix this by using config->screen instead of DefaultScreen(dpy).

Tested to fix context creation failure on a dual-x-screen setup.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-14 10:11:25 +01:00
Tim Rowley
a42c22fdbf swr: [rasterizer core] don't construct pArContext on non-ar builds
Stops debug directory being created on non-ar builds.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
29d07480b8 swr: [rasterizer core] remove WorkerWaitForThreadEvent bucket
Cause of bucket stop capture hang, as threads get stuck in level 1.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
ada27b503e swr: [rasterizer core] move binner functionality to separate file
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
f0a66c1da2 swr: [rasterizer scripts] add DEBUG_OUTPUT_DIR knob
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
ffd0224303 swr: [rasterizer core] fix comment typo
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
4889922210 swr: [rasterizer core/sim] 8x2 backend + 16-wide tile clear/load/store
Work in progress (disabled).

USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths
(emulated on non-AVX512 HW).

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley
bf1f46216c swr: [rasterizer archrast] fix event file issue with saving data
Also, tagging stats with draw id to correlate these events with
draw/dispatch events.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:13 -05:00
Eric Engestrom
827e038062 swr: [rasterizer common] fix assert index
Fixes: b3bd8bb611 ("swr: [rasterizer core] add support
       for "RAW" surface format")
CovID: 1373647
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 21:37:20 -05:00
Ilia Mirkin
5f885225cf docs: mark GL 4.4/4.5 extension groups as DONE for nvc0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-13 21:45:21 -04:00
Ilia Mirkin
afb6dc53bf nv50: enable ARB_enhanced_layouts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-13 21:45:21 -04:00
Ilia Mirkin
a6d6eff2e6 nvc0/ir: be more careful about preserving modifiers in SHLADD creation
src2 was being given the wrong modifier, and we were not properly
managing the modifier on the SHL source either.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-13 21:44:03 -04:00
Brian Paul
3a2869aaca mesa: fix indentation in vertex_attrib_binding()
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
743a526372 mesa: add sanity check assertion in update_array_format
At most, one of the normalized, integer, doubles bools can be true.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
d6b0002195 mesa: remove needless cast in update_array()
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
74745dcfa4 mesa: simplify update_array() with a vao local var
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
0de9265b1f vbo: simplify some code in check_draw_elements_data()
Use the 'vao' local var in more places.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
15fb88e912 mesa: rename gl_vertex_attrib_array gl_array_attributes
The structure contains the attributes of a vertex array.  The old name
was kind of confusing.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
c89802aeea mesa: rename gl_vertex_attrib_array::VertexBinding
Rename to gl_vertex_attrib_array::BufferBindingIndex because this field
is an index into the array of buffer binding points.  This makes some
code a little easier to follow since there's also a "VertexBinding" field
in gl_vertex_array_object.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
c328268b92 mesa: rename some vars in arrayobj.c
Use 'vao' instead of 'obj' to be consistent with other code.
Plus, add a comment.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
2016-10-13 17:38:49 -06:00
Brian Paul
b81546d43c tgsi: fix comment typo in tgsi_ureg.c
Trivial.
2016-10-13 17:38:49 -06:00
Brian Paul
ff00ab745c mesa: replace gl_framebuffer::_IntegerColor wih _IntegerBuffers
Use a bitmask to indicate which color buffers are integer-valued, rather
than a bool.  Also, the old field was mis-computed.  If an integer buffer
was followed by a non-integer buffer, the _IntegerColor field was wrongly
set to false.

This fixes the new piglit gl-3.1-mixed-int-float-fbo test.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-13 17:38:49 -06:00
Brian Paul
a710c21ac2 mesa: remove 'params' parameter from ctx->Driver.TexParameter()
None of the drivers which implement this hook do anything with the
texture parameter value.  Drivers just look at the pname and set a
dirty flag if needed.

We were doing some ugly casting and type conversion to setup the
argument so that all goes away.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-13 17:38:49 -06:00
Eric Anholt
99d790538d vc4: Avoid loading from the texture during non-utile-aligned glTexImage().
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".

However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.

Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
2016-10-13 14:27:30 -07:00
Axel Davy
0717cd975d st/nine: Fix possible segfault in surface ctor
Regression introduced by
ba0274c7d6

Check the resource exists before assigning it
a flag (and use This->base.resource instead
of pResource, since the former may have a newly
allocate resource, while the latter would be
NULL).

This should reintroduce the behaviour of previous
code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-13 21:16:35 +02:00
Axel Davy
98b8ad61c6 st/nine: Remove useless code in nine_shader
Since 1604efa6fd,
lconsti and lconstb don't need to be initialized.

Remove some leftovers from the previous code (which
has now invalid use of ARRAY_SIZE on a pointer instead
of an array).

Reported by Coverity.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-13 21:16:35 +02:00
Axel Davy
197cdd1bbd gallium/os: Use unsigned integers for size computation
Use uint64_t instead of int64_t in the calculation,
as the result is uint64_t.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 21:16:35 +02:00
Samuel Pitoiset
4527222169 nvc0: enable ARB_enhanced_layouts
All ARB_enhanced_layouts piglit tests pass without any changes
in our compiler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-13 21:13:34 +02:00
Dave Airlie
47a7d86fe9 radv: fix the wayland wsi busy bit
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 05:10:02 +10:00
Dave Airlie
a3834ebaf9 anv: fix the wayland wsi busy flag setting
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 05:10:02 +10:00
Tom Stellard
5c66d46d6a radv: Use new image load/store intrinsic signatures v2
These were changed in LLVM r284024.

v2:
  - Only use float types for vdata of llvm.amdgcn.image.store.  LLVM doesn't
    support integer types for this intrinsic.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:48:11 +10:00
Tom Stellard
30e63fb0e4 radv: Fix incorrect comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:48:11 +10:00
Dave Airlie
060e6f468a radv: fix identity swizzle handling
The identity swizzle should operate exactly
like an .r = R, .g = G, .b = B, .a = A swizzle.

This fixes a bunch of the 16-bit BGRA blit tests
dEQP-VK.api.copy_and_blit.blit_image.all_formats.b4g4r4a4*

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:45:57 +10:00
Dave Airlie
8980ac0411 anv/wsi: fix apps that acquire multiple images up front
This fix was found in the radv codebase when running dota2,
no idea if anyone has reported it on anv, but the same problem
occurs.

Once an image is acquired we need to mark it busy.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:45:11 +10:00
Dave Airlie
8bdac874e6 radv/wsi: fix app that acquire multiple images up front
dota2 does multiple acquires followed by multiple queues,
this bug manifested itself as a hang in the xshmfence code
randomly when dota2 was doing it's menus. It also occured
when running dota2 under phoronix-test-suite.

The fix is once the image is acquired to mark it busy then
so nobody else can acquire. We have to trust vulkan apps
that they will eventually submit it.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:45:11 +10:00
Dave Airlie
dfe74fd1a9 anv: initialise and increment send_sbc
At least set this to not be uninitialised memory.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-14 04:45:00 +10:00
Marek Olšák
7dddf0b7ab radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák
e12c1cab5d radeonsi: disable ReZ
This is a serious performance fix. Discovered by luck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák
d4d9ec55c5 radeonsi: implement TC-compatible HTILE
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.

Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.

This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.

I don't see a measurable increase in performance though.

(I tested Talos Principle and DiRT: Showdown, the latter is improved by
 0.5%, which is almost noise, and it originally used layered Z16,
 so at least we know that Z16 promoted to Z32F isn't slower now)

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák
a077185ea9 gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY
For performance tuning in drivers. It filters out window system
framebuffers and OpenGL renderbuffers.

radeonsi will use this to guess whether a depth buffer will be read
by a shader. There is no guarantee about what will actually happen.

This is a departure from PIPE_BIND flags which are defined to be strict
but they are useless in practice.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Nicolai Hähnle
761388a0eb radeonsi: fix regression in image atomics
Caused by a bad rebase when pushing commit 76a940893.
2016-10-13 16:04:16 +02:00
Nicolai Hähnle
d413fbb159 st/mesa: fix vertex elements setup for doubles
Whether one or two slots are taken up by one API array depends on the
vertex shader, not on how the array is configured. When an array is
set up with fewer components than the shader expects, the high components
are undefined.

Fixes GL45-CTS.vertex_attrib_binding.basic-inputL-case1.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-13 15:41:36 +02:00
Nicolai Hähnle
15fc74905b st/glsl_to_tgsi: remove unnecessary ir_instruction argument from get_opcode
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-13 15:41:33 +02:00
Nicolai Hähnle
1d7685e52c st/glsl_to_tgsi: fix textureGatherOffset with indirectly loaded offsets
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-13 15:41:29 +02:00
Nicolai Hähnle
b234e37765 st/glsl_to_tgsi: simplify translate_tex_offset
This fixes a bug with offsets from uniforms which seems to have only been
noticed as a crash in piglit's
arb_gpu_shader5/compiler/builtin-functions/fs-gatherOffset-uniform-offset.frag
on radeonsi.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-13 15:41:11 +02:00
Nicolai Hähnle
76a940893d radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.*
Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-13 10:17:42 +02:00
Nicolas Koch
35e2bfa6d9 radv: Return correct result in EnumeratePhysicalDevices
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE. Since only a single
physical device is supported, this is only the case when
pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-13 09:11:13 +10:00
Ilia Mirkin
e6a693c447 st/mesa: only flip stipple pattern for winsys fbo's
Gallium is completely oblivious to whether the fbo is flipped or not.
Only flip the stipple pattern when the fbo is flipped as well. Otherwise
the driver has no idea when to unflip the pattern.

Fixes bin/gl-2.1-polygon-stipple-fs -fbo

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-12 17:04:16 -04:00
Emil Velikov
a4622305e6 swr: automake: add ar_eventhandlerfile_h.template to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-12 18:55:22 +01:00
Emil Velikov
3c419a941a radv: add all headers to the sources list
Otherwise they'll be missing from the tarball and the build will fail.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-12 18:55:20 +01:00
Ilia Mirkin
a48a343c29 nvc0/ir: fix textureGather with a single offset
Recent fix for non-const offsets broke the case of a single offset (vs 4
offsets). The later code relies on the offs array to contain null values
to tell whether they should be added onto the srcs list.

Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-12 13:18:14 -04:00
Ilia Mirkin
300b5ad023 nv50/ir: copy over value's register id when resolving merge of a phi
The offset needs to be properly copied over to the phi value, otherwise
it will get assigned to the base of the merge instead of the proper
location.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-12 13:18:14 -04:00
Nicolai Hähnle
789119d212 st/mesa: enable ARB_enhanced_layouts and turn the cap on
v2: mark llvmpipe & softpipe properly as well (Jason Wood)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
b5b4aa42ba st/glsl_to_tgsi: adjust swizzles and writemasks for explicit components
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
777dcf81b9 st/glsl_to_tgsi: explicitly track all input and output declaration
In order to be able to emit overlapping input and output array
declarations, we flip the logic of emitting those declarations on its
head: rather than iterating over slots and emitting the corresponding
declarations, we iterate over the declarations from GLSL and emit those.

v2: fix some regressions related to structs
v3: fix a regression in geometry and tessellation shader array handling

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v2)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v2)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
2299a9940c st/glsl_to_tgsi: mark "gaps" in input/output arrays as used
In some cases, a shader may have an input/output array but not use some
entries in the middle. This happens with eON games, for example.

We emit declarations that cover the entire array range even if there are
some unused gaps. This patch now reflects that in the InputsRead etc.
fields to ensure the various input/outputMapping arrays are actually
correct, which will be important when we re-jiggle the way declarations
are emitted.

v2: fix a typo (Edward O'Callaghan)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
63193b9cde st/glsl_to_tgsi: disable on-the-fly peephole for 64-bit operations
This optimization is incorrect with 64-bit operations, because the
channel-splitting logic in emit_asm ends up being applied twice to
the source operands.

A lucky coincidence of how the writemask test works resulted in this
optimization basically never being applied anyway. As far as I can tell,
the only case where it would (incorrectly) have been applied is something
like

    dvec2 d;
    float x = (float)d.y;

which nobody seems to have ever done. But the moral equivalent does occur
in one of the component layout piglit test.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
f5f3cadca3 st/glsl_to_tgsi: simpler fixup of empty writemasks
Empty writemasks mean "copy everything", so we can always just use the number
of vector elements (which uses the GLSL meaning here, i.e. each double is a
single element/writemask bit).

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
957d541089 st/glsl_to_tgsi: explicit handling of writemask for depth/stencil export
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
14aaaa1b4b glsl: dump explicit location when printing IR
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
2b460c750a tgsi/ureg: add ureg_DECL_output_layout
For specifying an exact location/component.

v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
047a7c7a0b tgsi/ureg: add layout/component input declarations
v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
f9a01f3872 tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays
v2: remove a tautological left-over assert (Marek)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle
700a571f89 gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Tom Stellard
b33cb709fd radeonsi: Use the new image load/store intrinsic signatures
This patch requires LLVM r284024 or newer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:43 +00:00
Tom Stellard
ff0df66e10 radeonsi: Add function for converting LLVM type to intrinsic string
The existing function only worked for integer types.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:07 +00:00
Tom Stellard
a96a7eae04 radeonsi: Refactor image store/load intrinsic name creation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:07 +00:00
Marek Olšák
d7e74b52bb winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fences
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
e4bbab9022 radeonsi: fix R600_DEBUG=precompile for shader-db
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
40e1f7e09b radeonsi: use TC write-back instead of full cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák
65a4d55a9f radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Samuel Pitoiset
87b06cab14 nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)
total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs    :335256 -> 335273 (0.01%)
total local used in shared programs   :31968 -> 31968 (0.00%)

                local        gpr       inst      bytes
    helped           0          41         852         852
      hurt           0          44          23          23

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-12 17:46:03 +02:00
Nicolai Hähnle
85ba409967 mapi: fix out-of-tree build dependencies
We shouldn't be using wildcard here in the first place, but changing that
is some effort. As it stands, make -p confirms that glapi_gen_mapi_deps only
contains mapi_abi.py when building outside the Mesa tree.

As a result, only some of the tables were updated when XML files change, but
not the tables for shared glapi. This change ensures that we pick up the
XML files and scripts from the source tree as dependencies also for shared
glapi.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-12 17:36:35 +02:00
Roland Scheidegger
7e86b2ddae draw: initialize shader inputs
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).

No piglit change.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-10-12 15:05:44 +02:00
Edward O'Callaghan
cfbf956dfd radv: trivial case stmt style fixups
Relocate a 'default:' to the end of a case stmt and fix an
indent issue.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2016-10-12 20:12:43 +11:00
Nicolas Koch
fd27d5fd92 anv: Return correct result in EnumeratePhysicalDevices
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE.
Since only a single physical device is supported, this is only the case
when pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-11 22:58:27 -07:00
Kenneth Graunke
2871d4d687 anv: Allow vp_info to be NULL in 3DSTATE_CLIP code.
pViewportState may be NULL if rasterization is disabled.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-11 22:50:19 -07:00
Kenneth Graunke
ba38a9d380 anv: Fix anv_pipeline_validate_create_info assertions.
Many of these can be "NULL if the pipeline has rasterization disabled."
Also, we should assert that pMultisampleState exists.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-11 22:50:09 -07:00
Ilia Mirkin
389d6dedbe trace: add invalidate_resource callback
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-11 20:47:54 -04:00
Gustaw Smolarczyk
c3f3c6b0e8 radv/winsys: Fix radv_amdgpu_cs_grow min_size argument. (v2)
It's supposed to be how much at least we want to grow the cs, not the
minimum size of the cs after growth.

v2: Unbreak use_ib_bos.
    Don't mask the ib_size when !use_ib_bos, since it's not needed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 09:06:30 +10:00
Grigori Goronzy
a22b5f28fb radv: fix strict aliasing violation
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 09:00:22 +10:00
Grigori Goronzy
0b539abcf4 radv: fix uninitialized variables
This gets rid of "may be used uninitialized" compiler warnings.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 09:00:22 +10:00
Grigori Goronzy
7ca44f8a33 radv: add missing unreachable
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 09:00:22 +10:00
Dave Airlie
8cc9f89d26 radv: remove the validation layer and some related bits.
As pointed out by Emil this isn't used in anv anymore,
and it was totally unused in radv anyways.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:57:09 +10:00
Dave Airlie
014ec78fb2 radv: drop entrypoint split out.
radv really doesn't need different dispatch per gen yet,
there really isn't that many differences yet.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:56:41 +10:00
Dave Airlie
12301c5418 radv: drop the RADV_CALL macro.
This is leftover from anv, and we really never needed it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:56:41 +10:00
Dave Airlie
fc28f89157 radv: check driver name before calling amdgpu.
This checks the kernel driver name is amdgpu before calling
libdrm_amdgpu.

This avoids the following error:
amdgpu_device_initialize: DRM version is 1.6.0 but this driver is only compatible with 3.x.x

when run on a machine with i915 graphics as well as amdgpu.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:56:41 +10:00
Dave Airlie
6215b47648 radv: fix memory leak from physical device if wsi fails
Inspired by patch from Edward O'Callaghan <funfunctor@folklore1984.net>
which didn't do it right.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:53:44 +10:00
Edward O'Callaghan
e0641c61ca radv/winsys: Fix mem leak at failed do_winsys_init() call site
Probably unlikely however ensure we don't leak a heap allocation
on the fail path.

V.2:
 also fix missing 'amdgpu_device_deinitialize()' calls (Emil Velikov).

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:46:10 +10:00
Edward O'Callaghan
4a0db58f14 radv/winsys: Trivial style and readability fixups
Drop/add a few newlines where appropriate and drop a couple of
unnessary braces.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-12 08:24:50 +10:00
Marek Olšák
b425b57d1e radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-11 20:04:57 +02:00
Tim Rowley
9db9c61d26 swr: [rasterizer archrast] update proto file
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley
3805e40f32 swr: [rasterizer archrast] add support for stats files
Only stat and counter events are saved to the event files.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley
f4684cdb5f swr: [rasterizer jitter] remove architecture override
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley
185a531206 swr: [rasterizer jitter] adjust jitmanager assert
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:17 -05:00
Tim Rowley
eaec263427 swr: [rasterizer] eliminate unused label warnings on gcc
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
12e6f4c879 swr: [rasterizer core] implement depth bounds test
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
1b86c050ad swr: [rasterizer core] update/add formats
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
a907b7a5f7 swr: [rasterizer core] SwrStoreTiles api change
SwrStoreTiles now takes a mask of surfaces to store.  Reduces
overhead when storing multiple render targets.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
5d5179a6c2 swr: [rasterizer scripts] add ENABLE_ASSERT_DIALOGS knob for windows
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
07326d4006 swr: [rasterizer archrast] add mako template
Add template for generating code to save events to a file.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
e845eeb0be swr: [rasterizer core] disable cull for rect_list
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
b3bd8bb611 swr: [rasterizer core] add support for "RAW" surface format
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
2966d9c691 swr: [rasterizer core] align Macrotile FIFO memory to SIMD size
Align and use streaming store instructions for BE fifo queues.
Provides slightly faster enqueue and doesn't pollute the caches.
Add appropriate memory fences to ensure streaming writes are
globally visible.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
6b3691c876 swr: [rasterizer common] remove threadviz code
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley
2550b04179 swr: [rasterizer memory] split load/store for compile speed
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Eric Engestrom
0a606a400f egl: add eglSwapBuffersWithDamageKHR
EGL_KHR_swap_buffers_with_damage is actually already supported, as it is
technically nothing but a rename of EGL_EXT_swap_buffers_with_damage.

To that effect, both extension are advertised depending on the same
condition, and the new entrypoint simply redirects to the previous one.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-11 14:04:26 +01:00
Mauro Rossi
b9e639589d intel/genxml: fix building rules for aubinator required headers
New generated headers were introduced by commit 63a366a
"intel: aubinator: generate a standalone binary"

Android does not need aubinator yet, so in order to avoid building error,
aubinator required new genxml headers are defined in a separate list.

If required, building rules for Android will be added later.
[Emil Velikov: don't use a _HEADERS variable name (causes warnings)]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-11 13:53:19 +01:00
Emil Velikov
0b54c022a8 radv: automake: move libamdgpu_addrlib.la to VULKAN_LIB_DEPS
The static library is analogous to the intel ISL, which is required for
both hardware and (to be added) testing library.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-11 13:51:09 +01:00
Emil Velikov
4882476eca radv: automake: remove unused variables
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-11 13:51:08 +01:00
Emil Velikov
e2cb253346 radv: automake: include the python scripts/formats table in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-11 13:51:06 +01:00
Tapani Pälli
fc8b358bd6 mesa: fix error handling in _mesa_TransformFeedbackVaryings
Patch changes function to use _mesa_lookup_shader_program_err both
in TransformFeedbackVaryings and GetTransformFeedbackVarying that
handles errors correctly for invalid values of shader program.

Fixes following dEQP test:
   dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.transform_feedback_varyings

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98135
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-11 07:44:33 +03:00
Xu,Randy
d11a63d6e6 i965: solve cubemap negative x/y/z faces buffer offset issue in dEQP.
Add the miptree level/slice x/y_offset when count the surface offset
in brw_emit_surface_state. The surface offset has two parts, one is
from mt->offset, which should be 32 aligned in width/height for tiled
buffer; another is from mt->level[current_level].slice[current_slice].
x/y_offset.

This fix will solve 12 deqp failure
dEQP-EGL.functional.image.create.gles2_cubemap_negative_*_texture

Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-11 07:44:18 +03:00
Nicholas Bishop
64435fd888 i915g: fix incorrect gl_FragCoord value
On Intel Pineview M hardware, the i915 gallium driver doesn't output
the correct gl_FragCoord. It seems to always have an X coord of 0.0
and a Y coord of the window's height in pixels, e.g. 600.0f or such.

I believe this is a regression caused in part by this commit:
afa035031f

The old behavior used the output at index zero, while the new behavior
uses actual zeroes. In the case of gl_FragCoord the output at index
zero happened to be the correct one, so the behavior appeared correct
although the code already had a bug.

Fixed by checking for I915_SEMANTIC_POS when setting up texCoords. If
the generic_mapping is I915_SEMANTIC_POS, look for the
TGSI_SEMANTIC_POSITION instead of a TGSI_SEMANTIC_GENERIC output.

https://bugs.freedesktop.org/show_bug.cgi?id=97477

Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
2016-10-10 18:32:36 -07:00
Vinson Lee
c10dcb2ce8 Revert "mesa_glinterop: remove inclusion of GLX header"
This reverts commit 8472045b16.

Conflicts:

	include/GL/mesa_glinterop.h

This patch fixes this build error with GCC 4.4.

  Compiling src/glx/dri_common_interop.c ...
In file included from src/glx/dri_common_interop.c:33:
include/GL/mesa_glinterop.h:62: error: redefinition of typedef ‘GLXContext’
include/GL/glx.h:165: note: previous declaration of ‘GLXContext’ was here

Fixes: 8472045b16 ("mesa_glinterop: remove inclusion of GLX header")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96770
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2016-10-10 15:09:44 -07:00
Axel Davy
eef0744d43 st/nine: More checks for GetRenderTargetData
Fixes a wine test crash

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
a52e700169 st/nine: Add debug output for lost devices
Add debug output to ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
5d85253dc3 st/nine: Prevent crash in GetRenderTargetData
Return error instead of crashing on source surfaces
with format D3DFMT_NULL.

Fix for issue #236.

Tested on Windows 7.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
09edc0555f st/nine: Set CLAMP_TO_EDGE on cubetextures
Wine tests show that cubetextures always use
PIPE_TEX_WRAP_CLAMP_TO_EDGE regardless of set
sampler states.

Fixes failing d3d9 wine test test_cube_wrap.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
fa2574497b st/nine: handle possible failure of D3DWindowBuffer_create
Check for errors and pass them to the callers.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
f04fa0a62c st/nine: Assert on buffer creation failure
Add an assert to make sure buffer creation doesn't fail.
Add error handling in calling functions.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
f8c01e7a96 st/nine: Use NineDevice9_CreateDepthStencilSurface in swapchain9
Replace custom code with NineDevice9_CreateDepthStencilSurface.
All functionality is given now.
2016-10-10 23:43:51 +02:00
Axel Davy
63367e6c95 st/nine: Fix check and remove useless code in swapchain9
The removed code was there for two reasons:
1) Allow DF16, DF24, INTZ to be used as depth buffer
for swapchain, if the driver doesn't support
PIPE_BIND_SAMPLER_VIEW for the underlying format
2) Set PIPE_BIND_SAMPLER_VIEW if possible, such that
if StretchRect is called on the depth texture, it is happy.

1) The reason these formats needed a workaround is because
the check flags for them in CheckDeviceFormat were incorrect,
which led applications to think the formats were valid for
swapchains, even if they weren't supported.
2) StretchRect limitations for depth buffers force
the resource_copy_region path, which should be fine without
PIPE_BIND_SAMPLER_VIEW.

Thus fix the check for 1), and remove the code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
60624be203 st/nine: Implement MSAA quality levels
Advertise quality levels:
Each supported multisample count matches to one quality level.
The application doesn't know how much samples each quality level has.
For that reason it's not possible to set the multisample mask.

Return errors on quality level missmatch.

Fixes several old games not having multisample support until now.

Fix for issue #73.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
8a50b1244f st/nine: Prepare update_framebuffer for MS quality levels
Compare resource's nr_samples instead of D3D multisample level.
Required for multisample quality levels to work correct.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
b560305687 st/nine: Add additional error handling in CheckDeviceMultiSampleType
Return one supported quality level in error cases.
Return error on invalid multisample count.

Fixes failing wine tests.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
7afab4ad39 st/nine: Fix compiler warning
Use strict aliasing in SetPrivateData and struct pheader.
Casting char[1] to IUnknown** isn't allowed in strict aliasing.
Compute pointer to body by adding size of header to header pointer.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
b9f31111ac st/nine: Remove resource9 {Set/Get/Free}PrivateData functions
Remove {Set/Get/Free}PrivateData in resource9.
Functionality has been implement in IUnknown interface.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
03888e8a46 st/nine: Remove volume9 {Set/Get/Free}PrivateData functions
Remove {Set/Get/Free}PrivateData in volume9.
Functionality has been implement in IUnknown interface.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
485cba7eb4 st/nine: Switch {Set/Get/Free}PrivateData functions
Switch {Set/Get/Free}PrivateData function to introduced IUnknown functions.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
4117f5e1ab st/nine: Implement {Set/Get/Free}PrivateData in iunknown
Implement {Set/Get/Free}PrivateData in iunknown to get rid
of duplicated code in resource9 and volume9.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
c1c8e852c1 st/nine: Return device in NineSurface9_GetContainer
According to MSDN the device is returned for surfaces that do
not have a regular container.

Such surfaces are:
OffscreenPlainSurface, DepthStencilSurface and RenderTarget

Tested and verified on Windows.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
ba0274c7d6 st/nine: Allocate surface resources in surface ctor
Allocate resources in surface ctor.
Allows to use statetracker internal memory accounting.

Fix for issue #231.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Axel Davy
1f65f67b21 st/nine: Fix D3DFMT_NULL size
D3DFMT_NULL is mapped to PIPE_FORMAT_NONE.
Instead of relying on PIPE_FORMAT_NONE to
return a size, pick one.
The one picked is the same than Wine.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
9dc792b95b st/nine: Add debugging output
Add DBG calls to NineTexture9_GetLevelDesc and
NineTexture9_GetSurfaceLevel to ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
8ceb2264c5 st/nine: Fix assert in NineUnknown_QueryInterface
Tests showed that is allowed to call this method on
object that have a zero refcount.
Required for issue #230.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph
f2eacef33d st/nine: Print interface id in NineVolume9_GetContainer
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph
489dbc51ae st/nine: Print interface id in NineSurface9_GetContainer
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph
e63a38832b st/nine: Print interface id in NineUnknown_QueryInterface
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph
6a1cce20b6 st/nine: Move assert in NineSurface9_ctor
Move assert to function entry.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
851e4b8d8a st/nine: Properly declare sampler states for ff
Fixes a softpipe assertion failure with wine tests

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
5ce23c1689 st/nine: Handle user clipping planes properly for ff
Found reading msdn and checking Wine.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
d2fd296648 st/nine: Fix the calculation of the number of vs inputs
Fixes hangs on radeonsi, and assert on llvmpipe.

Signed-off-by: Axel Davy <axel.davy@ens.fr>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:50 +02:00
Axel Davy
71e7292a85 st/nine: Fix specular w coordinate
Found looking at Wine formulas.
Fixes a few visual issues.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
732cea09cd st/nine: Disable parts of lighting calculation if no normal provided
Behaviour found in Wine sources, and checked with some test apps.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
fc9bb19dce st/nine: Fix condition for specular lightning
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
c56c7c1fc8 st/nine: Do always accumulate diffuse
According to spec.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
c5bce80f50 st/nine: Initialize ps ff registers
Found with wine tests for the rTmp register.
Not sure for the other ones.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
4ed3d5ee57 st/nine: Do not pollute rTmp in ff ps
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
d9b8b3196e st/nine: Allocate temporaries on demand for ps ff
Same change than for vs ff.
This makes it easier to not introduce mistakes
reusing temporaries whose result shouldn't be
erased.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
f7dd27aed3 st/nine: Fix texbem
Error found with wine tests.
nine_shader was expecting another order
than the one device9 was using.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
7afcbb49ba st/nine: Fix ff computation for inverse
Thanks to wine tests.
Apparently 4x4 inverse is to be used, and
if the inverse can't be calculated, the
input matrix is to be used.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
36399f9a7f st/nine: Used normed Vtx for reflectionvector
Fix deduced from the spec.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
eda1e6ece7 st/nine: Implement SPHEREMAP
Behaviour checked with a test app.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
a3ddc80ec8 st/nine: Enable passthrough only if positiont is used
Wine tests for the passthrough feature are for positiont.

Nothing seems to indicate passthrough happens when positiont
it not used. However having passthrough with positiont makes
sense (to be used with ProcessVertices outputs).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
0b5bed774b st/nine: Fix wrong mask in ff vs
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
028dab95f6 st/nine: Fix tweening factor computation
The computation was reversed.
Deduced by tests on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
1fe055338d st/nine: Disable ff vertex blending if required inputs are missing
This behaviour has been partially tested on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
aa69bb6848 st/nine: Use materials if source is not given.
Deduced by test on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
ab068a78d3 st/nine: Fix ff SPECULARENABLE
We were (wrongly) adding specular to diffuse
in vertex shaders when SPECULARENABLE was set.

However the spec says specular has to be added
after texture processing (which is in ps).
Besides SPECULARENABLE is flagged as a pixel state.

There was unused support for SPECULARENABLE
in the ps ff code.
Remove the vs code, and use the ps code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
1d7890a441 st/nine: Undefined specular should be full of zeros
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
d9330f9348 st/nine: Implement normal transformation with vertex blending
The formula is different from the one of the spec,
but otherwise nothing particular.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
305e8106ab st/nine: Increase MaxVertexBlendMatrixIndex
Modern cards do advertise 8.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
567be40de9 st/nine: Compact ff vs constants a bit
There are several holes. This patch reduces
the holes a bit, which reduces the size of
the constant buffer uploaded.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
07d1f32e0f st/nine: Fix vertex blending aVtx computation
There was an multiplication by the world matrix 0
which had nothing to do there.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
d9d8cb9f19 st/nine: Reorganize ff vtx processing
The new order simplified the code a bit for
next patches.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
cde74cba71 st/nine: Small simplification for position_t and fog
position_t disables fog computation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
5d2a8e8a36 st/nine: Cleaning code for vs temporaries
This has been a real mess up to now: the temporaries
were allocated once, and shared after that between
the different parts of the code.

To help maintaining the code, the temporaries are now
allocated and released on need.

As surprising as it could be, this patch, which was
supposed to introduce no behaviour change, actually
solved a visual bug observed on a sample program.
This was due to ureg_normalize3 polluting a temporary
variable.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
1f18b6f351 st/nine: No need for the local flag for temporaries in ff
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
eb9ad8f969 st/nine: Handle D3DRS_NORMALIZENORMALS
When this state is set, the normals computed
in the vs ff shader should be normalized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy
b9639c661f st/nine: Initial ProcessVertices support
For now only VS 3 support is implemented.

This enables The Sims 2 to work.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy
3bf02d383f st/nine: Partial software vertex processing support
Software Vertex Processing allows:
. Less limitations for shaders (more loops, etc)
. Less limitations for ff (more enabled lights, 255
matrices for VertexBlend)

In particular shaders can get more constants.
This patch implements support for this (not using software
rendering, but hardware rendering, as llvmpipe and dx10+ hw
have the same limits...)

This is considered a second class path. Even apps asking for
"Mixed Vertex processing" (ie the ability to switch to swvp
on demand) do not use the feature much. Some just initialize
more constants than the normal limit at the start of the
application, but never use more than the normal limit.
When the apps do not need the software vertex processing
features, they do not seem to turn it on. This means it is
ok if that path is slow.
Thus no care has been made to make the path optimized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
f8c8f44244 st/nine: Rework vs int and bool constants buffer
This will help to support swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
a83dce0128 st/nine: Change dirty tracking for vs int and bool constants
This change makes easier to introduce tracking for
swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
f78089b962 st/nine: Drop unused constant upload path
This path has been disabled for some time because
of some bugs with it. It hasn't been updated to the
new features, and is not faster.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:49 +02:00
Axel Davy
1604efa6fd st/nine: Add support for swvp constants in shaders
swvp has relaxed limits (more nested loops, etc).
In particular it enables more constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
56ea3df7d4 st/nine: Initial mixed vertex processing support
In mixed vertex processing, the user can enable or disable
software vertex processing. It is on hardware by default.

This feature is not a state, and thus the setting doesn't
need to be recorded by stateblocks.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
747f1ef8b6 st/nine: Implement SetNPatchMode
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
ded7a73eb3 st/nine: Implement D3DUSAGE_SOFTWAREPROCESSING
Buffers with this flag must be usable with both software
and hardware vertex processing. Use Staging for fast cpu access.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:49 +02:00
Patrick Rudolph
19703f2a36 st/nine: Allocate more space for ATI1
ATIx are "unknown" formats that do not follow block format conventions.
Tests showed that pitch*height bytes are allocated.
apitrace used to depend on this behaviour.
It used to copy more bytes than it has to for the ATI1 block format,
but it didn't crash on Windows.

Increase buffersize for ATI1 to fix this crash.
The same issue was present in WINE but a patch has been sent by me.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Patrick Rudolph
ec6c636722 st/nine: Add missing break
Add missing break instruction.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
03f60a3357 st/nine: Implement relative addressing for ps inputs
To implement the feature we copy the ps inputs to a temp array.
This is not optimal for performance, but it is the simplest solution.

This is a feature that is very very rarely used.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
a5d308e51a st/nine: Wait for pending tasks to execute in swapchain
Fixes crash after Reset() when using thread_submit=true

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
f090705075 st/nine: Use fixed size arrays for swapchain buffers
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Patrick Rudolph
a719800cb8 st/nine: Fix buffer count check for Ex devices
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
9ff0dc3129 st/nine: Disable seamless cubemap for d3d
d3d9 doesn't have seamless cubemap.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
f0ec54ee32 st/nine: Fix some check flags
Uses the new defines introduced in previous commit.
See comment in the commit for more explanation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy
39e98d351f st/nine: Unify some check flags
The new defines will be reused in a later patch.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:48 +02:00
Axel Davy
2290eac84e gallium/util: Really allow aliasing of dst for u_box_union_*
Gallium nine relies on aliasing to work with this function.
Without this patch, dirty region tracking was incorrect, which
could lead to incorrect textures or vertex buffers.
Fixes several game bugs with nine.
Fixes https://github.com/iXit/Mesa-3D/issues/234

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:48 +02:00
Axel Davy
5e7f0ebe29 softpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
softpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy
814ca96d0d llvmpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
llvmpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy
218459771a gallium/os: Fix overflow on 32 bits
On systems with more than 4GB of ram,
os_get_total_physical_memory was triggering an integer
overflow for the linux and haiku path, when on
32 bits.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 23:43:48 +02:00
Axel Davy
9904581dc6 st/nine: Memset pipe_resource templates
Fixes regression introduced by
ecd6fce261
and is more future proof than just clearing the next
field.

Other nine usages did already zero out the templates.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-10 23:43:48 +02:00
Samuel Pitoiset
d43151318a nvc0: fix valid range for shader buffers
When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-10 21:32:16 +02:00
Ilia Mirkin
5239bd5920 nvc0/ir: fix overwriting of value backing non-constant gather offset
Normally the value is an immediate, which is moved to some temporary, so
there's no problem. In the case of a non-constant offset (as allowed by
ARB_gpu_shader5), we have to take care to copy it first before using it
to build up the bits.

This fixes a compilation error observed in F1 2015.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-10 14:28:32 -04:00
Vinson Lee
0a898ec28b glsl: Add missing cache_destroy stub function.
CC       glsl/tests/cache_test.o
glsl/tests/cache_test.c: In function ‘test_cache_create’:
glsl/tests/cache_test.c:160:4: error: implicit declaration of function ‘cache_destroy’ [-Werror=implicit-function-declaration]
    cache_destroy(cache);
    ^

Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an on-disk cache")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-10 11:17:31 -07:00
Anuj Phogat
f8f6f60a36 docs: Mark GL_OES_viewport_array done on i965
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2016-10-10 10:48:38 -07:00
Chad Versace
8044885182 egl: Unify the EGLint/EGLAttrib paths in eglCreateSync* (v3)
Pre-patch, there were two code paths for parsing EGLSync attribute
lists: one path for old-style EGLint lists, used by eglCreateSyncKHR,
and another for new-style EGLAttrib lists, used by eglCreateSync (1.5)
and eglCreateSync64 (EGL_KHR_cl_event2).

There were two attrib_list parsing functions,
  _eglParseSyncAttribList(_EGLSync *sync, const EGLint *attrib_list)
  _eglParseSyncAttribList64(_EGLSync *sync, const EGLattrib *attrib_list)
This patch unifies the two attrib_list parsing functions into one,
  _eglParseSyncAttribList(_EGLSync *sync, const EGLattrib *attrib_list)

Many internal EGLSync function signatures had *two* attrib_list
parameters to accomodate both code paths: one parameter was an EGLint
list and other an EGLAttrib list. At most one of the parameters was
allowed to be non-null.  This patch removes the `EGLint *attrib_list`
parameter, leaving only the `EGLAttrib *attrib_list` parameter, for all
internal EGLSync functions.

v2:
  - Consistently use condition (sizeof(int_list[0]) ==
    sizeof(attrib_list[0])). [for emil]
v3:
  - Don't double-unlock the display in eglCreateSyncKHR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
2016-10-10 09:54:11 -07:00
Eric Anholt
0f99c0686e intel: Fix bash-specific redirection.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-10 09:50:05 -07:00
Eric Anholt
ec9ed1c4d8 gallium: Fix install-gallium-links.mk on non-bash /bin/sh
Debian uses dash by default, which doesn't do '+='.  Fixes servo's
osmesa-based headless testing system, which was looking for libOSMesa in
the lib/ directory.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-10 08:56:12 -07:00
Ilia Mirkin
ec05331a7b nv50/ir: only stick one preret per function
A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use functions in earnest, it could have in some cases
overflowed the call stack, in case a function had a lot of early
returns.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-10 10:45:06 -04:00
Nicolai Hähnle
1f95121626 radeonsi: make more use of si_have_tgsi_compute
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:33 +02:00
Nicolai Hähnle
38cfd5160a gallium/radeon: assign a name to LLVM output variables in debug builds
This can be helpful with R600_DEBUG=preoptir.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:30 +02:00
Nicolai Hähnle
39a29c2431 gallium/radeon: avoid redundant work with overlapping in/out arrays
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:37:50 +02:00
Nicolai Hähnle
77c81164bc radeonsi: support ARB_compute_variable_group_size
Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:36:42 +02:00
Lionel Landwerlin
014bd4acb8 anv: turn on samplerAnisotropy in VkPhysicalDeviceFeatures
According to the Vulkan spec 5.63.4 :

  samplerAnisotropy indicates whether anisotropic filtering is supported. If
  this feature is not enabled, the maxAnisotropy member of the
  VkSamplerCreateInfo structure must be 1.0.

Since we already set maxAnisotropy to 16 and program the hardware according
to the VkSamplerCreateInfo.maxAnisotropy, it seems we can turn this on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-10 09:25:38 +01:00
Edward O'Callaghan
ba43768a1e radv: Use proper header guards over 'pragma once' directives
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-10 16:10:56 +11:00
Tapani Pälli
2d7e0f35c5 mesa: throw error if bufSize negative in GetSynciv on OpenGL ES
Fixes following dEQP tests:

   dEQP-GLES31.functional.debug.negative_coverage.callbacks.state.get_synciv
   dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_synciv
   dEQP-GLES31.functional.debug.negative_coverage.log.state.get_synciv

v2: drop _mesa_is_gles check (Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98133
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-10 07:29:31 +03:00
Tapani Pälli
d997d5c0c9 glsl: prohibit lowp, mediump precision on atomic_uint
Fixes following dEQP tests:

   dEQP-GLES31.functional.debug.negative_coverage.callbacks.atomic_counter.atomic_precision
   dEQP-GLES31.functional.debug.negative_coverage.get_error.atomic_counter.atomic_precision
   dEQP-GLES31.functional.debug.negative_coverage.log.atomic_counter.atomic_precision

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98131
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-10 07:29:31 +03:00
Tapani Pälli
c64093e7d5 glsl: optimize copy_propagation_elements pass
Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.

v2 (idr): Formatting / whitespace fixes.  Embed the acp_ref in the
acp_entry.

v3 (idr): Delete unused copy constructor.  Use while(pop_head) instead
of foreach() { remove }.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-10 07:29:31 +03:00
Dave Airlie
db5d278541 radv: don't build without SHA1.
Just copy the section from anv above this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98167
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-10 10:08:47 +10:00
Edward O'Callaghan
185be15d9d docs/features.txt: Add GL_KHR_robustness supported on ES 3.2
Both radeonsi and nvc0 should also support ES so fixup doc.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-09 01:06:38 +11:00
Lionel Landwerlin
4682abdaa8 intel: aubinator: enable loading dumps from standard input
In conjuction with an intel_aubdump change, you can now look at your
application's output like this :

$ intel_aubdump -c '/path/to/aubinator --gen=hsw' my_gl_app

v2: Add print_help() comment about standard input handling (Eero)
    Remove shrinked gtt space debug workaround (Eero)

v3: Use realloc rather than memcpy/free (Ben)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
2016-10-08 02:18:47 +01:00
Lionel Landwerlin
619c8de522 intel: aubinator: enable loading xml files from a given directory
This might be useful for people who debug with out of tree descriptions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
2016-10-08 02:17:35 +01:00
Lionel Landwerlin
63a366a881 intel: aubinator: generate a standalone binary
Embed the xml files into the binary, so aubinator can be used from any
location.

v2: Split generation packing into another patch (Jason)
    Check for xxd (Jason)

v3: Fix out of tree builds (Jason)
    Generate custom variable name rather than names generated by xxd
    (Lionel)

v4: Move generated _xml.h files to genxml/ (Sirisha)

v5: Remove newline from makefile (Jason)

v6: Add comment on gen*_xml.h creation (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-08 02:17:03 +01:00
Nanley Chery
4d7d9825f3 anv/TODO: Update the HiZ task
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-07 12:54:18 -07:00
Nanley Chery
d8aacc24cc anv: Enable fast depth clears
Provides an FPS increase of ~30% on the Sascha triangle and multisampling
demos.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-07 12:54:18 -07:00
Chad Versace
78d074b87a anv/cmd_buffer: Enable rendering to HiZ
Nanley Chery:
(rebase)
 - Resolve conflicts with new anv_batch_emit macro
(amend)
 - Handle a QPitch TODO
 - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
 - Only use HiZ for single-subpass renderpasses
 - Emit the HiZ instruction before the stencil instruction to follow the
   optimized clear sequence specified in the PRMs
 - Don't modify clear params
 - Enable resolves when a HiZ buffer is used to ensure depth buffer validity

Provides an FPS increase of ~15% on the Sascha triangle and multisampling
demos.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:18 -07:00
Nanley Chery
134d181be1 anv/cmd_buffer: Add code for performing HZ operations
Create a function that performs one of three HiZ operations -
depth/stencil clears, HiZ resolve, and depth resolves.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-07 12:54:18 -07:00
Jason Ekstrand
9919a2d34d anv/image: Memset hiz surfaces to 0 when binding memory
Nanley Chery (amend):
 - Change memset value from 0xff to 0 (a defined value for HiZ).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:18 -07:00
Jason Ekstrand
b4bbabf21b anv: Move BindImageMemory to anv_image.c
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:18 -07:00
Chad Versace
917814dccd anv: Allocate hiz surface
Nanley Chery:
(rebase)
 - Use isl_surf_get_hiz_surf()
(amend)
 - Only add a HiZ surface onto a depth/stencil attachment
 - Add comment above HiZ surface addition
 - Hide HiZ behind INTEL_VK_HIZ prior to BDW
 - Disable HiZ for untested cases
 - Remove DISABLE_AUX_BIT instead of preventing it from being added

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-10-07 12:54:18 -07:00
Chad Versace
3aec432ed3 anv: Add func anv_image_has_hiz()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:17 -07:00
Chad Versace
fe40d026a1 anv: Add anv_image::hiz_surface
Unused.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:17 -07:00
Nanley Chery
814fa12379 isl: Correct a comment in the isl_format enum
HiZ is not a color surface, but an auxiliary depth surface.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 12:54:17 -07:00
Rob Clark
495ba8884a gallium: add missing zero-init for resource templates
Mostly test code, plus one spot I noticed in r600.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 15:50:46 -04:00
Rob Clark
3ebfc44b42 freedreno: don't try to shadow layered textures
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place.  But if we did, it
wouldn't do the right thing so just bail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Rob Clark
f88f025e8c freedreno/a3xx+a4xx: fix clip-plane lowering state
If enabled clip-planes have changed, we need to mark program state
dirty.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Ian Romanick
f546b41f6a glsl: Let cache_test build when the shader cache is not enabled
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Aaron Watry <awatry@gmail.com>
2016-10-07 11:19:37 -07:00
Lionel Landwerlin
eb23de6116 anv: pipeline cache: fix return value of vkGetPipelineCacheData
According to the spec - 9.6. Pipeline Cache :

  If pDataSize is less than the maximum size that can be retrieved by the
  pipeline cache, at most pDataSize bytes will be written to pData, and
  vkGetPipelineCacheData will return VK_INCOMPLETE.

Fixes the following test from Vulkan CTS :

  dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_fragment_stage
  dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_geometry_stage_fragment_stage
  dEQP-VK.pipeline.cache.misc_tests.invalid_size_test

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-07 18:46:12 +01:00
Timothy Arceri
965ebc8b28 util: remove unused variable
Also initialise page at declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 21:24:50 +11:00
Martin Peres
a599b1c203 loader/dri3: import prime buffers in the currently-bound screen
This tries to mirrors the codepath taken by DRI2 in IntelSetTexBuffer2()
and fixes many applications when using DRI3:
 - Totem with libva on hw-accelerated decoding
 - obs-studio, using Window Capture (Xcomposite) as a Source
 - gstreamer with VAAPI

v2:
 - introduce get_dri_screen() in the dri3 loader's vtable (krh)

Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Ionut Biru <biru.ionut@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71759
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
2016-10-07 11:11:55 +03:00
Martin Peres
0247e5ee3e loader/dri3: add get_dri_screen() to the vtable
This allows querying the current active screen from the
loader's common code.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
2016-10-07 11:11:44 +03:00
Jason Ekstrand
82b4f1c47b anv/entrypoints: Save off the entire devinfo rather than a pointer
Since the gen_device_info structs are no longer just constant memory, a
pointer to one is not a pointer to something in the .data section so we
shouldn't be storing it in a static variable.  Instead, we should just
store the entire device_info structure.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-06 21:13:52 -07:00
Dave Airlie
85a47f647e radv: drop all uint for unsigned.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-07 12:09:13 +10:00
Eric Anholt
20d91e5ce9 vc4: Don't worry about partial Z/S clear if the other is already cleared.
We have to be careful to not smash the value they're clearing to, but
other than that we're fine.  Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).

Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
2016-10-06 18:29:16 -07:00
Eric Anholt
cb328123fe vc4: Try to fix the HW-2116 workaround.
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch.  It also failed to count the statechanges from the GFXH-515
workaround.

This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more.  Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.

Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
2016-10-06 18:29:12 -07:00
Eric Anholt
bca9a58d04 vc4: Drop dead argument from vc4_start_draw(). 2016-10-06 18:09:24 -07:00
Eric Anholt
9421a6065c vc4: Fix fallback to quad clears of depth in GLX.
The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.
2016-10-06 18:09:24 -07:00
Eric Anholt
8810270d06 vc4: Add the format name in miptree_debug.
I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.
2016-10-06 18:09:24 -07:00
Eric Anholt
ee577e7fa7 vc4: Fix perf debug formatting on partial Z/S clear. 2016-10-06 18:09:24 -07:00
Eric Anholt
7c7bcbbc7d vc4: Drop destination register when it's unused.
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before.  This commit is just to make QIR
code failing more intelligible when register allocation fails.
2016-10-06 18:09:24 -07:00
Eric Anholt
d4ae5ca823 vc4: Fix live intervals analysis for screening defs in if statements.
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
2016-10-06 18:09:24 -07:00
Eric Anholt
06cc3dfda4 vc4: Fix simulator when more than one vc4_screen is opened.
We would assertion fail in setting up the simulator the second time
around.  This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
2016-10-06 18:09:24 -07:00
Eric Anholt
b30205b112 vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.
Fixes 100 piglit tests since the assertions were added to nir.h.  What's
amazing is that these tests used to pass, even when casting garbage.
2016-10-06 18:09:24 -07:00
Jason Ekstrand
c81ec84c1e anv/cmd_buffer: Move the clear_subpasses calls to set_subpass
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-06 16:52:31 -07:00
Jason Ekstrand
b548fdbed5 anv/cmd_buffer: Don't call set_subpass in a secondary
Initially, we had intended set_subpass to be an interesting function that
did whatever (presumably a lot) setup we needed for a subpass.  In reality,
it just sets a pointer and a dirty bit and then emits depth and stencil
state.  When we call BeginCommandBuffer on a secondary, there's no point in
setting depth and stencil state since it will already be set by the
primary.  Instead, the only thing we need to do at the start of a secondary
is set the subpass pointer and the dirty bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-06 16:52:31 -07:00
Jason Ekstrand
fe4e276b02 anv/cmd_buffer: Rework descriptor dirtying in set_subpass
We have a DIRTY_RENDER_TARGETS flag and that makes a lot more sense than
just dirtying fragment descriptors.  We're checking for it in some of the
gen7 code but unfortunately, nothing was setting it and it didn't do what
it was supposed to do in cmd_buffer_flush_state.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-06 16:52:31 -07:00
Jason Ekstrand
a1db0e87ff anv/wsi: Advertise UNORM formats as well as sRGB
Because WSI images are created with VkImageCreateInfo::flags explicitly set
to 0, they don't ever have the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set.
This means that you can't create an image view of it with a different
format so applications can't render directly in sRGB (without automatic
encoding) unless we actually advertise UNORM formats.  There are a lot of
applications that want to do their own sRGB conversion, so we should allow
for that.  We do, however, make UNORM come after sRGB in the list so that
the default for dumb apps that just grab the first thing is to render in
linear and let the sRGB conversion happen automatically.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-06 16:52:31 -07:00
Dave Airlie
5267124648 radv: fix configure.ac check
This should be positive test.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-07 09:28:03 +10:00
Gustaw Smolarczyk
24815bd7b3 radv: Skip already signalled fences.
If the user created a fence with VK_FENCE_CREATE_SIGNALED_BIT set, we
shouldn't fail to wait for a fence if it was not submitted since that is
not necessary.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-07 09:24:09 +10:00
Dave Airlie
f4e499ec79 radv: add initial non-conformant radv vulkan driver
This squashes all the radv development up until now into
one for merging.

History can be found:
https://github.com/airlied/mesa/tree/semi-interesting

This requires llvm 3.9 and is in no way considered
a conformant vulkan implementation. It can run a number
of vulkan applications, and supports all GPUs using
the amdgpu kernel driver.

Thanks to Intel for providing anv and spirv->nir,
and Emil Velikov for reviewing build integration.

Parts of this are:
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Authors: Bas Nieuwenhuizen and Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-07 09:16:09 +10:00
Samuel Pitoiset
28ecd3eac2 nv50/ir: fix wrong check when optimizing MAD to SHLADD
Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced few days ago which breaks
NV50 because SHLADD is not supported there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-07 01:13:06 +02:00
Lionel Landwerlin
0b10152b80 intel: aubinator: use getopt to parse arguments
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <sirisha.gandikota@intel.com>
2016-10-07 00:05:56 +01:00
Samuel Pitoiset
a198883bf7 nvc0: dump program binary only when NV50_PROG_DEBUG is set
When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 01:01:17 +02:00
Jason Ekstrand
325b3fd668 nir: Fix the control flow tests for nir_loop_first_block changes
Commit 2ed17d46de changed
nir_loop_first_cf_node and friends to return a nir_block instead of a
nir_cf_node.  This broke one of the NIR control flow tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98128
2016-10-06 15:48:30 -07:00
Samuel Pitoiset
e3f586c98d docs: mark ARB_compute_variable_group_size as done for nvc0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
56a0bed2c1 nvc0: expose ARB_compute_variable_group_size
Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
11e75fffeb nv50/ir: set number of threads/block for variable local size
When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

This allows to use 64 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
590734fa0d st/mesa: expose ARB_compute_variable_group_size
This extension is only exposed if the underlying driver supports
ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK
is set.

v3: - initialize max_variable_threads_per_block to 0
v2: - expose the ext based on that new cap

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
dfd7734cb7 st/mesa: add support for dispatching a variable local size
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
e78bd48b9c st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE
gl_LocalGroupSizeARB can be translated into TGSI_SEMANTIC_BLOCK_SIZE
which represents the block size in threads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
07bb4513c6 gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
v3: - use a new case statement in r600_pipe_common.c
    - fix compilation of softpipe...

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
48de9aaa72 glsl: add gl_LocalGroupSizeARB as a system value
v2: - only add it if the ext is enabled (Ilia)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
dee627a16e glsl/linker: handle errors when a variable local size is used
Compute shaders can now include a fixed local size as defined by
ARB_compute_shader or a variable size as defined by
ARB_compute_variable_group_size.

v2: - update formatting spec quotations (Ian)
    - various cosmetic changes (Ian)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
008e785f74 glsl: reject compute shaders with fixed and variable local size
The ARB_compute_variable_group_size specification explains that
when a compute shader includes both a fixed and a variable local
size, a compile-time error occurs.

v2: - update formatting spec quotations (Ian)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
dd2bda7002 glsl: process local_size_variable input qualifier
This is the new layout qualifier introduced by
ARB_compute_variable_group_size which allows to use a variable work
group size.

v4: - add missing '%s' in the monster format string

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
d5c8481d57 glsl: add enable flags for ARB_compute_variable_group_size
This also initializes the default values for the standalone compiler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
45ab63c0cb mesa/main: add support for ARB_compute_variable_groups_size
v5: - replace fixed_local_size by !LocalSizeVariable (Nicolai)
v4: - slightly indent spec quotes (Nicolai)
    - drop useless _mesa_has_compute_shaders() check (Nicolai)
    - move the fixed local size outside of the loop (Nicolai)
    - add missing check for invalid use of work group count
v2: - update formatting spec quotations (Ian)
    - move the total_invocations check outside of the loop (Ian)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset
a063f3084a glapi: add entry points for GL_ARB_compute_variable_group_size
v2: - correctly sort that new extension (Ian)
    - fix up the comment (Ian)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Karol Herbst
f96945c5b5 nv50/ir: optimize sub(a, 0) to a
helped some ue4 demos and divinity OS shaders

total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs    : 379273 -> 379273 (0.00%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

                local        gpr       inst      bytes
    helped           0           0          33          33
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2016-10-06 19:39:51 +02:00
Brian Paul
6963f94e98 st/mesa: move all sampler view code into new st_sampler_view.[ch] files
Previously, the sampler view code was scattered across several different
files.

Note, the previous REALLOC(), FREE() for st_texture_object::sampler_views
are replaced by realloc(), free() to avoid conflicting macros in Mesa vs.
Gallium.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-06 11:29:32 -06:00
Brian Paul
e5cc84dd43 st/mesa: optimize pipe_sampler_view validation
Before, st_get_texture_sampler_view_from_stobj() did a lot of work to
check if the texture parameters matched the sampler view (format,
swizzle, min/max lod, first/last layer, etc).  We did this every time
we validated the texture state.

Now, we use a ctx->Driver.TexParameter() callback and a couple other
checks to proactively release texture views when we know that
view-related parameters have changed.  Then, the validation step is
simplified:
- Search the texture's list of sampler views (just match the context).
- If found, we're done.
- Else, create a new sampler view.

There will never be old, out-of-date sampler views attached to texture
objects that we have to test.

Most apps create textures and set the texture parameters once.  This
make sampler view validation much cheaper for that case.

Note that the old texture/sampler comparison code has been converted
into a set of assertions to verify that the sampler view is in fact
consistent with the texture parameters.  This should help to spot any
potential regressions.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-06 11:29:32 -06:00
Brian Paul
0f3aee888e mesa: call ctx->Driver.TexParameter() in texture_buffer_range()
To inform drivers of texture buffer offset/size changes, as we do for
other texture object parameters.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-06 11:29:32 -06:00
Brian Paul
b3127a96a9 st/mesa: consolidate view format setup code
Before, we had code to compute the sampler view's format spread across two
different functions: in update_single_texture() and
st_get_texture_sampler_view_from_stobj().  Now it's all in one new function.

Also, use _mesa_texture_base_format() to simplify the code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-06 11:29:32 -06:00
Brian Paul
628e651f64 st/mesa: add some const qualifiers in st_atom_texture.c
And minor code reformatting.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-06 11:29:32 -06:00
Brian Paul
b3c8935165 st/mesa: simplify some code in get_texture_format_swizzle()
There's no need to cast to st_texture_image.  Just use gl_texture_image.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-06 11:29:31 -06:00
Brian Paul
9add37b100 mesa: make _mesa_texture_buffer_range() static
Not called from any other file.  Also, add a comment.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-06 11:29:31 -06:00
Brian Paul
92188c207e mesa: add const qualifier, comment on can_avoid_reallocation()
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-06 11:29:31 -06:00
Brian Paul
57279c5454 mesa: add comment/assertion on get_tex_level_parameter_buffer()
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-06 11:29:31 -06:00
Jason Ekstrand
ae032e5ea6 nir: Remove some no longer needed asserts
Now that the NIR casting functions have type assertions, we have a bunch of
assertions that aren't needed anymore.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:39 -07:00
Jason Ekstrand
2ed17d46de nir: Make nir_foo_first/last_cf_node return a block instead
One of NIR's invariants is that control flow lists always start and end
with blocks.  There's no good reason why we should return a cf_node from
these functions since we know that it's always a block.  Making it a block
lets us remove a bunch of code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:37 -07:00
Jason Ekstrand
7a3bcadf4e nir: Add asserts to the casting functions
This makes calling nir_foo_as_bar a bit safer because we're no longer 100%
trusting in the caller to ensure that it's safe.  The caller still needs to
do the right thing but this ensures that we catch invalid casts with an
assert rather than by reading garbage data.  The one downside is that we do
use the casts a bit in nir_validate and it's not a validate_assert.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:24 -07:00
Steven Toth
e00fdd643b gallium/hud: Remove superfluous debug
No longer required.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:37:06 +01:00
Emil Velikov
03350c9708 amd: add amd_kernel_code_t.h to the sources list
Otherwise it won't be picked in the tarball and the build will fail.

Fixes: 91ec6e5664 ("radeonsi/compute: Use the HSA abi for non-TGSI
compute shaders v3")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:17:51 +01:00
Emil Velikov
b634be0e69 svga: add svga_mksstats.h to the sources list
Otherwise it won't be picked in the tarball and the build will fail.

Fixes: 0035f7f136 ("svga: add guest statistic gathering interface")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:17:09 +01:00
Emil Velikov
78a7415f0b glx: rename choose_visual(), drop const argument
The function deals with fb (style) configs, thus using "visual"
in the name is misleading. Which in itself had led to the use of
fbconfig_style_tags argument.

Rename the function to reflect what it does and drop the unneeded
argument.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-06 15:03:47 +01:00
Emil Velikov
2e9e05dfca glx: return GL_FALSE from glx_screen_init where applicable.
Return GL_FALSE if we fail to find any fb/visual configs, otherwise we
end up with all sorts of chaos further down the GLX stack.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-06 15:03:47 +01:00
Emil Velikov
e542ed463d glx: correctly mask the drawableType for GLX_ARB_fbconfig_float
The comment/spec says - only for pbuffer drawables, while the code
clears the window/pixmap bit. Practise what you preach and apply the
trivial tweak. In practise this should not cause functional change.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-06 15:03:46 +01:00
Chuck Atkins
a89faa2022 autoconf: Make header install distinct for various APIs (v2)
This fixes a problem where GL headers would only get installed if
glx was enabled.  So if osmesa was enabled but not glx, then the
GL headers required by osmesa would be missing from the install.

v2: Dropped unneeded mesa_glinterop.h redundant osmesa.h install

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
0216a16819 mesa: annotate AttribFuncsARB[] as const
It's read-only data, so annotate it accordingly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
0728e2bb17 mapi/glapi: remove unused _glapi_check_table()
Similar to earlier commit - symbol was never part of the public API so
we're safe to remove it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
96b9ec1ea3 glapi/hgl: remove the final user of _glapi_check_table()
The symbol is a no-op since, the EXTRA_DEBUG macro is not set in the
build. Unused by !Haiku people/platforms since 2010 (commit
a73c6540d9) while the Haiku C++ wrapper
has no obvious users.

Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
79835565c3 mapi/glapi: remove unused _glapi_check_table_not_null
Function was never part of the API/ABI and the final user was removed
with commit a73c6540d9, back in 2010.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
9b7fd4080a st/xvmc/tests: force enable assertions
Similar to the other 'tests', enable assertions in xvmc_bench.

This silences the GCC warnings about unused-variable(s), makes the
program actually useful, as the XvMC API called. Atm the function
calls are omitted, since they're called within the assert.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
0b6837a643 anv: automake: ship intel_icd.json.in in the tarball
Otherwise we'll fail to (re)generate intel_icd.json.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Emil Velikov
a42115d6e2 intel: automake: reference the correct header
The header was renamed with earlier commit, so update the
Makefile.sources respectively.

{vulkan/genX_multisample.h => common/gen_sample_positions.h}

Fixes: c779ad3e661("intel: Move Vulkan sample positions to common code")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Lionel Landwerlin
b84234fd28 intel: aubinator: add missing return characters
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-06 10:39:53 +01:00
Kenneth Graunke
f7659e02c3 nir: Delete open coded type printing.
glsl_print_type() prints arrays of arrays incorrectly.  For example,
a type with name float[3][7] would be printed as float[7][3].  (This
is an array of length 3 containing arrays of 7 floats.)  cdecl says
that the type name is correct.

glsl_print_type() doesn't really do anything above and beyond printing
type->name, and glsl_print_struct() wasn't used at all.  So, drop them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-06 02:13:36 -07:00
Philipp Zabel
0408d50f43 anv: fix GetPhysicalDeviceProperties to return timestampPeriod in ns
According to chapters 16.5. (Timestamp Queries) and 30.2 (Limits) of the
Vulkan Specification 1.0.29, the .limits.timestampPeriod field returned
by vkGetPhysicalDeviceProperties is measured in nanoseconds, not in
seconds.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 02:02:35 -07:00
Timothy Arceri
88428fbe41 i965: remove remaining tabs in brw_draw.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:04:16 +11:00
Timothy Arceri
7627fbd9b0 i965: get inputs read from nir info
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.

This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:04:09 +11:00
Timothy Arceri
7ef8286487 i965: get outputs written from nir info
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.

This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:04:03 +11:00
Timothy Arceri
b526a9b708 i965: get outputs read from nir info
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.

This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:03:57 +11:00
Timothy Arceri
a38c809f6e i965: remove remaining tabs in brw_wm.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:03:52 +11:00
Timothy Arceri
201f940d2e mesa: remove the UsesDFdy flag
Seems the last user of this was removed in 08bc74e69.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:03:46 +11:00
Timothy Arceri
556335eb99 i965: get uses discard from nir info
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.

This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:03:40 +11:00
Timothy Arceri
ee829cba8e i965: get uses texture gather from nir info
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.

This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-06 16:03:00 +11:00
Kenneth Graunke
a85a8ecd32 i965: Eliminate brw->cs.prog_data pointer.
Just say no to:

-   brw->cs.base.prog_data = &brw->cs.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_cs_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:35 -07:00
Kenneth Graunke
16d5536e55 i965: Eliminate brw->wm.prog_data pointer.
Just say no to:

-   brw->wm.base.prog_data = &brw->wm.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_wm_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:35 -07:00
Kenneth Graunke
ff366f3db4 i965: Eliminate brw->gs.prog_data pointer.
Just say no to:

-   brw->gs.base.prog_data = &brw->gs.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_gs_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:33 -07:00
Kenneth Graunke
e512941537 i965: Eliminate brw->tes.prog_data pointer.
Just say no to:

-   brw->tes.base.prog_data = &brw->tes.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tes_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:09 -07:00
Kenneth Graunke
82c97ac710 i965: Eliminate brw->tcs.prog_data pointer.
Just say no to:

-   brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tcs_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:09 -07:00
Kenneth Graunke
40258a13d5 i965: Eliminate brw->vs.prog_data pointer.
Just say no to:

-   brw->vs.base.prog_data = &brw->vs.prog_data->base.base;

We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:21:06 -07:00
Kenneth Graunke
e51e055fcd i965: Introduce downcast helpers for prog_data structures.
Similar to brw_context(...), intel_texture_object(...), and so on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-05 19:20:42 -07:00
Chad Versace
74b02a7449 i965/sync: Rename awkward variable
What is the difference between a 'driver_fence' and a 'fence'? Do the
characters 'driver_' add anything helpful? Nope. They do, though, add an
extra 7 chars and pull your eyeballs away to ask "huh? what's that?" one
microsecond too many.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-05 17:09:25 -07:00
Chad Versace
a99ff82714 i965/sync: Rename intel_syncobj.c -> brw_sync.c
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-05 17:09:25 -07:00
Chad Versace
9ea48fc877 i965/sync: Replace 'intel' prefix with 'brw'
This is yet another patch for the great renaming begun long ago.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-05 17:09:24 -07:00
Chad Versace
ce1d67c2e5 i965/sync: Fix uninitalized usage and leak of mutex
We locked an unitialized mutex in the callstack
    glClientWaitSync
    intel_gl_client_wait_sync
    brw_fence_client_wait_sync
because we forgot to initialize it in intel_gl_fence_sync.
(The EGLSync codepath didn't have this bug. It initialized the mutex in
intel_dri_create_sync).

We also forgot to tear down (mtx_destroy) the mutex when destroying
the sync object.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-05 17:09:24 -07:00
Jason Ekstrand
28ab2570c8 nir: Use the correct infos structure for copying atomic sources
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: "12.0" <mesa-dev@lists.freedestkop.org>
2016-10-05 13:04:54 -07:00
Samuel Pitoiset
a41cfbbf2b nvc0: dump program binary when chipset has been forced
Currently, program binaries are only dumped at upload time, but
when the chipset has been forced via NV50_PROG_CHIPSET we might
want to show the generated code, especially with shaderdb.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-05 21:15:44 +02:00
Marek Olšák
cc4a19c4ad radeonsi: fix texture border colors for compute shaders
There are VM faults without this.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák
844f8268e1 gallium/radeon/winsyses: set reasonable max_alloc_size
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.

This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák
1b37e5541c radeonsi: fix interpolateAt opcodes for .zw components
Not returning garbage in .zw seems pretty important.

This fixes:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.*

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
300a8221e9 radeonsi: add assertions to validate interpolation flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
d4a8bf89ce radeonsi: interpolate colors after interpolation weight shuffling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
faee2d6dda tgsi/scan: don't set interp flags for inputs only used by INTERP (v2)
(v1 pushed, then reverted)

This fixes 9 randomly failing tests on radeonsi:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

v2: use input_interpolate[input] (correct) instead of
    input_interpolate[index] (incorrect)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák
10e5f126dd ddebug: dump most driver information with GALLIUM_DDEBUG=always
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Karol Herbst
d8bcd3ef37 nv50/ra: let simplify return an error and handle that
fixes a crash in the case simplify reports an error

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-05 19:11:42 +02:00
Nanley Chery
f315c4f189 intel/blorp: Use documented RECTLIST vertex positions
Use the vertex positions described in the PRMs. This has no effect on
rendering but quiets the simulator warnings seen when the vertices
appear out of order.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-10-05 09:41:21 -07:00
Jason Ekstrand
e3a1d33077 anv/meta: Roll clear_image into CmdClearDepthStencilImage
It is now the only caller so there's no sense in keeping things split out.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-05 09:33:44 -07:00
Jason Ekstrand
f027609a64 anv: Use blorp for VkCmdFillBuffer
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-05 09:33:44 -07:00
Kyle Brenneman
ca9f26ac6f egl: Implement EGL_KHR_debug (v2)
Wire up the debug entrypoints to EGL dispatch, and add the extension
string to the client extension list.

v2:
- Lots of style fixes
- Fix missing EGLAPIENTRYs
- Factor out valid attribute check
- Lock display in eglLabelObjectKHR as needed, and use RETURN_EGL_*
- Move "EGL_KHR_debug" into asciibetical order in client extension
  string

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.veliko@collabora.com>
2016-10-05 11:41:26 -04:00
Kyle Brenneman
6a5545d3ba egl: Track EGL_KHR_debug state when going through EGL API calls (v3)
This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.

This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.

v2:
- Clear the object label in more cases in _eglSetFuncName
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval

v3:
- Set dummy thread's CurrentAPI to EGL_OPENGL_ES_API not zero
- Less ?: in _eglSetFuncName

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.veliko@collabora.com>
2016-10-05 11:40:51 -04:00
Lionel Landwerlin
f8b861a867 intel: aubinator: pack supported generations into an array
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-05 16:23:28 +01:00
Ben Widawsky
2dc06e2324 i965/l3: Add explicit way size calculation for bxt
There should be no functional change here because Broxton and CHV are
both gt1. Without this code however, it might seem like broxton support
is missing.

While here, put the gt1 check in front to hopefully short-circuit the
condition for the mobile cases.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-05 07:57:58 -07:00
Nicolai Hähnle
11cc59afca virgl: Fix build regression of commit 8a943564 2016-10-05 16:27:29 +02:00
Nicolai Hähnle
0cba7b771a st/mesa: enable GL_KHR_robustness
The difference to the virtually identical ARB_robustness (which is already
enabled unconditionally) is miniscule and handled elsewhere, but this cap
seems like the right thing to require for this extension.

v2: drop the device reset cap requirement (Ilia)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-05 15:51:59 +02:00
Nicolai Hähnle
b5cd7dfe3e gallium/radeon: implement set_device_reset_callback
Check for device reset on flush. It would be nicer if the kernel just
reported this as an error on the submit ioctl (and similarly for fences),
but this will do for now.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:56 +02:00
Nicolai Hähnle
a1fa8b731f st/mesa: set a device reset callback when available
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:53 +02:00
Nicolai Hähnle
d856130025 st/mesa: extract conversion from pipe_reset_status to GLenum
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:49 +02:00
Nicolai Hähnle
07bea09c64 ddebug: add pass-through of set_device_reset_callback
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:47 +02:00
Nicolai Hähnle
1a3c75e30e gallium: add pipe_context::set_device_reset_callback
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:51:34 +02:00
Nicolai Hähnle
8a943564fd virgl: use the new parent/child pools for transfers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:42:22 +02:00
Nicolai Hähnle
2a83036fe2 vc4: use the new parent/child pools for transfers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:42:20 +02:00
Nicolai Hähnle
0334ba150f freedreno: use the new parent/child pools for transfers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:42:17 +02:00
Nicolai Hähnle
616e36674a r300: use the new parent/child pools for transfers (v2)
v2: slab_alloc_st -> slab_alloc

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:42:13 +02:00
Nicolai Hähnle
e56e1f8119 gallium/radeon: use the new parent/child pools for transfers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-05 15:42:07 +02:00
Nicolai Hähnle
d8cff811df util/slab: re-design to allow migration between pools (v3)
This is basically a re-write of the slab allocator into a design where
multiple child pools are linked to a parent pool. The intention is that
every (GL, pipe) context has its own child pool, while the corresponding
parent pool is held by the winsys or screen, or possibly the GL share group.

The fast path is still used when objects are freed by the same child pool
that allocated them. However, it is now also possible to free an object in a
different pool, as long as they belong to the same parent. Objects also
survive the destruction of the (child) pool from which they were allocated.

The slow path will return freed objects to the child pool from which they
were originally allocated. If that child pool was destroyed, the corresponding
page is considered an orphan and will be freed once all objects in it have
been freed.

This allocation pattern is required for pipe_transfers that correspond to
(GL) buffer object mappings when the mapping is created in one context
which is later destroyed while other contexts of the same share group live
on -- see the bug report referenced below.

Note that individual drivers do need to migrate to the new interface in
order to benefit and fix the bug.

v2: use singly-linked lists everywhere
v3: use p_atomic_set for page->u.num_remaining

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894
2016-10-05 15:40:40 +02:00
Nicolai Hähnle
8915f0c0de util: use GCC atomic intrinsics with explicit memory model
This is motivated by the fact that p_atomic_read and p_atomic_set may
somewhat surprisingly not do the right thing in the old version: while
stores and loads are de facto atomic at least on x86, the compiler may
apply re-ordering and speculation quite liberally. Basically, the old
version uses the "relaxed" memory ordering.

The new ordering always uses acquire/release ordering. This is the
strongest possible memory ordering that doesn't require additional
fence instructions on x86. (And the only stronger ordering is
"sequentially consistent", which is usually more than you need anyway.)

I would feel more comfortable if p_atomic_set/read in the old
implementation were at least using volatile loads and stores, but I
don't see a way to get there without typeof (which we cannot use here
since the code is compiled with -std=c99).

Eventually, we should really just move to something that is based on
the atomics in C11 / C++11.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-05 15:39:39 +02:00
Lionel Landwerlin
d51c1f9d51 i965: use L3 data cache for SSBOs
Anv programs the hardware to use L3 data cache if we use either SSBOs or
images in the shaders, we can program i965 the same way.

gl_shader_program has a bit of a confusing named field with
'NumAtomicBuffers'. It doesn't tell how many buffers are accessed by the
shader in an atomic way but instead the number of atomic counters
manipulated by the shader.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-05 12:24:04 +01:00
Kenneth Graunke
a40640f530 mesa: Raise INVALID_ENUM in FramebufferTexture*D for unknown textargets.
ES3-CTS.functional.negative_api.buffer.framebuffer_texture2d expects
glFramebufferTexture[123]D to raise GL_INVALID_ENUM when
supplied a completely bogus textarget parameter (i.e. 0xffffffff).

This is at odds with the spec.  GLES 3.1 says:

   "An INVALID_OPERATION error is generated if texture is not zero and
    textarget is not one of TEXTURE_2D, TEXTURE_2D_MULTISAMPLE, or one
    of the cube map face targets from table 8.21."

(and GLES 3.0 and GL 4.5 both have similar text).  However, GL has a
general guideline that says:

   "If a command that requires an enumerated value is passed a symbolic
    constant that is not one of those specified as allowable for that
    command, an INVALID_ENUM error is generated."

Apparently other vendors reconcile these two rules as follows: GL should
raise INVALID_OPERATION for actual texture target enumeration values
which are not allowed for this particular glFramebufferTexture*D call.
Any value that is not a texture target should result in GL_INVALID_ENUM.

For example, glFramebufferTexture2D with GL_TEXTURE_1D would result in
INVALID_OPERATION because it is a real texture target, but not allowed
for the 2D version of the function.  But calling it with GL_FRONT would
result in INVALID_ENUM, as that isn't even a texture target.

Fixes:
- {ES3-CTS,dEQP-GLES3}.functional.negative_api.buffer.framebuffer_texture2d
- {ES31-CTS,ES32-CTS,dEQP-GLES31}.functional.debug.negative_coverage.get_error.buffer.framebuffer_texture2d

References: https://gitlab.khronos.org/opengl/cts/merge_requests/387
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-04 21:10:24 -07:00
Kenneth Graunke
aecdb21be8 mesa: Reorganize check_textarget().
Having one top-level switch statement covering all known texture targets
will make the next change easier to implement.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-04 21:10:05 -07:00
Kenneth Graunke
53b8f6374f aubinator: use the correct format specifier for printing ptrdiff_t.
Fixes more warnings in 32-bit builds.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-04 17:28:01 -07:00
Kenneth Graunke
af41e1a500 aubinator: Use less -RS instead of -r for the implicit pager.
From the less man page:

   "Warning: when the -r option is used, less cannot keep track of the
    actual appearance  of  the screen (since this depends  on  how the
    screen responds to each type of control character).  Thus, various
    display problems may result, such as long lines being split in the
    wrong place."

Lines which are too long to fit in the terminal would be word wrapped,
but unfortunately less would get confused about which line it was on,
and text would be drawn on top of other text.  The most noticable case
was shader assembly, which is frequently too wide for an 80 character
terminal, and thus would be drawn on top of the following state packets,
making them completely unreadable.

Using -R instead of -r fixes this problem by only allowing color escape
sequences.  (Notably, Git's implicit pager invocation uses -R.)
Unfortunately, it means our "clear to the end of the line" hack for
extending the blue bar headers won't work anymore.

Word wrapping usually isn't terribly readable, anyway, so we also add
the -S option (chop long lines) to restrict it to the terminal width.
(You can hit the left and right arrow keys to scroll sideways.)

Then, for a new blue bar hack, we can use a printf specifier to pad
the command packet names to be 80 characters long (arbitrarily), which
extends them "far enough" to look good, and doesn't require us to use
ioctls to determine the terminal width.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sirisha Gandikota <sirisha.gandikota@intel.com>
2016-10-04 17:25:46 -07:00
Kenneth Graunke
8a484a63f8 i965: Drop _NEW_TRANSFORM from 3DSTATE_VS atom on Gen7.
The atom that uploads push constants listens to _NEW_TRANSFORM for
legacy clip plane handling.  On Sandybridge, the gen6_vs_state atom
emits 3DSTATE_CONSTANT_VS as well as 3DSTATE_VS, so it needs to listen
to the same set of conditions.

However, it looks like Gen7 doesn't need this.  The push constant atom
emits 3DSTATE_CONSTANT_VS directly, and the gen7_vs_state atom that
emits 3DSTATE_VS doesn't have a dependency on ctx->Transform.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:21:40 -07:00
Kenneth Graunke
d3cc3d28bd i965: Fix brw_clear_cache to clean up TCS/TES shaders.
We need to free prog_data for TCS/TES too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
2016-10-04 17:09:08 -07:00
Kenneth Graunke
bab1c05634 i965: Add missing BRW_CS_PROG_DATA to CS work group surface atom.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Kenneth Graunke
ce6c80ebbb i965: Add missing BRW_NEW_CS_PROG_DATA to compute constant atom.
CACHE_NEW_CS_PROG hasn't existed in quite a long time...the old
comment was there, but not the actual bit.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Kenneth Graunke
f2b9b0c730 i965: Add missing BRW_NEW_FS_PROG_DATA to render target reads.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Kenneth Graunke
0047d600af i965: Move BRW_NEW_FRAGMENT_PROGRAM from 3DSTATE_PS to PS_EXTRA.
3DSTATE_PS doesn't need this.  3DSTATE_PS_EXTRA however does,
for brw_color_buffer_write_enabled().

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Kenneth Graunke
28e1538be7 i965: Add missing BRW_NEW_VS_PROG_DATA to 3DSTATE_CLIP.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Kenneth Graunke
78df96256b i965: Fix missing _NEW_TRANSFORM in Gen8+ 3DSTATE_DS atom.
Needed for user clip plane enables.  Broken since this code was
introduced.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 17:09:07 -07:00
Ian Romanick
40dd45d0c6 i965: Enable ARB_shader_atomic_counter_ops
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-04 16:53:32 -07:00
Ian Romanick
3d2011cb33 i965: Refactor emission of atomic counter operations
This will make it easier to add more operations.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-04 16:53:32 -07:00
Ian Romanick
7cd0b3084c nir/intrinsics: Add more atomic_counter ops
v2: Delete some stray debug code notice by Iago.

v3: Massive rebase on new ir_function_signature::intrinsic_id mechanism.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:32 -07:00
Ian Romanick
2c9a17ac79 nir/intrinsics: Include atomic_counter_ in the names used in macro invocations
Otherwise grepping for where atomic_counter_inc and friends are defined
is a very frustrating experience.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:32 -07:00
Ian Romanick
c42fe30c86 glsl: Kill __intrinsic_atomic_sub
Just generate an __intrinsic_atomic_add with a negated parameter.

Some background on the non-obvious reasons for the the big change to
builtin_builder::call()... this is cribbed from some discussion with
Ilia on mesa-dev.

Why change builtin_builder::call() to allow taking dereferences and
create them here rather than just feeding in the ir_variables directly?
The problem is the neg_data ir_variable node would have to be in two
lists at the same time: the instruction stream and parameters.  The
ir_variable node is automatically added to the instruction stream by the
call to make_temp.  Restructuring the code so that the ir_variables
could be in parameters then move them to the instruction stream would
have been pretty terrible.

ir_call in the instruction stream has an exec_list that contains
ir_dereference_variable nodes.

The builtin_builder::call method previously took an exec_list of
ir_variables and created a list of ir_dereference_variable.  All of the
original users of that method wanted to make a function call using
exactly the set of parameters passed to the built-in function (i.e.,
call __intrinsic_atomic_add using the parameters to atomicAdd).  For
these users, the list of ir_variables already existed:  the list of
parameters in the built-in function signature.

This new caller doesn't do that.  It wants to call a function with a
parameter from the function and a value calculated in the function.  So,
I changed builtin_builder::call to take a list that could either be a
list of ir_variable or a list of ir_dereference_variable.  In the former
case it behaves just as it previously did.  In the latter case, it uses
(and removes from the input list) the ir_dereference_variable nodes
instead of creating new ones.

   text	   data	    bss	    dec	    hex	filename
6036395	 283160	  28608	6348163	 60dd83	lib64/i965_dri.so before
6036923	 283160	  28608	6348691	 60df93	lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:32 -07:00
Ian Romanick
bb290b5679 glsl: Remove ir_function_signature::_is_intrinsic field
text	   data	    bss	    dec	    hex	filename
6036491	 283160	  28608	6348259	 60dde3	lib64/i965_dri.so before
6036395	 283160	  28608	6348163	 60dd83	lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:31 -07:00
Ian Romanick
acfcc7bbfa glsl: Add ir_function_signature::is_intrinsic() method
This necessetated renaming the is_intrinsic field to _is_intrinsic.  The
next commit will remove the field.

   text	   data	    bss	    dec	    hex	filename
6036507	 283160	  28608	6348275	 60ddf3	lib64/i965_dri.so before
6036491	 283160	  28608	6348259	 60dde3	lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:31 -07:00
Ian Romanick
b7df52b106 glsl: Use the ir_intrinsic_* enums instead of the __intrinsic_* name strings
text	   data	    bss	    dec	    hex	filename
6038043	 283160	  28608	6349811	 60e3f3	lib64/i965_dri.so before
6036507	 283160	  28608	6348275	 60ddf3	lib64/i965_dri.so after

v2: s/ir_intrinsic_atomic_sub/ir_intrinsic_atomic_counter_sub/.  Noticed
by Ilia.

v3: Silence unhandled enum in switch warnings in st_glsl_to_tgsi.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:31 -07:00
Ian Romanick
5854de99b2 glsl: Track a unique intrinsic ID with each intrinsic function
text	   data	    bss	    dec	    hex	filename
6037483	 283160	  28608	6349251	 60e1c3	lib64/i965_dri.so before
6038043	 283160	  28608	6349811	 60e3f3	lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:31 -07:00
Ian Romanick
c01f2bfc6c glsl: Don't emit ir_binop_carry during ir_binop_imul_high lowering
st_glsl_to_tgsi only calls lower_instructions once (instead of in a
loop), so the ir_binop_carry generated would not get lowered.  Fixes
assertion failure

state_tracker/st_glsl_to_tgsi.cpp:2265: void glsl_to_tgsi_visitor::visit_expression(ir_expression*, st_src_reg*): Assertion `!"Invalid ir opcode in glsl_to_tgsi_visitor::visit()"' failed.

on softpipe in 16 piglit tests:

    mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb-nonuniform.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb.shader_test
    mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 16:53:31 -07:00
Timothy Arceri
0e8f1eaf41 i965: fix unused variable warning in brw_emit_gpgpu_walker()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-05 10:14:05 +11:00
Timothy Arceri
6fdfcd4d1c i965: add MAYBE_UNUSED to assert param
Fixes unused variable warning in release build.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-05 10:13:58 +11:00
Timothy Arceri
4340294af8 i965: wrap unused function in #ifndef NDEBUG
This function is only ever used by an assert() this fixes an
unused function warning in release builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-05 10:13:58 +11:00
Timothy Arceri
c9f1767903 i965: fix unused variable warning in gen7_block_read_scratch()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-05 10:13:58 +11:00
Timothy Arceri
df4ff31d3c i965: add MAYBE_UNUSED to assert param
This fixes an unused variable warning on release builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-05 10:13:52 +11:00
Jose Fonseca
437d7e1baf gallivm: Use AVX2 gather instrinsics.
v2: Use AVX2 gather for non aligned loads too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-04 23:36:20 +01:00
Roland Scheidegger
bc80741d7a gallivm: Use 8 wide AoS sampling on AVX2.
v2: Make sure that with num_lods > 1 and min_filter != mag_filter we
still enter the splitting path. So this case would still use 4-wide aos
path (as a side note, the 4-wide aos sampling path could actually be
improved quite a bit if we have avx2, by just doing the filtering with
256bit vectors).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-04 23:36:20 +01:00
José Fonseca
e088390c7d gallivm: Basic AVX2 support.
v2: pblendb -> pblendvb

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-04 23:36:20 +01:00
Chad Versace
add01add1b egl: Drop duplicate check on EGLSync type
_eglInitSync checked that the display supported the sync type (such as
EGL_SYNC_FENCE), and did it wrong. When the check failed it emitted
EGL_BAD_ATTRIBUTE, but sometimes EGL_BAD_PARAMETER is needed.

_eglCreateSync already does the error checking, and it does it right.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:29 -07:00
Chad Versace
02e4f1cb43 egl: Cleanup control flow in _eglParseSyncAttribList
When the function encountered an error, it effectively returned
immediately. However, it did so indirectly by breaking out of a loop.
Replace the loop breakout with a explicit 'return'.

Do the same for _eglParseSyncAttribList64 too.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:29 -07:00
Chad Versace
3e0d575a6d egl: Add _eglConvertIntsToAttribs()
This function converts an attribute list from EGLint[] to EGLAttrib[].
Will be used in following patches to cleanup EGLSync attribute parsing.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:29 -07:00
Chad Versace
f2c2f43d4e egl: Fix an error path in eglCreateSync*
When the user called eglCreateSync64KHR on a display without
EGL_KHR_cl_event2 (the only extension that exposes it), we returned
EGL_NO_SYNC but did not update the error code.

We also did the same for eglCreateSync on a display without EGL 1.5.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:28 -07:00
Chad Versace
69adb9a778 egl: Fix truncation error in _eglParseSyncAttribList64
The function stores EGLAttrib values in EGLint variables. On 64-bit
systems, this truncated the values.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:28 -07:00
Chad Versace
17084b6f93 egl: Fix missing unlock in eglGetSyncAttribKHR
On the error path, eglGetSyncAttribKHR neglected to unlock the
EGLDisplay before returning.

Fixes deadlock in dEQP-EGL.functional.fence_sync.invalid.get_invalid_value.

Cc: mesa-stable@lists.freedesktop.org
Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 14:11:22 -07:00
Anuj Phogat
d2112fc8d9 anv/gen7_pipeline: Fix typo in semicolon
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:35 -07:00
Anuj Phogat
1ffcf95fc4 anv/gen7_pipeline: Set sample mask field in 3DSTATE_PS
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:35 -07:00
Anuj Phogat
deeb1e95d0 anv/gen7_pipeline: Move ksp{1,2} state setting next to ksp0
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:35 -07:00
Anuj Phogat
517b1bf499 anv/gen7: Make use of local variable prog_data
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
2abb7486f5 anv/gen8_pipeline: Add an assert to ensure use_alt_mode is not set in prog_data
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-04 13:20:34 -07:00
Anuj Phogat
fa04b57c15 anv/gen8_pipeline: Fix typo in semicolon
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
7daafad9ac intel/genxml: Keep the value name 'Alternate' uniform across gen75.xml
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
c0f02bbc57 intel/genxml: Fix typo in gen75.xml
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
cd69d3f929 i965/gen8+: Enable GL_OES_viewport_array
This patch causes 2 regressions in khronos' gles cts tests
on various intel platforms.
Failing tests:
ES3-CTS.functional.state_query.integers.viewport_getinteger
ES3-CTS.functional.state_query.integers.viewport_getfloat

Here is an explanation of what's causing the failures:

CTS tests are not clamping the x, y location of the viewport's
bottom-left corner as recommended by ARB_viewport_array and
OES_viewport_array:
"The location of the viewport's bottom-left corner, given by (x,y), are
 clamped to be within the implementation-dependent viewport bounds range.
 The viewport bounds range [min, max] tuple may be determined by
 calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES"

Khronos CTS merge request to fix the test case:
https://gitlab.khronos.org/opengl/cts/merge_requests/399

V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 13:20:34 -07:00
Anuj Phogat
239ff64173 mesa: Add a check for OES_viewport_array
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 13:20:34 -07:00
Anuj Phogat
0a7691ee62 mesa: Enable enums for OES_viewport_array
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-04 13:20:34 -07:00
Anuj Phogat
2c7e1165fa anv/gen7_pipeline: Use MSDISPMODE_PERSAMPLE for non-multisampled fbo
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
f75a93f610 anv/blorp: Handle zero width/height blits in blorp_copy()
V2: Move the check from copy_buffer_to_image() to blorp_copy(). (Nanley)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-04 13:20:34 -07:00
Anuj Phogat
2c78b2ec90 intel/isl: Add an assert to check zero width/height surface
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Leo Liu
0e85ff3355 st/omx/dec/h265: add scaling list data
Specified by subclause 7.3.4

v2: get the loop optimized

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-04 11:09:59 -04:00
Leo Liu
ffb863fd2c st/omx/dec/h265: fix the skip for before and after list
For reference picture sets, there are cases that rps will not always
be used. Once detect the unused flag from encoded bitstream, we should
not add this rps to any list, otherwise pass the incorrect reference
and skip the correct rps.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
c50b68e6a8 st/omx/dec/h265: set the default reference picture set for reference
It will fix the corruption for frame, that only has one stort term ref
picture set, we set NULL rps for this case previously, causing taking
incorrect reference. Instead we should take that only short term set
as reference

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
091aae0265 st/omx/dec/h265: decoder size should follow from sps
The video size from format container is not always compatible with
the size from codec bitstream, the HW decoder should take the size
information from bitstream, otherwise the corruption appears with clip
that has different size info between bitstream and format container

So we are passing width(height)_in_samples from sequence parameter
set to video decoder.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
2371119db9 st/omx/dec/h265: increase dpb max size to 32
For clip with frame delta poc over 16

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-10-04 11:09:59 -04:00
Eric Engestrom
66f85c3824 nir/spirv: Remove a duplicate spirv2nir from .gitignore
This reverts commit fc03ecfeaf.

Chad had already pushed the same change between me posting the patch and Jason
pushing it: 44bcf1ffcc (".gitignore: Ignore src/compiler/spirv2nir")

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 07:43:15 -07:00
Nicolai Hähnle
8b1f9fd3b3 radeonsi: optionally run the LLVM IR verifier pass
This is enabled automatically if shader printing is enabled, or separately
by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later
pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:33 +02:00
Nicolai Hähnle
1e9476e8c5 gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics
Caught by R600_DEBUG=checkir (next commit).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:28 +02:00
Nicolai Hähnle
1b6fb88ab2 gallium/radeon: unify the creation of basic blocks
This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:25 +02:00
Nicolai Hähnle
d377f4c1ca gallium/radeon: merge branch and loop flow control stacks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:21 +02:00
Nicolai Hähnle
b0d50e157d gallium/radeon: simplify if/else/endif blocks
In particular, we no longer emit an else block when there is no ELSE
instruction.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:18 +02:00
Nicolai Hähnle
89e9de2ea6 gallium/radeon: label basic blocks by the corresponding TGSI pc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:15 +02:00
Nicolai Hähnle
6f87d7a146 gallium/radeon: cleanup and fix branch emits
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).

This also fixes a bug where multiple terminators were created in a block.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:10 +02:00
Nicolai Hähnle
dfc1afda83 winsys/radeon: add buffer_get_reloc_offset
Really fix the bug that was supposed to be fixed by commits 3e7cced4b and
a48bf02d: even when virtual addresses are used, the legacy relocation-based
method with offsets relative to the kernel's buffer object are used for
video submissions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:37:44 +02:00
Marek Olšák
71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák
b2a694f079 radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interp
We can finally do this, because the opcodes are scalar now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:14 +02:00
Marek Olšák
b57aef8033 radeonsi: simplify si_llvm_emit_ddxy
si_llvm_emit_ddxy is called once per element, so we don't have to generate
code for 4 elements at once.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:12 +02:00
Marek Olšák
046c199c3a radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:11 +02:00
Marek Olšák
bcc55e1f32 radeonsi: use a helper function for BuildGEP(0, x)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:10 +02:00
Marek Olšák
e20f7142a3 radeonsi: remove obsolete shader definitions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:09 +02:00
Marek Olšák
8c6ea5a6ff radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:07 +02:00
Marek Olšák
3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák
53d2c8f00f radeonsi: don't re-create shader PM4 states after scratch buffer update
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:05 +02:00
Marek Olšák
6c01684393 gallium/radeon: move r600_common_context::texture_buffers to r600g
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:03 +02:00
Marek Olšák
7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák
7e6428e0a8 radeonsi: optimize si_invalidate_buffer based on bind_history
Just enclose each section with: if (rbuffer->bind_history & PIPE_BIND_...)

Bioshock Infinite: +1% performance

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:00 +02:00
Marek Olšák
e43bd861e8 radeonsi: track buffer bind history
similar to gl_buffer_object::UsageHistory

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:58 +02:00
Marek Olšák
b523a9ddc5 radeonsi: drop support for NULL sampler views
not used anymore. It was used when the polygon stipple texture was constant.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:57 +02:00
Marek Olšák
82e51e8188 radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emission
We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:56 +02:00
Marek Olšák
3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák
f92113c5a1 radeonsi: don't check PIPE_BARRIER_MAPPED_BUFFER
Caches are always flushed at IB boundary.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:51 +02:00
Marek Olšák
ca1d1e0e19 radeonsi: parse SURFACE_SYNC correctly on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:49 +02:00
Marek Olšák
37065b0583 gallium/radeon: inline r600_context_add_resource_size
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:47 +02:00
James Legg
e33f31d61f radeonsi: Fix primitive restart when index changes
If primitive restart is enabled for two consecutive draws which use
different primitive restart indices, then the first draw's primitive
restart index was incorrectly used for the second draw.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 15:57:37 +02:00
Timothy Arceri
338d3c0b0f spirv: replace assert() with unreachable()
This fixes an uninitialized warning for is_vertex_input.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 22:33:51 +11:00
Timothy Arceri
298c2e03d7 intel: use the correct format specifier for printing uint64_t
Fixes a bunch of warnings in 32-bit builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-04 22:32:57 +11:00
Matt Whitlock
42ed8a6c9c gallium/winsys: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:03 +02:00
Matt Whitlock
ac6064f918 st/xa: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:01 +02:00
Matt Whitlock
0c060f691c st/dri: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:58 +02:00
Matt Whitlock
5d0069eca2 gallium/auxiliary: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:55 +02:00
Matt Whitlock
c8fd7d060d egl/android: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:50 +02:00
Tapani Pälli
387e0af0b4 intel: fix compilation warning on gen_get_device_info
(warning: 'const' type qualifier on return type has no effect)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-04 07:38:45 +03:00
Kenneth Graunke
9d6ca7c3d0 i965: Only emit 1 viewport when possible.
In core profile, we support up to 16 viewports.  However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.

Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor.  This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.

This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case.  The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.

According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-03 18:41:10 -07:00
Dave Airlie
7eb7684818 spirv: translate cull distance semantic.
This just translates to the correct cull distance slot.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:16:23 +10:00
Dave Airlie
bd0157d542 compiler: add printable values for cull distance varyings.
We need these for spir-v/nir shaders.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:15:23 +10:00
Jason Ekstrand
6ffbfc760d nir/spirv/cfg: Use a nop intrinsic for tagging the ends of blocks
Previously, we were saving off the last nir_block in a vtn_block before
moving on so that we could find the nir_block again when it came time to
handle phi sources.  Unfortunately, NIR's control flow modification code is
inconsistent when it comes to how it splits blocks so the block pointer we
saved off may point to a block somewhere else in the shader by the time we
get around to handling phi sources.  In order to get around this, we insert
a nop instruction and use that as the logical end of our block.  Since the
control flow manipulation code respects instructions, the nop will keeps
its place like any other instruction and we can easily find the end of our
block when we need it.

This fixes a bug triggered by a couple of vkQuake shaders.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-03 16:17:12 -07:00
Jason Ekstrand
7697b4b98b nir: Add a nop intrinsic
This intrinsic has no destination, no sources, no variables, and can be
eliminated.  In other words, it does nothing and will always get deleted by
dead code elimination.  However, it does provide a quick-and-easy way to
temporarily tag a particular location in a NIR shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-03 16:17:12 -07:00
Jason Ekstrand
0176c6a692 intel/isl: Allow non-2D HiZ surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
4e397c6c75 intel/isl: Add a detailed comment about multisampling with HiZ
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
c3bd711411 intel/isl: Remove tiling checks from choose_msaa_layout
We already do those checks in filter_tiling.  There's no good reason to
repeat them in choose_msaa_layout.  If anything they should have been
asserts and not "return false" checks.  Also, this check was causing us to
outright reject multisampled HiZ surfaces which wasn't intended.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
69d3bb9915 intel/isl: Handle HiZ and CCS tiling more directly
The HiZ and CCS tiling formats are always used for HiZ and CCS surfaces
respectively.  There's no reason why we should go through filter_tiling and
it's much easier to always get HiZ and CCS right if we just handle them
directly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
b1311a48e0 intel/isl: Allow multisampling with ISL_FORMAT_HiZ
HiZ buffers can be multisampled and, on Broadwell and earlier, simply using
interleaved multisampling with a compression block size of 8x4 samples
yields the correct HiZ surface size calculations.  Unfortunately,
choose_msaa_layout was rejecting multisampled HiZ buffers because of format
checks.  Now that we have a simple helper for determining if a format
supports multisampling, that's an easy enough issue to fix.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
baade41a5c intel/isl: Allow creation of 1-D compressed textures
Compressed 1-D textures are not well-defined thing in either GL or Vulkan.
However, auxiliary surfaces are treated as compressed textures in ISL and
we can do HiZ and CCS with 1-D so we need to be able to create them.  In
order to prevent actually using them (the docs say no), we assert in the
state setup code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
f82166578f intel/isl: Fix up asserts in calc_phys_level0_extent_sa
The assertion that a format is uncompressed in the multisample layouts
isn't quite right.  What we really want to assert is that the format
supports multisampling which is a bit more complicated query.  We also want
to assert that it has a block size of 1x1 since we do nothing with the
block size in the phys_level0_sa assignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
5637f3f120 intel/isl: Add a format_supports_multisampling helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Nayan Deshmukh
b7a0f2e1f7 vl/dri3: fix warning about incompatible pointer type
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-10-03 12:51:30 -04:00
Bruce Cherniak
903d00cd32 swr: Removed stalling SwrWaitForIdle from queries.
Previous fundamental change in stats gathering added a temporary
SwrWaitForIdle to begin_query and end_query.  Code has been reworked to
remove stall.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-10-03 09:57:45 -05:00
Tim Rowley
cdac042733 swr: [rasterizer core] refactor thread creation
Create worker pool now computes number of worker threads based on
things like topologies, etc. and creates the pool but doesn't actually
launch the threads. Instead there is a separate start thread pool
function. This allows thread resources to be constructed first before
threads start.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:57:38 -05:00
Tim Rowley
114f7a92c6 swr: [rasterizer jitter] canonicalize blend compile state
Canonicalize to prevent unnecessary JIT compiles.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:57:31 -05:00
Tim Rowley
4198520a82 swr: [rasterizer core] archrast fixes
- Immediately sleep threads until thread data is initialized
- Fix some compile bugs with AR enabled

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:57:25 -05:00
Tim Rowley
aaeb07989e swr: [rasterizer jitter] fixes for icc in vs2015 compat mode
- Move most jitter functionality into SwrJit namespace
- Avoid global "using namespace llvm" in headers

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:57:19 -05:00
Tim Rowley
b8a6f06c85 swr: [rasterizer core] generalize compute dispatch mechanism
Generalize compute dispatch mechanism to support other types of dispatches.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:57:13 -05:00
Tim Rowley
33a1a09eb0 swr: [rasterizer common] os.h portability header changes
- Fix conflict between windows MemoryFence and llvm::sys::MemoryFence
- Declare gettid()

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-03 09:56:47 -05:00
Ville Syrjälä
2fef0d108a anv/formats: Fix build on gcc-4 and earlier
gcc-4 and earlier don't allow compound literals where a constant
is required in -std=c99/gnu99 mode, so we can't use ISL_SWIZZLE()
when populating the anv_formats[] array. There are a few ways around
it: First one would be -std=c89/gnu89, but the rest of the code
depends on c99 so it's not really an option. The second option
would be to upgrade to gcc-5+ where the compiler behaviour was relaxed
a bit [1]. And the third option is just to avoid using compound
literals. I chose the last option since it keeps gcc-4 and earlier
working.

[1] https://gcc.gnu.org/gcc-5/porting_to.html

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 7ddb21708c ("intel/isl: Add an isl_swizzle structure and use it for isl_view swizzles")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-03 15:45:28 +03:00
Tapani Pälli
4d6d55deef egl: stop claiming support for pbuffer + msaa
This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test
and same crash in many dEQP EGL tests.

I also found that some Qt example did a workaround because of this
crash: https://bugreports.qt.io/browse/QTBUG-47509

v2: Ian pointed out that v1 removed support for all multisample
    configs, including window ones. This one removes pbuffer bit
    when adding configs, now only pbuffer+msaa gets rejected and
    window+msaa continues to work. Fixed also comment (Emil)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-03 07:56:44 +03:00
Timothy Arceri
eaf147cb46 i965: rename max_ds_* variable to max_tes_*
Using consistent naming allows us to create macros more easily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-03 15:29:58 +11:00
Timothy Arceri
b67633ce5e i965: rename max_hs_* variables to max_tcs_*
Using consistent naming allows us to create macros more easily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-03 15:29:51 +11:00
Kenneth Graunke
da274ba5f8 i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks.
There's an assert right above this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-02 14:49:20 -07:00
Timothy Arceri
024c207319 glsl: add missing headers to blob.h
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-10-02 13:48:06 +11:00
Jason Ekstrand
ef3c5ac7fb nir/spirv/cfg: Detect switch_break after loop_break/continue
While the current CFG code is valid in the case where a switch break also
happens to be a loop continue, it's a bit suboptimal.  Since hardware is
capable of handling the continue as a direct jump, it's better to use a
continue instruction when we can than to bother with all of the nasty
switch break lowering.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-01 15:40:34 -07:00
Jason Ekstrand
4d02faede5 nir/spirv/cfg: Handle switches whose break block is a loop continue
It is possible that the break block of a switch is actually the continue of
the loop containing the switch.  In this case, we need to identify the
break block as a continue and break out of current level of CFG handling.
If we don't, the continue portion of the loop will get handled twice, once
by following after the break and a second time by the loop handling code
handling it explicitly.

This fixes 6 of the new Vulkan CTS tests:
 - dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
 - dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-01 15:40:14 -07:00
Eric Engestrom
fc03ecfeaf nir/spirv: add spirv2nir binary to .gitignore
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-01 15:27:48 -07:00
Eric Engestrom
c867938044 nir/spirv: improve mmap() error handling
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-01 15:27:46 -07:00
Eric Engestrom
65c8cbe89d nir/spirv: improve lseek() error handling
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-01 15:27:44 -07:00
Eric Engestrom
23519a9de2 nir/spirv: add some error checking to open()
CovID: 1373369
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-01 15:27:31 -07:00
Timothy Arceri
913e0296f2 mesa: use uint32_t rather than unsigned for xfb struct members
These structs will be written to disk as part of the shader cache
so use uint32_t just to be safe.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-10-01 11:26:25 +10:00
Timothy Arceri
7064f8674a i915/i965: remove commented out warning
The warning was also the wrong location, it should have been
in the else.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-01 09:24:33 +10:00
Brian Paul
951bf44a56 mesa: move _mesa_valid_to_render() to api_validate.c
Almost all of the other drawing validation code is in api_validate.c
so put this function there as well.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-30 16:28:00 -06:00
Steven Toth
e99b9395be gallium/hud: Add support for CPU frequency monitoring
Detect all of the CPUs in the system. Expose metrics
for min, max and current frequency in Hz.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-30 15:18:46 -06:00
Marek Olšák
7b87190d2b Revert "gallium/hud: automatically print % if max_value == 100"
This reverts commit dbfeb0ec12.

With max_value being rounded to 100, it's often wrong.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-30 22:07:12 +02:00
Brian Paul
1d07552ba5 docs: update the list of Mesa major versions and API support
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-30 09:17:33 -06:00
Nicolai Hähnle
7bac5bf032 gallium/radeon: fix crash/regression in performance counters
Regression introduced by "gallium/radeon: zero all query buffers".

Cc: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-30 12:41:45 +02:00
Nicolai Hähnle
cfd870de70 gallium/radeon: update documentation of buffer_get_virtual_address
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-30 12:41:41 +02:00
Nicolai Hähnle
fd9f54223d gallium/radeon: emit relocations for query fences
This is only needed for r600 which doesn't have ARB_query_buffer_object and
therefore wouldn't really need the fences, but let's be optimistic about
filling in this feature gap eventually.

Cc: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-30 12:38:57 +02:00
Nicolai Hähnle
3e7cced4b9 radeon/uvd: adjust the buffer offset when relocation is used
We don't plan to use sub-allocated buffers with UVD, but just in case one
slips through, this increases the chances of things working out anyway.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-30 12:38:52 +02:00
Nicolai Hähnle
a48bf02d05 radeon/vce: adjust the buffer offset when relocation is used
We don't plan to use sub-allocated buffers with VCE, but just in case one
slips through, this increases the chances of things working out anyway.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-30 12:38:48 +02:00
Nicolai Hähnle
13cb41f666 radeon/video: don't use sub-allocated buffers
Cc: Christian König <christian.koenig@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97976
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-30 12:38:29 +02:00
Steven Toth
1d466b9b04 gallium/hud: Add power sensor support
Implement support for power based sensors, reporting units in
milli-watts and watts.

Also, minor cleanup - change the related if block to a switch.

Tested with two different power sensors, including the nouveau
'power1' sensors on a GTX950 card.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-29 17:51:15 -06:00
Samuel Pitoiset
3abe68b828 nv50/ir: teach insnCanLoad() about SHLADD
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.

This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.

GF100/GK104:

total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs    :396684 -> 396386 (-0.08%)
total local used in shared programs   :34416 -> 34416 (0.00%)

                local        gpr       inst      bytes
    helped           0         326        1105        1105
      hurt           0          55           3           3

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:50 +02:00
Samuel Pitoiset
115c79be10 nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:47 +02:00
Samuel Pitoiset
2e008be9a9 nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:44 +02:00
Samuel Pitoiset
e4eb0fca02 nv50/ir: optimize IMAD to SHLADD in presence of power of 2
Only and only if src1 is a power of 2 we can replace IMAD by SHLADD.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:41 +02:00
Samuel Pitoiset
31545b64b8 nvc0/ir: add emission for SHLADD
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:36 +02:00
Samuel Pitoiset
85132c7453 nv50/ir: add preliminary support for SHLADD
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 21:20:30 +02:00
Samuel Pitoiset
652874754a nvc0: update GM107 sched control codes format
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-29 20:13:05 +02:00
Nicolai Hähnle
e4b585f009 gallium/radeon: use smaller buffers for query results
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:24:56 +02:00
Nicolai Hähnle
de84e99e45 gallium/radeon/winsyses: add radeon_winsys::min_alloc_size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:24:52 +02:00
Nicolai Hähnle
7a0e543836 radeonsi: enable ARB_query_buffer_object (v2)
v2: enable only when compute is available

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:15:00 +02:00
Nicolai Hähnle
15e2661137 gallium/radeon: implement get_query_result_resource (v2)
v2: fix a comment (Gustaw Smolarczyk)

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:54 +02:00
Nicolai Hähnle
2c9d546402 gallium/radeon: zero all query buffers
To ensure that fences are properly initialized.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:51 +02:00
Nicolai Hähnle
daeab0171d gallium/radeon: cleanup getting PIPE_QUERY_TIMESTAMP result
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:45 +02:00
Nicolai Hähnle
631c47384c gallium/radeon: add query fences and r600_get_hw_query_params
We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.

ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.

The new function provides summary information to be used by subsequent
patches.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:41 +02:00
Nicolai Hähnle
51b57a9b5a radeonsi: add save_qbo_state
Save compute shader state that will be used for the ARB_query_buffer_object
implementation.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:37 +02:00
Nicolai Hähnle
70f9ca2468 radeonsi: add si_get_shader_buffers/get_pipe_constant_buffers (v2)
These functions extract the pipe state structure from the current
descriptors, for state saving.

v2: correctly dereference *buf (Bas)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:33 +02:00
Nicolai Hähnle
8d45243e40 gallium/radeon: add r600_gfx_{write,wait}_fence
For bottom-of-pipe fences inside the gfx command stream.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:29 +02:00
Nicolai Hähnle
8e4de00930 gallium/radeon: add barrier_flags to r600_common_screen
There are driver-specific context flags for barriers that are not covered
by the Gallium barrier interfaces.

The R600 settings of these flags may not be optimal, but we're not going
to use them yet anyway.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-29 11:14:11 +02:00
Timothy Arceri
577e06095b glsl: remove remaining tabs from ast_type.cpp
Acked-by: Dave Airlie <airlied@redhat.com>
2016-09-29 11:06:12 +10:00
Timothy Arceri
222f66a812 glsl: remove remaining tabs from ast_to_hir.cpp
Acked-by: Dave Airlie <airlied@redhat.com>
2016-09-29 11:06:12 +10:00
Timothy Arceri
fc1d200bc7 glsl: remove remaining tabs from ast_array_index.cpp
Acked-by: Dave Airlie <airlied@redhat.com>
2016-09-29 11:06:12 +10:00
Timothy Arceri
b193c4d75b glsl: remove tabs from ast_expr.cpp
Acked-by: Dave Airlie <airlied@redhat.com>
2016-09-29 11:06:12 +10:00
Timothy Arceri
386045a3df glsl: remove tabs from linker.{cpp,h}
Acked-by: Dave Airlie <airlied@redhat.com>
2016-09-29 11:06:12 +10:00
Steven Toth
8c60bcb4c3 gallium/hud: Add support for block I/O, network I/O and lmsensor stats
V8: Feedback based on peer review
    convert if block into a switch
    Constify some func args

V7: Increase precision when measuring lmsensors volts
    Flatten patch series.

V6: Feedback based on peer review
    Simplify sensor initialization (arg passing).
    Constify some func args

V5: Feedback based on peer review
    Convert sprintf to snprintf
    Convert char * to const char *
    int arg converted to bool
    Func changes to take a filename vs a larger struct.
    Omit the space between '*' and the param name.

V4: Merged with master as of 2016/9/27 6pm

V3: Flatten the entire patchset ready for the ML

V2: Additional seperate patches based on feedback
a) configure.ac: Add a comment related to libsensors

b) HUD: Disable Block/NIC I/O stats by default.
Implement configuration option --enable-gallium-extra-hud=yes
and enable both statistics when this option is enabled.

c) Configure.ac: Minor cleanup to user visible configuration settings

d) Configure.ac: HUD stats - build system improvements
Move the -lsensors out of a deeper Makefile, bring it into the configure.ac.
Also, rename a compiler directive to more closely follow the standard.

V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats MB/ps.
2. Network Interface RX/TX transfer statistics as a percentage
   of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.

The lmsensor changes makes a dependency on libsensors so support
for the change is opt out by default.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-28 16:18:05 -06:00
Ben Widawsky
29783c0887 i965: Remove useless (harmful) assertion
The code already skips doing the depth stall on gen >= 8, and as we
enable new platforms this assertion will fail needlessly. Instead of
changing the caller, make this simple change.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-28 09:42:53 -07:00
Eric Anholt
2a721b1b79 vc4: Emit perf debug when we fall back to quad clears. 2016-09-28 08:31:14 -07:00
Eric Anholt
1aa8a0392f nir: Optimize out discard_ifs with a constant 0 argument.
I found this in a shader that was doing an alpha test when alpha is fixed
at 1.0.

v2: Rebase on master (now the const value is "u32" not "u").

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2016-09-28 08:31:14 -07:00
Michel Dänzer
8d8c440ebf gallium/radeon: Initialize pipe_resource::next to NULL
Fixes lots of piglit tests crashing due to using uninitialized memory.

Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-28 10:39:22 +09:00
Timothy Arceri
3eb0baeecf glsl: don't crash when dumping shaders if some come from cache
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-28 10:43:15 +10:00
Timothy Arceri
87ab26b2ab glsl: Add initial functions to implement an on-disk cache
This code provides for an on-disk cache of objects. Objects are stored
and retrieved via names that are arbitrary 20-byte sequences,
(intended to be SHA-1 hashes of something identifying for the
content). The directory used for the cache can be specified by means
of environment variables in the following priority order:

	$MESA_GLSL_CACHE_DIR
	$XDG_CACHE_HOME/mesa
	<user-home-directory>/.cache/mesa

By default the cache will be limited to a maximum size of 1GB. The
environment variable:

	$MESA_GLSL_CACHE_MAX_SIZE

can be set (at the time of GL context creation) to choose some other
size. This variable is a number that can optionally be followed by
'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
gigabytes. By default, an unadorned value will be interpreted as
gigabytes.

The cache will be entirely disabled at runtime if the variable
MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.

Many thanks to Kristian Høgsberg <krh@bitplanet.net> for the initial
implementation of code that led to this patch. In particular, the idea
of using an mmapped file, (indexed by a portion of the SHA-1), for the
efficent implementation of cache_has_key was entirely his
idea. Kristian also provided some very helpful advice in discussions
regarding various race conditions to be avoided in this code.

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-28 09:16:31 +10:00
Chad Versace
44bcf1ffcc .gitignore: Ignore src/compiler/spirv2nir 2016-09-27 13:22:44 -07:00
Ian Romanick
ea6ed2379d glsl: Fix cut-and-paste bug in hierarchical visitor ir_expression::accept
At this point in the code, s must be visit_continue.  If the child
returned visit_stop, visit_stop is the only correct thing to return.

Found by inspection.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 12:06:46 -07:00
Ian Romanick
7f64041cee glsl: Add bit_xor builder
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 12:06:46 -07:00
Ian Romanick
5f7f7d582b glsl/standalone: Enable GLSL 4.00 through 4.50
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 12:06:46 -07:00
Ian Romanick
798d1b8816 glsl/standalone: Use API_OPENGL_CORE if the GLSL version is >= 1.40
Otherwise extensions to 1.40 that are only for core profile won't work.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 12:06:46 -07:00
Ian Romanick
afd99734db glsl: Update function parameter documentation for do_common_optimization
max_unroll_iterations was moved into options a long, long time ago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 12:06:46 -07:00
Tim Rowley
bacdd9ef4c configure.ac: add llvm inteljitevents component if enabled
Needed to successfully link llvmpipe or swr when using shared llvm libs
built with inteljitevents enabled.

v2: Make adding inteljitevents component global rather than just
llvmpipe/swr, since libgallium will have a symbol dependency.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-27 12:56:47 -05:00
Tim Rowley
50842e8a93 swr: replace gallium->swr format enum conversion
Replace old string comparison with a mapping table.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-09-27 12:55:26 -05:00
Nicolai Hähnle
4421c0fb0d gallium/radeon/winsyses: reduce the number of pb_cache buckets
Small buffers are now handled via the slabs code, so separate buckets in
pb_cache have become redundant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:41 +02:00
Nicolai Hähnle
fb827c055c winsys/radeon: enable buffer allocation from slabs
Only enable for chips with GPUVM, because older driver paths do not take the
required offset into account.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:37 +02:00
Nicolai Hähnle
a1e391e39d winsys/radeon: add fine-grained fences for slab buffers
Note the logic for adding fences is somewhat different than for amdgpu,
because radeon has no scheduler and we therefore have no guarantee about
the order in which submissions from multiple threads are processed.

(Ironically, this is only an issue when "multi-threaded submission" is
disabled, because "multi-threaded submission" actually means that all
submissions happen from a single thread that happens to be separate from
the application's threads. If we only supported "multi-threaded
submission", the fence handling could be simplified by adding the fences
in that thread where everything is serialized.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:34 +02:00
Nicolai Hähnle
0edebde9a4 winsys/radeon: add slab buffer list
Introducing radeon_bo::hash will reduce collisions between "real" buffers
and buffers from slabs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:32 +02:00
Nicolai Hähnle
cbb9c2f170 winsys/radeon: separate adding a buffer from updating its reloc data
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:29 +02:00
Nicolai Hähnle
a9e8672585 winsys/radeon: add slab entry structures to radeon_bo
Already adjust the map/unmap logic accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:25 +02:00
Nicolai Hähnle
ffa1c669dd winsys/amdgpu: enable buffer allocation from slabs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:23 +02:00
Nicolai Hähnle
a3832590c6 winsys/amdgpu: add fence and buffer list logic for slab allocated buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:20 +02:00
Nicolai Hähnle
a987e4377a winsys/amdgpu: add slab entry structures to amdgpu_winsys_bo
Already adjust amdgpu_bo_map/unmap accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:15 +02:00
Nicolai Hähnle
5af9eef719 winsys/amdgpu: do not synchronize unsynchronized buffers
When a buffer is added to a CS without the SYNCHRONIZED usage flag, we now
no longer add a dependency on the buffer's fence(s).

However, we still need to add a fence to the buffer during flush, so that
cache reclaim works correctly (and in the hypothetical case that the buffer
is later added to a CS _with_ the SYNCHRONIZED flag).

It is now possible that the submissions refererring to a buffer are no longer
linearly ordered, and so we may have to keep multiple fences around. We keep
the fences in a FIFO. It should usually stay quite short (# of contexts * 2,
for gfx + dma rings).

While we're at it, extract amdgpu_add_fence_dependency for a single buffer,
which will make adding the distinction between real buffer and slab cases
easier.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:11 +02:00
Nicolai Hähnle
6d89a40676 gallium/radeon: add RADEON_FLAG_HANDLE
When passed to winsys->buffer_create, this flag will indicate that we require
a buffer that maps 1:1 with a kernel buffer handle.

This is currently set for all textures, since textures can potentially be
exported to other processes. This is not a huge loss, since the main purpose
of this patch series is to deal with applications that allocate many small
buffers.

A hypothetical application with tons of tiny textures might still benefit
from not setting this flag, but that's not a use case I'm worried about
just now.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:05 +02:00
Nicolai Hähnle
e703f71ebd gallium/radeon: add RADEON_USAGE_SYNCHRONIZED
This is really the behavior we want most of the time, but having a
SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that
OR'ing different flags together always results in stronger guarantees.

The parent BOs of sub-allocated buffers will be added unsynchronized.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:02 +02:00
Nicolai Hähnle
84f156c0cb gallium/pipebuffer: add pb_slab utility
This is a simple framework for slab allocation from buffers that fits into
the buffer management scheme of the radeon and amdgpu winsyses where bufmgrs
aren't used.

The utility knows about different sized allocations and explicitly manages
reclaim of allocations that have pending fences. It manages all the free lists
but does not actually touch buffer objects directly, relying on callbacks for
that.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:44:42 +02:00
Nicolai Hähnle
b3ebc229dc gallium/u_math: add util_logbase2_ceil
For finding the exponent of the next power of two.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:44:38 +02:00
Nicholas Bishop
c060f291c2 i915g: add dma-buf support to i915_drm_buffer_get_handle
The implementation of i915_drm_buffer_get_handle now handles
DRM_API_HANDLE_TYPE_FD in the same way that intel_winsys_import_handle
does, by calling drm_intel_bo_gem_create_from_prime.

Tested by successfully running Chrome's ozone_demo [1] with the
ozone-gbm backend on an Intel Pineview M machine. Without this change
it fails while trying to create a DMA-BUF.

[1] https://chromium.googlesource.com/chromium/src.git/+/master/ui/ozone/demo/ozone_demo.cc

Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
[Emil Velikov: Fix coding style]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-27 13:37:21 +01:00
Nicholas Bishop
aa560e8e63 st/dri: check pipe_screen->resource_get_handle() return value
Change dri2_query_image to check the return value of resource_get_handle
and return GL_FALSE if an error occurs.

For reference this is an example callstack that should propagate the
error back to the user:

    i915_drm_buffer_get_handle
    i915_texture_get_handle
    u_resource_get_handle_vtbl
    dri2_query_image
    gbm_dri_bo_get_fd
    gbm_bo_get_fd

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
[Emil Velikov: Split from larger patch, polish coding style, cc stable]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-27 13:37:21 +01:00
Nicholas Bishop
2d05ba2ca0 gbm: return appropriate error when queryImage() fails
Change gbm_dri_bo_get_fd to check the return value of queryImage and
return -1 (an invalid file descriptor) if an error occurs.

Update the comment for gbm_bo_get_fd to return -1, since (apart from the
above) we've already return -1 on error.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
[Emil Velikov: Split from larger patch, polish coding style, cc stable]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-27 13:37:21 +01:00
Andy Furniss
a599302227 st/va Avoid VBR bitrate calculation overflow v2
VBR bitrate calc needs 64 bits at high rates.

v2: use float.

Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2016-09-27 14:21:45 +02:00
Mark Thompson
a543f231d7 st/va: Fix vaSyncSurface with no outstanding operation
Fixes crash if the application doesn't do what the state tracker expects.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-27 14:21:44 +02:00
Timothy Arceri
df920367bf glsl: remove remaining tabs in glsl_parser_extras.h
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-09-27 20:32:47 +10:00
Ilia Mirkin
477cc0e085 st/mesa: enable ARB_ES3_2_compatibility when enough available
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 00:20:44 -04:00
Ilia Mirkin
67fbaa5873 st/mesa: enable GL_ANDROID_extension_pack_es31a when available
For now that's never since advanced blend hasn't been piped through.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 00:20:41 -04:00
Timothy Arceri
63e8221574 glsl: move some uniform linking code to new link_assign_uniform_storage()
This makes link_assign_uniform_locations() easier to follow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 11:29:05 +10:00
Timothy Arceri
ab67b6afdf glsl: move some uniform linking code to new link_setup_uniform_remap_tables()
This makes link_assign_uniform_locations() easier to follow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 11:29:05 +10:00
Timothy Arceri
856e0bd707 i965: create populate key functions for tcs and tes
These will be used by the on disk shader cache.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 11:11:15 +10:00
Timothy Arceri
ec75570415 i965: make gs key generation helper available to shader cache
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 11:11:15 +10:00
Timothy Arceri
481d8ec291 glsl: use reproducible name for lowered const arrays
Otherwise we can end up with mismatching names between the cached
binary and the cached metadata.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-27 11:11:15 +10:00
Carl Worth
017081a3e5 i965: make vs and fs key generation helpers available to shader cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
2016-09-27 11:11:15 +10:00
Carl Worth
f61669f997 glsl: Prepare standalone compiler to be able to use parameter lists
As part of the shader-cache work an upcoming change will add new
references to _mesa_add_parameter and _mesa_new_parameter_list from
the glsl code. To prepare for that, and to allow the standalone
glsl_compiler to still link, here we add mesa/program/prog_parameter.c
to the libglsl_util sources.

Then, in order to get *that* to work, we also add to stubs to
standalone_scaffolding:

	_mesa_program_state_flags
	_mesa_program_state_string

These functions aren't actually used by the two functions in
prog_parameter.c that we are actually calling. They are used in other
functions in the same file. So we don't care what the implementation
of these stubs is, (they won't be called by glsl_compiler). We just
need the stubs present so that it can link.

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-27 11:11:15 +10:00
Samuel Pitoiset
f24b517858 nv50/ir: fix comments about instructions info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-26 21:59:37 +02:00
Rob Clark
ecd6fce261 mesa/st: support lowering multi-planar YUV
Support multi-planar YUV for external EGLImage's (currently just in the
dma-buf import path) by lowering to multiple texture fetch's for each
plane and CSC in shader.

There was some discussion of alternative approaches for tracking the
additional UV or U/V planes:

  https://lists.freedesktop.org/archives/mesa-dev/2016-September/127832.html

They all seemed worse than pipe_resource::next

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-26 15:29:17 -04:00
Rob Clark
e0ec1c3134 mesa/st: add nir pass to lower tex_src_plane
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-26 15:29:17 -04:00
Rob Clark
c2a60cacd4 mesa/st: add lowering pass for YUV samplers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-26 15:29:17 -04:00
Sirisha Gandikota
8e3e9d74b5 aubinator: Fix the decoding of values that span two Dwords
Fixed the way the values that span two Dwords are decoded.
Based on the start and end indices of the field, the Dwords
are fetched and decoded accordingly.

v2: rename dw to qw in gen_field_iterator_next
and remove extra white space (Anuj)

v3: change all instances of dw to qw (Anuj)

Earlier, 64-bit fields (such as most pointers on Gen8+)
weren't decoded correctly.  gen_field_iterator_next seemed
to walk one DWord at a time, sets v.dw, and then passes it
to field(). So, even though field() takes a uint64_t, we're
passing it a uint32_t (which gets promoted, so the top 32
bits will always be zero). This seems pretty bogus... (Ken)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-26 11:18:52 -07:00
Samuel Pitoiset
ac859d68f4 nvc0: allow to force compiling programs in debug build
This adds a new envvar called NV50_PROG_CHIPSET which allows to
compile shaders with a different target, especially useful for
shader-db.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-26 19:39:04 +02:00
Samuel Pitoiset
e05042b367 nv50/ir: drop unused NVISA_XXX_CHIPSET constants
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-26 19:39:04 +02:00
Samuel Pitoiset
be0535b8c7 gallium/util: make use of strtol() in debug_get_num_option()
This allows to use hexadecimal numbers which are automatically
detected by strtol() when the base is 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-09-26 19:39:04 +02:00
Glenn Kennard
5da24242b3 r600g: Add support for PK2H/UP2H
Based off of Ilia's original patch, but with output values replicated so
that it matches the TGSI semantics.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-09-26 17:08:49 +02:00
Timothy Arceri
eb2dc04127 i965: stop passing stage as a function parameter
We already pass the shader so we can just get the stage from this.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-09-26 09:59:24 +10:00
Nayan Deshmukh
b3827819aa aubinator: fix resource leak
CovID: 1373370

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-25 12:32:48 -07:00
Emilio Cobos Álvarez
cb7c2c9d65 osmesa: Unbind the current context when given a null context and buffer.
This is needed to be consistent with other drivers.

Signed-off-by: Emilio Cobos Álvarez <me@emiliocobos.me>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-23 19:55:50 -06:00
Brian Paul
07d1f8faf9 st/mesa: small optimization in swizzle_swizzle()
Usually, there's no user-specified texture swizzle so we can optimize
the swizzle_swizzle() function and skip the loop/switch.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
1cdc232e1a st/mesa: fix swizzle issue in st_create_sampler_view_from_stobj()
Some demos, like Heaven, were creating and destroying a large number
of sampler views because of a swizzle issue.

Basically, we compute the sampler view's swizzle by examining the
texture format, user swizzle, depth mode, etc.  Later, during validation
we recompute that swizzle (in case something like depth mode changes)
and see if it matches the view's swizzle.

In the case of PIPE_FORMAT_RGTC2_UNORM, get_texture_format_swizzle
returned SWIZZLE_XYZW but the u_sampler_view_default_template() function
was setting the sampler view's swizzle to SWIZZLE_XY01.  This mismatch
caused the validation step to always "fail" so we'd destroy the old
sampler view and create a new one.

By removing the conditional, the sampler view's swizzle and the computed
texture swizzle match and validation "passes".  When creating a new sampler
view, we always want to use the texture swizzle which we just computed.

Fixes VMware issue 1733389.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
c0d7b6073d svga: set PIPE_BIND_DEPTH_STENCIL flag for new resources when possible
When we create a depth/stencil texture, also check if we can render to
it and set the PIPE_BIND_DEPTH_STENCIL flag.  We were previously doing
this for color textures (PIPE_BIND_RENDER_TARGET).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
f942a70340 svga: don't special case caps for SVGA3D_R32_FLOAT
This may have been needed years ago during development, but not now.
Prevents some regressions after introducing the next patch.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
14639cdf8f svga: use new adjust_z_layer() helper in svga_pipe_blit.c
To handle z/layer fix-ups for blitting and copying.  Note that we weren't
doing this properly in svga_blit() before.

Also, remove redundant stex, dtex assignments.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
c42000545d svga: simplify/improve the format compatibility check for region copies
The util_is_format_compatible() function didn't quite do what we wanted
for vgpu10.  This check is more flexible and allows copies between
formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
2ad4ba0727 svga: add const qualifier on svga_translate_format()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Brian Paul
4d04696524 svga: eliminate unneeded gotos in svga_validate_surface_view()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:54:42 -06:00
Neha Bhende
47f16f5e7f svga: disable srgb format related code from svga_blit()
With latest mesa and latest piglit tests srgb<->linear conversion
is not required as per GL4.4 rules

See commit b662c70aea.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-23 19:53:51 -06:00
Timothy Arceri
29c174a3e5 Revert "glsl: move xfb BufferStride into gl_transform_feedback_info"
This reverts commit f5a6aab403.

This broke some tests. It seems gl_transform_feedback_info gets memset
to 0 so we were losing the values in BufferStride before we used them.
2016-09-24 10:17:26 +10:00
Kenneth Graunke
943b69cddd glsl: Delete linker stuff relating to built-in functions.
Now that we generate built-in functions inline, there's no need to link
against the built-in shader, and no built-in prototypes to consider.

This lets us delete a bunch of code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
2016-09-23 16:40:40 -07:00
Kenneth Graunke
f7a5c714b3 glsl: Delete ftransform support from builtin_functions.cpp.
This is now handled directly by ast_function.cpp.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
2016-09-23 16:40:40 -07:00
Kenneth Graunke
b04ef3c08a glsl: Immediately inline built-ins rather than generating calls.
In the past, we imported the prototypes of built-in functions, generated
calls to those, and waited until link time to resolve the calls and
import the actual code for the built-in functions.

This severely limited our compile-time optimization opportunities: even
trivial functions like dot() were represented as function calls.  We
also had no way of reasoning about those calls; they could have been
1,000 line functions with side-effects for all we knew.

Practically all built-in functions are trivial translations to
ir_expression opcodes, so it makes sense to just generate those inline.
Since we eventually inline all functions anyway, we may as well just do
it for all built-in functions.

There's only one snag: built-in functions that refer to built-in global
variables need those remapped to the variables in the shader being
compiled, rather than the ones in the built-in shader.  Currently,
ftransform() is the only function matching those criteria, so it seemed
easier to just make it a special case.

On Skylake:

total instructions in shared programs: 12023491 -> 12024010 (0.00%)
instructions in affected programs: 77595 -> 78114 (0.67%)
helped: 97
HURT: 309

total cycles in shared programs: 137239044 -> 137295498 (0.04%)
cycles in affected programs: 16714026 -> 16770480 (0.34%)
helped: 4663
HURT: 4923

while these statistics are in the wrong direction, the number of
hurt programs is small (309 / 41282 = 0.75%), and I don't think
anything can be done about it.  A change like this significantly
alters the order in which optimizations are performed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
2016-09-23 16:40:40 -07:00
Kenneth Graunke
1617f59bc6 glsl: Check TCS barrier restrictions at ast_to_hir time, not link time.
We want to check prior to optimization - otherwise we might fail to
detect cases where barrier() is in control flow which is always taken
(and therefore gets optimized away).

We don't currently loop unroll if there are function calls inside;
otherwise we might have a problem detecting barrier() in loops that
get unrolled as well.

Tapani's switch handling code adds a loop around switch statements, so
even with the mess of if ladders, we'll properly reject it.

Enforcing these rules at compile time makes more sense more sense than
link time.  Doing it at ast-to-hir time (rather than as an IR pass)
allows us to emit an error message with proper line numbers.
(Otherwise, I would have preferred the IR pass...)

Fixes spec/arb_tessellation_shader/compiler/barrier-switch-always.tesc.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
2016-09-23 16:40:40 -07:00
Timothy Arceri
f5a6aab403 glsl: move xfb BufferStride into gl_transform_feedback_info
It makes more sense to have this here where we store the other values
from xfb qualifiers. The struct it was previously part of is now only
used to store values that come from the api.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-09-24 09:18:29 +10:00
Dylan Baker
85e9bbc14d Revert "mapi: export all GLES 3.2 functions in libGLESv2.so"
This reverts commit e66a2b879b.

Which breaks the scons build in an interesting way, particularly when
BlendBarrier and PrimitiveBoundingBox are added to static_data.py's
functions list. This seems to be related to the fact that the unsuffixed
names are only in GLES3.2, but Desktop GL only has suffixed versions.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2016-09-23 12:13:13 -07:00
Adam Jackson
8ce2afe776 i965: Enable EGL_KHR_gl_texture_3D_image
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-23 06:53:21 -04:00
Adam Jackson
5981366b9f i915: Enable EGL_KHR_gl_texture_3D_image
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-23 06:53:17 -04:00
Nicolas Koch
f17948a30a anv: Check for VK_WHOLE_SIZE in anv_CmdFillBuffer
From the Vulkan spec:

   Size is the number of bytes to fill, and must be either a multiple of 4,
   or VK_WHOLE_SIZE to fill the range from offset to the end of the buffer.
   If VK_WHOLE_SIZE is used and the remaining size of the buffer is not a
   multiple of 4, then the nearest smaller multiple is used.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-23 00:20:16 -07:00
Lionel Landwerlin
6b21728c4a anv: get rid of duplicated values from gen_device_info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-23 10:12:06 +03:00
Lionel Landwerlin
94d0e7dc08 i965: get rid of duplicated values from gen_device_info
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-23 10:12:06 +03:00
Lionel Landwerlin
bc24590f0c intel/i965: make gen_device_info mutable
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).

This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-23 10:11:59 +03:00
Timothy Arceri
e60928f4c4 gallium: remove unused PIPE_CC_GCC_VERSION
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-09-23 16:18:21 +10:00
Timothy Arceri
4eb0e90c6b util: remove Sun C Compiler support
Support for this compiler was dropped in 51564f04b7

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-09-23 16:17:16 +10:00
Ilia Mirkin
c0a7e931e3 st/mesa: turn on OES_viewport_array when dependencies are met
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-22 20:42:30 -04:00
Ilia Mirkin
0f01aa8033 mesa: add implementations for new float depth functions
This just up-converts them to doubles. Not great, but this is what all
the other variants also do.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-22 20:42:30 -04:00
Ilia Mirkin
381b15dc20 mesa: move ARB_viewport_array params to a GLES 3.1-accessible section
This is needed for GL_OES_viewport_array.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-22 20:42:30 -04:00
Ilia Mirkin
5644a90801 mesa: add GL_OES_viewport_array to the extension string
The expectation is that drivers will set this based on
OES_geometry_shader and ARB_viewport_array support. This is a separate
enable on the same reasoning as for OES_texture_cube_map_array.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-22 20:42:30 -04:00
Ilia Mirkin
70aef97f9e glsl: add OES_viewport_array enables and use them to expose gl_ViewportIndex
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-22 20:42:30 -04:00
Ilia Mirkin
411a72d4a2 mesa: add new entrypoints for GL_OES_viewport_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-22 20:42:30 -04:00
Dylan Baker
e66a2b879b mapi: export all GLES 3.2 functions in libGLESv2.so
See commit 5921f372c8 for the rational of
this commit.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-22 16:01:40 -07:00
Dylan Baker
ce83e36ec0 mapi: sort static_data.py functions
Sorted by vim's builtin "sort i" (keeping the sorting case insensitive)

v2:
 - uses case insensitive sorting (Ken)

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-22 15:29:27 -07:00
Dylan Baker
2fd51cf8ca mapi: retab static_data.py to be consistent
This file currently uses a mixture of 3 and 4 space indent. I have
changed it all to 4 space indent, matching the settings in
$ROOT/.editorconfig.

This was generated with sed:
sed -i -e 's@^   "@    "@g'

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-22 15:28:44 -07:00
Lionel Landwerlin
9adfa695ac spirv: fix AtomicLoad/Store on images
OpAtomicLoad/Store should have pointer to images just like the rest of the
atomic operators. These couple of lines were poorly copied from the
ssbo/shared_vars cases (the only ones currently tests by the CTS).

Fixes 2afb950161 ("spirv/nir: Add support for OpAtomicLoad/Store")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-22 14:08:21 +03:00
Eric Anholt
36f0f03182 nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.

This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise.  However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.

For now, just turn on aggressive flattening on vc4.  i965 will need some
tuning to avoid regressions.  It does looks like this may be useful to
replace freedreno code.

Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.

vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs:     17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs:     3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs:     26085 -> 24649 (-5.51%)

v2: Update shader-db output.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2016-09-22 11:10:21 +03:00
Kenneth Graunke
6c648cdac8 docs: Mark ES 3.2 "all done" for i965/gen9+. 2016-09-21 11:52:59 -07:00
Kenneth Graunke
a4fbc73ee8 docs: Add ES 3.2 to release notes. 2016-09-21 11:52:59 -07:00
Brian Paul
b35684543e gallium/util: add comment on util_is_format_compatible()
From reading the code, it's not obvious what is src/dest compatible.
The list of a->b copy-compatible formats comes from Jose's original
check-in message, with some format name updates.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-21 12:26:17 -06:00
Brian Paul
99d9f764b2 svga: minor simplification in svga_validate_surface_view()
Get rid of unneeded local var.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-21 12:23:45 -06:00
Brian Paul
1cc7a76d73 svga: remove disable_shader debug variable
Never used, AFAIK.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-21 12:23:45 -06:00
Kenneth Graunke
a53da57d5a i965: Enable ES 3.2 on Skylake.
It's already advertised because the version.c extension checks are
fulfilled, but we didn't actually claim support, so trying to create
a ES 3.2 context would fail.

It's all done, and the CTS results look good, so let's turn it on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-09-21 10:51:58 -07:00
Jason Ekstrand
d2f42a945e nir/spirv/glsl450: Add support for the InterpolateAt opcodes
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-21 05:39:06 -07:00
Jason Ekstrand
a529644889 nir/spirv: Claim support for SampleRateShading
We already support all of the decorations that require this capability.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-21 05:39:06 -07:00
Jason Ekstrand
7c48622581 nir/spirv: Bring back the spirv2nir helper binary
This was something that I wrote in the early days of the spirv_to_nir code
but deleted once we had a real driver.  However, in the absence of a
shader_runner equivalent, it's extremely useful for debugging the
spirv_to_nir code so let's bring it back.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-21 05:38:26 -07:00
Chuanbo Weng
e4648ba8dd i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.
Implement querying this attribute in intelImageExtension and bump
version of intelImageExtension.

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-21 12:19:19 +01:00
Chuanbo Weng
9e8de866f7 egl: return corresponding offset of EGLImage instead of 0.
The offset should not always be 0. For example, if EGLImage is
created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the
offset should be the actual start of miplevel 1 in bo.

v2: Add version check of __DRIimageExtension implementation
(Suggested by Axel Davy).

v3: Don't add version check of __DRIimageExtension implementation.
Set the offset only when queryImage() succeeds. (Suggested by Emil
Velikov)

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
[Emil Velikov: coding style fixes]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-21 12:19:19 +01:00
Chuanbo Weng
1ceb775d57 dri: add offset attribute and bump version of EGLImage extensions.
Offset is useful for buffer sharing with other components, so add
it to queryImage attributes.

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-21 12:19:19 +01:00
Francisco Jerez
e5311ba1ac i965/ir: Test thread dispatch packing assumptions.
Not [originally] intended for upstream.  Should cause a GPU hang if
some thread is executed with a non-contiguous dispatch mask breaking
assumptions of brw_stage_has_packed_dispatch().  Doesn't cause any
CTS, DEQP or Piglit regressions, while replacing
brw_stage_has_packed_dispatch() with a dummy implementation that
unconditionally returns true on top of this patch causes multiple GPU
hangs.

v2: Refactor into a separate function instead of emitting the test
    code directly from emit_nir_code(), drop VEC4 test and clean up
    slightly for upstream. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-21 13:45:46 +03:00
Francisco Jerez
c05a4f11a0 i965/ir: Pass identity mask to brw_find_live_channel() in the packed dispatch case.
This avoids emitting a few extra instructions required to take the
dispatch mask into account when it's known to be tightly packed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-21 13:45:46 +03:00
Francisco Jerez
f57f526fc5 i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.
The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL.  This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions.  Disable the optimization
in that case.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
    implement hardware generation check.
2016-09-21 13:45:46 +03:00
Jason Ekstrand
8a468d186e i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNEL
On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes.  It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1.  In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage.  Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Francisco Jerez <currojerez@riseup.net>
[ Francisco Jerez: Fix a couple of typos, add mask register type
  assertion, clarify reason why ce0 can have bits set for disabled
  channels, clarify that this may only be a problem when thread
  dispatch doesn't pack channels tightly in the SIMD thread.  Apply
  same treatment to Align16 path. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-21 13:45:45 +03:00
Jason Ekstrand
a2392cee48 i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg
The state register sr0 is really a collection of dwords not a SIMD8
anything.  It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
[ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-21 13:45:45 +03:00
Lionel Landwerlin
b8162d6b6e anv: pipeline: use correct number of thread for compute
Reproduces this commit :

commit 0fb85ac08d
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Mon Jun 6 21:37:34 2016 -0700

    i965: Use the correct number of threads for compute shaders.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-21 12:01:06 +03:00
Lionel Landwerlin
f2d43b44d7 anv: allocator: correct scratch space for haswell
This reproduces this commit :

commit 2213ffdb4b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Mon Jun 6 21:37:34 2016 -0700

    i965: Allocate scratch space for the maximum number of compute threads.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-21 12:01:06 +03:00
Lionel Landwerlin
09394ee6cf anv: device: calculate compute thread numbers using subslices numbers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-21 12:01:06 +03:00
Nicolai Hähnle
1f291369e4 gallivm: support negation on 64-bit integers
This should be analogous to 32-bit integers.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:50 +02:00
Dave Airlie
4207612f9c radeonsi: prepare 64-bit integer support. (v2)
v2:
- no PIPE_CAP_INT64 yet
- emit DIV/MOD without the divide-by-zero workaround

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:38 +02:00
Dave Airlie
5561a37710 gallivm/llvmpipe: prepare support for ARB_gpu_shader_int64.
This enables 64-bit integer support in gallivm and
llvmpipe.

v2: add conversion opcodes.
v3:
- PIPE_CAP_INT64 is not there yet
- restrict DIV/MOD defaults to the CPU, as for 32 bits
- TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:30 +02:00
Dave Airlie
6b26039da3 tgsi/softpipe: prepare ARB_gpu_shader_int64 support. (v3)
This adds all the opcodes to tgsi_exec for softpipe to use.

v2: add conversion opcodes.
v3:
- no PIPE_CAP_INT64 yet
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:11 +02:00
Dave Airlie
3985e6c044 gallium/tgsi: add support for 64-bit integer immediates.
This adds support to TGSI for 64-bit integer immediates.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-21 10:23:55 +02:00
Dave Airlie
6e1a34d545 gallium: add opcode and types for 64-bit integers. (v3)
This just adds the basic support for 64-bit opcodes,
and the new types.

v2: add conversion opcodes.
add documentation.
v3:
- make docs more consistent
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:23:05 +02:00
Kenneth Graunke
9694b23f66 i965: Rename intelScreen to screen.
"intelScreen" is wordy and also doesn't fit our style guidelines.
"screen" is shorter, which is nice, because we use it fairly often.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-20 20:08:20 -07:00
Kenneth Graunke
8fec9fbb9f i965: Rename __DRIScreen pointers to "dri_screen".
I want to use "screen" as the variable name for a struct intel_screen
pointer.  This means that we can't use it for __DRIscreen pointers.

Sometimes we called it "screen", sometimes "sPriv", sometimes
"driScrnPriv", and sometimes "psp" (Pointer to Screen Private?).
The last one is particularly confusing because we use "psp" to refer to
the Gen4 PIPELINED_STATE_POINTERS packet as well.

Let's be consistent.  "dri_screen" is clear, and it's not used often
enough that I'm worried about the verbosity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-20 20:08:12 -07:00
Dylan Baker
d4bf9baa43 mesa: Implement ARB_shader_viewport_layer_array for i965
This extension is a combination of AMD_vertex_shader_viewport_index and
AMD_vertex_shader_layer, making it rather trivial to implement.

For gallium I *think* this needs a new cap because of the addition of
support in tessellation evaluation shaders, and since I don't have any
hardware to test it on, I've left that for someone else to wire up.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-20 16:23:04 -07:00
Leo Liu
956f3e3bcd radeon/vce: add firmware support for version 52.8.3
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-20 15:58:56 -04:00
Indrajit Das
f9311265bf st/omx/dec/h265: Correct the timestamping
(derived from commit 3b6bda665a)

v2: fix the tabs(Leo)

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nishanth Peethambaran <nishanth.peethambaran@amd.com>
Signed-off-by: Indrajit Das <indrajit-kumar.das@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-09-20 15:58:56 -04:00
Lionel Landwerlin
792d77165b aubinator: add a custom handler for immediate register load
Transforming this :

0x00c77084:  0x11000001:  MI_LOAD_REGISTER_IMM
0x00c77088:  0x0000b020 : Dword 1
    Register Offset: 0x0000b020
    0x00c7708c:  0x00880038 : Dword 2
    Data DWord: 8912952

Into this:

0x007880f0:  0x11000001:  MI_LOAD_REGISTER_IMM
0x007880f4:  0x0000b020 : Dword 1
    Register Offset: 0x0000b020
    0x007880f8:  0x00080040 : Dword 2
    Data DWord: 524352
register L3CNTLREG2 (0xb020) : 0x80040
    SLM Enable: 0
    URB Allocation: 32
    URB Low Bandwidth: 0
    RO Allocation: 32
    RO Low Bandwidth: 0
    DC Allocation: 0
    DC Low Bandwidth: 0

v2: Drop unused arguments (Sirisha)
    Print out register name

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-09-20 10:47:21 +01:00
Nayan Deshmukh
0301858a31 st/va: flush the context before calling flush_frontbuffer(v2)
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function

v2: change comment style

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-20 11:18:29 +02:00
Nayan Deshmukh
e4cc2276c1 st/vdpau: flush the context before calling flush_frontbuffer
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-20 11:18:07 +02:00
Nayan Deshmukh
853e80f5a0 vl/dri3: handle the case of different GPU(v4.2)
In case of prime when rendering is done on GPU other then the
server GPU, use a seprate linear buffer for each back buffer
which will be displayed using present extension.

v2: Use a seprate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for back buffer in case when
    a seprate linear buffer is used (Michel)
v4.1: remove empty line
v4.2: destroy the context and handle the case when
      create_context fails (Emil)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-20 11:17:02 +02:00
Ilia Mirkin
40d787ab05 st/vdpau: fix argument type to vlVdpOutputSurfaceDMABuf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-20 11:13:05 +02:00
Tim Rowley
92ec820244 swr: [rasterizer core] Better thread destruction
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-09-19 20:10:19 -05:00
Tim Rowley
fdf2890423 swr: [rasterizer jitter] Fix missing end-of-file newline
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-09-19 20:10:19 -05:00
Tim Rowley
2f86a9577a swr: [rasterizer core] Add macros for mapping ArchRast to buckets
Switch all RDTSC_START/STOP macros to use AR_BEGIN/END macros.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-09-19 20:10:19 -05:00
Kenneth Graunke
04026b43c8 glsl: Skip "unsized arrays aren't allowed" check for TCS/TES/GS vars.
Fixes ESEXT-CTS.draw_elements_base_vertex_tests.AEP_shader_stages and
ESEXT-CTS.texture_cube_map_array.texture_size_tesselation_con_sh.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-19 12:01:11 -07:00
Samuel Pitoiset
6ed05fa4cb nvc0: get rid of nvc0_stage_sampler_states_bind_range()
Same thing as nvc0_stage_set_sampler_views_range().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-19 20:03:24 +02:00
Samuel Pitoiset
407948df1b nvc0: get rid of nvc0_stage_set_sampler_views_range()
This function was quite similar to nvc0_stage_set_sampler_views()
and I don't see any reasons to not remove it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-19 20:03:20 +02:00
Samuel Pitoiset
557a29b51f nv50/ir: optimize SUB(a, b) to MOV(a - b)
This helps shaders in UE4 demos, especially with Elemental
(+1% perf). This optimization reduces spilling usage in one
shader which explains the little gain.

GF100/GK104:

total instructions in shared programs :2838551 -> 2838045 (-0.02%)
total gprs used in shared programs    :396706 -> 396684 (-0.01%)
total local used in shared programs   :34432 -> 34416 (-0.05%)

                local        gpr       inst      bytes
    helped           1          19         112         112
      hurt           0           0           0           0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-18 16:42:39 +02:00
Samuel Pitoiset
d8b4f5fcca gk110/ir: fix wrong emission of OP_NOT
This should emit src0 instead of src1.
Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-09-18 16:42:33 +02:00
Martina Kollarova
15804c4b90 r600g/sb: fix struct/class declaration conflicts
A couple of forward-declarations were causing warnings in clang:
    'value' defined as a class here but previously declared as a struct
    [-Wmismatched-tags]

Signed-off-by: Martina Kollarova <martina.kollarova@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-09-18 09:23:42 +02:00
Eric Anholt
073129c7af i965: Drop assertion about buffer offset at draw time.
Given robust access, we should just be returning zeroes if the user gives
us a base pointer that's too big, which is what was happens on a release
build.  This was caught by a webgl conformance test for out-of-bounds
draws on servo.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-17 17:48:16 +01:00
Lars Hamre
ddd6116e32 tgsi: Enable returns from within loops
Fixes the following piglit test (for softpipe):
/spec/glsl-1.10/execution/fs-loop-return

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:13 -06:00
Charmaine Lee
8a6391477e svga: relax restriction of compressed formats for texture upload
This patch relaxes the restriction of compressed formats for texture
upload buffer. For now, 3D texture with compressed format
is still not supported in the texture upload buffer path.

As Brian noted, ETQW does many texture updates with glCompressedTexSubImage.
This patch greatly improves the performance of the ETQW trace.

Tested with ETQW, MTT piglit, glretrace, conform, viewperf

v2: Per Brian's suggestion, removed the subregion boundary check.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:13 -06:00
Brian Paul
15dee0fc1d svga: skip query flush if we already have the query result
This reduces the number of times we flush in some situations (the
arbocclude demo is one trivial example).

Tested with Piglit, ETQW, Sauerbraten, arbocclude.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-17 10:24:13 -06:00
Brian Paul
c71e82b8e9 svga: remove unneeded svga_context_flush() in svga_end_query()
Since commit 99d8fe20ab we don't have to flush the command buffer when
we end a query.

Tested with Piglit, Sauerbraten, arbocclude, ETQW (noticably faster now).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-09-17 10:24:13 -06:00
Charmaine Lee
f1b3374d28 svga: use upload buffer for upload texture.
With this patch, when running with vgpu10, instead of mapping directly to the
guest backed memory for texture update, we'll use the texture upload buffer
and use the transfer from buffer command to update the host side texture memory.

This optimization yields about 20% performance improvement with
Lightsmark2008 and about 40% with Tropics.

Tested with Lightsmark2008, Tropics, Heaven, MTT piglit, glretrace, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:13 -06:00
Charmaine Lee
a9c4a861d5 svga: refactor svga_texture_transfer_map/unmap functions
Split the functions into separate functions for dma and direct map to make
the code more readable.

Tested with MTT piglit, glretrace, viewperf, conform, various OpenGL apps

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:12 -06:00
Charmaine Lee
c8ef82d65a svga: add SVGA3d_vgpu10_TransferFromBuffer()
Also add the corresponding dump function to dump the TransferFromBuffer command.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:12 -06:00
Charmaine Lee
2a4b019239 svga: single sample surface can be created as non-multisamples surface
With this patch, single sample surface will be created as non-multisamples
surface.

Tested with piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:12 -06:00
Charmaine Lee
5947d90830 svga: fix memory leak with sampler state
This patch fixes a memory leak with sampler state when piglit
is run with HW version 11. Sampler state clean up was incorrectly skipped
in svga_cleanup_sampler_state() for vgpu9.

Tested with piglit.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:12 -06:00
Brian Paul
12689efbbe svga: fix prim type check/assignment in translate_indices()
Left over test code spotted by Sinclair.

Tested with piglit, Google Earth, Lightsmark, Heaven4, glretraces, etc.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-09-17 10:09:00 -06:00
Charmaine Lee
50359ddb5d svga: use SVGA3D_QUERYTYPE_MAX for svga query type check
Use SVGA3D_QUERYTYPE_MAX instead of SVGA_QUERY_MAX for
svga query type check.

Tested with various OpenGL apps with GALLIUM_HUD set.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:09:00 -06:00
Charmaine Lee
ee39814d90 svga: split the num-resources-mapped hud to textures & buffers
Replace the num-resources-mapped hud with
num-textures-mapped and num-buffers-mapped, so we can
differentiate the map counts for these two different resources.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:09:00 -06:00
Charmaine Lee
f168c886c9 svga: change svga hud defines to enums
This will make it easier to add new hud types.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:09:00 -06:00
Brian Paul
4f74b379aa svga: implement an index buffer translation cache
Some OpenGL apps, like Cinebench R15, have many glDrawElements(GL_QUADS)
calls.  Since we don't directly support quads we have to convert these
calls into GL_TRIANGLES which involves generating a new index buffer.

This patch saves the new/translated index buffer in the hope that it
can be reused for a later draw call.

Cinebench R15 increases by about 20% with this change.
The NobelClinician Viewer app also hits this code.
Tested with full piglit run.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-17 10:09:00 -06:00
Brian Paul
581292a78c svga: try to emit fewer buffer rebind commands
If a consecutive sequence of drawing commands references the same
vertex/index buffers, there should be no need to rebind the surfaces
for the second and subsequent drawing commands.

Apps that use multiple display lists benefit from this since the vertex
data for several display lists is often stored in one buffer.

In the case of the legacy E&S Glaze demo, this reduces the size of our
command buffers from 91KB to 44KB.  One WSI Fusion trace shows a 33%
reduction in command buffer sizes.

Tested with full piglit run.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-17 10:09:00 -06:00
Brian Paul
ee5f5e2269 svga: reduce unmapping/remapping of the default constant buffer
Previously, every time we put shader constants into the default constant
buffer we called u_upload_alloc(), which mapped the buffer, and
u_upload_unmap().  We had to unmap the buffer before calling
svga_buffer_handle() to get the winsys handle for the buffer.  But we
really only need to do that the first time we reference the const buffer.
Now we try to keep the upload manager's buffer mapped until we fill it or
flush the command buffer.

v2: add additional comment on the buffer unmapping code in
svga_context_flush(), per Charmaine.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-17 10:09:00 -06:00
Brian Paul
ce3b34b727 svga: optimize memcpy() in svga_buffer_update_hw()
When we migrate a buffer from sw/malloc storage to a hardware buffer,
don't memcpy the whole buffer, just copy the part we've written to.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-09-17 10:08:59 -06:00
Neha Bhende
b7bee25052 svga: Use comparison between svga texture types to use PredCopyRegion command
PredCopyRegion support copy between same type of textures.
Instead of comparing src and dst pipe texture type, compare svga texture
type which can avoid some software fallback.
for example, it avoids a software blit with the Redway3D Aston demo.

Tested piglit tests on VGPU9 and  VGPU10 on GL/DX11Renderer, Redway3D Aston demo

v2: some nit pick suggested by Charmaine.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:08:59 -06:00
Neha Bhende
b9f333cc81 svga: Add function svga_resource_type()
This function returns svga texture type for corresponding pipe texture.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:08:59 -06:00
Samuel Pitoiset
50baaf6bc6 nvc0/ir: fix subops for IMAD
Offset was wrong, it's at bit 8, not 4. Also, uses subr instead
of sub when src2 has neg. Similar to GK110 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-09-17 17:42:45 +02:00
Samuel Pitoiset
9b8b69b3c4 nvc0/ir: fix comments about instructions info
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-17 17:42:40 +02:00
Kenneth Graunke
eaacb27812 mesa: Move buffers-unmapped earlier in check_valid_to_render().
This needs to be above the switch on API, as that can return true
(valid to render) before this error check even had a chance to run.

Fixes ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos,
which worked before commit 72f1566f90.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-09-16 19:42:56 -07:00
Kenneth Graunke
6b0ba02cae mesa: Expose GL_CONTEXT_FLAGS in ES 3.2.
Fixes four ES32-CTS.context_flags.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-16 18:55:38 -07:00
Tom Stellard
91ec6e5664 radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v3
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs
    rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet
    rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically.  However, in Mesa we let the driver do
this work.  The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.

v2:
  - Add comments explaining why we are setting certain bits of the scratch
    resource descriptor.

v3:
  - Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-16 23:07:10 +00:00
Tom Stellard
a2b8346fa6 radeonsi/compute: Add some more debug printfs 2016-09-16 22:51:06 +00:00
Marek Olšák
ae0a4a1299 glsl: remove interpolateAt* instructions for demoted inputs
This fixes 8 fs-interpolateat* piglit crashes on radeonsi, because it can't
handle non-input operands in interpolateAt*.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-16 22:35:08 +02:00
Marek Olšák
d58a3906cb mesa: fix glGetFramebufferAttachmentParameteriv w/ on-demand FRONT_BACK alloc
This fixes 66 CTS tests on st/mesa.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-16 22:35:08 +02:00
Serge Martin
1c8d4c694b clover: fix getting scalar args api size
This fix getting the size of a struct arg. vec3 types still work ok.
Only buit-in args need to have power of two alignment, getTypeAllocSize
reports the correct size in all cases.

Acked-by: Francisco Jerez <currojerez@riseup.net>
2016-09-16 22:09:47 +02:00
Ilia Mirkin
f65187bb93 docs: add GL_ARB_gl_spirv to features list
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-16 12:04:12 -04:00
Rob Clark
ba8a50955d ttn: fix warning after 7bf76563e
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-16 11:55:26 -04:00
Brian Paul
702ff0b9a0 gallium/docs: document alpha_to_coverage and alpha_to_one blend state
The gallium interface defines these like DX10.  Note that OpenGL ignores
these options if MSAA is disabled or the dest buffer doesn't support
MSAA.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-16 08:44:26 -06:00
Brian Paul
187c278121 st/mesa: update comment in st_atom_msaa.c
The old comment was a copy and paste mistake.  Indent another comment.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-16 08:44:26 -06:00
Brian Paul
a01872f808 st/mesa: only enable MSAA coverage options when we have a MSAA buffer
Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
we should not set the alpha_to_coverage or alpha_to_one flags if the
current drawing buffer does not do MSAA.

This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.

ETQW is a game that enables GL_SAMPLE_ALPHA_TO_COVERAGE without MSAA.
Shrubs along the side of roads were invisible because fragments with
alpha < 0.5 were being discarded (zero coverage).

v2: remove ctx->DrawBuffer != NULL check.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-16 08:44:12 -06:00
Dave Airlie
e1ea36ae71 spirv: use subpass image type (v1.1)
This adds support for the input attachments subpass type
to the SPIRV->NIR pass.

v1.1: drop handling from vtn_handle_texture

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-16 15:16:31 +10:00
Dave Airlie
7bf76563e2 glsl: add subpass image type (v2)
SPIR-V/Vulkan have a special image type for input attachments
called the subpass type. It has different characteristics than
other images types.

The main one being it can only be an input image to fragment
shaders and loads from it are relative to the frag coord.

This adds support for it to the GLSL types. Unfortunately
we've run out of space in the sampler dim in types, so we
need to use another bit.

v2: Fixup subpass input name (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-16 15:16:31 +10:00
Kenneth Graunke
081f21f29b isl: Finish tiling filtering for Gen6.
Gen6 only has one additional restriction over Gen7+, so we just add it
to the existing gen7 function (which actually covers later gens too).

This should stop FINISHME spew when running GL on Sandybridge.

v2: Fix bytes per block vs. bits per block confusion (Jason) and
    rename function to gen6_filter_tiling (Jason and Chad).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-15 21:21:50 -07:00
Ilia Mirkin
9fec15a7e0 i965: enable ARB_ES3_2_compatibility on gen8+
Note that ASTC support is not actually mandated for this extension to be
exposed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 19:29:41 -04:00
Jason Ekstrand
111f6b250d i965/nir: Roll set_default_interpolation into lower_fs_inputs
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand
246db0063e i965/fs: Use NIR for handling forced per-sample interpolation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand
ed65e6ef49 nir: Add a flag to lower_io to force "sample" interpolation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Jason Ekstrand
114874b22b i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode
From the ARB_gpu_shader5 spec:

   The built-in functions interpolateAtCentroid() and interpolateAtSample()
   will sample variables as though they were declared with the "centroid"
   or "sample" qualifiers, respectively.

When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample".  In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.

Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample()

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:27 -07:00
Brian Paul
0d2eb8c14d mesa: check for no matrix change in _mesa_LoadMatrixf()
Some apps issue redundant glLoadMatrixf() calls with the same matrix.
Try to avoid setting dirty state in that situation.

This reduces the number of constant buffer updates by about half in
ET Quake Wars.

Tested with Piglit, ETQW, Sauerbraten, Google Earth, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 12:00:12 -06:00
Jon Turney
533b3530c1 direct-to-native-GL for GLX clients on Cygwin ("Windows-DRI")
Structurally, this is very similar to the existing Apple-DRI code, except I
have chosen to implement this using the __GLXDRIdisplay, etc. vtables (as
suggested originally in [1]), rather than a maze of ifdefs.  This also means
that LIBGL_ALWAYS_SOFTWARE and LIBGL_ALWAYS_INDIRECT work as expected.

[1] https://lists.freedesktop.org/archives/mesa-dev/2010-May/000756.html

This adds:

* the Windows-DRI extension protocol headers and the windowsdriproto.pc
file, for use in building the Windows-DRI extension for the X server

* a Windows-DRI extension helper client library

* a Windows-specific DRI implementation for GLX clients

The server is queried for Windows-DRI extension support on the screen before
using it (to detect the case where WGL is disabled or can't be activated).

The server is queried for fbconfigID to pixelformatindex mapping, which is
used to augment glx_config.

The server is queried for a native handle for the drawable (which is of a
different type for windows, pixmaps and pbuffers), which is used to augment
__GLXDRIdrawable.

Various GLX extensions are enabled depending on if the equivalent WGL
extension is available.
2016-09-15 13:14:43 +01:00
Emil Velikov
2ac09ac5a5 docs: add news item and link release notes for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-15 11:31:06 +01:00
Emil Velikov
219a2f5f9f docs: add sha256 checksums for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 09460b8cf7)
2016-09-15 11:30:00 +01:00
Emil Velikov
06f83a5548 docs: add release notes for 12.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d79b2e7bf3)
2016-09-15 11:29:59 +01:00
Kenneth Graunke
3bcdc2e3db mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.
This is supposed to be exposed with the GL_KHR_robustness extension,
which we support on ES 2.0 and later.  On desktop GL, it's also exposed
by GL_ARB_robustness, which is supported by all drivers ("dummy_true").
so we also allow desktop GL.

Fixes:
- ES32-CTS.robust.robustness.noResetNotification
- ES32-CTS.robust.robustness.loseContextOnReset

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-15 00:58:47 -07:00
Jason Ekstrand
89a96c8f43 anv/cmd_buffer: Set the L3 atomic disable mask bit in CHICKEN3 on HSW
Without this bit set, the value in "L3 Atomic Disable" won't get applied by
the hardware so we won't properly get L3 atomic caching.

Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex and 198 of
the dEQP-VK.image.atomic_operations.* tests on HSW

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-14 17:53:16 -07:00
Jason Ekstrand
a814e18c96 intel/blorp: Stop setting 3DSTATE_DRAWING_RECTANGLE
The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT
at the GPU initialization time and never sets it again.  The GL driver sets
it every time the framebuffer changes.  Originally, blorp set it to the
size of the drawing area but meant we had to set it back in the Vulkan
driver.  Instead, we can easily just do that in the GL driver's blorp_exec
implementation and not set it in blorp core.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Jason Ekstrand
b56f509ee0 intel/blorp: Emit 3DSTATE_MULTISAMPLE directly
Previously, we relied on a driver hook for 3DSTATE_MULTISAMPLE.  However,
now that Vulkan and GL use the same sample positions, we can set up
3DSTATE_MULTISAMPLE directly in blorp and delete the driver hook.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Jason Ekstrand
c779ad3e66 intel: Move Vulkan sample positions to common code
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-14 17:51:16 -07:00
Marek Olšák
f019255acf Revert "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions"
This reverts commit 524fd55d2d.

Reason: https://bugs.freedesktop.org/show_bug.cgi?id=97808
2016-09-15 00:47:24 +02:00
Francisco Jerez
6d861968ca i965/vec4: Assert that pull constant load offsets are 16B-aligned.
Non-16B-aligned pull constant loads are unlikely to be particularly
useful given that you can get roughly the same effect by using
swizzles on the result.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez
5ca35c6367 i965/vec4: Assert that ATTR regions are register-aligned.
It might be useful to actually handle this once copy propagation
becomes smarter about register-misaligned offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez
f33a8f8fcf i965/vec4: Don't spill non-GRF-aligned register regions.
A better fix would be to do something along the lines of the FS
back-end spilling code and emit a scratch read before any instruction
that overwrites the register to spill partially due to a non-zero
sub-register offset.  In the meantime mark registers used with a
non-zero sub-register offset as no-spill to prevent the spilling code
from miscompiling the program.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez
8531f943d9 i965/vec4: Fix copy propagation for non-register-aligned regions.
This prevents it from trying to propagate a copy through a
register-misaligned region.  MOV instructions with a misaligned
destination shouldn't be treated as a direct GRF copy, because they
only define the destination GRFs partially.  Also fix the interference
check implemented with is_channel_updated() to consider overlapping
regions with different register offset to interfere, since the
writemask check implemented in the function is only valid under the
assumption that the source and destination regions are aligned
component by component.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:59 -07:00
Francisco Jerez
0e657b7b55 i965/vec4: Compare full register offsets in cmod propagation.
Cmod propagation would misoptimize the program if the destination
offset of the generating instruction wasn't exactly the same as the
source region offset of the copy instruction.  In preparation for
adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
8bed1adfc1 i965/vec4: Assign correct destination offset to rewritten instruction in register coalesce.
Because the pass already checks that the destination offset of each
'scan_inst' that needs to be rewritten matches 'inst->src[0].offset'
exactly, the final offset of the rewritten instruction is just the
original destination offset of the copy.  This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
3a74e437fd i965/vec4: Don't coalesce registers with overlapping writes not matching the MOV source.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
1bb5074474 i965/vec4: Compare full register offsets in opt_register_coalesce nop move check.
In preparation for adding support for sub-GRF offsets to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
3be0d6d040 i965/vec4: Check that the write offsets match when setting dependency controls.
For simplicity just assume that two writes to the same GRF with
different sub-GRF offsets will potentially interfere and break the
dependency control chain.  This is in preparation for adding sub-GRF
offset support to the VEC4 IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
b52fefc4d5 i965/vec4: Change opt_vector_float to keep track of the last offset seen in bytes.
This simplifies things slightly and makes the pass more correct in
presence of sub-GRF offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
230615e228 i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().
This should also have the side effect of fixing convert_to_hw_regs()
to handle sub-GRF register offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
eb746a80e5 i965/ir: Update several stale comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
47784e2346 i965/ir: Don't print ARF subnr values twice.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:58 -07:00
Francisco Jerez
5d65d51e78 i965/vec4: Print src/dst_reg::offset field consistently for all register files.
C.f. 'i965/fs: Print fs_reg::offset field consistently for all
register files.'.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
ec259f5307 i965/fs: Print fs_reg::offset field consistently for all register files.
The offset printing code in fs_visitor::dump_instruction() was doing
things differently for sources and destinations and for each register
file -- In some cases it would be added to the base register number
fs_reg::nr, in other cases it would follow the base register separated
with a plus sign, in other cases (uniforms) it would do both (!).  The
sub-register offset was also being printed or not rather
inconsistently.  Fix it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
950af5ed40 i965/fs: Misc simplification.
Get rid of some leftover redundant arithmetic introduced during the
conversion to byte offsets and sizes that can be simplified easily.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
80e1d670b4 i965/fs: Get rid of fs_inst::set_smear().
component() was generally a better alternative because of several
issues set_smear() had:

 - It wouldn't take the original stride and offset of the register
   into account, which means that set_smear() on the result of
   e.g. another set_smear() call or an offset() call would give a
   bogus region as result.

 - It was an inherently destructive operation.  See the
   'nir_intrinsic_shader_clock' hunk below for how this could lead to
   subtle bugs in cases where set_smear() was called multiple times on
   the same register like 'r.set_smear(0), r.set_smear(1)' with the
   expectation that each call would return a separate value instead of
   a reference to the same subsequently mutated object.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
8e58e4412f i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
f2d2156ba2 i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.
Also changed the argument names since 'src' and 'dst' don't make that
much sense outside of the context of copy propagation.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
645261c4b2 i965/fs: Change region_contained_in() to use byte units.
This makes the function less annoying to use and more accurate -- We
shouldn't propagate a copy into a register region that wasn't fully
contained in the destination of the copy (IOW, a source region that
wasn't fully defined by the copy) just because the number of registers
written and read by each instruction happened to get rounded up to the
same GRF multiple.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
1c67e27247 i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup.
By keeping track of 'offset' in byte units.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:57 -07:00
Francisco Jerez
2d7d4a7910 i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().
Using component_size() is easier and generally more correct because it
takes into account the register type and stride for you.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
0bc46cc961 i965/fs: Simplify result_live calculation in dead_code_eliminate().
No need to unroll the first iteration of the loop manually.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
62aaef6c83 i965/fs: Simplify and fix buggy stride/offset calculations using subscript().
These were bashing the 'offset' and 'stride' values of several
registers without taking the previous value into account, which
probably didn't matter in practice for optimize_frontfacing_ternary()
because the 'tmp' register already had a known region, but it would
have given the wrong region as result in the other cases in
lower_integer_multiplication().  subscript(..., i) is a more
straightforward way to take the i-th field of a given type from each
channel of a register which should give the right answer as result
regardless of the original 'offset' and 'stride' parameters of the
register region.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
3b7b908787 i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
ee930c0435 i965/fs: Simplify byte_offset().
In the most common case this can now be implemented as a simple
addition because the offset is already encoded as a single scalar
value in bytes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
bae3a41171 i965/fs: Fix signedness of the return value of fs_inst::size_read().
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
a384503c15 i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units.
This makes the helper function less annoying to use and somewhat more
accurate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
401fc228fd i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf.
The 'scan_inst->dst.offset % REG_SIZE' term in the final
'scan_inst->dst.offset' calculation is obviously bogus.  The offset
from the start of the copy destination register 'inst->dst' where the
destination of the generating instruction 'scan_inst' would be written
to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst'
relative to the source of the copy instruction (AKA rel_offset in the
code below).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
cd0134072a i965/fs: Take into account copy register offset during compute-to-mrf.
This was dropping 'inst->dst.offset' on the floor.  Nothing in the
code above seems to guarantee that it's zero and in that case the
offset of the register being coalesced into wouldn't be taken into
account while rewriting the generating instruction.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:56 -07:00
Francisco Jerez
fcd9d1badc i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().
This makes sure that overlap checks are done correctly throughout the
back-end when the '*this' register starts before the register/size
pair provided as argument, and is actually less annoying to use than
in_range() at this point since regions_overlap() takes its size
arguments in bytes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
56bcb2230f i965/vec4: Port regions_overlap() to the vec4 IR.
This is copy-pasted almost line by line from the FS back-end.  The
only reason it cannot be implemented in terms of backend_reg is that
the backend_reg::nr field doesn't have the same meaning for uniforms
on both back-ends.  It could be easily deduplicated by using a
template function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
c057278c06 i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().
Its only use left in the FS back-end should be using regions_overlap()
instead to avoid getting a false negative result in cases where source
and destination overlap but the former starts before the latter in the
VGRF file.

v2: Put back lost components factor (Iago).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
b42c13a5b8 i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().
fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result.  regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
32d67923b2 i965/fs: Fix LOAD_PAYLOAD handling in register coalesce is_nop_mov().
is_nop_mov() was broken for LOAD_PAYLOAD instructions in two ways: On
the one hand the original destination register offset wasn't being
taken into account which would give incorrect results if it was
already non-zero, and on the other hand all source registers were
being treated as if they had a size of 32B, which is almost never the
case in SIMD16 programs for non-header sources.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
5cc6425d70 i965/fs: Fix can_propagate_from() source/destination overlap check.
The previous overlap condition only made sure that the VGRF numbers or
GRF-aligned offsets were different without taking the amount of data
written and read by the instruction into consideration.  Use the
regions_overlap() helper instead.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
9ae77d7020 i965/fs: Compare full register offsets in cmod propagation pass.
This could potentially have misoptimized a program in cases where
inst->src[0] had a non-zero sub-GRF offset.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
3a4ea7cf80 i965/fs: Don't consider LOAD_PAYLOAD with stride > 1 source to behave like a raw copy.
Noticed the problem by inspection while typing in the previous commit.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
1164aa1a1b i965/fs: Don't consider LOAD_PAYLOAD with sub-GRF offset to behave like a raw copy.
This was likely the original intention, and at least register coalesce
relies on it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
Francisco Jerez
a5bbe4c127 i965/vec4: Take into account misalignment in regs_written() and regs_read().
Unlike the FS counterpart of this commit this was likely not (yet) a
bug, but let's fix it already in preparation for implementing support
for sub-GRF offsets in the VEC4 back-end.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
717d8efd58 i965/fs: Take into account misalignment in regs_written() and regs_read().
There was a workaround for this in fs_inst::size_read() for the
SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file
*only*.  We should take this possibility into account for the sources
and destinations of all instructions on all optimization passes that
need to quantize dataflow in 32B increments by adding the amount of
misalignment to the size read or written from the regs_read() and
regs_written() helpers respectively.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
e540045df5 i965/fs: Take into account trailing padding in regs_written() and regs_read().
This fixes regs_written() and regs_read() to return a more accurate
value when the padding left between components due to a stride value
greater than one causes the region bounds given by size_written or
size_read to overflow into the next register.  This could become a
problem in optimization passes that keep track of dataflow using
fixed-size arrays with register granularity, because the overflow
register (not actually accessed by the region) may not have been
allocated at all which could lead to undefined memory access.

An alternative to this would be to subtract the trailing padding
already during the calculation of fs_inst::size_read and
::size_written, but that would break code that currently assumes that
::size_read and _written are whole multiples of the component size,
and would be hard to maintain looking forward because size_written is
assigned from a bunch of different places.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
937373eb25 i965/fs: Handle fixed HW GRF subnr in reg_offset().
This will be useful later on when we start using reg_offset() on fixed
hardware registers.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
1a4b7fdd88 i965/fs: Handle arbitrary offsets in brw_reg_from_fs_reg for MRF/VGRF registers.
This restriction seemed rather artificial...  Removing it actually
simplifies things slightly.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
d6b60934aa i965/fs: Return more accurate read size for LINTERP from fs_inst::size_read.
The LINTERP virtual instruction only reads three scalar components
from the first 16B of the second source, we can now teach size_read()
about it since its return value is represented with byte granularity.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
31a40202b8 i965/fs: Return more accurate read size from fs_inst::size_read for IMM and UNIFORM files.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
728dd30c0a i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte units.
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible.  I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:54 -07:00
Francisco Jerez
e1a918ba7b i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible.  I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
27cb6b081e i965/ir: Drop backend_instruction::regs_written field.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
69fdf13c21 i965/vec4: Replace vec4_instruction::regs_written with ::size_written field in bytes.
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible.  I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
69570bbad8 i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible.  I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
d28cfa35fe i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written.
This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units.  The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation.  The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
c458eeb946 i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.
This is in preparation for dropping fs_inst::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units.  The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation.  The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
be095e11e4 i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.
The fs_reg::subreg_offset and ::offset fields are now redundant, the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units.  The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.

For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible.  I'll come
back later to clean up any ugliness introduced here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
9a523dd051 i965/ir: Remove backend_reg::reg_offset.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:53 -07:00
Francisco Jerez
fba020e5af i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset expressed in bytes.
The dst/src_reg::offset field in byte units introduced in the previous
patch is a more straightforward alternative to an offset
representation split between ::reg_offset and ::subreg_offset fields.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple FS
back-end bugs in the past.  To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units.  Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

v2: Fix division by the wrong reg_unit in the UNIFORM case of
    convert_to_hw_regs(). (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:52 -07:00
Francisco Jerez
86944e063a i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past.  To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units.  Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:52 -07:00
Eero Tamminen
8ad5fb3a8f glsl: grammar fix
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-09-14 13:35:47 -07:00
Kenneth Graunke
aa70ac172e docs: Mention AEP in release notes 2016-09-14 12:43:16 -07:00
Kenneth Graunke
8c9dddadad i965: Enable ANDROID_extension_pack_es31a on Gen9+.
AEP requires ASTC, which is currently only enabled on Skylake and later.
(It may be possible to extend this to Cherryview/Braswell in the future,
but earlier hardware doesn't have ASTC support.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-14 12:16:25 -07:00
Kenneth Graunke
2d8a3fa7ea nir: Report progress from nir_lower_phis_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:51 -07:00
Kenneth Graunke
32630e211e nir: Report progress from nir_lower_alu_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:49 -07:00
Kenneth Graunke
e6eed3533e nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().
This is mandatory.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:39 -07:00
Rob Clark
bff90aedf1 nir/lower_tex: fix typo with sample_dim
Numeric 2 is actually GLSL_SAMPLER_DIM_3D, which I don't think is what
was intended.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-14 13:45:32 -04:00
Rob Clark
1a8424ceba nir: move tex_instr_remove_src
I want to re-use this in a different pass, so move to nir.h

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-14 13:45:32 -04:00
Rob Clark
2c3f966276 nir/lower_tex: remove tex_instr_find_src()
Turns out it already exists.. so don't duplicate it.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-14 13:45:32 -04:00
Kyle Brenneman
7206b3a556 egl: Add storage for EGL_KHR_debug's state to EGL objects
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
1d535c1e83 egl: Factor out _eglGetSyncAttribCommon
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
5b0b844ac9 egl: Factor out _eglWaitSyncCommon
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
9a992038e7 egl: Lock the display in _eglCreateSync's callers
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
58338c6b65 egl: Factor out _eglCreateImageCommon (v2)
v2:
- Pass disp to RETURN_EGL_ERROR so we unlock the display

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
82a2e2cb50 egl: Factor out _eglWaitClientCommon
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
8cc3d9855f egl: Use _eglCreatePixmapSurfaceCommon consistently
This moves the native pixmap fixup to a helper function so we don't
repeat ourselves.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
7d7ae5e1c3 egl: Use _eglCreateWindowSurfaceCommon consistently
This moves the native window fixup to a helper function so we don't
repeat ourselves.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
017946b724 egl: Factor out _eglGetPlatformDisplayCommon
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
fe6ffa79be egl: Fix typo
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 11:45:58 -04:00
Adam Jackson
e2c067d256 egl: Tear down images and syncs at eglTerminate
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-14 11:45:58 -04:00
Kyle Brenneman
6e50f12b04 egl: Update eglext.h (v2)
Updated eglext.h to revision 33111 from the Khronos repository.

v2:
- Don't (re)move extension includes from eglext.h (Emil Velikov)
- Bump to revision 33111 (Adam Jackson)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2016-09-14 11:45:58 -04:00
Brendan King
95f3e5861c configure.ac: fix the name of the Wayland Scanner pc file
The Wayland Scanner pkg-config file is called wayland-scanner.pc.

Fixes: 153539bd9d ("configure: rework wayland_scanner
       handling (fix make distcheck)")

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Brendan King <Brendan.King@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 14:38:30 +01:00
Eric Engestrom
4bb9efb592 gbm: remove left-over array
e7c8c85785 ("gbm: Removed unused function.") forgot to remove
the global array used only by that function.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-14 14:37:34 +01:00
Martina Kollarova
2527e18eeb gallium: fix return value check
A possible error (-1) was being lost because it was first converted to an
unsigned int and only then checked.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Martina Kollarova <martina.kollarova@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-09-14 14:36:43 +01:00
Marek Olšák
ab29788250 radeonsi: reload PS inputs with direct indexing at each use (v2)
The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: don't call load_input for fragment shaders in emit_declaration

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-14 12:33:00 +02:00
Marek Olšák
007b512f9d radeonsi: get rid of constant buffer preloading
26011 shaders in 14651 tests
Totals:
SGPRS: 1152636 -> 1146340 (-0.55 %)
VGPRS: 728198 -> 727371 (-0.11 %)
Spilled SGPRs: 3776 -> 2218 (-41.26 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35835152 -> 35841268 (0.02 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222372 -> 222559 (0.08 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-09-14 12:32:59 +02:00
Marek Olšák
16be87c904 radeonsi: get rid of img/buf/sampler descriptor preloading (v2)
26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222221 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)

v2: merge codepaths where possible

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-14 12:32:59 +02:00
Marek Olšák
22797d7d83 radeonsi: rename get_sampler_desc -> load_sampler_desc
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-09-14 12:32:59 +02:00
Marek Olšák
5f0a8fbcc8 radeonsi: cosmetic changes in si_shader.c
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-09-14 12:32:59 +02:00
Marek Olšák
afaf27bff3 radeonsi: load streamout buffer descriptors before use (v2)
v2: inline the code and remove the conditional that's a no-op now

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-14 12:32:59 +02:00
Eric Anholt
f597ac3966 vc4: Implement job shuffling
Track rendering to each FBO independently and flush rendering only when
necessary.  This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.

Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
    desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
    by 17%

While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
2016-09-14 06:25:41 +01:00
Eric Anholt
f473348468 vc4: Handle resolve skipping at job submit time.
This is done in vc4_flush currently, but I'm going to make the job always
track the surfaces it might be rendering to instead of putting in the
destinations at flush time.
2016-09-14 06:08:03 +01:00
Eric Anholt
9688166bd9 vc4: Move the render job state into a separate structure.
This is a preparation step for having multiple jobs being queued up at the
same time.
2016-09-14 06:08:03 +01:00
Eric Anholt
c31a7f529f vc4: Always unref the current job surfaces at job reset time.
Drops some tricky logic in vc4_flush() trying to update the pointers, and
fixes a broken lack of unref for MSAA surfaces at context destroy time.
2016-09-14 06:08:03 +01:00
Eric Anholt
774a556b6d vc4: Move job-submit skip cases to vc4_job_submit().
For calling job_submit() directly, I need the skipping here.
2016-09-14 06:08:03 +01:00
Eric Anholt
0ef1b32ebb vc4: Move bin CL trailer to job_submit() time.
To implement job shuffling, I want to be able to call submit() on specific
jobs, turning vc4_flush() into the context's flush-all-jobs hook.
2016-09-14 06:08:03 +01:00
Eric Anholt
a2014c2eb9 vc4: Simplify the DISCARD_RANGE handling
It's really just an upgrade to attempting WHOLE_RESOURCE.  Pulling the
logic out caught two bugs in it: We would try to do so on cubemaps (even
though we're only mapping 1 of the 6 slices), and we would break
persistent coherent mappings by trying to reallocate when we shouldn't.
2016-09-14 06:08:03 +01:00
Eric Anholt
21a27ad956 vc4: Fix incorrect clearing of Z/stencil when cleared separately.
The clear of Z or stencil will end up clearing the other as well, instead
of masking.  There's no way around this that I know of, so if we are
clearing just one then we need to draw a quad.

Fixes a regression in the job-shuffling code, where the clear values move
to the job and don't just have the last clear's value laying around when
you do glClear(DEPTH) and then glClear(STENCIL) separately
(ext_framebuffer_multisample-clear 4 depth)).

This causes regressions in ext_framebuffer_multisample/multisample-blit
depth and ext_framebuffer_multisample/no-color depth, but these were
formerly false positives due to the reference image also being black.  Now
the reference and test images are both being drawn, and it looks like
there's an incorrect resolve of depth during blitting to an MSAA FBO.
2016-09-14 06:08:03 +01:00
Ilia Mirkin
89a49af31e glsl: add core plumbing for GL_ANDROID_extension_pack_es31a
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 20:49:55 -04:00
Ilia Mirkin
83116d084f mesa: introduce glPrimitiveBoundingBoxARB entrypoint
This requires a bit of rejiggering, since normally ES entrypoints alias
core ones, not vice-versa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 20:49:50 -04:00
Ilia Mirkin
a69dc2c412 mesa: add a GLES3.2 enums section, and expose new MS line width params
This also exposes them for ARB_ES3_2_compatibility.

While both specs refer to the new MS line width parameters being
separate from the existing AA line widths, reality begs to differ. It's
the same on all hardware currently supported by mesa. Should hardware
come along that wants these to be different, they're easy enough to
separate out.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 20:49:47 -04:00
Sirisha Gandikota
aa7b410592 aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()
Earlier, the loop pretends to loop over instructions from "start" to "end",
but the callers always pass 8192 for end, which is some huge bogus
value. The real loop termination condition is send-with-EOT or 0. (Ken)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 16:32:42 -07:00
Sirisha Gandikota
1ab92d80a8 aubinator: Make gen_disasm_disassemble handle split sends
Skylake adds new SENDS and SENDSC opcodes, which should be
handled in the send-with-EOT check. Make an is_send() helper
that checks if the opcode is SEND/SENDC/SENDS/SENDSC (Ken)

v2: Make is_send() much more crispier, Mix declaration and
code to make the code compact (Ken)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 16:32:39 -07:00
Sirisha Gandikota
5d2440532f aubinator: Simplify print_dword_val() method
Remove the float/dword union and use the iter->p[f->start / 32]
directly as printf formatter %08x expects uint32_t (Ken)

v2: Make the cleanup much more crispier (Ken)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-13 16:32:24 -07:00
Jason Ekstrand
1eebb60917 anv/image: Set correct base_array_layer and array_len for storage images
Since Vulkan doesn't allow single-slice 3D storage images, we need to just
set the base_array_layer and array_len to the full size of the 3-D LOD.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-13 14:45:49 -07:00
Jason Ekstrand
106709db7b Revert "intel/isl: Ignore base_array_layer and array_len for 3D storage..."
This reverts commit 3943888c94.  It turns out
that commit was pretty-much bogus since it breaks binding a 3-D texture as a
2-D storage image.  The correct fix for the Vulkan CTS tests needs to be in
the Vulkan driver itself rather than ISL.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-13 14:45:15 -07:00
Jason Ekstrand
330104464f anv: Use blorp for doing MSAA resolves
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-13 12:40:13 -07:00
Jason Ekstrand
6bcb1f753e anv: Use blorp for ClearColorImage
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-13 12:40:13 -07:00
Jason Ekstrand
57e87862eb anv: Delete meta_blit2d
Everything that we were once using the blit2d framework for is now done
with blorp.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-13 12:40:13 -07:00
Jason Ekstrand
36286ccb96 anv/blorp: Add a gcd_pow2_u64 helper and use it for buffer alignments
This is a lot cleaner and easier to read than the old piles of if
statements.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-09-13 12:40:13 -07:00
Jason Ekstrand
af5d30de55 anv: Use blorp for CopyBuffer and UpdateBuffer
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-09-13 12:40:13 -07:00
Jason Ekstrand
0f1ca5407a anv: Use blorp for CopyImage
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
58593f24cb anv: Use blorp for CopyBufferToImage
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
f07f44a5bc anv: Use blorp for CopyImageToBuffer
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
9f44745eca anv: Use blorp to implement VkBlitImage
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
52fa3e8347 anv: Make image_get_surface_for_aspect_mask const
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
8f780af968 anv: Add initial blorp support
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
1fe8bf82b2 intel/anv: Use #defines for all __gen_ helpers
This allows us to #undef them later if we don't want them to persist

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
4a6c9e20b8 anv: Generalize emit_urb_setup
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
8cb144bd93 anv/pipeline: Roll compute_urb_partition into emit_urb_setup
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
823ab83432 intel/blorp: Use #defines for all __gen_ helpers
This allows us to #undef them later if we don't want them to persist

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
c0b9776cd6 intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
00e79cec99 isl/state: Don't set QPitch for GEN4_3D surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-13 12:40:12 -07:00
Jason Ekstrand
cb780c9ccf intel/blorp: Rework alloc_binding_table
The original blorp_alloc_binding_table helper was supposed to return the
binding table offset and map along with the surface state maps.  This isn't
quite what we want, however.  What we really want is the binding table
offsets, surface state offsets, and surface state maps.  In the GL driver,
the binding table map *is* an array of surface state offsets.  However, in
Vulkan, this isn't quite true as the entries in the binding table are
surface state offsets combined with another binding table block offset.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-13 12:40:11 -07:00
Marek Olšák
524fd55d2d tgsi/scan: don't set interp flags for inputs only used by INTERP instructions
radeonsi depends on the interp flags a little bit too much.

This fixes 9 randomly failing tests:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
15a127bc2c radeonsi: fix FP64 UBO loads with indirect uniform block indexing
No known tests.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
35d284d08e winsys/amdgpu: don't assume GTT if the VRAM flag isn't set
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
6df872df59 radeonsi: clean up CP DMA emit code
Unify the clear and copy paths, clean up the definitions.
It looks more like a rework. It's a preparation for GDS support,
which might or might not come.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
84860dd0bb radeonsi: print the IB and buffer list in VM fault reports
This is a fallout from reworking the debug flags.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
fd69fa65a8 radeonsi: add sampler view BOs to the BO list last
If si_sampler_view_add_buffer ends up flushing, then the code
in begin_new_cs would previously have added the buffer(s) for
whatever was previously bound to that slot. Now it would add only
the new buffer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
275c073c6a radeonsi: export SampleMask from pixel shaders at full rate
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
b89854b0c7 gallium/radeon: set new r600_resource fields correctly in other places too
This was missed in:

    commit 0d2e43fcb1
    Author: Marek Olšák <marek.olsak@amd.com>
    Date:   Thu Aug 18 16:30:00 2016 +0200

        gallium/radeon: derive buffer placement and flags only at initialization

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
c723acc03d ddebug: dump shader buffers and images
this was unimplemented

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák
fdd457c89f ddebug: fix a crash in resource_get_handle
broken recently

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Jan Vesely
b671909d27 radeon: Don't check DCC on pipe buffers
Fixes segfaults in EG compute since:
commit 21de3be8e6
radeonsi: fix texture format reinterpretation with DCC

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-13 14:23:26 -04:00
Andy Furniss
304f70536a vl/util: Fix YV12/I420 convert to NV12 U/V reversal
Fix VAAPI YV12/I420 convert to NV12 U/V reversal.
Input order is YVU when this is called.

Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-09-13 13:58:40 -04:00
Jason Ekstrand
6ac469a6c3 anv/allocator: Use VG_NOACCESS_WRITE in anv_bo_pool_free
Previously, we were relying on the fact that VALGRIND_MEMPOOL_FREE came
later on in the function to prevent "link->bo = bo" from causing an invalid
write.  However, in the case where the size requested by the user is very
small (less than sizeof(struct anv_bo)), this isn't sufficient.  Instead,
we should call VALGRIND_MEMPOOL_FREE early and then use VG_NOACCESS_WRITE.
We do, however, have to call VALGRIND_MEMPOOL_FREE after reading bo_in
because it may be stored in the bo itself.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-13 10:44:03 -07:00
Jason Ekstrand
3943888c94 intel/isl: Ignore base_array_layer and array_len for 3D storage surfaces
The time we want to restrict the Z range of a 3-D surface is when rendering
to it.  For storage surfaces, we always want he full range.  However, we
still need to set MinimumArrayElement and RenderTargetViewExtent to
sensible values so we'll just set them to the reasonable defaults we used
before we started respecting the base_array_layer and array_len.

This fixes a bunch of Vulkan CTS regressions caused by 48f195d7c6.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97790
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-13 10:43:21 -07:00
Jose Fonseca
62affedbed appveyor: Update winflexbison download URL.
This particular version got moved into a `old_versions` subdirectory.
2016-09-13 17:54:51 +01:00
Jason Ekstrand
a1e49be713 i965: Use blorp_copy for all copy_image operations on gen6+
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-12 19:44:05 -07:00
Jason Ekstrand
540395bf9b i965/blorp: Add a copy_miptrees helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-12 19:44:05 -07:00
Jason Ekstrand
d038adca0e intel/isl: Add support for RGB formats in X and Y-tiled memory
Normally, using a non-linear tiling format helps improve cache locality by
ensuring that neighboring pixels are usually close-by in memory.  For RGB
formats, this still sort-of holds, but it can also lead to rather terrible
memory access patterns where a single RGB pixel value crosses a tile
boundary and gets split into two pieces in different 4K pages.  It also
makes for some rather awkward calculations because your tile size is no
longer an even multiple of surface element size.  For these reasons, we
chose to simply never create tiled RGB images in the Vulkan driver.

The GL driver, however, is not so kind so we need to support it somehow.  I
briefly toyed with a couple of different schemes but this is the best one I
could come up with.  The fundamental problem is that a tile no longer
contains an integer number of surface elements.  I briefly considered a
couple other options but found them wanting:

 1) Using floats for the logical tile size.  This leads to potential
    rounding error problems.

 2) When presented with a RGB format, just make the tile 3-times as wide.
    This isn't so nice because now our tiles are no longer power-of-two
    size.  Also, it can force the row_pitch to be larger than needed which,
    while not strictly a problem for ISL, causes incompatibility problems
    with the way the GL driver chooses surface pitches.

The chosen method requires that you pay attention and not just assume that
your tile_info is in the units you think it is.  However, it's nice because
it provides a nice "these are the units" declaration in isl_tile_info
itself.  Previously, the tile_info wasn't usable as a stand-alone structure
because you had to also know the format.  It also forces figuring out how
to deal with inconsistencies between tiling and format back to the caller
which is good because the two different consumers of isl_tile_info really
want to deal with it differently:  Computation of the surface size wants
the fewest number of horizontal tiles possible while get_intratile_offset
is far more concerned with things aligning nicely.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Chad Versace <chadversary@chromium.org>
2016-09-12 19:44:05 -07:00
Jason Ekstrand
883086500b intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7
The restriction that Y-tiled surfaces must have valign == 4 only aplies to
render targets but we were applying it universally.  This causes problems
if ISL_FORMAT_R32G32B32_FLOAT is used because it requires valign == 2; this
should be okay because you can't render to that format.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-12 19:44:05 -07:00
Jason Ekstrand
54db5afd2c intel/blorp: Work in terms of logical array layers
When Ivy Bridge introduced array multisampling, someone made the decision
to do lots of stuff throughout the driver in terms of physical array layers
rather than logical array layers.  In ISL, we use logical array layers most
of the time and it really makes no sense to use physical array layers in
the blorp API.  Every time someone passes physical array layers into blorp
for an array multisampled surface, they're always divisible by the number
of samples and we divide right away.

Eventually, I'd like to rework most of the GL driver internals to use
logical array layers but that's going to be a big project and will probably
happen as part of the ISL conversion.  For now, we'll do the conversion in
brw_blorp and let blorp just use the logical layers.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
fa4627149d intel/blorp: Increase the presision of coordinate transform calculations
The result of this calculation goes into an fma() in the shader and we
would like it to be as precise as possible.  The division in particular
was a source of imprecision whenever dst1 - dst0 was not a power of two.
This prevents regressions in some of the new Vulkan CTS tests for blitting
using a filtering of NEAREST.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
c70be1ead5 intel/blorp: Add a swizzle parameter to blorp_clear
While we're here, we also re-arrange the parameters to better match the
parameter order of blorp_blit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
ea1399aba0 intel/blorp: Make color_write_disable const and optional
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
9286f62f11 intel/blorp: Add support for clearing R9G9B9E5 surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
ab03e59867 intel/blorp: Add support for RGB destinations in copies
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
5ae8043fed intel/blorp: Add an entrypoint for doing bit-for-bit copies
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
941b4d063a intel/blorp: Pull the guts of blorp_blit into a helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
4e03edf189 intel/blorp: Stop using the X/YOffset field of RENDER_SURFACE_STATE
While it can be useful, the field has substantial limtations.  In
particular, the bittom 2 or 3 bits is missing so your offset always has to
be a multiple of 4 or 8.  While surface alignments usually work out to make
this ok, when you start trying to fake compressed surfaces as uncompressed
(which we will want to do) this falls apart.  The easiest solution is to
simply align all offsets to a tile boundary and munge the regions we're
copying to account for the intratile offset.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
c170606fc6 intel/blorp: Use fake_interleaved_msaa in retile_w_to_y
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
a613449f71 intel/blorp: Use isl_get_interleaved_msaa_px_size_sa
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
8ac99eabb6 intel/isl: Add a helper for getting the size of an interleaved pixel
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
3cc15ba5bb intel/blorp: Handle 3D surfaces in convert_to_single_slice
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
43d25edf78 intel/isl: Fix an assert in get_intratile_offset_sa
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
6da968b651 intel/blorp: Fix the early return condition in convert_to_single_slice
The convert_to_single_slice operation is *mostly* idempotent.  The only
non-repeatable thing it does is that, when it sets the intratile offset
fields, it just overwrites them instead of doing a += operation.  This is
supposed to be ok because we have an early return at the top that should
make it bail of the surface is already a single slice.  Unfortunately, the
if condition has been broken ever since it was first added in 96fa98c18.
This commit fixes the condition and adds an assert to ensure we don't stomp
any non-zero intratile offsets.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
ec7e0d62c5 intel/blorp: Use the surface format for computing offsets
If we use the view format, it may be an uncompressed view of a compressed
image which throws things off.  Since we're computing offsets of images, we
want the actual surface offset anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
7f2fecd114 intel/blorp: Don't assume R8_UINT in convert_to_single_slice
We're going to use it for more than just stencil textures

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
2fc9c7e3d9 intel/blorp: Take a destination swizzle in blorp_blit
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
2dba5489ae intel/blorp: Take an isl_swizzle instead of a SWIZZLE
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Jason Ekstrand
7ddb21708c intel/isl: Add an isl_swizzle structure and use it for isl_view swizzles
This should be more compact than the enum isl_channel_select[4] that we
were using before.  It's also very convenient because we already had such a
structure in the Vulkan driver we just needed to pull it over.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-09-12 19:42:57 -07:00
Kenneth Graunke
376d1dc2f1 docs: Add OES_tessellation_shader to the release notes. 2016-09-12 17:24:35 -07:00
Kenneth Graunke
049cee2c16 docs: Mark OES_tessellation_shader as done. 2016-09-12 17:23:20 -07:00
Ilia Mirkin
742832434a st/mesa: fix is_scissor_enabled when X/Y are negative
Similar to commit 49c24d8a24 ("i965: fix noop_scissor range issue on
width/height") - take the X/Y into account to determine whether the
scissor covers the whole area or not.

Fixes the recently-added gl-1.0-scissor-depth-clear-negative-xy piglit
test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-09-12 20:07:21 -04:00
Mauro Rossi
6b9d7e69ee android: add support for libmesa_amdgpu_addrlib
Android porting of the following commits:

f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place."
69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code"

This patch fixes android building errors

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-13 10:06:04 +10:00
Dave Airlie
0fe9152868 u_endian: add android to glibc clause
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
2016-09-13 10:04:13 +10:00
Jason Ekstrand
24be630660 Revert "i965: Drop the maximum 3D texture size to 512 on Sandy Bridge"
This reverts commit 6ba88bce64.  The commit
was erroneous because GL has a separate limit, GL_MAX_FRAMEBUFFER_LAYERS
which guards the number of layers you are allowed to render into.

The GL 4.5 spec says:

   "The framebuffer attachment point attachment is said to be framebuffer
   attachment complete if [...] all of the following conditions are true:

      [...]

      If image is a three-dimensional, one- or two-dimensional array, or
      cube map array texture and the attachment is layered, the depth or
      layer count of the texture is less than or equal to the value of the
      implementation-dependent limit MAX_FRAMEBUFFER_LAYERS."

and goes on to say that "framebuffer complete" requires all attachments to
be "framebuffer attachment complete".

On Sandy Bridge, we set GL_MAX_FRAMEBUFFER_LAYERS to 512 so creating a 3D
texture bigger than 512 is fine; you just can't render into all of the
slices at once.

Fixes ES3-CTS.gtf.GL3Tests.npot_textures.npot_tex_image on Sandy Bridge

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-09-12 16:52:10 -07:00
Jason Ekstrand
2519237c24 intel/blorp: Handle the 512 layers restriction on Sandy Bridge
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-12 16:48:56 -07:00
Jason Ekstrand
48f195d7c6 intel/isl: Treat 3-D textures as 2-D arrays for rendering
In particular, this means that isl_view::base_array_layer and
isl_view::array_len get applied to 3-D textures but only when rendering.
We were already applying isl_view::base_array_layer for rendering into 3-D
textures so this isn't a huge deviation.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-12 16:48:56 -07:00
Sirisha Gandikota
63fe9ab894 aubinator: Simplify gen_disasm_create()'s devinfo handling
Copy the whole devinfo structure instead of just few fields (Ken)

Earlier, copied only couple of fields which added more code. So,
simplify code by copying the whole structure.

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-12 16:20:04 -07:00
Sirisha Gandikota
d2869c95fb aubinator: Fix compiler warning
Add 'const' qualifier to gen_field_iterator::p pointer (Ken)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-12 16:19:56 -07:00
Julien Isorce
bf901a2f8c st/va: also honors interlaced preference when providing a video format
This fixes a crash when using the prefered video format with vaapisink
on Nvidia hardwares.
Also caught by the following assert:
  nouveau_vp3_video.c:91: Assertion `templat->interlaced' failed.

TEST= gst-launch-1.0 videotestsrc ! video/x-raw, format=NV12 ! vaapisink

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Tested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-12 22:17:40 +01:00
Samuel Pitoiset
3f3640c86c tgsi: document semantics for compute shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-12 22:15:10 +02:00
Kenneth Graunke
54138af1cd mesa: Enable OES/EXT_tessellation_shader for ES 3.1 + ARB_tess drivers.
Drivers which support ARB_tessellation_shader and ES 3.1 now will
expose OES_tessellation_shader and EXT_tessellation_shader as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-12 13:07:38 -07:00
Marek Olšák
546bc07349 radeonsi: don't preload constants at the beginning of shaders
LLVM can CSE the loads, thus we can always re-load constants before each
use. The decrease in SGPR spilling is huge.

The best improvements are the dumbest ones.

26011 shaders in 14651 tests
Totals:
SGPRS: 1453346 -> 1251920 (-13.86 %)
VGPRS: 742576 -> 728421 (-1.91 %)
Spilled SGPRs: 52298 -> 16644 (-68.17 %)
Spilled VGPRs: 397 -> 369 (-7.05 %)
Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
Code Size: 36136488 -> 36001064 (-0.37 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 219315 -> 222221 (1.33 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-12 21:06:57 +02:00
Jason Ekstrand
e2fb044115 intel/blorp: Add a TODO file
This provides a nice little place to share notes on what still needs to be
done and/or would be nice to have in BLORP.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 10:14:49 -07:00
Alejandro Piñeiro
6165603209 i965: check for GL_TEXTURE_EXTERNAL_OES at miptree_create_for_teximage
Forgotten on commit "i965: Fix calculation of the image height at start level".

Thanks to Ilia Mirkin for point it.

Fixes the following regressions on Haswell and Broadwell:
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestSimpleUnassociated (crash back to pass)
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestSimple (crash back to fail)
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestVertexShader (crash back to fail)

https://bugs.freedesktop.org/show_bug.cgi?id=97761

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 18:10:50 +02:00
Chuanbo Weng
9a1eb54237 gbm: fix potential NULL deref of mapImage/unmapImage.
The mapImage/unmapImage functions of DRIimage extension can be NULL,
so we should add additional check for them.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-12 16:52:55 +01:00
Emil Velikov
63faf7de61 Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.
A earlier sync with the Khronos headers added _extension_ prototype
guards to all the GLES2/3/31/32 core entry points. Effectively breaking
all the applications that aim to be portable and do not set the define.

The issue has been reported to Khronos (internal bugzilla #14206) and is
being worked on. Until updated/fixed headers are released locally fix
the issue.

The following report is when building weston.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97773

Cc: Armin Krezović <krezovic.armin@gmail.com>
Cc: Emmanuel Gil Peyrot <emmanuel.peyrot@collabora.com>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Fixes: 6a5504de2f ("Update Khronos-supplied headers to r33100")
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-12 16:52:43 +01:00
Emil Velikov
ceaa2e1738 aubinator: rework print_help()
Rather than using platform specific methods to retrieve the program
name pass it explicitly. The function is called directly from main().

Similarly - basename comes in two versions POSIX (can modify string,
always pass a copy) and GNU (never modifies the string).

Just printout the complete program name, esp. since the program is not
meant to be installed. Thus using $basename is unlikely to work, not to
mention it is misleading.

Reported-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
2016-09-12 16:49:59 +01:00
Adam Jackson
0cb1428fbb docs: Note MESA_configless_context as superseded
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-12 11:29:11 -04:00
Adam Jackson
d9f5b1915b egl: Rename MESA_configless_context bit to KHR_no_config_context
Keep the old name in the extension string, but refer to the KHR
extension internally.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-12 11:29:09 -04:00
Adam Jackson
cc45a5c308 egl: QueryContext on a configless context returns zero
MESA_configless_context does not specify the interaction with
QueryContext at all, and the code to generate an error in this case
predates the Mesa extension. Since EGL_NO_CONFIG_{KHR,MESA} are
numerically identical there's no way to distinguish which one the
application asked for, so use the KHR behaviour.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-12 11:28:38 -04:00
Boyuan Zhang
e5009b7c26 st/va: enable vbr rate control for vaapi encode
This patch enables variable bit-rate for vaapi encoding. According to va.h,
target bit-rate equals to maximum bit-rate multiplies by target_percentage.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-12 10:34:53 -04:00
Leo Liu
6a7f79af9b vl/rbsp: match initial escaped bits with valid in the buffer
Otherwise the check for the three byte will not make sense.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-09-12 10:09:27 -04:00
Timothy Arceri
2da15a3b89 egl: fix gcc warning braces around scalar initializer
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-09-12 22:43:49 +10:00
Nicolai Hähnle
b8703e363c winsys/radeon: rename nrelocs, crelocs to max_relocs, num_relocs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:55 +02:00
Nicolai Hähnle
d66bbfbede winsys/radeon: don't pre-allocate the relocations array
It's really not necessary. Switch to an exponential resizing strategy.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:53 +02:00
Nicolai Hähnle
f47da2e34f winsys/radeon: remove unused radeon_cs_context::priority_usage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:51 +02:00
Nicolai Hähnle
17fff0c2de winsys/amdgpu: remove amdgpu_cs_lookup_buffer
The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:47 +02:00
Nicolai Hähnle
12657a7abf winsys/amdgpu: remove unused field domains from amdgpu_cs_buffer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:07 +02:00
Nicolai Hähnle
3cdeb2a177 winsys/amdgpu: remove initial buffer list allocation
It's really not necessary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:04 +02:00
Nicolai Hähnle
cc53dfda9f winsys/amdgpu: extract adding a new buffer list entry into its own function
While at it, try to be a little more robust in the face of memory allocation
failure.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:01 +02:00
Nicolai Hähnle
11cbf4d7ae winsys/amdgpu: use only one fence per BO
The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).

Therefore, keeping only the last fence around is sufficient.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:59 +02:00
Nicolai Hähnle
480ac143df winsys/amdgpu: add do_winsys_deinit function
The idea is to have matching init/deinit functions so that deinit can be
re-used for cleanup in the error path of amdgpu_winsys_create.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:56 +02:00
Nicolai Hähnle
9fb8d354ca winsys/amdgpu: clean up error paths in amdgpu_winsys_create
No need to call pb_cache_deinit, because the cache hasn't been initialized
at that point.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:53 +02:00
Nicolai Hähnle
a6c38d47d4 gallium/radeon: page alignment for buffers is unnecessary
In some places (e.g. shader program pointers) we require 256 bytes alignment.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:45 +02:00
Nicolai Hähnle
339867c077 gallium/radeon/winsyses: remove #includes of pb_bufmgr.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:36 +02:00
Topi Pohjolainen
e54b70b3d4 i965/rbc: Clarify rational given for shader image resolves
Original commit added documentation explaining lossless compression
case:

commit 56f29911ec
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Tue Feb 2 10:00:41 2016 +0200

    i965: Add a flag telling color resolve pass to ignore CCS_E

It, however, easily gives the impression that the sole purpose
of the intel_miptree_resolve_color() is to address lossless
compression. Original intention is to document the lack of
INTEL_MIPTREE_IGNORE_CCS_E flag given for the resolve call.

This patch fixes this along with a typo found spotted further
down.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:30 +03:00
Topi Pohjolainen
1df4b666ed i965/blorp: Use hw generetad primitive copies for layered clears
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:30 +03:00
Topi Pohjolainen
b712aa2614 i965/blorp: Sanity check all layers before actual clear
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:30 +03:00
Topi Pohjolainen
a1c7de09dc intel/blorp: Add plumbing for setting color clear layer count
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
514afdce95 intel/blorp: Allow multiple layers
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
e597821ef2 i965/blorp: Instruct vertex fetcher to provide prim instance id
This will indicate target layer (Render Target Array Index) needed
for layered clears.

v2: Use 3DSTATE_VF_SGVS for gen8+

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
39712b2a14 i965/rbc: Allocate mcs directly
such as we do for compressed msaa. In case of non-compressed simgle
sampled buffers the allocation of mcs is deferred until there is
actually a clear operation that needs the mcs.
In case of render buffer compression the mcs buffer always needed
and there is no real reason to defer the allocation. By doing it
directly allows to drop quite a bit unnecessary complexity.

Patch leaves brw_predraw_set_aux_buffers() a no-op. Subsequent
patches will re-use it and it seemed cleaner to leave it instead
of removing and re-introducing.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
024a39511f isl/gen8+: Allow 1D and 3D auxiliary surfaces
Otherwise once mcs buffer gets allocated without delay for lossless
compression (same as we do for msaa), assert starts to fire in
piglit case: tex3d. The test uses depth of one which is in fact
supported even now.

v2 (Jason): Allow also 1D case as there is nothing in the specs
            constraining it either.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
6939532593 i965: Add sanity check for non-compressible texture views
v2: Fix missing inline declaration

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:48:29 +03:00
Topi Pohjolainen
1b6fcc08df i965/rbc: Consult rb settings for texture surface setup
Once mcs buffer gets allocated without delay for lossless
compression (same as we do for msaa), one gets regression in:

GL45-CTS.texture_barrier_ARB.same-texel-rw

Setting the auxiliary surface for both sampling engine and data
port seems to fix this. I haven't found any hardware documentation
backing this though.

v2 (Jason): Prepare also for the case where surface is sampled with
            non-compressible format forcing also rendering without
            compression.
v3: Split asserts and decision making.
v4: Detailed comment provided by Jason explaining the need for using
    auxiliary buffer for texturing when the same surface is also
    used as render target.
    Added check for existence of renderbuffer before considering if
    underlying miptree matches.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 11:46:13 +03:00
Topi Pohjolainen
22d9a4824b i965: Track non-compressible sampling of renderbuffers
v3:
   - Actually set the flags when needed instead of falsely
     overwriting them (Jason).
   - Use more generic name for flag (dropped RENDERBUFFER)
   - Consult also shader images
v4:
   - Consult only lossless compressd shader images

v5:
   - Check the existence of renderbuffer before considering
     if it matches the given miptree

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 08:58:38 +03:00
Topi Pohjolainen
1f51217d99 i965: Replace boolean rb surface state setup argument with flags
And add plumbing to provide it all the way to surface state emitter.
This is not used yet but will be in subsequent patches to carry
additional constraints.

v2 (Jason): Use uint32_t instead of int as the type

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 08:58:38 +03:00
Topi Pohjolainen
1634a4963c i965/rbc: Allow integer formats as advertised in isl_format.c
Blorp consults brw_is_color_fast_clear_compatible() to see if any
restrictions apply for fast clear in addition to the capablities
advertised in isl_format.c::format_info[]. On Gen8+ integer formats
are backlisted for plain old fast clear but there is no reason why
lossless compression shouldn't be supported. In fact, lossless
compression of integer formats is already supported for normal
render paths.

This patch prepares for dropping the delayed allocating of the mcs
buffer for lossless compression. Until now the skip of fast clear
also prevented the mcs being allocated and hence the lossless
compression being effectively turned off for integer formats.
Once the mcs buffer is allocated beforehand, the assertion addressed
here would start triggering.

v2: Drop the assert instead of relaxing it (Jason)
    Fix typo while at it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-12 08:58:38 +03:00
Alejandro Piñeiro
e77bf32475 i965: remove unused variable at intel_miptree_create_for_teximage
After commit "i965: Fix calculation of the image height at start level", it is
not needed. This commit removes the "warning: unused variable ‘i’" warning.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 07:21:32 +02:00
Thomas Helland
08c5b10ae9 mesa/glsl: Move string_to_uint_map into the util folder
This clears the last bits of the usecases of the hash table
located in mesa/program, allowing us to remove it.

V2: Rebase on top of changes to Makefile.sources

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
e55eb2b7ea glsl: Convert glcpp-parse to the util hash table
And change the include in glcpp.h accordingly.

V2: Whitespace fix

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
16fb318d0c glsl: Convert loop analysis to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
ec453979db mesa: Convert symbol table to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
f224ef4392 glsl: Convert varying test to the util hash table
V2: remove now unused ht_count_callback() (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
9efa977be5 glsl: Convert output read lowering to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
6adcc8f283 glsl: Convert interface block lowering to the util hash table
V2: move comment to correct location (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
5482d31b86 glsl: Convert if lowering to use a set
Also do some minor whitespace cleanups

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
85a197c4ed glsl: Convert linker to the util hash table
We are getting the util hash table through the include in
program/hash_table.h for the moment until we migrate the
string_to_uint_map to a separate file.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
f10cc9407b glsl: Convert link_varyings to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
e7f91d9de1 glsl: Change link_functions to use a set
The "locals" hash table is used as a set, so use a set to
avoid confusion and also spare some minor memory.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
2228548f83 glsl: Convert recursion detection to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
9b3c0f81a7 glsl: Convert constant_expression to the util hash table
V2: Fix incorrect ordering on hash table insert

V3: null check value returned by _mesa_hash_table_search()
    (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
9f188be8a6 glsl: Convert ast_to_hir to the util hash table
V2: Rebase to the adaption of new hashing functions

V3: move previous_label declaration to where it is used
    (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
9ac6d61751 glsl: Convert ir_clone to the util hash table
V2: add braces to multiline if (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
5b5d4ea4a0 glsl: Convert function inlining to the util hash table
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
eef2be6822 mesa: Convert string_to_uint_map to the util hash table
And remove the now unused hash_table_replace.

V2: Actually do the equivalent thing, and don't leak memory

V3: fix minor typo in comment (Timothy Arceri)

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
ddb8639b18 util: Move hash_table_call_foreach to util hash table
It is included through the util/hash_table include in
the program hash_table, so this should be safe.
This will be needed when we start converting each use of
the program_hash_table, as some places need this function.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
cf4a4820ac mesa: Remove prog_hash_table.c
Here we make the prog_hash_table functionally equivalent to
the one in util by wrapping the remaing functions that differ.

We also move the functions to the header so we can remove the c
file.

This enables us to do a step-by-step replacement of the table.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Thomas Helland
42ba435fd1 mesa: Remove unused hash table includes
This should prevent us from rebuilding the world.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-12 10:48:35 +10:00
Ilia Mirkin
148fbf32a8 freedreno/a3xx: disable filtering for texture buffers and int textures
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-11 13:14:06 -04:00
Niels Ole Salscheider
cfa914a1b4 st/clover: Define __OPENCL_VERSION__ on the device side
This is required by the OpenCL standard.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-09-10 15:48:54 -07:00
Ilia Mirkin
a8c0c7301c gm107/ir: allow indirect inputs to be loaded by frag shader
Looks like the GM107 IPA op does not allow a separate offset when
using an indirect register. Instead we must use AL2P like we do for
indirect vertex operations on Kepler+.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-09-10 13:40:04 -04:00
Ilia Mirkin
a22aee5ad1 gm107/ir: AL2P writes to a predicate register
We have to force it to write to predicate 7 (aka PT) in order for it not
to mess up another predicate. Unclear what would be returned in the
predicate, perhaps an error code for out-of-bounds requests. Blob
doesn't seem to check it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-09-10 13:36:20 -04:00
Antia Puentes
83e8617f4b i965: Fix calculation of the image height at start level
- Fixes CTS tests:

* GL44-CTS.shader_image_size.advanced-nonMS-cs-float
* GL44-CTS.shader_image_size.advanced-nonMS-cs-int
* GL44-CTS.shader_image_size.advanced-nonMS-cs-uint
* GL44-CTS.shader_image_size.advanced-nonMS-gs-float
* GL44-CTS.shader_image_size.advanced-nonMS-gs-int
* GL44-CTS.shader_image_size.advanced-nonMS-gs-uint
* GL44-CTS.shader_image_size.advanced-nonMS-tes-float
* GL44-CTS.shader_image_size.advanced-nonMS-tes-int
* GL44-CTS.shader_image_size.advanced-nonMS-tes-uint
* GL44-CTS.shader_image_size.advanced-nonMS-vs-float
* GL44-CTS.shader_image_size.advanced-nonMS-vs-int
* GL44-CTS.shader_image_size.advanced-nonMS-vs-uint

v1: (written by Dave Airlie) Always shift height images for levels.
Fixed the CTS test.

v2: Only shift height if the texture is not an 1D_ARRAY,
it fixes assertion in GL44-CTS.texture_view.gettexparameter
due to the original patch (Antia).

v3: Remove the loop. Do not shift height either for 1D textures.
Use an explicit switch and add an assertion (levels == 0) for
multisampled textures (Jason).

v4: Rectangle textures can not have levels either (Ilia Mirkin).

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-10 12:52:32 +02:00
Marek Olšák
08bcbfdc07 radeonsi: flush TC L2 before using a compute indirect buffer
There is no known test for this.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:07 +02:00
Marek Olšák
a5a2cc530c radeonsi: fix the VGT performance tweak for small instances
Based on the VGT spec.

The Vulkan driver doesn't do it optimally and they plan to fix it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
a67d81580b radeonsi: remove the cache_flush atom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
f9750932ea winsys/amdgpu: replace OUT_CS with radeon_emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Marek Olšák
81da78bfc3 winsys/radeon: replace OUT_CS with radeon_emit
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 22:45:06 +02:00
Christoph Haag
55ba5fa9a6 doc: document GALLIUM_DRIVER
v2: Add dot at end of sentence

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-09 09:24:28 +02:00
Haixia Shi
b1d636aa00 egl/android: Set EGL_MAX_PBUFFER_WIDTH and EGL_MAX_PBUFFER_HEIGHT
Set config attributes EGL_MAX_PBUFFER_WIDTH and EGL_MAX_PBUFFER_HEIGHT to
hard-coded non-zero values. These two attributes are required on Android.

v2: use _EGL_MAX_PBUFFER_WIDTH/HEIGHT from egldefines.h
    (based on discussion on the first version)

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-09 07:51:04 +03:00
Tapani Pälli
478fbc2348 android: depend on libmesa_genxml from i965 Android.gen.mk
Static library dependency is required to pull the generated
XML headers into the generated C file.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-09 07:51:04 +03:00
Tapani Pälli
4542c7ed5f i965: release GLSL IR in LinkShader after it's not needed
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-09-09 07:51:04 +03:00
Tapani Pälli
2cd02e30d2 glsl: use hash instead of exec_list in copy propagation
This change makes copy propagation pass faster. Complete link time
spent in test case attached to bug 94477 goes down to ~400 secs from
over 500 secs on my HSW machine. Does not fix the actual issue but
brings down the total. No regressions seen in CI.

v2: do not leak hash_table structure

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-09 07:50:42 +03:00
Jason Ekstrand
175ac629be i965/fs: Fail the shader compile instead of asserting when we can't spill
Blorp doesn't handle spilling so we set allow_spilling to false in that
case.  The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide.
This commit makes it so that we fail the 16-wide compile and successfully
fall back to 8-wide instead of just assert-failing when trying to compile
the 16-wide shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-08 20:53:01 -07:00
Jason Ekstrand
88a2a2e053 nir/gcm: Add global value numbering support
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other.  For instance, in
you have

if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   ssa_2 = ssa_0 + 7;
   /* use ssa_2 */
}

Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to

if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}

Obviously, we just broke SSA form which is bad.  Global code motion,
however, will repair this for us by turning this into

ssa_1 = ssa_0 + 7;
if (...) {
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}

This intended to eventually mostly replace CSE.  However, conventional CSE
may still be useful because it's less of a scorched-earth approach and
doesn't require GCM.  This makes it a bit more appropriate for use as a
clean-up in a late optimization run.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-08 20:53:01 -07:00
Jason Ekstrand
99ff4b3eb2 nir/gcm: Call nir_metadata_preserve
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-08 20:53:01 -07:00
Max Staudt
02675622b0 r300g: Set R300_VAP_CNTL on RSxxx to avoid triangle flickering
On the RSxxx chip series, HW TCL is missing and r300_emit_vs_state()
is never called.

However, if R300_VAP_CNTL is never set, the hardware (at least the
RS690 I tested this on) comes up with rendering artifacts, and
parts that are uploaded before this "fix" remain broken in VRAM.
This causes artifacts as in fdo#69076 ("triangle flickering").

It seems like this setup needs to happen at least once after power on
for 3D rendering to work properly. In the DDX with EXA, this happens in
RADEON_SWITCH_TO_3D() when processing an XRENDER Composite or an
Xv request. So playing back a video or starting a GTK+2 application
fixes 3D rendering for the rest of the session. However, this auto-fix
doesn't happen when EXA is not used, such as with GLAMOR or Wayland.

This patch ensures the register is configured even in absence of
the DDX's EXA module.

The register setting is taken from:
  xf86-video-ati  --  RADEONInit3DEngineInternal()
  mesa/src/mesa/drivers/dri/r300  --  r300EmitClearState()

Tested on RS690.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Max Staudt <mstaudt@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-09 13:30:47 +10:00
Marek Olšák
5981ab5445 gallium: remove PIPE_BIND_TRANSFER_READ/WRITE
not used in any useful way

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
0fbaf74977 radeonsi: unify si_set_optimal_micro_tile_mode call sites
There is nothing special happening in those code blocks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
758bc52959 radeonsi: fix texture reinterpretation after DCC fast clear
The problem is that TC-compatible DCC clear codes translate
into different clear values when you change the format.

I have a new piglit reproducing the issue.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
46c425e7c8 radeonsi: enable DCC fast clear for 128-bit formats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
831c0c80f1 radeonsi: clamp integer clear color values for DCC fast clear
It should be possible to get TC-compatible fast clear more often now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 22:51:33 +02:00
Marek Olšák
93f3d8e10d Revert "radeonsi: enable SDMA on CIK"
This reverts commit 0241d8300f.

It doesn't work with mobile Bonaire. It looks like the programming of
tiling parameters is wrong on some chips.
2016-09-08 22:51:33 +02:00
Christoph Haag
7b414bc512 doc: fix typo of GALLIUM_HUD_TOGGLE_SIGNAL
In the original commit message in 56a1c10 it was wrongly used too:
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-08 20:19:35 +02:00
Jason Ekstrand
a00bd7bc27 nir/spirv: Refactor variable deocration handling
Previously, we dind't apply variable decorations to the members of a split
structure variable.  This doesn't quite work, unfortunately, because things
such as the "flat" qualifier may get applied to an entire structure instead
of propagated to the members.  This fixes 9 of the new CTS tests in the
dEQP-VK.glsl.linkage.varying.struct.* group.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-09-08 10:45:23 -07:00
Jason Ekstrand
f5505730d3 nir/spirv: Break variable decoration handling into a helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-09-08 10:45:23 -07:00
Jonathan Gray
d50c56f868 aubinator: only use program_invocation_short_name with glibc/cygwin
program_invocation_short_name is a gnu extension.  Limit use of it
to glibc and cygwin and otherwise use getprogname() which is available
on BSD and OS X.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-08 18:37:02 +01:00
Jonathan Gray
2d3ebb474c aubinator: include libgen.h for basename(3)
Include libgen.h for basename as required by posix.
The definition is not found on at least OpenBSD otherwise.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-08 18:37:02 +01:00
Jonathan Gray
0ba9e281fc aubinator: stop using non portable error() function
error() is a gnu extension and is not present on OpenBSD
and likely other systems.

Convert use of error to fprintf/strerror/exit.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-08 18:37:02 +01:00
Adam Jackson
dbda375d6f egl: Fix up indentation on previous commit
This was requested in review but I pushed the wrong version.

Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-09-08 13:21:27 -04:00
Adam Jackson
a279760536 egl: Document why EGL_OPENGL{, _ES}_API are mostly identical
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-09-08 13:19:58 -04:00
Chad Versace
bad80c26e7 anv: Link to libX11-xcb only when unneeded
The Makefile unconditionally linked libX11-xcb into libvulkan_intel.so.
But it's needed only if HAVE_PLATFORM_X11.

Fixes build of libvulkan_intel.so on Chromium OS, which has no X11
libraries.

Fixes: 71258e9462 ("anv/x11: Add support for Xlib platform")
Cc: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-08 09:24:30 -07:00
Tim Rowley
7514e326f8 swr: fixes for format mapping and texture sizing
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-09-08 10:43:21 -05:00
Topi Pohjolainen
b863f4a39a intel/blorp: Allow single slice converter to suppress number of layers
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-08 08:53:45 +03:00
Lionel Landwerlin
0ad84b4366 spirv/nir: Implement OpAtomicLoad/Store for shared variables
Missing bits from 2afb950161.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-07 17:37:37 +01:00
Jason Ekstrand
37763bf446 nir/spirv: Remove an erroneous "fall through" comment 2016-09-07 09:04:34 -07:00
Kyle Brenneman
6e066f76ee EGL: Combine the GL and GLES current contexts (v2)
Only keep track of a single current context, instead of separate
contexts for GL and GLES.

In EGL 1.4 (and 1.5), EGL_OPENGL_API and EGL_OPENGL_ES_API are supposed
to be interchangeable for all purposes except for eglCreateContext.

The _EGLThreadInfo::CurrentContexts array is now a single pointer to the
current context, which may be a GL or GLES context. In addition, it now
keeps track of the current API as an enum instead of an index.

eglMakeCurrent will now replace the current context, regardless of which
client API is used for for the current and new contexts. It no longer
checks for a conflicting context. In addition, calling eglMakeCurrent
with EGL_NO_CONTEXT will now release the current context regardless of
the current API.

v2: Rebased against master (Adam Jackson)

Reviewed-by: Adam Jackson <ajax@redhat.com>
2016-09-07 11:56:48 -04:00
Rob Clark
74b1969d71 gbm: wire up fence extension
v2: make fence extension optional to not break non-i965 classic
    drivers, and move __DRI2_FENCE into core extensions, based
    on comments from Emil

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-07 11:54:00 -04:00
Rob Clark
32c061b110 freedreno: reject imports with bogus pitch
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-07 11:41:38 -04:00
Rob Clark
b4e88b500c gbm: add missing R8 and GR88 formats
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-07 11:30:41 -04:00
Lionel Landwerlin
2afb950161 spirv/nir: Add support for OpAtomicLoad/Store
Fixes new CTS tests :

dEQP-VK.spirv_assembly.instruction.compute.opatomic.load
dEQP-VK.spirv_assembly.instruction.compute.opatomic.store

v2: don't handle images like ssbo/ubo (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-07 11:00:30 +01:00
Marek Olšák
fe40a65fb6 radeonsi: skip redundant INDEX_TYPE writes
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
bdf767dac4 radeonsi: add more unlikely() uses into si_draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
a8e7ea6abc radeonsi: skip draws with instance_count == 0
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Marek Olšák
53d74e055e gallium/radeon/winsyses: fix counting mapped memory
Not all buffers are unmapped explicitly.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-07 11:13:13 +02:00
Ilia Mirkin
8c8874eafb nir: fix definition of pack_uvec2_to_uint
Found by inspection. Untested beyond compilation. This also matches the
logic used in nir_lower_alu_to_scalar.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2016-09-06 22:45:44 -04:00
Ilia Mirkin
c42acd93d4 mesa/formatquery: limit ES target support, fix core context support
First off, as late as ES 3.2, GetInternalformat only supports
RENDERBUFFER and 2DMS(_ARRAY) targets.

Secondly, the _mesa_has_ext helpers are very accurate... a little too
accurate, some might say. If we only show an extension in compat
profiles because core profiles have the functionality guaranteed, they
will return false. Fix these to either check for a core profile
explicitly, or to a different-but-identical extension available in core
profile.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matteo Bruni <matteo.mystral@gmail.com>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-09-06 22:45:44 -04:00
Ilia Mirkin
f654b4983a mapi: add gl32.h to the list of GLES3 headers for installation
This was missed when I added the updated (and new) Khronos headers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-09-06 22:45:44 -04:00
Ilia Mirkin
36347c8d6f main: GL_RGB10_A2UI does not come with GL 3.0/EXT_texture_integer
Add a separate extension check for that format. Prevents glTexImage from
trying to find a matching format, which fails on drivers without support
for this format.

Fixes: sized-texture-format-channels (on a3xx)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2016-09-06 22:41:48 -04:00
Jason Ekstrand
2b18a3f5d3 nir/spirv: Use fill_common_atomic_sources for image atomics
We had two almost identical copies of this code and they were both broken
but in different ways.  The previous two commits fixed both of them.  This
one just unifies them so that it's easier to handle in the future.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-06 17:08:13 -07:00
Jason Ekstrand
f2a10937d8 nir/spirv: Use the correct sources for CompareExchange on images
The CompareExchange operation has two "Memory Semantics" parameters instead
of one so the real arguments start at w[7] instead of w[6].

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-06 17:08:13 -07:00
Jason Ekstrand
0ead7bef6b nir/spirv: Swap the argument order for AtomicCompareExchange
SPIR-V has the two arguments in the opposite order from GLSL.  NIR uses the
GLSL order so we had them backwards.

Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-09-06 17:08:13 -07:00
Tim Rowley
edd688d986 vbo: increase VBO_SAVE_BUFFER_SIZE from 8k to 256k dwords
Increases the performance of legacy geometry-heavy apps
still using display lists.

Performance increase for a targeted testcase is on the
order of 8x, and applications like ParaView 4.x (5.x uses
no longer used display lists) improve by about 10%-20%.

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-06 15:15:11 -05:00
Vinson Lee
215075ae30 glsl: Add positional argument specifiers.
Fix build with Python < 2.7.

  File "./glsl/ir_expression_operation.py", line 360, in get_enum_name
    return "ir_{}op_{}".format(("un", "bin", "tri", "quad")[self.num_operands-1], self.name)
ValueError: zero length field name in format

Fixes: e31c72a331 ("glsl: Convert tuple into a class")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2016-09-06 12:03:30 -07:00
Roland Scheidegger
31a380c8dd util: (trivial) add <stdint.h> include to slab.c
should fix "src/util/slab.c:57:13: error: ‘uint8_t’ undeclared"
2016-09-06 19:47:14 +02:00
Jason Ekstrand
92162dbe32 glsl: Add .gitignore for make check warnings test 2016-09-06 08:32:19 -07:00
Jason Ekstrand
20b2f1ecb9 anv/pipeline: Lower indirect outputs when EmitNoIndirectOutput is set
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-06 08:27:23 -07:00
Rob Herring
244f0aba16 Android: glsl: add rules to generate ir_expression*.h header files
Recent changes to generate ir_expression*.h header files broke Android
builds. This adds the generation rules. This change is complicated due to
creating a circular dependency between libmesa_glsl, libmesa_nir, and
libmesa_compiler. Normally, we add static libraries so that include paths
are added even if there's no linking dependency. That is the case here.
Instead, we explicitly add the include path using $(MESA_GEN_GLSL_H) to
libmesa_compiler. This in turn requires shuffling the order of make
includes. It also uncovered missing dependency tracking of glsl_parser.h.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-06 15:58:55 +01:00
Leo Liu
2593354643 st/omx/dec: enable hevc omx decode support
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
1a534d31fe st/omx/dec/h265: get the reference list for uvd
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
7d63b80728 st/omx/dec/h265: add short term reference picture sets
Specified by subclause 7.3.7

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
fa7c4f151d st/omx/dec/h265: add slice header
Specified by subclause 7.3.6.1

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
a639a2868e st/omx/dec/h265: add picture parameter sets
Specified by subclause 7.3.2.3

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
b3c1583e17 st/omx/dec/h265: add sequence parameter sets
Specified by subclause 7.3.2.2

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
6d186a79f2 st/omx/dec: add initial omx hevc support
Mainly based on the h264 implementation.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:08:01 -04:00
Leo Liu
0c374a7770 st/omx/dec: set dst rect to match src size
When creating interlaced video buffer, hegith set to "template.height =
align(tmpl->height/ array_size, VL_MACROBLOCK_HEIGHT);", and we use
"template.height *= array_size;" for the buffer height, so it actually
aligned with 32. With progressive video buffer it still aligned with 16,
thus causing different height between interlaced buffer and progressive
buffer for 4K (height=2160), and 720p (height=720).

When transcode the video, this will cause the 16 lines corruption
at the bottom of the encode video.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-09-06 10:01:24 -04:00
Marek Olšák
e7a73b75a0 gallium: switch drivers to the slab allocator in src/util 2016-09-06 14:24:04 +02:00
Marek Olšák
761ff40302 util: import the slab allocator from gallium
There are also some cosmetic changes.
2016-09-06 14:24:04 +02:00
Michel Dänzer
dc3bb5db8c loader/dri3: Always use at least two back buffers
This can make a significant difference for performance with some extreme
test cases such as vblank_mode=0 glxgears.

Fixes: 1e3218bc5b ("loader/dri3: Overhaul dri3_update_num_back")
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97549
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-09-06 13:04:48 +09:00
Kenneth Graunke
d0cd504046 glsl: Fix locations of variables in patch qualified interface blocks.
As of commit d82f8d9772, we actually
parse and attempt to handle the 'patch' qualifier on interface blocks.

This patch fixes explicit locations for variables in such blocks.
Without it, many program interface query dEQP/CTS tests hit this
assertion in ir_set_program_inouts.cpp

   if (is_patch_generic) {
      assert(idx >= VARYING_SLOT_PATCH0 && idx < VARYING_SLOT_TESS_MAX);
      bitfield = BITFIELD64_BIT(idx - VARYING_SLOT_PATCH0);
   }

because the location was incorrectly based on VARYING_SLOT_VAR0.

Note that most of the tests affected currently fail before they hit
this, due to confusion about what the program interface query name
of those resources should be.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-05 17:37:55 -07:00
Kenneth Graunke
096ad19a2b mesa: Fix types in _mesa_get_color_read_format().
This is a mesa_format, not a GLenum.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-05 17:37:55 -07:00
Dave Airlie
69fca64259 amd/addrlib: move addrlib from amdgpu winsys to common code
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:33 +10:00
Dave Airlie
1add3562e3 gallium/util: move endian detect into a separate file
This just ports the simpler endian detection bits, addrlib
sharing wants this outside gallium.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:24 +10:00
Dave Airlie
a86be7b6ad radeon: move radeon_family/chip_class defintions to common
This just moves these to a common header file.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:06:04 +10:00
Dave Airlie
f1f1ba3781 radeonsi: move sid.h/r600d_common.h to a common place.
Step one to merging radv would be to move some files around.

This only adds the include path to r600/radeonsi, because later
we want to avoid having to add it to the generic target paths.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-06 10:05:13 +10:00
Marek Olšák
0d7ec8b7d0 gallium/radeon: remove VPORT_ZMIN/ZMAX from init config states
It's part of the viewport state now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
687c4be9cf gallium/radeon: set VPORT_ZMIN/MAX registers correctly
Calculate depth ranges from viewport states and
pipe_rasterizer_state::clip_halfz.

The evergreend.h change is required to silence a warning.

This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
8b0507672e gallium/radeon: unify viewport emission code
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
6c8b76263d radeonsi: also do VS_PARTIAL_FLUSH before updating VGT ring pointers
ported from Vulkan

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
22cb5aecbe radeonsi: fix variable naming in si_emit_cache_flush
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
911202817d radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not used
for less noise in the HUD

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
addca75f4e radeonsi: add HUD queries for counting VS/PS/CS partial flushes
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
1d0593abd7 gallium/radeon: rename the num-cs-flushes query to num-ctx-flushes
num-cs-flushes will mean compute shader flushes

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
1469c70c2a radeonsi: fix a badly implemented GS bug workaround
Limit it to geometry shaders and Hawaii.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
21de3be8e6 radeonsi: fix texture format reinterpretation with DCC
DCC is limited in how texture formats can be reinterpreted using texture
views. If we get a view format that is incompatible with the initial
texture format with respect to DCC, disable DCC.

There is a new piglit which tests all format combinations.
What works and what doesn't was deduced by looking at the piglit failures.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
63da0c991d radeonsi: fix Gather4 with integer formats
The closed compiler does the same thing.

This fixes: GL45-CTS.texture_gather.*-int-* (18 tests)

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
3e756f09d4 radeonsi: fix a crash in imageSize for cubemap arrays
Sometimes it was f32, other times it was i32. Now it's always i32.

This fixes:
GL45-CTS.texture_cube_map_array.image_texture_size.texture_size_compute_sh

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
03708deed2 radeonsi: fix gl_PatchVerticesIn for tessellation evaluation shader
This fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation
.gl_PatchVerticesIn

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
a4fa215058 radeonsi: fix cubemaps viewed as 2D
This fixes: GL43-CTS.texture_view.view_sampling

v2: fix a typo, merge both if statements

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
2975230fdc radeonsi: always use the same function signature for llvm.SI.export
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
1c13c71ef8 radeonsi: return correct eviction stats for NVX_gpu_memory_info
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
f660d1cb21 gallium/radeon: also eliminate DCC fast clear in resource_get_handle
just do what the comment says

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
01dd73f2f4 gallium/radeon: use the current ctx for CMASK elimination in resource_get_handle
For coherency with the current context.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
d22feeaa9d gallium/radeon: use the current ctx for DCC decompression in resource_get_handle
For coherency with the current context.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
0d2e43fcb1 gallium/radeon: derive buffer placement and flags only at initialization
Invalidated buffers don't have to go through it.

Split r600_init_resource into r600_init_resource_fields and
r600_alloc_resource.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Marek Olšák
a14c50bceb radeonsi: set more sampler settings
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-05 18:01:15 +02:00
Emil Velikov
4ea90682ab docs: add news item and link release notes for 12.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-05 16:13:48 +01:00
Emil Velikov
2099d5df97 docs: add sha256 checksums for 12.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 614fb93a6d)
2016-09-05 16:12:08 +01:00
Emil Velikov
f541530bbc docs: add release notes for 12.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2fc6a31f10)
2016-09-05 16:12:07 +01:00
Marek Olšák
b012a13af5 noop: implement resource_get_handle
X+DRI3 locks up if the returned handle is invalid.
2016-09-05 16:12:04 +02:00
Marek Olšák
1c71bccdaa noop: set missing functions 2016-09-05 16:12:04 +02:00
Marek Olšák
ed164f0d6b noop: simplify some functions 2016-09-05 16:12:04 +02:00
Emil Velikov
62b224d428 glx/glvnd: list the strcmp arguments in correct order
Currently, due to the inverse order, strcmp will produce negative result
when the needle is towards the start of the haystack. Thus on the next
iteration(s) we'll end up further towards the end and eventually fail to
locate the entry.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-09-05 11:59:07 +01:00
Jason Ekstrand
821e366385 nir/tests: Update the CF tests to not assume fake edges
In aad4f1550, we removed the concept of "fake" edges from NIR.  Now, if you
have a block at the end of an infinite loop it really has no predecessors.
This updates the unit tests to match.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97587
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-09-04 20:44:59 -07:00
Ilia Mirkin
61e978524a gk110/ir: fix quadop dall emission
We recently starting to always emit the NDV (== dall) bit for quadops.
However it was folded into the wrong code word.

Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-09-04 18:28:29 -04:00
Mauro Rossi
98f734e758 android: intel: fix include paths in new "common" library
Fixes building error in libmesa_intel_common static library

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-03 20:03:16 -07:00
Ilia Mirkin
ca313e00b6 a3xx: use window scissor to simulate viewport xy clip
Unfortunately a3xx does not have a separate disable for depth clipping,
so when depth clamp is enabled, we disable the whole 3d clipper logic.
This in turn also gets rid of the xy clip that it would normally do.
When we detect this would happen, instead we integrate the viewport into
the window scissor. This may have slightly different behavior around
wide points, but it's unlikely that anything depends on this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-09-03 19:58:42 -04:00
Ilia Mirkin
83d7230fd5 a3xx: make use of software clipping when hw can't handle it
The hw clipper only handles up to 6 UCPs. If there are more than 6 UCPs,
or a clip vertex, or clip distances are in use, then we must use the
fallback discard-based clipping from the frag shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-09-03 19:58:42 -04:00
Ilia Mirkin
dac72234c7 a3xx: make sure to actually clamp depth as requested
We were previously ... not clamping. I guess this meant that everything
got clamped to 1/0, which was enough to pass the existing tests. Or
perhaps the clamping would only happen to the rasterized depth value and
not the frag shader's output depth value.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-09-03 19:58:42 -04:00
Karol Herbst
ae7eb93e6c nvc0/ir: allow min/max instructions to be dual-issued in pairs
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=60000 /width=1024 /height=640:

inst_executed: 1.03G
inst_issued1: 614M -> 580M
inst_issued2: 213M -> 230M

score: 1021 -> 1030

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-03 13:53:09 -04:00
Jason Ekstrand
7e891f90c7 anv: Move cmd_buffer_config_l3 into anv_cmd_buffer.c
This is the only remaining part of genX_l3.c and there's really no good
reason for it to be in its own file.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
17968e2dfd anv/cmd_buffer: Move emit_lri and emit_lrm higher up
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
42d03c204c anv: Refactor pipeline l3 config setup
Now that we're using gen_l3_config.c, we no longer have one set of l3
config functions per gen and we can simplify a bit.  Also, we know that
only compute uses SLM so we don't need to look for it in all of the stages.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
6448c0e324 anv: Leverage the shared L3$ config code
When Jordan first implement L3$ configuration for Vulkan, he copied+pasted
from the GL driver because we had no good place to share it.  Now that we
have src/intel/common, we should be sharing these tables.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
49981891f7 intel: Pull the guts of gen7_l3_state.c into a shared helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
979d0aca62 intel: Rename brw_get_device_name/info to gen_get_device_name/info
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:07 -07:00
Jason Ekstrand
527f371999 intel: s/brw_device_info/gen_device_info/
Generated by:

sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:06 -07:00
Jason Ekstrand
55364ab5b7 intel: Add a new "common" library for more code sharing
The first thing to go in this new library is brw_device_info.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-03 08:23:06 -07:00
Mauro Rossi
4218c32166 intel/blorp: fix typo in android makefile
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-03 08:22:53 -07:00
Timothy Arceri
1692228a38 nir: remove unused variable
This was let over from aad4f15506

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-09-03 20:30:19 +10:00
Connor Abbott
356d101af3 nir: remove some fields from nir_shader_compiler_options
I accidentally added these with 0dc4cab. Oops!
2016-09-03 00:49:58 -04:00
Connor Abbott
c62b58c216 nir: fix bug with moves in nir_opt_remove_phis()
In 144cbf8 ("nir: Make nir_opt_remove_phis see through moves."), Ken
made nir_opt_remove_phis able to coalesce phi nodes whose sources are
all moves with the same swizzle. However, he didn't add the logic
necessary for handling the fact that the phi may now have multiple
different sources, even though the sources point to the same thing. For
example, if we had something like:

if (...)
   a1 = b.yx;
else
   a2 = b.yx;
a = phi(a1, a2)
... = a

then we would rewrite it to

if (...)
   a1 = b.yx;
else
   a2 = b.yx;
... = a1

by picking a random phi source, which in this case is invalid because
the source doesn't dominate the phi. Instead, we need to change it to:

if (...)
   a1 = b.yx;
else
   a2 = b.yx;
a3 = b.yx;
... = a3;

Fixes 12 CTS tests:
ES31-CTS.functional.tessellation.invariance.outer_edge_symmetry.quads*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-03 00:37:48 -04:00
Connor Abbott
0dc4cabee2 nir: add nir_after_phis() cursor helper
And re-implement nir_after_cf_node_and_phis() using it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-03 00:37:48 -04:00
Ilia Mirkin
64a69059ce glsl: expose max atomic counter/buffer consts for tess in ES 3.2
Curiously OES/EXT_tessellation_shader leave these out, while ES 3.2 adds
them in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-09-03 00:26:36 -04:00
Ilia Mirkin
8122e30aec mapi: don't forget to expose GetPointerv in GL ES 3.2
I left this out of my previous commit that went around enabling all of
the other ES 3.2 entrypoints.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-09-03 00:26:36 -04:00
Ilia Mirkin
346de79ffd main: add KHR_robustness to ES 3.2 extension requirements
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-09-03 00:26:36 -04:00
Ilia Mirkin
163a029eba nv50,nvc0: respect render condition enable flag when clearing rt/zs
This is a newly added flag. We always pass false into it from
nv50_clear_texture, but other callers may want to respect the render
condition. (And the functions were originally spec'd to respect it.)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-03 00:01:07 -04:00
Karol Herbst
d0cf7a6beb nvc0/ir: don't dual-issue ops that depend or interfere with each other
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: rewrite to split up the helpers and move more logic to target]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-03 00:01:06 -04:00
Jason Ekstrand
aad4f15506 nir: Remove fake edges in the CF handling code
When NIR was first introduced, Connor added this fake-edge hack to work
around issues related to unreachable blocks.  Thanks to GLSL IR's jump
lowering code, the only unreachable code you can have is a block after an
infinite loop.  With SPIR-V, we didn't have the jump lowering code so we
could also end up with the "if (...) { break; } else { continue; }" case
which generates an unreachable block after the if.  Because of this, most
of NIR had to be fixed up for handling unreachable blocks.  The only
remaining case of not handling unreachable blocks was specifically the
block-after-infinite-loop case in dead_cf which was fixed by the previous
commit.  We can now delete the fake edge hack.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-09-02 11:24:09 -07:00
Jason Ekstrand
9a4d76e534 nir/dead_cf: Don't crash on unreachable after-loop blocks
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-09-02 11:24:09 -07:00
Samuel Pitoiset
ea7b475968 nvc0: reduce the initial code segment size to 512KB
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:39 +02:00
Samuel Pitoiset
6557058827 nvc0: allow to resize the code segment dynamically
When an application uses a ton of shaders, we need to evict them
when the code segment is full but this is not really a good solution
if monster shaders are used because code eviction will happen a lot.

To avoid this, it seems better to dynamically resize the code
segment area after each eviction. The maximum size is arbitrary
fixed to 8MB which should be enough.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:35 +02:00
Samuel Pitoiset
96e21ad763 nvc0: add a new bin for the code segment
To avoid the bins list to grow up indefinitely when the code segment
size will be bumped, we need to separate that bin from the SCREEN
one because it contains other resources like the uniform bo.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:31 +02:00
Samuel Pitoiset
63ac80879e nvc0: add nvc0_screen_resize_text_area() helper
This function will be helpful for resizing the code segment
area when we need to evict all shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:28 +02:00
Samuel Pitoiset
3d928d9082 nvc0: re-upload currently bound shaders after code eviction
This fixes a very old issue which happens when the code segment
size is full. A bunch of real applications like Tomb Raider,
F1 2015, Elemental, hit that issue because they use a ton of shaders.

In this case, all shaders are evicted (for freeing space) but all
currently bound shaders also need to be re-uploaded and SP_START_ID
have to be updated accordingly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:25 +02:00
Samuel Pitoiset
34883626d1 nvc0: refactor the program upload process
This refactoring will help for fixing the "out of code space"
eviction issue because we will need to reupload the code for
all currently bound shaders but it's slightly different than
uploading a new fresh code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-09-01 21:25:17 +02:00
Jordan Justen
49c24d8a24 i965: fix noop_scissor range issue on width/height
If scissor X or Y was set to a negative value then the previous
code might have indicated noop scissors when the scissor range
actually was masking a portion of the framebuffer.

Since fb->_Xmin, _Xmax, _Ymin and _Ymax take scissors into
account, we can use these to test for a noop scissor.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-09-01 11:45:13 -07:00
Kenneth Graunke
9c562956f9 glsl: Only force varyings to be flat when varying packing.
Varying packing would like to mark certain variables as flat.
This works as long as both sides of the interfaces are changed
accordingly.  However, with SSO, we disable varying packing on
the outermost stages.  We also disable varying packing for
certain tessellation stages.

With SSO, we operate on the producer and consumer separately.
Checks based on the consumer stage and variable are risky, and
can easily lead to altering one half of the interface between
stages, breaking SSO pipeline IO validation.

Just stop monkeying around with interpolation modes unless
required for varying packing.  There's no point.  This also
disables it in unsafe SSO cases.

Fixes CTS tests:
*.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_MaxPatchVertices_Position_PointSize

Also fixes Piglit's spec/oes_geometry_shader/sso_validation:
- user-defined-gs-input-not-in-block.shader_test
- user-defined-gs-input-in-block.shader_test

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-09-01 11:24:17 -07:00
Kenneth Graunke
72b56e8b1a glsl: Reject TCS/TES input arrays not sized to gl_MaxPatchVertices.
We handled the unsized case, implicitly sizing arrays to the value
of gl_MaxPatchVertices.  But if a size was present, we failed to
raise a compile error if it wasn't the value of gl_MaxPatchVertices.

Fixes CTS tests:

  *.tessellation_shader.compilation_and_linking_errors.
  {tc,te}_invalid_array_size_used_for_input_blocks

Piglit's tcs-input-read-nonconst-* tests have recently been fixed.
This patch will break older copies of those tests, but the latest
should continue working.  Update to Piglit 75819c13af2ed5.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-09-01 11:07:07 -07:00
Frank Binns
2f3154f464 wayland-drm: add missing NULL check
Although malloc is unlikely to fail check its return value nevertheless.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-01 15:48:52 +01:00
Frank Binns
d5f65b8bf5 loader: fix sysfs uevent file parsing
When trying to get a device name for an fd using sysfs, it would always fail
as it was expecting key/value pairs to be delimited by '\0', which is not the
case.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-01 15:48:34 +01:00
Frank Binns
d6f669ba83 egl: only store device name when Wayland support is built
The device name is only needed for WL_bind_wayland_display so make this clear
by only storing the device name when Wayland support is built.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-09-01 15:47:58 +01:00
Lionel Landwerlin
2dc6930a5a isl: round format alignment to nearest power of 2
A few inline asserts in anv assume alignments are power of 2, but with
formats like R8G8B8 we have odd alignments.

v2: round up to power of 2 (Ilia)

v3: reuse util_next_power_of_two() from gallium/aux/util/u_math.h (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-01 11:36:09 +01:00
Thomas Hellstrom
fc6be40011 gallium/postprocess: Fix resource freeing
The code was triggering asserts in DEBUG builds of the SVGA driver since
the reference count of the resource was never decremented before destroy.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-09-01 07:59:49 +02:00
Ilia Mirkin
e3db415456 st/mesa: expose OES_geometry_shader and OES_texture_cube_map_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-31 20:12:55 -04:00
Eric Engestrom
3bd885d09c Introduce .editorconfig
A few weeks ago, Jose Fonseca suggested [0] we use .editorconfig files
to try and enforce the formatting of the code, to which Michel Dänzer
suggested [1] we start by importing the existing .dir-locals.el
settings. The first draft was discussed in the RFC [2].

These .editorconfig are a first step, one that has the advantage of
requiring little to no intervention from the devs once the settings
files are in place, but the settings are very limited. This does have
the advantage of applying while the code is being written.
This doesn't replace the need for more comprehensive formatting tools
such as clang-format & clang-tidy, but those reformat the code after
the fact.

[0] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121545.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121639.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123431.html

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-08-31 17:06:54 -07:00
Eric Anholt
509e2dbc10 vc4: Add missing break statement.
This opcode isn't used yet, so it didn't affect anything.  Caught by
Coverity, reported to me by imirkin.
2016-08-31 17:06:54 -07:00
Brian Paul
c87e8c8515 gallium/docs: clarify render_condition_enabled parameter to clear functions
If false, it means do the clear unconditionally.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-31 15:51:06 -06:00
Jason Ekstrand
b8bff0823b mesa: Add some more .gitignore 2016-08-31 13:45:27 -07:00
Matt Turner
90eaf01616 i965: Pass start_offset to brw_set_uip_jip().
Without this, we would pass over the instructions in the SIMD8 program
(which is located earlier in the buffer) when brw_set_uip_jip() is
called to handle the SIMD16 program.

The assertion about compacted control flow was bogus: halt, cont, break
cannot be compacted because they have both JIP and UIP. Instead, we
should never see a compacted instruction in this code at all.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-31 13:11:27 -07:00
Kenneth Graunke
bea048752e i965: Merge gen7_clip_state atom into gen6_clip_state atom.
The original motivation was that gen6_clip_state ignored _NEW_POLYGON
as it didn't care about early culling.  The only other change was that
Gen6 ignored BRW_NEW_TES_PROG_DATA as it doesn't have tessellation
shaders, but listening to this is harmless as it'll never be signalled.

Now that we've added _NEW_POLYGON for is_drawing_lines/points, we can
merge the two as the distinction is meaningless.

This actually fixes a bug, though: Gen8+ was using the gen6_clip_state
atom because it doesn't care about early culling, but it also needs
BRW_NEW_TES_PROG_DATA, which was missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-31 12:42:09 -07:00
Kenneth Graunke
4c116cbafb i965: Use gs_prog_data in is_drawing_points/lines().
State upload code should use prog_data rather than poking at core
Mesa shader data structures wherever possible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-31 11:50:15 -07:00
Kenneth Graunke
cd19db4ee6 i965: Fix missing dirty bits related to is_drawing_points/lines.
calculate_attr_overrides() uses is_drawing_points(), which depends
on tessellation and geometry program state, as well as polygon state.

v2: Add missing _NEW_POLYGON as well.  Caught by Iago Toral.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-31 11:50:15 -07:00
Samuel Pitoiset
3df8615dcd nvc0: remove an attempt at uploading all IMMD into a CB
This has never been used because info->immd.bufSize is always 0
and anyways this is an experimental code which has never been
completed.

This gets rid of some unused code in the program validation process.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-31 19:05:16 +02:00
Samuel Pitoiset
b2f3d50ca7 nv50: remove unused nv50_program::immd_size field
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-31 19:05:13 +02:00
Ilia Mirkin
6118bcab4e nv30: set usage to staging so that the buffer is allocated in GART
The code a few lines below expects to migrate the bo in question to
VRAM. Since we're filling the initial data via CPU, it's more efficient
to create the temporary buffer in GART. There is no "push" method
implemented, otherwise we'd use that instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-08-31 10:28:33 -04:00
Frank Binns
5505845945 egl/x11_dri3: provide an authentication function
To support WL_bind_wayland_display an authentication function needs to be
provided but this was not being done for this platform as it's not strictly
necessary. However, as this isn't an optional function there's the potential
for a segfault to occur if authentication is mistakenly performed. Protect
against this by providing a function that prints an error.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-08-31 15:10:14 +02:00
Frank Binns
4c28c916ef egl/x11_dri3: disable WL_bind_wayland_display for devices without render nodes
Up until now, DRI3 was only used for devices that have render nodes, unless
overridden via an environment variable, with it falling back to DRI2 otherwise.
This limitation was there in order to support WL_bind_wayland_display as it
requires client opened device node fds to be authenticated, which isn't possible
when using DRI3. This is an unfortunate compromise as DRI3 provides security
benefits over DRI2.

Instead, allow DRI3 to be used for devices without render nodes but don't
advertise WL_bind_wayland_display in this case. Applications that need this
extension can still be run by disabling DRI3 support via the LIBGL_DRI3_DISABLE
environment variable.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-08-31 15:09:12 +02:00
Jose Fonseca
55e417222f scons: Fix MinGW cross compilation.
The generated GLSL header files were only being built for the host
platform, and not the target platform.

Trivial.
2016-08-31 12:18:34 +01:00
Ilia Mirkin
8caf2cb0c0 nv30: only bail on color/depth bpp mismatch when surfaces are swizzled
The actual restriction is a little weaker than I originally thought. See
https://bugs.freedesktop.org/show_bug.cgi?id=92306#c17 for the
suggestion. This also explain why things weren't *always* failing
before, only sometimes. We will allocate a non-swizzled depth buffer for
NPOT winsys buffer sizes, which they almost always are.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-08-31 01:17:55 -04:00
Kenneth Graunke
d82f8d9772 glsl: Handle patch qualifier on interface blocks.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 22:09:36 -07:00
Ilia Mirkin
a0b1260fe0 i965: enable OES_primitive_bounding_box with the no-op implementation
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 21:31:30 -04:00
Ilia Mirkin
bf47b2bf88 st/mesa: provide the null implementation of bounding box outputs in tcs
Until hardware appears (in a gallium driver) that can make use of the
TCS-outputted gl_BoundingBox, we just request that the variable gets
assigned as a regular patch variable.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 20:25:15 -04:00
Ilia Mirkin
891d7e3c9e glsl: add gl_BoundingBox and associated varying slots
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 20:25:15 -04:00
Ilia Mirkin
10663c648e mesa: add support for GL_PRIMITIVE_BOUNDING_BOX storage and query
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 20:25:15 -04:00
Ilia Mirkin
3b81c998a2 mesa: add scaffolding for OES/EXT_primitive_bounding_box
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 20:25:15 -04:00
Ilia Mirkin
5ce0969df2 docs: add GL_OES_viewport_array to features
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 20:25:15 -04:00
Timothy Arceri
64a48efb9e aubinator: fix if indentation and add brackets to multiline body
Fixes misleading indentation warning in gcc.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-31 10:19:45 +10:00
Francisco Jerez
6df215d97e i965/fs: Assert that the number of color targets is one when dual-source blend is enabled.
Requested by Anuj during review of
4a87e4ade7, adding as follow-up since it
led to assertion failures due to various GLSL bugs that should be
fixed now.
2016-08-30 16:54:19 -07:00
Francisco Jerez
fd04d048ae glsl: Fix gl_program::OutputsWritten computation for dual-source blending.
In the fragment shader OutputsWritten is a bitset of FRAG_RESULT_*
enumerants, which represent the location of each color output written
by the shader.  The secondary and primary color outputs of a given
render target using dual-source blending have the same location, so
the 'idx' computation below will give the wrong bit as result if the
'var->data.index' term is non-zero -- E.g. if the shader writes the
primary and secondary colors of the FRAG_RESULT_COLOR output,
ir_set_program_inouts will think that the shader writes both
FRAG_RESULT_COLOR and FRAG_RESULT_SAMPLE_MASK, which is just bogus.

That would cause the brw_wm_prog_key::nr_color_regions computation
done in the i965 driver during fragment shader precompilation to be
wrong, which currently leads to unnecessary recompilation of shaders
that use dual-source blending, and triggers an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 16:54:19 -07:00
Francisco Jerez
965934f38a glsl: Fix incorrect hard-coded location of the gl_SecondaryFragColorEXT built-in.
gl_SecondaryFragColorEXT should have the same location as gl_FragColor
for the secondary fragment color to be replicated to all fragment
outputs.  The incorrect location of gl_SecondaryFragColorEXT would
cause the linker to mark both FRAG_RESULT_COLOR and FRAG_RESULT_DATA0
as being written to, which isn't allowed by the spec and would
ultimately lead to an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.

This should also fix the code below for multiple dual-source-blended
render targets, which no driver currently supports but we have plans
to enable eventually in the i965 driver (the comment saying that no
hardware will ever support it seems rather hilarious).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 16:54:19 -07:00
Francisco Jerez
342f945b13 st/glsl_to_tgsi: Use SecondaryOutputsWritten to determine dual-source fragment outputs.
Currently the mesa state tracker relies on there being two bits set
per dual-source output in the gl_program::OutputsWritten bitset, but
that only worked due to a GLSL front-end bug that caused it to set the
OutputsWritten bit for both location and location+1 even though at the
GLSL level the primary and secondary color outputs used for
dual-source blending have the same location.  Fix it by extending
outputMapping[] to 2*FRAG_RESULT_MAX elements in order to represent a
mapping from a (location, index) pair to its TGSI output, which should
also make it slightly easier to add support for dual-source blending
in combination with multiple render targets in the long run.

No Piglit regressions on llvmpipe.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 16:54:19 -07:00
Francisco Jerez
cb4b38af41 glsl: Calculate bitset of secondary outputs written in ir_set_program_inouts.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 16:54:18 -07:00
Ian Romanick
c011d7d900 glsl: Fix typo in comment
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
aee9ab7de7 glsl: Replace most assertions with unreachable()
text	   data	    bss	    dec	    hex	filename
7669233	 277176	  28624	7975033	 79b079	i965_dri.so before generated code
7647081	 277176	  28624	7952881	 7959f1	i965_dri.so before this commit
7669289	 277176	  28624	7975089	 79b0b1	i965_dri.so with this commit

Looking at the generated assembly, it appears that some of changes made
in the generated code prevent some loops from being unrolled.  Removing
the default cases (via unreachable()) allows these loops to unroll again.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
dd574be54c glsl: Refactor handling of horizontal operations
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
d6e73150a4 glsl: Use constant_template_horizontal instead of constant_template_horizontal_single_implementation for unops
This changes the "shape" of all the pack and unpack operators, but they
should function the same.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
822b5c5eb2 glsl: Eliminate constant_template2
constant_template_common can now handle the case where the result type
is different from the input type by using type_signature_iter.  This
changes the "shape" of all the cast-style operators, but they should
function the same.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
abc81f7883 glsl: Eliminate constant_template5
constant_template_common can now handle the case where the result type
is different from the input type by using type_signature_iter.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
53c54a6c73 glsl: Eliminate constant_template0
This template is mostly an artefact of the development of the original
patch series and to minimize the differences between the original code
and the generated code.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
ddb4b53de3 glsl: Eliminate one of the templates for simpler operations
The difference between these two templates were mostly an artefact of
the development of the original patch series and to minimize the
differences between the original code and the generated code.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
ee3cdac785 glsl: Use the generated constant expression code
Immediately previous to this patch,

    diff -wud src/glsl/ir_constant_expression.cpp \
              src/glsl/ir_expression_operation_constant.h

should be "minimal."

v3: With much help from José Fonseca, fix the SCons build.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
f3fcfe001f glsl: Generate code for constant ir_triop_csel expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:03 -07:00
Ian Romanick
2761190baa glsl: Generate code for constant ir_triop_lrp expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
6e09c8715d glsl: Generate code for constant ir_quadop_vector expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
f8e185a65f glsl: Generate code for constant ir_quadop_bitfield_insert expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
4d8ac28b20 glsl: Generate code for constant ir_triop_vector_insert expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
9f1d7c5235 glsl: Generate code for constant ir_binop_vector_extract expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
d8dd49419a glsl: Generate code for constant ir_binop_mul expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
8954a019f7 glsl: Generate code for constant ir_triop_fma and ir_triop_bitfield_extract expressions
ir_triop_bitfield_extract is a little weird because the second and third
operand and aways int, so they may differ in type from the first
operand.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
da61c94db8 glsl: Generate code for constant ir_binop_dot expressions
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
13106e1041 glsl: Generate code for constant ir_binop_lshift and ir_binop_rshift expressions
The code generated is quite different from what was previously used.  I
believe that it is still correct by the GLSL spec, and I believe, due to
C rules about shifts, the behavior will be the same.

Section 5.9 (Expressions) of the GLSL 4.50 spec says:

    The result is undefined if the right operand is negative, or greater
    than or equal to the number of bits in the left expression's base
    type.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
90da8bf547 glsl: Generate code for constant ir_binop_ldexp expressions
ldexp is weird because its two operands have different types.  Add
support for directly specifying the exact signatures of all the possible
variations of an operation.

v2: Use tuple() instead of () for clarity.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
0f87c54d1c glsl: Generate code for constant unary expressions that don't assign the destination
These are operations like the pack functions that have separate
functions that assign multiple outputs from a single input.

v2: Correct the source and destination types.  They were previously
transposed.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
8cf9157786 glsl: Generate code for some constant binary expression that are horizontal
Only operations where the implementation is identical code regardless of
type.  The only such operations are ir_binop_all_equal and
ir_binop_any_nequal.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
d5bfe6b9c4 glsl: Generate code for constant unary expression that are horizontal
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
8f5357b1d6 glsl: Generate code for constant expressions that have an output type the differs from the input types
v2: Remove extra int() cast in find_lsb.  Suggested by Matt.  'for (a,
b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
74e335c762 glsl: Generate code for constant binary expressions that combine vector and scalar operands
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:02 -07:00
Ian Romanick
f81b1c7fa7 glsl: Generate code for constant binary expressions that have one operand type
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
598929aee7 glsl: Generate code for constant unary expression that have different implementations for each source type
v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
aa9f4fc53e glsl: Generate code for constant unary expression that map one type to another
ir_unop_i2b is omitted because its source can either be int or uint.
That makes it special.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
3fcb6b85c0 glsl: Begin generating code for the most basic constant expressions
Unary operations where all of the supported types use the same C
expression to evaluate them.

v2: 'for (a, b) in d' => 'for a, b in d'.  Suggested by Dylan.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
e31c72a331 glsl: Convert tuple into a class
This makes things a little more clear now, and it will make future
changes... possible.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
6ef27003ac glsl: Compact a bunch of things onto one line
Even though they are much too long for that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
0cef8c683e glsl: Sort constant expression handling by IR operand enum value
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
8d54b5f756 glsl: Trivial whitespace and punctuation changes
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
fd2dabbb9f glsl: Sort GLSL type enums in switch-statements in enum order
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
13ef8c46b8 glsl: Always use correct float types in constant expression handling
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
ea05a72258 glsl: Extract ir_quadop_bitfield_insert implementation to a separate function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
fe153309a8 glsl: Extract ir_triop_bitfield_extract implementation to a separate function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
54ec6e1b8b glsl: Extract ir_binop_ldexp implementation to a separate function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
6d5fe1815c glsl: Use find_msb_uint to implement ir_unop_find_lsb
(X & -X) calculates a value with only the least significant bit of X
set.  Since there is only one bit set, the LSB is the MSB.

v2: Remove extra int() cast.  Suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:01 -07:00
Ian Romanick
5c24750a49 glsl: Extract ir_unop_find_msb implementation to a separate function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
d75034b3a2 glsl: Extract ir_unop_bitfield_reverse implementation to a separate function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
4b0606e0a7 glsl: Use _mesa_bitcount to implement constant ir_unop_bit_count
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
f4af9f36e7 glsl: Delete spurious comment about mod not taking integer operands
This hasn't been true since we added support for GLSL 1.30.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
d6ad3e2dd9 glsl: Delete spurious comment about updating ir_expression::get_num_operands
This hasn't been necessary since 007f48815.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
dc41d998f2 glsl: Do not generate comments or extra whitespace in expression files
The comments and whitespace can live in the Python code.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
c6e8fd82ea glsl: Just access the ir_expression_operation strings table directly
The operator_string functions gave us some protection against a
malformed table.  Now that the table is generated from the same data
that generates the enum, this is not a concern.  Just cut out the middle
man.

   text	   data	    bss	    dec	    hex	filename
7531892	 273992	  28584	7834468	 778b64	i965_dri-64bit-before.so
7531828	 273992	  28584	7834404	 778b24	i965_dri-64bit-after.so

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
fb44f69779 glsl: Generate ir_expression_operation_strings.h from Python
'diff -ud' is clean.

v2: Massive rebase.

v3: With much help from José Fonseca, fix the SCons build.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
90781eee4d glsl: Pull operator_strs out to its own file
No change except to the copyright symbol.  The next patch will generate
this file with Python, and Unicode + Python = pure rage.

v2: Massive rebase.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
140ec58a07 glsl: Generate the ir_last_* values
This ensures that they remain correct if the list is rearranged or new
opcodes are added.  I checked a diff of before and after to ensure that
each ir_last_ had the same value.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:00 -07:00
Ian Romanick
7d6af9e599 glsl: Generate ir_expression_operation.h from Python
There are differences in where end-of-line comments are placed, but
'diff -wud' is clean.

v2: Massive rebase.

v3: With much help from José Fonseca, fix SCons build.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2016-08-30 16:28:00 -07:00
Jason Ekstrand
10f9901bce anv: Rework pipeline caching
The original pipeline cache the Kristian wrote was based on a now-false
premise that the shaders can be stored in the pipeline cache.  The Vulkan
1.0 spec explicitly states that the pipeline cache object is transiant and
you are allowed to delete it after using it to create a pipeline with no
ill effects.  As nice as Kristian's design was, it doesn't jive with the
expectation provided by the Vulkan spec.

The new pipeline cache uses reference-counted anv_shader_bin objects that
are backed by a large state pool.  The cache itself is just a hash table
mapping keys hashes to anv_shader_bin objects.  This has the added
advantage of removing one more hand-rolled hash table from mesa.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97476
Acked-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
6899718470 anv: Add a struct for storing a compiled shader
This new anv_shader_bin struct stores the compiled kernel (as an anv_state)
as well as all of the metadata that is generated at shader compile time.
The struct is very similar to the old cache_entry struct except that it
is reference counted and stores the actual pipeline_bind_map.  Similarly to
cache_entry, much of the actual data is floating-size and stored after the
main struct.  Unlike cache_entry, which was storred in GPU-accessable
memory, the storage for anv_shader_bin kernels comes from a state pool.
The struct itself is reference-counted so that it can be used by multiple
pipelines at a time without fear of allocation issues.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Acked-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
13c09fdd0c anv: Add pipeline_has_stage guards a few places
All of these worked before because they were depending on prog_data to be
null.  Soon, we won't be able to depend on a nice prog_data pointer and
it's nice to be more explicit anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
b259d86ad6 anv: Remove unused fields from anv_pipeline_bind_map
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
d5945bec12 anv/pipeline: Properly handle OOM during shader compilation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
a0f5c496e3 anv/allocator: Correctly set the number of buckets
The range from ANV_MIN_STATE_SIZE_LOG2 to ANV_MAX_STATE_SIZE_LOG2 should
be inclusive and we have asserts that ensure that you never try to allocate
a state larger than (1 << ANV_MAX_STATE_SIZE_LOG2).  However, without
adding 1 to the difference, we allocate 1 too few bucckts and so, even
though we have an assert, anything landing in the last bucket will fail to
allocate properly..

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
4200c2266e anv/pipeline: Fix bind maps for fragment output arrays
Found by inspection.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Jason Ekstrand
d316cec1c1 anv/descriptor_set: memset anv_descriptor_set_layout
We hash this data structure so we can't afford to have uninitialized data
even if it is just structure padding.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-30 15:08:23 -07:00
Eric Engestrom
d5899b3010 docs/helpwanted: fix GL3.txt/features.txt link
Fixes: f926cf5bd0 ("docs: Rename GL3.txt to features.txt")

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CC: Andreas Boll <andreas.boll.dev@gmail.com>
2016-08-30 14:38:57 -07:00
Eric Engestrom
aac91fffae anv/wayland: fix assert typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-30 13:47:51 -07:00
Eric Engestrom
4e68bb620f anv/meta: fix unreachable() typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-30 13:47:51 -07:00
Eric Engestrom
b0acebd41f st/nine: fix unreachable() typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-30 13:47:46 -07:00
Eric Engestrom
e2627e34ba glsl: fix unreachable() typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-30 13:47:42 -07:00
Eric Engestrom
352f0d9180 get_reviewer.pl: fix mesa check
This script was broken for the last few days and I couldn't figure out why.
Turns out it was checking for the existence of a file that got renamed,
so rename it in here too.

Fixes: f926cf5bd0 ("docs: Rename GL3.txt to features.txt")
CC: Ian Romanick <ian.d.romanick@intel.com>
CC: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-30 16:44:00 -04:00
Kenneth Graunke
6699403651 glsl: Initialize outputs[] array in lower_blend_equation_advanced.
Caught by Coverity.  Likely fixes real issues if an output component
is not present.

CID: 1372278
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-30 13:11:00 -07:00
Samuel Pitoiset
6820f75c91 nvc0: fix indentation in nvc0_screen_init()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 18:42:02 +02:00
Samuel Pitoiset
0fc3b7c88e nvc0: check return value of nvc0_screen_resize_tls_area()
While we are at it, make it static and change the return values
policy to be consistent.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 18:41:59 +02:00
Samuel Pitoiset
b489ac88f6 nvc0: make use of FAIL_SCREEN_INIT in nvc0_screen_create()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 18:41:57 +02:00
Samuel Pitoiset
e0a067ed48 nv50/ir: always emit the NDV bit for OP_QUADOP
This silences a divergent error found with F1 2015.

Basically, the NDV bit has to be set when a FSWZ instruction is
inside divergent code, but it's not needed otherwise. The correct
fix should be to set it only in divergent code situations.

GM107 emitter already sets that bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-08-30 18:41:46 +02:00
Jason Ekstrand
9514c5a30f intel/blorp: Inline get_vs_entry_size into emit_urb_config
Topi asked to have the prefix removed because there's nothing gen7 about
it.  However, now that everything is in a single file, there is no good
reason to have it split out into a helper function anyway.  Let's just put
the contents in emit_urb_config and call it a day.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-30 09:24:50 -07:00
Tim Rowley
175052507c swr: [rasterizer] add archrast instrumentation
Statistics measurement system
2016-08-30 10:32:36 -05:00
Emil Velikov
5de640a518 i915: Check return value of screen->image.loader->getBuffers
Ported from the i965 commit e7ab358e81.

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-08-30 14:50:47 +01:00
Emil Velikov
4f5f9575d0 egl/android: remove config post-processing
No longer needed as of last commit, since we no longer add OPENGL to the
ClientAPIs thus, RenderType and Conformant don't have the desktop GL
bit set.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
2016-08-30 14:50:28 +01:00
Emil Velikov
03eaa6c596 egl/dri2: check if the EGL API is valid before adding it to ClientAPIs
In the rather unlikely case that the API is considered invalid, don't
add it to the (supported) ClientAPIs bitmask.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>

---
Strictly speaking we only need this in the Android case for OpenGL.
Adding it everywhere doesn't hurt us since the compiler will const
propagate and optimise/remove these.
2016-08-30 14:50:10 +01:00
Emil Velikov
4472b6e469 egl/android: annotate static const data as such
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
2016-08-30 14:50:08 +01:00
Emil Velikov
7563c39641 egl: treat EGL_OPENGL_API as invalid on Android
At the moment one can use OpenGL in eglBindAPI() only to clear the
EGL_OPENGL_BIT from RenderableType and Conformant for _each_ config.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
2016-08-30 14:49:24 +01:00
Ilia Mirkin
a165e5cb7c nouveau: make color/depth bpp match for pre-nv10 chips
This avoids generating fbconfigs whose winsys framebuffers will be
incomplete (see nouveau_check_framebuffer_complete).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 00:21:42 -04:00
Ilia Mirkin
357d8261f1 nouveau: always enable at least one RC
Experimentally, this is required for glxgears and others to display the
proper colors. This is also what the code used to do before the
referenced commit.

Fixes: c703658b39 (mesa: Drop _EnabledUnits.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-08-30 00:21:42 -04:00
Ilia Mirkin
91681302d0 nouveau: allow NV3x's to be used with nouveau_vieux
NV34 and possibly other NV3x hardware has the capability of exposing the
NV25 graph class. This allows forcing nouveau_vieux to be used instead
of the gallium driver, primarily for testing purposes. (Among other
things, NV2x only ever came as AGP or inside an Xbox, never PCI/PCIe).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 00:21:42 -04:00
Ilia Mirkin
ab0917311f nvc0: undo overzealous enum usage
Commit 7413625ad3 flipped a few functions too many to use
pipe_shader_type. These functions actually take an integer that does not
correspond 1:1 with the enum.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-30 00:17:54 -04:00
Brian Paul
ec16a5b091 svga: fix a texture readback bug
Backing views/surfaces are used to handle the case when a resource is
bound both as a render target and as a sampler source (such as when
doing auto mipmap generation).

This patch fixes a bug where mapping a resource (to do a glReadPixels)
was reading the stale data in the original surface rather than the
backing surface which was rendered to.

We need to propagate the backing resource (which we rendered to) back
to the original resource before we read from it.  The problem was the
svga_propagate_rendertargets() function was examining the wrong surface
views.

This fixes the "poc9" test described in VMware bug 1686661.
Also tested with Piglit, Cinebench, Lightsmark, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-29 17:46:50 -06:00
Brian Paul
646afc6ff7 svga: move surface propagation code into new function
Put new svga_propagate_rendertargets() function where all the other
surface propagation code lives.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-29 17:46:50 -06:00
Brian Paul
b9b88516f8 mesa: fix format conversion bug in get_tex_rgba_uncompressed()
We need to set the need_convert flag with each loop iteration, not
just when the rgba pointer is null.

Bug reported by Markus Müller <mueller@imfusion.de> on mesa-users list.
Fixes new piglit arb_texture_float-get-tex3d test.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-29 17:46:50 -06:00
Dave Airlie
f235dc08ac radeonsi: add support for cull distances. (v1.1)
This should be all that is required for cull distances to work
on radeonsi.

v1.1: whitespace cleanup, add docs fix clipdist_mask usage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-08-30 09:35:56 +10:00
Timothy Arceri
5025e88703 spirv: replace assert with unreachable
Fixes uninitialised warning for coord_components.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-08-30 09:29:26 +10:00
Jason Ekstrand
f4314d06e8 isl/state: Add some asserts about format capabilities
This keeps invalid surface states from leaking through and potentially
hanging the GPU.  We shouldn't actually be hitting this on a regular basis,
but a helpful assert is better than a hang.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
87214414fd intel/blorp: Add a format parameter to blorp_fast_clear
This allows us to use the actual render format as opposed to the texture
format.  I don't know that the hardware actually cares in the case of fast
clears, but it certainly seems more correct.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
348509269e i965: Move blorp into src/intel/blorp
At this point, blorp is completely driver agnostic and can be safely moved
into its own folder.  Soon, we hope to start using it for doing blits in
the Vulkan driver.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
8bd35d8bd2 i965/blorp: Remove the remaining brw prefixes from the blorp.h API
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
3e46f11409 i965/blorp: Use isl_format_get_depth_format for setting depth formats
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
555b22a446 i965: Move the type_size function declartaions to brw_nir.h
Signed-of-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
007d8a6d04 i965: Move get_fast_clear_rect to blorp_clear.c
This has been the only caller since we deleted the meta fast clear code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
c8ff36228d i965: Roll brw_get_ccs_resolve_rect into blorp_ccs_resolve
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
12a2fe5389 i965/blorp: Get rid of most brw and mesa includes
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
87a1cb6979 i965: Move the hiz_op enum to blorp
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
db95a8108f i965/blorp: Add a fast_clear_op enum
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
71dc2e0106 i965/blorp: Make blorp_addres::buffer a void*
The Vulkan driver doesn't use libdrm so we don't want to bake that in.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
2191f5cb7e i965/blorp: Get rid of brw_context
This commit switches all of blorp from taking a brw_context to taking a
blorp_context and, where useful, a void *batch.  In the GL driver, we only
have one active batch at a time so the brw_context *is* the batch but in
Vulkan, batch will point to the anv_cmd_buffer in which we are building
instructions.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
99b9e9b86e i965/blorp: Take a blorp_context in compile_nir_shader
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
a818a32244 i965/meta_util: Take an isl_device in get_fast_clear_rect
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
bc159ff0f7 i965/blorp: Add an "exec" function pointer to blorp_context
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
cea360a708 i965/blorp: Remove some i965-isms from genX_blorp_exec.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
cf14b52478 i965/blorp: Move the guts of brw_blorp_exec into genX_blorp_exec.c
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
28ae664e3b i965/blorp: Pull the guts of blorp_exec into a driver-agnostic header
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
9a842c61fe i965/blorp/exec: Refactor to use a new blorp_batch struct
This gets rid of brw_context throughout the core of the state setup code.
Instead, it is replaced with blorp_batch which contains a pointer to the
blorp_context and a void* that the driver can use for its own blorp data.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
4e7bddf8a3 i965/blorp: Add a helper for allocating binding tables and surface states
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
8a39069dfe i965/blorp: Use BT_INDEX enums for setting up the binding table
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
1367af159e i965/blorp: Shorten binding table index enum names
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
da2a078deb i965/blorp/genX: Add a blorp_surface_reloc helper
Previously, we passed the buffer address (as per the latest offset from the
kernel) to ISL to use when it filled out the surface state.  We then called
drm_intel_bo_emit_reloc() to add the relocation to the list.  The newly
added blorp_surface_reloc helper adds the relocation to the list and then
writes the buffer address directly into the surface state.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
ac08bc8ac2 i965/blorp: Use blorp_address in brw_blorp_surface instead of bo+offset
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
33cc1f6bb4 i965/blorp: Pull emit_surface_state into genX_blorp_exec.c
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
6d2f8f8f5f i965/blorp: Add driver mocs settings to the context
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
9c380b639f i965/blorp/genX: Move emit_urb_config into another helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
28991c9601 i965/blorp: Use gen6_upload_urb
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
7ecbb9bada i965/gen6: Refactor gen6_upload_urb
This splits it into two functions very similar to gen7_upload_urb.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
3e4b43d11d i965/blorp/genX: Pull emit_3dstate_multisample into a helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
becd434d14 i965/blorp/genX: Add helpers for allocating various bits of state
This pulls most of the brw-specific bits into helpers with generic names.
Later, those will become the driver hooks for generic code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
600446ccc7 i965/blorp: Expose the shader cache through function pointers
This sanitizes blorp's access to the i965 driver's shader cache by patching
it through the blorp_context.  When we start using blorp in Vulkan, we will
simply have to implement such a caching interface in the Vulkan driver.

Note: In my first attempt at this, I simplified it down to a single
upload_shader entrypoint and implemented the caching inside of blorp.  This
doesn't work, however, because the i965 driver will, on occation, dump its
entire cache and start over.  When this happens, blorp needs to be able to
recompile its shaders and re-upload them.  It's easiest to just expose the
caching interface.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-29 12:17:34 -07:00
Jason Ekstrand
a14d1b63ce i965/blorp: Add a blorp_context struct and init/finish funcs
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-29 12:17:34 -07:00
Mauro Rossi
cd18bbeef3 android: intel: Flatten the makefile structure
Android porting of commit bebc1a1 "intel: Flatten the makefile structure"

Automake approach was followed, by moving makefiles a level up,
naming them Android.genxml.mk and Android.isl.mk,
performing the necessary adjustments to the paths,
adding src/intel/Android.mk and fixing mesa top level makefile.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-29 12:17:34 -07:00
Jan Vesely
083746bc48 clover: Use device cap to query pointer size instead of hardcoded 32bits
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97513
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-29 14:40:15 -04:00
Jan Vesely
c7af84968d gallium: add cap to export device pointer size
v2: document the new cap
v3: fix 80 char limit in screen.rst

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-29 14:40:15 -04:00
Brian Paul
f5602c27ec svga: s/unsigned/enum pipe_shader_type/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-29 12:40:45 -06:00
Jordan Justen
5e76baa2ad i965/hsw: Enable ARB_ES3_1_compatibility extension
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-08-29 11:23:08 -07:00
Rhys Kidd
b1b7e921f8 r600g: Clean up defined magic numbers for TGSI opcodes
Small code clean up that removes magic numbers where a TGSI
opcode has been defined.

No functional change expected as each opcode is unsupported on
the respective hardware.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: James Harvey <lothmordor@gmail.com>
2016-08-29 11:03:20 -07:00
Rhys Kidd
d4cb3ee95c r600g: Avoid duplicated initialization of TGSI_OPCODE_DFMA
As reported by Clang, TGSI_OPCODE_DFMA (defined magic number 118) is
currently initialized twice for Cayman and Evergreen.

When Jan Vesely added double precision FMA opcode it did make sense
to locate it immediately after TGSI_OPCODE_DMAD, although this is
out of order.

This change cleans up the prior magic number definition and ensures
any later reordering of this struct will not create problems.

Prior change was:

  commit 015e2e0fce
  Author: Jan Vesely <jan.vesely@rutgers.edu>
  Date:   Sat Jul 2 16:14:54 2016 -0400

      r600g: Add double precision FMA ops

      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
      Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")

      Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
      Tested-by: James Harvey <lothmordor@gmail.com>
      Signed-off-by: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: James Harvey <lothmordor@gmail.com>
2016-08-29 11:03:20 -07:00
Rhys Kidd
8ba1fd339c i915g: Fix typo in i915_translate_instruction()
Noticed this error in a debug message whilst reviewing
https://bugs.freedesktop.org/show_bug.cgi?id=97477

This patch doesn't go towards fixing that bug, but at
least may clarify future debug output.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-29 11:03:20 -07:00
Eric Anholt
60bed14d0f vc4: Handle discards while in control flow.
I missed this while adding loop support because the discard test inside a
loop was crashing before, anyway.  Fixes piglit glsl-fs-discard-04.
2016-08-29 11:03:11 -07:00
Eric Anholt
b9a74fbec7 vc4: Mark when we add discards while lowering blend state. 2016-08-29 10:57:04 -07:00
Eric Anholt
a99d70d105 nir: Update shader info when adding discards
vc4 is about to start using the shader info field to set up discard
handling.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-29 10:56:59 -07:00
Tim Rowley
fa8f87132a swr: [rasterier core] fix GetRasterizerFunc selection
Only rasterize scissor edges if one or more scissor/viewport
rects are not hottile aligned.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:36 -05:00
Tim Rowley
8e41a65fc5 swr: [rasterizer core] whitespace cleanup
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:30 -05:00
Tim Rowley
cc7f655177 swr: [rasterizer jitter] reimplement SCATTERPS
Implement SCATTERPS as a dynamic loop based on mask set bits
instead of a static compile time loop.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:23 -05:00
Tim Rowley
c7e21183a1 swr: [rasterizer core] upper left rule for scissors
Fixes upper left rule for scissors and viewport/scissor macrotile alignment.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:15 -05:00
Tim Rowley
e54df2c7e4 swr: [rasterizer scripts] undef DEFINE_KNOB after usage
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:10 -05:00
Tim Rowley
a4efbd14d3 swr: [rasterizer core] minor cleanup to thread initialization
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:42:04 -05:00
Tim Rowley
7472a8ee75 swr: [rasterizer core] remove KNOB_MAX_THREADS
Use dynamic memory allocation for per-thread data

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:41:58 -05:00
Tim Rowley
9e4a482d46 swr: [rasterizer core] track guardbands per viewport rect
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:41:51 -05:00
Tim Rowley
b473bec878 swr: [rasterizer core] per-primitive viewports/scissors
- use per-primitive viewports throughout the pipeline.
- track whether all available scissor rects are tile aligned.
  Causes failures, so not taken into account when choosing rasterizer yet.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-29 12:41:16 -05:00
Tom Stellard
63ed11cde9 radeonsi: Don't use global variables for tess lds
We were allocating global variables for the maximum LDS size
which made the compiler think we were using all of LDS, which
isn't the case.

Reviewed-By: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-29 16:36:46 +00:00
Roland Scheidegger
f48ccb8c07 softpipe: (trivial) honor render_condition_enabled for clear_rt/clear_ds 2016-08-29 18:15:08 +02:00
Roland Scheidegger
c5d7624e1d llvmpipe: (trivial) honor render_condition_enabled for clear_rt/clear_ds 2016-08-29 18:14:49 +02:00
Kai Wasserbäch
4c53267b8f gallium: Use enum pipe_shader_type in set_shader_images()
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 09:07:37 -06:00
Kai Wasserbäch
15fe288dea gallium: Use enum pipe_shader_type in set_shader_buffers()
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 09:07:33 -06:00
Kai Wasserbäch
532db3b788 gallium: Use enum pipe_shader_type in set_sampler_views()
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 09:07:25 -06:00
Kai Wasserbäch
7413625ad3 gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)
v1 → v2:
 - Fixed indentation (noted by Brian Paul)
 - Removed second assert from nouveau's switch statements (suggested by
   Brian Paul)

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 08:45:48 -06:00
Marek Olšák
ed24d79ed7 gallium/radeon: clear dirty_level_mask when discarding CMASK
This fixes: GL45-CTS.texture_barrier.*

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-08-29 14:23:58 +02:00
Marek Olšák
d301efb400 tgsi/scan: remember sampler view types
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 14:16:57 +02:00
Nayan Deshmukh
5f0ea3db16 st/vdpau: use temporary buffers while applying filters
Use temporary buffers so that we don't read and write to the
same surface at the same time. We don't need to use linear
layout now.

v2: rebase the patch against reverted change

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-29 11:23:56 +02:00
Christian König
77e4424106 st/vdpau: Revert "change the order in which filters are applied(v3)"
This reverts commit 09dff7ae2e.

Turned out this can cause some artifacts in the output. Let's revert
it for now until we have sorted out all issues.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
2016-08-29 11:23:51 +02:00
Iago Toral Quiroga
9c9f45b824 i965/vec4: remove the generator hack for dual instanced GS
This hack was introduced in commit 03ac2c7223:
i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs

Specifically to fixup the code we emitted to deal with gl_PointSize inputs
in dual instance mode, where we were emitting a MOV to copy the point
size from .w (where the hardware delivers it) to .x (because code will
expect this to be a float). This meant that we were emitting a MOV
to an ATTR destination that could have a width of 4 (in dual instanced
mode) so it was necessary to fix the execution size and regioning of the
instruction.

Fortunately, Ken fixed this in 67c5d00273:
i965/vec4/gs: Stop munging the ATTR containing gl_PointSize.

by using a WWWW swizzle instead of a MOV, and as the commit log in that
patch states, we no longer emit instructions with ATTR destinations, so
that makes the fixup code in the generator unnecessary.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-29 08:09:09 +02:00
Timothy Arceri
22cec6dc5e glsl: initialise pointer to NULL
Fixes uninitialised warning and covery defect.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-29 13:13:42 +10:00
Ilia Mirkin
6a5504de2f Update Khronos-supplied headers to r33100
As retrieved from opengl.org and khronos.org. Maintained the APPLE hack
in GL/glext.h manually. Added gl32.h.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-08-28 21:41:47 -04:00
Ilia Mirkin
d49a231c33 mesa: add EXT_texture_cube_map_array support
This is identical to OES_texture_cube_map_array support. dEQP has tests
which use this extension. Also it is part of AEP.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-28 21:38:55 -04:00
Ilia Mirkin
4ec1c2bb7f mesa: remove OES_shader_io_blocks enable
This extension should just be available whenever ES 3.1 is available.
With the new extension verification infrastructure, it will only be
enable-able on a #version 310 es shader, rendering the original reason
for having a separate enable moot.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-28 21:38:55 -04:00
Ilia Mirkin
89e95d15f9 main: use KHR_blend_equation_advanced enable for ES 3.2 availability
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-08-28 21:38:55 -04:00
Ilia Mirkin
05b37e20de main: add missing EXTRA_END in OES_sample_variables get check
Fixes: 3002296cb6 (mesa: add GL_OES_shader_multisample_interpolation support)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-08-28 21:38:55 -04:00
Jose Fonseca
09dafb9630 scons: Take indirect gl_and_es_API.xml dependencies in consideration.
Same as 26a8f76ba1.

Trivial.
2016-08-27 22:59:06 +01:00
Ilia Mirkin
5b18e5fd7b docs: sort extensions in relnotes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-27 17:51:44 -04:00
Jason Ekstrand
fb89551047 isl: Allow multisampled array textures
This probably isn't the only thing that needs to be done to get
multisampled array textures working in Vulkan but I think this is all that
ISL really needs and it does fix 8 of the new CTS tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-26 19:00:02 -07:00
Ian Romanick
cf7be70aa7 mesa/version: OpenGL ES 3.2 depends on OES_texture_cube_map_array
This has a separate enable from ARB_texture_cube_map_array.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
b387bc90c8 i965: Enable OES_texture_cube_map_array on Gen8+
These are the only platforms that current expose OES_geometry_shader.
Once OpenGL ES 3.1 and OES_geometry_shader are enabled on Gen7, this
extension can be enabled there as well.

Gen6 will never get OpenGL ES 3.1, so it will never get this
extension... even though it has the desktop OpenGL extension.  Alas.

NOTE: This causes a failure on Gen8+ platforms in
ES3-CTS.gtf.GL3Tests.texture_storage.texture_storage_texture_targets.
The test only fails because it doesn't know that 0x9009 is a valid
value when the extension exists.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
dc4f53b683 mesa: Add support for OES_texture_cube_map_array
This has a separate enable flag because this extension also requires
OES_geometry_shader.  It is possible that some drivers may support
OpenGL ES 3.1 and ARB_texture_cube_map but not support
OES_geometry_shader.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
87fa462ffd mesa: Add and use _mesa_has_texture_cube_map_array helper
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
66b988d09a mesa: Use _mesa_has_ARB_texture_cube_map_array instead of open-coding it
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
daf1a61e11 mesa: Cosmetic changes in legal_texobj_target
Use bool instead of GLboolean and constify ctx.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
d79c950eeb mesa: Rearrange legal_texobj_target to look more like _mesa_legal_get_tex_level_parameter_target
This makes it a bit easier to add support for more features in different
APIs.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
ef5bad09c4 glsl: Add and use has_texture_cube_map_array helper
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
c879dbc4e4 glsl: Mark cube map array sampler types as reserved in GLSL ES 3.10
All the GLSL 4.x keywords were added to the list of reserved keywords
in GLSL ES 3.10.  As far as I can tell, these are the only ones that
were missed.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
8fb4af7789 glsl: Silence unused parameter warning
glsl/lower_buffer_access.cpp:324:55: warning: unused parameter ‘var’ [-Wunused-parameter]
                                          ir_variable *var,
                                                       ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
63af53dcd3 i965: Enable GL_OES_geometry_shader on Gen8+
Gen7 can get this extension (and GL_OES_shader_io_blocks) as soon as the
rest of OpenGL ES 3.1 is enabled.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
259fc50545 glsl/linker: Fail linking on ES if uniform precision qualifiers don't match
When GL_OES_geometry_shader is enabled, this fixes
dEQP-GLES31.functional.shaders.linkage.geometry.uniform.rules.type_mismatch_1.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:15 -07:00
Ian Romanick
06201e4f1a glsl: Allow invocations layout qualifier with GL_OES_geometry_shader
Fixes

dEQP-GLES31.functional.geometry_shading.instanced.geometry_1_invocations
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_2d_array
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_3d
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_cubemap
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_2d_array
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_3d
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_cubemap
dEQP-GLES31.functional.geometry_shading.query.geometry_shader_invocations

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:14 -07:00
Ian Romanick
3a0ae7b55c glsl: Allow gl_InvocationID and gl_Layer with GL_OES_geometry_shader
Fixes

dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_array
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_3d
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_cubemap

v2: Don't enable gl_ViewportIndex in GLSL ES 3.20.  Noticed by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:14 -07:00
Ian Romanick
1a72fbf9e6 mesa: Allow GL_EXT_geometry_shader and GL_EXT_geometry_point_size
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:14 -07:00
Ian Romanick
658e90f9a8 mesa: Document reasons for allowing XFB drawing modes in GLES 3.1 w/GL_OES_geometry_shader
Originally this patch added the checks to allow the draw calls with XFB,
but commit 2dabd497 beat me to it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:14 -07:00
Ian Romanick
aa228eb1a6 mesa: Remove redundant _mesa_has_shader_subroutine
The checks in _mesa_has_shader_subroutine are slightly different than
_mesa_has_ARB_shader_subroutine, but they're not different in a way
that matters.  The only way to have ctx->Version >= 40 is if
ctx->Extensions.ARB_shader_subroutine is set.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-26 15:03:14 -07:00
Ian Romanick
0115f356ee nouveau: Enable EXT_texture_env_dot3 on NV10 and NV20
GL_DOT3_RGB_EXT and GL_DOT3_RGBA_EXT. are nearly identical to
GL_DOT3_RGB and GL_DOT3_RGBA.  The only difference is the _EXT
versions do not apply the post-scale.  Just smash logscale to 0 so
that RC_OUT_SCALE_1 is always used.

NOTE: I have not actually tested this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-26 15:03:14 -07:00
Ian Romanick
a7d92c3c0b nouveau: Fix non-1x post-scale factor with DOT3 combiner
Fixes long standing bug on NV10 and NV20 where using a non-1x RGB or A
post-scale with GL_DOT3_RGB or GL_DOT3_RGBA texture environment would
not work.

The old combiner math uses HALF_BIAS_NORMAL and HALF_BIAS_NEGATE.  The
GL_NV_register_combiners defines these as

    HALF_BIAS_NORMAL_NV       max(0.0, e) - 0.5
    HALF_BIAS_NEGATE_NV       -max(0.0, e) + 0.5

In order to get the correct result from the dot-product, the
intermediate dot-product must be multiplied by 4.  This is a literal
implementation of the GL_ARB_texture_env_dot3 spec.  It also requires
using the register combiner post-scale.  As a result, the post-scale
cannot be used for the post-scale set by the application.

The new combiner math uses EXPAND_NORMAL and EXPAND_NEGATE.  The
GL_NV_register_combiners defines these as

    EXPAND_NORMAL_NV          2.0 * max(0.0, e) - 1.0
    EXPAND_NEGATE_NV          -2.0 * max(0.0, e) + 1.0

Since this fully expands the value to [-1, 1] range, the intermediate
dot-product result is the desired value.  This leaves the register
combiner post-scale available for application use.

NOTE: I have not actually tested this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-26 15:03:14 -07:00
Ian Romanick
f926cf5bd0 docs: Rename GL3.txt to features.txt
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-08-26 15:03:14 -07:00
Ian Romanick
8cd5c3cfe7 docs: Update GL3.txt for OpenGL 4.x on i965-ish hardware
v2: Note that GL_KHR_blend_equation_advanced and
GL_KHR_blend_equation_advanced_coherent are done.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-26 15:03:14 -07:00
Nicholas Bishop
d0a4c36dd6 docs: add links to clarify patch mailing section
* Changed "Mesa mailing list" to "mesa-dev mailing list" to clarify
  which list patches should be sent to

* Added an explicit link to
  https://lists.freedesktop.org/mailman/listinfo/mesa-dev to show
  where to subscribe to the list

* Added a link to https://git-scm.com/docs/git-send-email to help new
  users of that command

v2: add signed-off-by

Signed-off-by: Nicholas Bishop <nicholasbishop@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-26 14:54:26 -07:00
Brian Paul
ea33df7b58 svga: minor whitespace, etc clean-ups in svga_pipe_misc.c
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
8433b43337 svga: move some code in svga_propagate_surface()
Move computation of zslice, layer inside the conditional where they're
used.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
1a10b37ac3 svga: simplify surface propagation code in svga_set_framebuffer_state()
Rewrite the comment too.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
bb7f094b37 svga: add some comments in the svga_surface struct
Give more info about backing resources/surfaces.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
dcf63339e7 svga: use new svga_check_sampler_framebuffer_resource_collision()
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
ff500ed5a1 svga: add new svga_check_sampler_framebuffer_resource_collision()
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
d3d20d650d svga: remove assertions in svga_surface cast wrappers
We don't do this for other cast wrappers.  And this will simplify some
code at call sites.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
c6e89fa215 svga: minor code simplification in svga_texture_transfer_unmap()
Use the tex variable instead of using svga_texture() again.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
fe5a2704ec svga: reformat some expressions in svga_texture_transfer_map()
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
10ef6ddcf9 svga: remove duplicated variable in svga_texture_transfer_map()
tex was already declared at the function body scope.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:19 -06:00
Brian Paul
09d2780b39 svga: move some assignments in svga_texture_transfer_map()
Put near other assignments to the svga_transfer variable.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:18 -06:00
Brian Paul
4a52512666 svga: minor simplifications in svga_texture_transfer_map()
Use local vars instead of jumping through a pointer.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:18 -06:00
Brian Paul
088dd8f45e svga: minor reformatting of svga_texture() cast wrapper
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:18 -06:00
Brian Paul
e206f67261 svga: rewrite svga_buffer() cast wrapper
To make it symmetric with the svga_texture() cast wrapper.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:18 -06:00
Brian Paul
c72dcd9a71 svga: remove local variable in create_backed_surface_view()
To simplify the code a bit.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-26 14:20:18 -06:00
Kenneth Graunke
bc13e5f42a docs: Add GL_KHR_blend_equation_advanced to relnotes. 2016-08-26 13:17:22 -07:00
Mario Kleiner
2cc880cba5 r600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+
This is a direct port of Marek Olšáks patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.

It uses SDMA for the detiling blit from renderoffload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.

Testing on a dual Radeon HD-5770 setup reduced the time
for the render offload gpu to get its rendering into
system RAM from approximately 16 msecs for simple rendering
at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup!

This was measured using ftrace to trace the time the radeon kms
driver waited on the dmabuf fence of the renderoffload gpu to
complete.

All in all this brought the time for a flip down from 20 msecs
to 9 msecs, so the prime setup can display at full 60 fps instead
of barely 30 fps vsync'ed.

The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-08-26 19:57:21 +02:00
Jordan Justen
7970238fcf docs: Update stencil texturing & ES 3.1 status for i965 Haswell
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
93f5eb7ae7 i965: Enable OpenGLES 3.1 for Haswell
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
116b6e12d4 i965: Enable ARB_texture_stencil8 for Haswell
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
f20f616324 i965: Enable ARB_stencil_texturing for Haswell
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
751682434e i965/gen7: Use R8_UINT stencil copy when sampling the stencil texture
v2:
 * Check gen <= 7, rather than gen == 7. (Ian)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
8d78b096f8 i965/gen7: Copy stencil when sampling the stencil texture
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
7af51b8f03 i965: Add function to copy a stencil miptree to an R8_UINT miptree
v2:
 * Cleanups suggested by Ian, Matt and Topi

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
c8194dc737 i965: Track that the stencil data was updated when using Tex*Image
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
101b56bab2 i965: Track that the stencil data was updated when rendering
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
7bd87c1e6e i965: Track that the stencil data was updated when clearing
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
2a9c65a01d i965/gen7: Add R8_UINT stencil miptree copy for sampling
For gen < 8, we can't sample from the stencil buffer, which is
required for the ARB_stencil_texturing extension. We'll make a copy of
the stencil data into a new texture that we can sample using the
R8_UINT surface type.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
91627d1956 i965: Fix assert with multisampling and cubemaps
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
b82bb98441 i965/hsw: Adjust uploading default color for stencil surfaces
v2:
 * has_component (Ken); const bits_per_channel (Topi)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
30fee52036 i965/hsw: Don't advertise more than 64 threads for compute shaders
thread_width_max in the GPGPU walker command limits us to a maximum of
64 threads.

This fixes a crash on Haswell in the OpenGLES 3.1 conformance test
suite which tests the advertised limits of the max invocation counts.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
861c9cbee3 main: Add MESA_VERBOSE=api support for glClearStencil
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Jordan Justen
9a1f950bef main: Add MESA_VERBOSE=api support for glTexImage
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 10:09:22 -07:00
Charmaine Lee
0035f7f136 svga: add guest statistic gathering interface
This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 08:04:02 -06:00
Marek Olšák
49c798e902 radeonsi: disable CE on SI + AMDGPU
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
281f1a5980 winsys/amdgpu: disable IB chaining on SI
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
a6869e7c06 winsys/amdgpu: finish up SI addrlib integration
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-26 15:50:10 +02:00
Ronie Salgado
97b55243fb winsys/amdgpu: initial SI support
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-26 15:50:10 +02:00
Marek Olšák
971ef7518f gallium/radeon: add a driver query for AMDGPU_INFO_NUM_EVICTIONS
If the kernel driver doesn't support it, it returns 0.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
7172906c0c radeonsi: fix printing shaders and states on a VM fault
This was missed while rewriting the PIPE_DUMP flags.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
5ee3cac138 radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.

$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...

$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...

Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS

With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS

The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.

SI will get this once we add SDMA blit support for it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
0241d8300f radeonsi: enable SDMA on CIK
It passes R600_DEBUG=testdma on Bonaire/radeon.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
bcfd49e511 gallium/radeon: increase priority for shader binaries
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Marek Olšák
c3f716fe67 gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags
there's no reason to separate these

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-26 15:50:10 +02:00
Miklós Máté
b9ac72b511 vbo: set draw_id
Fixes conditional jump depending on uninitialized value
in si_state_draw.c:593

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 07:34:22 -06:00
Neha Bhende
10f6e08549 svga: fix regression related to srgb
This regression is caused because of commit 3190c7ee97
Regression caused by following OpenGL 4.4 spec rules relates to
GL_FRAMEBUFFER_SRGB in Mesa.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:52 -06:00
Neha Bhende
3b7341d547 svga: use local variable blit instead of pointer
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:52 -06:00
Brian Paul
b09e4ab13c svga: s/INDEX_0D/INDEX_IMMEDIATE32/
Both are zero, but the later is the right token.
2016-08-26 06:19:52 -06:00
Brian Paul
93779b87a1 svga: add comment about unsupported blend modes 2016-08-26 06:19:52 -06:00
Charmaine Lee
b1772651b7 svga: fix ordering of mksstats counter strings
String for SVGA_STATS_COUNT_TEXREADBACK was swapped
with the string for SVGA_STATS_COUNT_SURFACEWRITEFLUSH.

Trivial fix.
2016-08-26 06:19:52 -06:00
Charmaine Lee
2781d60375 svga: avoid emitting redundant SetShaderResource command
Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:52 -06:00
Charmaine Lee
5313b294e6 svga: add a cleanup function to clean up sampler state
This patch adds a cleanup function to clean up sampler state at
context destruction time.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:52 -06:00
Brian Paul
e292f38c6c svga: loosen the condition to flush in get_query_result_vgpu10()
Fixes piglit spec/ext_transform_feedback/overflow-edge-cases segfaults
because the query's fence pointer was null.

Tested with Piglit, Sauerbraten, ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:52 -06:00
Brian Paul
99d8fe20ab svga: fix vgpu10 query fencing
We don't want to flush the command buffer or sync on the fence when ending
a query (that kind of defeats the whole purpose of async queries).  Do that
instead in get_query_result().

Tested with Piglit, arbocclude, Sauerbraten game, Nobel Clinician Viewer,
ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:52 -06:00
Charmaine Lee
3f51a3f6ac svga: avoid emitting redundant DXSetSamplers command
This patch avoid emitting redundant DXSetSamplers command.

Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:52 -06:00
Neha Bhende
6a43148e20 svga: enable ARB_clear_texture extension in the driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:52 -06:00
Neha Bhende
2111795d51 svga: define svga_clear() in svga_init_clear_functions()
Put all the clearing related functions in svga_init_clear_functions()

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:51 -06:00
Neha Bhende
40557ae07c svga: add svga_init_clear_functions()
define svga_init_clear_functions()
and svga_clear_texture as svga->pipe.clear_texture. This is part of
ARB_clear_texture extension

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:51 -06:00
Neha Bhende
52d88b67be svga: add new function svga_clear_texture()
To clear texture this function can be used. This is part of
ARB_clear_texture extension. Basically this extension allows you to
clear texture with given color values.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:51 -06:00
Neha Bhende
1da538f85b svga: add new begin_blit()
Saving all blitter states will be done in begin_blit() so that
begin_blit() can be used before performing any blit operation.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-26 06:19:51 -06:00
Charmaine Lee
a5fd54f8bf svga: add opt to the list of valid build types
For opt build, add VMX86_STATS to the list of cpp defines.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:51 -06:00
Charmaine Lee
2e1cfcc431 svga: add guest statistic gathering interface
With this patch, guest statistic gathering interface is added to
svga winsys interface that can be used to gather svga driver
statistic. The winsys module can then share the statistic info with
the VMX host via the mksstats interface.

The statistic enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:51 -06:00
Charmaine Lee
4791991808 svga: fix indirect non-indexable temp access
If the shader has indirect access to non-indexable temporaries,
convert these non-indexable temporaries to indexable temporary array.
This works around a bug in the GLSL->TGSI translator.

Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-26 06:19:51 -06:00
Brian Paul
d221a6545c gallium/hud: move signo declaration inside PIPE_OS_UNIX block
To silence unused var warning with MSVC, MinGW.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-26 06:19:51 -06:00
Chris Wilson
f92a87a140 i965: Embrace "unlimited" GTT mmap support
From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the
feature so query it and use it to avoid limiting tiled allocations to
only fit within the mappable aperture.

A couple of caveats:

 - fence support is still limited by stride to 262144 and the stride
needs to be a multiple of tile_width (as before, and same limitation as
the current 3D pipeline in hardware)

 - the max_gtt_map_object_size forcing untiled may be hiding a few bugs
in handling of large objects, though none were spotted in piglits.

See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").

v2: Include some commentary on mmap virtual space vs CPU addressable
space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-26 09:09:34 +01:00
Tobias Klausmann
bc5be5323f mesa/main: Fix missing return in non void function
This was found by obs:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function main/program_resource.c:109

v2: Remove the ! on the string (Ian Romanick)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-26 08:46:03 +02:00
Kenneth Graunke
219a451497 i965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.
We always use a coherent read, and ignore the "opt out" enable flag.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
1bf9b2a600 mesa: Implement GL_KHR_blend_equation_advanced_coherent.
This adds the extension enable (so drivers can advertise it) and the
extra boolean state flag, GL_BLEND_ADVANCED_COHERENT_KHR, which can
be set to request coherent blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
c2b10cabed i965: Enable GL_KHR_blend_equation_advanced on G45 and later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
40241d40d0 i965: Disable hardware blending if advanced blending is in use.
We'll do blending in the shader in this case, so just disable the
hardware blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
8ab50f5dd1 glsl: Add a lowering pass to handle advanced blending modes.
Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader.  This lowering pass implements
all the necessary math for advanced blending.  It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.

This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.

Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()

Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.

Very little driver specific code should be required.

v2: Handle multiple output variables per render target (which may exist
    due to ARB_enhanced_layouts), and array variables (even with one
    render target, we might have out vec4 color[1]), and non-vec4
    variables (it's easier than finding spec text to justify not
    handling it).  Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
    can add our blending epilogue (caught by Francisco Jerez).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
e299661166 compiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.
This will be used for emulating GL_KHR_advanced_blend_equation features
in shader code.  We'll pass in the blending mode that's in use, and use
that in (effectively) a switch statement in the shader.

v2: Use the new _AdvancedBlendMode field.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
acf57fcf7f mesa: Add draw time validation for advanced blending modes.
v2: Add null checks (requested by Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:10 -07:00
Kenneth Graunke
75ae338d14 mesa: Restyle _mesa_check_blend_func_error().
I'm about to add more error conditions to this function, so I wanted to
move the current spec citation above the code that checks it.  Indenting
it required reformatting, so I tried to move it to our newer style.

While there, I also decided to drop some GL type usage, and drop the
unnecessary "_mesa_" prefix on a static function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Kenneth Graunke
0745e039a2 mesa: Track the current advanced blending mode.
This will be useful for a number of things:
- Checking the current advanced blending mode against the shader's
  blend_support_* qualifiers.
- Disabling hardware blending when emulating advanced blending.
- Uploading the current advanced blending mode as a state var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Kenneth Graunke
74837e3e91 mesa: Allow advanced blending enums in glBlendEquation[i].
Don't allow them in glBlendEquationSeparate[i], though, as required
by the spec.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Kenneth Graunke
80df3c030e glsl: Merge blend_support qualifiers when linking.
Since each qualifier represents a blending mode the shader can be used
with, we take the union of all possible modes when linking.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Ilia Mirkin
4b6819b407 glsl: process blend_support_* qualifiers
v2 (Ken): Add a BLEND_NONE enum value (no qualifiers in use).
v3 (Ken): Rename gl_blend_support_qualifier to gl_advanced_blend_mode.
v4 (Ken): Mark map[] as static const (Ilia).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Ilia Mirkin
e682f94594 glsl: add basic KHR_blend_equation_advanced infrastructure
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Ilia Mirkin
3b0406457a mesa: add KHR_blend_equation_advanced enable and extension string
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Ilia Mirkin
a8ae1bc767 glapi: add KHR_blend_equation_advanced dispatch
v2 (Ken): Fix enum values, drop _mesa_BlendBarrierKHR stub as Curro has
          already implemented it.
v3 (Ken): Rework for _mesa_BlendBarrierKHR -> _mesa_BlendBarrier rename.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Kenneth Graunke
1a1f4496c6 mesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.
Note that _mesa_BlendBarrierMESA is not currently hooked up in the
glapi XML, so we can just rename it.  We'll hook it up for the
KHR_blend_equation_advanced extension shortly.

We may as well use the ES 3.2 core name with no suffixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-25 19:22:09 -07:00
Kenneth Graunke
c2fd6b0f5d i965: Safely iterate the predecessors of the end block.
We want to insert code in each of the predecessors of the end block.
This code includes a nir_if, which would split the block, altering
the set.  To avoid that, I emitted a dead constant at the end of each
block before splitting it, so that the set of predecessors remained
unchanged.  This was admittedly ugly.

Connor suggested instead saving a copy of the set, so we can iterate
it safely.  This is also a little ugly, but a much better plan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 19:18:24 -07:00
Kenneth Graunke
3203fe3d50 nir: Use nir_shader_get_entrypoint in TCS quad workaround code.
We want to insert the code at the end of the program.  Looping over
all the functions (of which there was only one) was the old way of doing
this, but now we have nir_shader_get_entrypoint(), so let's use it.

Suggested by Connor Abbott.

v2: Update for nir_shader_get_entrypoint API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 19:18:24 -07:00
Kenneth Graunke
93bfa1d7a2 nir: Change nir_shader_get_entrypoint to return an impl.
Jason suggested adding an assert(function->impl) here.  All callers
of this function actually want ->impl, so I decided just to change
the API.

We also change the nir_lower_io_to_temporaries API here.  All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally.  Folding this change in
avoids the need to change it and change it back.

v2: Fix one call I missed in ir3_compiler (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 19:18:24 -07:00
Kenneth Graunke
8479b03c58 nir: Make nir_lower_io_to_temporaries store an impl internally.
This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function.  The next patch will change
the API.

v2: Rebase after framebuffer fetch landed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 19:18:11 -07:00
Francisco Jerez
da85b5a9f1 i965: Expose shader framebuffer fetch extensions on Gen9+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:09 -07:00
Francisco Jerez
4135fc22ff i965/fs: Hook up coherent framebuffer reads to the NIR front-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:09 -07:00
Francisco Jerez
be12a1f36e i965/fs: Remove special casing of framebuffer writes in scheduler code.
The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that its side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping.  We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:09 -07:00
Francisco Jerez
3daa0fae4b i965/fs: Don't CSE render target messages with different target index.
We weren't checking the fs_inst::target field when comparing whether
two instructions are equal.  For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
db123df747 i965/fs: Define logical framebuffer read opcode and lower it to physical reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
f2f75b0cf0 i965/fs: Define framebuffer read virtual opcode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
71d639f69e i965/disasm: Fix RC message type strings on Gen7+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
26ac16fe2f i965/eu: Add codegen support for the Gen9+ render target read message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
29eb8059fd i965/eu: Take into account the target cache argument in brw_set_dp_read_message.
brw_set_dp_read_message() was setting the data cache as send message
SFID on Gen7+ hardware, ignoring the target cache specified by the
caller.  Some of the callers were passing a bogus target cache value
as argument relying on brw_set_dp_read_message not to take it into
account.  Fix them too.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
8a2f19a777 i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.
This is not enabled on the original Gen4 part because it lacks surface
state tile offsets so it may not be possible to sample from arbitrary
non-zero layers of the framebuffer depending on the miptree layout (it
should be possible to work around this by allocating a scratch surface
and doing the same hack currently used for render targets, but meh...).

On Gen9+ even though it should mostly work (feel free to force-enable
it in order to compare the coherent and non-coherent paths in terms of
performance), there are some corner cases like 1D array layered
framebuffers that cannot be handled easily by the non-coherent path
because of the incompatible layout in memory of 1D and 2D miptrees (it
should be possible to work around this too by doing state-dependent
recompiles, but it's hard to care enough since Gen9 has native support
for coherent render target reads...)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
ecc4800383 i965: Implement glBlendBarrier.
This is a no-op if the platform supports coherent framebuffer fetch,
-- If it doesn't we just need to flush the render cache and invalidate
the texture cache in order for previous rendering to be visible to
framebuffer fetch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
786108e7b2 i965: Upload surface state for non-coherent framebuffer fetch.
This iterates over the list of attached render buffers and binds
appropriate surface state structures to the binding table block
allocated for shader framebuffer read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
dc96968dbf i965: Implement support for overriding the texture target in brw_emit_surface_state.
This allows the caller to bind a miptree using a texture target other
than the one it it was created with.  The code should work even if the
memory layouts of the specified and original targets don't match, as
long as the caller only intends to access a single slice of the
miptree structure.

This will be exploited by the next commit in order to support
non-coherent framebuffer fetch of a single layer of a 3D texture
(since some generations lack the minimum array element control for 3D
textures bound to the sampler unit), and multiple layers of a 1D array
texture (since binding it as an actual 1D array texture would require
state-dependent recompiles because the same shader couldn't
simultaneously work for 1D and 2D array textures due to the different
texel fetch coordinate ordering).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
49ea2bd175 i965: Massage argument list of brw_emit_surface_state().
This commit does three different things in a single pass in order to
keep the amount of churn low: Remove the for_gather boolean argument
which was unused, pass the isl_view argument by value rather than by
reference since I'll have to modify it from within the function, and
add a target argument to allow callers to bind textures using a target
other than the original.  The prototype of the function now looks
like:

 void brw_emit_surface_state(struct brw_context *brw,
                             struct intel_mipmap_tree *mt,
                             GLenum target, struct isl_view view,
                             uint32_t mocs, uint32_t *surf_offset, int surf_index,
                             unsigned read_domains, unsigned write_domains);

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
74e4baec59 i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.
This surface state control has been supported by all hardware
generations since G45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
0fe732e66f i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
5759eb458b i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.
The logic to calculate the right layout and dimensionality for a given
GL texture target is going to be useful elsewhere, factor it out from
intel_miptree_get_isl_surf().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
99fb167839 i965: Resolve color for non-coherent FB fetch at UpdateState time.
This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data.  Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
071665c161 i965: Return whether the miptree was resolved from intel_miptree_resolve_color().
This will allow optimizing out the cache flush in some cases when
resolving wasn't necessary.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
f24e393bd5 i965/fs: Translate nir_intrinsic_load_output on a fragment output.
This gets the non-coherent framebuffer fetch path hooked up to the NIR
front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
b00a236d6a i965/fs: Allocate fragment output temporaries on demand.
This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called.  This makes nir_setup_outputs()
a no-op for the fragment shader stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
7dac882073 i965/fs: Rework representation of fragment output locations in NIR.
The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location).  nir_assign_var_locations is no
longer useful for fragment outputs.

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by the
GLSL so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
4e990b67ce i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
Most likely we had only ever used this macro on bitfields of less than
31 bits -- That's going to change shortly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
f3cb2c34f2 i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore.  This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
af0cc743e6 i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
v2: Memoize sample ID, misc codestyle changes. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
fe6abb5755 i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
98d61ee083 i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
08705badfe i965: Allocate space in the binding table for non-coherent FB fetch.
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.

The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
40b23ad57e i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.
Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use.  The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:

 MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch

(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and lead to
assertion failures).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
4a87e4ade7 i965/fs: Get rid of fs_visitor::do_dual_src.
This boolean flag was being used for two different things:

 - To set the brw_wm_prog_data::dual_src_blend flag.  Instead we can
   just set it based on whether the dual_src_output register is valid,
   which will be the case if the shader writes the secondary blending
   color.

 - To decide whether to call emit_single_fb_write() once, or in a loop
   that would iterate only once, which seems pretty useless.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:00 -07:00
Francisco Jerez
aee3d8f0d9 nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.
This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary.  The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:33:29 -07:00
Francisco Jerez
97ac3eba58 nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics.  The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:33:29 -07:00
Eric Anholt
00c72acba5 vc4: Add support for fddx/fddy
Based vaguely on a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt
e763e19808 vc4: Add register allocation support for MUL output rotation.
We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with anything other class
requirements on that temp.
2016-08-25 17:24:11 -07:00
Eric Anholt
8ce6526178 vc4: Add support for MUL output rotation.
Extracted from a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt
074f1f3c0c vc4: Add support for the 2-bit LOAD_IMM variants.
Extracted and fixed up from a patch by jonasarrow on github.  This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
2016-08-25 17:24:11 -07:00
Eric Anholt
3da4e38f48 vc4: Add QPU scheduling to handle MUL rotate sources.
We need MUL rotates to do ddx/ddy support.
2016-08-25 17:24:11 -07:00
Eric Anholt
b0b99a7952 vc4: Add disassembly for constant MUL rotates 2016-08-25 17:24:11 -07:00
Eric Anholt
b160708e03 vc4: Add real validation for MUL rotation.
Caught problems in the upcoming DDX/DDY implementation.
2016-08-25 17:24:11 -07:00
Eric Anholt
31da39ddc9 vc4: Add a QIR value for the QPU element register.
This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.
2016-08-25 17:24:11 -07:00
Chad Versace
5b03975889 i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()
Respect intel_miptree_slice::x_offset,y_offset and
intel_mipmap_tree::offset. All three may be non-zero when glReadPixels
is called on an EGLImage created from the non-base slice of a miptree.

Patch 2/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0
2016-08-25 16:52:00 -07:00
Chad Versace
c82f99e883 i965: Fix miptree layout for EGLImage-based renderbuffers
When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage
created from the non-base slice of a miptree,
intel_image_target_renderbuffer_storage() forgot to apply the intra-tile
offsets __DRIimage::tile_x,tile_y to the miptree layout.

This patch fixes the problem with a quick hack suitable for
cherry-picking. A proper fix requires more thorough plumbing in
intel_miptree_create_layout() and brw_tex_layout().

Patch 1/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b
2016-08-25 16:52:00 -07:00
Jason Ekstrand
bebc1a1d99 intel: Flatten the makefile structure
This pulls isl and genxml into a single make file so that they can properly
build in parallel.  This isn't terribly important now as genxml just
generates sources which happens serially first anyway but it will be more
important as we add more stuff to src/intel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 15:29:48 -07:00
Jason Ekstrand
c19fc5e019 isl/tests: Use a longer path for isl.h
The tests assumed that isl would be in the include path but that usually
isn't the case.  Instead, we usually have src/intel and you need to add an
"isl/" prefix.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 15:29:47 -07:00
Jason Ekstrand
8bdf605214 intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces
If the surface has a layout of GEN4_2D then we need to compute a normal 2D
alignment and not use the magic linewar 1D alignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:11:15 -07:00
Jason Ekstrand
cda1a5dc0e intel/isl: Pass the dim_layout into choose_alignment_el
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:10:43 -07:00
Jason Ekstrand
f68cfb05fa intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL
The Sky Lake 1D layout is only used if the surface is linear.  For tiled
surfaces such as depth and stencil the old gen4 2D layout is used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:09:44 -07:00
Jason Ekstrand
78715c7211 nir/phi_builder: Don't recurse in value_get_block_def
In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows.  Instead, we replace the recursion
with a pair of loops, one at the start and one at the end.  This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 14:08:07 -07:00
Chad Versace
3eddf5219e .mailmap: Update my address again
I joined Google's Chrome OS graphics team.
2016-08-25 13:55:52 -07:00
Matt Turner
e53130cc27 nir: Walk blocks in source code order in lower_vars_to_ssa.
Prior to this commit rename_variables_block() is recursively called,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.

XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).

Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).

In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block(), walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def() which walks back to
the top of the dominance tree...

In any case, this patch ensures we avoid that problem as well.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 13:45:39 -07:00
Marek Olšák
a491b9e945 radeonsi: don't use allocas for arrays with LLVM 3.8
It crashes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
2016-08-25 21:19:17 +02:00
Marek Olšák
fe91ae06d3 gallium/radeon: unify and simplify checking for an empty gfx IB
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.

This fixes unflushed fences that can get stuck due to empty IBs.
2016-08-25 21:19:17 +02:00
Matt Turner
e6673e7ac2 mesa: Drop sed of now dead Plo files.
gen6/7/8_blorp.c were removed in commits c8bc1ae96a, e198983c61, and
16a9fcbbb6 respectively.
2016-08-25 11:20:54 -07:00
Kenneth Graunke
6cf8708ce5 meta: Always do GenerateMipmaps in linear colorspace.
When generating mipmaps for sRGB textures, force both decode and encode,
so the filtering is done in linear colorspace, regardless of settings.

Fixes a WebGL conformance test in Chrome:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/misc/tex-srgb-mipmap.html?webglVersion=2

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97322
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-25 11:07:01 -07:00
Eric Engestrom
ed871af91c configure.ac: raise Mako required version to 0.8.0
It seems [0] old versions of Mako are no longer supported. Emil mentioned it
might need v0.8.0 [1] for isl_format_layout [2], although I didn't get
a confirmation that it's really the minimum.
Let's raise it to that to avoid getting other bugs.
We might lower it a bit again later if it turns out we can.

[0] https://lists.freedesktop.org/archives/mesa-dev/2016-July/122772.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/122775.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123278.html

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dave Airlie <Airlied@redhat.com>
2016-08-25 16:51:27 +01:00
Brian Paul
2a2dc416b6 swrast: fix incorrectly positioned putImage() in swrast driver
Some front buffer rendering was in the wrong position.  This included
scissored clears, glDrawPixels and glCopyPixels.  The problem was the
y coordinate passed to putImage() didn't match the y coordinate passed
to getImage().

We fix this by setting xrb->map_y to the inverted coordinate in
swrast_map_renderbuffer() which is used later by the putImage() call.
Also pass xrb->map_y to getImage() to be symmetric.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97426
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-25 07:19:35 -06:00
Marek Olšák
3ff0b67e1b radeonsi: disable SDMA texture copying on Carrizo
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-08-25 14:51:08 +02:00
Marek Olšák
1276316d67 gallium/noop: use 3-space indentation
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-25 14:09:48 +02:00
Marek Olšák
9daaa6f5a6 gallium: add a pipe_context parameter to resource_get_handle
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-25 14:09:48 +02:00
Nicolai Hähnle
b662c70aea st/mesa: fix sRGB BlitFramebuffer regression
Broken since: 3190c7ee97

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97285

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-08-25 13:21:05 +02:00
Michel Dänzer
1e3218bc5b loader/dri3: Overhaul dri3_update_num_back
Always use 3 buffers when flipping. With only 2 buffers, we have to wait
for a flip to complete (which takes non-0 time even with asynchronous
flips) before we can start working on the next frame. We were previously
only using 2 buffers for flipping if the X server supports asynchronous
flips, even when we're not using asynchronous flips. This could result
in bad performance (the referenced bug report is an extreme case, where
the inter-frame stalls were preventing the GPU from reaching its maximum
clocks).

I couldn't measure any performance boost using 4 buffers with flipping.
Performance actually seemed to go down slightly, but that might have
been just noise.

Without flipping, a single back buffer is enough for swap interval 0,
but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we can
start working on the next frame. This condition was previously reversed.

Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-25 17:40:24 +09:00
Jason Ekstrand
2301705dee anv: Include the pipeline layout in the shader hash
The pipeline layout affects shader compilation because it is what
determines binding table locations as well as whether or not a particular
buffer has dynamic offsets.  Since this affects the generated shader, it
needs to be in the hash.  This fixes a bunch of CTS tests now that the CTS
is using a pipeline cache.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-24 20:42:05 -07:00
Jason Ekstrand
05f36435ef anv: Add a --disable-vulkan-icd-full-driver-path option
This option makes installed Vulkan ICD files contain only a driver library
name and not a path.  This is intended for distros to help them work around
multi-arch issues.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-08-25 10:32:31 +10:00
Francisco Jerez
c8f5bd2c99 i965/fs: Don't consider the stencil output to be a color output.
This would cause gl_FragStencilRef to be counted as a color output
incorrectly during the precompile phase, which leads to unnecessary
recompilation on master and could trigger an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
2018371692 glsl: Keep track of the set of fragment outputs read by a GL program.
This is the set of shader outputs whose initial value is provided to
the shader by some external means when the shader is executed, rather
than computed by the shader itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
711213fb72 glsl: Don't consider read-only fragment outputs to be written to.
Since they cannot be written.  This prevents adding fragment outputs
to the OutputsWritten set that are only read from via the
gl_LastFragData array but never written to.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
913ae618c6 glsl/linker: Allow fragment output overlap for gl_LastFragData.
gl_LastFragData overlaps gl_FragData by definition.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
6b3d23dcc0 glsl/ast: Allow redeclaration of gl_LastFragData with different precision qualifier.
v2: No need to check the GLSL version. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
5e1d34394e glsl: Don't attempt to do dead varying elimination on gl_LastFragData arrays.
Apparently this pass can only handle elimination of a single built-in
fragment output array, so the presence of gl_LastFragData (which it
wouldn't split correctly anyway) could prevent it from splitting the
actual gl_FragData array.  Just match gl_FragData by name since it's
the only built-in it can handle.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
6b33eab959 glsl: Define a gl_LastFragData built-in for older GLSL versions.
The EXT_shader_framebuffer_fetch extension defines alternative
language for GLES2 shaders where user-defined fragment outputs are not
allowed.  Instead of using inout user-defined fragment outputs the
shader is expected to read from the gl_LastFragData built-in array.
In addition this allows using the same language on desktop GLSL
versions prior to 4.2 that support the deprecated gl_FragData built-in
in preparation for the MESA_shader_framebuffer_fetch desktop GL
extension.

Both legacy and user-defined inout outputs have a common
representation at the GLSL IR level, so it shouldn't make any
difference for optimization passes and back-ends whether the
application is using gl_LastFragData or user-defined outputs, all
they'll see is a variable dereference of a fragment output at a
certain interface location with the fb_fetch_output bit set to one.

v2: Don't define the built-in variable on GLSL versions for which
    gl_FragData exists but is deprecated. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:31 -07:00
Francisco Jerez
19e929a177 glsl: Handle the inout qualifier in fragment shader output declarations.
According to the EXT_shader_framebuffer_fetch extension the inout
qualifier can be used on ESSL 3.0+ shaders to declare a special kind
of fragment output that gets implicitly initialized with the previous
framebuffer contents at the current fragment coordinates.  In addition
we allow using the same language to define FB fetch outputs in GLSL
1.3+ shaders in preparation for the desktop MESA_shader_framebuffer_fetch
extensions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
b49d8f20f4 glsl: Add support for representing framebuffer fetch in the GLSL IR.
The GLSL IR representation of framebuffer fetch amounts to a single
bit in the ir_variable object applicable to fragment shader outputs.
The flag indicates that the variable will be implicitly initialized to
the previous contents of the render buffer at the same fragment
coordinates and sample index.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
d7cd7b9c49 glsl: Add parser state enables for the framebuffer fetch extensions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
303fb5881c mesa: Add blend barrier entry point and driver hook.
Both MESA_shader_framebuffer_fetch_non_coherent and the non-coherent
variant of KHR_blend_equation_advanced will use this driver hook to
request coherency between framebuffer reads and writes.  This
intentionally doesn't hook up glBlendBarrierMESA to the dispatch layer
since the extension isn't exposed to applications yet, see [1]
for more details.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
6a976bbf84 mesa: Move shader memory barrier functions into barrier.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
83d2f9db29 mesa: Rename "texturebarrier" source files to "barrier".
In preparation for collecting all pipeline barrier GL entry points
into a single source file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
642aa58577 mesa: Add support for querying GL_FRAGMENT_SHADER_DISCARDS_SAMPLES_EXT.
This can currently only give true as result since the only way you can
expose EXT_shader_framebuffer_fetch right now is by flipping the
MESA_shader_framebuffer_fetch bit, but that could potentially change
in the future, see [1] for an explanation.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
115a27357c mesa: Add extension enables for framebuffer fetch extensions.
This allows drivers to expose EXT_shader_framebuffer_fetch in GLES2+
contexts if desired.  Note that this adds boolean flags for two MESA
extensions, but only the EXT GLES-only extension is exposed for the
moment, see the cover letter of this series [1] for the rationale.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Francisco Jerez
acb12a1228 glapi: Add XML for GL_EXT_shader_framebuffer_fetch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-24 13:28:30 -07:00
Samuel Pitoiset
a227b0a4f1 nvc0: invalidate textures/samplers on GK104+
Like Fermi, textures and samplers are aliased between 3D and compute,
especially the TIC_FLUSH/TSC_FLUSH methods and we have to re-validate
these resources when switching between the two pipelines.

This fixes a GPU hang with Elemental (and most likely with other UE4 demos).

Tested on GK107 and GM107.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
CC: <mesa-stable@lists.freedesktop.org>
2016-08-24 22:26:36 +02:00
Rhys Kidd
c9c989763a gallium/ttn: Remove duplicated TGSI_OPCODE_DP2A initialization
Duplicate line is currently on 1535.

Identified by Clang, when run through Eric Anholt's Travis harness.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-24 11:54:50 -07:00
Eric Anholt
78ab62b1e9 travis: Upgrade LLVM dependency to 3.5 and enable LLVM drivers.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2016-08-24 11:54:50 -07:00
Eric Anholt
084678ccbb travis: Enable vc4 in libdrm to satisfy vc4 test build dependency.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2016-08-24 11:54:50 -07:00
Eric Anholt
80a872f3f0 travis: Update to the Ubuntu Trusty image.
This will hopefully fix wget from x.org (no real reason explained in
Travis CI bug reports), and may also mean that we can enable LLVM driver
builds.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2016-08-24 11:54:50 -07:00
Eric Anholt
ecbc76cf6e travis: Parse configure.ac to pick an updated LIBDRM_VERSION.
Travis has been broken a couple of times by configure.ac updates.  To make
it useful, auto-update the version necessary.

This could potentially be used for other dependencies, too, but those get
bumped less frequently.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2016-08-24 11:54:50 -07:00
Lionel Landwerlin
91987c51e3 anv: meta_blit2d: adapt texel fetch pitch for fake w-tiled
We need to compute detiling coordinates using the physical size of W tiling
(128x32) rather than the logical size (64x64).

v2: Correct comment (Jason)

Fixes dEQP-VK.api.copy_and_blit.image_to_image_stencil

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97448
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-24 11:29:23 -07:00
Eric Anholt
87a88f2daa vc4: Fix GPU hangs with >16 varying values.
Fixes glsl-routing in piglit and hangs in glbenchmark 2.0.2.
2016-08-24 10:43:22 -07:00
Leo Liu
5277f25480 vl/rbsp: fix another three byte not detected
This happens when three byte "00 00 03" is partly loaded to
vlc->buffer, thus at the bottom of buffer with valid bits is
"00" or "00 00" and left  like "00 03" or "03" in the data,
so that it will not be detected by three byte emulation check.
The reason for that is the escaped bit was set to 0 from the
rbsp init.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-08-24 11:17:16 -04:00
Marek Olšák
2c13abb491 radeonsi: fix VM faults due NULL internal const buffers on CIK
They are harmless, but the interrupts do decrease performance.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97039

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-08-24 15:39:57 +02:00
Tomasz Figa
577f85e2bb gallium/winsys/kms: Look up the GEM handle after importing a prime FD
drmPrimeHandleToFD() will return the same GEM handle every time the same
buffer is imported, even from a different prime FD. Since GEM handles
are not reference counted, we need to make sure that each GEM handle is
referenced only by one display target struct, by looking it up in
kms_sw->bo_list first and bumping the refcount of the found dt on hit
and falling back to creating a new dt only on miss.

v2: Split into separate function.
    Use helper function for lookup.

v3 [Emil Velikov]:
    Rename kms_sw_displaytarget_{lookup,find_and_ref} (Jordan)

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 14:39:23 +01:00
Tomasz Figa
0465c72d46 gallium/winsys/kms: Move display target handle lookup to separate function
As a preparation to use the lookup in more than once place, move the
code that looks up given KMS/GEM handle to a separate function. This
change should not introduce any functional changes.

v2: Split into separate patch.
    Move lookup code into separate function.

v3 [Emil Velikov]:
    Rename kms_sw_displaytarget_{lookup,find_and_ref} (Jordan)

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 14:39:23 +01:00
Tomasz Figa
e71b78ebf9 gallium/winsys/kms: Fully initialize kms_sw_dt at prime import time (v2)
Currently kms_sw_displaytarget_add_from_prime() allocates the struct and
fills in only some of the fields, resulting in a half-baked struct that
needs to be further completed by the caller. To make this a bit more
consistent, pass width, height and stride to this function and fill in
everything there, so that caller can take the returned struct as is.

v2: Split from one big patch into four fixing one thing at a time.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 14:39:23 +01:00
Tomasz Figa
0aa6a818ef gallium/winsys/kms: Fix double refcount when importing from prime FD (v2)
Currently the code creates a display target struct with refcount field
initialized to 1 and then the caller again increments it, leading to
a leaked reference. Let's remove the unnecessary increment.

v2: Split from one big patch into four fixing one thing at a time.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 14:39:22 +01:00
Alejandro Piñeiro
b4959e17f1 shaderapi: don't generate not linked error on GetProgramStage in general
Both ARB_shader_subroutine and the GL core spec doesn't list any
error when the program is not linked.

We left a error generation for the uniform location, in order to be
consistent with other methods from the spec that generate them.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-08-24 14:57:13 +02:00
Eric Engestrom
9411eb67ec gallium/cso: avoid unnecessary null dereference
The label `out:` calls `destroy()` which dereferences `ctx`.
This is unnecessary as there is nothing to destroy.
Immediately return instead.

CovID: 1258255
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-24 11:35:05 +01:00
Eric Engestrom
2f86582b92 .gitignore: Ignore tags generated by make tags
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
[Emil Velikov: rebase]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 11:33:48 +01:00
Eric Engestrom
f6b9fb6e4c st/xvmc: fix a couple 'unused-but-set-variable' warnings
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 11:32:00 +01:00
Eric Engestrom
49dad1aafd egl: turn a couple asserts static (compile-time)
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 11:30:15 +01:00
Eric Engestrom
8af1b540c5 i915: remove unnecessary if
if (x) return true; else return false;
can be simplified as:
	return x;
since `x` is already a boolean expression.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 11:17:05 +01:00
Eric Engestrom
253274351f i965: remove unnecessary if
if (x) return true; else return false;
can be simplified as:
	return x;
since both `x` are already boolean expressions.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 11:17:05 +01:00
Alejandro Piñeiro
07fe2d565b program_resource: subroutine active uniforms should return NumSubroutineUniforms
Before this commit, GetProgramInterfaceiv for pname ACTIVE_RESOURCES
and all the <shader>_SUBROUTINE_UNIFORM programInterface were
returning the count of resources on the shader program using that
interface, instead of the num of uniform resources. This would get a
wrong value (for example) if the shader has an array of subroutine
uniforms.

Note that this means that in order to get a proper value, the shader
needs to be linked, something that is not explicitly mentioned on
ARB_program_interface_query spec, but comes from the general
definition of active uniform. If the program is not linked we
return 0.

v2: don't generate an error if the program is not linked, returning 0
    active uniforms instead, plus extra spec references (Tapani Palli)

Fixes GL44-CTS.program_interface_query.subroutines-compute

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-08-24 11:33:04 +02:00
Stencel, Joanna
690ead4a13 egl/wayland-egl: Fix for segfault in dri2_wl_destroy_surface.
Segfault occurs when destroying EGL surface attached to already destroyed
Wayland window. The fix is to set to NULL the pointer of surface's
native window when wl_egl_destroy_window() is called.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Stencel, Joanna <joanna.stencel@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-24 10:18:13 +01:00
Kai Wasserbäch
f033d97155 st/va: Remove unused variable coded_size from vlVaEndPicture()
Removes the following GCC warning:
 ../../../../../src/gallium/state_trackers/va/picture.c:542:17: warning:
  unused variable 'coded_size' [-Wunused-variable]
    unsigned int coded_size;
                 ^~~~~~~~~~

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-08-24 10:35:53 +02:00
Kai Wasserbäch
83d08d4cab st/va: Remove else case in vlVaEndPicture() made superfluous by c59628d11b
Commit c59628d11b made the else statement
and duplication of the context->decoder->end_frame() call superfluous.

Cc: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-08-24 10:35:20 +02:00
Eric Engestrom
cd340052ad st/va: add missing mutex_unlock
Fixes: c59628d11b ("st/va: enable dual instances encode by sync surface")

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-24 10:33:07 +02:00
Kenneth Graunke
e7530bfcd6 aubinator: Style fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-23 21:19:58 -07:00
Sirisha Gandikota
56ba9656bb aubinator: Fix the tool to correctly decode the DWords
Several fixes have been added as part of this as listed below:

1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS
as the mask returned wrong values of the fields.

2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/
offset were handled the same way as the other fields and that gives
the wrong values for the address/offset.

3) Decode nested/recurssive structures - Many packets contain nested
structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC
structures. Previously, the aubinator printed 1 if there was a MOC
structure. Now we decode the entire structure and print out its fields.

4) Print out the DWord address along with its hex value - For a better
clarity of information, it is helpful to print both the address and
hex value of the DWord along with the DWord count. Since the DWord0
contains the instruction code and the instruction length, it is
unnecessary to print the decoded values for DWord0. This information
is already available from the DWord hex value.

5) Decode the <group> and the corresponding fields in the group- The
<group> tag can have fields of several types including structures. A
group can contain one or more number of fields and this has be correctly
decoded. Previously, aubinator did not decode the groups or the
fields/structures inside them. Now we decode the <group> in the
instructions and structures where the fields in it repeat for any number
of times specified.

v2: Fix the formatting (per Matt)
Make the start and end pos calculation to extract fields from a DWord
more appropriate by moving %32 away from mask() method

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
2016-08-23 21:19:55 -07:00
Kristian Høgsberg Kristensen
3e218ad7f8 aubinator: Add a new tool called Aubinator to the src/intel/tools folder.
The Aubinator tool is designed to help the driver developers in debugging
the driver functionality by decoding the data in the .aub files.
Primary Authors of this tool are Damien Lespiau <damien.lespiau at intel.com>
and Kristian Høgsberg Kristensen <krh at bitplanet.net>.

v2: Review comments are incorporated by Sirisha Gandikota as below:
1) Make Makefile.am more crisp, reuse intel_aub.h from libdrm (per Emil)
2) Aubinator will use platform name instead of GEN number (per Matt)
3) Disassmebler gets created based on pciid rather then GEN number (per Matt)
4) Other formatting comments (per Ken, Matt and Emil)

Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
2016-08-23 21:19:33 -07:00
Kenneth Graunke
eb1a0ddfd5 glsl: Mark tessellation qualifier maps static const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-23 21:15:59 -07:00
Jason Ekstrand
70bc891c42 isl/formats: Integer formats are not filterable
In ca2a8e5628, we updated the format table to add more formats (most of
which are new on SKL) but accidentally marked some integer formats as
filterable.  You can't filter an integer format.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-23 16:51:34 -07:00
Ilia Mirkin
361678edd7 st/dri: respect driver's request to avoid mixed color/depth bit configs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-23 18:30:53 -04:00
Ilia Mirkin
9515d651f9 gallium: add a cap to expose whether driver supports mixed color/zs bits
Some hardware can't render to color/depth buffers of mixed bitness. When
that happens a fallback has to happen, but this allows the driver to
express that this isn't an optimal scenario. The purpose of this is to
remove such fbconfigs from the GLX/EGL config list.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-23 18:30:49 -04:00
Ilia Mirkin
528390021f dri: add a way to request that modes have matching color/zs depths
Some GPUs, notably nv3x/nv4x can't render to mismatched color/zs
framebuffer depths. Fallbacks can be done by the driver, with shadow
surfaces, but no reason to encourage applications to select non-matching
glx visuals.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-23 18:30:30 -04:00
Ilia Mirkin
092f994a03 nv50/ir: make sure cfg iterator always hits all blocks
In some very specially-crafted cases, we could attempt to visit a node
that has already been visited, and then run out of bb's to visit, while
there were still cross blocks on the list. Make sure that those get
moved over in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96274
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-08-23 18:30:12 -04:00
Jason Ekstrand
7bdccd104b anv/clear: Clear E5B9G9R9 images as R32_UINT
We can't actually clear these images normally because we can't render to
them.  Instead, we have to manually unpack the rgb9e5 color value on the
CPU and clear it as R32_UINT.  We still have a bit of work to do to clear
non-power-of-two images, but this should get all of the power-of-two clears
working on at least Haswell.  This fixes three of the new Vulkan CTS tests
in the dEQP-VK.api.image_clearing.clear_color_image.* group.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-23 11:45:25 -07:00
Jason Ekstrand
afa7ca0f77 anv/clear: Make cmd_clear_image take an actual VkClearValue
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
cf3cf2ecfc anv/blit2d: Add support for RGB destinations
This fixes 104 of the new image_clearing and copy_and_blit Vulkan CTS
tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
16ddda8452 anv/blit2d: Add a format parameter to bind_dst and create_iview
Signed-off-by: Jasosn Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
954c0bfb20 anv/image: Don't create invalid render target surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
ca2a8e5628 isl/formats: Update the table with more samplable formats
There were a lot of formats where support was added on Haswell or later but
we never updated the format table.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
aba9e25b70 isl/formats: Report ETC as being samplable on Bay Trail
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
f6967ddd32 i965/surface_formats: Don't advertise 8 or 16-bit RGB formats
We have implicitly been not advertising these formats since we had them
turned off in the format capabilities table.  We are about to update that
table and this prevents a change in behavior.  The only change in behavior
created by this patch is that we no longer advertise support for
R16G16B16_FLOAT which means that it's now renderable which seems like a
bonus.  Maybe someday we'll want to change things to start supporting
16-bit RGB formats natively but, at the moment, there's no need.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-23 11:45:24 -07:00
Jason Ekstrand
fb90291dd5 anv/formats: Don't use an RGBX format if it isn't renderable
The whole point of using RGBX is so that we can render to it so if it isn't
renderable, that kind-of defeats the purpose.  Some formats (one example is
R32G32B32X32_SFLOAT) exist in the format table but aren't actually
renderable.  Eventually, we'd like to get away from RGBX entirely, but this
fixes hangs on BDW today.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-23 11:45:24 -07:00
Nicolas Boichat
4f3f8bb59d egl/dri2: dri2_initialize: Do not reference-count TestOnly display
In the case where dri2_initialize is called with a TestOnly display,
the display is not actually initialized, so dri2_egl_display always
fails, and we cannot do any reference counting.

Fixes piglit spec@egl_khr_create_context@verify gl flavor (reproducible
with LIBGL_ALWAYS_SOFTWARE=1).

Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-23 18:08:17 +01:00
Jan Ziak
6687037f1f vbo: fix format string compiler warning for 32-bit machines
Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-23 07:31:28 -06:00
Dongwon Kim
c6e97aaf75 egl/dri2: remove error checks on return values from mtx_lock and cnd_wait
This removes unnecessary error checks on return result of mtx_lock
and cnd_wait calls as in all other places in MESA source since there
is no chance that any of these functions return any of error codes
in current implementation.

This patch also removes a redundent _eglError call that follows
EGL_FALSE check in the bottom of dri2_client_wait_sync.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-23 12:00:45 +01:00
Dave Airlie
96ea753d9e i965: report bound buffer size not underlying buffer size for image size (v2)
This seems to make sense, the image is bound to a subset of the buffer
so the image size should be from the bound size not the underlying
object.

This fixes:
GL44-CTS.shader_image_size.advanced-nonMS-fs-int

v2: get mininum of the two values, same as we write to the hw.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-08-23 13:39:15 +10:00
Jason Ekstrand
34ff4fbba6 anv: Throw INCOMPATIBLE_DRIVER for non-fatal initialization errors
The only reason we should throw INITIALIZATION_FAILED is if we have found
useable intel hardware but have failed to bring it up for some reason.
Otherwise, we should just throw INCOMPATIBLE_DRIVER which will turn into
successfully advertising 0 physical devices

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-08-22 18:49:49 -07:00
Dave Airlie
26187f3890 st/glsl_to_tgsi: fix st_src_reg_for_double constant.
This needs to set the src swizzle so it doesn't access the .zw
members ever when we are just emitting a 0 constant here.

This fixes:
vert-conversion-explicit-dvec3-bvec3.shader_test
and a bunch of other fp64 tests on softpipe and radeonsi.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-08-23 11:14:03 +10:00
Dave Airlie
0bce055d9e mesa/subroutines: drop the old subroutine index uploads.
We used to upload the indices when they changed, now we rely
on the drivers calling the correct hook to have the values
updated from the context storage.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2016-08-23 11:03:46 +10:00
Dave Airlie
6a332a389a st/mesa: use the new subroutine index upload API.
This plugs the new API into the gallium state tracker.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Andres Gomez <agomez@igalia.com>
2016-08-23 11:03:45 +10:00
Dave Airlie
4adad99cfb i965: use new subroutine index uploader.
This plugs the subroutine index updates into the i965 backend,
where it loads constants.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Andres Gomez <agomez@igalia.com>
2016-08-23 11:03:45 +10:00
Dave Airlie
ea783667e4 mesa: add api to write subroutine indicies to the program storage.
This writes the subroutine indicies to the program storage for
a stage. This API is intended to be used by drivers to update
the uniform storage before uploading to the hw.

This isn't the most thread safe effort, but it will be significantly
more multi-context safe.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2016-08-23 11:03:45 +10:00
Dave Airlie
4566aaaa5b mesa/subroutines: start adding per-context subroutine index support (v1.1)
One piece of ARB_shader_subroutine I ignored was the fact that it
needs to store the subroutine index data per context and not per
shader program.

There is one CTS test that tests this:
GL45-CTS.shader_subroutine.multiple_contexts

However the test only does a write to context and readback,
it never renders using the values, so this is enough to fix the
test however not enough to do what the spec says.

So with this patch the info is now stored per context, but
it gets updated into the program at UseProgram and when the
values are inserted into the context, which won't help if
multiple contexts are in use in multiple threads.

v1.1: cleanups and nit-picks (Andres)

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2016-08-23 11:03:45 +10:00
Matt Turner
27d20ee264 vbo: Make #if 0'd debugging code compile. 2016-08-22 16:31:50 -07:00
Timothy Arceri
8ee909ee42 nir: avoid segfault when ssa src not found
Without this the following line will segfault and we don't get to
see the results of the validate_assert() above.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-08-23 09:06:29 +10:00
Eric Anholt
47e3cc7557 vc4: Tell state_tracker that we would prefer NIR.
Before this series, the code generation path was:

GLSL IR -> TGSI -> NIR -> NIR clone -> QIR -> QPU

Now it's (generally)

GLSL IR -> NIR -> NIR clone -> QIR -> QPU
2016-08-22 12:11:08 -07:00
Eric Anholt
d08f09c24e st/nir: Trim out unused VS input variables.
If we're going to skip setting up vertex input data in them, we should
probably not leave them as vertex inputs with a driver_location that
happens to alias to something else.

Fixes a regression in glsl-mat-attribute on vc4 when enabling GTN.

v2: Change commit message shortlog, lower the new globals away before
    handing off to the driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-22 12:11:05 -07:00
Eric Anholt
3ef1853f7d nir: Fix crash in nir_lower_drawpixels.
Generally you'd see the gl_Color reference first and get some cursor set.
However, in piglit draw-pixel-with-texture we're now seeing the TexCoord
dereferenced first.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-08-22 11:52:27 -07:00
Eric Anholt
0a8ff1681b nir: Fix a comment typo in nir_lower_drawpixels.
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-08-22 11:52:26 -07:00
Eric Anholt
f4d143f0d9 vc4: Use proper type sizes for uniforms. 2016-08-22 11:52:26 -07:00
Eric Anholt
bdb54cdc16 vc4: Add VARYING_SLOT_PNTC support.
We end up with this when doing GLSL-to-NIR.
2016-08-22 11:52:26 -07:00
Eric Anholt
3c1ea6e651 vc4: Fix vc4_nir_lower_io for non-vec4 I/O.
To support GLSL-to-NIR, we need to be able to support actual
float/vec2/vec3 varyings.
2016-08-22 11:52:26 -07:00
Eric Anholt
e8378fee0c nir: Define system values for vc4's blending-lowering arguments.
In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes.  Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.

v2: Fix unintended change to channel_num, drop unspecified const_index
    value on blend_const_color_r_float.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-22 11:52:26 -07:00
Lionel Landwerlin
475ce61d1a anv: GetDeviceImageFormatProperties: fix TRANSFER formats
We let the user believe we support some transfer formats which we don't.
This can lead to crashes when actually trying to use those formats for
example on dEQP-VK.api.copy_and_blit.image_to_image.* tests.

Let all formats we can render to or sample from as meta implements transfers
using attachments.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-22 10:41:30 -07:00
Marek Olšák
0328b20050 gallium/hud: round max_value to print nicely rounded numbers next to graphs
This improves readability a lot.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák
0f1befe926 gallium/hud: generalize code for drawing numbers next to graphs
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák
a33eb48d61 gallium/hud: draw numbers with 3 decimal places if those aren't 0
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák
b9c9551c09 gallium/hud: use sRGB for nicer AA lines
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák
6ffde82083 gallium/hud: use AA lines for graphs
this looks a lot better (with the next patch)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák
6902f9e82a gallium/hud: don't enable blending for all objects
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Tapani Pälli
0abebec012 util: add assert that key cannot be NULL on insertion
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-22 07:37:55 +03:00
Tapani Pälli
68233801ae glsl: fix key used for hashing switch statement cases
Implementation previously used value itself as the key, however after
hash implementation change by ee02a5e we cannot use 0 as key.

v2: use constant pointer as the key and implement comparison
    for contents (Eric Anholt)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97309
2016-08-22 07:36:33 +03:00
Mauro Rossi
a5f445640e android: i965: add per-gen libmesa_i965_gen{8,9} static
Needed to fix android build after commit 16a9fcb
which enabled genxml for gen{8,9} state setup

This is the last patch needed, android build tested successfully.
2016-08-20 16:18:31 -07:00
Mauro Rossi
9dc70a71f8 android: i965: add per-gen libmesa_i965_gen{7,75} static libraries
Needed to fix android build after commit e198983
which enabled genxml for gen{7,75} state setup

Android build fix for gen{8,9} will follow as incremental patch,
build tested successfully with all per-gen patches applied.
2016-08-20 16:18:28 -07:00
Mauro Rossi
7478ddad29 android: i965: add per-gen libmesa_i965_gen6 static library
Needed to fix android build after commit c8bc1ae
where new per-gen genX_blorp.c source replaced gen6_blorp.c for gen6

Android build fixes for gen{7,75} and gen{8,9} will follow as incremental patches,
build tested successfully with all per-gen patches applied.
2016-08-20 16:18:26 -07:00
Kenneth Graunke
7db81d9a87 glsl: Rename link_fs_input_layout_qualifiers to "inout".
We're going to handle output qualifiers here too, and calling it "inout"
seems to be the going convention.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-20 13:52:25 -07:00
Matt Turner
7e3e1bed03 i965/cfg: Factor common code out of switch statement.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-20 11:40:42 -07:00
Jason Ekstrand
a2ae67aa47 anv: Give the installed intel_icd.json file an absolute path
Not providing a path allows the ICD to work on multi-arch systems but
breaks it if you install anywhere other than /usr/lib.  Given that users
may be installing locally in .local or similar, we probably do want to
provide a filename.  Distros can carry a revert of this commit if they want
an intel_icd.json file without the path.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chad@kiwitree.net>
2016-08-20 00:50:03 -07:00
Daniel Scharrer
16ef7ab5c1 mesa: Fix fixed function spot lighting on newer hardware (again)
This was first fixed in commit b3f9c5c and then broken again in commit
fe2d2c7, which removed the abs modifier from input registers.

v2: Don't change the size of struct ureg.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91342
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Daniel Scharrer <daniel@constexpr.org>
2016-08-19 20:46:53 -07:00
Matt Turner
a9033d1dc1 i965: Remove comment within a comment. 2016-08-19 20:44:37 -07:00
Roland Scheidegger
0849621891 llvmpipe: fix issues with depth clamp
We only did depth clamp when the value was written from the fs.
This is very wrong both for d3d10 and GL, and only passed the
corresponding piglit test due to pure luck (it no longer does
with the enhanced test).
Also, interpolation clamped values to 1.0 always, which can legitimately
happen if depth clip is disabled, so fix that as well (untested).
There is one unresolved issue left, d3d10 always does depth clamping,
whereas GL does not (but does [0,1] clamp instead for fs depth outputs)
- this information isn't in any gallium state object, leave it as-is
for now (though it looks like llvmpipe misses the [0,1] clamp as well).
This (with the previous patch) fixes piglit depth-clamp-range test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-08-20 04:05:33 +02:00
Roland Scheidegger
b0a647f284 llvmpipe: fix depth clamping wrt reversed near/far values
This wasn't handled before (the result was that no matter what value got
clamped, it always ended up as the near value in this case) (if clamping
actually happened).
Fix this by using the util helper for that (the math is otherwise "mostly"
the same, mostly because there could actually be differences due to float
rounding, but I don't even know which one would be more correct).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-08-20 04:05:33 +02:00
Matt Turner
a73116ecc6 i965/sched: Simplify work done by add_barrier_deps().
Scheduling barriers are implemented by placing a dependence on every
node before and after the barrier. This is unnecessary as we can limit
the number of nodes we place dependencies on to those between us and the
next barrier in each direction.

Runtime of dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
is reduced from ~25 minutes to a little more than three.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94681
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 16:52:25 -07:00
Matt Turner
e7c376adfd i965/vec4: Ignore swizzle of VGRF for use by var_range_end().
var_range_end(v, n) loops over the n components of variable number v and
finds the maximum value, giving the last use of any component of v.
Therefore it expects v to correspond to the variable associated with the
.x channel of the VGRF.

var_from_reg() however returns the variable for the first channel of the
VGRF, post-swizzle.

So, if the last register had a swizzle with y, z, or w in the swizzle
component, we would read out of bounds. For any other register, we would
read liveness information from the next register.

The fix is to convert the src_reg to a dst_reg in order to call the
dst_reg version of var_from_reg() that doesn't consider the swizzle.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 16:52:25 -07:00
Matt Turner
3ef31122d0 i965/vec4: Print spills:fills.
Allows shader-db to work on vec4 programs (has been broken since
shader-db commit 646df5ca98b2 from April!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 16:52:25 -07:00
Ilia Mirkin
89f00f749f a4xx: make sure to actually clamp depth as requested
We were previously ... not clamping. I guess this meant that everything
got clamped to 1/0, which was enough to pass the existing tests. Or
perhaps the clamping would only happen to the rasterized depth value and
not the frag shader's output depth value.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-08-19 19:40:04 -04:00
Ilia Mirkin
cd8e30452f a4xx: only disable depth clipping, not all clipping, when requested
The previous bit disables the whole clipper, including the regular
viewport-related clipping that would go on. The two new bits disable
near and far clipping (separately, as verified with the
depth-clamp-range piglit).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-08-19 19:40:04 -04:00
Eric Anholt
5adee83806 vc4: Switch store_output to using nir_lower_io_to_scalar / component. 2016-08-19 13:11:36 -07:00
Eric Anholt
f8fecc396a vc4: Use the intrinsic's first_component for vattr VPM index.
Avoids another multiplication by 4 of the base in the NIR.
2016-08-19 13:11:36 -07:00
Eric Anholt
cbf8c19410 vc4: Convert to using nir_lower_io_scalar for FS inputs.
The scalarizing of FS inputs can be done in a non-driver-dependent manner,
so extract it out of the driver.
2016-08-19 13:11:36 -07:00
Eric Anholt
c30b22c421 vc4: Switch to using the intrinsic accessors.
The const_index[] values have always felt magic, and this documents them a
bit better.
2016-08-19 13:11:36 -07:00
Eric Anholt
9f1411d1ec nir: Add an IO scalarizing pass using the intrinsic's first_component.
vc4 wants to have per-scalar IO load/stores so that dead code elimination
can happen on a more granular basis, which it has been doing in the
backend using a multiplication by 4 of the intrinsic's driver_location.
We can represent it properly in the NIR using the first_component field,
though.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt
c35f979220 nir: Add nir_builder support for individual system value loads.
The previous nir_load_system_value(b, nir_intrinsic_load_whatever), 0) was
rather verbose, when system values should be easy to generate.

The index is left out because only one system value had an index included
in it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt
24728637e2 nir: Move the undef of nir_intrinsics.h macros to the .h.
I wanted to include this from nir_builder as well, so it also needed the
undefs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt
c078c41520 ttn: Use nir_load_front_face instead of the TGSI-style input.
This reduces the diff between GLSL-to-NIR and TGSI-to-NIR, and gives NIR
more optimization to work on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt
3f607f9e4f nir: Use the system-value front face for twoside lowering.
GLSL-to-NIR generates system value usage, and vc4/freedreno would both
like the system value instead of the varying, so switch this pass over to
it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt
ed92241d78 ttn: Make FRAG_RESULT_DEPTH be a float variable to match gtn and ptn.
This lets TTN-using drivers handle FRAG_RESULT_DEPTH the same between all
their source paths.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-08-19 13:11:36 -07:00
Eric Anholt
d80d03b830 vc4: Dump the TGSI before trying to convert it to NIR.
In the case of debugging a crash in TTN, this is nice to have.
2016-08-19 13:11:36 -07:00
Boyuan Zhang
c0be51f270 radeon/vce: set flag based on dual instance enablement
Set the flag on when dual instance encoding is supported,
otherwise set it to off.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-08-19 10:36:44 -04:00
Boyuan Zhang
c59628d11b st/va: enable dual instances encode by sync surface
This patch improves the performance of Vaapi Encode by enabling dual
instances encoding. flush function is not called after each end_frame
call. radeon/vce will do flush whenever 2 frames are submitted for
encoding. Implement sync surface function to flush only if the frame
hasn't been flushed yet.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-19 10:36:44 -04:00
Jason Ekstrand
93d2b5c576 i965/blorp: Remove no longer used state setup helpers
Now that we're using genxml for everything, we no longer need the
hand-rolled state emit helpers.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
16a9fcbbb6 i965/blorp: Use genxml for gen8-9 state setup
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
e198983c61 i965/blorp: Use genxml for gen7 state setup
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
344841fcba i965/blorp: Add genxml-based vertex setup helpers
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
7b035fd0c9 i965/blorp: Add a helper for emitting surface states
The new helper emits surface states and the binding table in one go.  It's
nice to have it pulled out of the main blorp_exec function.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
48f13545dd i965/blorp: Add genxml-based sampler state emit function
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
eb655c4fc2 i965/blorp: Add genxml-based dynamic state emit functions
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
c8bc1ae96a i965: Move gen6_blorp.c to a file that gets recompiled per-gen
At the moment, it's only used for gen6 but that will change soon.  We use
the genX prefix for recompiled things in the Vulkan driver.  It isn't
great, but it seems to have worked ok.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
eea6a66222 i965/blorp/gen6: Use genxml packing structs for state setup
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
b5c20a98c1 i965/blorp: Stop setting point and line rasterization rules
Blorp never uses points or lines and the default values of 0 are perfectly
fine.  Explicitly setting them is just noise.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
5e2dd7a381 i965/blorp/gen8: Move viewport setup to after wm state
This matches gen6 and gen7.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
802f0f8596 i965/blorp/gen6-7: Move multisample setup to right after samplers
This mimics gen8 blorp

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
75304fdbd8 i965/blorp/gen6-7: Move surfaces and samplers closer together
This mimics what we do on gen8.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
8b0426ddd4 i965/blorp/gen7-8: Emit depth stencil state with CC and BLEND
All three go together on SNB so let's keep them together for gen7+ as well.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
38c1909c0a i965/blorp/gen6: Move constant disables higher up
This is what gen7-8 do and it's a bit cleaner.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
e0bc2cb145 i965/blorp: Don't clear an empty region
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
e4d6ffbbf6 i965/blorp: Move the non-static blorp state setup helpers to another file
We're about to start replacing blorp state setup code with packing structs
and we want to feel free to delete files as we go.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
50768a3879 i965/blorp: Make gen6 VS and GS disable helpers static
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
949a892026 i965: Roll intel_reg.h into brw_defines.h
More than half of the stuff in intel_reg.h had nothing whatsoever to do
with registers and really belongs in brw_defines.h anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
8455f9430f i965: Stop including brw_defines.h in brw_state.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
4c3acf94da i965/state: Move is_drawing_lines/points to gen6_clip_state.c
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
04f3594cd5 genxml/gen9: Make 3DSTATE_SBE::AttributeActiveComponentFormat an array
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
bfdff28d68 genxml: Add a uint MOCS field to VERTEX_BUFFER_STATE
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
373613fa4b genxml: Make a couple of VERTEX_BUFFER_STATE fields boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
29f1f945a6 genxml: Make VERTEX_ELEMENT_STATE::Valid a bool
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
eb2589cba6 genxml/gen6: Make SAMPLER_STATE look a bit more like gen7
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
2a84e40dae genxml: Add a uint MOCS field to DEPTH_BUFFER packets
This is easier than dealing with structs all the time

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
3f1022b029 genxml/gen6: Make "Depth Clear Value" a uint
The actual data storred is in float, UNORM24, or UNORM16 depending on the
actual depth format.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
be62e7645e genxml/gen6: Add the 3D_Prim_Topo_Type enum
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
cca95a7bd6 genxml/gen6: Fix the length of 3DSTATE_WM
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
3ddb6f6e2a genxml/gen6: Add a Surface Base Address field to HIER_DEPTH_BUFFER
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
be52e16dbc genxml/gen6: Add uint MOCS fields for most things
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Kenneth Graunke
7d0554f341 nir: Rely on the fact that bcsel takes a well formed boolean.
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.

https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html

With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-19 02:05:23 -07:00
Francisco Jerez
7ceb42ccc5 i965/sched: Change the scheduling heuristics to favor early program termination.
This uses the unblocked time of the exit assigned to each available
node to attempt to unblock exit nodes as early as possible,
potentially reducing the runtime of the shader when an exit branch is
taken.  There is a natural trade-off between terminating the program
as early as possible and reducing the worst-case latency of the
program as a whole (since this will typically move exit-unblocking
nodes closer to its dependencies potentially causing additional stalls
of the execution pipeline), but in practice the bandwidth and ALU
cycle savings from terminating the program earlier tend to outweigh
the slight increase in worst-case program execution latency, so it
makes sense to prefer nodes likely to unblock an earlier exit
regardless of the latency benefits of other available nodes.

I haven't observed any benchmark regressions from this change after
testing on VLV, HSW, BDW, BSW and SKL.  The FPS of the GfxBench
Manhattan benchmark increases by 10%-20% and the FPS of Unigine Valley
improves by roughly 5% depending on the platform and settings.

The change to the register pressure-sensitive heuristic is rather
conservative and gives precedence to the existing heuristic in order
to avoid increasing register pressure and causing spill count and SIMD
width regressions in shader-db.  It may make sense to revisit this
with additional performance data.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
4147ca75d5 i965/sched: Assign a preferred exit node to each node of the dependency graph.
This adds a bit of metadata to schedule_node that will be used to
compare available nodes in the scheduling heuristic code based on
which of them unblocks the earliest successor exit node.  Note that
assigning exit nodes wouldn't be necessary in a bottom-up scheduler
because we could achieve the same effect by scheduling the exit nodes
themselves appropriately.

No shader-db changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
b295d7ca32 i965/sched: Calculate the critical path of scheduling nodes non-recursively.
The critical path of each node is calculated by induction based on the
critical paths of its children, which can be done in a post-order
depth-first traversal of the dependency graph.  The current code
implements graph traversal by iterating over all nodes of the graph
and then recursing into its children -- But it turns out that
recursion is unnecessary because the lexical order of instructions in
the block is already a good enough reverse post-order of the
dependency graph (if it weren't a reverse post-order some instruction
would have been located before one of its dependencies in the original
ordering of the basic block, which is impossible), so we just need to
walk the instruction list in reverse to achieve the same result more
efficiently.

No shader-db changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
b2b621a0ec i965/fs: Switch to per-subspan discard jumps.
ANY4H is more efficient than ANY8H and ANY16H because it makes sure
that whenever a whole subspan hits a discard statement it gets
disabled by the EU until the end of the program, regardless of whether
the discard condition is uniform across all channels of the SIMD8-16
thread.  OTOH ANY8H/ANY16H would cause the rest of the program to be
executed for *all* channels if only one of the channels hadn't taken
the discard branch, potentially increasing the bandwidth and ALU usage
of the program unnecessarily.

This change increases the FPS by over 3x of a simple micro-benchmark
that discards a bunch of fragments and then does a single costly
texturing operation.  I've just re-verified the FPS change on HSW and
SKL, but I expect all platforms from Gen6 up to get a similar benefit.

Note that we could potentially be more aggressive and use the NORMAL
predicate to discard individual channels, but that would need to
happen post-scheduling because the scheduler currently doesn't care to
reorder HALT instructions with respect to other instructions, and the
NORMAL predicate would cause the results of subsequent derivative
computations to become undefined -- If the scheduler didn't reorder
HALT instructions it would actually be safe to switch to NORMAL
because the behavior of derivative computations after a non-uniform
discard statement is undefined by the GLSL spec, but that would make
the optimization implemented by one of the following commits somewhat
more difficult.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
01b321f242 i965/fs: Drop bogus writemasking disable bit from HALT instructions.
This may have been the reason people ran into problems with
non-uniform HALT instructions and ended up using the inefficient
ANY16H/ANY8H predicates instead of ANY4H or NORMAL in order to prevent
non-uniform discard.  The HALT instruction is able to handle
non-uniform execution masks just fine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:04:59 -07:00
Ilia Mirkin
27e59ed477 mesa: avoid valgrind warning due to opaque only being set sometimes
Valgrind complains with a "Conditional jump or move depends on
uninitialised value(s)" warning due to opaque being conditionally
initialized. However in the punchthrough_alpha == true case, it is
always initialized, so just flip the condition around to silence the
warning.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-08-18 22:48:55 -04:00
Ilia Mirkin
59bb821180 vbo: remove unnecessary max_basevertex computation
The max basevertex is already computed and added into max_index by the
caller, _tnl_draw_prims.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-18 20:26:34 -04:00
Ilia Mirkin
659dc10d32 vbo: add basevertex when looking up elements for vbo splitting
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97351
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-18 20:26:22 -04:00
Marek Olšák
07ccec002b radeonsi: initialize and finalize the LLVM function pass manager
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-08-18 21:36:03 +02:00
Emil Velikov
d61d259518 isl: automake: use VISIBILITY_CFLAGS to restrict symbol visibility
v2: Add VISIBILITY_CFLAGS to AM_CFLAGS (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 15:06:19 +01:00
mil Velikov
ebd5dc8826 anv: remove dummy VK_DEBUG_MARKER_EXT entry points
The vkCmdDbgMarker{Begin,End} symbols are exported, yet the json does no
advertise that the driver supports the extension. Furthermore the
functions are empty stubs.

Remove those until we get a proper implementation and json notation.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 15:05:32 +01:00
Emil Velikov
49394e8d77 anv: do not export the Vulkan API
With version 1 of the Loader interface there is an internal/private symbol
(vk_icdGetInstanceProcAddr) which is used to retrieve all the API from the
Vulkan entrypoints from the ICD. Implying that exposing the Vulkan API is not
recommended.

Version 2 goes a step further explicitly forbiding the ICD from exposing Vulkan
symbols (and adding a negotiation API)

As a reference:
 - Nvidia 367.35
Missing negotiation API - version 1.
Exposes only vk_icdGetInstanceProcAddr.

 - AMD 16.30.3.306809
Have negotiation API - version 2,
Exposes vk_icdGetInstanceProcAddr.
Exposes a couple of Vulkan entry points - seems to be in violation with the spec.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 14:55:42 +01:00
Emil Velikov
1cdb6ca40b anv: automake: build with -Bsymbolic
Explicitly suggested in the Loader interface version 2 section, but it's good
idea either way. It essentially, ensures that our symbols are not interposed.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:33 +01:00
Emil Velikov
40e4fff563 anv: automake: use VISIBILITY_CFLAGS to restrict symbol visibility
Hide the internal symbols and annotate the vk_icdGetInstanceProcAddr as public
since the loader needs it (since v1 of the loader interface).

v2: Add VISIBILITY_CFLAGS to AM_CFLAGS (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:30 +01:00
Emil Velikov
b0d56f2f4f anv: remove internal 'validate' layer
Presently the layer has only a single entry point. As mentioned by Jason the
function does not validate anything that isn't checked elsewhere, thus we can
drop the whole thing.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:24 +01:00
Kenneth Graunke
3a9e6102b4 nir/search: Extend 'a@bool' to handle a couple of system values.
load_front_face and load_helper_invocation produce booleans.

On Broadwell:

total instructions in shared programs: 11638956 -> 11638011 (-0.01%)
instructions in affected programs: 115093 -> 114148 (-0.82%)
helped: 628
HURT: 14

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:27:27 -07:00
Kenneth Graunke
e8543feba7 nir/search: Fold src_is_bool()/alu_instr_is_bool() into src_is_type().
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently.  Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.

So, combine them all.  It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:27:15 -07:00
Kenneth Graunke
241870fe5b nir/search: Introduce a src_is_type() helper for 'a@type' handling.
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction.  This is rather limited - there are other cases we
can easily detect which we should handle.

Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:26:47 -07:00
Kenneth Graunke
d14dd727f4 i965: Fix barrier count shift in scalar TCS backend.
The "Barrier Count" field goes in 14:9 of m0.2.  The vec4 backend
correctly shifts by 9, but the scalar backend only shifted by 8.

It's not like this changed - I think I just made a typo when writing
the original scalar TCS backend code.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:47:00 -07:00
Kenneth Graunke
159f037755 i965: Fix execution size of scalar TCS barrier setup code.
Previously, the scalar TCS backend was generating:

mov(8)   g17<1>UD     0x00000000UD    { align1 WE_all 1Q compacted };
and(8)   g17.2<1>UD   g0.2<0,1,0>UD   0x0001e000UD  { align1 WE_all 1Q };
shl(8)   g17.2<1>UD   g17.2<8,8,1>UD  0x0000000bUD  { align1 WE_all 1Q };
or(8)    g17.2<1>UD   g17.2<8,8,1>UD  0x00008200UD  { align1 WE_all 1Q };
send(8)  null<1>UW    g17<8,8,1>UD
         gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

This is rubbish - g17.2<8,8,1>UD spans two registers, and is an illegal
region.  Not to mention it clobbers 8 channels of data when we only
wanted to touch m0.2.

Instead, we want:

mov(8)   g17<1>UD     0x00000000UD    { align1 WE_all 1Q compacted };
and(1)   g17.2<1>UD   g0.2<0,1,0>UD   0x0001e000UD  { align1 WE_all };
shl(1)   g17.2<1>UD   g17.2<0,1,0>UD  0x0000000bUD  { align1 WE_all };
or(1)    g17.2<1>UD   g17.2<0,1,0>UD  0x00008200UD  { align1 WE_all };
send(8)  null<1>UW    g17<8,8,1>UD
         gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

Using component() accomplishes this.

Fixes GL44-CTS.tessellation_shader.tessellation_shader_tc_barriers.
barrier_guarded_read_write_calls on Skylake.  Probably fixes other
barrier issues on Gen8+.

v2: Use a group(1, 0) builder so inst->exec_size is set correctly
    (thanks to Francisco Jerez for catching that it was incorrect).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-18 00:47:00 -07:00
Kenneth Graunke
9e778837ff i965: Implement the WaPreventHSTessLevelsInterference workaround.
Fixes several GL44-CTS.tessellation_shader (and GL45 and ES31) subcases:
- vertex_spacing
- tessellation_shader_point_mode.points_verification
- tessellation_shader_quads_tessellation.inner_tessellation_level_rounding

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:46:55 -07:00
Kenneth Graunke
d8971128ac nir/builder: Add bany_inequal and bany helpers.
The first simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components.  The latter implicitly compares with zero
to achieve the same semantics of GLSL's any().

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:46:04 -07:00
Kenneth Graunke
01e99cba04 mesa: Fix uf10_to_f32() scale factor in the E == 0 and M != 0 case.
GL_EXT_packed_float, 2.1.B Unsigned 10-Bit Floating-Point Numbers:

        0.0,                      if E == 0 and M == 0,
        2^-14 * (M / 32),         if E == 0 and M != 0,
        2^(E-15) * (1 + M/32),    if 0 < E < 31,
        INF,                      if E == 31 and M == 0, or
        NaN,                      if E == 31 and M != 0,

In the second case (E == 0 and M != 0), we were multiplying the mantissa
by 2^-20, when we should have been multiplying by 2^-19 (which is
2^(-14 + -5), or 2^-14 * 2^-5, or 2^-14 / 32).

The previous section defines the formula for 11-bit numbers, which is:

        2^-14 * (M / 64),         if E == 0 and M != 0,

In other words, we had accidentally copy and pasted the 11-bit code
to the 10-bit case, and neglected to change the exponent.

Fixes dEQP-GLES3.functional.pbo.renderbuffer.r11f_g11f_b10f_triangles
when run with surface dimensions of 1536x1152 or 1920x1080.

Cc: mesa-stable@lists.freedesktop.org
References: https://code.google.com/p/chrome-os-partner/issues/detail?id=56244
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
2016-08-17 17:26:11 -07:00
Tim Rowley
0ff57446e3 swr: [rasterizer core] only use Viewport/Scissors during SwrDraw* operations
Add explicit rects for:

- SwrClearRenderTarget
- SwrDiscardRect
- SwrInvalidateTiles
- SwrStoreTiles

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
6209dbf5a4 swr: [rasterizer common] reorder SWR_FORMAT_INFO
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
2a25ce7472 swr: [rasterizer core] make dirtytile list point directly to macrotilequeues
Speeds up high geometry HPC workloads.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
550503e776 swr: [rasterizer core] portability - remove use of INT64
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
d70f96fd67 swr: [rasterizer core] viewport transform disabled fix
When viewport transform is disabled (ie. screen space coords are passed
in directly), the W component should be interpreted as RHW.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
812b45d049 swr: [rasterizer core] clamp scissor rects to current tile rect
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
93fb768c7e swr: [rasterizer core] align stats structures
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
9a25987b4a swr: [rasterizer core] use AVX2 permute to simplify PaTriList
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
c7c1a03f90 swr: [rasterizer core] move some global variables to SWR_CONTEXT
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
b8c4717567 swr: [rasterizer core] change scale on VP matrix element gathers
Was 1, which led to pulling denorms for non-zero indices.
Changed to sizeof(float).

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:54 -05:00
Tim Rowley
d816c5d6ad swr: [rasterizer] implementing native AVX-512 simd16 intrinsics
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:49 -05:00
Jason Ekstrand
342756a100 i965/blorp: Use nir_alu_type for the texture data type
This lets us remove the brw_reg.h include

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
ce2a9831cc i965: brw_blorp_blit.cpp -> blorp_blit.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
934adf1c30 i965: brw_blorp_clear.cpp -> blorp_clear.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
f5fbcc3683 i965: Split brw_blorp.c/h into multiple files
This mega-commit pulls most of the i965-specific bits of blorp into the
brw_blorp.c/h files which now contain nothing but i965 wrappers around
"core blorp" calls.  The "core blorp" api is moved into blorp.h and the
internal blorp data structures are moved into blorp_priv.h.  The new file
blorp.c is created to house "core blorp" internals which are pulled from
the old brw_blorp.c

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
075cc874bb i965/blorp: Factor the guts of blorp_hiz_exec into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
9d22fd934a i965/blorp: Break the guts of do_single_blorp_clear into two helpers
The helpers are completely miptree-unaware and each fairly cleanly do a
single thing.  This does come at the downside of not doing proper debug
reporting on whether or not we're doing replicated clears.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
7cddca39c0 i965/meta_util: Convert get_fast_clear_rect to take an isl_surf
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
376ce1d26e i965/blorp/clear: Move isl_surf setup higher in the function
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
583f040fda i965/blorp: Refactor fast-clear logic a bit
This pulls the mcs allocation into the if statement where we initially
determine that we are doing a fast clear and moves the programming of
wm_inputs and figuring out the fast clear rect into it's own if statement.
The next commit will put code inbetween the two.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
457a408932 i965/blorp/clear: Stop stomping the destination format
The blorp_surface_info_init call above should set the format for us and
stomping it later does nothing whatsoever.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
a6c2091da6 i965/meta_util: Only modify the input parameters in get_fast_clear_rect
We had another inline copy of brw_meta_get_buffer_rect embedded in
get_fast_clear_rect for no good reason.  This lets us get rid of the
gl_frameuffer parameter to get_fast_clear_rect.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
f748e15735 i965/blorp: Stop calling brw_meta_get_buffer_rect
We already have an inlined version of the function slightly higher up in
do_single_blorp_clear and all calling it does is stomp the values with the
same thing.  We might as well just get rid of it.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
18aad17ce2 i965/blorp: Pull the guts of resolve_color into a miptree-agnostic helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
dff74b83e1 i965/meta_util: Convert get_resolve_rect to use ISL
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
8fccdf85ba i965/blorp: Make the guts of brw_blorp_blit_miptrees miptree-unaware
Now that we have the brw_blorp_surf struct, we can start to make bits of
blorp completely miptree-unaware.  To start things off, we split the guts
of brw_blorp_blit_miptrees into a brw_blorp_blit function which knows
nothing about miptrees.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
75deae9c90 i965/blorp: Add a new brw_blorp_surf intermediate struct
At the moment, this seems to make all of the interfaces messier rather than
clener.  However, it does provide a representation of a surface that
simultaneously contains everything and is completely unaware of miptrees.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
57664c869f i965/blorp: Use the isl_surf for more params setup
The isl_surf munging doesn't happen until fairly late in the blorp_blit
function.  We can use the isl_surf for the vast majority if not all of our
params setup.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
d8644f3eb6 i965/blorp: Do gen6 stencil offsets up-front
This keeps all of the nastyness of gen6 stencil on the i965 side of the API
line and lets us delete that nasty hand-rolled ISL-based offset path that
we were using for ALL_SLICES_AT_EACH_LOD.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
406c503396 i965/blorp: Set up HiZ surfaces up-front
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
4d86b3fa2d i964/blorp: Set up most aux surfaces up-front
This commit also adds support for an offset for aux surfaces.  In GL, this
only gets used for HiZ on SNB at the moment.  However, in Vulkan, all aux
surfaces are at a non-zero offset and that is likely to happen in GL
eventually.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
d540864730 i965/blorp: Stop using the miptree in state setup for tex/rt surfaces
This commit movies us from a miptree model to a surf+bo+offset model.  In
the GL driver, miptrees are almost always at the start of the bo so the
offset is zero but we don't want to always make that assumption.  In the
sort term, gen6 stencil and HiZ will be at an offset but, in the long term,
any Vulkan surface is liable to be at a non-zero offset.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
8b02cd44d7 i965/blorp/blit: Move format work-arounds before surface_info_init
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
20c06d2b79 i965/miptree: Add real support for HiZ
The previous HiZ support was bogus because all of get_aux_isl_surf looked
at mt->mcs_mt directly.  For HiZ buffers, you need to look at either
mt->hiz_buf or mt->hiz_buf->mt.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
dc880c99b6 isl/state: Only set clear color if aux is used
Otherwise, the clear color will get ignored.  This prevents assertion
errors if clear color is set to something invalid and aux is not used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
2684e48321 i965/miptree: Use the isl helpers for creating aux surfaces
In order for the calculations of things such as fast clear rectangles to
work, we need more details of the auxiliary surface to be correct.  In
particular, we need to be able to trust the width and height fields.
(These are not necessarily what you want coming out of the miptree.)  The
only values state setup really cares about are the row and array pitch and
those we can safely stomp from the miptree.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
d9df82f2ff isl: Add helpers for creating different types of aux surfaces
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
3c44d99653 i965/miptree: Use mcs_mt->qpitch for aux surfaces
At one point, we were doing this correctly.  It must have gotten lost in
one of the many rebases.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
67ea60db0b i965/miptree: Allow get_aux_isl_surf when there is no aux surface
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
dd46c8da31 i965/miptree: Support depth in get_isl_clear_color
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
6155d4ef56 isl/state: Add an assertion for IVB multisample array textures
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
3c75b315e1 isl: Add a #define for DEV_IS_BAYTRAIL
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
56746d04d5 i965/blorp: Remove unused fields from blorp_surface_info
The only reason why we need layer or level is that we need the z-offset for
3-D surfaces.  Let's just have the one field for that.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
1495b6315e i965/blorp: Simplify depth buffer state setup a bit
The data comes in via ISL in a format that's almost directly usable by the
hardware so we can avoid some of the conversion headache.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
d814353365 i965/blorp: Use the generic surface state path for gen8 textures
Now that the generic blorp path uses base level/layer, there's no need to
make gen8 special.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
ed432fd681 isl: Add asserts for gen8+ X/YOffset rules
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
96fa98c18e i965/blorp: Only do offset hacks for fake W-tiling and IMS
Since the dawn of time, blorp has used offsets directly to get at different
mip levels and array slices of surfaces.  This isn't really necessary since
we can just use the base level/layer provided in the surface state.  While
it may have simplified blorp's original design, we haven't been using the
blorp path for surface state on gen8 thanks to render compression and
there's really no good need for it most of the time.  This commit restricts
such surface munging to the cases of fake W-tiling and fake interleaved
multisampling.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
9f9abc8214 i965/blorp: Add a z_offset field to blorp_surface_info
The layer field is in terms of physical layers which isn't quite what the
sampler will want for 2-D MS array textures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
a9a6df807e i965/blorp: Pass the Z component into all texture operations
Multisample array surfaces on IVB don't support the minimum array element
surface attribute so it needs to come through the sampler message.  We may
as well just pass it through everything.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
7abcdfbe13 i965/blorp: Rework hiz rect alignment calculations
At the moment, the minify operation does nothing because
params.depth.view.base_level is always zero.  However, as soon as we start
using actual base miplevels and array slices, we are going to need the
minification.  Also, we only need to align the surface dimensions in the
case where we are operating on miplevel 0.  Previously, it didn't matter
because it aligned on miplevel 0 and, for all other miplevels, the miptree
code guaranteed that the level was already aligned.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
871893cda2 i965/blorp: Map 1-D render targets with DIM_LAYOUT_GEN4_2D as 2D on gen9
The sampling hardware can handle them ok.  It just looks at the tiling to
determine whether it's the new gen9 1-D layout or the old one.  The render
hardware isn't so smart.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
ecd9789368 i965/miptree: Fill out the isl_surf::usage field
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
560a92c4fd isl: Take the slice0_extent shortcut for interleaved MSAA
The shortcut works just fine for MSAA and the comment even says so.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
1e02611276 isl: Remove duplicate px->sa conversions
In all three cases, we start with width and height taken from
isl_surf::phys_slice0_extent_sa which is already in samples.  There is no
need to do the conversion and doing so gives us an incorrect value.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
603d5f7638 i965/blorp: Use the isl_view from the blorp_surface_info
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
c097160463 i965/blorp: Get rid of brw_blorp_surface_info::width/height
Instead, we manually mutate the surface size as needed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
2095f932ef i965/blorp: Move surface offset calculations into a helper
The helper does a full transformation on the surface to turn it into a new
2-D single-layer single-level surface representing the original layer and
level in memory.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
90ab43d1bb i965/blorp: Use ISL to compute image offsets
For the moment, we still call the old miptree function; we just assert that
the two are equal.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
ba88a9622d isl: Add functions for computing surface offsets in samples
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
f6c75df083 isl: Fix get_image_offset_sa_gen4_2d for multisample surfaces
The function takes a logical array layer but was assuming it was a physical
array layer.  While we'er here, we also make it not assert-fail on gen9 3-D
surfaces.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
7997f4f95b i965/blorp: Add an isl_view to blorp_surface_info
Eventually, this will be the actual view that gets passed into isl to
create the surface state.  For now, we just use it for the format and the
swizzle.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
e046a46460 i965/blorp: Move intratile offset calculations out of surface state setup
Previously we multiplied full x/y offsets, resolved tile aligned buffer
offset and intra tile offset based on that.  Now we let ISL to take into
account the msaa setting and we only multiply the resolved intra tile
offsets.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
27a58615d3 i965/blorp: Refactor interleaved multisample destination handling
We put all of the code for fake IMS together.  This requires moving a bit
of the program key setup code further down so that it gets the right values
out of the final surface.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
3c25caa318 i965/blorp: Get rid of brw_blorp_surface_info::array_layout
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
09879eff30 i965/blorp: Use isl_msaa_layout instead of intel_msaa_layout
We also remove brw_blorp_surface_info::msaa_layout.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
e2a1bdb3c5 i965/blorp: Use the ISL aux_layout for deciding whether to do an MCS fetch
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
28b0ad890c i965/blorp: Get rid of brw_blorp_surface_info::num_samples
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
aa6c058ac4 i965/blorp: Make sample count asserts a bit more lazy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
aa4117a9e4 i965/blorp: Get rid of brw_blorp_surface_info::map_stencil_as_y_tiled
Now that we're carrying around the isl_surf, we can just modify it
directly instead of passing an extra bit around.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
801189e199 i965/blorp: Remove compute_tile_offsets
We have a handy little function is ISL that does exactly the same thing.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
b82de88008 i965/blorp: Create the isl_surf up-front
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
ffeb5f67ac i965/blorp/clear: Initialize surface info after allocating an MCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
1666d029aa isl/state: Use a valid alignment for 1-D textures
The alignment we use doesn't matter (see the comment) but it should at
least be an alignment we can represent with the enums.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
0aa0b39769 i965/miptree: Remove the stencil_as_y_tiled parameter from get_tile_masks
It's only used to stomp the tiling to Y and it's only used by blorp so
there's no reason why blorp can't do it itself.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Jason Ekstrand
573f6ffd04 isl: Fix the parameter names for get_intratile_offset
It's been in elements for a while but, for whatever reason, the parameter
names in the header file never got updated.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-17 14:46:22 -07:00
Brian Paul
5de29aeef0 util: try to use SSE instructions with MSVC and 32-bit gcc
The lrint() and lrintf() functions are pretty slow and make some
texture transfers very inefficient.  This patch makes a better effort
at using those intrisics for 32-bit gcc and MSVC.

Note, this patch doesn't address the use of SSE4.1 with MSVC.

v2: get rid of the ROUND_WITH_SSE symbol, per Matt.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 12:53:20 -06:00
Brian Paul
18e6e0796a svga: fix src/dst typo in can_blit_via_copy_region_vgpu10()
The function was always returning false because of this typo.

Retested with piglit.  There's some sRGB-related blit failures, but
that seems unrelated.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2016-08-17 12:53:20 -06:00
Brian Paul
55417140cd svga: initialize a variable to silence a gcc warning
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-17 12:53:20 -06:00
Ian Romanick
607ab6d3bf glsl: Pull enum ir_expression_operation out to its own file
No change except to the copyright symbol.  The next patch will generate
this file with Python, and Unicode + Python = pure rage.

v2: Massive rebase... I guess a lot can change in a year.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 13:48:25 +01:00
Ian Romanick
de71bc9eb6 glsl: Make the generated sources build rules more like NIR
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 13:48:25 +01:00
Francesco Ansanelli
120c9c6380 mesa/st: use llabs instead of abs for long args (v2)
v2: long has 32bit on Windows (Marek)
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 14:16:29 +02:00
Marek Olšák
57a8991020 radeonsi: fix up buffer descriptor upper-bound checking
st/mesa does this too, so we're safe.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:33 +02:00
Marek Olšák
325379096f gallium: change pipe_image_view::first_element/last_element -> offset/size
This is required by OpenGL. Our hardware supports this.

Example: Bind RGBA32F with offset = 4 bytes.

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:33 +02:00
Marek Olšák
7cd256ce7e gallium: change pipe_sampler_view::first_element/last_element -> offset/size
This is required by OpenGL. Our hardware supports this.

Example: Bind RGBA32F with offset = 4 bytes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97305

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:33 +02:00
Marek Olšák
1ac23a9359 gallium/radeon: assign the highest priority to scratch; make rings second
just FYI, the kernel receives priority/4

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:29 +02:00
Marek Olšák
9009516501 gallium/winsys: re-number winsys priority flags
free 60..63, move CP_DMA up

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
95020c6dfd gallium/radeon: mark shader rings as highest-priority buffers
and rename the enum

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
e2bb24f213 gallium/radeon: set SHADER_RW_BUFFER priority for streamout buffers
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
a6b5845a0d radeonsi: use current context for DCC feedback-loop decompress, fixes Elemental
This is just a workaround. The problem is described in the code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541

v2: say that it's only between the current context and aux_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-08-17 12:24:35 +02:00
Marek Olšák
9812a50ae6 radeonsi: simplify CB_TARGET_MASK logic
we can now rely on CB_COLORn_INFO to disable empty slots.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
2d2b384066 radeonsi: don't set CB_COLOR1_INFO for dual src blending
Vulkan doesn't do this. The reason may be that CB_COLOR1_INFO.SOURCE_FORMAT
from NI was moved to SPI_SHADER_COL_FORMAT for SI.

I asked CB guys about this 2 days ago and they still haven't replied.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
e722b90bc9 radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound
All VP DX9 ports benefit from this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Marek Olšák
3de8ffe836 gallium/radeon: use unflushed fences for PIPE_QUERY_GPU_FINISHED
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 12:24:35 +02:00
Nicolai Hähnle
c5798d6314 gallium/radeon: use lp_build_alloca_undef
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:25 +02:00
Nicolai Hähnle
41001ca4bd gallivm: add lp_build_alloca_undef
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
17e88e276c gallivm: add create_builder_at_entry helper function
Reduces code duplication.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
f4204ba53d gallium/radeon: protect against out of bounds temporary array accesses
They can lead to VM faults and worse, which goes against the GL robustness
promises.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
ea283779be gallium/radeon: add radeon_llvm_bound_index for bounds checking
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
8916d1e2fa gallium/radeon: reduce alloca of temporaries based on usagemask
v2: take actual writemasks into account

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle
6bba956073 gallium/radeon: use tgsi_scan_arrays for temp arrays
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
7c2295d7ef gallium/radeon: allocate temps array info in radeon_llvm_context_init
Also, prepare for using tgsi_array_info.

This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
850c8dcc9c gallium/radeon: always do the full store in store_value_to_array
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.

This also allows us to get rid of get_alloca_for_array.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
4b150931c9 gallium/radeon: extract common getelementptr logic into get_pointer_into_array
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
dfbb8ea284 gallium/radeon: pass indirect register info into get_alloca_for_array
To have the same signature as get_array_range.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
b76aabffa2 gallium/radeon: extract common lookup code into get_temp_array function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:23 +02:00
Nicolai Hähnle
fa84296a5a gallium/radeon: clarify the comment on the array alloca heuristic
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
92b66b38c9 gallium/radeon: more descriptive names for LLVM temporaries in debug builds
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
eacfc86d83 gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing
We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
87fa7cea23 gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing
We can use the pointer stored in the temps array directly.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
eb50cbf3bd gallium/radeon: clean up emit_declaration for temporaries
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
cb9ed66cc5 st_glsl_to_tgsi: use calloc the way it's meant to be used
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:22 +02:00
Nicolai Hähnle
67c0f077a2 tgsi/scan: add tgsi_scan_arrays
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:21 +02:00
Ian Romanick
2ec3a3e151 glsl: Add missing ir_quadop_vector constant evaluation for Boolean types
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 10:52:39 +01:00
Ian Romanick
cf58e3f522 glsl: Fix typo in ir_unop_f2u implementation
This won't affect the output, but it was, technically, wrong.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 10:52:39 +01:00
Ian Romanick
8b123b08cb glsl: Fix typo in ir_unop_b2i implementation
This won't affect the output, but it was, technically, wrong.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 10:52:39 +01:00
Ian Romanick
cd8764737e glsl: Don't support integer types for operations that can't handle them
ir_unop_fract already forbade integer types in ir_validate.  ir_unop_rcp,
ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 10:52:39 +01:00
Ian Romanick
437e612bd7 glsl: Don't support ir_unop_abs or ir_unop_sign for unsigned integers
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-17 10:52:39 +01:00
Ian Romanick
cceb50e14e nir/algebraic: Optimize common array indexing sequence
Some shaders include code that looks like:

   uniform int i;
   uniform vec4 bones[...];

   foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);

CSE would do some work on this:

   x = i * 3
   foo(bones[x], bones[x + 1], bones[x + 2]);

The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like

   x = i * 3
   foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);

Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.

   x = i * 48;
   foo(bones[x], bones[x + 16], bones[x + 32]);

So, ~6 instructions becomes ~3.

Some individual shader-db results look pretty bad.  However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.

G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0

total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317

Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307

Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261

Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5

total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415

Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5

total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-17 10:52:38 +01:00
Michel Dänzer
4ac640e3d2 glx: Don't use current context in __glXSendError
There's no guarantee that there is one, and we don't need one anyway.

Fixes piglit tests:

glx@glx-fbconfig-bad
glx@glx_ext_import_context@import context, multi process
glx@glx_ext_import_context@import context, single process

Fixes: 2e3f067458 ("glx: fix error code when there is no context bound")
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-08-17 17:16:34 +09:00
Ilia Mirkin
e988999791 nv50/ir: fix bb positions after exit instructions
It's fairly rare that the BB layout puts BBs after the exit block, which
is likely the reason these issues lingered for so long.

This fixes a fraction of issues with the giant pixmark piano shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-08-16 21:56:16 -04:00
Ilia Mirkin
0b5f40b881 nv50/ir: properly clear upper bits of a bitset fill
Found by inspection. In practice, val is always == 0, so this never got
triggered.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-08-16 21:56:16 -04:00
Francisco Jerez
4d436c011f i965/fs: Estimate maximum sampler message execution size more accurately.
The current logic used to determine the execution size of sampler
messages was based on special-casing several argument and opcode
combinations, which unsurprisingly missed the possibility that some
messages could exceed the payload size limit or not depending on the
number of coordinate components present.  In particular:

 - The TXL, TXB and TEX messages (the latter on non-FS stages only)
   would attempt to use SIMD16 on Gen7+ hardware even if a shadow
   reference was present and the texture was a cubemap array, causing
   it to overflow the maximum supported sampler payload size and
   crash.

 - The TG4_OFFSET message with shadow comparison was falling back to
   SIMD8 regardless of the number of coordinate components, which is
   unnecessary when two coordinates or less are present.

Both cases have been handled incorrectly ever since cubemap arrays and
texture gather were respectively enabled (the current logic used by
the SIMD lowering pass is almost unchanged from the previous no16
fall-back logic used pre-SIMD lowering times).

Fixes the following GL4.5 conformance test on Gen7-8 (the bug also
affects Gen9+ in principle, but SKL passes the test by luck because it
manages to use the TXL_LZ message instead of TXL):

 GL45-CTS.texture_cube_map_array.sampling

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-16 16:31:59 -07:00
Francisco Jerez
61a02fb74c i965/fs: Return zero from fs_inst::components_read for non-present sources.
This makes it easier for the caller to find out how many scalar
components are actually read by the instruction.  As a bonus we no
longer need to special-case BAD_FILE in the implementation of
fs_inst::regs_read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-16 16:31:59 -07:00
Francisco Jerez
0c754d1c42 i965/fs: Lower TEX to TXL during NIR translation.
This simplifies the code slightly and will allow the SIMD lowering
pass to find out easily what the actual texturing opcode is in order
to determine the maximum execution size of texturing instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-16 16:31:59 -07:00
Rob Clark
5def00875d freedreno/a3xx: fix generic clear path
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-16 19:26:03 -04:00
Brian Paul
df2dcf6200 st/mesa: use pipe var instead of st->pipe in st_create_context_priv()
As is done in most other places in the function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:33 -06:00
Brian Paul
038b1b11fe gallium: remove unused u_clear.h file
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:33 -06:00
Brian Paul
22b8288b33 gallium/i915: inline the util_clear() code into i915_clear_blitter()
This is the only place the util_clear() function was used.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:32 -06:00
Brian Paul
66debeae9d gallium/util: minor reformatting in u_box.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:32 -06:00
Brian Paul
b6c81a780f svga: remove unused var in svga_mark_surfaces_dirty()
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-08-16 08:28:22 -06:00
Brian Paul
1e5eb79d9a svga: avoid a calloc in svga_buffer_transfer_map()
Just initialize the two other pipe_transfer fields explicitly.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:53 -06:00
Brian Paul
f934117bbb svga: don't call os_get_time() when not needed by Gallium HUD
The calls to os_get_time() were showing up higher than expected in
profiles.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:53 -06:00
Brian Paul
dcf2126f90 svga: remove unneeded memset() call in draw_vgpu10()
All three fields of the vbuffer_attrs[] array are assigned in the following
loop.  The remaining elements of the array are not used.

Tested with full Piglit run, Heaven 4.0, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
ced0dd0e95 svga: reduce looping in svga_mark_surfaces_dirty()
We don't need to loop over the max number of color buffers, just the
current number (which is usually one).

Tested with full Piglit run, Heaven 4.0, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
88efaf9878 svga: minor clean-ups in define_rasterizer_object()
Add const qualifiers, new comment.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
ce9c05a593 svga: remove incorrect buffer invalidation code
Fixes regression with team_fortress_2 trace.
This change has been in our in-house tree for some time.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
06b23f747d svga: additional comments for svga_hw_draw_state members
And re-order a few fields.

Signed-off-by: Brian Paul <brianp@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
7c5eda6f4e svga: use the sws local var to simplify some code
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-08-16 08:24:52 -06:00
Brian Paul
7b821941f6 svga: minor whitespace and code clean-ups
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-08-16 08:24:52 -06:00
Rob Clark
27f12dd8fd freedreno/a4xx: use generic clear path
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-16 09:21:13 -04:00
Rob Clark
f77e59e76c freedreno/a3xx: use generic clear path
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-16 09:21:13 -04:00
Rob Clark
a8e6734a83 freedreno: support for using generic clear path
Since clears are more or less just normal draws, there isn't that much
benefit in having hand-rolled clear path.  Add support to use u_blitter
instead if gen specific backend doesn't implement ctx->clear().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-16 09:21:13 -04:00
Rob Clark
142dd7b9c0 gallium/u_blitter: split out a helper for common clear state
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Rob Clark
2b2f436c69 gallium/u_blitter: add helper to save FS const buffer state
Not (currently) state that is overwridden by u_blitter itself, but
drivers with custom blit/clear which are reusing part of the u_blitter
infrastructure will use it.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Rob Clark
433e12fea8 gallium/u_blitter: export some functions
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Nicolas Boichat
78e3cea419 egl/dri2: dri2_make_current: Release previous context's display
eglMakeCurrent can also be used to change the active display. In that
case, we need to decrement ref_count of the previous display (possibly
destroying it), and increment it on the next display.

Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
we only need to test if old_ctx is non-NULL.

v2: Save the old display before destroying the context.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Tested-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
2016-08-16 17:30:35 +09:00
Nayan Deshmukh
09dff7ae2e st/vdpau: change the order in which filters are applied(v3)
Apply the median and matrix filter before the compostioning
we apply the deinterlacing first to avoid the extra overhead
in processing the past and the future surfaces in deinterlacing.

v2: apply the filters on all the surfaces (Christian)
v3: use get_sampler_view_planes() instead of
    get_sampler_view_components() and iterate over
    VL_MAX_SURFACES (Christian)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-16 10:07:35 +02:00
Kenneth Graunke
1f47f78fc3 glcpp: Update tests for new #undef of built-in macro rules.
Ian recently changed the preprocessor to allow this in most GLSL
versions, but not GLSL ES 3.00+.  This patch converts the existing
test that expects a failure to a #version 300 es shader, and adds
a #version 110 shader to make sure that it's allowed.

Fixes 'make check'.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97307
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2016-08-15 22:55:34 -07:00
Dave Airlie
c2f2252037 anv: fix writemask on blit fragment shader.
I'm not sure if anything even uses this, but I found this on radv, so
just fix it on anv for consistency.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-08-16 10:29:44 +10:00
Nicolas Boichat
c0580f6a38 egl/android: Set dpy->DriverData to NULL on error
Avoid use-after-free on error.

Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:30 +01:00
Nicolas Boichat
a9e8fb7397 egl/drm: Set disp->DriverData to NULL on error
Avoid use-after-free on error.

Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:30 +01:00
Nicolas Boichat
0e67d86540 egl/surfaceless: Set disp->DriverData to NULL on error
Avoid use-after-free on error.

Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:30 +01:00
Nicolas Boichat
48fd952f28 egl/wayland: Set disp->DriverData to NULL on error
Avoid use-after-free, fix spec@egl_khr_fence_sync@conformance.

Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:30 +01:00
Jan Ziak
769ac1ec78 egl/x11: avoid using freed memory if dri2 init fails
Found with valgrind:

==4841== Invalid read of size 4
==4841==    at 0x56BDC80: dri2_initialize (egl_dri2.c:783)
==4841==    by 0x56BAFE5: _eglMatchAndInitialize (egldriver.c:261)
==4841==    by 0x56BB15E: _eglMatchDriver (egldriver.c:295)
==4841==    by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841==    by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x402E59: main
==4841==  Address 0x6a05824 is 148 bytes inside a block of size 480 free'd
==4841==    at 0x4C2B680: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4841==    by 0x56C2AAE: dri2_initialize_x11_swrast (platform_x11.c:1233)
==4841==    by 0x56C2AAE: dri2_initialize_x11 (platform_x11.c:1493)
==4841==    by 0x56BDCEB: dri2_initialize (egl_dri2.c:805)
==4841==    by 0x56BAFAF: _eglMatchAndInitialize (egldriver.c:261)
==4841==    by 0x56BB0C9: _eglMatchDriver (egldriver.c:292)
==4841==    by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841==    by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x402E59: main
==4841==  Block was alloc'd at
==4841==    at 0x4C2A868: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4841==    by 0x56C2A47: dri2_initialize_x11_swrast (platform_x11.c:1171)
==4841==    by 0x56C2A47: dri2_initialize_x11 (platform_x11.c:1493)
==4841==    by 0x56BDCEB: dri2_initialize (egl_dri2.c:805)
==4841==    by 0x56BAFAF: _eglMatchAndInitialize (egldriver.c:261)
==4841==    by 0x56BB0C9: _eglMatchDriver (egldriver.c:292)
==4841==    by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841==    by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841==    by 0x402E59: main

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:29 +01:00
Emil Velikov
6b4b2a4dd6 anv: add genX_multisample.h to the sources list(s).
Otherwise it won't end up in the release tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 19:00:29 +01:00
Kevin Strasser
71258e9462 anv/x11: Add support for Xlib platform
Some applications continue to use the Xlib client library and expect that
VK_KHR_xlib_surface will be available in the driver. Service these
applications by converting the Display pointer to xcb_connection_t and use
the existing xcb code in the driver.

Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-15 09:47:06 -07:00
Tapani Pälli
5d9b50e596 glx: apple specific occurences of dummyContext check
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Cc: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
2016-08-15 09:24:10 +03:00
Bernard Kilarski
2e3f067458 glx: fix error code when there is no context bound
v2: change all related NULL checks to check against dummyContext
v3: really check for dummyContext *only* when ctx was from
    __glXGetCurrentContext
v4: cover more checks, add dummyBuffer, dummyVtable (Emil)

Signed-off-by: Bernard Kilarski <bernard.r.kilarski@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-08-15 09:24:10 +03:00
Mathias Fröhlich
312ece9cd7 mesa: Remove duplicate include.
In api_validate.c stdbool.h was included twice.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-15 07:10:39 +02:00
Mathias Fröhlich
84984b9986 vbo: Remove always true return from vbo_bind_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-15 07:10:39 +02:00
Mathias Fröhlich
72f1566f90 mesa: Move check for vbo mapping into api_validate.c.
Instead of checking for mapped buffers in vbo_bind_arrays
do this check in api_validate.c. This additionally
enables printing the draw calls name into the error
string.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-15 07:10:39 +02:00
Mathias Fröhlich
b7b0c51f1f mesa: Move _mesa_all_buffers_are_unmapped to arrayobj.c.
Move the function to check if all vao buffers are
unmapped into the vao implementation file.
Rename the function to _mesa_all_buffers_are_unmapped.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-15 07:10:39 +02:00
Mathias Fröhlich
c17cf1c8f5 vbo: Array draw must not care about glBegin/glEnd vbo mapping.
In array draw do not check if the vertex buffer object that
is used to implement immediate mode glBegin/glEnd is mapped.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-15 07:10:39 +02:00
Ilia Mirkin
5c1ccd8053 nv50,nvc0: fix depth range when halfz is enabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-08-14 17:41:49 -04:00
Ilia Mirkin
c85b7f0e87 gallium/util: add helper to compute zmin/zmax for a viewport state
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-08-14 17:41:33 -04:00
Ilia Mirkin
68b64f32e8 vbo: allow DrawElementsBaseVertex in display lists
Looks like it was missed originally. The multi version is there already.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97331
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2016-08-14 12:06:51 -04:00
Rob Clark
561fd226d4 freedreno/a3xx+a4xx: move common VBOs to fd_context
These are the same for a3xx and later.  (a2xx could probably use them
too, but due to limited hw support and ancient downstream kernels, it
isn't so easy to test.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-13 13:59:03 -04:00
francians@gmail.com
a49fb4ab2d freedreno/a2xx: add missing casts to silence notices
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-13 09:37:41 -04:00
Rob Clark
78ba262d00 freedreno/ir3: fix issue with emit_tex()
For various tex fetch instructions, coord's get fixed up in different
ways.  But modifying the array returned from get_src() has side-effects
if the same SSA src is used again.. the later instruction will see the
previous fixups.

Fix this, and const'ify things to prevent this sort of mistake in the
future.

Noticed by Varad when adding support for txf_ms.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-08-13 09:33:47 -04:00
Ilia Mirkin
a32c87f74b glsl: emit a specific error when ast_*_assign changes type
For regular ast_add, we can implicitly change either a or b's type.
However in an assignment situation, the type of the lvalue is fixed. So
if the implicit conversion logic decides to change it, it means that the
rhs's type could not be converted to the lhs type.

Emit a specific error for this rather than the rather mysterious "is not
an lvalue" error that results from having a i2f or other operation as
the lvalue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96729
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-12 22:45:20 -04:00
Ilia Mirkin
d816a51b81 st/mesa: provide GL_OES_copy_image support by caching the original ETC data
The additional provision of GL_OES_copy_image is that it work for ETC.
However many desktop GPUs don't have native ETC support, so st/mesa does
the decoding by hand. Instead of discarding the compressed data, keep it
around in CPU memory. Use it when performing image copies.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-08-12 20:21:08 -04:00
Ilia Mirkin
7727e6f67c st/mesa: refactor duplicated etc fallback checks
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-12 20:21:08 -04:00
Ilia Mirkin
1baae00089 glsl: look for frag data bindings with [0] tacked onto the end for arrays
The GL spec is very unclear on this point. Apparently this is discussed
without resolution in the closed Khronos bugtracker at
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7829 . The
recommendation is to allow dropping the [0] for looking up the bindings.

The approach taken in this patch is to instead tack on [0]'s for each
arrayness level of the output's type, and doing the lookup again. That
way, for

out vec4 foo[2][2][2]

we will end up looking for bindings for foo, foo[0], foo[0][0], and
foo[0][0][0], in that order of preference.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-12 20:21:08 -04:00
Lionel Landwerlin
0294dd00cc anv: pipeline: gen7: fix assert in debug mode
SampleMask is only 8bits long on gen7.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97278

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-08-12 17:03:48 -07:00
Haixia Shi
8c56ff643b mesa: change state query return value for RGB565
The GL_BGR and GL_UNSIGNED_SHORT_5_6_5_REV are not defined anywhere in
OpenGL ES 3.2 (or earlier) specification, and there are no known extensions
in the Khronos registry that would add these enums as valid responses for
glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_TYPE) and
glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT) queries.

Note that this patch does not change the bit layout returned by the query. As
defined by the GL spec, the bit layout of GL_RGB + GL_UNSIGNED_SHORT_5_6_5 and
GL_BGR + GL_UNSIGNED_SHORT_5_6_5_REV are identical.

TEST=dEQP-GLES3.functional.state_query.integers.*

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Change-Id: I81bbc8ccdc7e125edaeae443baf6fa8fdefcc6b6
2016-08-12 15:34:09 -07:00
Anuj Phogat
0bf531aee6 anv/device: Add limits for InterpolationOffset
Fixes the vulkan cts regression in test dEQP-VK.api.info.device.properties

Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-12 10:45:02 -07:00
Anuj Phogat
7f6136d7db i965: Change 8X MSAA sample mapping
This is required following the change in 8X sample positions.
Fixes the recently modified multisample-scaled-blit piglit tests.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-12 10:45:02 -07:00
Anuj Phogat
fb1bc5007d i965: Change 8x multisample positions
There are no standard sample positions defined in OpenGL and OpenGL
ES specs. Implementations have the freedom to pick the positions
which give plausible results. But the Vulkan 1.0 spec does define
standard sample positions for different sample counts. Defined
positions in Vulkan for all the sample counts except 8X match with
the positions we set in i965. We have an upcoming plan to share the
blorp code between OpenGL and Vulkan driver in near future. Keeping
the 8X sample positions same on both the drivers will help us move
in that direction.

Here is an argument by Neil Roberts (from commit 20250e85) against
any advantage of current 8X sample positions over the new ones:

"The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern."

Observed no regressions in jenkins testing.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-12 10:45:02 -07:00
Anuj Phogat
1fe36d849c anv: Use macro to avoid code duplication for sample positions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-12 10:45:02 -07:00
Marek Olšák
317e136ef0 st/mesa: BufferData should flag NewDriverState
because NewDriverState is filtered depending on active shader states,
while st->dirty isn't.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:50:01 +02:00
Marek Olšák
085aa7f91e st/mesa: don't update atomic, SSBO, UBO and TBO states that have no effect
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:50:01 +02:00
Marek Olšák
ac032d800e st/mesa: _NEW_TEXTURE & CONSTANTS shouldn't flag states that aren't used
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:50:01 +02:00
Marek Olšák
c323d5b809 st/mesa: when changing shaders, only dirty states that are affected by them
This reduces the amount of state processing that has no effect.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:50:01 +02:00
Marek Olšák
8c1775c14c st/mesa: determine states used or affected by shaders at compile time
At compile time, each shader determines which ST_NEW flags should be set
at shader bind time.

This just sets the new field for all shaders. The next commit will use it.

v2: small code unification

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-08-12 18:49:24 +02:00
Marek Olšák
a7d33315a7 st/mesa: remove TES/TCS/GS state dirtying optimization
This will be replaced with a better mechanism.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:47:24 +02:00
Marek Olšák
0be30ea1a8 st/mesa: don't update clip state on VS changes if it has no effect
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:47:24 +02:00
Marek Olšák
412bd7360c st/mesa: don't update clip state if it has no effect
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-12 18:47:24 +02:00
Chad Versace
dd93cbc894 mesa: Document that _mesa_enum_to_string() returns non-null (v2)
It always returns non-null, even if the number is an invalid enum.

Cc: Haixia Shi <hshi@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Change-Id: I26e8843c96130be972e66f48a49e362442e1bf97
2016-08-12 09:09:55 -07:00
Kenneth Graunke
f9f462936a glsl: Fix invariant matching in GLSL 4.30 and GLSL ES 1.00.
Old languages (GLSL <= 4.20 and GLSL ES 1.00) require "invariant"
to be specified on both inputs and outputs, and match when linking.

New languages only allow outputs to be qualified as "invariant"
and remove the "invariant must match" restriction when linking
varyings (because no input can have that qualifier).

Commit 426a50e208 introduced the new
behavior for ES 3.00.  It also removed the "must match" restriction
for ES 1.00 shaders, which I believe is incorrect.  This patch adds
that back, as well as making 4.30+ follow the new rules.

Thanks to Qiankun Miao for noticing this discrepancy.

Fixes a WebGL 2.0 conformance test when run in Chromium:
https://www.khronos.org/registry/webgl/sdk/tests/deqp/data/gles3/shaders/qualification_order.html?webglVersion=2

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96971
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-11 23:56:53 -07:00
Kenneth Graunke
0ed316360f glsl: Tidy stream handling in merge_qualifier().
The previous commit fixed xfb_buffer handling, which was largely copy
and pasted from the stream handling.  The difference is that stream
was set in input_layout_mask, so it worked.

However, that's totally rubbish: stream is only valid on geometry shader
outputs.  Presumably this was to hack around inout.  Instead, apply the
solution I used in the previous fix.

Really, we just need to separate shader interface and parameter
qualifier handling so this isn't a mess, but this patch at least
tidies it slightly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-11 23:56:48 -07:00
Kenneth Graunke
dffa371665 glsl: Fix inout qualifier handling in GLSL 4.40.
inout variables have q.in and q.out set.  We were trying to set
xfb_buffer = 1 for shader output variables (and inadvertantly setting
it on inout parameters, too).  But input_layout_mask doesn't have
xfb_buffer set, so it was seen as in invalid input qualifier.

This meant that all 'inout' parameters were broken.

Caught by running a WebGL conformance test in Chromium:
https://www.khronos.org/registry/webgl/sdk/tests/deqp/data/gles3/shaders/qualification_order.html?webglVersion=2

Fixes Piglit's tests/spec/glsl-4.40/compiler/inout-parameter-qualifier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-11 23:56:40 -07:00
Miklós Máté
17f1c49b9a swrast: fix active attribs with atifragshader
Only include the ones that can be used by the shader.

This fixes texture coordinates, which were completely wrong,
because WPOS was included in the list of attribs. It also
increases performance noticeably.

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-11 08:29:23 -06:00
Indrajit Das
8074c6b6ea st/omx/dec/h264: pass default scaling lists in raster format
Tested-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-08-11 16:02:28 +02:00
Jose Fonseca
06b63f1f43 appveyor: Force Visual Studio 2013 image.
It seems the default build image is now Visual Studio 2015, and Visual
Studio 2013 is not installed.
2016-08-11 14:39:39 +01:00
Jose Fonseca
16627fc87d appveyor: Install pywin32 extensions.
AppVeyor build images seem to have been upgraded to Python 2.7.12, but
no longer have pywin32 pre-installed.
2016-08-11 14:39:39 +01:00
Timothy Arceri
33b3815773 glsl/tests: fix segfault in uniform initializer test
Caused by 549222f5

Tested-by: Aaron Watry <awatry@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97286
2016-08-11 14:57:18 +10:00
Ian Romanick
50b49d242d glcpp: Only disallow #undef of pre-defined macros on GLSL ES >= 3.00 shaders
Section 3.4 (Preprocessor) of the GLSL ES 3.00 spec says:

   It is an error to undefine or to redefine a built-in (pre-defined)
   macro name.

The GLSL ES 1.00 spec does not contain this text.

Section 3.3 (Preprocessor) of the GLSL 1.30 spec says:

   #define and #undef functionality are defined as is standard for C++
   preprocessors for macro definitions both with and without macro
   parameters.

At least as far as I can tell GCC allow '#undef __FILE__'.  Furthermore,
there are desktop OpenGL conformance tests that expect '#undef
__VERSION__' and '#undef GL_core_profile' to work.

Fixes:

    GL45-CTS.shaders.preprocessor.definitions.undefine_version_vertex
    GL45-CTS.shaders.preprocessor.definitions.undefine_version_fragment
    GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_vertex
    GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_fragment

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-08-10 16:42:02 -07:00
Ian Romanick
eda6349346 glcpp: Track the actual version instead of just the version_resolved flag
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-08-10 16:42:02 -07:00
Timothy Arceri
30e5ff7067 glsl: remove remaining tabs in link_uniform_initializers.cpp
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-11 08:33:38 +10:00
Timothy Arceri
549222f5f8 glsl: use UniformHash to find storage location
There is no need to be looping over all the uniforms.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-11 08:33:30 +10:00
Timothy Arceri
82e153daff glsl: remove dead builtins before assigning varying locations
Builtins already have locations assigned so this shouldn't
change anything. We want to call it earlier so we can tranform
GLSL IR to NIR earlier.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-11 08:33:21 +10:00
Timothy Arceri
588702cc41 glsl: split out varying and uniform linking code
Here a new function link_varyings_and_uniforms() is created this
should help make it easier to follow the code in link_shader()
which was getting very large.

Note the end of the new function contains a for loop with some
lowering calls that currently don't seem related to varyings or
uniforms but they are a dependancy for converting to NIR ealier
so we move things here now to keep things easy to follow.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-11 08:33:12 +10:00
Jason Ekstrand
4c3a6b07e2 i965/vec4: Make opt_vector_float reset at the top of each block
The pass isn't really control-flow aware and you can get into case where it
tries to combine instructions from different blocks.  This can actually
lead to an assertion failure when removing unneeded instructions if part of
the vector is set in one block and part in another.  This prevents
regressions in the next commit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-10 15:19:55 -07:00
Eric Anholt
ac6966360f mesa: Use a temporary set to track whether we've added a resource yet.
Saves another .1s on servo.trace.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-10 12:27:22 -07:00
Eric Anholt
ee02a5e330 prog_hash_table: Convert to using util/hash_table.h.
Improves glretrace -b servo.trace (a trace of Mozilla's servo rendering
engine booting, rendering a page, and exiting) from 1.8s to 1.1s.  It uses
a large uniform array of structs, making a huge number of separate program
resources, and the fixed-size hash table was killing it.  Given how many
times we've improved performance by swapping the hash table to
util/hash_table.h, just do it once and for all.

This just rebases the old hash table API on top of util/, for minimal
diff.  Cleaning things up is left for later, particularly because I want
to fix up the new hash table API a little bit.

v2: Add UNUSED to the now-unused parameter.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-10 12:27:22 -07:00
Eric Anholt
91945f9e91 prog_hash_table: Convert compare funcs to match util/hash_table.h.
I'm going to replace this hash table with util/hash_table.h, and the first
step is to compare things the same way.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-10 12:27:22 -07:00
Eric Anholt
60f1b436b9 nir: Drop an unused program/hash_table.h include.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-10 12:27:22 -07:00
Tim Rowley
6198160250 swr: [rasterizer core] unused variable warning fixes
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:48 -05:00
Tim Rowley
9aa75e5d46 swr: [rasterizer jitter] add core string to JitManager
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:42 -05:00
Tim Rowley
b311bdf92d swr: [rasterizer core] fix OOB check of viewport indices
Use correct comparison intrinsic for OOB check of viewport indices.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:36 -05:00
Tim Rowley
2eae02f77c swr: [rasterizer common] add linux definition for InterlockedAdd64
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:22 -05:00
Tim Rowley
e8b35a2321 swr: [rasterizer jitter] add VMASKSTOREPS intrinsic
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:16 -05:00
Tim Rowley
3393279fc9 swr: [rasterizer jitter] add mask support for odd format fetch
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:10 -05:00
Tim Rowley
92621ac5d5 swr: [rasterizer core] routing of viewport indexes through frontend
Viewport transform performed based on per-prim viewport index if available.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:09:00 -05:00
Tim Rowley
4e8763cb09 swr: [rasterizer core] split FE and BE stats
Separated FE stats out into its own structure.  There are 17 FE vs 3 BE
stat fields.  Since there is only one FE thread per DC then we don't have
to loop over all threads and sum up FE stats over all the worker threads.
This also reduces size of DC since we only need to store one copy of the
FE stats and not one per worker.  Finally, we can use the new FE callback
mechanism to update these.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:51 -05:00
Tim Rowley
f833b694cd swr: [rasterizer core] remove all old stats code
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:45 -05:00
Tim Rowley
ad153189ec swr: [rasterizer core] viewport array support
Change viewport matrix storage from AOS to SOA to support viewport arrays.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:40 -05:00
Tim Rowley
d86e2487a0 swr: [rasterizer jitter] fetch support for offsetting VertexID
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:33 -05:00
Tim Rowley
6625fd08db swr: [rasterizer core] fundamentally change how stats work
Add a per draw stats callback to update driver stats.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:23 -05:00
Tim Rowley
047493c198 swr: [rasterizer core] add rasterizerSampleCount to PS context
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:17 -05:00
Tim Rowley
a83beb936e swr: [rasterizer core] remove cygwin threads.cpp stubs
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:11 -05:00
Tim Rowley
29e1c4a8a9 swr: [rasterizer core] allow override of KNOB thread settings
- Remove HYPERTHREADED_FE support
- Add threading info as optional data passed to SwrCreateContext.
  If supplied this data will override any KNOB thread settings.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:08:05 -05:00
Tim Rowley
e0c10306f5 swr: [rasterizer core] add SwrWaitForIdleFE
This is a blocking call that waits until all FE work is complete.
This is useful for waiting for FE work to complete such as for streamout.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:07:59 -05:00
Tim Rowley
8dfaf249cc swr: [rasterizer core] change threadsDone to be a 32-bit value.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:07:53 -05:00
Tim Rowley
6624e01114 swr: [rasterizer core] update trivial accept test conditions
enable/disable raster tile trivial accept test based on scissor enable trait.
Can be optimized further.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:07:47 -05:00
Tim Rowley
7cf187d08a swr: [rasterizer core] improve implementation for SoWriteOffset
1. SoWriteOffset is no longer treated as a stat
2. Added callback from core to update streamout write offset

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:07:40 -05:00
Tim Rowley
8d3b20135e swr: [rasterizer common] make disabled asserts always print (but not break)
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-10 11:07:00 -05:00
Leo Liu
6575ebdc45 vl/rbsp: add a check for emulation prevention three byte
This is the case when the "00 00 03" is very close to the beginning of
nal unit header

v2: move the check to rbsp init

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-10 09:52:44 -04:00
Ilia Mirkin
bc5df3b321 Re-apply "glsl: don't try to lower non-gl builtins as if they were gl_FragData"
If a shader has an output array, it will get treated as though it were
gl_FragData and rewritten into gl_out_FragData instances. We only want
this to happen on the actual gl_FragData and not everything else.

This is a small part of the problem pointed out by the below bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-10 15:43:36 +02:00
Marek Olšák
9c63fd9056 radeonsi: set CB_COLORn_INFO.ROUND_MODE
just do what the register spec says

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-10 15:43:36 +02:00
Marek Olšák
667ad9fa3e radeonsi: set CB_COLORn_INFO.SIMPLE_FLOAT
This can help enable some blend optimizations (see the register spec).
Vulkan always sets this.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-10 15:43:36 +02:00
Marek Olšák
36057ff12a radeonsi: disallow MIN/MAX blend equations for dual source blending
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-10 15:43:36 +02:00
Marek Olšák
947e0614d0 radeonsi: only set dual source blending for MRT0
This is the proper fix for Overlord and Witcher 2 hangs.

The hang condition is that 1 app must write to MRT0 and MRT1 from a pixel
shader while MRT1 is disabled in CB_TARGET_MASK (does this generate
unflushable pixel quads? I don't know), and another app (e.g. Glamor)
must enable dual source blending in both MRT0 and MRT1. The hw gets
confused, which leads to corruption and hangs.

Cc: 12.0 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-10 15:43:36 +02:00
Miklós Máté
88c2fc6b2d st/mesa: in ATI fs don't assume TEMP0=REG0
The temporaries are allocated dynamically.

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-08-10 15:03:58 +02:00
Trevor Davenport
9a4d5db4d2 st/nine: Fix invalid attempt to use indirect draws.
Since commit 6d7177f01b, radeonsi
would take a different path if info->indirect_params was not
initialized properly.  Nine was not initializating this field.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-08-10 15:02:20 +02:00
Mathias Fröhlich
0ce5ec8ece util: Use win32 intrinsics for util_last_bit if present.
v2: Split into two patches.
v3: Fix off by one problem.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-08-10 09:30:07 +02:00
Marek Olšák
3f100b77f9 gallium/radeon: use unflushed fences for deferred flushes (v2)
+23% Bioshock Infinite performance.

v2: - use the new fence_finish interface
    - allow deferred fences with multiple contexts
    - clear the ctx pointer after a deferred flush

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
1cc95a1255 st/mesa: set the ctx parameter of fence_finish
for deferred flushes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
54272e18a6 gallium: add a pipe_context parameter to fence_finish
required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
the context

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
c6043e7d54 st/mesa: use PIPE_USAGE_STREAM for GL_CLIENT_STORAGE_BIT without READ_BIT (v2)
v2: keep STAGING for GL_MAP_READ_BIT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
33a9b4e8a1 gallium/radeon: add HUD queries for mapped VRAM/GTT
mainly for monitoring visible VRAM congestion

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
645d395d9a winsys/radeon: track the amount of mapped memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
1e04483c22 winsys/amdgpu: track the amount of mapped memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
8276776e64 winsys/amdgpu: don't try to unmap userptr buffers
no app calls this AFAIK

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
ef836c0d04 gallium/radeon: increase the size of the renderer string
Mine is longer than 64 bytes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
739d526b07 gallium/radeon: implement ARB_clear_texture (v3)
Some ideas copied from Jakob Sinclair's implementation, but the color
clearing is completely different.

v2: remove leftover code, disable conditional rendering
    disable render condition cleanly

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
7df15389af gallium/radeon: handle render_condition_enable for clear_rt/ds
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:10:21 +02:00
Marek Olšák
a909210131 gallium: add render_condition_enable param to clear_render_target/depth_stencil
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:10:21 +02:00
Haixia Shi
a7c6993a33 egl: android: query native window default width and height (v2)
On android platform, the width and height of a native window surface may
be updated after initialization. It is therefore necessary to query android
framework for the current width and height.

v2: remove Android specific #ifdef's and just implement the fallback directly
if the platform query_surface() callback is not provided.

TEST=dEQP-EGL.functional.resize.surface_size#* on cyan-cheets

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> (v1)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I673f7d2f1d90c3bf572b30f63da537f2cae1496e
2016-08-09 15:49:28 -07:00
Anuj Phogat
c4cd0e8ecd anv/device: Enable sample shading on gen7+
Passes all 30 min_sample_shading tests in vulkan cts.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-09 14:45:25 -07:00
Anuj Phogat
f16295a198 anv/gen7_pipeline: Set multisample state using shared function
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-09 14:45:25 -07:00
Anuj Phogat
2ef5063ad7 anv/pipeline: Add sample locations for gen7-7.5
V1: Add multisample positions (Nanley)
V2: Fix 8x sample positions to match OpenGL (Anuj)
V3: Vulkan has standard sample locations. They need not be same as
    in OpenGL. (Anuj)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-09 14:45:25 -07:00
Anuj Phogat
dc49dd7f10 anv/pipeline: Move emit_ms_state() to genX_pipeline_util.h
This will help sharing multisample state setting code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-09 14:45:25 -07:00
Mathias Fröhlich
aa920736fe gallium: Add c99_compat.h to u_bitcast.h
We need this for 'inline'.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-09 21:20:56 +02:00
Mathias Fröhlich
027cbf00f2 util: Move _mesa_fsl/util_last_bit into util/bitscan.h
As requested with the initial creation of util/bitscan.h
now move other bitscan related functions into util.

v2: Split into two patches.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-09 21:20:46 +02:00
Nicolai Hähnle
e4cb3af524 radeonsi: enable multi-draw related pipe caps
This enables GL_shader_draw_parameters and GL_ARB_indirect_parameters as well
as a properly accelerated implementation of GL_ARB_multi_draw_indirect.

Enabling the feature requires a sufficiently uptodate firmware -- those have
already been released a long time ago, although this does mean that the
feature only works with the amdgpu kernel module, since the radeon module
doesn't have a way to query the firmware version.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
6d7177f01b radeonsi: program additional multi draw parameters
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
b6c71d37c7 radeonsi: program the DRAWID SGPR
Note that for indirect draws, the new MULTI firmware packets are required.

There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
8dbf2a8570 radeonsi: add DRAWID parameter to vertex shaders
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:04 +02:00
Nicolai Hähnle
febb5dbf72 radeonsi: wire up TGSI_SEMANTIC_BASEINSTANCE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Nicolai Hähnle
d34292a77f radeonsi: remove an incorrect assertion
Byte indices don't need any alignment, so remove this assertion (it got moved
into a path where a piglit test hit it during the refactoring of
commit 64ff23a58c).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Nicolai Hähnle
2852dedaa0 radeonsi: flush TC L2 cache for indirect draw data
This fixes a bug when indirect draw data is generated by transform
feedback.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Nicolai Hähnle
76c4a3b567 radeonsi/sid: add additional bits for the DRAW_(INDEX)_INDIRECT_MULTI packets
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-09 15:56:03 +02:00
Brian Paul
60dc36a680 st/mesa: define ST_NEW_ flags as uint64_t values, not enums
MSVC doesn't support 64-bit enum values, at least not with C code.
The compiler was warning:

c:\users\brian\projects\mesa\src\mesa\state_tracker\st_atom_list.h(43) : warning
 C4309: 'initializing' : truncation of constant value
c:\users\brian\projects\mesa\src\mesa\state_tracker\st_atom_list.h(44) : warning
 C4309: 'initializing' : truncation of constant value
...

And at runtime we crashed since the high 32-bits of the 'dirty' bitmask
was always 0xffffffff and the 32+u_bit_scan() index went out of bounds of
the atoms[] array.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-09 07:50:18 -06:00
Miklós Máté
d9519c6f06 mesa: simplify ff fs generator a bit
Literally.

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-09 07:46:37 -06:00
Marek Olšák
06b2fd04f6 ddebug: dump driver states and shaders for apitrace calls
I think this was an oversight when the PIPE_DUMP flags were added.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-09 15:35:42 +02:00
Timothy Arceri
8c4d9afb7e nir: make use of nir_cf_list_extract() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-09 13:21:30 +10:00
Matt Turner
b1d9c742e9 nir: Always print non-identity swizzles.
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):

   vec3 ssa_52 = fadd ssa_51, ssa_51
   vec1 ssa_53 = flog2 ssa_52
   vec1 ssa_54 = flog2 ssa_52.y
   vec1 ssa_55 = flog2 ssa_52.z

But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:

   vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
   vec1 ssa_53 = flog2 ssa_52.x
   vec1 ssa_54 = flog2 ssa_52.y
   vec1 ssa_55 = flog2 ssa_52.z

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-08 17:52:35 -07:00
Lionel Landwerlin
8cde4ddbce anv/pipeline/gen7: Set multisample modes
Fixes the following failures :

dEQP-VK.api.copy_and_blit.resolve_image.whole_4_bit
dEQP-VK.api.copy_and_blit.resolve_image.whole_8_bit
dEQP-VK.api.copy_and_blit.resolve_image.partial_4_bit
dEQP-VK.api.copy_and_blit.resolve_image.partial_8_bit
dEQP-VK.api.copy_and_blit.resolve_image.with_regions_4_bit
dEQP-VK.api.copy_and_blit.resolve_image.with_regions_8_bit

Tested on IVB/HSW

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-08 14:44:25 -07:00
Lionel Landwerlin
a3c472a2ec anv/pipeline: rename info to rs_info in emit_rs_state
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-08 14:44:25 -07:00
Marek Olšák
1ebf3c4b67 Revert "glsl: don't try to lower non-gl builtins as if they were gl_FragData"
This reverts commit a37e46323c.

It broke the game Overlord such that it hung a GCN GNU. While I don't know
how the hang happened because of its randomness and gfx corruption precedes
it, many of the shaders contain this:

out vec4 FragData[gl_MaxDrawBuffers];
2016-08-08 23:24:20 +02:00
Tomasz Figa
3723e9826f egl/android: Add support for YV12 pixel format (v2)
This patch adds support for YV12 pixel format to the Android platform
backend. Only creating EGL images is supported, it is not added to the
list of available visuals.

v2: Use const array defined just for YV12 instead of trying to be overly
    generic.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I4aeb2d67a95c5cdd10b530c549b23146c8f0b983
2016-08-08 14:18:38 -07:00
Kenneth Graunke
3190c7ee97 st/mesa: Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules.
OpenGL 4.4 specifies that BlitFramebuffer should perform sRGB encode
and decode like ES 3.x does, but only when GL_FRAMEBUFFER_SRGB is
enabled.  This is technically incompatible in certain cases, but is
more consistent across GL, ES, and WebGL, and more flexible.

The NVIDIA 367.35 drivers appear to follow this behavior.

For the awful spec analysis, please read Piglit's
tests/spec/arb_framebuffer_srgb/blit.c, which explains the differences
between GL 4.1, 4.2, 4.3 (2012), 4.3 (2013), and 4.4, and why this
is the right rule to implement.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-08 14:04:18 -07:00
Kenneth Graunke
f6dc71483a meta: Make Meta's BlitFramebuffer() follow the GL 4.4 sRGB rules.
Just avoid whacking GL_FRAMEBUFFER_SRGB altogether, so we respect
the application's setting.  This appears to work.

v2: Update one more comment (requested by Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 14:04:01 -07:00
Kenneth Graunke
ad32dcf630 i965: Make BLORP's BlitFramebuffer follow the GL 4.4 sRGB rules.
OpenGL 4.4 specifies that BlitFramebuffer should perform sRGB encode
and decode like ES 3.x does, but only when GL_FRAMEBUFFER_SRGB is
enabled.  This is technically incompatible in certain cases, but is
more consistent across GL, ES, and WebGL, and more flexible.

The NVIDIA 367.35 drivers appear to follow this behavior.

For the awful spec analysis, please read Piglit's
tests/spec/arb_framebuffer_srgb/blit.c, which explains the differences
between GL 4.1, 4.2, 4.3 (2012), 4.3 (2013), and 4.4, and why this
is the right rule to implement.

Note that ctx->Color.sRGBEnabled is initialized to _mesa_is_gles(ctx),
and ES doesn't have enable/disable flags for GL_FRAMEBUFFER_SRGB, so
it's effectively on all the time.  This means the ES behavior should
be unchanged.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 14:01:51 -07:00
Kenneth Graunke
352401f6a9 i965: Make BLORP do sRGB encode/decode on ES 2 as well.
This should have no effect, as all drivers which support BLORP also
support ES 3.0 - so ES 2.0 would be promoted and follow the ES 3 rules.

ES 1.0 doesn't have BlitFramebuffer.

This is purely to clarify the next patch a bit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 14:01:51 -07:00
Kenneth Graunke
0c7047ab9c Revert "st/mesa: use sRGB formats for MSAA resolving if destination is sRGB"
This reverts commit 4e549ddb50,
dropping the hack from Gallium that I just deleted from i965.

See the previous commit for rationale.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-08 14:01:51 -07:00
Kenneth Graunke
cc27c7fe38 i965: Drop the "do resolves in sRGB" hack.
I've never quite understood the purpose of this hack - supposedly,
doing resolves in the sRGB colorspace is slightly more accurate.

Currently, BlitFramebuffer() ignores sRGB encoding and decoding
on OpenGL, although it encodes and decodes in GLES 3.x.

The updated OpenGL 4.4 rules also allow for encoding and decoding
if GL_FRAMEBUFFER_SRGB is enabled, allowing the application to
control what colorspace blits are done in.  I don't think this hack
makes any sense in such a world - the application can do what it
wants, and we shouldn't second guess them.

A related Piglit patch, "Make multisample accuracy test set
GL_FRAMEBUFFER_SRGB when resolving." makes the Piglit MSAA accuracy
test explicitly request SRGB encoding/decoding during resolves when
running "srgb" subtests.  Without that patch, this commit will regress
those tests, but with it, they should continue to work just fine.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 14:01:51 -07:00
Kenneth Graunke
b1586526e8 i965: Bail on the BLT path if BlitFramebuffer requires sRGB conversion.
Modern OpenGL BlitFramebuffer require sRGB encode/decode when
GL_FRAMEBUFFER_SRGB is enabled.  The blitter can't handle this,
so we need to bail.  On Gen4-5, this means falling back to Meta,
which should handle it.

We allow sRGB <-> sRGB blits, as decode then encode ought to be a noop
(other than potential precision loss, which nobody wants anyway).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 14:01:51 -07:00
Tomasz Figa
7dfb1a4074 egl/android: Make get_fourcc() accept HAL formats
There are DRI_IMAGE_FOURCC macros, for which there are no corresponding
DRI_IMAGE_FORMAT macros. To support such formats we need to make the
lookup function take the native format directly. As a side effect, it
simplifies all existing calls to this function, because they all called
get_format() first to convert from native to DRI_IMAGE_FORMAT.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I4674000fb5ccfd02e38b8fa89bc567ac1d4fc16b
2016-08-08 11:40:41 -07:00
Tomasz Figa
e77b493390 egl/android: Refactor image creation to separate flink and prime paths (v2)
This patch splits current dri2_create_image_android_native_buffer() into
main entry point and two additional functions, one for creating an image
from flink name and one for handling prime FDs using the generic DMA-buf
path. This makes the code cleaner and also prepares for disabling flink
path more easily in the future.

v2: Split into separate patch.
    Add error messages.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: Ifdfb5927399d56992fe707160423c29278f49172
2016-08-08 11:40:37 -07:00
Tomasz Figa
217af75a40 egl/android: Respect buffer mask in droid_image_get_buffers (v2)
Drivers can request different set of buffers depending on the buffer
mask they pass to the get_buffers callback. This patch makes
droid_image_get_buffers() respect this mask.

v2: Return error only in case of real error condition and ignore requests
    of unavailable buffers.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I6c3c4eca90f4c618579f6725dec323c004cb44ba
2016-08-08 11:40:31 -07:00
Tomasz Figa
c6c26bc589 egl/android: Remove unused variables in droid_get_buffers_with_format()
Fix compilation warnings due to unused variables left after some earlier
code changes.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: Iec09eb2a62887f3a38dff156756ed8385f3f3447
2016-08-08 11:40:26 -07:00
Jason Ekstrand
52fcc40760 anv/pipeline/gen7: Set the depth format in 3DSTATE_SF
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:46 -07:00
Jason Ekstrand
21d5c1be6a isl: Add a helper for getting a depth format from an isl_format
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:44 -07:00
Jason Ekstrand
ce980541d5 anv/pipeline: Unify 3DSTATE_RASTER and 3DSTATE_SF setup between gen7 and gen8
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:41 -07:00
Jason Ekstrand
960e8a1260 anv/pipeline/gen8: Set 3DSTATE_SF::StatisticsEnable
We've been setting it in gen7 forever but never in gen8; best to make it
consistent.  This hasn't caused any problems yet because we don't advertise
support for statistics queries yet.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:36 -07:00
Jason Ekstrand
12e653adec anv/pipeline/gen8: Unconditionally set DXMultisampleRasterizaitonEnable
The multisample rasterization mode is computed based on this field,
3DSTATE_RASTER::DXMultisampleRasterizationMode (only for forced
multisampling), 3DSTATE_RASTER::APIMode, and the number of samples.  There
are two tables in the SKL PRM that describe how the final multisample mode
is calculated: "Windower (WM) Stage >> Multisampling >> Multisample
ModeState >> Table 1" and the formula for "SF_INT::Multisample
Rasterization Mode".

The "DX Multisample Rasterization Enable" bit changes whether multisample
mode is set to OFF_PIXEL or ON_PATTERN in the samples > 1 case.  In the
samples == 1 case, the bit has no effect.  Since Vulkan has no concept of
disabling multisampling for samples > 1, we can just set the bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:33 -07:00
Jason Ekstrand
1df511b6f0 anv/pipeline/gen8: Use fewer designated initializers in emit_rs_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:31 -07:00
Jason Ekstrand
6136fb8687 genxml: Make 3DSTATE_SF more consistent between gen7 and gen8+
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:28 -07:00
Jason Ekstrand
2d76dcae71 anv/pipeline/gen8: Remove an old comment
This is now handled in emit_3dstate_clip

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-08 11:13:04 -07:00
Kenneth Graunke
7314007925 mesa: Skip ES 3.0/3.1 transform feedback primitive counting error.
This error condition is not implementable when using tessellation or
geometry shaders.  The text was also removed from the ES 3.2 spec.
I believe the intended behavior is to remove the error condition
when either OES_geometry_shader or OES_tessellation_shader are
exposed.

v2: Quote a better part of issue 13 (suggested by Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 10:01:30 -07:00
Kenneth Graunke
23b2bcd460 mesa: Share code between _mesa_validate_DrawArrays[_Instanced].
Mostly, I want to share the GLES 3 transform feedback handling,
though most of the rest of the code is identical as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 10:01:30 -07:00
Kenneth Graunke
522b5d4566 glsl: Implicitly enable OES_shader_io_blocks if geom/tess are enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Kenneth Graunke
0eaa84e8af glsl: Expose gl_PointSize if OES/EXT_tessellation_point_size is enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Kenneth Graunke
58709d36d7 glsl: Add extension plumbing for OES/EXT_tessellation_shader.
This adds the #extension directive support, built-in #defines,
lexer keyword support, and updates has_tessellation_shader().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Kenneth Graunke
722fd10456 mesa: Move tessellation shader gets to GL_CORE, GLES31 section.
This makes them available in the GLES 3.1 API.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Kenneth Graunke
c8438b62b7 mesa: Add {OES,EXT}_tessellation_shader to the extensions table.
Also update _mesa_has_tessellation to know about the new extensions.

For now, these are dummy_false, to avoid turning on the extension
until everything's in place.  Eventually, we'll move them over to
the "ARB_tessellation_shader" bit so that any drivers supporting
both the desktop extension and ES 3.1 get the feature.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Kenneth Graunke
73554c47e0 mapi: Add PatchParameteriOES and PatchParameteriEXT.
The OES_tessellation_shader and EXT_tessellation_shader specifications
have suffixed names.  These are identical to the core function, so just
alias them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-08 09:59:03 -07:00
Nicolai Hähnle
96bbb620a5 radeonsi: add has_draw_indirect_multi flag
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.

Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:06 +02:00
Nicolai Hähnle
5c343cce0f radeonsi: transpose indirect/index draw dispatch
This allows better code sharing for indirect draw calls.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:04 +02:00
Nicolai Hähnle
64ff23a58c radeonsi: move index buffer calculations in si_emit_draw_packets up
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:53:02 +02:00
Nicolai Hähnle
cf7d18b75c radeonsi: unify emitting PKT3_SET_BASE for indirect draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:52:59 +02:00
Nicolai Hähnle
e0736c438c winsys/amdgpu: query ME/PFP/CE firmware versions
The radeon kernel module doesn't have the firmware query interface, so the
corresponding values will remain 0.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:52:41 +02:00
Nicolai Hähnle
7f5a8dc27e radeonsi: move spi_ps_input_addr override outside of the loop
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:51:32 +02:00
Nicolai Hähnle
287822ee33 radeonsi: drop unnecessary u_pstipple.h include
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:51:29 +02:00
Nicolai Hähnle
3e4c5693a1 radeonsi: do not pass the return type to buffer_load_const
Overriding it is not allowed anyway, and actually lead to a crash when polygon
stippling was used with monolithic shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-08 12:51:26 +02:00
Kenneth Graunke
bd1bd03268 glsl: Combine GS and TES array resizing visitors.
These are largely identical, except that the GS version has a few
extra error conditions.  We can just pass in the stage and skip these.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:53:59 -07:00
Kenneth Graunke
398428f406 glsl: Fix location bias for patch variables.
We need to subtract VARYING_SLOT_PATCH0, not VARYING_SLOT_VAR0.

Since "patch" only applies to inputs and outputs, we can just handle
this once outside the switch statement, rather than replicating the
check twice and complicating the earlier conditions.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:53:42 -07:00
Kenneth Graunke
1556f16e46 glsl: Fix the program resource names of gl_TessLevelOuter/Inner[].
These are lowered to gl_TessLevel{Outer,Inner}MESA.  We need them to
appear in the program resource list with their original names and types.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:53:28 -07:00
Kenneth Graunke
4a49851da1 glsl: Delete bogus ir_set_program_inouts assert.
This assertion is bogus.  Varying structs, and arrays of structs, are
allowed by GLSL, and we can see them here.  While we currently don't
have any partial-variable support for those, simply returning false
and marking the entire thing as used is certainly legitimate.

I believe this is often swept under the rug by varying packing,
but that's disabled in certain tessellation situations.

Hit by 20 dEQP-GLES31.functional.tessellation.user_defined_io.* tests.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:51:21 -07:00
Kenneth Graunke
86915b495b glsl: Simplify interface qualifier parsing.
This better matches the grammar in section 4.3.9 of the GLSL 4.5 spec,
and also removes some redundant code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:48:48 -07:00
Kenneth Graunke
d0642c52fc glsl: Add a has_tessellation_shader() helper.
Similar to has_geometry_shader(), has_compute_shader(), and so on.
This will make it easier to add more conditions here later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-08-07 23:47:55 -07:00
Marek Olšák
3fb4a9b3b3 Revert "gallium/radeon: count contexts"
This reverts commit b403eb3385.

Not needed.
2016-08-06 17:29:23 +02:00
Marek Olšák
11b1d064a3 radeonsi: add GLSL lit tests
They can only be run manually as described in HOW_TO_RUN.
It should help catch suboptimal code generation.

Some of the tests already fail.

v2: rename the tests to *.glsl,
    fix lit.cfg to find FileCheck

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-08-06 16:11:43 +02:00
Marek Olšák
35942ee8a8 radeonsi: add a standalone compiler amdgcn_glslc
This will be used by GLSL lit tests.

For developers only. It shouldn't be distributable and it doesn't use
the Mesa build system.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 16:11:39 +02:00
Marek Olšák
ad8af99c86 radeonsi: add environment variable SI_FORCE_FAMILY
This will be used by: amdgcn_glslc -mcpu=[family]

It can also be used for shader-db if you want stats for a different family.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 16:11:35 +02:00
Marek Olšák
d0646cc745 winsys/radeon: implement cs_get_next_fence
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:31 +02:00
Marek Olšák
63b99590db winsys/amdgpu: implement cs_get_next_fence
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:30 +02:00
Marek Olšák
04a6cb63aa gallium/radeon: add cs_get_next_fence winsys callback
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:30 +02:00
Marek Olšák
b403eb3385 gallium/radeon: count contexts
We don't wanna use unflushed fences when we have multiple contexts.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:30 +02:00
Marek Olšák
16d568d911 gallium/radeon: count gfx IB flushes
This will be used as a counter for whether fence_finish needs to flush
the IB.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:30 +02:00
Marek Olšák
c5ff0d3e65 gallium/radeon: move radeon_winsys::cs_memory_below_limit to drivers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
076db67217 gallium/radeon: inline radeon_winsys::query_memory_usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
9646ae7799 gallium/radeon/winsyses: expose per-IB used_vram and used_gart to drivers
The following patches will use this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
1c8f17599e gallium/radeon/winsyses: print CS submission error number
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
0edc2e433e radeonsi: flush if constant, shader, and streamout buffers use too much memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c3efdeb8dd radeonsi: flush if sampler views and images use too much memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
d82cfab84c radeonsi: deal with high vertex buffer memory usage correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
e62caf576e radeonsi: take compute shader and dispatch indirect memory usage into account
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
c56ecb68e7 radeonsi: take scratch buffer and draw indirect memory usage into account
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
ed2254d157 radeonsi: check IB memory usage of CP DMA operations
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
f4b977bf3d gallium/radeon: add r600_resource::vram_usage and gart_usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Mathias Fröhlich
62d41162bb mesa: Copy bitmask of VBOs in the VAO on gl{Push,Pop}Attrib.
On gl{Push,Pop}Attrib(GL_CLIENT_VERTEX_ARRAY_BIT) take
care that gl_vertex_array_object::VertexAttribBufferMask
matches the bound buffer object in the
gl_vertex_array_object::VertexBinding array.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2016-08-06 06:27:37 +02:00
Nanley Chery
c495c18b24 anv/gen7_pipeline: Set PixelShaderKillPixel for discards
According to the IVB PRM Vol2 P1, this bit must be set if a pixel shader
contains a discard instruction.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97207
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-05 09:53:52 -07:00
Jason Ekstrand
21f357b66e util/r11g11b10f: Whitespace cleanups
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:07:06 -07:00
Jason Ekstrand
ffcf8e1049 util/format: Use explicitly sized types
Both the rgb9e5 and r11g11b10 formats are defined based on how they are
packed into a 32-bit integer.  It makes sense that the functions that
manipulate them take an explicitly sized type.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:07:04 -07:00
Jason Ekstrand
c7eb9a7565 util/rgb9e5: Get rid of the float754 union
There are a number of reasons for this refactor.  First, format_rgb9e5.h is
not something that a user would expect to define such a generic union.
Second, defining it requires checking for endianness which is ugly.  Third,
90% of what we were doing with the union was float <-> uint32_t bitcasts
and the remaining 10% can be done with a sinmple left-shift by 23.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:07:01 -07:00
Jason Ekstrand
cda8d95660 util/format_rgb9e5: Get rid of the rgb9e5 union
The rgb9e5 format is a packed format defined in terms of slicing up a
single 32-bit value.  The bitfields are far more confusing than simple
shifts and require that we check the endianness.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:59 -07:00
Jason Ekstrand
f29fd7897a util: Move format_r11g11b10f.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:57 -07:00
Jason Ekstrand
6c665cdfc5 util: Move format_rgb9e5.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:31 -07:00
Andres Gomez
591869e921 glsl: fix indentation, comments and line lengths in ast_function.cpp
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-08-05 14:27:11 +03:00
Andres Gomez
8f98a120f3 glsl: apply_implicit_conversion is static again
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-08-05 14:27:11 +03:00
Andres Gomez
1443c10d74 glsl: struct constructors/initializers only allow implicit conversions
When an argument for a structure constructor or initializer doesn't
match the expected type, only Section 4.1.10 “Implicit Conversions”
are allowed to try to match that expected type.

From page 32 (page 38 of the PDF) of the GLSL 1.20 spec:

  " The arguments to the constructor will be used to set the structure's
    fields, in order, using one argument per field. Each argument must
    be the same type as the field it sets, or be a type that can be
    converted to the field's type according to Section 4.1.10 “Implicit
    Conversions.”"

From page 35 (page 41 of the PDF) of the GLSL 4.20 spec:

  " In all cases, the innermost initializer (i.e., not a list of
    initializers enclosed in curly braces) applied to an object must
    have the same type as the object being initialized or be a type that
    can be converted to the object's type according to section 4.1.10
    "Implicit Conversions". In the latter case, an implicit conversion
    will be done on the initializer before the assignment is done."

v2: Remove also the now redundant constant conversion, the
    constant_record_constructor helper and the replacement code
    (Timothy).

Fixes GL44-CTS.shading_language_420pack.initializer_list_negative

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-08-05 14:27:03 +03:00
Andres Gomez
de60d549b9 glsl: Refactor implicit conversion into its own helper
v2: Refactor also the conversion to constant and replacement code
    (Timothy).

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-08-05 14:27:03 +03:00
Andres Gomez
af796d756e glsl/types: disallow implicit conversions before GLSL 1.20
Implicit conversions were added in the GLSL 1.20 spec version.

v2: Join the checks for GLSL 1.10 and ESSL (Timothy).

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-08-05 14:27:03 +03:00
Kenneth Graunke
875341c69b i965: Rework the unlit centroid workaround.
Previously, for every input, we moved the dispatch mask to the flag
register, then emitted two predicated PLN instructions, one with
centroid barycentric coordinates (for normal pixels), and one with
pixel barycentric coordinates (for unlit helper pixels).

Instead, we can simply emit a set of predicated MOVs at the top of
the program which copy the pixel barycentric coordinates over the
centroid ones for unlit helper pixel channels.  Then, we can just
use normal PLNs.

On Sandybridge:

total instructions in shared programs: 7538470 -> 7534500 (-0.05%)
instructions in affected programs: 101268 -> 97298 (-3.92%)
helped: 705
HURT: 9 (all of which are SIMD16 programs)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-05 01:43:52 -07:00
Tim Rowley
b521083ffb swr: [rasterizer core] static analysis fixes for conservative rast
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:35 -05:00
Tim Rowley
68dc544879 swr: [rasterizer core] implement InnerConservative input coverage
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:35 -05:00
Tim Rowley
4034f48833 swr: [rasterizer core] remove CanEarlyZ function
Test is now in SetupPipeline.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
b365989875 swr: [rasterizer core] use 32x32 macrotile for openswr
Significant performance increase (up to 2x) on high geometry workloads.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
5f4bc9e85b swr: [rasterizer fetch] add support for 24bit format fetch
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
527d45c8fe swr: [rasterizer fetch] additional fetch format support
Add support for 0 pitch in fetch.

Add support for USCALE/SSCALE for 32bit integer fetches.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
f438b7ba81 swr: [rasterizer jitter] fix potential jit exit crash
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
57b07498d2 swr: [rasterizer core] update sync handling
Sync now uses a callback to ensure that it's called by the last
thread moving past a DC.  This will help with the new counter
handling.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:38:34 -05:00
Tim Rowley
191786d0f4 swr: [rasterizer core] rename variable
Avoid nested declarations of the same name within a single function.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:37 -05:00
Tim Rowley
61cc012e9a swr: [rasterizer jitter] adjust extern "C" block scope
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:31 -05:00
Tim Rowley
9f7d99fcfe swr: [rasterizer core] conservative rast degenerate handling
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 14:01:25 -05:00
Tim Rowley
f01827a469 swr: [rasterizer core] allow hexadecimal for integer knobs
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-04 13:52:12 -05:00
Eric Anholt
49741e1cd2 mesa: Dynamically allocate the matrix stack.
By allocating and initializing the matrices at context creation, the OS
couldn't even overcommit the pages.  This saves about 63k (out of 946k) of
maximum memory size according to massif on simulated vc4
glsl-algebraic-add-add-1.  It also means we could potentially relax the
maximum stack sizes, but that should be a separate commit.

v2: Drop redundant Top update, explain why the stack is small at init
    time.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-04 08:52:11 -07:00
Eric Anholt
2a808219b3 state_tracker: Initialize the draw context only when needed.
It's only used for rarely-used deprecated GL features
(feedback/rasterpos), so we can skip the memory allocation and
initialization for it most of the time.

Saves about 659k (out of 1605k) of maximum memory size according to massif
on simulated vc4 glsl-algebraic-add-add-1

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-04 08:48:27 -07:00
Eric Anholt
c976e164d2 vc4: Move scalarizing and some lowering to link time.
This works out to be a wash in terms of memory usage: We use more memory
to store the separate ALU instructions, but we optimize out a lot of code
as well.  The main result, though, is that we do more of our work at link
time rather than draw time.
2016-08-04 08:48:27 -07:00
Eric Anholt
2350569a78 vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.
We don't want to bake the whole array into the FS key, because of the
hashing overhead.  But we can keep a set of the arrays seen, and use a
pointer to the copy in as the array's proxy.

Between this and the previous patch, gl-1.0-blend-func now passes on
hardware, where previously it was filling the 256MB CMA area with shaders
and OOMing.

Drops 712 shaders from shader-db.
2016-08-04 08:48:27 -07:00
Eric Anholt
62ea2461ed vc4: Don't recompile the CS when the FS changes.
The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but
only the VS dereferences it.

Drops 754 shaders from shader-db.
2016-08-04 08:48:27 -07:00
Eric Anholt
d577dbc201 vc4: Move FS inputs setup out to a helper function.
It's a pretty big block, and I was about to make it bigger.
2016-08-04 08:48:27 -07:00
Kenneth Graunke
144cbf8987 nir: Make nir_opt_remove_phis see through moves.
I found a shader in Tales of Maj'Eyal that contains:

        if ssa_21 {
                block block_1:
                /* preds: block_0 */
                ...instructions that prevent the select peephole...
                vec1 32 ssa_23 = imov ssa_4
                vec1 32 ssa_24 = imov ssa_4.y
                vec1 32 ssa_25 = imov ssa_4.z
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                vec1 32 ssa_26 = imov ssa_4
                vec1 32 ssa_27 = imov ssa_4.y
                vec1 32 ssa_28 = imov ssa_4.z
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */
        vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
        vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
        vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28

Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs.  By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.

I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.

On Broadwell:

total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0

v2: Fix return value to be NULL, not false (caught by Iago).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-04 00:42:12 -07:00
Kenneth Graunke
7603b4d3a1 nir: Make nir_alu_srcs_equal non-static.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-04 00:41:07 -07:00
Kenneth Graunke
6aa730000f nir: Turn imov/fmov of undef into undef.
On Broadwell:

total instructions in shared programs: 11640214 -> 11639872 (-0.00%)
instructions in affected programs: 17744 -> 17402 (-1.93%)
helped: 78
HURT: 0

total spills in shared programs: 2924 -> 2922 (-0.07%)
spills in affected programs: 104 -> 102 (-1.92%)
helped: 1
HURT: 0

total fills in shared programs: 4394 -> 4389 (-0.11%)
fills in affected programs: 237 -> 232 (-2.11%)
helped: 1
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-04 00:40:59 -07:00
Kenneth Graunke
12a912586f i965: Use a separate register for every access to an SSA undef.
Previously, we allocated a new VGRF for every undefined definition.
Instead, this patch makes us allocate a new VGRF for every use of an
undefined definition.  This makes sure that undefined values are
fully independent of one another, and have live ranges limited to
their single use.  This allows register coalescing to combine the
source and destination of MOVs from undefined sources, eliminating
the MOV altogether.

On Broadwell:

total instructions in shared programs: 11641187 -> 11640214 (-0.01%)
instructions in affected programs: 70199 -> 69226 (-1.39%)
helped: 213
HURT: 1

v2: Add a comment (based on Iago's suggested one).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-04 00:40:10 -07:00
Michel Dänzer
67c5e843b9 vl/dri3: Destroy Present event context when destroying drawable v2
Without this, the X server may accumulate stale Present event contexts
if a client performs several video decoding sessions using the same
window.

v2: Based on Chris Wilson's review:
* Use xcb_discard_reply() instead of free(xcb_request_check())

Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
2016-08-04 15:45:43 +09:00
Michel Dänzer
5d191bafa2 loader/dri3: Destroy Present event context when destroying drawable v2
Without this, the X server may accumulate stale Present event contexts
if a client ends up creating and destroying DRI drawables for the same
window.

v2: Based on Chris Wilson's review:
* Use xcb_present_select_input_checked so that protocol errors
  generated by old X servers can be handled gracefully
* Use xcb_discard_reply() instead of free(xcb_request_check())
2016-08-04 15:45:43 +09:00
Ben Widawsky
1743c4184b gbm: Correct bo_import documentation (trivial)
Missed here:
commit a43d286ef7
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Fri Mar 28 10:17:11 2014 -0700

    gbm: Add import from fd

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-03 10:56:41 -07:00
Eric Anholt
bc1fc9c985 vc4: Avoid generating a custom shader per level in glGenerateMipmaps().
We were baking in the LOD of the source level to each shader.  Instead,
pass it in as a uniform -- this requires storing it to a temp register,
but that's better than compiling a ton of separate shaders:

total instructions in shared programs: 115032 -> 115036 (0.00%)
instructions in affected programs:     96 -> 100 (4.17%)
LOST:                                  572
2016-08-03 10:55:54 -07:00
Eric Anholt
e97e9e62a1 vc4: Tell valgrind about BO allocations from mmap time to destroy.
This helps in debugging memory pressure.  It would be nice if we could
tell valgrind about it all the way from allocation time to destroy, but we
need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
2016-08-03 10:28:20 -07:00
Jan Ziak
fd32868590 loader: fix memory leak in loader_dri3_open
Found via "valgrind --leak-check=full glxgears".

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Acked-by: Boyan Ding <boyan.j.ding@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-03 10:25:09 -07:00
Eric Anholt
a0671d67de vc4: Fix a leak of the src[] array of VPM reads in optimization.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-08-03 10:25:09 -07:00
Eric Anholt
9f95690959 vc4: Fix leak of the bo_handles table. 2016-08-03 10:25:08 -07:00
Eric Anholt
02f8c444e8 vc4: Fix handling of UBO range offsets.
The ranges are in units of bytes, not dwords.  This wasn't caught by
piglit tests because ttn tends to make one big uniform file, so we only
had one UBO range with a src and dst offset of 0.
2016-08-03 10:25:08 -07:00
Eric Anholt
9128acfb57 nir: Allow opt_peephole_select to work on empty blocks.
nir_opt_peephole_select has the job of removing IF statements with no side
effects.  However, if the IF statement's successor didn't have any
instructions in it, we were skipping it, which occurred in mupen64 on vc4
with glsl_to_nir enabled:

instructions in affected programs:     6134 -> 4120 (-32.83%)
total uniforms in shared programs: 38268 -> 38219 (-0.13%)

No changes on Haswell shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-03 10:25:08 -07:00
Eric Anholt
36b9eb82c1 vc4: Dump NIR at shader state creation time as well.
I keep wanting to see this version of the NIR.
2016-08-03 10:25:08 -07:00
Marek Olšák
435d9595d3 r600g: use last_gfx_fence like radeonsi
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
a6bfafa083 gallium/radeon: move last_gfx_fence from radeonsi to common code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c15a9dec29 radeonsi: skip unnecessary si_update_shaders calls
Small decrease in draw call overhead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c2a0e99169 radeonsi: print the command line to VM fault reports (v2)
v2: rebase on top of Brian's commit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
6573ad69ef ddebug: print the command line to all logs (v2)
for piglit with the pipelined hang detection mode

v2: rebase on top of Brian's commit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
840353059a ddebug: don't use fmemopen on non-Linux OS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97140

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
c88b309fd5 radeonsi: don't set the last parameter component of llvm.AMDGPU.cube
LLVM doesn't use it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
42c5f839ad radeonsi: use llvm.amdgcn.cube* if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
1fb6e55eaf radeonsi: use llvm.amdgcn.rsq.f64 if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Marek Olšák
db2d31dab1 radeonsi: use v_mad_f32 for fma
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.

This tries to restore performance for those games.

The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.

Also, there is code size reduction:
  Totals from affected shaders:
  VGPRS: 109796 -> 109808 (0.01 %)
  Spilled SGPRs: 29995 -> 30022 (0.09 %)
  Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
  Code Size: 6667596 -> 6476356 (-2.87 %) bytes
  Max Waves: 26931 -> 26899 (-0.12 %)

I've not actually tested real performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-03 17:46:46 +02:00
Haixia Shi
4c4bfed670 i965: use mt->offset in intel_miptree_map_movntdqa()
We need to include mt->offset in the calculation of src pointer because its
value may be non-zero, for example in a cubemap texture.

Signed-off-by: Haixia Shi <hshi@chromium.org>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I461ad5b204626d5a1c45611fc6b63735dcf29f63
2016-08-03 08:28:52 -07:00
Timothy Arceri
6fb6201f71 nir: fix validation message
Looks like a copy and paste error from f752effa08

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-03 09:31:57 +10:00
Chad Versace
2d788a9181 .mailmap: Update my address
I left Intel, so make my personal address the canonical address.
2016-08-02 13:29:53 -07:00
Tim Rowley
11072de368 swr: build swr with -fno-strict-aliasing
swr rasterizer contains numerous data transfers between vectors
and ordinary C types.  Fixing for strict aliasing will take time.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-02 14:30:33 -05:00
Andres Gomez
3356ac208b ast: Updated AST_NUM_OPERATORS for coherence with ast_operators
AST_NUM_OPERATORS stores the dimension of the ast_operators
enumeration but was not updated after its last modification.

This doesn't add any real modification for any code paths but it makes
sense for coherence.

v2 (Eric Engestrom): Just place the define at the end of the
                     enumeration, not below.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-08-02 21:33:03 +03:00
Matt Turner
c3211ae093 i965: Disable the unlit centroid workaround on Gen7.
Once upon a time (commit 8313f44409) Paul added code for the unlit
centroid workaround (WaCopyUnlitCentroidBarys). His commit message
claims it fixed the EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled} piglit tests but does not say
on which platform, though he cites the IVB PRM.

"3DSTATE_WM [DevIVB, DevHSW]" says

   "[DevIVB]: Workaround: When Centroid Barycentric mode is required, HW
    may produce incorrect interpolation results when a 2X2 pixels have
    unlit pixels."

I later disabled it for Haswell (commit f6db414f3c) with no known ill
effects.

The Sandybridge page does not have this text, but the workarounds
database (see WaCopyUnlitCentroidBarys) says the issues applies *only*
to Sandybridge, and in fact in commit 1a2de7dce8 I note that disabling
the workaround on Sandybridge causes the tests Paul originally mentioned
to fail.

So this is, and always has been, a huge confusing mess.

Disabling the workaround indeed causes the tests Paul originally
mentioned to fail on Sandybridge but not on Ivybridge/Baytrail.

On Ivybridge:

   total instructions in shared programs: 6914901 -> 6909599 (-0.08%)
   instructions in affected programs: 106766 -> 101464 (-4.97%)
   helped: 884

   total cycles in shared programs: 70874764 -> 70813774 (-0.09%)
   cycles in affected programs: 794144 -> 733154 (-7.68%)
   helped: 688
   HURT: 186

   LOST:   1
   GAINED: 6

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-02 10:37:13 -07:00
Marek Olšák
6db93cd167 gallium/util: fix align64
it cut off the upper 32 bits

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-01 23:28:14 +02:00
Matt Turner
88ad8c7ded mesa: Drop -fno-strict-aliasing.
Improves performance of OglBatch7 by 4.06851% +/- 1.17925% (n=169) on
Haswell, and cuts ~18k of .text:

   text     data      bss      dec      hex  filename
5824627   287816    29384  6141827   5db783  before/i965_dri.so
5806354   287816    29384  6123554   5d7022  after/i965_dri.so

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-08-01 12:09:17 -07:00
Matt Turner
12a14052e8 i915: Avoid aliasing violation.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-01 12:09:17 -07:00
Matt Turner
be35c6ba92 draw: Avoid aliasing violations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
8e68f35d32 r600g: Avoid aliasing violations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
d2838f77ec r300g: Avoid aliasing violation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
16ff8f9ae8 gallium/auxiliary: Add u_bitcast.h header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner
bbe012f02a glsl_to_tgsi: Avoid aliasing violations.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-08-01 12:09:17 -07:00
Brian Paul
500a3dd11f st/mesa: silence missing braces warning in st_program.c
Silence a gcc warning:
state_tracker/st_program.c: In function 'st_create_fp_variant':
state_tracker/st_program.c:957:10: warning: missing braces around initializer [-Wmissing-braces]
          nir_lower_drawpixels_options options = {0};
          ^
state_tracker/st_program.c:957:10: warning: (near initialization for 'options.texcoord_state_tokens') [-Wmissing-braces]

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:20:19 -06:00
Brian Paul
13fa051356 auxiliary/os: add new os_get_command_line() function
This can be used by the driver to get the command line which started
the process.  Will be used by the VMware driver for extra logging.

For now, this is only implemented for Linux via /proc/self/cmdline
and Windows via GetCommandLine().

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:20:19 -06:00
Charmaine Lee
c2b4942afc svga: avoid redundant SetVertexBuffer/SetIndexBuffer commands at rebind
This patch eliminates the redundant SetVertexBuffers and
SetIndexBuffer commands that are emitted for rebind purpose.
With this patch, the set commands will be skipped, but we will still
reference the associated resources to allow the kernel to
bring in the resources.

Tested with Lightsmark2008, Valley, MTT glretrace, piglit, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-01 12:20:19 -06:00
Rob Clark
53b2b8bf6f u_vbuf: fix potentially bogus assert
There are cases where we hit u_vbuf path due to alignment or pitch-
alignment restrictions, but for an output-format that u_vbuf does not
support translating (yet the driver does support natively).  In which
case we hit the memcpy() path and don't care that u_vbuf doesn't
understand it.

Fixes crash with debug build of mesa in:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95000
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 13:42:11 -04:00
Ben Widawsky
e7c8c85785 gbm: Removed unused function.
AFAICT, it's never been used.

It was briefly nudged in the right direction here:
commit 10e5ffd496
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date:   Sat Jan 25 17:19:10 2014 +0000

    gbm: do not export _gbm_mesa_get_device

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2016-08-01 09:11:14 -07:00
Timothy Arceri
cec377eed3 i965: fix comparison warning
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-01 14:52:07 +10:00
Eric Anholt
26ff7e373f vc4: Zero-initialize the hardware sampler view structure.
Fixes failure to initialize the force_first_level flag, causing
failures in piglit levelclamp.
2016-07-31 19:23:03 -07:00
Mathias Fröhlich
b730960e77 mesa: Remove set but not used gl_client_array::Stride.
The field is only read for printing today and
there it was probably a leftover.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:46 +02:00
Mathias Fröhlich
56c65cd315 mesa: Remove set but not used gl_client_array::Enabled.
The way it is used today does not care about the
Enabled flag anymore.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:46 +02:00
Mathias Fröhlich
43a6f435ca vbo: Use the VAO array enabled flags in vbo_exec_array.
Instead of gl_client_array::Enabled inside a VAO,
directly use the gl_vertex_attrib_array::Enabled value
which is the origin of the above.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:46 +02:00
Mathias Fröhlich
4cda690019 vbo: Walk the VAO in check_array_data.
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:46 +02:00
Mathias Fröhlich
99b42184f9 vbo: Walk the VAO in print_draw_arrays.
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO. Also make use of gl_vert_attrib_name.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Mathias Fröhlich
eec516d8e1 mesa: Walk the VAO in _mesa_print_arrays.
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO. Also make use of gl_vert_attrib_name.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Mathias Fröhlich
144737a498 vbo: Walk the VAO to check for mapped buffers.
Similarily to _mesa_all_varyings_in_vbos walk the VAO
to check if we have an illegal mapped buffer object
instead of walking all gl_client_arrays.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Mathias Fröhlich
3f5e5696fe vbo: Walk the VAO to see if all varyings are in vbos.
In vbo_draw_transform_feedback we currently look at
exec->array.inputs to determine if all varying
vertex attributes reside in vbos. But the vbo_bind_arrays
call only happens past the vbo_all_varyings_in_vbos
query. Thus we may work on a stale set of client arrays.
Using the current VAOs content for this query feels much
more logical to me.
Additionally with this change mesa makes more use of the
information already tracked in the VAO instead of looping
across VERT_ATTRIB_MAX vertex arrays.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Mathias Fröhlich
f8be969b1b mesa: Implement _mesa_all_varyings_in_vbos.
Implement the equivalent of vbo_all_varyings_in_vbos for
vertex array objects.

v2: Update comment.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Mathias Fröhlich
f7cb46a972 mesa: Unbind deleted vbo using _mesa_bind_vertex_buffer.
When a vertex buffer object gets deleted, it is unbound
at the VAO. To do this use _mesa_bind_vertex_buffer instead
of plain unreferencing the buffer object. This keeps the VAOs
internal state consistent. In this case it showed up with
gl_vertex_array_object::VertexAttribBufferMask getting out of
sync.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-31 10:05:45 +02:00
Timothy Arceri
f696b712d7 glsl: be more strict on block qualifiers
V2: Add spec references and allow patch qualifier (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96528
2016-07-31 09:24:45 +10:00
Timothy Arceri
d3dc1b8b5e glsl: add name param to validate_flags()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-31 09:24:45 +10:00
Timothy Arceri
2262fe4081 glsl: add component to ast_type_qualifier::validate_flags
This was added with ARB_enhanced_layouts.

V2: Add an extra format specifier for the new qualifier.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-31 09:24:45 +10:00
Timothy Arceri
bbe839379a docs: Add GL4.4 and ARB_enhanced_layouts to the release notes
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-31 08:19:21 +10:00
Kenneth Graunke
b5661c1d70 anv: Perform rasterizer discard in the SOL stage instead of the clipper.
See commit b0629e6894, where we discovered
that the SOL stage's "Rendering Disable" feature is a lot faster at
throwing away all geometry than the clipper's "reject all" mode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-30 12:06:37 -07:00
Roland Scheidegger
99a47391e4 Revert "gallium/util: fix resource leak"
This reverts commit d1fe26a628.

Replacing a resource leak with a segfault isn't the solution.
2016-07-30 18:18:09 +02:00
Eric Engestrom
d1fe26a628 gallium/util: fix resource leak
CovID: 401540
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-30 17:27:42 +02:00
francians@gmail.com
e713a9e613 freedreno/a4xx: fix comparison out of range warnings
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:25:42 -04:00
francians@gmail.com
43492c7f2c freedreno/a3xx: fix comparison out of range warnings
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:25:31 -04:00
francians@gmail.com
089cc74b6a freedreno/a2xx: fix comparison out of range warnings
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:25:16 -04:00
francians@gmail.com
3fa68fdc90 freedreno/ir3: init ir3_shader_key with memset()
To silence missing initializers warning

Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:24:59 -04:00
Eric Engestrom
a63bac9271 gallium/freedreno: move cast to avoid integer overflow
Previously, the bitshift would be performed on a simple int (32 bits on
most systems), overflow, and then be cast to 64 bits.

CovID: 1362461
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Eric Engestrom
3563c4d161 freedreno/a2xx: remove duplicate assignment
CovID: 1362445, 1362446
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
2d64a003c5 freedreno: defer flush_queue allocation
Some apps, like warsow, create a bazillion contexts but don't render on
most of them.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
4175606474 freedreno: add some hw query traces
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
e684c32d2f freedreno: some locking
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
010e4b2d52 os: add pipe_mutex_assert_locked()
Would be nice if we could also have lockdep, like in the linux kernel.
But this is better than nothing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
9f0eb69527 freedreno: drop needs_rb_fbd
We need to emit RB_FRAME_BUFFER_DIMENSION once per batch.. tracking this
in fd_context is wrong when the gmem code executes asynchronously from
the flush_queue worker.  But in fact we don't really need to track it at
all.  We cannot assume previous value at the beginning of the batch
(because of other processes potentially using the GPU), so just drop the
tracking and emit it in _tile_init().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
e6bfe1c773 freedreno: move needs_wfi into batch
This is also used in gmem code, which executes from the "bottom half"
(ie. from the flush_queue worker thread), so it cannot be in fd_context.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
0739bbceec freedreno: a bit of micro-optimization
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
e1b1052700 freedreno: drop mem2gmem/gmem2mem query stages
They weren't really used, and it gets somewhat more complicated to deal
with if batches are flushed asynchronously (on another thread).  So just
drop them, and move _query_set_state(NULL) call into batch (so it is not
happening on background thread).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
00bed8a794 freedreno: threaded batch flush
With the state accessed from GMEM+submit factored out of fd_context and
into fd_batch, now it is possible to punt this off to a helper thread.
And more importantly, since there are cases where one context might
force the batch-cache to flush another context's batches (ie. when there
are too many in-flight batches), using a per-context helper thread keeps
various different flushes for a given context serialized.

TODO as with batch-cache, there are a few places where we'll need a
mutex to protect critical sections, which is completely missing at the
moment.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
c44163876a freedreno: track batch/blit types
Add a bit of extra book-keeping about blits and back-blits (from
resource shadowing).  If the app uploads all mipmap levels, as opposed
to uploading the first level and then glGenerateMipmap(), we can discard
the back-blit (as opposed to being naive and shadowing the resource for
each mipmap level).  Also, after a normal blit, we might as well flush
the batch immediately, since there is not likely to be further rendering
to the surface.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
7f8fd02dc7 freedreno: re-order support for hw queries
Push query state down to batch, and use the resource tracking to figure
out which batch(es) need to be flushed to get the query result.

This means we actually need to allocate the prsc up front, before we
know the size.  So we have to add a special way to allocate an un-
backed resource, and then later allocate the backing storage.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
10baf05b2c freedreno: use prsc for hw queries
Switch to using a pipe_resource (rather than an fd_bo directly) for hw
query result buffers.  This is first step towards making queries work
properly with reordered batches, since we'll need the additional
dependency tracking to know which batches to flush.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
ba30096888 freedreno: support discarding previous rendering in special cases
Basically, to "DCE" blits triggered by resource shadowing, in cases
where the levels are immediately completely overwritten.  For example,
mid-frame texture upload to level zero triggers shadowing and back-blits
to the remaining levels, which are immediately overwritten by
glGenerateMipmap().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
7105774bab freedreno: shadow textures if possible to avoid stall/flush
To make batch re-ordering useful, we need to be able to create shadow
resources to avoid a flush/stall in transfer_map().  For example,
uploading new texture contents or updating a UBO mid-batch.  In these
cases, we want to clone the buffer, and update the new buffer, leaving
the old buffer (whose reference is held by cmdstream) as a shadow.

This is done by blitting the remaining other levels (and whatever part
of current level that is not discarded) from the old/shadow buffer to
the new one.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
dcde4cd114 freedreno: spiff up some debug traces
Make it easier to track batches, to ensure things happen properly when
they are reordered.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
9f219c7047 freedreno: add batch-cache and batch reordering
Note that I originally also had a entry-point that would construct a key
and do lookup from a pipe_surface.  I ended up not needing that (yet?)
but it is easy-enough to re-introduce later if we need it for the blit
path.

For now, not enabled by default, but can be enabled (on a3xx/a4xx) with
FD_MESA_DEBUG=reorder.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
f02a64dbdd freedreno: move more batch related tracking to fd_batch
To flush batches out of order, the gmem code needs to not depend on
state from fd_context (since that may apply to a more recent batch).
So this all moves into batch.

The one exception is the gmem/pipe/tile state itself.  But this is
only used from gmem code (and batches are flushed serially).  The
alternative would be having to re-calculate GMEM layout on every
batch, even if the dimensions of the render targets are the same.

Note: This opens up the possibility of pushing gmem/submit into a
helper thread.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
eeafaf2d37 freedreno: dynamically sized/growable cmd buffers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
9e4561d3c4 freedreno: push resource tracking down into batch
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Rob Clark
9bbd239a40 freedreno: introduce fd_batch
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping.  In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.

For now there is just a single batch at a time.  Note that one
outcome of this change is that rb's are allocated/freed on each
use.  But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.

The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Marek Olšák
12aec78993 mesa: remove dd_function_table::UseProgram
finally unused

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
b47839ad83 st/mesa: update sampler states when shaders are changed
This bug seems to have always been there. Applications changing shaders
but not textures between draw calls would have gotten undefined behavior.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
c7954b130a st/mesa: don't dirty sample shading on _NEW_PROGRAM
Already done as part of ST_NEW_FRAGMENT_PROGRAM in st_validate_state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
79dcd69afa st/mesa: remove excessive shader state dirtying
This just needs to be done by st_validate_state.

v2: add "shaders_may_be_dirty" flags for not skipping st_validate_state
    on _NEW_PROGRAM to detect real shader changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
1f73e2bb94 st/mesa: unreference optional shaders when unbinding
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
0a46e6f410 st/mesa: skip updates of states that have no effect
v2: - also don't check edge flags for GLES

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
c8fe3b9dca st/mesa: completely rewrite state atoms
The goal is to do this in st_validate_state:
   while (dirty)
      atoms[u_bit_scan(&dirty)]->update(st);

That implies that atoms can't specify which flags they consume.
There is exactly one ST_NEW_* flag for each atom. (58 flags in total)

There are macros that combine multiple flags into one for easier use.

All _NEW_* flags are translated into ST_NEW_* flags in st_invalidate_state.
st/mesa doesn't keep the _NEW_* flags after that.

torcs is 2% faster between the previous patch and the end of this series.

v2: - add st_atom_list.h to Makefile.sources

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
53bc28920a st/mesa: remove st_tracked_state::name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
f2adba4a4c st/mesa: remove atom debugging code
This won't be needed after the rewrite.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Kenneth Graunke
ebdc82d065 i965: Fix move_interpolation_to_top() pass.
The pass I introduced in commit a2dc11a781
was entirely broken.  A missing "break" made the load_interpolated_input
case always fall through to "default" and hit a "continue", making it
not actually move any load_interpolated_input intrinsics at all.
It would only move the simple load_barycentric_* intrinsics, which
don't emit any code anyway, making it basically useless.

The initial version I sent of the pass worked, but I apparently
failed to verify that the simplified version in v2 actually worked.

With the obvious fix applied (so we actually tried to move
load_interpolated_input intrinsics), I discovered a second bug: we
weren't moving the offset SSA def to the top, breaking SSA validation.

The new version of the pass actually moves load_interpolated_input
intrinsics and all their dependencies, as intended.

Papers over GPU hangs on Ivybridge and Baytrail caused by the
recent NIR FS input rework by restoring the old behavior.
(I'm not honestly sure why they hang with PLN not at the top.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-29 16:05:24 -07:00
Rob Clark
591eeb7d1c freedreno: limit non-user constant buffers to a4xx
Seems to mostly work on a3xx.  Except when it doesn't and kills gpu
quite badly.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-29 14:58:39 -04:00
Jan Ziak
427771d1c7 glsl: fix uninitialized instance variable
Valgrind detected that variable ir_copy_propagation_visitor::killed_all
is uninitialized.

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-29 14:57:51 -04:00
Jan Ziak
b107169eef configure: add support for LLVM 4.0.0svn static libs
Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-07-29 16:24:03 +09:00
Rob Herring
a235765d27 virgl: add exported dmabuf to BO hash table
Exported dmabufs can get imported by the same process, but the handle was
not getting added to the hash table on export. Add the handle to the hash
table on export.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-29 09:09:56 +10:00
Anuj Phogat
6d958c7c16 anv: Enable per sample shading on gen8+
Vulkan CTS test results on gen9:
./deqp-vk --deqp-case=dEQP-VK.pipeline.multisample.min_sample_shading*
Test run totals:
  Passed:        60/90 (66.7%)
  Failed:        0/90 (0.0%)
  Not supported: 30/90 (33.3%)
  Warnings:      0/90 (0.0%)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-28 13:11:12 -07:00
Anuj Phogat
0f94cdc976 anv/pipeline: Fix setting per sample shading in pixel shader
We should use the persample_dispatch variable in prog_data.

Fixes all (~60) the DEQP sample shading tests. Many tests exited with
VK_ERROR_OUT_OF_DEVICE_MEMORY without this patch.

V2: Use the shader key bits set in brw_compile_fs (Jason)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-28 13:11:12 -07:00
Nicolas Boichat
9ee683f877 egl/dri2: Add reference count for dri2_egl_display
android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
eglTerminate, followed by eglReleaseThread. A similar case is
observed in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=69622,
where the test calls eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).

With the current code, dri2_dpy structure is freed on eglTerminate
call, so the display is not initialized when eglReleaseThread calls
MakeCurrent with NULL parameters, to unbind the context, which
causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
either in glFlush or in a latter call.

eglTerminate specifies that "If contexts or surfaces associated
with display is current to any thread, they are not released until
they are no longer current as a result of eglMakeCurrent."

However, to properly free the current context/surface (i.e., call
glFlush, unbindContext, driDestroyContext), we still need the
display vtbl (and possibly an active dri dpy connection). Therefore,
we add some reference counter to dri2_egl_display, to make sure
the structure is kept allocated as long as it is required.

One drawback of this is that eglInitialize may not completely reinitialize
the display (if eglTerminate was called with a current context), however,
this seems to meet the EGL spec quite well, and does not permanently
leak any context/display even for incorrectly written apps.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-28 14:08:25 +01:00
Emil Velikov
8431c0e9d4 vc4: automake: remove vc4_drm.h from the sources lists
The file was removed with earlier commit breaking 'make dist'.
Drop it from Makefile.sources since it's no longer around.

Fixes: 16985eb308 ("vc4: Switch to using the libdrm-provided
vc4_drm.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-28 14:08:24 +01:00
Nicolai Hähnle
bade0cd0fb ddebug: use pclose to close a popen()'d FILE
Found by Coverity.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-28 10:47:51 +01:00
Nicolai Hähnle
21556d86fc glsl: fix optimization of discard nested multiple levels
The order of optimizations can lead to the conditional discard optimization
being applied twice to the same discard statement. In this case, we must
ensure that both conditions are applied.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96762
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-28 10:47:04 +01:00
Nicolai Hähnle
185b0c15ab st_glsl_to_tgsi: only skip over slots of an input array that are present
When an application declares varying arrays but does not actually do any
indirect indexing, some array indices may end up unused in the consuming
shader, so the number of input slots that correspond to the array ends
up less than the array_size.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-28 10:46:02 +01:00
Dieter Nützel
041b330a32 clover: make GCC 4.8 happy
Without this GCC 4.8.x throws below error:

error: invalid initialization of non-const reference of type
'clover::llvm::compat::raw_ostream_to_emit_file {aka llvm::raw_svector_ostream&}'
from an rvalue of type '<brace-enclosed initializer list>'

v2: change commit title and add error message like Eric Engestrom requested

Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97019
[ Francisco Jerez: Trivial formatting fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-27 20:41:05 -07:00
Timothy Arceri
a86aa87342 i965: remove unnecessary null check
We would have hit a segfault already if this could be null.

Fixes Coverity warning spotted by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-28 11:05:57 +10:00
Timothy Arceri
29d70cc964 glsl: free hash tables earlier
These are only used by get_matching_input() which has been call
at this point so free the hash tables.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-28 08:05:04 +10:00
Samuel Pitoiset
af08cfc626 nvc0: enable ARB_tessellation_shader on GM107+
This exposes OpenGL 4.1 on Maxwell (tested on GM107 and GM206).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:19:07 +02:00
Samuel Pitoiset
3ac373df6e gm107/ir: add a legalize SSA pass for PFETCH
PFETCH, actually ISBERD on GM107+ ISA only accepts a GPR for src0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:58 +02:00
Samuel Pitoiset
653af07119 nvc0: fix up TCP header on GM107+
The number of outputs patch (limited to 255) has moved in the TCP
header, but blob seems to also set the old position. Also, the high
8-bits are now located inbetween the min/max parallel output read
address at position 20.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:41 +02:00
Mathias Fröhlich
2060f19b4f vbo: Fix handling of POS/GENERIC0 attributes.
In case of split primitives we need to restore
the original setting of the vtx.attrsz array to make
immediate mode attribute array tracking work.

v2: Use bool instead of boolean.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96950
2016-07-27 06:43:03 +02:00
Marek Olšák
c98c732158 radeon/llvm: Use alloca instructions for larger arrays [revert a revert]
This reverts commit f84e9d749f.

Bioshock Infinite no longer hangs.
2016-07-26 23:31:56 +02:00
Marek Olšák
8636a718b5 r600g: add support for B5G6R5 PBO uploads via texture buffers (v2)
v2: set endian swap to 16

untested
2016-07-26 23:21:45 +02:00
Marek Olšák
1e5f00f9d5 radeonsi: pre-generate shader logs for ddebug
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
18475aab6d radeonsi: add empty lines after shader stats
to separate individual shaders dumped consecutively.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
dd66f9d3e7 radeonsi: move the shader key dumping to si_shader_dump
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
b47727a83a ddebug: implement pipelined hang detection mode
For good performance while being able to generate decent hang reports.
The report doesn't contain the parsed IB and the buffer list, but it
isolates the draw call and dumps shaders while not having to flush
the context.

This is for GPU hangs that are harder to reproduce and require interactive
playing for minutes or even hours.

dd_pipe.h explains some implementation details. Initializing, copying
(recording) and clearing states is most of the code.

The performance should be at least 50% of the normal performance depending
on the circumstances. (i.e. 50% is expected to be the worst case scenario,
not the best case) The majority of time is spent in
dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all
the optimizations in later patches. There is no obvious way to optimize
that further.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
0795a3d54f ddebug: don't save pointers to call parameters
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
e4079677a7 ddebug: move dd_call into dd_pipe.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
d50f9e9b04 ddebug: separate draw call dumping logic
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
95c3025a41 ddebug: move all states into a separate structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
f7720948cc ddebug: write contents of dmesg into hang reports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
1f85f17998 ddebug: implement create_batch_query
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
6b9924ccb6 ddebug: don't use abort()
We don't want a core dump.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
26ef8158ac ddebug: make dd_get_file_stream accept the screen only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
27fa933a71 ddebug: clean up ddebug_screen_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
6bf81de339 gallium: rework flags for pipe_context::dump_debug_state
The pipelined hang detection mode will not want to dump everything.
(and it's also time consuming) It will only dump shaders after a draw call
and then dump the status registers separately if a hang is detected.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Rob Herring
9ace2c1355 vc4: add hash table look-up for exported dmabufs
It is necessary to reuse existing BOs when dmabufs are imported. There
are 2 cases that need to be handled. dmabufs can be created/exported and
imported by the same process and can be imported multiple times.
Copying other drivers, add a hash table to track exported BOs so the
BOs get reused.

v2: Whitespace fixup (by anholt)

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-26 13:47:50 -07:00
Eric Anholt
ce8504d196 vc4: Disable early Z with computed depth.
We don't tell the hardware whether we're computing depth, so we need
to manage early Z state manually.  Fixes piglit early-z.
2016-07-26 13:47:50 -07:00
Eric Anholt
4d0b2c7aaa ttn: Update shader->info as we generate code.
We could use the nir_shader_gather_info() pass to update it after the
fact, but this is what glsl_to_nir and prog_to_nir do.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2016-07-26 13:47:50 -07:00
Vedran Miletić
7b9a0f4e38 mesa: standardize naming Mesa3D, MESA -> Mesa
Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-26 13:28:01 -07:00
Kenneth Graunke
95c48391ee mesa: Make MESA_SHADER_CAPTURE_PATH skip shaders with Name == -1.
Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta
shaders that we don't really want to capture.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-26 13:27:09 -07:00
Matt Turner
20553e4a2d mesa: Use AC_HEADER_MAJOR to include correct header for major().
Gentoo has been smoke testing an upcoming change to glibc.

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=580392
2016-07-26 12:12:41 -07:00
Matt Turner
815135166c glsl: Remove references to tail_pred. 2016-07-26 12:12:27 -07:00
Matt Turner
5ed3299822 glx: Avoid aliasing violations.
Compilers are perfectly capable of generating efficient code for calls
like these to memcpy().

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
2a1d2874f1 mesa: Avoid aliasing violation in uniform_query.cpp.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
f5ac1d366e mesa: Avoid aliasing violation in FXT1.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
a1e9b72102 swrast: Avoid aliasing violation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
149309a424 glsl: Avoid aliasing violations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
d1f6f65697 glsl: Separate overlapping sentinel nodes in exec_list.
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.

No difference in OglBatch7 (n=20).

Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Jason Ekstrand
5d76690f17 i965/miptree: Stop multiplying cube depth by 6 in HiZ calculations
intel_mipmap_tree::logical_depth0 is now in number of 2D slices so we no
longer need to be multiplying by 6.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-26 07:58:44 -07:00
Jason Ekstrand
833e389bc0 i965/miptree/isl: Stop multiplying depth by 6 for cubes
Now that the logical_depth0 field is in number of 2D slices, we don't need
to be multiplying by 6 when creating the surface.  It wasn't hurting
anything primarily because we get the actual length from the view which was
already handling it correctly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-26 07:58:44 -07:00
Jason Ekstrand
d16dc8e963 i965/blorp/gen8: Stop multiplying depth by 6 for cubes
intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need
for us to multiply by 6 when we go to fill out a blorp surface state.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-26 07:58:44 -07:00
Samuel Pitoiset
126bd15940 nvc0: use nvc0_m2mf_push_linear() to reduce code duplication
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-26 00:50:34 +02:00
Samuel Pitoiset
c5236f0ecc nvc0: use nve4_p2mf_push_linear() to reduce code duplication
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-26 00:40:37 +02:00
Andreas Boll
0420666ac0 build: Remove unused AX_CHECK_COMPILE_FLAG macro
Unused since 1a6ae84041

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-25 15:14:12 +02:00
Nils Wallménius
a354c389f5 main: memcpy larger chunks in _mesa_propagate_uniforms_to_driver_storage
When possible, do the memcpy on larger blocks. This reduces cycles
spent in _mesa_propagate_uniforms_to_driver_storage from
1.51 % to 0.62% according to perf during the Unigine Heaven benchmark.
It did not affect the framerate of the benchmark. The system used for
testing was an i5 6600K with a Radeon R9 380.

Piglit hangs randomly on this system both with and without the patch
so i could not make a comparison.

v2: fixed whitespace

Signed-off-by: Nils Wallménius <nils.wallmenius@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-25 13:51:16 +02:00
Boyuan Zhang
dd208ea006 st/va: enable h264 VAAPI encode
Enable H.264 VAAPI encoding through config. Currently only H.264 baseline is supported. Encode entrypoint is not accepted by driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:54 +02:00
Boyuan Zhang
71da1354d7 st/va: add function to handle misc param type frame rate
Frame rate can be passed to driver either through VAEncSequenceParameterBufferType or VAEncMiscParameterTypeFrameRate. Previous code only implement the former one, which is used by Gstreamer-Vaapi. Now adding implementation for VAEncMiscParameterTypeFrameRate. Also adding default frame rate as 30 just in case application never provides frame rate information to driver.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang
10dec2de2d st/va: add enviromental variable to disable interlace
Add environmental variable to disable interlace mode. At VAAPI decoding stage, driver can not distinguish b/w pure decoding case and transcoding case. And since interlace encoding is not supported, we have to disable interlace for transcoding case. The temporary solution is to use enviromental variable to disable interlace mode.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:53 +02:00
Boyuan Zhang
b0ceb4cc48 st/va: add preset values for VAAPI encode
Add some hardcoded values hardware needs mainly for rate control purpose. With previously hardcoded values for OMX, the rate control result is not correct. This change fixed the rate control result by setting correct values for Vaapi.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang
85d807f2e0 st/va: add functions for VAAPI encode
Add necessary functions/changes for VAAPI encoding to buffer and picture. These changes will allow driver to handle all Vaapi encode related operations. This patch doesn't change the Vaapi decode behaviour.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-07-25 13:39:52 +02:00
Boyuan Zhang
10c1cc47a6 st/va: get rate control method from configattrib v2
Rate control method is passed from app to driver through config attrib list.
That is why we need to store this rate control method to config. And later
on, we will pass this value to context->desc.h264enc.rate_ctrl.rate_ctrl_method.

v2 (chk): fix broken build and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
Boyuan Zhang
34f4634843 st/va: add conversion for yv12 to nv12in putimage v2
For putimage call, if image format is yv12 (or IYUV with U V field swap) and
surface format is nv12, then we need to convert yv12 to nv12 and then copy
the converted data from image to surface. We can't use the existing logic
where surface is destroyed and re-created with yv12 format.

v2 (chk): fix some compiler warnings and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:51 +02:00
Boyuan Zhang
23b4ab1738 vl/util: add copy func for yv12image to nv12surface v2
Add function to copy from yv12 image to nv12 surface for VAAPI putimage call.
We need this function in VaPutImage call where copying from yv12 image to nv12
surface for encoding. Existing function can't be used because it only work for
copying from yv12 surface to nv12 image in Vaapi.

v2: cleanup variable types and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:18 +02:00
Boyuan Zhang
5bcaa1b9e9 st/va: add encode entrypoint v2
VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as entry point for encoding case. We
will save this encode entry point in config. config_id was used as profile
previously. Now, config has both profile and entrypoint field, and config_id is
used to get the config object. Later on, we pass this entrypoint to
context->templat.entrypoint instead of always hardcoded to
PIPE_VIDEO_ENTRYPOINT_BITSTREAM for decoding case previously. Encode entrypoint
is not accepted by driver until we enable Vaapi encode in later patch.

v2 (chk): fix commit message to match 80 chars, use switch instead of ifs,
	  fix memory leaks in the error path, implement vlVaQueryConfigEntrypoints
	  as well, drop VAEntrypointEncPicture (only used for JPEG).

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:30:42 +02:00
Samuel Pitoiset
e7b2ce5fd8 nvc0: upload sample locations on GM20x
This fixes a bunch of multisample piglit tests on GM206, like
bin/arb_texture_multisample-texelfetch 2 -auto -fbo

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-24 22:46:26 +02:00
Rob Clark
2f57e57881 freedreno/a4xx: time-elapsed query should be active for clears
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-24 09:33:05 -04:00
Samuel Pitoiset
3a2e67bf78 nvc0/ir: fix up an assertion in emitUADD()
It's illegal to have neg modifiers on both sources for OP_ADD,
and it's illegal to have OP_SUB with just src0 neg.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-24 00:42:47 +02:00
Samuel Pitoiset
a159a3d5cb nvc0: fix wrong indentation in nvc0_validate_fb()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-23 23:59:10 +02:00
Ilia Mirkin
e483cb9a3a glsl: reuse main extension table to appropriately restrict extensions
Previously we were only restricting based on ES/non-ES-ness and whether
the overall enable bit had been flipped on. However we have been adding
more fine-grained restrictions, such as based on compat profiles, as
well as specific ES versions. Most of the time this doesn't matter, but
it can create awkward situations and duplication of logic.

Here we separate the main extension table into a separate object file,
linked to the glsl compiler, which makes use of it with a custom
function which takes the ES-ness of the shader into account (thus
allowing desktop shaders to properly use ES extensions that would
otherwise have been disallowed.) We can also now use this logic to
generate #define's for all supported extensions automatically, removing
the duplicate (and often inaccurate) list in glcpp.

The effect of this change should be nil in most cases. However in some
situations, extensions like GL_ARB_gpu_shader5 which were formerly
available in compat contexts on the GLSL side of things will now become
inaccessible.

This regresses two ES CTS tests:

  ES3-CTS.shaders.shader_integer_mix.define
  ES31-CTS.shader_integer_mix.define

however that is due to them using #version 100 instead of 300 es. As the
extension is only defined for ES3, I believe this is the correct
behavior.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v2)
v2 -> v3: integrate glcpp defines into the same mechanism
2016-07-23 13:48:04 -04:00
Rob Clark
9253dcde58 freedreno/a4xx: timestamp queries
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
Rob Clark
b888d8e937 freedreno: hw timestamp support
If the kernel supports it, use hw counter for timestamps.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
Rob Clark
6a4b052820 freedreno: prep work for timestamp queries
We need "NULL" state to be a valid bit in the bitmask, because timestamp
queries are not restricted to draw/etc stages (ie. the only commands to
submit may just be to read the timestamp).  And just because there are
no draws, isn't a reason to skip the flush and return zero.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 13:39:30 -04:00
Nicolai Hähnle
3d69357da9 radeonsi: ensure sample locations are set for line and polygon smoothing
Since commit d938b8c, the sample locations are no longer set unconditionally,
so we need to set the atom to dirty on all chips, not just Polaris.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-23 15:36:39 +02:00
Nicolai Hähnle
f755da0f2f radeonsi: fix Polaris MSAA regression
The regression was introduced by commit d938b8c. The problem here is that in
order to use the small primitive filter, we need to explicitly set the sample
locations to 0. But the DB doesn't properly process the change of sample
locations without a flush, and so we can end up with incorrect Z values.

Instead of doing a flush, just disable the small primitive filter when MSAA
is force-disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-23 15:36:38 +02:00
francians@gmail.com
abb2a865a4 freedreno/ir3: Add missing braces in initializer
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 09:14:55 -04:00
francians@gmail.com
c99cdd2175 freedreno/a2xx: silence missing case 'SHADER_COMPUTE' warning (v2)
v2: no need for break after an unreachable (Matt Turner)

Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-23 09:14:18 -04:00
Marek Olšák
700de07771 radeonsi: implement buffer_subdata without indirect calls
There is less noise in CPU profile data now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-23 13:33:42 +02:00
Marek Olšák
8e3e9d2839 gallium/util: don't modify usage in pipe_buffer_write
All drivers were already doing it except virgl.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-23 13:33:42 +02:00
Marek Olšák
1ffe77e7bb gallium: split transfer_inline_write into buffer and texture callbacks
to reduce the call indirections with u_resource_vtbl.

The worst call tree you could get was:
  - u_transfer_inline_write_vtbl
    - u_default_transfer_inline_write
      - u_transfer_map_vtbl
        - driver_transfer_map
      - u_transfer_unmap_vtbl
        - driver_transfer_unmap

That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.

The new interface is:
  pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
  pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
                                stride, layer_stride)

v2: fix whitespace, correct ilo's behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-07-23 13:33:42 +02:00
Kenneth Graunke
0ba7288376 nir: Lower interp_var_at_* like a normal load_var for flat inputs.
"flat centroid" and "flat sample" both just mean "flat", so we should
ignore interpolateAtCentroid/Sample and just return the flat value.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-22 20:31:20 -07:00
Kenneth Graunke
f80bea2d80 mesa: Don't call GenerateMipmap if Width or Height == 0.
One of the WebGL 2.0 conformance tests is trying to call
glGenerateMipmaps with a width and height of 0.  With the meta
implementation, this generates a "framebuffer attachment incomplete"
status, and falls back to the CPU path, calling MapTextureImage.

Except that there's no actual texture to map, and we assert fail.

There's no work to do in this case.  The test expects it to succeed,
so just return early with no error and avoid hassling the driver.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96911
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-22 20:31:20 -07:00
Jason Ekstrand
b33bccb519 anv/pipeline: Set up point coord enables
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
9e05e51cff spirv/nir: Add support for ImageQuerySamples
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
71202352c8 spirv/nir: Handle texture projectors
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
36c31b8fa2 nir/spirv: Refactor coordinate handling in handle_texture
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
b820c8b78c spirv/nir: Refactor type handling in handle_texture
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
561be50a1a spirv/nir: Move opcode selection higher up in handle_texture
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
c8da91aa24 anv/image: Assert that the image format is actually supported
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
34a39e91ba spirv/nir: Don't increment coord_components for array lod queries
For lod query instructions, we really don't care whether or not the sampler
is an array type because that doesn't factor into the LOD.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
67b7d876e4 i965: Get rid of the do_lower_unnormalized_offsets pass
We can do this in NIR now.  No need to keep a GLSL pass lying around for
it.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
9f32721f86 i965/nir: Enable NIR lowering of txf and rect offsets
This fixes the following piglit tests on gen6+:

tex-miplevel-selection textureProjGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRect
tex-miplevel-selection textureGradOffset 2DRectShadow
tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4
tex-miplevel-selection textureProjGradOffset 2DRectShadow

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:54 -07:00
Jason Ekstrand
d9156efc52 nir/lower_tex: Add support for lowering coordinate offsets
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures.  Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:53 -07:00
Jason Ekstrand
843fc8f3e7 nir/lower_tex: Add some helpers for working with tex sources
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:53 -07:00
Jason Ekstrand
09135cd55a nir: Add a helper for determining the type of a texture source
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Jason Ekstrand
3c0077a6ec anv/pipeline: Set binding_table.gather_texture_start
This should get texture gather working on gen8+ and mostly working on gen7.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Jason Ekstrand
95e9d58bdb spirv/nir: Properly handle gather components
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Jason Ekstrand
7c7acf53b2 spirv/nir: Add support for shadow samplers that return vec4
While SPIR-V technically doesn't support "old style" shadow, the
shadow-compare gather instruction does return a vec4 so we need to be able
to set the old_style_shadow bit in NIR.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Jason Ekstrand
2ddefd03b7 spirv/nir: Fix some texture opcode asserts
We can't get an lod with txf_ms and SPIR-V considers textureGrad to be an
explicit-LOD texturing instruction.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Samuel Pitoiset
3f5cf8c488 nv50/ir: allow to swap sources for OP_SUB
This allows the load-propagation pass to swap the sources in presence
of immediate values.

Maxwell (GM107):

total instructions in shared programs :1928187 -> 1927634 (-0.03%)
total gprs used in shared programs    :330741 -> 330154 (-0.18%)
total local used in shared programs   :28032 -> 28032 (0.00%)

                local        gpr       inst      bytes
    helped           0         271         425         425
      hurt           0           0         194         194

Fermi (GF114):

total instructions in shared programs :2334474 -> 2333829 (-0.03%)
total gprs used in shared programs    :380934 -> 380215 (-0.19%)
total local used in shared programs   :33304 -> 33264 (-0.12%)

                local        gpr       inst      bytes
    helped           5         314         521         521
      hurt           0           4         195         195

No regressions on GM107 and GF114 with full piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 22:51:37 +02:00
Marek Olšák
2e890b5350 gallium/radeon: make deferred flushes asynchronous
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-22 22:34:49 +02:00
Marek Olšák
d17b35e671 gallium: add PIPE_FLUSH_DEFERRED
There are 2 uses:
- Asynchronous flushing for multithreaded drivers.
- Return a fence without flushing (mid-command-buffer fence). The driver
  can defer flushing until fence_finish is called.

This is required to make Bioshock Infinite faster, which creates
1000 fences (flushes) per frame.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-07-22 22:34:49 +02:00
Marek Olšák
4cdc482283 gallium/os: use CLOCK_MONOTONIC for sleeps (v2)
v2: handle EINTR, remove backslashes

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-22 22:34:49 +02:00
Eric Engestrom
4da9f7e7ce mapi: fix typo in macro name
Fixes: 5ec140c17b ("mapi: Massage code to allow clang to compile.")
Reported-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-22 10:14:00 -07:00
Kenneth Graunke
44ef2ce6ec docs: Put swr back on the GL_ARB_texture_buffer_object_rgb32 list.
Looks like this was lost when resolving merge conflicts in
commit d1fbd4cdb1.
2016-07-22 09:57:54 -07:00
Andres Gomez
d068b38e46 glsl: subroutine types cannot be compared
subroutine variables are to be used just in the way functions are
called. Although the spec doesn't say it explicitely, this means that
these variables are not to be used in any other way than those left
for function calls. Therefore, a comparison between 2 subroutine
variables should also cause a compilation error.

From The OpenGL® Shading Language 4.40, page 117:

  "  To use subroutines, a subroutine type is declared, one or more
     functions are associated with that subroutine type, and a
     subroutine variable of that type is declared. The function
     currently assigned to the variable function is then called by
     using function calling syntax replacing a function name with the
     name of the subroutine variable. Subroutine variables are
     uniforms, and are assigned to specific functions only through
     commands (UniformSubroutinesuiv) in the OpenGL API."

From The OpenGL® Shading Language 4.40, page 118:

  "  Subroutine uniform variables are called the same way functions
     are called. When a subroutine variable (or an element of a
     subroutine variable array) is associated with a particular
     function, all function calls through that variable will call that
     particular function."

Fixes GL44-CTS.shader_subroutine.subroutines_cannot_be_assigned_float_int_values_or_be_compared

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-22 17:30:25 +03:00
Timothy Arceri
a2b3c146d2 i965: fix varying output setup
Since 7f53fead5c we treat every location as using all
four components so we only need special handling for
doubles when they cross multiple locations.

This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations
where the outputs array would overflow when a dmat2 was stored at
the max varying location i.e 30.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-23 00:04:10 +10:00
Samuel Pitoiset
c2801f9272 nvc0/mme: fix offsets used for indirect draws
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x180 to 0x1a0, and the macros have to be re-compiled.

Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 11:32:09 +02:00
Samuel Pitoiset
dbcff7fdbb nvc0: fix offsets of MP perf counters input parameters
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.

This also fixes amd_performance_monitor_measure piglit.

Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-22 11:32:04 +02:00
Kenneth Graunke
cb70773129 mesa: Add GL_BGRA_EXT to the list of GenerateMipmap internal formats.
The GL_EXT_texture_format_BGRA8888 extension specification defines a
GL_BGRA_EXT unsized internal format (which is a little odd - usually
BGRA is a pixel transfer format).  The extension is written against
the ES 1.0 specification, so it's a little hard to map, but I believe
it's effectively adding it to the table used here, so we should allow
it here as well.

Note that GL_EXT_texture_format_BGRA8888 is always enabled (dummy_true),
so we don't need to check if it's enabled here.

This fixes mipmap generation in Skia and ChromeOS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
References: https://bugs.chromium.org/p/chromium/issues/detail?id=630371
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reported-by: Stéphane Marchesin <marcheu@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2016-07-21 21:31:57 -07:00
Kenneth Graunke
be1c53d2cf i965: Fix "operation operation" in comment.
From the redundant redundant department.

Reported-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
2016-07-21 21:31:57 -07:00
Kenneth Graunke
76e161056a i965: Fix shared atomic intrinsics to pay attention to base.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-21 21:31:55 -07:00
Kenneth Graunke
cf6f2d3ce7 nir: Add a base const_index to shared atomic intrinsics.
Commit 52e75dcb8c made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly.  However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.

However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.

Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field.  So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-21 21:31:41 -07:00
Timothy Arceri
91dde3ddca glsl: re-enable varying packing in GL4.4+
We can still do packing we just need to get the packing type from the consumer
rather than the producer.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97033
2016-07-22 10:21:08 +10:00
Kenneth Graunke
2db357e4c3 i965: Include VUE handles for GS with invocations > 1.
We always resort to the pull model for instanced GS inputs.  So, we'd
better include the VUE handles, or else we can't actually pull anything.

Ian reports that on his branch with OES_geometry_shader enabled,
this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests::

- instanced.draw_2_instances_geometry_2_invocations
- instanced.draw_2_instances_geometry_8_invocations
- instanced.draw_4_instances_geometry_2_invocations
- instanced.draw_4_instances_geometry_8_invocations
- instanced.draw_8_instances_geometry_2_invocations
- instanced.draw_8_instances_geometry_8_invocations
- instanced.geometry_2_invocations
- instanced.geometry_32_invocations
- instanced.geometry_8_invocations
- instanced.geometry_max_invocations
- instanced.geometry_output_different_2_invocations
- instanced.geometry_output_different_32_invocations
- instanced.geometry_output_different_8_invocations
- instanced.geometry_output_different_max_invocations
- instanced.invocation_output_vary_by_attribute
- instanced.invocation_output_vary_by_texture
- instanced.invocation_output_vary_by_uniform
- query.primitives_generated_instanced

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-21 11:15:12 -07:00
Matt Turner
8c8c3f859e mesa: Add -fno-math-errno -fno-trapping-math to CXXFLAGS.
Not sure why I forgot to add them to CXXFLAGS in commit f55c408067 or
commit 875458b778. Cuts about 1k of .text.

   text     data      bss      dec      hex  filename
5806354   287816    29384  6123554   5d7022  i965_dri.so before
5805497   287744    29384  6122625   5d6c81  i965_dri.so after

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-21 10:45:28 -07:00
Matt Turner
5353855e9d mesa: Drop -fno-builtin-memcmp.
According to the referenced bug report, gcc-4.5 and newer do not inline
memcmp(). I see no difference in performance of ipers with llvmpipe on a
Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by removing
this flag.

I attempted to confirm the problem with gcc-4.4, but it fails to compile
for quite a few different reasons.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-21 10:45:28 -07:00
Matt Turner
5ec140c17b mapi: Massage code to allow clang to compile.
According to https://llvm.org/bugs/show_bug.cgi?id=19778#c3 this code
was violating the spec, resulting in it failing to compile.

Cc: mesa-stable@lists.freedesktop.org
Co-authored-by: Tomasz Paweł Gajc <tpgxyz@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89599
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-21 10:45:28 -07:00
Ian Romanick
6bc5491193 docs: Add extensions not part of any GL or GL ES version
Based loosely on patches submitted ages ago by Thomas Helland.

v2: Add lots of missing data provided by Ilia.  Fix sort order of
GL_ARB_sparse_texture extensions suggested by Ilia.

v3: Note that Dave Airlie has started work on GL_ARB_bindless_texture.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-21 10:31:04 -07:00
Ian Romanick
d1fbd4cdb1 docs: Update GL3.txt for OpenGL 4.0 on i965-ish hardware
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-21 10:30:20 -07:00
Ian Romanick
7dc99da81a docs: Update GL3.txt for OpenGL ES on i965-ish hardware
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-21 10:26:55 -07:00
Timothy Arceri
4f89cf4941 i965: print error messages if gs fails to compile
We do this for all other stages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-21 15:05:05 +10:00
Timothy Arceri
b463b1d7cc i965: enable GL4.4 for Gen8+
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-21 12:06:11 +10:00
Timothy Arceri
4ba9bd138a i965: enable ARB_enhanced_layouts for gen6+
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
f3805c5f09 i965/vec4: add packing support for tcs load outputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
255388a965 i965/vec4: add support for packing tes inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-07-21 12:06:11 +10:00
Timothy Arceri
d07cfb31c4 i965/vec4: add support for packing tcs outputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
b25e49a3c7 i965/vec4: support packing tcs inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
d1192bef7e i965/vec4: add component packing for gs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
d1b1fca0b7 i965/vec4: add support for packing vs/gs/tes outputs
Here we create a new output_generic_reg array with the ability to
store the dst_reg for each component of user defined varyings.
This is needed as the previous code only stored the dst_reg based
on the varying location which meant packed varyings would overwrite
each other.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-07-21 12:06:11 +10:00
Timothy Arceri
b427abba0c i965/vec4: add support for packing inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
138aad06b3 i965: add helper for creating packing writemask
For example where n=3 first_component=1 this will give us
0xE (WRITEMASK_YZW).

V2:
Add assert to check first component is <= 4 (Suggested by Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Timothy Arceri
4b57b53f85 i965: add helpers for creating component layout swizzle
This will be used to swizzle components to the beginning or end
of the vector based on the component layout qualifier and whether
we are doing a load or store.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 12:06:11 +10:00
Eric Anholt
d2b4b16589 vc4: Return V3D version details in the GL renderer info.
This is as close as we get to a name for the 3D blocks.
2016-07-20 16:15:15 -07:00
Eric Anholt
d81934cded vc4: Check the V3D version reported by the kernel.
We don't want to bring up an old userspace driver on a kernel for
newer hardware.  We'll also want to look at the other ident fields in
the future.
2016-07-20 16:15:15 -07:00
Eric Anholt
83b8ca58e1 vc4: Detect and report kernel support for branching. 2016-07-20 16:15:15 -07:00
Eric Anholt
16985eb308 vc4: Switch to using the libdrm-provided vc4_drm.h.
The required version is set to .69 for the getparam ioctl that will be
used in the next commit.
2016-07-20 16:15:15 -07:00
Timothy Arceri
3d8c29ed32 docs: mark ARB_enhanced_layouts as DONE for i965
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 09:10:53 +10:00
Timothy Arceri
d99a040bbf i965: enable ARB_enhanced_layouts for gen8+
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-21 09:10:53 +10:00
Timothy Arceri
cba6657d8b nir: add doubles component packing support
This makes sure we give the correct driver location
for doubles when using component packing. Specifically
it handles packing a dvec3 with a double which is the
only packing scenario allowed which spans across two
locations.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-07-21 09:10:53 +10:00
Timothy Arceri
ad5dd39984 i965: add component packing support for load_output intrinsics
Here we use the component qualifier (which is the first component)
as an offset when loading output varyings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-21 09:10:53 +10:00
Timothy Arceri
7f53fead5c i965: enable component packing for vs and fs
Rather than trying to work out the total number of components
used at a location we simply treat all outputs as vec4s. This
removes the need for complex code looping over varyings to match
packed locations and the need for storing the total number of
components used at each location.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-21 09:10:53 +10:00
Timothy Arceri
09e46f99ad i965: bring back type_size_vec4_times_4()
We will use this for output varyings. To make component
packing simpler we will just treat all varyings as vec4s.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-21 09:10:53 +10:00
Jason Ekstrand
9d503aea06 nir/inline: Constant-initialize local variables in the callee if needed
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-20 15:29:55 -07:00
Jason Ekstrand
dc9f2436c3 nir: Add a nir_deref_foreach_leaf helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-20 15:29:55 -07:00
Tom Stellard
106946153f clover: Re-order includes in invocation.cpp to fix build
The build was failing because the official CL headers have a few defines,
like: # define cl_khr_gl_sharing 1

Which have the same name as some class members of clang's OpenCLOptions class.
If we include the cl headers first, this breaks the build because the member
names of this class are replaced by the literal 1.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-20 21:15:53 +00:00
Tom Stellard
a73bf11a63 clover: Add missing include v2
clang commit r275822 removed unnecessary includes from header files,
so we now need to explicitly include clang/Lex/PreprocessorOptions.h

v2:
  - Use <> instead of "" for the include path.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-20 21:15:53 +00:00
Kenneth Graunke
3dba8516d6 i965: Move VS load_input handling to nir_emit_vs_intrinsic().
TCS/TES/GS and now FS all handle these in stage-specific functions.
CS don't have inputs, so VS was the only one left using this code.

Move it to the VS-specific function for clarity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:26 -07:00
Kenneth Graunke
1608209952 i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.
We no longer use this message.  As far as I can tell, it's fairly
useless - the equivalent information is provided in the payload.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:24 -07:00
Kenneth Graunke
1eef0b73aa i965: Rewrite FS input handling to use the new NIR intrinsics.
This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front.  The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location.  Instead,
each intrinsic specifies exactly what's needed.

This should unblock Timothy's work on GL_ARB_enhanced_layouts.

Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages.  The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.

Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable.  We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.

On Broadwell:

total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209

total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10

total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10

LOST:   3
GAINED: 3

The differences mostly appear to be slight changes in MOVs.

v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
    flag rather than passing it directly to nir_lower_io.  Use the
    unreachable() macro rather than assert in one place.  (Review
    feedback from Chris Forbes.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:16 -07:00
Kenneth Graunke
a2dc11a781 i965: Move load_interpolated_input/barycentric_* intrinsics to the top.
Currently, i965 interpolates all FS inputs at the top of the program.
This has advantages and disadvantages, but I'd like to keep that policy
while reworking this code.  We can consider changing it independently.

The next patch will make the compiler generate PLN instructions "on the
fly", when it encounters an input load intrinsic, rather than doing it
for all inputs at the start of the program.

To emulate this behavior, we introduce an ugly pass to move all NIR
load_interpolated_input and payload-based (not interpolator message)
load_barycentric_* intrinsics to the shader's start block.

This helps avoid regressions in shader-db for cases such as:

   if (...) {
      ...load some input...
   } else {
      ...load that same input...
   }

which CSE can't handle, because there's no dominance relationship
between the two loads.  Because the start block dominates all others,
we can CSE all inputs and emit PLNs exactly once, as we did before.

Ideally, global value numbering would eliminate these redundant loads,
while not forcing them all the way to the start block.  When that lands,
we should consider dropping this hacky pass.

Again, this pass currently does nothing, as i965 doesn't generate these
intrinsics yet.  But it will shortly, and I figured I'd separate this
code as it's relatively self-contained.

v2: Dramatically simplify pass - instead of creating new instructions,
    just remove/re-insert their list nodes (suggested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:11 -07:00
Kenneth Graunke
048a56c1fc i965: Add a pass to demote sample interpolation intrinsics.
When working with a non-multisampled render target, asking for "sample"
interpolation locations doesn't make sense.  We demote them to centroid.

In a couple of patches, brw_compute_barycentric_modes will begin looking
at these intrinsics to determine the barycentric modes.  fs_visitor also
will use them to code-generate pixel interpolator messages or payload
references.  Handling the "but what if it's not MSAA?" logic ahead of
time in a NIR pass simplifies things and prevents duplicated logic.

This patch doesn't actually do anything useful yet as we don't generate
these intrinsics.  I decided to keep it separate as it's self-contained,
in the hopes of shrinking the "convert everything" patch for reviewers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:08 -07:00
Kenneth Graunke
707ca00fce nir: Add nir_load_interpolated_input lowering code.
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.

flat inputs continue using regular load_input.

v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
    passing (in response to review feedback from Chris Forbes).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:00 -07:00
Kenneth Graunke
2496462479 nir: Add new intrinsics for fragment shader input interpolation.
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.

One exception is fragment shader inputs.  load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location.  This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).

With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location.  The intrinsics make this easy:
simply load N components from location <loc, component>.  However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.

Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.

We represent this by introducing five new load_barycentric_* intrinsics:

- load_barycentric_pixel     (ordinary variable)
- load_barycentric_centroid  (centroid qualified variable)
- load_barycentric_sample    (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())

Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2.  The last two also take a sample
or offset source.

We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.

The intention is that flat inputs will still use regular load_input
intrinsics.  This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.

This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.

v2: Document the interp_mode const_index value, define a new
    BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
    some of them (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:00:45 -07:00
Kenneth Graunke
e614062e54 anv: Properly call gen75_emit_state_base_address on Haswell.
This should fix MOCS values.  Caught by Coverity.

CID: 1364155

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
87660579f5 genxml: Rename "API Rendering Disable" to "Rendering Disable".
Gen7/7.5 call it "Rendering Disable" while Gen8/9 prefix it with "API".

Pick one for consistency, and so we can share code between generations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
bfd9942cdc anv: Unify 3DSTATE_CLIP code across generations.
The bulk of this is the same.  There are just a couple fields that only
exist on one generation or another, and we can easily handle those with
an #ifdef.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
44502afd82 anv: Enable early culling on Gen7.
We set the cull mode, but forgot the enable bit.  Gen8 uses this.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
0d77f08042 anv: Fix near plane clipping on Gen7/7.5.
The Gen7/7.5 clip code used APIMODE_OGL, while the Gen8+ clip code used
APIMODE_D3D.  The meaning hasn't changed, so one of these must be wrong.

It appears that the hardware documentation is completely wrong.  It
claims that the "API Mode" bit means:

   0h    APIMODE_OGL    NEAR_VP boundary == 0.0 (NDC)
   1h    APIMODE_D3D    NEAR_VP boundary == -1.0 (NDC)

However, DirectX typically uses 0.0 for the near plane, while unextended
OpenGL uses -1.0.  i965's gen6_clip_state.c uses APIMODE_D3D for the
GL_ZERO_TO_ONE case, so I believe the meanings are backwards from what
the documentation says.

Section 23.2 ("Primitive Clipping") of the Vulkan 1.0.21 specification
contains the following equations:

   -w_c <= x_c <= w_c
   -w_c <= y_c <= w_c
      0 <= z_c <= w_c

This means that Vulkan follows D3D semantics.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
6b67270262 genxml: Add APIMODE_D3D missing enum values and improve consistency.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Kenneth Graunke
c31cf532af genxml: Add CLIPMODE_* prefix to 3DSTATE_CLIP's "Clip Mode" enum values.
Gen6-7.5 use CLIPMODE_REJECT_ALL, while Gen8+ just used REJECT_ALL.
Being consistent will let me unify code, and I prefer having the prefix.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 10:59:44 -07:00
Tim Rowley
0f13a8f770 swr: [rasterizer core] introduce simd16intrin.h
Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged,
adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side,
with emulation, that we can use piecemeal, rather than the all-or-nothing
approach to bring up avx512.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
5fe361e2c0 swr: [rasterizer core] fix for possible int32 overflow condition
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
a123d12e14 swr: [rasterizer core] rename *_MAX enum values to *_COUNT
Makes these names semantically correct.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
e41d9dd576 swr: [rasterizer core] centroid correction
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
e0529a4668 swr: [rasterizer core] support range of values in TemplateArgUnroller
Fixes Linux warnings.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
0363015964 swr: [rasterizer core] ensure adjacent topologies use the cut-aware PA
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
efdaf5fa3e swr: [rasterizer] attribute swizzling and linkage
Add support for enhanced attribute swizzling. Currently supports constant
source overrides to handle PrimitiveID support. No support yet for input
select swizzling or wrap shortest. Removes obsoleted linkageMask and
associated code.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
a5846fb75a swr: [rasterizer common] icc declspec definitions
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:15 -05:00
Tim Rowley
0d13f2e801 swr: [rasterizer jitter] rework vertex/instance ID storage in fetch
Moved the setting into the existing component control code. Fixes bad
interaction between attribute/component setting for vertex/instance ID
and component packing.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Tim Rowley
1d09b3971a swr: [rasterizer core] avx512 simd utility work
Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low level simd utils

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Tim Rowley
98641f4e73 swr: [rasterizer core] viewport rounding for disabled scissor
Adjust viewport rounding when scissor rect is disabled during macro
tile scissor setup.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-20 10:22:14 -05:00
Jason Ekstrand
96dfed49e4 i965: Stop muging cube array lengths by 6
From the Sky Lake PRM:

   "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port
   Surfaces, the range of this field is [0,340], indicating the number of
   cube array elements (equal to the number of underlying 2D array elements
   divided by 6). For other surfaces, this field must be zero."

In other words, the depth field for cube maps is in number of cubes not
number of 2-D slices so we need to divide by 6.  ISL will do this correctly
for us assuming that we provide it with the correct array bounds which it
expects to be in 2-D slices.  It appears as if we've been doing this wrong
ever since we first added cube map arrays for Sandy Bridge and the change
to ISL made things slightly worse.  While we're at it, we now need to remoe
the shader hacks we've always done since they were only needed because we
were setting the depth field six times too large.

v2: Fix the vec4 backend as well (not sure how I missed this).

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-07-20 08:19:26 -07:00
Jason Ekstrand
e19b7f7f1b i965/miptree: Set logical_depth0 == 6 for cube maps
This matches what we do for cube maps where logical_depth0 is in number of
face-layers rather than number of cubes.  This does mean that we will
temporarily be setting the surface bounds too loose for cube map textures
but we are already setting them too loose for cube arrays and we will be
fixing that in the next commit anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>
2016-07-20 08:19:22 -07:00
Jason Ekstrand
d4d505d0b0 i965/miptree: Enforce that height == 1 for 1-D array textures
The GL API and mesa internals do this differently than we do.  In GL, there
is no depth parameter for 1-D arrays and height is used.  In the i965
miptree code we do the sane thing and make height == 1 and use depth for
number of slices.  This makes for a mismatch every time we create a 1-D
array texture from GL.  Instead of actually solving this problem, we just
said "1-D is hard, let's make sure it works no matter which way we pass the
parameters" and called it a day.

This commit fixes the one GL -> i965 transition point where we weren't
already handling 1-D array textures to do the right thing and then replaces
the magic fixup code with an assert that you're doing the right thing.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>
2016-07-20 08:18:19 -07:00
Stefan Dirsch
27ef7bfd6c Avoid overflow in 'last' variable of FindGLXFunction(...)
This 'last' variable used in FindGLXFunction(...) may become negative,
but has been defined as unsigned int resulting in an overflow,
finally resulting in a segfault when accessing _glXDispatchTableStrings[...].
Fixed this by definining it as signed int. 'first' variable also needs to be
defined as signed int. Otherwise condition for while loop fails due to C
implicitly converting signed to unsigned values before comparison.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Stefan Dirsch <sndirsch@suse.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 16:05:17 +01:00
Tomasz Figa
9e1248d075 egl/android: Stop leaking DRI images
Current implementation of the DRI image loader does not free the images
created in get_back_bo() and so leaks memory. Moreover, it creates a new
image every time the DRI driver queries for buffers, even if the backing
native buffer has not changed. leaking memory again.

This patch adds missing call to destroyImage() in droid_enqueue_buffer()
and a check if image is already created to get_back_bo() to fix the
above.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:48:54 +01:00
Tomasz Figa
565fa6b748 egl/android: Add some useful error messages
It is much easier to debug issues when the application gives some
meaningful error messages. This patch adds few to the EGL Android
platform backend.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:48:03 +01:00
Tomasz Figa
94282b6dd0 egl/android: Check return value of dri2_get_dri_config()
It might return NULL if specific config variant is unsupported.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:47:23 +01:00
Emil Velikov
4f48674d51 i965: store reference to the context within struct brw_fence (v2)
As the spec allows for {server,client}_wait_sync to be called without
currently bound context, while our implementation requires context
pointer.

v2: Add a mutex and acquire it for the duration of
    brw_fence_client_wait() and brw_fence_is_completed() as suggested
    by Chad.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
2016-07-20 15:45:20 +01:00
Nicolas Boichat
9bebef4034 egl/dri2: dri2_make_current: Set EGL error if bindContext fails
Without this, if a configuration is, say, available only on GLES2/3, but
not on GLES1, and is rejected by the dri module's bindContext call,
eglMakeCurrent fails with error "EGL_SUCCESS".

In this patch, we set error to EGL_BAD_MATCH, which is what CTS/dEQP
dEQP-EGL.functional.surfaceless_context expect.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:10:33 +01:00
Tomasz Figa
ccda100a5a egl/android: Remove unused variables
There are some unused variables left after previous clean-ups triggering
compiler warnings. Let's remove them.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:10:33 +01:00
Tomasz Figa
70a28afb29 gallium/dri: Add shared glapi to LIBADD on Android
An earlier patch fixed the problem for classic drivers, however Gallium
was still left broken. This patch applies the same workaround to
Gallium, when compiled for Android. Following is a quote from the
original patch:

0cbc90c57c mesa: dri: Add shared glapi to LIBADD on Android

/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"

On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
  dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
  dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);

However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 15:10:33 +01:00
Emil Velikov
ae9a2baaa6 mesa: scons: remove left over src/glsl include
The path no longer exists.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 13:33:43 +01:00
Emil Velikov
1c7c0d77ac mesa: scons: list builddir before srcdir
Analogous to previous commit.

Note: scons always uses OOT builds, while the in-tree generated files
could be created either manually or by the autoconf build.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 13:32:24 +01:00
Emil Velikov
eafa82e20e mesa: automake: list builddir before srcdir
In the case of building in out-of-tree fashion, while having generated
in-tree sources, the latter [likely stale] files will be used.

Flip the order to prevent that.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-20 13:30:50 +01:00
Józef Kucia
14608ef920 radeonsi: advertise 8 bits subpixel precision for viewport bounds
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-20 12:45:31 +02:00
Józef Kucia
98aa807188 r600: advertise 8 bits subpixel precision for viewport bounds
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-20 12:45:31 +02:00
Józef Kucia
3cd28fe3de gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)
This allows Gallium drivers to advertise the subpixel precision
for floating point viewports bounds.

v2:
  - Set ViewportSubpixelBits in st_init_limits.

Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 12:45:31 +02:00
Samuel Pitoiset
3c78d89692 nvc0: disable MS images on GM107+
MS images have to be handled explicitly and I don't plan to implement
them for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:33 +02:00
Samuel Pitoiset
8489f20689 nv50/ir: print OP_SUREDB subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:30 +02:00
Samuel Pitoiset
1edc44bfd3 gm107/ir: add emission for SUREDx
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:26 +02:00
Samuel Pitoiset
4aaacd6dd0 gm107/ir: add emission for SUSTx and SULDx
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:21 +02:00
Samuel Pitoiset
e14cb05ce1 gm107/ra: fix constraints for surface operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:16 +02:00
Samuel Pitoiset
c68989b2c8 gm107/ir: lower surface operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:12 +02:00
Samuel Pitoiset
2ae4b5d622 nvc0: bind images for 3d/cp shaders on GM107+
On Maxwell, images binding is slightly different (and much better)
regarding Fermi and Kepler because a texture view needs to be uploaded
for each image and this is going to simplify the thing a lot.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:11:03 +02:00
Samuel Pitoiset
1da704a94c nvc0: increase the tex handles area size in the driver cb
Currently, we can store 32 tex handles of 32-bits integer each and
that fits perfectly with the underlying hardware except on GM107+
which requires to upload a texture view for each images.

This patch increases the number of storable texture handles in the
driver constant buffer from 32 to 40 because we expose 8 images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-20 11:10:56 +02:00
Kenneth Graunke
f0f466214e nir: Fix uninitialized use of 'replacement'.
For intrinsics we don't care about, just skip to the next loop iteration
and process the next instruction.  We don't want to execute the rest of
the code.

This was a bug in commit cdfc05ea6e.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-19 17:34:59 -07:00
Kenneth Graunke
89873c9b08 i965: Use tex_mocs instead of rb_mocs for GL images.
Fixes a 10-20% performance regression in OglCSDof caused by commit
5a8c89038a, which made images (in the
image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB.

This seems sketchy, as the default PTE value is supposed to be
WB LLC eLLC, which is the same as our MOCS WB setting.  It's only
supposed to change when using a surface for display, which won't
ever happen for images.  Something may be wrong in the kernel...

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-19 17:34:59 -07:00
Marek Olšák
0ab47146c9 winsys/amdgpu: use pb_cache buckets for fewer pb_cache misses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
dea6fdadca winsys/radeon: use pb_cache buckets for fewer pb_cache misses
This makes Bioshock Infinite with deferred flushing 2.2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
8d5944199d gallium/pb_cache: reduce the number of pointer dereferences
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
3cdc0e133f gallium/pb_cache: divide the cache into buckets for reducing cache misses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
fec7f74129 gallium/pb_cache: check parameters that are more likely to fail first
This makes Bioshock Infinite with deferred flushing 2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
2596ae2b6e radeonsi: emit PS exports last
This effectively removes s_waitcnt instructions after FP16 exports.

Before:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v4, v5   ; 5E000B04
    v_cvt_pkrtz_f16_f32_e32 v1, v6, v7   ; 5E020F06
    exp 15, 1, 1, 0, 0, v0, v1, v0, v0   ; F800041F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v8, v9   ; 5E001308
    v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
    exp 15, 2, 1, 0, 0, v0, v1, v0, v0   ; F800042F 00000100
    s_waitcnt expcnt(0)                  ; BF8C0F0F
    v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
    v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
    exp 15, 3, 1, 1, 1, v0, v1, v0, v0   ; F8001C3F 00000100
    s_endpgm                             ; BF810000

After:

    v_cvt_pkrtz_f16_f32_e32 v0, v0, v1   ; 5E000300
    v_cvt_pkrtz_f16_f32_e32 v1, v2, v3   ; 5E020702
    v_cvt_pkrtz_f16_f32_e32 v2, v4, v5   ; 5E040B04
    v_cvt_pkrtz_f16_f32_e32 v3, v6, v7   ; 5E060F06
    exp 15, 0, 1, 0, 0, v0, v1, v0, v0   ; F800040F 00000100
    v_cvt_pkrtz_f16_f32_e32 v4, v8, v9   ; 5E081308
    v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
    exp 15, 1, 1, 0, 0, v2, v3, v0, v0   ; F800041F 00000302
    v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
    v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
    exp 15, 2, 1, 0, 0, v4, v5, v0, v0   ; F800042F 00000504
    exp 15, 3, 1, 1, 1, v6, v7, v0, v0   ; F8001C3F 00000706
    s_endpgm                             ; BF810000

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
b2b45cecef radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITS
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
ad70c3954b radeonsi: really wait for the second EOP event and not the first one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák
1a1cc67edd gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag
always set

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Ian Romanick
0b626d7524 nir/algebraic: Optimize fabs(u2f(x))
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass.  The code generated for frexp() uses
fabs, and this resulted in an extra instruction.  Ultimately I ended up
not using frexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
Ian Romanick
94296be276 st/mesa: Enable MESA_shader_integer_functions on all GLSL 1.30 platforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
Ian Romanick
7cb49b1bd7 i965: Enable MESA_shader_integer_functions on all GLSL 1.30 platforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
5726e57f13 i965: Don't lower uaddCarry and usubBorrow in both GLSL IR and NIR
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
d7a47a76e0 i965: Update assertion to account for Gen < 7
Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
assertion assumed the Gen7+ accumulator rules.  A future patch will
allow this instruction on at least Gen6, so update the assertion.

v2: Use get_lowered_simd_width instead of open coding it.  Suggested by
Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2016-07-19 12:19:29 -07:00
Ian Romanick
3e7cebc8da i965: Use LZD to implement nir_op_find_lsb on Gen < 7
v2: Rebase on changes to previous two patches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
c2019c6c26 i965: Use LZD to implement nir_op_ifind_msb on Gen < 7
v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt.  Also update comment about problem values with
LZD(abs(x)).  Suggested by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
de20086eed i965: Use LZD to implement nir_op_ufind_msb
This uses one less instruction.

v2: Move emit_find_msb_using_lzd out of the visitor classes.  Suggested
by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
26c7f04d4a i965: Always enable GL_ARB_shading_language_packing
With the existing lowering passes, the functions from this extension
become a bunch of bit twiddling operations that have always been
supported.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
4b2b6d4d4d i965: Move enable of EXT_shader_integer_mix
This extension does not depend on the Gen.  It only depends on the
availability of GLSL 1.30.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
a2379e44aa glsl: Add lowering pass for ir_bin_imul_high
This isn't the lowering pass you want.  Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32.  Many have 32x16->48.  Any GPU that does, should do the
lowering in the backend.  This is just the thing that will always work.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
1b5477668a glsl: Add lowering pass for ir_unop_find_msb
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
2a381a3c73 glsl: Add lowering pass for ir_unop_find_lsb
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:29 -07:00
Ian Romanick
ad9acb19c3 glsl: Add lowering pass for ir_unop_bitfield_reverse
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
3079dcb00c glsl: Add lowering pass for ir_quadop_bitfield_insert
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
4d6d219b58 glsl: Add lowering pass for ir_triop_bitfield_extract
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
7340be8a01 glsl: Add lowering pass for ir_unop_bit_count
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
806add360f MESA_shader_integer_functions: Allow new function overload matching rules
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
90537e1a0e MESA_shader_integer_functions: Allow implicit int->uint conversions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
65b0346fdb MESA_shader_integer_functions: Expose new built-in functions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
15c4ae461d MESA_shader_integer_functions: Boiler plate extension tracking
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:28 -07:00
Ian Romanick
91482ef226 MESA_shader_integer_functions: Add extension specification
v2: Fix typo in #extension line noticed by Ken.

v3: Update spec status.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:15 -07:00
Samuel Pitoiset
9c63224540 gm107/ir: make use of ADD32I for all immediates
ADD only allows to emit 19-bits immediates.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-07-19 18:07:15 +02:00
Samuel Pitoiset
0904a2ba97 gm107/ir: add missing NEG modifier for IADD32I
Like FADD32I, the NEG modifier of src0 is at position 56.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-07-19 18:07:10 +02:00
Andreas Boll
c482decd4d ddebug: Fix trivial typo in stderr message
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
2016-07-19 16:04:40 +02:00
Andreas Boll
d66cb7c84f configure.ac: Use ${datarootdir} for --with-vulkan-icddir help string too
The help string wasn't updated in cbc37f7.

Fixes: cbc37f7 ("anv: install the intel_icd.json to ${datarootdir} by
default")

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-07-19 16:04:01 +02:00
Eric Engestrom
8ba46fbd9e vl: fix memory leak
CovID: 1363008
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:41:00 +02:00
Boyuan Zhang
60c7450f16 vl: add entry point
Add entrypoint to distinguish H.264 decode and encode. For example, in patch
5/11 when is calling "VaCreateContext", "pps" and "sps" shouldn't be allocated
for H.264 encoding. So we need to use the entry_point to determine this is
H.264 decode or H.264 encode. We can use config to determine the entrypoint
since config_id is passed to us for VaCreateContext call. However, for
VaDestoyContext call, only context_id is passed to us. So we need to know the
entrypoint in order to not free the pps/sps for encoding case.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:36:46 +02:00
Ilia Mirkin
ed9dd3bcd9 nv50,nvc0: srgb rendering is only available for rgba/bgra
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:

  bin/getteximage-formats init-by-clear-and-render -auto -fbo
  bin/getteximage-formats init-by-rendering -auto -fbo

which previously ran into srgb-encoding differences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-07-18 20:04:17 -04:00
Ilia Mirkin
8e7893eb53 nvc0: add support for BGRA8 images
This is useful for pbo downloads, which are now accelerated with images.
BGRA8 is a moderately common format to do that in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-18 20:04:17 -04:00
Jason Ekstrand
905d7dc4d1 i965: Skip update_texture_surface when the plane doesn't exist
Thanks to rebase fail, recent surface state changes (commits 7e951cd56,
8521ce1a7, and 69c0dc5c53) effectively reverted 727a9b2493 and 367cf3a2e3
which was unintentional.  This should bring it back.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-07-18 16:44:29 -07:00
Timothy Arceri
cd5cbf0f6b glsl: use linked shaders rather than compiled shaders
At this point there is no reason not to be using the linked shaders,
using the linked shaders should be faster and will make things simpler
for upcoming shader cache work.

The previous variable name suggests the linked shaders were intended
to be used here anyway.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-19 09:42:00 +10:00
Lars Hamre
198074a41c The extension is already exposed, this simply marks it as done.
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-07-19 01:20:27 +02:00
Anuj Phogat
22935a3040 docs: Fix typo in extension name
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-18 15:53:24 -07:00
Anuj Phogat
7832e18879 docs: Add support for GL_KHR_texture_compression_astc_sliced_3d
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-18 15:44:18 -07:00
Anuj Phogat
c7b787ef90 Revert "docs: Mark KHR_texture_compression_astc_sliced_3d done on i965"
This reverts commit 82f8c23950.

KHR_texture_compression_astc_sliced_3d is not a requirement for
GLES 3.2.

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>\
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-18 15:43:58 -07:00
Anuj Phogat
82f8c23950 docs: Mark KHR_texture_compression_astc_sliced_3d done on i965
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Anuj Phogat
ac0eb36d8e i965/gen9: Enable KHR_texture_compression_astc_sliced_3d
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Anuj Phogat
15dea5ca82 mesa: Add the infrastructure for KHR_texture_compression_astc_sliced_3d
V2: Drop the changes to gl.xml.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-07-18 14:39:54 -07:00
Christian König
3e1ad846f9 radeon/uvd: add session context buffer for polaris 10/11 v2
This way we have unlimited UVD sessions.

v2: only enable it when kernel supports it as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-18 17:13:17 +02:00
Leo Liu
134d6e4e4f vl/dri3: fix a memory leak from front buffer
Inspired by fix for mem leak of vdpau interop, resource_from_handle
set texture reference count, that need to be decreased and released,
recall there is a similar case for DRI3, that is with VA-API glx
extension, there is temporary TFP(texture from pixmap), we target it
through dma-buf. leak happens when without count down the reference.

Checked and found with mpv vo=opengl case, there only one static TFP,
the leak happens once, but for totem player using gstreamer VA-API glx,
the dynamic TFP for each frame, so leak quite a bit.

This fixes mem leak for mpv and totem.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-18 09:20:40 -04:00
Iago Toral Quiroga
0f2516d88f i965/tes/scalar: fix 64-bit indirect input loads
We totally ignored this before because there were no piglit tests for
indirect loads in tessellation stages with doubles.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-18 09:53:51 +02:00
Iago Toral Quiroga
1737e75bfb i965/tcs/scalar: only update imm_offset for second message in 64bit input loads
Our indirect URB read messages take both a direct and an indirect offset
so when we emit the second message for a 64-bit input load we can just
always incremement the immediate offset, even for the indirect case.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-18 09:53:16 +02:00
Kenneth Graunke
18f67c8a69 i965: Move pulls_bary setting to emit_pixel_interpolator_send().
pulls_bary should be set when the shader uses a pixel interpolator
message.  So, setting it from the function that emits pixel interpolator
messages makes a lot of sense.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-17 19:26:54 -07:00
Kenneth Graunke
7ef7738a61 i965: Write gl_FragCoord directly to the destination.
This patch makes emit_general_interpolation take a destination register
as an argument, and write directly to that.  This is simpler than the
old approach of ralloc'ing a register, writing to that temporary, and
then making the caller emit per-component MOVs to copy it to the actual
destination.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-17 19:26:53 -07:00
Kenneth Graunke
a03812c321 i965: Drop has_pln checks in unlit centroid workaround.
The unlit centroid workaround starts being necessary on Gen6, which
is the first platform with multisampling.  PLN exists on G45+, so all
platforms which need this workaround have PLN.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-17 19:26:53 -07:00
Kenneth Graunke
b94890c19f i965: Drop VARYING_SLOT_FACE special case in barycentric setup.
glsl_to_nir always produces a system value for gl_FrontFacing, rather
than an input.  So there should never be an input with this slot,
making this code dead.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-17 19:26:53 -07:00
Kenneth Graunke
ac1181ffbe compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.
Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input.  Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
  -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
  -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-07-17 19:26:48 -07:00
Dave Airlie
e7d96e7685 virgl: drop pointless leftover init of virgl_transfer_inline_write.
Pointed out by Marek.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-17 06:20:53 +10:00
Ilia Mirkin
062c6b8e54 nv50: fix alphatest for non-blendable formats
The hardware can only do alphatest when using a blendable format. This
means that the various *16 norm formats didn't work with alphatest. It
appears that Talos Principle uses such formats, as well as alpha tests,
for some internal renders, which made them be incorrect. However this
does not appear to affect the final renders, but in a different game it
easily could.

The approach we take is that when alphatests are enabled and a suitable
format is used (which we anticipate is the vast minority of the time),
we insert code into the shader to perform the comparison and discard.
Once inserted, that code lives in the shader forever, and we re-upload
it each time the function changes with a fixed-up compare. To avoid
re-uploading too often, if we switch back to a blendable format, the
test is (effectively) disabled and the hw alphatest functionality is
used.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-16 11:45:30 -04:00
Rob Clark
cc46fc3c09 mesa/st: reduce size of state->st bitmask
In d035d50 this changed to 64b.. which I'm pretty sure was
unintentional.  Revert it back to 32b so the entire state struct
is a nice round 64b.

(Note sure that it would actually be measurable, but I did notice
that check_state() was hot in some benchmarks.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-16 10:00:04 -04:00
Rob Clark
44bbfedbd9 gallium/u_queue: add optional cleanup callback
Adds a second optional cleanup callback, called after the fence is
signaled.  This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence.  In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-16 10:00:04 -04:00
Nicolai Hähnle
6f73c7595f radeonsi: remove the DRAW_PREAMBLE packet
According to firmware guys, the new sequence that we added for Polaris should
work on all CIK parts, and should actually be faster on some parts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-16 13:02:37 +02:00
Brian Paul
b89d0df535 mesa: handle numSamples=0 in _mesa_test_proxy_teximage()
Should fix the regressions reported in bug 96949.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96949
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-15 21:32:24 -07:00
Kenneth Graunke
aa6f60f844 nir: Use dest.ssa.num_components rather than intrin->num_components.
I recently refactored this to share code between load and atomic
lowering.  loads used intrin->num_components, while atomics used
intrin->dest.ssa.num_components.  They should be equivalent, but
Jason wanted me to use the latter.  I missed applying his review.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-15 19:42:43 -07:00
Kenneth Graunke
da3d4a4c56 nir: Update outdated intrinsic const_index comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:10 -07:00
Kenneth Graunke
52e75dcb8c nir: Use nir_intrinsic_set_base in atomic lowering.
This is more readable and also offers assertions that protect against
setting const_index fields on the wrong kind of intrinsic.

Suggested by Jason.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:10 -07:00
Kenneth Graunke
50b9bb9421 nir: Split nir_lower_io's input/output/atomic handling into helpers.
The original function was becoming a bit hard to read, with the details
of creating and filling out load/store/atomic atomics all in one
function.

This patch makes helpers for creating each type of intrinsic, and also
combines them with the *_op() helpers, as they're closely coupled and
not too large.

v2: Minor style nits from Jason.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:10 -07:00
Kenneth Graunke
e12e4af780 nir: Drop bogus nir_var_shader_in case in nir_lower_io's store_op().
This can't happen, the caller asserts that mode is shader_out or shared.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:09 -07:00
Kenneth Graunke
cdfc05ea6e nir: Share destination rewriting and replacement code in IO lowering.
Both loads and atomics had identical code to rewrite destinations,
and all cases had the same two lines to replace instructions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:09 -07:00
Kenneth Graunke
349fe79c9b nir: Share get_io_offset handling in nir_lower_io.
The load/store/atomic cases all duplicated the get_io_offset code, with
a few tiny differences: stores didn't bother checking for per-vertex
inputs, because they can't be stored to, and atomics didn't check at
all, since shared variables aren't per-vertex.

However, it's harmless to check, and allows us to share more code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:09 -07:00
Kenneth Graunke
7171a9a87d nir: Make a 'var' temporary in nir_lower_io.
Less typing and word wrapping issues than intrin->variables[0]->var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:17:09 -07:00
Kenneth Graunke
f05770121f i965: Remove the emit_linterp() helper.
Rather than computing the barycentric mode each time we emit a LINTERP,
we can simply compute it once, as soon as we know we're doing non-flat
interpolation.

At that point, emit_linterp() doesn't do much, so fold it into the
call sites and drop it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:16:54 -07:00
Kenneth Graunke
203243f5ff i965: Reduce the number of fs_reg(brw_reg) calls in LINTERP handling.
A bit tidier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:16:54 -07:00
Kenneth Graunke
eefbbb943e i965: Make a barycentric_mode() helper function.
This combines two copies of basically the same code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-15 17:16:54 -07:00
Kenneth Graunke
783511e605 i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode.
brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less
typing and suffers from fewer line wrapping problems.

The enum values themselves don't really benefit from "WM" in the name,
either.  Put "BARYCENTRIC" first instead of at the end and drop "WM".

Generated by:

for file in *.c *.cpp *.h; do sed -i \
   -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \
   -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \
   -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \
   $file;
done

with a few whitespace changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:16:54 -07:00
Kenneth Graunke
2d6dd30a9b i965: Handle default interpolation modes and locations in NIR.
This consolidates a bunch of hacks in a single place - by setting
the interpolation modes and locations on variables appropriately,
we can simply trust them in the rest of the code.  This avoids
having to handle INTERP_QUALIFIER_NONE, gl_Color overrides,
sample-shading overrides, and Gen4-5 centroid-overrides in a bunch
of places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 17:16:54 -07:00
Jason Ekstrand
745f5778f3 i965/context: Remove some unnecessary vfuncs
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
305044c5b1 i965: Get rid of gen6_surface_state.c
The only useful thing left was gen6_init_vtable_surface_functions which we
can easily put in brw_wm_surface_state.c.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
16fb285946 i965: Use ISL for emitting buffer surface states
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
ee229d1b9c i965/state: Account for the element size in emit_buffer_surface_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
69c0dc5c53 i965/gen4-6: Use the generic ISL-based path for texture surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
2d56959bf8 i965/gen6: Use the generic ISL-based path for renderbuffer surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
efa7668545 i965/gen7: Use the generic ISL-based path for renderbuffer surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
8521ce1a7e i965/gen7: Use the generic ISL-based path for texture surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
26282a01f5 i965/gen8: Use the generic ISL-based path for renderbuffer surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:43 -07:00
Jason Ekstrand
7e951cd562 i965/gen8: Use the generic ISL-based path for texture surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 16:01:41 -07:00
Jason Ekstrand
09b5a71517 i965/state: Add generic surface update functions based on ISL
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:59:33 -07:00
Jason Ekstrand
1abb37baa0 i965/surface_state: Rename brw_update to gen4_update
We're about to add generic versions which work across gens and those should
have the brw name.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:59:33 -07:00
Jason Ekstrand
5a8c89038a i965/state: Use ISL for emitting image surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:59:33 -07:00
Jason Ekstrand
7a21d1bfc3 i965/blorp: Use a generic ISL path for texture surfaces on gen8
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-07-15 15:59:33 -07:00
Jason Ekstrand
5cf665afa1 i965/state: Add a helper for emitting a surface state using isl
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:59:24 -07:00
Jason Ekstrand
73ae4ec294 i965/blorp: Use the generic ISL path for texture surfaces on gen6
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:49 -07:00
Jason Ekstrand
cc78061003 i965/blorp: Use the generic ISL path for renderbuffer surfaces on gen6
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:49 -07:00
Jason Ekstrand
366a6a659d i965/blorp: Use the generic ISL path for texture surfaces on gen7
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:49 -07:00
Jason Ekstrand
3339ef42cf i965/blorp: Use the generic ISL path for renderbuffer surfaces on gen7
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
16022352ea i965/blorp: Use the generic ISL path for renderbuffer surfaces on gen8-9
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
6553dc0d70 i965/blorp: Add a generic ISL-based surface state emit path
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
e974456d4f i965/miptree: Add a helper for getting the aux isl_surf from a miptree
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
1e45349e82 i965/miptree: Add a helper for getting the ISL clear color from a miptree
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
f665a3da72 i965/miptree: Add a helper for getting an isl_surf from a miptree
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
e2dd3ce976 i965: Add an isl_device to the brw_context
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
4f282ff67e isl/state: Add support for OffsetX/Y in surface state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
f8984b918a isl: Add support for filling out surface states all the way back to gen4
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
815847e2b3 isl: Add an ISL_DEV_IS_G4X macro
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
27883f8cbc genxml: Add macros and #includes for gens 4-6
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
ba798ac6b1 genxml: Make X/Y Offset field of SURFACE_STATE a uint
THe offset type has special implications that it's intended to be some form
of aligned memory address.  These assumptions allow it to handle the case
where there is some alignment requirement on the offset and the bottom bits
are used for other things.  However, the offsets in the surface state field
are really just unsigned integers.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:48 -07:00
Jason Ekstrand
9a999ceab8 genxml: Add enough XML for gens 4, 4.5, and 5 to get SURFACE_STATE
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:47 -07:00
Jason Ekstrand
0f6eb5dea0 isl/state: Divide the aux qpitch by 4
The field is in multiples of 4 like regular QPitch.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:47 -07:00
Jason Ekstrand
2c6ca658e7 isl: Fix the bs assertion in isl_tiling_get_info
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 15:53:47 -07:00
Jason Ekstrand
593731ea3c anv: Handle VK_WHOLE_SIZE properly for buffer views
The old calculation, which used view->offset, encorporated buffer->offset
into the size calculation where it doesn't belong.  This meant that, if
buffer->offset > buffer->size, you would always get a negative size.  This
fixes 170 dEQP-VK.renderpass.attachment.* Vulkan CTS tests on Haswell.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 15:48:21 -07:00
Jason Ekstrand
827405f072 anv: Add an align_down_npot_u32 helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 15:48:21 -07:00
Jason Ekstrand
f124f4a394 anv: Enable independentBlend on gen7
We can totally do it, we were just only setting up one BLEND_STATE and, now
that the code is unified with gen8, we should be handling it correctly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 15:48:21 -07:00
Jason Ekstrand
a2e7b2e653 anv/pipeline: Unify blend state setup between gen7 and gen8
This fixes all 674 broken dEQP-VK.pipeline.blend Vulkan CTS tests on
Haswell.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 15:48:21 -07:00
Jason Ekstrand
aaa202ebe7 genxml: Make gen6-7 blending look more like gen8
This renames BLEND_STATE to BLEND_STATE_ENTRY and adds an new struct
BLEND_STATE which is just an array of 8 BLEND_STATE_ENTRYs.  This will make
it much easier to write gen-agnostic blend handling code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 15:48:21 -07:00
Eric Anholt
3bcd0f1912 vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel.
To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary
miptree.  However, if a single level is being selected, we can use the
existing miptree and force all the sampling to be from that particular
level.

This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses
base levels in the blit implementation in gallium.  Improves "glmark2 -b
terrain" from 2 fps to 3 (perhaps some more precision would be useful?),
and cuts its CPU usage during the benchmarking from ~30% to ~10% (total
CPU time from 8.8s to 7.6s).
2016-07-15 13:54:00 -07:00
Eric Anholt
88152d7dc0 vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags.
The compiler uses the per-stage flags already, so it didn't need this.
vc4_uniforms was using it, so just replace it with both of the stage flags
for now.
2016-07-15 13:54:00 -07:00
Eric Anholt
5db82e0c89 vc4: Remove dead dirty_samplers field.
We use a big VC4_DIRTY_FRAGTEX/VC4_DIRTY_VERTEX on the stage, instead.
2016-07-15 13:54:00 -07:00
Eric Anholt
219b75deb9 vc4: Turn on control flow support in the simulator environment.
We can't merge the non-simulator support until we merge the kernel side and
get a new libdrm release.
2016-07-15 13:54:00 -07:00
Brian Paul
9a23a177b9 mesa: handle numLevels, numSamples in _mesa_test_proxy_teximage()
If numSamples > 0, we can compute the size of the whole mipmapped texture.
That's the case for glTexStorage(GL_PROXY_TEXTURE_x).

Also, multiply the texture size by numSamples for MSAA textures.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-15 14:24:34 -06:00
Brian Paul
39183ea971 mesa: add proxy texture targets in _mesa_next_mipmap_level_size()
So we can use it for computing size of proxy textures.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-15 14:24:34 -06:00
Brian Paul
0ac9f25032 mesa: add numLevels, numSamples to Driver.TestProxyTexImage()
So that the function can work properly with glTexStorage(), where we know
how many mipmap levels there are.  And so we can compute storage for MSAA
textures.

Also, remove the obsolete texture border parameter.

A subsequent patch will update _mesa_test_proxy_teximage() to use these
new parameters.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-15 14:24:34 -06:00
Brian Paul
e477d92c94 mesa: use _mesa_clear_texture_image() in clear_texture_fields()
This avoids a failed assert(img->_BaseFormat != -1) in
init_teximage_fields_ms() because the internalFormat argument is GL_NONE.
This was hit when using glTexStorage() to do a proxy texture test.

Fixes a failure with the updated Piglit tex3d-maxsize test.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-07-15 14:24:34 -06:00
Charmaine Lee
6b7923ee46 svga: avoid ubinding render targets that have already been unbound
Fixed the remaining redundant SetRenderTargets command emission.

Tested with lightsMark2008, Heaven, mtt piglit, glretrace, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-15 14:24:34 -06:00
Neha Bhende
4f633d110a svga: dump code for GenMips.
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-15 14:24:33 -06:00
Jon Turney
c7151401e0 Disable use of weak in threads_posix.h on Cygwin
Weak doesn't work the same on PE/COFF as on ELF, they are only weak
references.  Specifically, since nothing else pulls in the object which
contains pthread_mutexattr_init() (and coming from the C library, that is
the only thing that object contains), means that it ends up as 0

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Jon Turney
7d8edbaee7 configure: Don't require pthread-stubs on Cygwin
Commit 1f4869a2 unconditionally requires pthread-stubs.  Unfortunately, the
cleverness that pthread-stubs is doesn't work with PE/COFF, and historically
Cygwin doesn't have a pthread-stubs.pc.

Don't require pthread-stubs on Cygwin.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Yaakov Selkowitz
5d303867f5 Use correct names for dlopen()ed files on Cygwin
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Yaakov Selkowitz
3c18c16ecf configure: Define _GNU_SOURCE for Cygwin as well
Cygwin headers are now a bit more correct in handling feature test macros,
so use _GNU_SOURCE when building for Cygwin, as well.

(Notwithstanding f381c27c, we should probably have always been using
_GNU_SOURCE, since asprintf() is used by mesa in places)

Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Nanley Chery
1fc739d28e Revert "isl: Don't filter tiling flags if a specific tiling bit is set"
This reverts commit 091f1da902 .

Although a user may specify a specfic tiling bit, ISL should still
prevent incompatible tiling/surface combinations.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-15 10:35:40 -07:00
Nanley Chery
e179fee049 anv/blit2d: Copy with stencil sources when needed
In the next patch, ISL will unconditionally perform verification of a
surface's tiling and usage. Since it will require that w-tiled images
be stencil buffers, create a stencil surface to copy from a
w-tiled/stencil surface.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
1ef80b26d7 anv/image: Fix initialization of the ISL tiling
If an internal user creates an image with Vulkan tiling VK_IMAGE_TILING_OPTIMAL
and an ISL tiling that isn't set, ISL will fail to create the image as
anv_image_create_info::isl_tiling_flags will be an invalid value.

Correct this by making anv_image_create_info::isl_tiling_flags an opt-in,
filtering bitmask, that allows the caller to specify which ISL tilings are
acceptable, but not contradictory to the Vulkan tiling.

Opt-out of filtering for vkCreateImage.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
00caba4152 isl: Fix isl_tiling_is_any_y()
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
a5748cb920 anv/device: Fix max buffer range limits
Set limits that are consistent with ISL's assertions in
isl_genX(buffer_fill_state_s)() and Anvil's format-DescriptorType
mapping in anv_isl_format_for_descriptor_type().

Fixes the following new crucible tests:
* stress.limits.buffer-update.range.uniform
* stress.limits.buffer-update.range.storage

These tests are in this patch: https://patchwork.freedesktop.org/patch/98726/

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
028f6d8317 isl: Fix assert on raw buffer surface state size
See inline PRM reference.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
96c664cd03 anv/cmd_buffer: Simplify range member assignment
A ternary is clearer because the range member is assigned one of two values
dependant on one condition.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
1a7344531f anv/cmd_buffer: Remove unused variable
This became unused due to commit 612e35b2c6 .

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Nanley Chery
fd16e64321 anv/descriptor_set: Fix binding partly undefined descriptor sets
Section 13.2.3. of the Vulkan spec requires that implementations be able to
bind sparsely-defined Descriptor Sets without any errors or exceptions.

When binding a descriptor set that contains a dynamic buffer binding/descriptor,
the driver attempts to dereference the descriptor's buffer_view field if it is
non-NULL. It currently segfaults on undefined descriptors as this field is never
zero-initialized. Zero undefined descriptors to avoid segfaulting. This
solution was suggested by Jason Ekstrand.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96850
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-15 10:35:40 -07:00
Brian Paul
50a669de4e svga: handle mismatched number of samplers, sampler views
in svga_init_shader_key_common().  Since the CSO module only tracks
sampler views for fragment shaders, the number of samplers and sampler
views can be mismatched for other types of shaders.  This situation
triggered an assertion in Chrome with maps.google.com

This patch adds defensive code to handle that situation.

Fixes VMware bug 1694027
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-07-15 11:05:18 -06:00
Leo Liu
b9d10e79c8 st/omx/enc: check uninitialized list from task release
The uninitialized list should be checked and returned.

Thank Julien for the notification and suggested fix.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-15 09:17:36 -04:00
Samuel Pitoiset
ea6b236ab1 nv50/ir: add missing string for SV_WORK_DIM
Fixes: 2aa1197 ("nouveau: Add support for SV_WORK_DIM")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2016-07-14 22:28:39 +02:00
Marek Olšák
f84e9d749f Revert "radeon/llvm: Use alloca instructions for larger arrays"
This reverts commit 513fccdfb6.

Bioshock Infinite hangs with that.
2016-07-14 22:15:08 +02:00
Jan Vesely
489bb5473b r600,compute: Reserve vtx 3 for kernel arguments
Using vtx 0 does not work for dynamic offsets.

v2: add explanatory comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-07-14 16:04:50 -04:00
Marek Olšák
33eddde4a7 radeon/uvd: fail to create a decoder if RUVD_MSG_CREATE submission fails
This is the bare minimum for reporting the error to the user.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Marek Olšák
85388652f9 winsys/amdgpu: return an error on IB submission failures
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Marek Olšák
a7d84f7731 gallium/radeon: add a return value to cs_flush
Required by our UVD code.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Jason Ekstrand
b919100d61 glsl/types: Use _mesa_hash_data for hashing function types
This is way better than the stupid string approach especially since you
could overflow the string.  Again, I thought I had something better at one
point but it obviously got lost.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-14 10:48:25 -07:00
Jason Ekstrand
11ac1c4dbb glsl/types: Fix function type comparison function
It was returning true if the function types have different lengths rather
than false.  This was new with the SPIR-V to NIR pass and I thought I'd
fixed it a while ago but it may have gotten lost in rebasing somewhere.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-14 10:48:11 -07:00
francians@gmail.com
3db7f3458f freedreno/a4xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
francians@gmail.com
948822018f freedreno/a3xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
francians@gmail.com
cf2f345356 freedreno/a2xx: Fix sign compare warnings
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-14 09:55:02 -04:00
Boyuan Zhang
23c5e8bc58 radeon/vce: handle newly added parameters
Replace the previous hardcoded value with newly defined parameters

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:21 +02:00
Boyuan Zhang
5490068fb1 st/omx: assign previous values to new structure
Assign previously hardcoded values for OMX to newly defined
structure. As a result, OMX behaviour will not change at all.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:14 +02:00
Boyuan Zhang
b86bf4b568 vl: add parameters for VAAPI encode
Allow to specify more parameters in the encoding interface
which previously just hardcoded in the encoder

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 09:49:07 +02:00
Christian König
9ce52baf7f st/mesa: fix reference counting bug in st_vdpau
Otherwise we leak the resources created for the DMA-buf descriptors.

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Tested-and-Reviewed by: Leo Liu <leo.liu@amd.com>
Ack-by: Tom St Denis <tom.stdenis@amd.com>
2016-07-14 09:33:44 +02:00
Eric Anholt
9194473dd2 vc4: Emit resets of the uniform stream at the starts of blocks.
If a block might be entered from multiple locations, then the uniform
stream will (probably) be at different points, and we need to make sure
that it's pointing where we expect it to be.  The kernel also enforces
that any block reading a uniform resets uniforms, to prevent reading
outside of the uniform stream by using looping.
2016-07-13 23:54:15 -07:00
Eric Anholt
44df061aaa vc4: Add support for scheduling of branch instructions.
For now we don't fill the delay slots, and instead just drop in NOPs.
2016-07-13 23:54:15 -07:00
Eric Anholt
a59da513d3 vc4: Move the QPU instructions to schedule into each block.
We'll want to schedule them individually, to handle delay slots.
2016-07-13 23:54:15 -07:00
Eric Anholt
37ecc61662 vc4: Disable vc4_opt_vpm in the presence of control flow.
It's a really valuable pass currently, but it will be a mess to rewrite
for control flow.  For now, just disable it if we have multiple blocks
present.
2016-07-13 23:54:15 -07:00
Eric Anholt
ee69cfd11d vc4: Convert vc4_opt_dead_code to work in the presence of control flow.
With control flow, we can't be sure that we'll see the uses of a variable
before its def as we walk backwards.  Given that NIR is eliminating our
long chains of dead code, a simple solution for now seems fine.

This slightly changes the order of some optimizations, and so an opt_vpm
happens before opt_dce, causing 3 dead MOVs to be turned into dead FMAXes
in Minecraft:

instructions in affected programs:     52 -> 54 (3.85%)
2016-07-13 23:54:15 -07:00
Eric Anholt
4e797bd98f vc4: Update copy propagation for control flow.
Previously, we could assume that a MOV from a temp was always an available
copy, because all temps were SSA in NIR, and their non-SSA state in QIR
was just due to the fact that they were from a bcsel or pack_unorm_4x8, so
we could use the current value of the temp after that series of QIR
instructions to define it.

However, this is no longer the case with control flow.  Instead, we track
a new array of MOVs defined within the block that haven't had their source
or dest killed yet, and use that primarily.  We fall back to looking
through the QIR defs array to handle across-block MOVs, but now require
that copies from the SSA defs have an SSA src as well.
2016-07-13 23:54:15 -07:00
Samuel Iglesias Gonsálvez
94135e8736 i965/fs: emit DIM instruction to load 64-bit immediates in HSW
v2 (Matt):
- Use brw_imm_df() as source argument of DIM instruction.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-14 08:11:50 +02:00
Samuel Iglesias Gonsálvez
0534863c47 i965/eu: set DF imm value to the source of DIM
According to HSW's PRM, vol02b, the DIM instruction has the following
restriction:

"Restriction : src0 must be immediate. src0 must specify the :f (F, Float)
type encoding but is an immediate 64-bit DF (Double Float) value. dst
must have type DF."

This commit allows to upload the immediate 64-bit DF value to the source
of a DIM instruction even when it is of float type encoding.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-14 08:06:01 +02:00
Samuel Iglesias Gonsálvez
6e28976d35 i965: enable the emission of the DIM instruction
v2 (Matt):
- Take a DF source argument for the DIM instruction emission
in the visitors.
- Indentation.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-14 08:06:01 +02:00
Jason Ekstrand
b9e99282a6 anv: Add a stub for CmdCopyQueryPoolResults on Ivy Bridge
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-13 20:31:27 -07:00
Timothy Arceri
a738732abf i965: fix compiler warnings for 32bit build
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-14 12:03:59 +10:00
Tim Rowley
29f53d7937 Revert "gallium: Force blend color to 16-byte alignment"
This reverts commit d8d6091a84.

Heap allocations may be only 8-byte aligned on 32-bit system, and so having
members with 16-byte alignment (such as in the case where pipe_blend_color is
embedded in radeonsi's si_context) is undefined behavior which indeed causes
crashes when compiled with gcc -O3.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96835
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Acked-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-07-13 13:55:33 -05:00
Jason Ekstrand
48ed8b6f26 isl/state: Add support for handling auxiliary surfaces
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
76e2dcc131 isl: Add an auxiliary surface usage enum
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
3ab3d97ac9 isl: Add support for color control surfaces
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
219024b9a7 isl: Add support for multisample compression surfaces
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
33dc8549fb isl: Add support for HiZ surfaces
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
fc3650a0a9 isl: Kill off isl_format_layout::bs
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
1f0433f075 isl: Take bpb rather than bs in tiling_get_info
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
01855d7331 isl: Use bpb in a few places where it's more natural than bs
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
8c76b9bdce isl: Use bpb for determining YUV image padding
When we initially dropped bpb in favor of bs, we accidentally didn't change
this one line properly.  This brings it back to what it should be.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
cf9ff082b4 isl: Bring back isl_format_layout::bpb
A while ago we got rid of the bits-per-block because we thought we didn't
need it.  We're about to introduce some very useful 1 and 2-bit formats so
we really should be able to handle them again.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
0bd3a7e931 isl: Change the physical size of a W-tile to 128x32
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
4b62c19c32 isl: Rework the way we define tile sizes.
This is based on a very long set of discussions between Chad and myself
about how we should properly represent HiZ and CCS buffers.  The end result
of that discussion was that a tiling actually has two different sizes, a
logical size in elements, and a physical size in bytes and rows.  This
commit reworks ISL's pitch and size calculations to work in terms of these
two sizes.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
7270bd0607 isl: Rework the way we handle surface padding
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
a52f26d6e8 isl: Use ARRAY_PITCH_SPAN_FULL for depth/stencil surfaces on gen7
We helpfully inserted a PRM quotation about how we need to use
ARRAY_PITCH_SPAN_FULL and then set it to COMPACT.  Oops...

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
0d48ac627a isl: Stop multiplying height by block size
The row pitch already specifies the size of a row of elements.
Multiplying by the block height simply causes us to allocate as muc as 12
times more memory than needed for compressed textures.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
58c1b1088b isl: Get rid of tiling_get_extent
It was unused

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-13 11:47:37 -07:00
Jason Ekstrand
49476576dd nir/spirv: Don't multiply the push constant block size by 4
I have no idea why we were multiplying by 4 before.  The offsets we get
from SPIR-V are in bytes and so is nir->num_uniforms so there's no need to
do any adjustment whatsoever.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-13 11:35:29 -07:00
Jason Ekstrand
1eed753ee8 anv/pipeline: Assert that the number of uniforms from NIR fits 2016-07-13 11:35:24 -07:00
Marek Olšák
0f7a6ea5e7 radeonsi: report accurate SGPR and VGPR spills
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
d227dbe272 radeonsi: add a workaround for a compute VGPR-usage LLVM bug
v2: use abort(), describe which LLVM version is affected

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
f4d1de7f86 radeonsi: use LLVMGetTypeKind to tell if an input is an array of descriptors
just a cleanup

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
785073ed0b radeonsi: replace !tbaa with !invariant.load
no change in generated code thanks to dereferenceable(n)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
348b9a5b1c radeonsi: set dereferenceable attribute on descriptor arrays
This allows moving the loads arbitrarily in the Sinking pass.

26002 shaders in 14643 tests
Totals:
SGPRS: 2080160 -> 2080160 (0.00 %)
VGPRS: 798875 -> 797826 (-0.13 %)
Spilled SGPRs: 108485 -> 79165 (-27.03 %)
Spilled VGPRs: 327 -> 327 (0.00 %)
Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
Code Size: 36127192 -> 35559780 (-1.57 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 212464 -> 212672 (0.10 %)
Wait states: 0 -> 0 (0.00 %)

 PERCENTAGES / App    Shaders    SGPRs     VGPRs  SpillSGPR SpillVGPR  Scratch   CodeSize  MaxWaves    Waits
 (unknown)                  4     .         .         .         .         .         .         .         .
 0ad                        6     .         .         .         .         .         .         .         .
 alien_isolation         2938     .        0.04 %   -8.53 %     .         .       -0.71 %   -0.06 %     .
 anholt                    10     .         .         .         .         .         .         .         .
 batman_arkham_origins    589     .       -0.58 %  -79.54 %     .         .       -6.72 %    0.57 %     .
 bioshock-infinite       1769     .       -0.65 %  -89.32 %     .         .       -4.73 %    0.48 %     .
 borderlands2            3968     .       -0.31 %  -51.21 %     .         .       -4.09 %    0.22 %     .
 brutal-legend            338     .       -0.03 %   -2.95 %     .         .       -0.06 %     .         .
 civilization_beyond..    116     .         .      -14.17 %     .         .       -0.88 %     .         .
 counter_strike_glob..   1142     .         .         .         .         .         .         .         .
 dirt-showdown            541     .       -0.56 %  -40.14 %     .       -3.45 %   -1.82 %    0.35 %     .
 dolphin                   22     .         .         .         .         .        0.16 %     .         .
 dota2                   1747     .         .         .         .         .        0.01 %     .         .
 europa_universalis_4      76     .       -0.23 %  -42.11 %     .         .       -0.96 %     .         .
 f1-2015                  774     .       -0.09 %  -28.89 %     .         .       -2.60 %    0.09 %     .
 furmark-0.7.0              4     .         .         .         .         .         .         .         .
 gimark-0.7.0              10     .         .         .         .         .         .         .         .
 glamor                    16     .         .         .         .         .         .         .         .
 humus-celshading           4     .         .         .         .         .         .         .         .
 humus-domino               6     .         .         .         .         .         .         .         .
 humus-dynamicbranching    24     .        0.71 %     .         .         .        0.29 %   -0.45 %     .
 humus-hdr                 10     .         .         .         .         .         .         .         .
 humus-portals              2     .         .         .         .         .         .         .         .
 humus-volumetricfog..      6     .         .         .         .         .         .         .         .
 left_4_dead_2           1762     .         .         .         .         .         .         .         .
 metro_2033_redux        2670     .       -0.10 %   -7.15 %     .         .       -0.03 %     .         .
 nexuiz                    80     .         .         .         .         .         .         .         .
 pixmark-julia-fp32         2     .         .         .         .         .         .         .         .
 pixmark-julia-fp64         2     .         .         .         .         .         .         .         .
 pixmark-piano-0.7.0        2     .         .         .         .         .         .         .         .
 pixmark-volplosion-..      2     .         .         .         .         .         .         .         .
 plot3d-0.7.0               8     .         .         .         .         .         .         .         .
 portal                   474     .         .         .         .         .         .         .         .
 sauerbraten                7     .         .         .         .         .         .         .         .
 serious_sam_3_bfe        392     .         .      -13.20 %     .         .       -1.81 %     .         .
 supertuxkart               4     .         .         .         .         .         .         .         .
 talos_principle          324     .       -0.21 %  -18.39 %     .         .       -2.73 %    0.14 %     .
 team_fortress_2          808     .         .         .         .         .         .         .         .
 tesseract                430     .        0.08 %  -68.57 %     .         .       -0.45 %     .         .
 tessmark-0.7.0             6     .         .         .         .         .         .         .         .
 thea                     172     .         .         .         .         .        0.03 %     .         .
 ue4_effects_cave         299     .       -0.04 %  -10.15 %     .         .       -0.25 %    0.04 %     .
 ue4_elemental            586     .       -0.02 %  -13.93 %     .         .       -0.13 %    0.02 %     .
 ue4_lightroom_inter..     74     .       -0.17 %  -70.00 %     .         .       -1.27 %     .         .
 ue4_realistic_rende..     92     .         .      -32.58 %     .         .       -0.35 %     .         .
 unigine_heaven           322     .        0.12 %  -54.17 %     .         .       -1.42 %   -0.12 %     .
 unigine_sanctuary        264     .         .         .         .         .         .         .         .
 unigine_tropics          210     .         .         .         .         .         .         .         .
 unigine_valley           278     .       -0.15 %  -40.74 %     .         .       -2.00 %    0.09 %     .
 unity                     72     .         .         .         .         .        0.03 %     .         .
 warsow                   176     .         .         .         .         .         .         .         .
 warzone2100                4     .         .         .         .         .        0.13 %     .         .
 witcher2                1040     .       -0.03 %  -86.28 %     .         .       -0.28 %    0.01 %     .
 xcom_enemy_within       1236     .       -0.24 %  -63.54 %     .         .       -0.93 %    0.18 %     .
 yofrankie                 82     .       -0.61 % -100.00 %     .         .       -0.83 %    0.41 %     .
 -----------------------------------------------------------------------------------------------------------
 Total                  26002     .       -0.13 %  -27.03 %     .       -0.24 %   -1.57 %    0.10 %     .

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
6596ecf8c5 gallivm: add helper lp_add_attr_dereferenceable
Not sure if this is the right way to do it, but it seems to work.

v2: make it a no-op on LLVM <= 3.5

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
bccf9de4df radeonsi: clean up shader value metadata code
No change in behavior.
BTW, tbaa_md_kind == 1, which was the magic number in the code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
d7d7e6adbe radeonsi: remove LLVMNoUnwindAttribute uses
always set by gallivm

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
c4807505c0 radeonsi: fix a typo in SI_PARAM_LINEAR_* handling
introduced in 476e9cee1d

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
f2f573e777 gallium/radeon: normalize the code style
no change in behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Marek Olšák
ed3912d0da radeonsi: just save buffer sizes instead of buffers while recording IBs
whole buffer objects are not needed

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Jon Turney
fc8139b146 Add c99_alloca.h include to fix compilation on Cygwin
Fix compilation on Cygwin, since 50b22354, by adding c99_alloca.h include,
which should know how to portably make the alloc() prototype available.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-13 16:11:36 +01:00
Topi Pohjolainen
7d29fee4a8 i965/blorp: Cleanup leftovers from push constant disabling
Setup for pixel shader push constants is the same as for other
stages. Note that on gen8+ the if-else branches were identical
and the generation check for packet size redundant.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-13 12:10:03 +03:00
Topi Pohjolainen
26778da571 i965/blorp/gen7+: Bring back push constant setup
This is partial revert of commit cc2d0e64.

It looks that even though blorp disables a stage the corresponding
3DSTATE_CONSTANT_XS packet is needed to be programmed. Hardware
seems to try to fetch the constants even for disabled stages.
Therefore care needs to be taken that the constant buffer is
set up properly. Blorp will continue to trash it into non-existing
such as before.
It is possible that this could be omitted on SKL where the
constant buffer is considered when the corresponding binding table
settings are changed. Bspec:

  "The 3DSTATE_CONSTANT_* command is not committed to the shader
   unit until the corresponding (same shader)
   3DSTATE_BINDING_TABLE_POINTER_* command is parsed."

However, as CONSTANT_XS packet itself does not seem to stall on its
own, it is safer to emit the packets for SKL also.

Possible alternative to blorp trashing could have been to setup
defaults in the beginning of each batch buffer. However, hardware
doesn't seem to tolerate these packets being programmed multiple
times per primitive. Bspec for IVB:

  "It is invalid to execute this command more than once between
   3D_PRIMITIVE commands."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96878
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-13 12:09:35 +03:00
Nicolai Hähnle
65d48fcf8c radeonsi: silence Coverity warning
Coverity's analysis is too weak to understand that
r600_init_flushed_depth(_, _, NULL) only returns true when
flushed_depth_texture was assigned a non-NULL value.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-13 09:52:39 +02:00
Samuel Iglesias Gonsálvez
a2bd7334ed i965/fs: do d2x lowering before simd splitting
So that we can have gen7 split large writes produced by this lowering pass.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-13 07:09:41 +02:00
Iago Toral Quiroga
376d7ee587 i965/fs: do pack lowering before simd splitting
So that we can have gen7 split large writes produced by the pack lowering.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-13 07:09:41 +02:00
Samuel Iglesias Gonsálvez
9979a3f2ac i965/fs: do not require force_writemask_all with exec_size 4
So far we only used instructions with this size in situations where we
did not operate per-channel and we wanted to ignore the execution mask,
but gen7 fp64 will need to emit code with a width of 4 that needs
normal execution masking.

v2:
- Modify the assert instead of deleting it (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-13 07:09:41 +02:00
Iago Toral Quiroga
aa4796ae81 i965/fs/gen7: split instructions that run into exec masking bugs
In fp64 we can produce code like this:

mov(16) vgrf2<2>:UD, vgrf3<2>:UD

That our simd lowering pass would typically split in instructions with a
width of 8, writing to two consecutive registers each. Unfortunately, gen7
hardware has a bug affecting execution masking and as a result, the
second GRF register write won't work properly. Curro verified this:

"The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
 (where QtrCtrl is the 8-bit quarter of the execution mask signals
 specified in the instruction control fields) for the second
 compressed half of any single-precision instruction (for
 double-precision instructions it's hardwired to use NibCtrl+1,
 at least on HSW), which means that the EU will apply the wrong
 execution controls for the second sequential GRF write if the number
 of channels per GRF is not exactly eight in single-precision mode (or
 four in double-float mode)."

In practice, this means that we cannot write more than one
consecutive GRF in a single instruction if the number of channels
per GRF is not exactly eight in single-precision mode (or four
in double-float mode).

This patch makes our SIMD lowering pass split this kind of instructions
so that the split versions only write to a single register. In the
example above this means that we split the write in 4 instructions, each
one writing 4 UD elements (width = 4) to a single register.

v2 (Curro):
 - Make explicit that the thing about hardwiring NibCtrl+1 for the second
   compressed half is known to happen in Haswell and the issue with IVB
   might not be exactly the same.
 - Assign max_width instead of returning early so that we can handle
   multiple restrictions affecting to the same instruction.
 - Avoid division by 0 if the instruction does not write any registers.
 - Ignore instructions what have WE_all set.
 - Use the instruction execution type size instead of the dst type size.

v3 (Curro):
 - Move the implementation down so it is not placed in the middle of another
   workaround.
 - Declare channels_per_grf as const.
 - Don't break the loop early if we find a BAD_FILE source.
 - Fix the number of channels that the hardware shifts for the second half
   of a compressed instruction to be 8 in single precision and 4 in double
   precision.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-13 07:09:41 +02:00
Iago Toral Quiroga
87a13f598b i965/fs: use the new helper function to create double immediates
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-13 07:09:41 +02:00
Iago Toral Quiroga
9e196e907e i965/fs: add a helper function to create double immediates
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.

v2:
- Define setup_imm_df() as an independent function (Curro)
- Create a specific builder to get rid of some instruction field
  assignments (Curro).

v3:
- Get devinfo from builder (Kenneth)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-13 07:09:41 +02:00
Eric Anholt
93794145dd vc4: Validate QPU uniform pointer updates. 2016-07-12 17:42:42 -07:00
Eric Anholt
420845acb2 vc4: Add support for NIR loops and break/continue. 2016-07-12 17:42:42 -07:00
Eric Anholt
0adf2ec0ee vc4: Add support for emitting NIR IF nodes. 2016-07-12 17:42:42 -07:00
Eric Anholt
f505f66cd5 vc4: Add support for storing to NIR registers in a non-SSA fashion.
Previously, there were occasionally NIR registers in our programs, but
they were always actually used SSA-only.  Now that we're trying to support
control flow, we need to actually conditionally move to registers based on
whether channels are active or not.
2016-07-12 17:42:41 -07:00
Eric Anholt
ab1d40b84a vc4: Add a flag in the screen to track control flow support.
For now it's still always false, but I need it in place for kernel
backwards compat support as I extend the backend for control flow.
2016-07-12 17:42:40 -07:00
Eric Anholt
05bcd9dd96 vc4: Define a QIR branch instruction
This uses the branch condition code in inst->cond to jump to either
successor[0] (condition matches) or successor[0] (condition doesn't
match).
2016-07-12 17:42:40 -07:00
Eric Anholt
54800bb71c vc4: Add kernel support for branching in shader validation.
We're already checking that branch instructions are within the
contents of the shader and the proper PROG_END sequence is present.
The other thing we need in the presence of branching is to verify that
the shader doesn't overflow past the end of the uniforms stream.

To do that, we require that at the start of any basic block reading
uniforms have the following instructions:

load_imm temp, <offset within uniform stream>
add unif_addr, temp, unif

The instructions are generated by userspace, and the kernel verifies
that the load_imm is of the expected offset, and that the add adds it
to a uniform.  We track which uniform in the stream that is, and at
draw call time fix up the uniform stream to have the address of the
start of the shader's uniforms for that draw call.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-07-12 17:42:39 -07:00
Eric Anholt
e2d7760df5 vc4: Add a bitmap of branch targets in kernel validation.
This isn't used yet, it's just a first step toward loop validation.
During the main parsing of instructions, we need to know when we hit a new
basic block so that we can reset validated state.
2016-07-12 17:42:38 -07:00
Eric Anholt
24095c8b3b vc4: Track the current instruction into the validation_state.
This reduces how much we need to pass around as arguments, which was
becoming more of a problem with looping validation.
2016-07-12 17:42:38 -07:00
Eric Anholt
c73aa0a09b vc4: Add QPU support for generating BRANCH instructions. 2016-07-12 17:42:38 -07:00
Eric Anholt
6d34345001 vc4: Print live variable start/ends during QIR dumping.
This only happens when live variables are set up, which is not in the
normal dump, but is set up when we've failed to register allocate.
2016-07-12 17:42:37 -07:00
Eric Anholt
89918c1e74 vc4: Implement live intervals using a CFG.
Right now our CFG is always a trivial single basic block, but that will
change when enable loops.
2016-07-12 17:41:59 -07:00
Eric Anholt
f2eb8e3052 vc4: Make vc4_qir_schedule handle each block in the program.
Basically we just treat each block independently.  The only inter-block
scheduling I can think of that would be be interesting would be to move
texture result collection to after a short loop/if block that doesn't do
texturing.  However, the kernel disallows that as part of its security
validation.
2016-07-12 15:47:26 -07:00
Eric Anholt
46ec025ba9 vc4: Convert uniforms lowering to work with multiple blocks.
We still decide which uniform to lower based on how many
instructions-that-need-lowering use that uniform, but now we emit a new
temporary uniform load in each of the basic blocks containing an
instruction being lowered.

This commit is best reviewed with diff -b.
2016-07-12 15:47:26 -07:00
Eric Anholt
0c923e6c33 vc4: Convert vc4_opt_peephole_sf to work with control flow.
We need to apply the peephole pass to each of the blocks in the program.
We don't do dataflow analysis for SF across blocks, but we also don't
generate code that would need us to do so.
2016-07-12 15:47:26 -07:00
Eric Anholt
6c1f834a23 vc4: Create a basic block structure and move the instructions into it.
The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
2016-07-12 15:47:26 -07:00
Eric Anholt
d3cdbf6fd8 vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places.
We have the prior list_foreach() all over the code, but I need to move
where instructions live as part of adding support for control flow.  Start
by just converting to a helper iterator macro.  (The simpler
"qir_for_each_inst()" will be used for the for-each-inst-in-a-block
iterator macro later)
2016-07-12 15:47:25 -07:00
Eric Anholt
6858f05924 vc4: Also enable phi elimination.
This avoids a bunch of code gen regressions when enabling loops in vc4.

Prior to that, the GLSL that would have generated these optimizable phi
nodes was being lowered to csels between either (undef, a) or (a, a), and
those were being dealt with by nir_opt_undef and nir_opt_algebraic.
2016-07-12 15:47:25 -07:00
Eric Engestrom
e8959ba7af vc4: fix memory leak
The allocation has succeeded by that point, so it needs to be freed.

CovID: 1358929
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-12 15:47:12 -07:00
Eric Anholt
c65a00eaff vc4: Close our screen's fd on screen close.
We're passed in a freshly dup()ed fd on screen create, so we should close
it on exit.  Debugged by Hugh Cole-Baker.
2016-07-12 15:46:09 -07:00
Eric Anholt
c93f6938d5 nir: Add optimization for (a || True == True)
This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
producing some constants that were getting compared.

total instructions in shared programs: 112276 -> 112198 (-0.07%)
instructions in affected programs:     2239 -> 2161 (-3.48%)
total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
estimated cycles in affected programs:     2365 -> 2301 (-2.71%)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-12 15:46:09 -07:00
Tim Rowley
be126c8a2a swr: [rasterizer core] correct MSAA behavior for conservative rasterization
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 11:10:55 -05:00
Tim Rowley
c6ca126591 swr: [rasterizer core] conservative rast backend changes
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 11:10:49 -05:00
Tim Rowley
b6dbb95dc9 swr: [rasterizer] buckets cleanup
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 11:10:44 -05:00
Tim Rowley
eb6b2b340e swr: [rasterizer core] make all api functions call GetContext
Small api cleanup.  Make all api functions call GetContext instead
of locally casting handle.  Makes debugging easier by providing a
single point to track context changes.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 11:10:36 -05:00
Tim Rowley
f810907669 swr: [rasterizer] add support for llvm-3.9
v2: use signed compare, remove unneeded vmask

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 11:09:49 -05:00
Tim Rowley
ae4f2c849a swr: [rasterizer jitter] fix llvm-3.7 compile
d3d97f8 broke llvm-3.7, which has a mismatched API for
setDataLayout/getDataLayout.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-07-12 10:42:57 -05:00
Brian Paul
d46489ddea docs: remove duplicated line in 12.0.1 release notes file
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-07-12 09:42:42 -06:00
Leo Liu
55f0b97b40 st/omx/dec: convert decoder video buffer to progressive
with encode tunneling

The idea of encode tunneling is to use video buffer directly for encoder,
but currently the encoder doesn’t support interlaced surface, the OMX
decoder set progressive surface before on that purpose.

Since now we are polling the driver for interlacing information for
decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API),
thus breaking the transcode with tunneling.

The solution is when with tunnel detected, re-allocate progressive target
buffers, and then converting the interlaced decoder results to there.

This has been tested with transcode results bit to bit matching as before
with surface from progressive to progressive.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Leo Liu
82f875f4d8 vl/compositor: set layer of y or uv to render
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Leo Liu
14761da9f9 vl/compositor: add weave to yuv shader
This shader will make interlaced yuv to progressive yuv.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Leo Liu
2e18c2c6f8 vl/compositor: move weave shader out from rgb weaving
We'll use weave shader in the later patch.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Marek Olšák
ead7736821 glsl_to_tgsi: don't use the negate modifier in integer ops after bitcast
This bug is uncovered by glsl/lower_if_to_cond_assign.
I don't know if it can be reproduced in any other way.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-12 11:58:53 +02:00
Francisco Jerez
e300696304 clover/api: Implement clLinkProgram per-device binary presence validation rule.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Serge Martin
f29ed2da24 clover: Add clLinkProgram (CL 1.2).
[ Francisco Jerez: Use validate_build_common for error checking,
  simplify control flow slightly and handle additional exception
  types. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Francisco Jerez
c478db6c0a clover: Trivial cleanups for api/program.cpp.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Francisco Jerez
9c7cda2792 clover/core: Remove compiler.hpp.
header_map was the only definition left in compiler.hpp, move it into
program.hpp which is its only user in clover/core.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Francisco Jerez
c2e37fe1f9 clover/llvm: Get rid of compile_program_llvm().
Superseded by compile_program() and link_program().

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Francisco Jerez
010918f5aa clover: Provide separate program methods for compilation and linking.
[ Serge Martin: Fix inverted opts and log build ctor args.
  Keep the log related to the build. Fix indentation ]

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:35 -07:00
Francisco Jerez
1942490bae clover: Unify program::build_* into a single method returning a struct.
This gets rid of the program::build_* query methods and replaces them
with the program::build() method that returns a single data structure
containing all parameters for the last build done on the given target
device (including build logs, options and the binary itself).

[ Serge Martin: Fix inverted opts and log build ctor args ]

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Serge Martin
7f6a4a4342 clover: Change program::build opts argument to std::string.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
2a73ae662c clover: Define error subclass to signal build option parse failure.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
4ef1c0918d clover: Move back to using build_error to signal compilation failure.
This partially reverts 7e0180d57d.
Having two different exception subclasses for compilation and linking
makes it more difficult to share or move code between the two
codepaths, because the exact same function under the same error
condition would need to throw one exception or the other depending on
what top-level API is being implemented with it.  There is little
benefit anyway because clCompileProgram() and clLinkProgram() can tell
whether they are linking or compiling a program.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Serge Martin
70fe6267a3 clover: Override ret_object.
Return an API object from an intrusive reference to a Clover object,
incrementing the reference count of the object.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
85309e8b55 clover/tgsi: Add stub link_program() function.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
ba613636e8 clover/tgsi: Move compiler entry point declaration into tgsi directory and namespace.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
fb3eeb1314 clover/llvm: Implement the -create-library linker option.
[ Serge Martin: disable internalize pass when building a library.
  Otherwise some functions may be inlined and removed ]

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
9de3f4a59f clover/llvm: Implement linkage of multiple clover modules.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
132b6ccd4f clover/llvm: Split compilation and linking.
Split the work previously done by compile_program_llvm() into
compile_program() (which simply runs the front-end and serializes the
resulting LLVM IR) and link_program() (which takes care of everything
else down to binary codegen).

[ Serge Martin: allow LLVM IR dump after compilation ]

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
1a7d11aa3d clover/llvm: Implement library bitcode codegen.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
86100e13ab clover/llvm: Trivial assorted cleanups for invocation.cpp.
Drop a few include and using directives which are no longer necessary.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
520cc26859 clover/llvm: Split native codegen into separate file.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:34 -07:00
Francisco Jerez
8195637363 clover/llvm: Split bitcode codegen into separate file.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
71ac9820d6 clover/llvm: Split shared codegen support code into separate file.
This is the common part of the code used to generate a clover::module
from LLVM bitcode, shared between the native and LLVM paths.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
26fa9bfd0d clover/llvm: Define function for bitcode print-out.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
f0721020ad clover/llvm: Split native codegen and assembly print-out into separate functions.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
1d042adc0a clover/llvm: Clean up bitcode codegen.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
952d1e6fd6 clover/llvm: Use metadata introspection utils for kernel enumeration.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
d37d5842c1 clover/llvm: Use metadata introspection utils for kernel argument set-up.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:33 -07:00
Francisco Jerez
3ed31bbf05 clover/llvm: Add simplified utility functions for metadata introspection.
v2: Fix for latest LLVM from SVN.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net> (v1)
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:34:30 -07:00
Francisco Jerez
7da2c1ff0f clover/llvm: Clean up codestyle of get_kernel_args().
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:59 -07:00
Francisco Jerez
0601fe7438 clover/llvm: Fold compile_native() call into build_module_native().
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:56 -07:00
Francisco Jerez
f98422eafd clover/llvm: Factor out duplicated construction of clover::module.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:53 -07:00
Francisco Jerez
3ce6ab068c clover/llvm: Clean up compile_native().
This switches compile_native() to the C++ API (which the rest of this
file makes use of anyway so there is little benefit from using the C
API), what should get rid of an amount of boilerplate and fix a leak
of the TargetMachine object in the error path.

v2: Additional fixes for LLVM 3.6.
v3: Update for the latest LLVM SVN changes.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:50 -07:00
Francisco Jerez
7bcefa5903 clover/llvm: Clean up ELF parsing.
This function was doing three separate things:
 - Initializing and releasing the ELF parsing state (the latter can be
   better done using RAII).
 - Searching for the symbol table in the ELF file.
 - Extraction of kernel symbol offsets from the symbol table.

Split each one into a separate function for clarity and clean up the
result slightly.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:48 -07:00
Francisco Jerez
574477e599 clover/llvm: Move a bunch of utility functions into separate file.
Some of these will be useful from a different compilation unit in the
same subtree so put them in a publicly accessible header file.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:43 -07:00
Francisco Jerez
92247cef3f clover/llvm: Tidy debug handling.
Most significant change is debugging flags are now a scoped enum and
all debugging helpers live in the debug namespace.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:40 -07:00
Francisco Jerez
4614397ac2 clover/llvm: Use helper function to abort compilation with error message.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:37 -07:00
Francisco Jerez
423eecb76a clover/llvm: Simplify diagnostic_handler().
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:29 -07:00
Francisco Jerez
5884dfbc2a clover/llvm: Trivial codestyle clean-up for optimize().
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:22:21 -07:00
Francisco Jerez
bdc27f13d5 clover/llvm: Clean up compilation into LLVM IR.
Some assorted and mostly trivial clean-ups for the source to bitcode
compilation path.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:21:50 -07:00
Francisco Jerez
714b167f57 clover/llvm: Factor out LLVM context init.
So it can be shared between the compilation and linking codepaths.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:21:30 -07:00
Francisco Jerez
fa94055d53 clover/llvm: Declare compiler instance at top level and pass down as argument.
This allows simplifying the interface of compile_llvm() because it no
longer needs to read out and return the optimization level and address
space map from the compiler instance.  Instead declare the compiler
instance at the top level so that both properties are available
directly.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:21:13 -07:00
Francisco Jerez
a27d4ec3b9 clover/llvm: Refactor compiler instance initialization.
This will be shared between the compiler and linker codepaths.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:21:08 -07:00
Francisco Jerez
c2a167ad73 clover/llvm: Factor out compiler option tokenization.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:20:47 -07:00
Francisco Jerez
c513cfa747 clover/llvm: Factor out target string parsing.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:20:41 -07:00
Francisco Jerez
251054220e clover/llvm: Collect #ifdef mess into a separate file.
This gets rid of most ifdef's from the invocation.cpp code -- Only a
couple of them are left which will be removed differently in the
following commits.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:20:12 -07:00
Francisco Jerez
11afde89b8 clover/llvm: Drop dead code.
This ifdef'ed out code was meant to handle compilation into TGSI, but
it doesn't seem likely that it will ever be useful even if the TGSI
back-end is resurrected because the TGSI bitcode can just be plumbed
through in ELF format and dealt with as a regular "native" back-end.

Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:20:05 -07:00
Francisco Jerez
600ac51448 clover/llvm: Drop support for LLVM < 3.6.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:19:49 -07:00
Serge Martin
8624888d6f clover: Bump required LLVM version to 3.6.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-07-11 20:19:14 -07:00
Ilia Mirkin
da7223ebdc mesa: set _NEW_BUFFERS when updating texture bound to current buffers
When a glTexImage call updates the parameters of a currently bound
framebuffer, we might miss out on revalidating whether it is complete.
Make sure to set _NEW_BUFFERS which will trigger the revalidation in
that case.

Also while we're at it, fix the fb parameter passed in to the eventual
RenderTexture call.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94148
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
2016-07-11 21:18:05 -04:00
Ilia Mirkin
8b7607d28a meta/texsubimage: tex_image is always non-null, avoid confusing code
Probably a copy-paste from mesa_meta_pbo_GetTexSubImage where tex_image
may apparently be null.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-11 21:18:05 -04:00
Ilia Mirkin
00d4315d37 st/mesa: return appropriate mesa format for ETC texture formats
Even when the backend driver does not support ETC formats, we handle the
decoding into an uncompressed backing texture. However as far as core
mesa is concerned, it's an ETC texture and we should return the relevant
ETC mesa format. This condition can get hit when using glTexStorage to
create the texture object.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-07-11 21:17:30 -04:00
Ilia Mirkin
8ee3cdde04 mesa: etc2 online compression is unsupported, don't attempt it
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-07-11 21:17:01 -04:00
Ben Skeggs
0d911a720d nvc0: initial support for GP100 GPUs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-07-12 10:56:35 +10:00
Samuel Pitoiset
9bc083284f nvc0: use a define for the driver constant buffer size
This might avoid mistakes if the size is bumped in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-11 22:30:41 +02:00
Samuel Pitoiset
31a615677b nvc0: fix the driver cb size when draw parameters are used
The size of the driver constant buffer for each stage should be 2048
and not 512 because it has been increased recently for buffers/images.
While we are at it, do the same change for indirect draws.

This fixes all ARB_shader_draw_parameters tests on GM107.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-11 22:11:27 +02:00
Samuel Pitoiset
19d0450b27 nvc0/ir: fix images indirect access on Fermi
This fixes the following piglits:

arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-07-11 21:01:21 +02:00
Marek Olšák
33c8723980 st/mesa: remove st_dump_program_for_shader_db
replaced by MESA_SHADER_CAPTURE_PATH in core Mesa

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-11 19:06:05 +02:00
Marek Olšák
d7b6f90684 gallivm: set LLVMNoUnwindAttribute on all intrinsics
RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement:

 Application            Files    SGPRs     VGPRs   SpillSGPR SpillVGPR Code Size    LDS    Max Waves   Waits
 unigine_valley           278    0.00 %   -0.29 %    0.00 %    0.00 %    0.01 %    0.00 %    0.17 %    0.00 %

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-07-11 19:06:05 +02:00
Francesco Ansanelli
3c44629142 i965: fix ignored qualifiers warning
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-11 05:50:22 -07:00
Nicolai Hähnle
374aa2bb27 gallium/u_queue: assert that users must wait on fences before destroying them
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-11 11:04:44 +02:00
Nicolai Hähnle
a0a616720a gallium/u_queue: guard fence->signalled checks with fence->mutex
I have seen a hang during application shutdown that could be explained by the
following race condition which this patch fixes:

1. Worker thread enters util_queue_fence_signal, sets fence->signalled = true.
2. Main thread calls util_queue_job_wait, which returns immediately.
3. Main thread deletes the job and fence structures, leaving garbage behind.
4. Worker thread calls pipe_condvar_broadcast, which gets stuck forever because
   it is accessing garbage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-11 11:03:59 +02:00
Chad Versace
5c17fb2cd6 anv/dump: Fix post-blit memory barrier
Swap srcAccessMask and dstAccessMask.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-09 20:58:33 -07:00
Chad Versace
bc33c9b455 anv/dump: Fix vkCmdPipelineBarrier flags
'true' is not valid for VkDependencyFlags.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-09 20:58:33 -07:00
Jason Ekstrand
ac7eeebce4 anv/dump: Add support for dumping framebuffers
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-09 20:58:33 -07:00
Jason Ekstrand
fad0b7b0b3 anv/dump: Add a barrier for the source image
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-09 20:58:33 -07:00
Jason Ekstrand
6ad183bf89 anv/dump: Refactor the guts into helpers
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-09 20:58:33 -07:00
Jason Ekstrand
adbed7ae7a anv/dump: Use anv_minify instead of hand-rolling it
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-09 20:58:33 -07:00
Jason Ekstrand
a26cda5ca5 anv/dump: Take an aspect in dump_image_to_ppm
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-09 20:58:33 -07:00
Nicolai Hähnle
b479c47a9c radeonsi: fix bad assertion in si_emit_sample_mask
The blitter sets mask == 1, which is fine since it doesn't use smoothing.
Fixes a regression introduced in commit 5bcfbf91.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-09 19:46:54 +02:00
Matt Turner
6624174c0a glx: Fix for commit 2c86668694.
Ian suggested these changes in his review and I made them, but I pushed
the old version of the patch.
2016-07-08 16:46:17 -07:00
Emil Velikov
83a782cd5e docs: add news item and link release notes for 12.0.0/12.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-09 00:09:51 +01:00
Emil Velikov
386ceb4c61 docs: add sha256 checksums for 12.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit edfc17a19a)
2016-07-09 00:03:21 +01:00
Emil Velikov
c7c0adc7e6 docs: add release notes for 12.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 04277f058d)
2016-07-09 00:03:16 +01:00
Emil Velikov
286a71b01f docs: add sha256 checksums for 12.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3a146a789c)
2016-07-09 00:03:10 +01:00
Emil Velikov
4644908a9f docs: Update 12.0.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8b06176f31)
2016-07-09 00:03:04 +01:00
Matt Turner
2c86668694 glx: Undo memory allocation checking damage.
This partially reverts commit d41f5396f3.

That untested commit broke the tex-skipped-unit piglit test and the
arbvparray Mesa demo when run with indirect GLX.

state->array_state is used during initialization, so its assignment cannot be
moved to the end of the function.

The backtrace looked like:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77c7a5c in __glXGetActiveTextureUnit (state=0x6270e0) at indirect_vertex_array.c:1952
1952           return state->array_state->active_texture_unit;
(gdb) bt
0  0x00007ffff77c7a5c in __glXGetActiveTextureUnit (state=0x6270e0) at indirect_vertex_array.c:1952
1  0x00007ffff77cbf62 in get_client_data (gc=0x626f50, cap=34018, data=0x7fffffffd7a0) at single2.c:159
2  0x00007ffff77cce51 in __indirect_glGetIntegerv (val=34018, i=0x7fffffffd830) at single2.c:498
3  0x00007ffff77c4340 in __glXInitVertexArrayState (gc=0x626f50) at indirect_vertex_array.c:193

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-08 14:03:19 -07:00
Colin McDonald
b36644bae6 glx: Fix indirect multi-texture GL_DOUBLE coordinate arrays.
There is no draw arrays protocol support for multi-texture coordinate
arrays, so it is implemented by sending batches of immediate mode
commands from emit_element_none in indirect_vertex_array.c.  This sends
the target texture unit (which has been previously setup in the
array_state header field), followed by the texture coordinates.  But for
GL_DOUBLE coordinates the texture unit must be sent *after* the texture
coordinates. This is documented in the glx protocol description, and can
also be seen in the indirect.c immediate mode commands generated from
gl_API.xml. Sending the target texture unit in the wrong place can crash
the remote X server.

To fix this required some more extensive changes to
indirect_vertex_array.c and indirect_vertex_array_priv.h, in order to
remove the texture unit value out of the array_state "header" field, and
send it separately.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61907
2016-07-08 14:03:16 -07:00
Colin McDonald
5ced100bf5 glx: Correct opcode typos in __indirect_glTexCoordPointer.
At the same time, replace opcode numbers with names in
__indirect_glVertexAttribPointer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61907
2016-07-08 14:03:09 -07:00
Colin McDonald
d57c85c1bf glx: Call __glXInitVertexArrayState() with a usable gc.
For each indirect context the indirect vertex array state must be initialised
by __glXInitVertexArrayState in indirect_vertex_array.c.  As noted in the
routine header it requires that the glx context has been setup prior to the
call, in order to test the server version and extensions.

Currently __glXInitVertexArrayState is called from indirect_bind_context in
indirect_glx.c, as follows:

   state = gc->client_state_private;
   if (state->array_state == NULL) {
      glGetString(GL_EXTENSIONS);
      glGetString(GL_VERSION);
      __glXInitVertexArrayState(gc);
   }

But, the gc context is not yet usable at this stage, so the server queries
fail, and __glXInitVertexArrayState is called without the server version and
extension information it needs.  This breaks multi-texturing as
glXInitVertexArrayState doesn't get GL_MAX_TEXTURE_UNITS.  It probably also
breaks setup of other arrays: fog, secondary colour, vertex attributes.

To fix this I have moved the call to __glXInitVertexArrayState to the end of
MakeContextCurrent in glxcurrent.c, where the glx context is usable.

Fixes a regression caused by commit 4fbdde889c. Fixes ARB_vertex_program
usage in the arbvparray Mesa demo when run with indirect GLX and also
the tex-skipped-unit piglit test when run with indirect GLX.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61907
2016-07-08 14:02:56 -07:00
Christian König
64ac4aef27 radeon/uvd: simplify sending context buffer message
Just send it whenever it is allocated.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-08 21:03:32 +02:00
Christian König
6b474e06a2 radeon/uvd: fix contex buffer destruction in the error path
Destroying a not allocated buffer is harmless.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-08 21:03:32 +02:00
Christian König
36df04dac4 radeon/uvd: move polaris fw check into radeon_video.c v2
It's actually not very clever to claim to support H.264
and then fail to create a decoder.

v2: prefix FW macro with UVD_.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-08 21:03:31 +02:00
Christian König
5290bf43c8 radeon/video: fix coding style in radeon_video.c v2
v2: fix other tabs as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-07-08 21:03:31 +02:00
Brian Paul
74163475b0 svga: simplify/fix 1D/2D array resource copies
Fixes the one of the piglit arb_copy_image-targets tests for 1D arrays.
Previously, we were applying the 1D array z/face adjustment twice.

Also simplify the copy_region_vgpu10() function.  It never has to copy
multiple array layers/slices.  The Mesa code for glCopyImageSubData does
the loop over slices/faces.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-07-08 12:53:21 -06:00
Brian Paul
0e23f370c9 mesa: print number of samples in renderbuffer_storage error msg
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-08 12:53:21 -06:00
Brian Paul
fb26317604 svga: remove unused variable
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-07-08 12:53:21 -06:00
Brian Paul
689293ad52 svga: add dumping for more device commands
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-07-08 12:53:21 -06:00
Brian Paul
599c333d07 svga: silence a couple unused variable warnings
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-07-08 12:53:20 -06:00
Charmaine Lee
c3c7ff014b svga: rebind using render target surfaces in hw draw state
Currently when we rebind framebuffer resources at the beginning of
the command buffer, we use the color buffer surfaces saved in the context
hw clear state. But the surfaces could be different from the actual
emitted render target surfaces if any of the color buffer surfaces
is also used for shader resource, in that case, we create
a backed surface for the collided render target surface. So to rebind
the framebuffer resources correctly, use the render target surfaces saved
in the context hw draw state.

Tested with Heaven, Lightsmark2008, MTT piglit, glretrace, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-08 12:53:20 -06:00
Charmaine Lee
da98cee067 svga: invalidate gb surface before it is reused
With this patch, a guest-backed surface will be invalidated
using the SVGA_3D_CMD_INVALIDATE_GB_SURFACE command before
the surface is reused. This fixes the updating dirty image error
from the device when a surface is reused.

v2: Instead of invalidating the surface when it is reused,
    send the invalidate command before the surface is put into
    the recycle pool.

v3: (1) surface invalidate is a noop operation in Linux winsys, since
        surface invalidation is not needed for DMA path.
    (2) Instead of invalidating the surface content in
        svga_screen_surface_destroy() when a surface is to be destroyed,
        it is done in svga_screen_cache_flush() when the surface is
        no longer referenced in a command buffer and is ready to
        be moved to the unused list. At this point, the surface will
        be moved to the invalidate list. When the surface invalidation
        is submitted, the surface will be moved to the unused list.

Tested with piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-07-08 12:53:20 -06:00
Brian Paul
ca531aeeb1 svga: fix use of provoking vertex control
If the SVGA3D_DEVCAP_DX_PROVOKING_VERTEX query returns false, never
define rasterizer state objects with provokingVertexLast set.  Despite
what the device reports, it may interpret the provokingVertexLast flag
anyway.  This fixes an issue when using capability clamping.

Tested with piglit provoking-vertex and glsl-fs-flat-color tests.

VMware bug 1550143.

Reviewed-by: <charmainel@vmware.com>
2016-07-08 12:53:20 -06:00
Nayan Deshmukh
af18a04755 vl: add half pixel to v_tex before adding offsets
Since pixel center lies at 0.5, add half_pixel to vtex
before adding offsets to it.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-08 20:51:12 +02:00
Samuel Pitoiset
a0bf1768c7 nvc0/ir: remove unused resource info loading helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 19:12:23 +02:00
Samuel Pitoiset
ed3a284382 nvc0/ir: refactor the surfaces info loading logic
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 19:12:21 +02:00
Samuel Pitoiset
9cdbe80745 nvc0/ir: move the shift left op inside loadTexHandle()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 19:12:06 +02:00
Nicolai Hähnle
04d93ea619 radeonsi: disable multi-threading when shader dumps are enabled
Otherwise, shader dumps can become interleaved and unusable.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:59:36 +02:00
Nicolai Hähnle
7ffc832ab8 radeonsi: use multi-threaded compilation in debug contexts
We only have to stay single-threaded when debug output must be synchronous.
This yields better parallelism in shader-db runs for me.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:59:32 +02:00
Nicolai Hähnle
084ca0d8e5 st/mesa: set debug callback async flag
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:59:29 +02:00
Nicolai Hähnle
2909e292fc gallium: add async flag to pipe_debug_callback
v2: fix typo db -> cb

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:58:52 +02:00
Nicolai Hähnle
5bcfbf91e5 radeonsi: catch a potential state tracker error with non-MSAA FBs
At least st/mesa ensures this, so I'd rather not handle deviations in radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:53:05 +02:00
Nicolai Hähnle
d938b8c0bf radeonsi: explicitly choose center locations for 1xAA on Polaris
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.

As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:50 +02:00
Nicolai Hähnle
7d2ce5258f r600g: call cayman_emit_msaa_sample_locs only when needed
In the case of nr_samples <= 1, that function is (currently) a no-op anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-08 10:52:45 +02:00
Kenneth Graunke
b3c5df3ca4 mesa: Mark R*32F formats as filterable when an extension is present.
GL_OES_texture_float_linear marks R32F, RG32F, RGB32F, and RGBA32F
as texture filterable.

Fixes glGenerateMipmap GL errors when visiting a WebGL demo in Chromium:
http://www.iamnop.com/particles

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-08 01:26:23 -07:00
Eric Engestrom
b7be23b6e1 i965/blorp: fix indentation level
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-07-08 11:07:36 +03:00
Francisco Jerez
37b901003b i965: Fix remaining flush vs invalidate race conditions in brw_emit_pipe_control_flush.
This hardware race condition has caused problems several times already
(see "i965: Fix cache pollution race during L3 partitioning set-up.",
"i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
"i965: intel_texture_barrier reimplemented").  The problem is that
whenever we attempt to both flush and invalidate multiple caches with
a single pipe control command the flush and invalidation happen in
reverse order, so the contents flushed from the R/W caches aren't
guaranteed to become visible from the invalidated caches after the
PIPE_CONTROL command completes execution if some concurrent rendering
workload happened to pollute any of the invalidated R/O caches in the
short window of time between the invalidation and flush.

This makes sure that brw_emit_pipe_control_flush() has the effect
expected by most callers of making the contents flushed from any R/W
caches visible from the invalidated R/O caches.

Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-07 14:16:39 -07:00
Francisco Jerez
0bd3a121c6 i965: Make room in the batch epilogue for three more pipe controls.
Review carefully, it sucks to have to keep track of the number of
command packet dwords emitted in the batch epilogue manually.  The
MI_REPORT_PERF_COUNT_BATCH_DWORDS calculation was obviously wrong.

Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-07 14:16:39 -07:00
Francisco Jerez
a10879f48c i965: Emit SKL VF cache invalidation W/A from brw_emit_pipe_control_flush.
There were two places in the driver doing a pipe control VF cache
flush, one of them was missing this workaround, move it down into
brw_emit_pipe_control_flush to make sure we don't miss it again.

Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-07-07 14:16:39 -07:00
Francisco Jerez
04f74d6629 i965: Emit SNB write cache flush W/A from brw_emit_pipe_control_flush.
Shouldn't cause any functional changes at this point, but we have
forgotten to apply this workaround several times in the past, make
sure it doesn't happen again.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-07-07 14:16:38 -07:00
Frank Binns
8fd5779da4 egl: restrict swap_available dri2_egl_display field to X11
This field is only ever set and read by the X11 platform.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-07 13:28:50 -07:00
Guillaume Charifi
9fea9d6f8e egl: Fix the bad surface attributes combination checking for pbuffers. (v3)
Fixes a regression induced by commit a0674ce5c4:
When EGL_TEXTURE_FORMAT and EGL_TEXTURE_TARGET were both specified (and
both != EGL_NO_TEXTURE), an error was instantly triggered, before the
other one had even a chance to be checked, which is obviously not the
intended behaviour.

v2: Full commit hash, remove useless variables.
v3: [chadv] Add Fixes footers.

Fixes: piglit "spec/egl 1.4/eglcreatepbuffersurface and then glclear"
Fixes: piglit "spec/egl 1.4/largest possible eglcreatepbuffersurface and then glclear"
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-07 11:28:55 -07:00
Eric Engestrom
7adb9b0948 egl/display: remove unnecessary code and make it easier to read
Remove the two first level `if` as they will always be true, and
flatten the two remaining `if`.
No functional change.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-07 11:13:13 -07:00
Gurchetan Singh
2e6d35809b mesa: Make single-buffered GLES representation internally consistent
There are a few places in the code where clearing and reading are done on
incorrect buffers for GLES contexts.  See comments for details.  This
fixes 75 GLES3 dEQP tests on the surfaceless platform with no regressions.

v2: Corrected unclear comment
v3: Make the change in context.c instead of get.c
v4: Removed whitespace

Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-07-07 11:02:35 -07:00
Emil Velikov
f35f8464ec bugzilla_mesa.sh: Drop "Bug " from sed command
After a recent Bugzilla update the word is no longer in the title. Thus
the script ended up producing bogus HTML.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-07 15:58:46 +01:00
Akihiko Odaki
42968424fb mesa: don't install GLX files if GLX is not built
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki.4i@stu.hosei.ac.jp>
[Emil Velikov: Drop guards around dri_interface.h, add stable tag]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-07 15:58:11 +01:00
Timothy Arceri
7a9d6abcae nir: add glsl_dvec_type() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-06 23:20:23 -07:00
Mathias Fröhlich
13affe0d3f osmesa: Export OSMesaCreateContextAttribs.
Since the function is exported like any other
public api function and put in the header
as if you could link against it, export it also
from shared objects.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-07-07 06:19:13 +02:00
Timothy Arceri
7ed5bca21d i965: consolidate generation check
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-07-07 12:29:21 +10:00
Timothy Arceri
e0dc3109d5 i965: don't copy VS attribute work arounds for HSW+
These workarounds are not required for HSW and above so stop
copying them at VS key generation which is called at draw time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 12:29:12 +10:00
Timothy Arceri
27e28197e8 i965: add double packing support to tess stages
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
8b80e9c31d i965: add double support packing support to gs inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
20e935e6f6 nir: add glsl_double_type() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
9d9b0b54cd i965: add indirect packing support to gs load inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
2477e6cfad i965: add indirect packing support for tcs and tes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
2bda4b062f i965: add component packing support for tcs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
cfff71a47a i965: add component packing support for tes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
a102ef2d4f i965: add component packing support for gs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
448adfbc67 nir: use the same driver location for packed varyings
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri
0eea6b3297 nir: add new intrinsic field for storing component offset
This offset is used for packing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Eric Engestrom
771f6db76f i965/docs: update Intel Linux Graphics URLs
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-06 13:18:23 -07:00
Chad Versace
8910de39c7 anv: gitignore anv_timestamp.h 2016-07-06 13:13:18 -07:00
Tom Stellard
513fccdfb6 radeon/llvm: Use alloca instructions for larger arrays
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays.  allocas instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.

For arrays that have 16 or less 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.

In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.

This fixes the piglit test:

spec/glsl-1.50/execution/geometry/max-input-component

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
Tom Stellard
02873a7b0c radeon/llvm: Add helpers for loading and storing data from arrays.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
Tom Stellard
2dc48984b2 radeon/llvm: Remove uses_temp_indirect_addressing() function
bld->indirect_files is never set, so this function always returns false.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 19:47:38 +00:00
Emil Velikov
9618e2a24c anv: vulkan: remove the anv_device.$(OBJEXT) rule
Atm the actual rule will expand to foo.o which is used for static
libraries only.

Thus the automake manual recommendation [to use OBJEXT] won't help us,
since since we're working with a shared library.

Thus let's 'demote' the file and add it back to BUILT_SOURCES. This will
manage all the complexity for us, at the (existing expense) of working
only with the all, check and install targets.

The crazy (why the issue was hard to spot):
If the dependencies (.deps/*.Plo) are already created one can alter the
anv_device.$(OBJEXT) line and/or nuke it all together. That won't lead
to any warnings/issues, even though the Makefile is regenerated.

Moral of the story:
Always rm -rf top_builddir or don't resolve the dependencies manually
and use BUILT_SOURCES.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96825
Fixes: d7a604c3f7a ("anv: use cache uuid based on the build timestamp.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-07-06 10:19:19 -07:00
Rob Clark
64d35f817a vbo: fix attr reset
In bc4e0c4 (vbo: Use a bitmask to track the active arrays in vbo_exec*.)
we stopped looping over all the attributes and resetting all slots.
Which exposed an issue in vbo_exec_bind_arrays() for handling GENERIC0
vs. POS.

Split out a helper which can reset a particular slot, so that
vbo_exec_bind_arrays() can re-use it to reset POS.

This fixes an issue with 0ad (and possibly others).

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-07-06 10:17:30 -04:00
Rob Clark
23dd9eaa94 list: fix list_replace() for empty lists
Before, it would happily copy list_head next/prev (ie. pointer to the
*from* list_head), leaving things in a confused state and causing much
mayhem.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-06 10:17:30 -04:00
Rob Clark
09fe35b450 gallium: un-inline pipe_surface_desc
Want to re-use this struct, so un-inline it.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:17:30 -04:00
Rob Clark
def044376a gallium/util: make util_copy_framebuffer_state(src=NULL) work
Be more consistent with the other u_inlines util_copy_xyz_state()
helpers and support NULL src.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:17:30 -04:00
Nicolai Hähnle
660cd3de4a winsys/amdgpu: avoid flushed depth when possible
If a depth/stencil texture has no mipmaps, we can always get a layout that is
compatible with DB and TC.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:52 +02:00
Nicolai Hähnle
7000dfd5c3 gallium/radeon: add depth/stencil_adjusted output to surface computation
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.

Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:52 +02:00
Nicolai Hähnle
68fe270e71 gallium/radeon: allocate only the required plane for flushed depth
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:52 +02:00
Nicolai Hähnle
1a0a8efcce radeonsi: decompress to flushed depth texture when required
v2: s/dirty_level_mask/stencil_dirty_level_mask/ in stencil case

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:51 +02:00
Nicolai Hähnle
4b7961da77 radeonsi: extract DB->CB copy logic into its own function
Also clean up some of the looping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:51 +02:00
Nicolai Hähnle
18cc825fb9 radeonsi: sample from flushed depth texture when required
Note that this has no effect yet. A case where can_sample_z/s can be false
in radeonsi will be added in a later patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:51 +02:00
Nicolai Hähnle
f2eb34f82f gallium/radeon: replace is_flushing_texture with db_compatible
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.

v2: adjust r600_init_color_surface as well

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:48 +02:00
Nicolai Hähnle
dd65126153 gallium/radeon: add can_sample_z/s flags for textures
v2: adjust r600_init_color_surface as well

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:43:43 +02:00
Nicolai Hähnle
065eeb79f7 radeonsi: correctly mark levels of 3D textures as fully decompressed
Account for the fact that max_layer is minified for higher levels.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:49 +02:00
Nicolai Hähnle
19f8d2a843 gallium/radeon/winsyses: remove unused stencil_offset
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:49 +02:00
Nicolai Hähnle
3a1da559c5 gallium/radeon: remove redundant null-pointer check
v2: keep using r600_texture_reference

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:48 +02:00
Nicolai Hähnle
5b87eef031 gallium/radeon: print StencilLayout only once
It is the same for all levels.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:48 +02:00
Nicolai Hähnle
bae066c3f0 gallium/radeon: flush stdout after printing texture information
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:42:48 +02:00
Ilia Mirkin
a37e46323c glsl: don't try to lower non-gl builtins as if they were gl_FragData
If a shader has an output array, it will get treated as though it were
gl_FragData and rewritten into gl_out_FragData instances. We only want
this to happen on the actual gl_FragData and not everything else.

This is a small part of the problem pointed out by the below bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-07-05 21:22:01 -04:00
Ian Romanick
795d8dff89 glsl: Document and enforce restriction on type values
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-07-05 17:55:29 -07:00
Ian Romanick
3119871bd9 glsl: Pack integer and double varyings as flat even if interpolation mode is none
v2: Also update varying_matches::compute_packing_class().  Suggested by
Timothy Arceri.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-05 16:58:27 -07:00
Ian Romanick
73a6a4ce49 mesa: Strip arrayness from interface block names in some IO validation
Outputs from the vertex shader need to be able to match
per-vertex-arrayed inputs of later stages.  Acomplish this by stripping
one level of arrayness from the names and types of outputs going to a
per-vertex-arrayed stage.

v2: Add missing checks for TESS_EVAL->GEOMETRY.  Noticed by Timothy
Arceri.

v3: Use a slightly simpler stage check suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-05 16:58:27 -07:00
Charmaine Lee
32651c67d1 svga: avoid emitting redundant DXSetRenderTargets command
Tested with Lightsmark2008, MTT piglit, glretrace, conform.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-05 16:58:29 -06:00
Leo Liu
aa7d42a5f9 radeon/vce: update encRefPic addr and array mode to tiled
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-05 09:15:50 -04:00
Leo Liu
e560a11b87 radeon/vce: increase cpb height alignment
Height should be aligned with 2 macroblocks, thus making safer
for tiled mode

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-05 09:15:47 -04:00
Iago Toral Quiroga
fa0654fc3c i965: Remove trailing whitespace
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-07-05 14:06:37 +02:00
Iago Toral Quiroga
d92ac67126 i965: Make inline function static
Without this the i965 driver fails to load.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-07-05 14:05:58 +02:00
Emil Velikov
cbc37f72e3 anv: install the intel_icd.json to ${datarootdir} by default
As mentioned by the spec (and used by Archlinux and Debian) default to
${datarootdir} as opposed to ${sysconfdir} for the default location.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-05 12:17:34 +01:00
Emil Velikov
744d0d8f3b swr: automake: don't ship LLVM version specific generated sources
Otherwise things will fail to build, if the builder is using another
version of LLVM.

v2: annotate all the dependencies of builder_gen.h
v3: clean the generated files as needed
v4: comment cleanups (Tim)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com> (v2)
Reported-by: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-05 12:17:05 +01:00
Emil Velikov
22e9357028 automake: don't mandate git_sha1.h/MESA_GIT_SHA1
It has proven subtle to get it right both from the build side POV (see
commit list below) and builders due to their varying workflows.

Furthermore it does not fully fulfil the reason why it was enforced -
to detect uniqueness between different builds, in order to distinguish
and invalidate Vulkan/GL caches.

With that having a much better solution (previous commit) we can drop
this solution.

This effectively reverts the following commits:
359d9dfec3 ("mesa: automake: add directory prefix for git_sha1.h")
2c424e00c3 ("mesa: automake: ensure that git_sha1.h.tmp has the right
attributes")
b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building OOT")
8229fe68b5 ("automake: get in-tree `make distclean' working again.")

Cc: Timo Aaltonen <tjaalton@debian.org>
Cc: Haixia Shi <hshi@chromium.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-05 12:16:20 +01:00
Emil Velikov
e5c1229a9a anv: automake: indent with tabs and not spaces
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-05 12:16:06 +01:00
Emil Velikov
addb099ce8 anv: use cache uuid based on the build timestamp.
Do not rely on the git sha1:
 - its current truncated form makes it less unique
 - it does not attribute for local (Vulkand or otherwise) changes

Use a timestamp produced at the time of build. It's perfectly unique,
unless someone explicitly thinkers with their system clock. Even then
chances of producing the exact same one are very small, if not zero.

v2: Remove .tmp rule. Its not needed since we want for the header to be
regenerated on each time we call make (Eric).

v3:
 - Honour SOURCE_DATE_EPOCH, to make the build reproducible (Michel)
 - Replace the generated header with a define, to prevent needless
builds on consecutive `make' and/or `make install' calls. (Dave)

v4:
 - Keep the timestamp generation at make time. (Jason)

v5:
 - Ensure that file is regenerated on incremental builds.

Cc: Michel Dänzer <michel@daenzer.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-05 12:15:23 +01:00
Emil Velikov
f98530b739 clover: conditionally use MESA_GIT_SHA1
Considering how hard/annoying it was for many peoples' workflow to
properly generate the macro, it will be demoted to conditionally
available with follow-up commits.

v2: Kill off gracious blank line (Vedran).

Cc: mesa-stable@lists.freedesktop.org
Cc: Vedran Miletić <vedran@miletic.net>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-05 12:14:34 +01:00
Timothy Arceri
9c9e3e7ee1 mesa: stop copying SamplerUnits twice
The call to _mesa_update_shader_textures_used() already takes
care of copying for us.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-07-05 20:18:05 +10:00
Timothy Arceri
25a32c2cbf mesa: make attribute binding message more useful
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-05 20:18:05 +10:00
Timothy Arceri
8f1ca0ee3f i965: make more effective use of SamplersUsed
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-07-05 20:18:05 +10:00
Timothy Arceri
51f912786f glsl: stop allocating memory for UBOs during linking
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.

These uniform types don't use this storage.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-07-05 20:18:05 +10:00
Timothy Arceri
549b9b12fc glsl: mark link_uniform_blocks_are_compatible() as static
Missed this when doing 6d1a59d15b.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-07-05 20:18:05 +10:00
Timothy Arceri
30812e90d1 mesa: fix build error
Fix build error cased by 6a524c76f5.
2016-07-05 18:42:06 +10:00
Gregory Hainaut
6a524c76f5 mesa: faster validation of sampler unit mapping for SSO
Code was inspired from _mesa_update_shader_textures_used

However unlike _mesa_update_shader_textures_used that only check for a single
stage, it will check all stages.

It avoids to loop on all uniforms, only active samplers are checked.

For my use case: high FS frequency switches with few samplers.
Perf event (relative to nouveau_dri.so) goes from 5.01% to 1.68% for
the _mesa_sampler_uniforms_pipeline_are_valid function.

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-07-05 16:44:31 +10:00
Dave Airlie
cb728df967 Revert "st/glsl_to_tgsi: don't increase immediate index by 1."
This reverts commit 27d456cc87.

DOH, what seems right and what is right with fp64 are always
two different things.

This regressed:
spec@arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-mixed-shader
on radeonsi

Reported-by: Michel Dänzer <michel@daenzer.net>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-05 10:25:29 +10:00
Samuel Pitoiset
c1fb3290a6 nvc0/ir: rename NVE4_SU_INFO_XXX to NVC0_SU_INFO_XXX
While we are at it, fix a typo inside the comment which describes
what those constants are for.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-05 01:44:15 +02:00
Samuel Pitoiset
f3b9fff3c3 nvc0/ir: reset the base offset for indirect images accesses
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-07-05 01:44:12 +02:00
Samuel Pitoiset
cb828b7b18 gm107/ir: fix sign bit emission for FADD32I
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.

This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-07-05 01:44:08 +02:00
Eric Anholt
ac772b24a1 vc4: Regularize instruction emit macros
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did.  This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
2016-07-04 16:33:22 -07:00
Eric Anholt
8a52f03f5d vc4: Enable dead CF elimination.
Now that we're about to start generating control flow in our NIR, we want
this in place.  It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
2016-07-04 16:33:22 -07:00
Eric Anholt
8f2af4763a vc4: Optimize out redundant SF updates.
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.

total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs:     1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs:     8568 -> 8523 (-0.53%)
2016-07-04 16:33:22 -07:00
Eric Anholt
200b4e4bd5 vc4: Move SF removal to a separate peephole pass.
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling.  We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.

No change on shader-db.
2016-07-04 16:33:22 -07:00
Eric Anholt
aa76ba6f2f vc4: DCE instructions with a NULL destination.
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects.  Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
2016-07-04 16:33:22 -07:00
Eric Anholt
2a8973fb78 vc4: Mark texturing setup instructions as having side effects.
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm.  Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
2016-07-04 16:33:22 -07:00
Eric Anholt
44df374a9c vc4: Fix a pasteo in scheduling condition flag usage.
Noticed by code inspection.  This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes.  MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
2016-07-04 16:33:22 -07:00
Eric Anholt
eaa53f80d9 vc4: Drop the dead QIR_PACK() macro.
This isn't used since we switched to using the dst.pack field instead of
custom instructions.
2016-07-04 16:33:18 -07:00
Marek Olšák
5c92c21369 radeonsi: do compilation from si_create_shader_selector asynchronously
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.

This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.

If an app creates more shaders, at most 4 threads will be used to compile
them.

Debug output disables this for shader stats to be printed in the correct
order.

(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
84824935cf radeonsi: don't lock shader cache mutex during compilation
to allow multiple shaders to be compiled simultaneously.

ALso, shader-db can again use all 4 cores.

v2: Remove the pipe_mutex_unlock call in the error path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-07-05 00:47:13 +02:00
Marek Olšák
850cd953b1 radeonsi: separate the compilation chunk of si_create_shader_selector
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
6781a2a994 radeonsi: move LLVMTargetMachineRef creation to a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
8a4ace4a47 gallium/radeon: add and use radeon_info::max_alloc_size (v2)
v2: - squashed the patches
    - use INT_MAX
    - clamp max_const_buffer_size
    - check the DRM version in radeon

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-07-05 00:47:13 +02:00
Marek Olšák
027ad71b57 radeonsi: print LLVM IRs to ddebug logs
Getting LLVM IRs of hanging shaders have never been easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
28a03be06b radeonsi: enable string markers and record apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:13 +02:00
Marek Olšák
642cf400aa ddebug: add an option to dump info about a specific apitrace call
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
1daec2b795 ddebug: implement pipe_context::generate_mipmap
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
50b2235478 ddebug: record and dump apitrace call numbers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
861ecf1ca9 ddebug: implement emit_string_marker
and remove some obsolete comments

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
a446c40e0a gallium/radeon: remove unused code - radeon_llvm_util.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
eaccc4e8c8 radeonsi: keep using v_rcp_f32 for division in future LLVM (v2)
This will be needed after some LLVM changes that haven't landed yet.

v2: - use LLVMIsConstant to fix an LLVM assertion failure.
      LLVMSetMetadata doesn't work with constants.
    - don't set float metadata as string

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
1c00086746 radeonsi: remove an obsolete comment
It's not true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
4d1f32376d radeonsi: don't interpolate colors if flatshading is enabled
use v_interp_mov for those

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
4accb02d7a radeonsi: enable the barycentric optimization in all cases
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
476e9cee1d radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
a675c6a000 radeonsi: split ps.prolog.force_persample_interp into persp and linear bits
This reduces the number of v_mov's in the prolog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Marek Olšák
61010cfac0 radeonsi: don't dump the shader key for non-monolithic shaders early
It's always zero.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-05 00:47:12 +02:00
Jan Vesely
015e2e0fce r600g: Add double precision FMA ops
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: James Harvey <lothmordor@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-07-05 00:47:12 +02:00
Francesco Ansanelli
9827fc3f03 r600: fix duplicate 'const' declaration
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-04 21:26:31 +02:00
Topi Pohjolainen
2a60654f56 i965/urb: Allow blorp to record current settings
This makes it possible to skip urb re-configuration if the
subsequent renders agree with the settings.

Also allows blorp to allocate the maximun amount of vs entries
available. Core upload logic already knows how to calculate this.
Helps one synthetic benchmark.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
39fdee6b2d i965/blorp/gen7+: Do not trigger push constant space reconfig
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
cc2d0e64c0 i965/blorp/gen7+: Stop trashing push constant allocation
Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage
itself is enabled and hardware may try to prefetch constants from
the buffer. From the BSpec: 3D Pipeline - Windower -
3DSTATE_PUSH_CONSTANT_ALLOC_PS

  "Specifies the size of the PS constant buffer. This value will
   determine the amount of data the command stream can pre-fetch
   before the buffer is full."

This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS:

"This packet must be followed by WM_STATE."

Binding table emissions for stages other than PS can be now dropped,
they were only needed for the 3DSTATE_CONSTANT_XS to be effective:

From the BSpec:

  "The 3DSTATE_CONSTANT_* command is not committed to the shader unit
   until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_*
   command is parsed."

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
175e095744 i965/blorp: Remove support for push constants
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
46e1132b80 i965/blorp: Use flat inputs instead of uniforms
v2 (Jason): Use LOAD_INPUT() macro

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
07db95c24d i965/blorp: Fix the size requirement for vertex elements
v2: Rebased as this is needed before flat inputs are enabled

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
741a245ae4 i965/blorp: Load tranformation coordinates as vec4
In preparation for loading as flat vertex input.

v2: Use LOAD_INPUT() macro

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
01f2f364d4 i965/blorp: Rename LOAD_UNIFORM to LOAD_INPUT
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Topi Pohjolainen
641868103c i965/blorp: Organize pixel kill and blend/scaled inputs into vec4s
In addition, as these are never used in parallel, add a few
assertions.

v2 (Jason): Skip some complexity by putting them into a union but
            pad rectangle grid into a vec4 instead. Also keep the
            LOAD_UNIFORM macro.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 20:43:11 +03:00
Lionel Landwerlin
dbbc4fb4cc anv/wsi: create swapchain images using specified image usage
The image usage specified by the caller of vkCreateSwapchainKHR should be
passed onto the internal image creation. Otherwise the driver might later
crash when the user tries to use the image as a combined sampler even though
the creation was explicitly created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT.

Leaving the previous VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT as this might be
expected even if the swapchain is created without any flag.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96791
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-04 10:15:48 -07:00
Indrajit Das
51227b41c6 radeon/uvd: fix overflow error while calculating bit stream buffer size
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-04 11:38:05 +02:00
Topi Pohjolainen
9e3774a460 i965/blorp: Prepare for more than two vertex attributes
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 09:05:02 +03:00
Topi Pohjolainen
e762354309 i965/blorp: Tell vertex fetcher about flat inputs
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 09:04:38 +03:00
Topi Pohjolainen
89e6b4ef5d i965/blorp: Add support for flat input buffer
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 09:04:00 +03:00
Topi Pohjolainen
9b2fa17e97 i965/blorp: Store input read mask
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 09:03:41 +03:00
Topi Pohjolainen
73f78ab44b i965/blorp: Rename push constants to inputs
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 08:37:51 +03:00
Topi Pohjolainen
f2c472fcb3 i965/blorp: Use core vertex buffer state setup
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 08:37:44 +03:00
Topi Pohjolainen
4f7e68799f i965/blorp: Split vertex data and element setup
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 08:33:41 +03:00
Topi Pohjolainen
575c8cbb54 i965: Unify vertex buffer setup
On gen >= 8 one doesn't provide ending address but number of bytes
available. This is relative to the given offset.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 08:33:41 +03:00
Topi Pohjolainen
bdab945edd i965/draw: Expose vertex buffer state setup
Also change the interface to use start and end offsets.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-04 08:33:41 +03:00
Rob Clark
7295428e41 freedreno: fix crash on smaller gpus and higher resolutions
Devices with smaller GMEM size need more tiles.  On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen.  Lets bump it up to
512.  (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-03 11:16:28 -04:00
Rob Clark
01ccb0d91e i965: don't drop const initializers in vector splitting
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-02 09:00:19 -04:00
Rob Clark
f78a6b1ce3 glsl: add driconf to zero-init unintialized vars
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.

So add driconf param to force uninitialized variables to default to zero.

This issue was observed with rust, from steam store.  But has surfaced
elsewhere in the past.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-02 09:00:19 -04:00
Rob Clark
202710d110 freedreno/ir3: support glsl linking for cmdline compiler
For .vert/.frag, now multiple can be specified on the cmdline for
purposes of linking, and the last one specified is the one that is
fed into the ir3 backend (and dumped along the way if --verbose is
specified)

Without this, varyings in frag shaders would appear as undefined.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-07-02 09:00:19 -04:00
Rob Clark
07cfe4e6aa glsl/standalone: initialize MaxUserAssignableUniformLocations
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-07-02 09:00:19 -04:00
Rob Clark
1759eb1d19 freedreno: update valid_buffer_range for SO buffers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
da39ac9c51 freedreno/ir3: support non-user_buffer consts
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
2081c1ecc0 freedreno/a2xx: move setup/restore cmds into binning pass
Rather than doing a separate submit at context create, move these cmds
to before first tile, as is done on a3xx/a4xx.  Otherwise state can
be overwritten by other contexts.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
2c3b54c278 freedreno: pass index buffer as a pipe_resource
This will be useful in a following patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Rob Clark
88cc11e971 freedreno: switch emit_const_bo() to take prsc's
We can push the unwrap of pipe_resource down.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-02 08:58:50 -04:00
Hans de Goede
d7dfd4cb51 nv30: Fix "array subscript is below array bounds" compiler warning
gcc6 does not like the trick where we point to one entry before the
array start and then start a while with a pre-increment.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-02 12:21:28 +02:00
Hans de Goede
110ef733dc nouveau: Fix a couple of "foo may be used uninitialized' compiler warnings
These are all new false positives with gcc6.

In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer
to a variable into a function initialises that variable.

In nv50_ir_from_tgsi.cpp op and mode are not set if there are 0
enabled dst channels, this never happens, but gcc cannot know this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-02 12:21:28 +02:00
Hans de Goede
1f3c8f3664 nouveau: Fix gcc6 / c++11 auto_ptr deprecation compiler warnings
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
2aa1197eee nouveau: Add support for SV_WORK_DIM
Add support for SV_WORK_DIM for nvc0 and nve4.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
3345f70f63 nvc0: Make NVC0_CB_AUX_GRID_INFO take an index argument
This brings it inline with the other macros like NVC0_CB_AUX_UBO_INFO
and NVC0_CB_AUX_TEX_INFO.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
ef8e50a841 clover: Pass work_dim parameter of clEnqueueNDRangeKernel() to driver
In order to implement get_work_dim() the driver may need to know the
clEnqueueNDRangeKernel() work_dim parameter, so pass it to the driver.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Hans de Goede
d386cef246 tgsi: Add WORK_DIM System Value
Add a new WORK_DIM SV type, this is will return the grid dimensions
(1-4) for compute (opencl) kernels.

This is necessary to implement the opencl get_work_dim() function.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Alejandro Piñeiro
da7efadf04 mesa/main: fix error checking logic on CopyImageSubData
For the case (both src or dst) where we had a texobject, but the
texobject target was not the same that the method target, this spec
paragraph was appplied:

 /* Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core
  * Profile spec says:
  *
  *     "An INVALID_VALUE error is generated if either name does not
  *     correspond to a valid renderbuffer or texture object according
  *     to the corresponding target parameter."
  */

But for that case, the correct spec paragraph should be:
 /* Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core
  * Profile spec says:
  *
  *     "An INVALID_ENUM error is generated if either target is
  *      not RENDERBUFFER or a valid non-proxy texture target;
  *      is TEXTURE_BUFFER or one of the cubemap face selectors
  *      described in table 8.18; or if the target does not
  *      match the type of the object."
  */

specifically the last sentence: "or if the target does not match the
type of the object".

This patch fixes the error returned (s/INVALID/ENUM) for that case,
and moves up the INVALID_VALUE spec paragraph, as that case (invalid
texture object) was handled before.

Fixes:
GL44-CTS.copy_image.target_miss_match

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-02 11:54:40 +02:00
Dave Airlie
27d456cc87 st/glsl_to_tgsi: don't increase immediate index by 1.
Immediates are stored into a separate table, and are
consolidated, so if we get an immediate we don't need
to offset it as the index it has is correct.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-02 17:01:25 +10:00
Ilia Mirkin
6f4d35212b st/mesa: get max supported number of image samples from driver
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-01 23:01:03 -04:00
Ilia Mirkin
b2b5075e04 nvc0: fix up image support for allowing multiple samples
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 23:01:02 -04:00
Nicolai Hähnle
07cc838b10 st/mesa: check the texture image level in st_texture_match_image
Otherwise, 1x1 images of arbitrarily high level are accepted.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96639#add_comment
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 17:55:19 +02:00
Nicolai Hähnle
0ba053b34c st/mesa: an incomplete texture may have a zero-size first image
Fixes a regression introduced by commit 42624ea83 which triggered
an assertion in
dEQP-GLES2.functional.texture.completeness.cube.not_positive_level_0

While stImage must have a non-zero size as verified by the caller, we also
look at the size of the base image in an attempt to make a better guess at
the level0 size (this is important when the base image size is odd). However,
the base image may have a zero size even when it exists.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96629
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 17:54:40 +02:00
Nayan Deshmukh
de772bc060 st/vdpau: use bicubic filter for scaling(v6.1)
use bicubic filtering as high quality scaling L1.

v2: fix a typo and add a newline to code
v3: -render the unscaled image on a temporary surface (Christian)
    -apply noise reduction and sharpness filter on
     unscaled surface
    -render the final scaled surface using bicubic
     interpolation
v4: support high quality scaling
v5: set dst_area and dst_clip in bicubic filter
v6: set buffer layer before setting dst_area
v6.1: add PIPE_BIND_LINEAR when creating resource

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-01 12:54:58 +02:00
Nayan Deshmukh
872dd9ad15 vl: add a bicubic interpolation filter(v5)
This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: intialize offsets while initializing shaders
    use a constant buffer to send dst_size to frag shader
    small changes to reduce calculation in shader
v5: send half pixel offset instead of sending dst_size

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-01 12:54:33 +02:00
Vinson Lee
3fea592c4e mesa/st: Use 'struct nir_shader' instead of 'nir_shader'.
Fix this build error with GCC 4.4.

  CC     state_tracker/st_nir_lower_builtin.lo
In file included from state_tracker/st_nir_lower_builtin.c:61:
state_tracker/st_nir.h:34: error: redefinition of typedef ‘nir_shader’
../../src/compiler/nir/nir.h:1830: note: previous declaration of ‘nir_shader’ was here

Suggested-by: Rob Clark <robdclark@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96235
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-07-01 00:19:24 -07:00
Alejandro Piñeiro
a97ee60926 docs: update MESA_DEBUG envvar documentation.
silent, flush, incomplete_tex and incomplete_fbo flags were not
documented (see src/mesa/main.debug.c for more info).

FP is not checked anymore.

v2 (Brian Paul):
 * MESA_DEBUG accepts a comma-separated list of parameters.
 * Clarify how MESA_DEBUG behaves with mesa debug and release builds.
 * Updated wording.

v3: Better wording for one paragraph (Brian Paul)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-07-01 08:15:15 +02:00
Alejandro Piñeiro
5e553a6bb3 i965: intel_texture_barrier reimplemented
Fixes:
GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass

On Haswell, Broadwell and Skylake (note that in order to execute that
test, it is needed to override GL and GLSL versions).

On gen6 this test was already working without this change. It keeps
working after it.

This commit replaces the call to brw_emit_mi_flush for gen6+ with two
calls to brw_emit_pipe_control_flush:

 * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
   a render cache flush after any concurrent rendering completes and
   cause the CS to stop parsing commands until the render cache
   becomes coherent with memory.

 * The second one have TEXTURE_CACHE_INVALIDATE set (and no CS stall)
   to clean up any stale data from the sampler caches before rendering
   continues.

Didn't touch gen4-5, basically because I don't have a way to test
them.

More info on commits:
0aa4f99f56
72473658c5

Thanks to Curro to help to tracking this down, as the root case was a
hw race condition.

v2: use two calls to pipe_control_flush instead of a combination of
    gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
v3: no need to const cache invalidation (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-01 08:09:27 +02:00
Ilia Mirkin
51ca57df01 nv30: go back to not using viewport validate function for swtnl
The output of draw requires a null viewport transform, which the regular
code is ill-equiped to do. Reinstate the original settings in the render
path, and add setting of the viewport clip polygon based on fb
width/height (as that is all taken care of by draw).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 01:04:10 -04:00
Ilia Mirkin
71609c9954 nv30: fix viewport clipping settings to be based on viewport, not rt
This fixes a ton of "*clip*" dEQP GLES2 tests, as well as
triangle-guardband-viewport in piglit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-01 00:02:23 -04:00
Brian Paul
c823ff8dfb gallium/util: check for window cliprects in util_can_blit_via_copy_region()
We can't blit with resource_copy_region() if there are window clip rects.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-30 18:19:09 -06:00
Chuck Atkins
d8d6091a84 gallium: Force blend color to 16-byte alignment
This aligns the 4-element color float array to 16 byte boundaries.  This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.

v2: Fixed indentation and added a lengthy comment explaining the
    reason for the alignment.

Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-06-30 17:04:41 -05:00
Chuck Atkins
c1bf6692be swr: Refactor checks for compiler feature flags
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

v2: Pass preprocessor-check argument as true-state instead of
    false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__.  Additional defines suchas
    __FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
    w.r.t thier availability.
v4: Fix C++11 flags being added globally and add more logic to
    swr_require_cxx_feature_flags

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@Intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-06-30 16:55:01 -05:00
Brian Paul
eb79b2b331 st/wgl: make own_mutex() non-static
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-30 15:29:07 -06:00
Andres Gomez
e0f4504adf glsl: atomic counters are different than their uniforms
The linker deals with atomic counters in terms of uniforms but the
data structure are called after the atomic counters.

Renamed the data structures used in the linker for disambiguation.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-06-30 23:55:32 +03:00
Andres Gomez
0f00c6dd77 glsl: count atomic counters correctly
Currently the linker uses the uniform count for the total number of
atomic counters. However uniforms don't include the innermost array
dimension in their count, but atomic counters are expected to include
them.

Although the spec doesn't directly state this, it's clear how offsets
will be assigned for arrays.

From OpenGL 4.2 (Core Profile), page 98:

  "  * Arrays of type atomic_uint are stored in memory by element
       order, with array element member zero at the lowest offset. The
       difference in offsets between each pair of elements in the
       array in basic machine units is referred to as the array
       stride, and is constant across the entire array. The stride can
       be queried by calling GetIntegerv with a pname of
       ATOMIC_COUNTER_- ARRAY_STRIDE after a program is linked."

From that it is clear how arrays of atomic counters will interact with
GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE.

For other kinds of uniforms it's also clear that each entry in an
array counts against the relevant limits.

Hence, although inferred, this is the expected behavior.

Fixes GL44-CTS.arrays_of_arrays_gl.AtomicDeclaration

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-06-30 23:55:32 +03:00
Brian Paul
c84444ea85 svga: use SVGA3D_vgpu10_BufferCopy() for buffer copies
So that we do copies host-side rather than in the guest with map/memcpy.

Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-06-30 14:32:11 -06:00
Brian Paul
29a38f37ee svga: add SVGA3D_vgpu10_BufferCopy()
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:10 -06:00
Brian Paul
88a344253c svga: flush buffers when mapping for reading
With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have
to make sure any pending map-write operations are completed before reading
if the buffer is dirty.  Otherwise the ReadbackSubResource operation could
get stale data from the host buffer.

This allows the piglit arb_copy_buffer-subdata-sync test to pass when
we start using the SVGA3D_vgpu10_BufferCopy command.

v2: check the sbuf->dirty flag in the outer conditional, per Charmaine.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:10 -06:00
Neha Bhende
fa2cdd973d svga: enable ARB_copy_image extension in the driver
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:09 -06:00
Brian Paul
4a54514958 svga: try blitting with copy region in more cases
We previously could do blits with util_resource_copy_region() when doing
'loose' format checking.  Also do blits with util_resource_copy_region()
when the blit src/dst formats (not the underlying resources) exactly
match.  Needed for GL_ARB_copy_image.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:08 -06:00
Brian Paul
92b44efef4 svga: use copy_region_vgpu10() for region copies when possible
v2: remove extra svga_define_texture_level() call, per Charmaine.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:08 -06:00
Neha Bhende
1d0be402c7 svga: use vgpu10 CopyRegion command when possible
Do texture->texture copies host-side with this command when possible.
Use the previous software fallback otherwise.

Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:07 -06:00
Brian Paul
3a3c3d124a svga: set render target flag for snorm surfaces
We don't normally support rendering to SNORM surfaces, but with
GL_ARB_copy_image we can copy to them if we treat them as typeless
and use a UNORM surface view.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:07 -06:00
Brian Paul
46e7355a13 svga: add new svga_format_is_uncompressed_snorm() helper
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:07 -06:00
Brian Paul
68388043f3 svga: adjust sampler view format for RGBX
We previously handled the case of a RGBX sampler view of a RGBA surface.
Add the reverse case too.  For GL_ARB_copy_image.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:07 -06:00
Brian Paul
1049002eae svga: adjust render target view format for RGBX
For GL_ARB_copy_image we may be asked to create an RGBA view of
a RGBX surface.  Use an RGBX view format for that case.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:07 -06:00
Neha Bhende
429ace2fbc svga: don't advertise support for R32G32B32_UINT/SINT surface formats
We want to be able to copy between different 32-bit, 3-channel surface
formats for GL_ARB_copy_image but since we don't support R32G32B32_FLOAT
for textures (it's not blendable and wouldn't work for render to texture)
we can't support 32-bit, 3-channel integer formats.

The state tracker will choose 4-channel formats instead.

Fixes the piglit arb_copy_image-format test for several cases.

Note: This change may need to be revisited if/when the texture_view exension
is enabled in driver.

Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul
eb0ced74f6 svga: use untyped surface formats in most cases
This allows us to do copies between different, but compatible, surface
formats such as RGBA8_UNORM, RGBA8_SINT, RGBA8_UINT, etc. for
GL_ARB_copy_image.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul
5f1335878e gallium/util: add tight_format_check param to util_can_blit_via_copy_region()
The VMware driver will use this for implementing GL_ARB_copy_image.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul
a029d9f074 gallium/util: simplify a few things in util_can_blit_via_copy_region()
Since only the src box can have negative dims for flipping, just
comparing the src/dst box sizes is enough to detect flips.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul
5d31ea4b8f gallium/util: new util_try_blit_via_copy_region() function
Pulled out of the util_try_blit_via_copy_region() function.  Subsequent
changes build on this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Neha Bhende
7988513ac3 svga: Fix failures caused in fedora 24
SVGA_3D_CMD_DX_GENRATE_MIPMAP & SVGA_3D_CMD_DX_SET_PREDICATION commands
are not presents in fedora 24 kernel module. Because of this
reason application like supertuxkart are not running.

v2: Add few comments and code modifications suggested by Brian P.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 12:45:09 -06:00
Brian Paul
52f297d144 st/wgl: remove unneeded inline qualifiers
No effect on size of the .o files (optimized build).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-06-30 12:43:50 -06:00
Brian Paul
395ee18bac st/wgl: add a stw_device::initialized field
Set when the stw_dev object's initialization is completed.  We test
for this in the window callback function to avoid potential crashes
on start-up in multi-threaded applications.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-06-30 12:43:50 -06:00
Brian Paul
128feef40e st/wgl: refactor framebuffer locking code
Split the old stw_framebuffer_reference() function into two new
functions: stw_framebuffer_reference_locked() which increments
the refcount and stw_framebuffer_release_locked() which decrements
the refcount and destroys the buffer when the count hits zero.

Original patch by Jose.  Modified by Brian (clean-ups, lock assertion
checks, etc).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-06-30 12:43:50 -06:00
José Fonseca
25cccb5bec st/wgl: rename curctx to old_ctx in stw_make_current()
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-30 12:43:49 -06:00
Brian Paul
24004a2435 st/wgl: release the pbuffer DC at the end of wglBindTexImageARB()
Otherwise we were leaking DC GDI objects and if wglBindTexImageARB()
was called enough we'd eventually hit the GDI limit of 10,000 objects.
Things started failing at that point.

v2: also release DC if we return early, per Charmaine.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-06-30 12:43:49 -06:00
Matt Turner
058c70bae1 mesa: Close fp on error path.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-30 11:08:39 -07:00
Matt Turner
e3d9125b77 i965: Simplify foreach_inst_in_block_safe() macro.
We know what the end looks like without examining .tail: it's NULL. It's
always NULL.
2016-06-30 11:08:39 -07:00
Andres Gomez
c4e47ab971 Revert "i965: get PrimitiveMode from the program rather than the shader struct"
This reverts commit 644e015f0b.

PrimitiveMode from the program doesn't always hold a valid value that
is neither of GL_TRIANGLES, GL_QUADS nor GL_ISOLINES when reaching
this code. This caused regressions in the following CTS tests:
  GL44-CTS.stencil_texturing.functional
  GL44-CTS.shading_language_420pack.binding_images
  GL44-CTS.shading_language_420pack.binding_samplers
  GL44-CTS.shading_language_420pack.binding_uniform_single_block
  GL44-CTS.shading_language_420pack.implicit_conversions
  GL44-CTS.shading_language_420pack.initializer_list
  GL44-CTS.shading_language_420pack.length_of_vector_and_matrix
  GL44-CTS.shading_language_420pack.line_continuation

Hence, we rather take it from the linked shader.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-06-30 16:20:22 +03:00
Timothy Arceri
1591e668e1 glsl/mesa: move duplicate shader fields into new struct gl_shader_info
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
fd2b3da5c8 glsl/main: remove unused params and make function static
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
32c410d2df glsl: simplify link_uniform_blocks()
There is only ever one shader so simplify the input params.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
1fb8c6df88 glsl/mesa: split gl_shader in two
There are two distinctly different uses of this struct. The first
is to store GL shader objects. The second is to store information
about a shader stage thats been linked.

The two uses actually share few fields and there is clearly confusion
about their use. For example the linked shaders map one to one with
a program so can simply be destroyed along with the program. However
previously we were calling reference counting on the linked shaders.

We were also creating linked shaders with a name even though it
is always 0 and called the driver version of the _mesa_new_shader()
function unnecessarily for GL shader objects.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
378f07ccb5 mesa: don't print name in _mesa_append_uniforms_to_file()
This is only used to print linked shaders which always have a name of 0
so this was pointless.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
e8c8aa0320 mesa: remove unreachable code from _mesa_write_shader_to_file()
_mesa_write_shader_to_file() is only used to print gl shader objects
so Program should never be set as it only gets set for linked shaders.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
9b41c743cc glsl: pass symbols to find_matching_signature() rather than shader
This will allow us to later split gl_shader into two structs.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
47f8381730 glsl: pass symbols rather than shader to _mesa_get_main_function_signature()
This will allow us to split gl_shader into two different structs, one for
shader objects and one for linked shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
9e9d01cbe8 mesa: don't use drivers NewShader function when creating shader objects
The drivers function only needs to be used when creating a struct for
linked shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Timothy Arceri
962933b6d4 glsl: make cross_validate_globals() more generic
Rather than passing in gl_shader we now pass in the IR. This will
allow us to later split gl_shader into two structs. One for use
as a linked per stage shader struct and one for use as a GL shader
object.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-30 16:51:25 +10:00
Ian Romanick
5921f372c8 mapi: Export all GLES 3.1 functions in libGLESv2.so
Khronos recommends that the GLES 3.1 library also be called libGLESv2.
It also requires that functions be statically linkable from that
library.

NOTE: Mesa has supported the EGL_KHR_get_all_proc_addresses extension
since at least Mesa 10.5, so applications targeting Linux should use
eglGetProcAddress to avoid problems running binaries on systems with
older, non-GLES 3.1 libGLESv2 libraries.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Reported-by: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Acked-by: Chad Versace <chad.versace@intel.com>
2016-06-29 14:28:59 -07:00
Chad Versace
d3a147ba40 i965: Use drmIoctl for DRM_I915_GETPARAM (v2)
Stop using drmCommandWriteRead for such a simple ioctl.

v2: Handle errno correctly. [ickle]

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-29 13:44:23 -07:00
sonjiang
b928ff6f62 radeon/uvd: fix a h265 context size bug
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-29 15:30:25 -04:00
sonjiang
5c80354a23 radeon/uvd: seperate uvd context buffer from DPB
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-29 15:30:20 -04:00
sonjiang
28f85eab49 radeon uvd add uvd fw version for amdgpu
Signed-off-by: sonjiang <sonny.jiang@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-29 15:30:14 -04:00
Samuel Pitoiset
fa10d1d674 nv50/ir: print EMIT subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-29 20:37:38 +02:00
Samuel Pitoiset
a6d3b2e176 nv50/ir: print RSQ/RCP subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-29 20:37:36 +02:00
Samuel Pitoiset
908ba19554 nv50/ir: print PIXLD subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-29 20:37:33 +02:00
Samuel Pitoiset
c0d92078bb nv50/ir: print SHFL subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-29 20:37:18 +02:00
Rodrigo Vivi
85ea8deb26 i965: Removing PCI IDs that are no longer listed as Kabylake.
This is unusual. Usually IDs listed on early stages of platform
definition are kept there as reserved for later use.

However these IDs here are not listed anymore in any of steppings
and devices IDs tables for Kabylake on configurations overview
section of BSpec.

So it is better removing them before they become used in any
other future platform.

Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2016-06-29 11:14:19 -07:00
Rodrigo Vivi
bdff2e5547 i956: Add more Kabylake PCI IDs.
The spec has been updated adding new PCI IDs.

Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2016-06-29 11:14:19 -07:00
Marek Olšák
63f8d648f0 gallium/radeon: remove zombie textures kept alive by DCC stat gathering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
44906101c4 gallium/radeon: don't re-create queries for DCC stat gathering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
82b39f3521 gallium/radeon: assume X11 DRI3 can use at most 5 back buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
9ae41227c2 gallium/radeon: separate DCC starts as disabled (ps_draw_ratio = 0)
DRI3:
- Only slows clears can enable it for the first frame.
- A good PS/draw ratio can enable it for other frames.

DRI2:
- Only slows clears can enable it for a frame.
- Page-flipped color buffers are unref'd at the end of each frame,
  so it can't be enabled in any other way.
- Relying on slow clears is sufficient for our synthetic benchmarks.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
9fd4eff43c gallium/radeon: R600_DEBUG=nodccfb disables separate DCC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
36cf5a57c2 gallium/radeon: add and use r600_texture_reference
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
6da92df538 gallium/radeon: add a HUD query for PS draw ratio stats from separate DCC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
49e3c74cdd gallium/radeon: add a heuristic enabling DCC for scanout surfaces (v2)
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.

At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)

Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.

The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.

The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.

The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).

v2: rename stuff, add assertions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
9124457bff gallium/radeon: add state setup for a separate DCC buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
fa7c927625 radeonsi: always calculate DCC info even if it's not used immediately
for a later use

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
ebb9c7d7c4 radeonsi: unreference framebuffer state with set_framebuffer_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Marek Olšák
e607a6be2b gallium/radeon: add flag R600_QUERY_HW_FLAG_BEGIN_RESUMES
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 20:12:00 +02:00
Chad Versace
a2ae888929 i965: Use intel_get_param() more often
Replace some open-coded ioctls with intel_get_param().

This is just a cleanup. No change in behavior.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-29 09:34:21 -07:00
Chad Versace
844e0bd946 i965: Refactor intel_get_param()
Replace the function's __DRIscreen parameter with struct intel_screen.
The callsites feel more natural that way.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-29 09:34:21 -07:00
Marek Olšák
0c135a773f radeonsi: don't advertise multisample shader images
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
eff81cbc81 radeonsi: enable distributed tess on multi-SE parts only
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
dd56d04568 radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
ported from Vulkan

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
9a71bf8858 radeonsi: enable CU0 in each SE for LS-HS execution
Offchip-only tessellation allows this.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Marek Olšák
4b11ef23b4 radeonsi: use conformant line rasterization
AA lines are not completely correct (see TODO), but everything else
should be.

+ 3 linestipple piglits

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-29 16:34:22 +02:00
Rob Herring
789ed13284 Android: add missing u_math.h include path for libmesa_isl
Commit 87d062a940 ("i965: Fix shared local memory size for Gen9+.")
added u_math.h include which broke the Android build:

In file included from external/mesa3d/src/intel/isl/isl_storage_image.c:25:
In file included from external/mesa3d/src/mesa/drivers/dri/i965/brw_compiler.h:29:
external/mesa3d/src/mesa/main/macros.h:35:10: fatal error: 'util/u_math.h' file not found
         ^

Add the missing include paths for libmesa_isl.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Kenneth Garunke <kenneth@whitecape.org>
2016-06-28 12:48:46 -07:00
Charmaine Lee
6397c12f32 svga: force direct map for transfering multiple slices
With commit fb9fe35, we start using transfer_inline_write
for memcpy of TexSubImage. But SurfaceDMA command does not work
well with texture array. This patch forces direct map when
transfering multiple slices of a texture array.

Fixes piglit regression "texelFetch fs sampler1DArray"

Tested with MTT piglit, glretrace, conform.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-06-28 13:43:23 -06:00
Brian Paul
d65c4e22a8 svga: whitespace, line wrapping fixes in svga_surface.c 2016-06-28 13:43:23 -06:00
Samuel Pitoiset
cc97b6a34a gm107/ir: make sure that flagsDef is set when emitting setcond
Rely on the existence of a second destination when emitting a setcond
flag is dangerous, because this doesn't mean that the flag has been
correctly set. Instead rely on flagsDef like what emitX() does
for flagsSrc.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-28 18:38:56 +02:00
Grazvydas Ignotas
234323558d doc: improve INTEL_DEBUG documentation
Remove 'reg' option that does not actually exist, elaborate more about
'sync' and add the missing options.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-28 07:21:07 -07:00
Marek Olšák
c1dbc563f4 radeonsi: set PA_SU_SMALL_PRIM_FILTER_CNTL register on Polaris
This was missing.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-06-28 15:47:13 +02:00
Boyuan Zhang
06f0a4d9ed radeon/vce: use vce structure for vce 52 firmware
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-06-28 08:58:03 -04:00
Boyuan Zhang
533bd6ae17 radeon/vce: add vce structures
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-06-28 08:58:00 -04:00
Leo Liu
05d302ffe2 st/omx: fix decoder fillout for the OMX result buffer
The call for vl_video_buffer_adjust_size is with wrong order of
arguments, apparently it will have problem when interlaced false;

The size of OMX result buffer depends on real size of clips, vl buffer
dimension is aligned with 16, so 1080p(1920*1080) video will overflow
the OMX buffer

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-06-28 08:57:56 -04:00
Hans de Goede
459cc94507 pipe_loader_sw: Fix fd leak when instantiated via pipe_loader_sw_probe_kms
Make pipe_loader_sw_probe_kms take ownership of the passed in fd,
like pipe_loader_drm_probe_fd does.

The only caller is dri_kms_init_screen which passes in a dupped fd,
just like dri2_init_screen passes in a dupped fd to
pipe_loader_drm_probe_fd.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-28 12:29:54 +02:00
Jan Vesely
87787e9079 clover: Fix kernel metadata retrieval after clang r273425
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2016-06-27 23:12:37 -07:00
Francisco Jerez
a8a966ddb5 clover/llvm: Fix copyright attribution of invocation.cpp.
This file still only has my name on the copyright notice even though
most of the code (likely more than 90% of it) was authored by various
contributors -- It doesn't seem right to have the whole file
attributed to myself.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Serge Martin <edb+mesa@sigluy.net>
2016-06-27 23:12:35 -07:00
Kenneth Graunke
034bd25327 i965: Print EOT in fs_visitor::dump_instruction().
This was useful when debugging the previous commit's issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-27 16:36:57 -07:00
Kenneth Graunke
7e7e501acf i965: Make emit_urb_writes() not produce an EOT message for GS.
emit_urb_writes() contains code to emit an EOT write with no actual
data when there are no output varyings.  This makes sense for the VS
and TES stages, where it's called once at the end of the program.

However, in the geometry shader stage, emit_urb_writes() is called once
for every EmitVertex().  We explicitly emit a URB write with EOT set at
the end of the shader, separately from this path.  So we'd better not
terminate the thread.  This could get us into trouble for shaders which
do EmitVertex() with no varyings followed by SSBO/image/atomic writes.

It also caused us to emit multiple sends with EOT set, which apparently
confuses the register allocator into not using g112-g127 for all but
the first one.  This caused EU validation failures in OglGSCloth
shaders in shader-db.  (The actual application was fine, but shader-db
thinks there are no outputs because it doesn't understand transform
feedback.)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-27 16:36:51 -07:00
Kenneth Graunke
a36a73a7b8 glsl: Ignore ir_texture in lower_const_arrays_to_uniforms.
The only part of an ir_texture which can be an array is the
offsets array in textureGatherOffsets() calls.  We don't want
to lower those, because they're required to remain constants.

Fixes textureGatherOffsets with Gallium drivers such as llvmpipe,
which commit ef78df8d3b regressed.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-27 16:36:30 -07:00
Samuel Pitoiset
7b9b096775 gm107/ir: add missing setcond flags for LOP variants
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-28 00:30:01 +02:00
Samuel Pitoiset
83a4f28dc2 gm107/ir: make use of LOP32I for all immediates
LOP only allows to emit 19-bits immediates.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-28 00:29:53 +02:00
Dave Airlie
c7cc264ca9 virgl: reduce some limits for now
These need to be passed from the host in caps structure if they
are larger, this fixes a bunch of tests on Intel hw, that I'd
put the limits too high for.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-28 06:49:26 +10:00
Julien Isorce
6e4cf937f8 st/omx: count number of slices
Used by nouveau driver.
Similar patch was done for st/va:
851e7e12aa

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-27 17:52:15 +01:00
Julien Isorce
e10f1fcebe st/omx: add support for nouveau / interlaced
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-27 17:52:15 +01:00
Julien Isorce
23b7a83cc1 st/omx: retrieve preferred interlaced and buffer_formats
Interlaced can be true for nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-27 17:52:15 +01:00
Marek Olšák
f6ff483646 radeonsi: use optimal WD settings for primitive restart on Polaris
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-27 13:54:39 +02:00
Gurkirpal Singh
46dba701d8 st/va: Check NULL pointer
Call to handle_table_get in vlVaDestroySurfaces can
return NULL on failure.

CID: 1243522

Signed-off-by: Gurkirpal Singh <gurkirpal204@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
2016-06-27 08:09:08 +01:00
Eric Anholt
d20b89e928 nir: Fix copy_prop_src when src is an indirect access on a reg.
The intent was to continue down the indirect chain, not to call ourselves
with unchanged input arguments.  Found by code inspection, and comparison
to copy_prop_alu_src().

We haven't hit this because callers of NIR's copy prop are doing so in
SSA, before indirect variable dereferences have been lowered to registers.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-26 15:38:09 -07:00
Samuel Pitoiset
c7fa3c92f8 gm107/ir: make use of MOV32I for all immediates
MOV only allows to emit 19-bits immediates. This is similar to the
previous fix I did for IMUL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-27 00:28:02 +02:00
Jordan Justen
367cf3a2e3 i965: Use miptree to decide format on multi-plane images for gen < 7
This wasn't handled correctly for multi-plane images on gen < 7 in
727a9b2493.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96674
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-26 10:49:34 -07:00
Ilia Mirkin
1f5f64b91f nvc0: update "derived" state function names
derived_1/2/etc aren't too informative. Instead name them based on the
state they're derived from.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-26 12:04:55 -04:00
Ilia Mirkin
89a7496b9d nvc0: provide support for unscaled poly offset units
On at least Kepler hardware, the units differ based on RT format. Emit a
properly scaled value for Z16 depth buffers vs other formats, to help
out st/nine.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-26 12:04:55 -04:00
Samuel Pitoiset
b84c97587b gm107/ir: make use of IMUL32I for all immediates
IMUL only allows to emit 19-bits immediates. This is similar to
d30768025a which fixed the same thing
for the GK110 emitter.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-26 17:33:06 +02:00
Marek Olšák
d93bacc1fa radeonsi: make si_is_format_supported static
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
3eacbc52d5 radeonsi: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
7db10093d3 gallium/radeon: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
1c5a10497a gallium/radeon/winsyses: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
d5383a7d31 gallium/radeon: use r600_resource_reference
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Jason Ekstrand
81978c6feb nir: Add a NIR_VALIDATE environment variable
It defaults to true so default behavior doesn't change but it allows you to
do NIR_VALIDATE=false if you don't want validation.  Disabling validation
can substantially speed up shader compiles so you frequently want to turn
it off if compiler invariants aren't in question.
Reviewed-by: Matt Turner <mattst88@gmail.com>

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-25 07:34:20 -04:00
Axel Davy
b76fa56739 st/nine: Use offset_units_unscaled
offset_units_unscaled enables proper support
for depth bias for gallium nine. Use it
if available.

Solves issues with some games using depth bias.
For example:
https://github.com/iXit/Mesa-3D/issues/220

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-06-25 10:16:15 +02:00
Axel Davy
f6704f2a4d r600g: Implement POLYGON_OFFSET_UNITS_UNSCALED
Empirical tests show that the polygon offset
behaviour is entirely determined by the content of
the PA_SU_POLY_OFFSET states, and not by the depth buffer
format bound.

PA_SU_POLY_OFFSET seems to directly set the parameters of
the polygon offset formula, and setting 0 for
PA_SU_POLY_OFFSET_DB_FMT_CNTL (ie setting the unorm depth
bias behaviour with a scale of 2^0 = 1.0f) gives the unscaled
behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
be7957b156 radeonsi: Implement POLYGON_OFFSET_UNITS_UNSCALED
Empirical tests show that the polygon offset
behaviour is entirely determined by the content of
the PA_SU_POLY_OFFSET states, and not by the depth buffer
format bound.

PA_SU_POLY_OFFSET seems to directly set the parameters of
the polygon offset formula, and setting 0 for
PA_SU_POLY_OFFSET_DB_FMT_CNTL (ie setting the unorm depth
bias behaviour with a scale of 2^0 = 1.0f) gives the unscaled
behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
c2b7b48a54 radeon: Remove useless pa_su_poly_offset_db_fmt_cntl
pa_su_poly_offset_db_fmt_cntl usages were removed in
previous patches.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
fe2ec50d75 r600g: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to poly offset states for evergreen
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with the other poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.

v2: Increase the num_dw field for the poly offset atom

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
400e8d8c40 r600g: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to poly offset states for r600
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with the other poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.

v2: Increase the num_dw field for the poly offset atom

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
ff5abe9d90 radeonsi: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to poly offset states
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with rasterizer poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Axel Davy
59a692916c gallium: Add a cap for offset_units_unscaled
D3D9 has a different behaviour for depth bias.

For OGL/D3D1X, the depth bias unit is the
minimal resolvable value for the depth buffer,
which depends on the format (and has different
behaviour for float depth buffers).

For D3D9, the depth bias unit is 1.0f.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-25 10:16:15 +02:00
Jordan Justen
727a9b2493 i965: Skip update_texture_surface when the plane doesn't exist
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96607
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-24 18:13:18 -07:00
Kenneth Graunke
c4a6b0d2d2 i965: Validate a few SEND-from-GRF requirements.
We recently had a mistake where we emitted SEND instructions with EOT
set, but from g107 rather than g112-g127.  Adding validation code should
prevent these sorts of problems from slipping back in.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:55 -07:00
Kenneth Graunke
192813e50e i965: Delete send-from-GRF only opcodes from implied_mrf_writes().
These only exist post-Sandybridge, and always use send-from-GRF.
So inst->base_mrf will be -1, and we will have already returned 0.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:55 -07:00
Kenneth Graunke
255cff76d9 i965: Drop unnecessary inst->base_mrf = -1 assignments.
These are now unnecessary, as base_mrf is -1 by default.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:55 -07:00
Kenneth Graunke
3e04e3758e i965: Set fs_inst::base_mrf = -1 by default.
On MRF platforms, we need to set base_mrf to the first MRF value we'd
like to use for the message.  On send-from-GRF platforms, we set it to
-1 to indicate that the operation doesn't use MRFs.

As MRF platforms are becoming increasingly a thing of the past, we've
forgotten to bother with this.  It makes more sense to set it to -1 by
default, so we don't have to think about it for new code.

I searched the code for every instance of 'mlen =' in brw_fs*cpp, and
it appears that all MRF-based messages correctly program a base_mrf.

Forgetting to set base_mrf = -1 can confuse the register allocator,
causing it to think we have a large fake-MRF region.  This ends up
moving the send-with-EOT registers earlier, sometimes even out of
the g112-g127 range, which is illegal.  For example, this fixes
illegal sends in Piglit's arb_gpu_shader_fp64-layout-std430-fp64-shader,
which had SSBO messages with mlen > 0 but base_mrf == 0.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:55 -07:00
Kenneth Graunke
3e258f7e31 i965: Drop unused return value from intel_finalize_mipmap_tree().
The old return type of GLuint was wonky - it should have been bool.
But nothing actually uses the return value anyway, so we can just drop
that and make it a void function.

In theory, it might make sense to ask whether the texture validated
successfully, but just checking intel_obj->mt != NULL works for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:44 -07:00
Kenneth Graunke
8ee23d6866 i965: Move contents of brw_tex.c into intel_tex_validate.c.
brw_tex.c is a tiny file containing a single function.  It's closely
tied to the validation logic in intel_tex_validate.c, so it makes sense
to put both in the same file.

While we're at it, update the function to our modern style.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-24 15:03:44 -07:00
Marek Olšák
28d0d0c5b4 radeonsi: fix fractional odd tessellation spacing for Polaris
ported from Vulkan (and no source explains why this is needed)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 17:36:43 +02:00
Marek Olšák
0d638f4b3d radeonsi: set some VGT context registers on SI-CI
the kernel sets them, but other UMDs can change them

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Marek Olšák
8f3ef4e8b8 radeonsi: optimize rendering to linear color buffers
loosely ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Marek Olšák
e4b22c9fa1 radeonsi: set almost optimal settings in SC_MODE_CNTL_1
ported from Vulkan

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Marek Olšák
603c073ec2 gallium/radeon: let drivers specify SC_MODE_CNTL_1 fields
radeonsi will set more fields

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Marek Olšák
ae0d2d15cc gallium/radeon: disable complicated point clipping against user clip planes
Nothing in the GL spec says that we should expand points to triangles.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Marek Olšák
1e8adb0ee4 radeonsi: fix a compute shader hang with big threadgroups on SI & CI
ported from Vulkan

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 16:24:53 +02:00
Ilia Mirkin
b433cb51e5 nvc0: when mapping directly, provide accurate xfer info + start
We were ignoring the incoming box parameters, and were providing totally
bogus stride/layer stride, and other bits, for when a non-full-surface
map was requested.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-24 09:53:13 -04:00
Ilia Mirkin
3f0fa3b32d st/mesa: don't assume that the whole surface gets mapped
Under some circumstances, the driver may choose to return a temporary
surface instead of a pointer to the original. Make sure to pass the
actual view volume to be mapped to the transfer function rather than
adjusting the map pointer after-the-fact.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 09:53:13 -04:00
Nicolai Hähnle
0da890e62c radeonsi: drop the DRAW_PREAMBLE packet on Polaris
It will be removed from the firmware for the Polaris.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 13:28:46 +02:00
Nicolai Hähnle
2aa0485902 radeonsi: use DRAW_(INDEX_)INDIRECT_MULTI on Polaris
The non-MULTI variants will be removed in Polaris firmware.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 13:28:32 +02:00
Francesco Ansanelli
82ab3f27ff st/mesa: handle negative _ColorDrawBufferIndexes values correctly
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:41:22 +02:00
Nicolai Hähnle
bc4b7ebbfd winsys/radeon: add guard pages when R600_DEBUG=check_vm is enabled
This should help flush out GPU VM faults.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle
49c0b4a0db winsys/amdgpu: add guard pages when R600_DEBUG=check_vm is enabled
This should help flush out GPU VM faults.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle
dbac88a839 radeonsi: report a failure to parse dmesg instead of asserting
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle
d46a9db840 radeon: check VM faults from DMA flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle
80dd7870fe radeonsi: move gfx fence wait out of si_check_vm_faults
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:03 +02:00
Nicolai Hähnle
ad8438403b radeonsi: extract IB and bo list saving into separate functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:02 +02:00
Nicolai Hähnle
b3de274b05 st/mesa: fix readpixels regression with MESA_pack_invert
Fixes an error introduced in commit 3948cd3797.

Reported-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:36:02 +02:00
Marek Olšák
05e741c6d6 radeonsi: set LLVM denormal flags
- make sure FP32 denormals will stay disabled in LLVM in the future
  (the current default is disabled)
- tell LLVM that FP64 denormals are enabled

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-24 12:31:03 +02:00
Marek Olšák
0e1fefa722 radeonsi: emit 1/sqrt for RSQ
We don't need the clamped version and we don't have to use any intrinsic.

Stats on Tonga:

15382 shaders in 9128 tests
Totals:
SGPRS: 1230560 -> 1230560 (0.00 %)
VGPRS: 469577 -> 462504 (-1.51 %)
Code Size: 22089908 -> 21730052 (-1.63 %) bytes
LDS: 598 -> 598 (0.00 %) blocks
Scratch: 283648 -> 281600 (-0.72 %) bytes per wave
Max Waves: 125664 -> 126969 (1.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 547280 -> 547280 (0.00 %)
VGPRS: 269132 -> 262059 (-2.63 %)
Code Size: 15709604 -> 15349748 (-2.29 %) bytes
LDS: 198 -> 198 (0.00 %) blocks
Scratch: 74752 -> 72704 (-2.74 %) bytes per wave
Max Waves: 47840 -> 49145 (2.73 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-24 12:31:03 +02:00
Jan Vesely
54c4d525da r600g: Enable FMA on chips that support it
v2: Merge with PIPE_SHADER_CAP_DOUBLES
    Add CHIP_HEMLOCK

v3: only set the instruction on EG and CM

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-06-24 12:30:59 +02:00
Marek Olšák
cbb5adb908 gallium/u_queue: allow the execute function to differ per job
so that independent types of jobs can use the same queue.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
4a06786efd gallium/u_queue: reduce the number of mutexes by 2
by converting semaphores to condvars and using the main mutex

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
2fba0aaa70 gallium/u_queue: add an option to name threads
for debugging

v2: correct the snprintf use

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
404d0d50d8 gallium/u_queue: add an option to have multiple worker threads
independent jobs don't have to be stuck on only one thread

v2: use CALLOC & FREE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
4358f6dd13 gallium/u_queue: rewrite util_queue_fence to allow multiple waiters
Checking "signalled" is first done without a mutex, then with a mutex.
Also, checking without waiting doesn't lock the mutex. This is racy, but
should be safe.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
d8367e91f2 gallium/u_queue: use a ring instead of a stack
and allow specifying its size in util_queue_init.

v2: use CALLOC & FREE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Jordan Justen
c36a363a2d i965: Preserve the internal format of the dri image
Since the OpenGLES API is strict about the internal format matching
the for many operations, we need to preserve it.

See _mesa_es3_error_check_format_and_type in
src/mesa/main/glformats.c.

Fixes ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96351
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-23 20:44:00 -07:00
Chad Versace
a0f3c3c9d4 anv: Add anv_render_pass_attachment::store_op
Will be needed for resolving auxiliary surfaces.

I didn't add anv_render_pass_attachment::stencil_store_op, as the driver
would likely never use it, as stencil surfaces never have auxiliary
surfaces.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-23 16:10:25 -07:00
Gurkirpal Singh
15d3777b74 gbm: Fix comments
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-23 13:55:03 -07:00
Eric Engestrom
b293e8b470 gbm: doc fixes
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-23 13:55:03 -07:00
Giuseppe Bilotta
60a27ad122 Remove wrongly repeated words in comments
Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by

v2:
    * proper commit message and non-joke title;
    * replace two 'as is' followed by 'is' to 'as-is'.
v3:
    * 'a integer' => 'an integer' and similar (originally spotted by
      Jason Ekstrand, I fixed a few other similar ones while at it)

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-23 13:55:03 -07:00
Brian Paul
5d07998317 svga: update some comments in svga_buffer_handle()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 13:02:28 -06:00
Brian Paul
fe76212873 svga: add a const qualifier in svga_buffer_upload_piecewise()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 13:02:28 -06:00
Brian Paul
e82fa96d19 svga: minor code refactor for svga_buffer_upload_command()
Put the HBS code into a separate function.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 13:02:28 -06:00
Brian Paul
db721da5a3 svga: minor code simplification in svga_context_finish()
Signed-off-by: Brian Paul <brianp@vmware.com>

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 13:02:28 -06:00
Kenneth Graunke
b0629e6894 i965: Implement rasterizer discard via SOL unless required for queries.
We currently use CL_INVOCATION_COUNT for the GL_PRIMITIVES_GENERATED
query, which involves passing all primitives to the clipper.  When
rasterizer discard is enabled, we program the clipper in REJECT_ALL
mode, rather than using the SOL stage's "Rendering Disable" feature.

See commit f09b91f782 for an explanation
of why we implement GL_PRIMITIVES_GENERATED this way.

Apparently the SOL stage's "Rendering Disable" feature is a lot faster
than having the clipper reject all primitives.  It's safe to use when
no GL_PRIMITIVES_GENERATED query is active, as we don't care about
CL_INVOCATION_COUNT incrementing.

This patch makes us use SO_RENDERING_DISABLE when no query is active,
but continues falling back to the clipper in REJECT_ALL mode when the
queries are enabled.  It brings back the perf_debug for the clipper
case (which I removed in commit 1f9445ff57, thinking it wasn't useful).

Improves performance in Gl32GSCloth by 84.8303% +/- 2.07132% (n = 10)
on my Broadwell GT2 laptop.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
4db98f8beb i965: Combine 3DSTATE_STREAMOUT emitters and genX_sol_state atoms.
They're basically the same.  Let's avoid the code duplication.

v2: Fix SO_BUFFER_ENABLE stuff to only happen on Gen < 8 (caught
    by Jason Ekstrand).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
fb857b5eea glsl: Don't constant propagate arrays.
Constant propagation on arrays doesn't make a lot of sense.  If the
array is only accessed with constant indexes, then opt_array_splitting
would split it up.  Otherwise, we have variable indexing.  If there's
multiple accesses, then constant propagation would end up replicating
the data.

The lower_const_arrays_to_uniforms pass creates uniforms for each
ir_constant with array type that it encounters.  This means that it
creates redundant uniforms for each copy of the constant, which means
uploading too much data.  It can even mean exceeding the maximum number
of uniform components, causing link failures.

We could try and teach the pass to de-duplicate the data by hashing
constants, but it makes more sense to avoid duplicating it in the first
place.  We should promote constant arrays to uniforms, then propagate
the uniform access.

Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum
number of uniform components by a huge margin and failed to link.

On Broadwell:

total instructions in shared programs: 9067702 -> 9068202 (0.01%)
instructions in affected programs: 10335 -> 10835 (4.84%)
helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
HURT: 20 (Natural Selection 2)

loops in affected programs: 4 -> 0

The hurt programs appear to no longer have a constarray uniform, as
all constants were successfully propagated.  Apparently before this
patch, we successfully unrolled a loop containing array access, but
only after promoting constant arrays to uniforms.  With this patch,
we unroll it first, so all array access is direct, and the array
is split up, and individual constants are propagated.  This seems
better.

Cc: mesa-stable@lists.freedesktop.org
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
ef78df8d3b glsl: Make lower_const_arrays_to_uniforms work directly on constants.
There's really no point in looking at ir_dereference_array of a
constant.  It also misses cases like:

  (assign () (var_ref tmp) (constant (array ...) ...))

No changes in shader-db, but keeps it working after the next commit.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
f7741c5211 i965: Copy propagate before doing variable index lowering.
The scalar backend currently doesn't support variable indexing on
temporary arrays, but it does support it on uniform arrays, and
some stages support it for input arrays.  Make sure these are
propagated through before exploding indirects into piles of
if-ladders unnecessarily.

On Broadwell, no instruction count change in shader-db.

total cycles in shared programs: 80675652 -> 80674928 (-0.00%)
cycles in affected programs: 649972 -> 649248 (-0.11%)
helped: 386
HURT: 165

This will help avoid code quality regressions in a future commit.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
586f4a42e7 glsl: Propagate invariant/precise after lowering const arrays.
The new uniform may need precise as well.

Fixes copy propagation of constant array uniforms in Tomb Raider shaders.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
c264fdbc07 glsl: Split arrays even in the presence of whole-array copies.
Previously, we failed to split constant arrays.  Code such as

   int[2] numbers = int[](1, 2);

would generates a whole-array assignment:

  (assign () (var_ref numbers)
             (constant (array int 4) (constant int 1) (constant int 2)))

opt_array_splitting generally tried to visit ir_dereference_array nodes,
and avoid recursing into the inner ir_dereference_variable.  So if it
ever saw a ir_dereference_variable, it assumed this was a whole-array
read and bailed.  However, in the above case, there's no array deref,
and we can totally handle it - we just have to "unroll" the assignment,
creating assignments for each element.

This was mitigated by the fact that we constant propagate whole arrays,
so a dereference of a single component would usually get the desired
single value anyway.  However, I plan to stop doing that shortly;
early experiments with disabling constant propagation of arrays
revealed this shortcoming.

This patch causes some arrays in Gl32GSCloth's geometry shaders to be
split, which allows other optimizations to eliminate unused GS inputs.
The VS then doesn't have to write them, which eliminates the entire VS
(5 -> 2 instructions).  It still renders correctly.

No other change in shader-db.

v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Kenneth Graunke
acf5444044 glsl: Make constant propagation's folder not propagate into an LHS.
opt_constant_propagation.cpp contains constant folding code which can
actually do constant propagation in some cases.  It was happily
propagating constants into the left-hand-side of assignments.

For example,

   (assign () (var_ref temp) (constant ...))

would brilliantly be turned into:

   (assign () (constant ...) (constant ....))

This is a bigger hammer than necessary - it prevents propagation
into the left-hand-side altogether.  We could certainly do better
someday.  Notably, the constant propagation pass itself already
takes this approach - it's just the constant propagation pass's
built-in constant folding code (which actually propagates, too)
that was broken.

No change in shader-db, but prevents regressions after future commits.
It seems plausible that this could be hit today, but I haven't seen it
happen.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-23 11:58:50 -07:00
Topi Pohjolainen
3487d2e7bf i965/blorp: Disable vertex element swizzling
Without vertex elements originating directly from vertex fetcher
are not passed to wm-state correctly.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-23 21:39:09 +03:00
Topi Pohjolainen
12783aac50 i965/blorp: Let program data tell if push constants are needed
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-23 21:39:09 +03:00
Topi Pohjolainen
874f2e9523 i965/blorp: Use prog data counters to guide wm/ps setup
just as core upload logic does.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-23 21:39:09 +03:00
Topi Pohjolainen
f5e8575ab4 i965/blorp: Use prog data counters to guide sf/sbe setup
just as core upload logic does.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-23 21:39:09 +03:00
Ardinartsev Nikita
01c89ccc5d i965: Avoid division by zero.
Fixes regression introduced by af5ca43f26

Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
2016-06-23 10:08:58 -07:00
Tim Rowley
a16d274032 swr: [rasterizer core] fix dependency bug
Never be dependent on "draw 0", instead have a bool that makes the draw
dependent on the previous draw or not dependent at all.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:51:11 -05:00
Tim Rowley
73a9154bde swr: [rasterizer core] use wrap-around safe compares for dependency checking
Move drawIDs from 64-bit to 32-bit to increase perf.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:51:06 -05:00
Tim Rowley
dd189536dc swr: [rasterizer jitter] add support for component packing for 'odd' formats
Add early-out if no components are enabled. Add asserts.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:51:00 -05:00
Tim Rowley
35935ca4f2 swr: [rasterizer core] track whether GS outputs viewport array index
So we can skip the index gather in PA.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:55 -05:00
Tim Rowley
2d80295a6e swr: [rasterizer core] GS viewport array index attribute
Only adds the attribute mapping to the jitter; no implementation yet.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:47 -05:00
Tim Rowley
c7cd33b605 swr: [rasterizer core] conservative rasterization frontend support
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:41 -05:00
Tim Rowley
c867c22d85 swr: [rasterizer core] stop single threaded crash exit crash
Function static destructors were getting called by exit
handlers before context teardown.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:36 -05:00
Tim Rowley
0f025eb478 swr: [rasterizer jitter] small fetch jit cleanup
Handle SGV stores separate from the stream fetch code.

Because of this change, there is a potential to jit an extra unused store.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:30 -05:00
Tim Rowley
eca877f27b swr: [rasterizer core] remove old comment
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:25 -05:00
Tim Rowley
d3d97f8395 swr: [rasterizer jitter] cleanup supporting different llvm versions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:19 -05:00
Tim Rowley
42215e6116 swr: [rasterizer jitter] unitialized component fix in fetch jit
Was trying to store an extra uninitialized component.
Only affects component packing, which isn't enabled (yet).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:12 -05:00
Tim Rowley
b6d2c96851 swr: [rasterizer] add support for building avx512 version
Currently, most code paths between AVX2 and AVX512 are identical
(see changes to knobs.h).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:50:05 -05:00
Tim Rowley
695af2a7e2 swr: [rasterizer common] fix include for Intel compiler
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:49:59 -05:00
Tim Rowley
95f21a9766 swr: [rasterizer common] workaround clang for windows __cpuid() bug
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 10:49:46 -05:00
Tim Rowley
9ca741c645 swr: push/pop DEBUG macro around llvm includes
llvm redefines DEBUG; adding push/pop prevents a undefined reference
to debug_refcnt_state in llvm-3.7+.

v2: add undef DEBUG

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-23 09:58:08 -05:00
Jose Fonseca
805dbdf06d include: Require MSVC 2013 Update 4.
Earlier MSVC 2013 releases have troubles compiling some of our C99 code,
so make sure we have Update 4 to avoid confusion.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-23 15:07:19 +01:00
Brian Paul
4f5d513755 svga: rename svga_surface_copy() to svga_resource_copy_region()
To be consistent with the pipe_context function name.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 07:31:20 -06:00
Brian Paul
743ff588f2 svga: don't copy blit_info into local var
There's no reason for doing so.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 07:31:20 -06:00
Brian Paul
e0dc3c5f19 gallium/util: fix some 4-space indentation in blitter code
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 07:31:20 -06:00
Charmaine Lee
2aa9ff0cda svga: fix texture array update regression
With commit fb9fe35, we start using transfer_inline_write
for memcpy TexSubImage path, but that triggers a regression with
texture array in the svga driver.

With this patch, the direct map code will update the texture array
correctly.

Fixes VMware bug 1679293.

Tested with MTT piglit, glretrace, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-23 07:31:20 -06:00
Charmaine Lee
d4a77254cb svga: fix index/vertex buffer surface reference at draw
Currently with the SetVertexBuffers optimization, we avoid emitting
redundant DXSetVertexBuffers commands. However, these buffers surfaces
will still need to be referenced, otherwise, in the case of linux,
the subsequent surface discard map will map to the existing mob instead
of a new one, causing rendering artifacts.

With this patch, we'll call resource_rebind() to reference the resources
even if we are avoiding the actual set command. This fixes the
rendering artifacts in the window title area running with unity in
Ubuntu 14.04

Tested with piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-06-23 07:31:20 -06:00
Charmaine Lee
2b81e31d44 svga: fix vertex buffer references in the hw state
This patch fixes three issues with vertex buffer references:
(1) Instead of copy the vertex buffer resource handles to the hw state
    in the context structure, use pipe_resource_reference to properly
    reference the vertex buffer resources in the context.

(2) Make sure to unbind those unused vertex buffer resources.

(3) Force to rebind the vertex buffer resources at the first draw of each
    command buffer to make sure the vertex buffer resources are paged in.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-23 07:31:20 -06:00
Charmaine Lee
a1d74f5528 svga: fix index buffer reference in the hw state
Instead of copy the index buffer resource handle to the hw state in
the context structure, use pipe_resource_reference to properly reference
the index buffer resource in the context.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-23 07:31:19 -06:00
Timothy Arceri
ab99196b6b glsl/mesa: stop duplicating geom and tcs layout values
We already store these in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.

This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-23 11:01:46 +10:00
Timothy Arceri
24b3be0938 glsl/mesa: stop duplicating tes layout values
We already store this in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.

This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.

V2: remove unnecessary NULL check

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
2016-06-23 11:01:36 +10:00
Edward O'Callaghan
f3ae370a36 .mailmap: Fixup my email address
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-06-23 00:00:46 +02:00
Christian Gmeiner
22304554a2 st/mesa: expose EXT_vertex_array_bgra when supported by backend
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-22 12:46:08 -07:00
Jason Ekstrand
c2f2c8e407 anv: Use different BOs for different scratch sizes and stages
This solves a race condition where we can end up having different stages
stomp on each other because they're all trying to scratch in the same BO
but they have different views of its layout.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:39:45 -07:00
Jason Ekstrand
45c0f60999 genxml: Make ScratchSpaceBasePointer an address instead of an offset
While we're here, we also fixup MEDIA_VFE_STATE and rename the field in
3DSTATE_VS on gen6-7.5 to be consistent with the others.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:39:42 -07:00
Jason Ekstrand
966bed17c1 anv: Add an allocator for scratch buffers
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:39:20 -07:00
Jason Ekstrand
89ded099f8 genxml: Put append counter fields before MCS in RENDER_SURFACE_STATE on gen7
The pack header generation scripts can't handle the case where you have
two addresses in the same dword; they just take whatever is the last one.
This meant that the MCS address wasn't properly getting handled.  Since we
don't care about append counters, we can just re-arrange the XML for now.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
d82322eb18 anv,isl: Lower storage image formats in anv
ISL was being a bit too clever for its own good and lowering the format for
us.  This is all well and good *if* we always want to lower it.  However,
the GL driver selectively lowers the format depending on whether the
surface is write-only or not.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
97f12773b8 isl/state: Allow for full 31-bit buffer texture sizes
Ivy Bridge and above can handle up to 2^31 elements for RAW buffer
surfaces.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
bb64e666ba isl/state: Don't use designated initializers for buffer surface state
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
4061fde66e isl/state: Add assertions for buffer surface restrictions
Acked-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
ce24097abe isl/state: Don't set SurfacePitch for gen9 1-D textures
This field is ignored by the hardware in this case and, on very large 1-D
textures, it can end up being larger than the maximum allowed value.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
f47e23a8b6 isl/state: Use TILEWALK_XMAJOR for linear surfaces on gen7
This matches better what happens on gen8 where the "Tiled Surface" and
"Tile Walke" bits are combined into a single two-bit value.  This is also
more consistent with what the GL driver does.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
96706bad5f isl/state: Emit no-op mip tail setup on SKL
This hasn't ever been a problem in the past but it is recommended by the
hardware docs.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
14d7c16e50 isl/state: Only set cube face enables if usage includes CUBE_BIT
It seems safe to set it all the time, but this reduces the diff between
the way i965 does it and what ISL does.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
5d24e9cfa1 isl/state: Use the layout for computing qpitch rather than dimensions
For depth/stencil 1-D textures on SKL, we want them layed out in the old
format that has been used since gen4.  In order for the surface state
fill-out code to handle, this it needs to distinguish based on layout
rather than just dimensionality.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
6a43204afa isl/state: Set the IntegerSurfaceFormat bit on Haswell
This fixes 688 Vulkan CTS tests on Haswell.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
324103da75 isl/format: Mark R9G9B9E5 as containing 9-bit unsigned float channels
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
215282c9f4 isl/state: Don't set RenderTargetViewExtent for texture surfaces
The docs specify that this only matters for render targets and surfaces
used with typed dataport messages.  On some platforms (gen4-6) the Depth
field has more bits than RenderTargetViewExtent so we can have textures
with more levels than we can render to.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
bb326f7b01 isl/state: Set SurfaceArray based on the surface dimension
According to the PRM, you can't set SurfaceArray for 3D or buffer textures.
There doesn't seem to be a good reason not to set it when we can.  On the
other hand, if we don't set it we can end up getting strange results for
1-layer array textures such as textureSize() returning the wrong results.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
d050ffbce9 isl/state: Don't force-disable L2 bypass for everything
We already set the bit in the few cases where it's required by the docs so
there's no need to set it all the time.  This has no noticable perf impact
for Dota 2 on Vulkan with the time demo I have.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
87f0ffa646 isl/state: Refactor the setup of clear colors
This commit switches clear colors to use #if's instead of a C if.  This
lets us properly handle SNB where the clear color field doesn't exist.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
62a5e6e031 isl/state: Refactor the per-gen isl_to_gen_h/valign tables
This moves the #if's around so that halign and valign have different sets
of #if conditions.  This also prepares us for SNB because isl_to_gen_halign
is not defined at all on gen6.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
b1b0d6fb54 isl/state: Return an extent3d from the halign/valign helper
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
a60ae9e10a isl/state: Put pitch calculations together
This is purely cosmetic, but it makes things look a bit more readable.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
70c8afc0c8 isl/state: Put all dimension setup together and towards the top
This is purely cosmetic, but it makes things look a bit more readable.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
e66e70ef47 isl/state: Put surface format setup at the top
This is purely cosmetic, but it makes things look a bit more readable.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
39baea551f isl/state: Remove some unused fields
They're already zero-initialized and we have no plans of doing anything
more interesting with them.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
caf2af4181 isl/state: Don't use designated initializers for the surface state
While designated initializers are nice, they also force us to put some
things in the initializer and some things later.  Surface state setup is
complicated enough that this really hurts readability in the long run.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
de1d194856 genxml/gen8,9: Prefix the multisample format enum with MSFMT
This is what gen7 does and it's nice to have a prefix

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
320de71858 i965/blorp: Only set src_z for gen8+ 3D textures
Otherwise, we end up with a bogus value in the third component.  On gen6-7
where we always use 2D textures, this can cause problems if the
SurfaceArray bit is set in the SURFACE_STATE.

Acked-by: Chad Versace <chad.versace@intel.com>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
664dc89a1b i965/gen7,8: Set SURFACE_IS_ARRAY for all non-3D texture types
There's no real reason why we shouldn't set this bit.  It does affect how
the sampler operates a bit but since you can have a 2D non-array view of a
2D_ARRAY texture that distinction is very weak.  Also, this is what ISL
will do and we would like this change to be isolated from using ISL.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
2a1cc94d27 i965/gen4: Subtract 1 from buffer sizes
The PRM states that the values put in Width, Height, and Depth should be
various bits from the value size - 1.  We seem to have done this wrong
more-or-less from the start.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
e8580b8f98 i965: Remove fake W-tiled render target support
This hasn't been used since 1cfb4bc890 where we deleted the meta stencil
blit path.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
0195299c86 i965/fs: Use a default Y coordinate of 0 for TXF on gen9+
Previously, we were incrementing length but not actually putting anything
in the Y coordinate.  This meant that 1-D TXF operations had a garbage
array index.  If the surface is emitted as 1-D non-array, the coordinate
gets discarded and it works fine.  If it happens to be bound as an array
surface, it may count as an out-of-bounds array access and you get zero.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
1436238b75 i965/gen8: Use the qpitch from the aux_mt for AUX_QPITCH
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
620f81d2ed i965/blorp/gen8: Use the correct max level and layer in emit_surface_states
We were adding in the base which is wrong because the values given in the
miptree are relative to zero and not the base layer/level.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
6ba88bce64 i965: Drop the maximum 3D texture size to 512 on Sandy Bridge
The RenderTargetViewExtent field of RENDER_SURFACE_STATE is supposed to be
set to the depth of a 3-D texture when rendering.  Unfortunatley, that
field is only 9 bits on Sandy Bridge and prior so we can't actually bind
a 3-D texturing for rendering if it has depth > 512.  On Ivy Bridge, this
field was bumpped to 11 bits so we can go all the way up to 2048.  On Iron
Lake and prior, we don't support layered rendering and we use OffsetX/Y
hacks to render to particular layers so 2048 is ok there too.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
0f9cd74aab i965/gen4-6: Handle gl_texture_object::BaseLevel and MinLayer correctly
This is basically a direct translation of what we do for gen7.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83036
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Jason Ekstrand
ee39d3ba91 i965/gen4: Pull texture formats from the texture object not the miptree
This makes texture views sort-of work.  It doesn't add full texture view
support for gen4-5 but it is enough to fix the GL_ARB_copy_image formats
piglit test on Iron Lake.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83036
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-22 12:26:43 -07:00
Kenneth Graunke
77d6add00d i965: Fix point size with tessellation/geometry shaders in GLES.
Our previous code worked for desktop GL, and ES without geometry or
tessellation shaders.  But those features require fancier point size
handling.  Fortunately, we can use one rule for all APIs.

Fixes a number of dEQP tests with EXT_tessellation_shader enabled:
dEQP-GLES31.functional.tessellation_geometry_interaction.point_size.*

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-22 12:22:50 -07:00
Marek Olšák
5d85a21fee .mailmap: fix my main address 2016-06-22 14:45:52 +02:00
Timothy Arceri
356ea9a8da i965: move vs outputs written into a helper
We will reuse this for fs key generation for the on disk shader
cache.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-22 20:59:26 +10:00
Nicolai Hähnle
3948cd3797 st/mesa: use a single memcpy in st_ReadPixels when possible
This avoids costly address recomputations, function overhead, and may trigger
large copy optimizations.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-22 11:44:03 +02:00
Ilia Mirkin
36ed1b695e glsl: only match gl_FragData and not gl_SecondaryFragDataEXT
There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-21 21:58:34 -04:00
Ilia Mirkin
1f4bca798d nv50,nvc0: fix start_instance in manual push path
The start instance is applied as an offset into the buffer directly,
ignoring the divisor, not as an instance id offset that respects the
divisor.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-21 21:50:16 -04:00
Ilia Mirkin
5b0d64886d translate: fix start_instance parameter in sse version
The generic version gets this right already, but this was using an
incorrect formula in SSE.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-21 21:50:16 -04:00
Jason Ekstrand
35b53c8d47 anv/cmd: Dirty descriptor sets when a new pipeline is bound
Ever since c2581a9375, the binding table layout has depended on the
pipeline.  This means that whenever we change pipelines we also need to
re-emit binding tables for the new layout.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-21 16:45:25 -07:00
Jason Ekstrand
2bfe0c3374 anv/cmd: Move emit_descriptor_pointers to genX_cmd_buffer.c
It's tiny and fully generic so there's really no reason for it to be in a
gen7-specific file.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-21 16:45:25 -07:00
Jason Ekstrand
9df4d6bb36 anv/cmd: Move flush_descriptor_sets to anv_cmd_buffer.c
There's no good reason for recompiling it

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-21 16:45:25 -07:00
Jason Ekstrand
295e03c980 spirv: Use the system value version of gl_FrontFace
SPIR-V treats it as an input but NIR wants the system value.  This
shouldn't have been too much of a surprise given that we have to do the
same conversion in the GLSL IR to NIR pass.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-21 16:45:25 -07:00
Kenneth Graunke
40013c5033 i965: Reorganize prog_data->total_scratch code a bit.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-21 10:24:45 -07:00
Marek Olšák
b16d21270f radeonsi: add a debug flag for unsafe math LLVM optimizations
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-21 13:52:05 +02:00
Marek Olšák
70a25478fe radeonsi: use u_blitter for mipmap generation
This reduces time spend in glGenerateMipmap by a half.

v2: don't decompress the levels to be overwritten

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-21 13:52:05 +02:00
Marek Olšák
5fed1122e8 gallium/u_blitter: implement mipmap generation
for pipe_context::generate_mipmap

first move some of the blit code from util_blitter_blit_generic
to a separate function, then use it from util_blitter_generate_mipmap

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-21 13:52:05 +02:00
Nicolai Hähnle
3735a925ef st/mesa: cache staging texture for glReadPixels
v2: add ST_DEBUG flag for disabling (suggested by Ilia)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-06-21 11:02:41 +02:00
Nicolai Hähnle
a571859fc4 st/mesa: invalidate readpixels cache
Whenever a draw happens or some other function call might change the result
of future glReadPixels calls, we must invalidate the cache.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:19 +02:00
Nicolai Hähnle
615ba11563 st/mesa: add readpix_cache structure
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:16 +02:00
Nicolai Hähnle
b74c23138c st/mesa: move ReadPixels blit into a separate function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:12 +02:00
Nicolai Hähnle
f9ddd52317 st/mesa: flush bitmap cache before CopyImageSubData
Found by inspection.

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:10 +02:00
Nicolai Hähnle
e7fff3cfe1 st/mesa: flush bitmap cache before texture functions
As far as I can tell, a sequence of glBitmap followed by texture functions
that refer to a texture bound as the framebuffer is well within what should
be allowed.

Found by inspection.

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:08 +02:00
Nicolai Hähnle
c542b7e43d st/mesa: flush bitmap cache before compute dispatch
In the unlikely case that a program uses glBitmap to render to a framebuffer
whose texture is bound in a compute shader.

Found by inspection.

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-21 10:54:00 +02:00
Timothy Arceri
644e015f0b i965: get PrimitiveMode from the program rather than the shader struct
This is more consistent with what we do elsewhere and will allow
us to only cache one of the values in the shader cache.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-21 12:43:18 +10:00
Vedran Miletić
82e0bbd01a clover: Fix build against clang SVN >= r273191
setLangDefaults() now requires PreprocessorOptions as an argument.

Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-21 10:08:57 +09:00
Kenneth Graunke
cd89c834a8 i965: Fix multiplication of immediates on Cherryview/Broxton.
Cherryview and Broxton don't support DW x DW multiplication.  We have
piles of code to handle this, but apparently weren't retyping in the
immediate case.

For example,
tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes
makes the simulator angry about instructions such as:

   mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D

Just retype to W or UW.  It should be safe on all platforms.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-20 17:48:03 -07:00
Jason Ekstrand
eb6764c4a7 anv: Add proper support for depth clamping
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:04:08 -07:00
Jason Ekstrand
8a46b505cb anv/cmd_buffer: Split emit_viewport in two
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:03:09 -07:00
Jason Ekstrand
20e95a746d anv/cmd_buffer: Set depth/stencil extent based on the image
It used to be based on the framebuffer which isn't quite right.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:03:05 -07:00
Jason Ekstrand
b65f2e4163 anv/cmd_buffer: Don't crash if push constants are provided for missing stages
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:03:02 -07:00
Jason Ekstrand
e6c2fe4519 anv/pipeline: Do invariance propagation on SPIR-V shaders
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:02:58 -07:00
Jason Ekstrand
bec07b7292 nir/alu_to_scalar: Respect the exact ALU operation qualifier
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:02:55 -07:00
Jason Ekstrand
202751fbb7 nir: Add a pass for propagating invariant decorations
This pass is similar to propagate_invariance in the GLSL compiler.  The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:02:45 -07:00
Jason Ekstrand
68e308d853 nir/algebraic: Remove imprecise flog2 optimizations
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original.  For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite.  However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).

This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2.  This is done 6 or 8 times and summed to produce
the final result which is written to a red texture.  In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 11:56:57 -07:00
Ian Romanick
895f7ddfb5 i965: Delete redundant extension enables
A nearly identical block already exists in the gen >= 6 block above.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-20 11:18:39 -07:00
Ian Romanick
d3a5cae60a mesa: Fix incorrect "see also" comments
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-20 11:18:39 -07:00
Ian Romanick
08cd234db8 mesa: Silence unused parameter warning
main/pipelineobj.c: In function ‘delete_pipelineobj_cb’:
main/pipelineobj.c:110:30: warning: unused parameter ‘id’ [-Wunused-parameter]
 delete_pipelineobj_cb(GLuint id, void *data, void *userData)
                              ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-20 11:18:38 -07:00
Rob Clark
64180de1bf gallium: make image_view const
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-20 12:36:20 -04:00
Rob Clark
ef534b9389 gallium: make constant_buffer const
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-20 12:36:20 -04:00
Rob Clark
e1c1c40cbc gallium: make shader_buffers const
Be consistent with the rest of the "set_xyz" state interfaces.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-20 12:36:20 -04:00
Nicolai Hähnle
1167905c41 radeonsi: use trapezoid distribution for tess on Fiji and Polaris
This yields a small performance improvement in Unigine Heaven.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:29:55 +02:00
Nicolai Hähnle
650137a9c8 radeonsi/sid: add Fiji+ tesselation distribution mode
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:29:15 +02:00
Nicolai Hähnle
32fd92e028 radeonsi: emit PA_SC_RASTER_CONFIG_1 only once
It is the same for all SEs.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:34 +02:00
Nicolai Hähnle
c95175581e radeonsi: fix calculation of valid RB mask per SE
The old calculation treated too many RBs as disabled.

Cc: 11.0 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:31 +02:00
Nicolai Hähnle
6c2e636982 radeonsi: raise SI_PM4_MAX_DW
The old limit, introduced in commit afa752d3f0,
was exceeded by 4 SE configurations which hit si_write_harvested_raster_configs.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-20 18:28:17 +02:00
Roland Scheidegger
b0cf99165a gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9
Apparently, these are deprecated. There's some AutoUpgrade feature which
is supposed to promote these to cmp/select, which apparently doesn't work
with jit code. It is possible it's not actually even meant to work (see
the bug filed against llvm which couldn't provide an answer neither)
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So, just use the fallback code (which should be cmp/select,
we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages
to optimize this back to pmin/pmax in the end).

This addresses https://llvm.org/bugs/show_bug.cgi?id=28176

CC: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
2016-06-20 17:19:03 +02:00
Ilia Mirkin
154c0a42a2 nvc0: don't make use of push hint if there are no non-const user vbos
This makes the check match up what we do on nv50 as well - there's no
point in switching over the push path if everything's in managed
buffers. This can happen when a shader uses a vertex without an enabled
array - we end up passing it a constant attribute.

This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However by not
switching over to the push path, draw 383068 is correct.

Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-19 10:14:57 -04:00
Ilia Mirkin
1804aa0b80 gk104/ir: fix tex use generation to be more careful about eliding uses
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check if uses dominate each other they are dominated by the tex.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-19 10:14:46 -04:00
Ilia Mirkin
194bcb49d1 nv50: add support for GL_EXT_window_rectangles
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-18 13:38:30 -04:00
Ilia Mirkin
b21a00d129 nvc0: add support for GL_EXT_window_rectangles
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-18 13:38:30 -04:00
Ilia Mirkin
d1bdc1238a st/mesa: add support for GL_EXT_window_rectangles
Make sure to pass the requisite information in draws, blits, and clears
that work on the context's draw buffer.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-18 13:38:30 -04:00
Ilia Mirkin
07fcb06fe0 gallium: add PIPE_CAP_MAX_WINDOW_RECTANGLES to all drivers
This says how many window rectangles are supported by the
implementation, although it may not exceed PIPE_MAX_WINDOW_RECTANGLES.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-18 13:38:29 -04:00
Ilia Mirkin
82fab73246 gallium: add API for setting window rectangles
Window rectangles apply to all framebuffer operations, either in
inclusive or exclusive mode. They may also be specified as part of a
blit operation.

In exclusive mode, any fragment inside any of the specified rectangles
will be discarded.

In inclusive mode, any fragment outside every rectangle will be
discarded.

The no-op state is to have 0 rectangles in exclusive mode.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-18 12:59:12 -04:00
Ilia Mirkin
d68c1e2ac2 mesa: add GL_EXT_window_rectangles state storage/retrieval functionality
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-18 12:51:55 -04:00
Ilia Mirkin
78506ad246 glapi: add GL_EXT_window_rectangles entrypoints
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-18 12:51:55 -04:00
Samuel Pitoiset
b214e0d2fb nv50/ir: add missing strings for some recent sysvals
This is pretty useful for debugging purposes and those should
not be omitted.

Fixes: 517a93b3 ("nvc0: add ARB_shader_draw_parameters support")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-18 18:34:50 +02:00
Bruce Cherniak
6b0ac95c28 swr: Update screen->context pointer with multiple contexts.
A pipe pointer in the screen allows for access to current device context
 in flush_frontbuffer and resource_destroy.  This wasn't tracking current
context in multi-context situations.

v2: More caffeine.  Corrected compare, removed unnecessary set of
screen-pipe in create_context, and added a few comments.
2016-06-17 13:56:03 -05:00
Brian Paul
ace3124f22 scons: put the generated git_sha1.h file in top-level src/ directory
To match what's done in the automake build.

v2: Use git rev-parse to get a 10-character hash ID
    Fix Python imports

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-06-17 10:33:00 -06:00
Tim Rowley
5a64549f54 swr: switch from overriding -march to selecting features
Acked-by: Chuck Atkins <chuck.atkins@kitware.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-06-17 10:34:17 -05:00
Timothy Arceri
481e924951 mesa: remove remaining tabs in api_validate.c
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-06-17 22:07:21 +10:00
Samuel Iglesias Gonsálvez
bdab572a86 i965/fs: indirect addressing with doubles is not supported in CHV/BSW/BXT
From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
Restrictions, page 844:

  "When source or destination datatype is 64b or operation is integer DWord
   multiply, indirect addressing must not be used."

v2:
- Fix it for Broxton too.

v3:
- Simplify code by using subscript() and not creating a new num_components
variable (Kenneth).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-17 11:33:18 +02:00
Iago Toral Quiroga
0177dbb6c2 i965/fs: Fix single-precision to double-precision conversions for CHV/BSW/BXT
From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
Register Region Restrictions:

   "When source or destination is 64b (...), regioning in Align1
    must follow these rules:

    1. Source and destination horizontal stride must be aligned to
       the same qword.
    (...)"

v2:
- Fix it for Broxton too.

v3:
- Remove inst->regs_written change as it is not necessary (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-17 08:46:02 +02:00
Kenneth Graunke
48593eaf2d docs: Mention GL_ARB_ES3_1_compatibility in release notes.
Ilia reminded me that I forgot this.
2016-06-16 17:10:35 -07:00
Kenneth Graunke
a08a16541b i965: Fix comment about CS scratch space encodings on Broadwell+.
I typo'd this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-16 16:11:35 -07:00
Kenneth Graunke
93d8f80a9a docs: Update ARB_ES3_1_compatibility status for i965. 2016-06-16 14:39:44 -07:00
Kenneth Graunke
1f9445ff57 i965: Drop perf_debug about rasterizer discard in SOL vs. clipper.
I recently experimented with performing rasterizer discard in the SOL
unit instead of the clipper, and as far as I can tell, it's basically
the same performance.  The clipper comes directly after SOL anyway,
and setting the clipper to REJECT_ALL should be pretty darn cheap.

Keep the perf_debug on Sandybridge, where the GS actually does work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-16 14:37:07 -07:00
Kenneth Graunke
32b1c0b694 i965: Enable GL_ARB_ES3_1_compatibility on Gen8+ if CS are available.
There are almost no tests in any test suite, but what little I've found
seems to work.  Ilia believes everything is in place.

v2: Predicate the enable on ES 3.1 being available (Gen8+) and also
    ARB_compute_shader being available (requested by Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-16 14:33:24 -07:00
Ian Romanick
6bec55a780 mesa: If validation fails in a debug context just emit a debug message
There are quite a few pipelines that desktop applications (including a
bunch of piglit test) can expect to have run but don't meet the GLES
requirements.  Instead of failing validation, just emit a debug message.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-16 09:33:54 -07:00
Ian Romanick
9c87282041 glsl: Always strip arrayness in precision_qualifier_allowed
Previously some callers of precision_qualifier_allowed would strip the
arrayness from the type and some would not.  As a result, some places
would not notice that float[6], for example, needed a precision
qualifier.

Fixes the new piglit test no-default-float-array-precision.frag.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-06-16 09:33:53 -07:00
Jose Fonseca
d04f652b75 mesa/main: Update _mesa_new_shader.
Left over from 31dee99e05.  It should fix
Clang Windows build.

Trivial.
2016-06-16 15:22:37 +01:00
Christian König
6d877d7121 st/vdpau: we support lumakeying now
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-06-16 09:41:13 +02:00
Christian König
bf89e672cf vl: support luma keying for interlaced surfaces as well
We had the CSC code twice in there, factor it out into a separate function.

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-06-16 09:41:12 +02:00
Timothy Arceri
456b5d9ac9 i965: remove remaining tabs in brw_link.cpp
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-16 16:24:19 +10:00
Mathias Fröhlich
0e73d9454d vbo: Use a bitmask to track the active arrays in vbo_save*.
The use of a bitmask makes functions iterating only active
attributes less visible in profiles.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
bc4e0c4868 vbo: Use a bitmask to track the active arrays in vbo_exec*.
The use of a bitmask makes functions iterating only active
attributes less visible in profiles.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
22e5d4a1ee mesa: Use bitmask/ffs to iterate the active_samplers bitmask.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
34f741b080 mesa: Use bitmask/ffs to iterate the enabled textures.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
11a5b776c2 mesa: Use designated bool value to check texture unit completeness.
The change helps to use the bitmask/ffs in the next change.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
c14ec9aafa mesa: Use bitmask/ffs to iterate SamplersUsed
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
53691b7cb1 i965: Use bitmask/ffs to iterate used vertex attributes.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
b670f0d1d7 i965: Use bitmask/ffs to iterate enabled clip planes.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:55 +02:00
Mathias Fröhlich
a0fe569e53 radeon/r200: Use bitmask/ffs to iterate enabled clip planes.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
dc9e604ef1 mesa: Use bitmask/ffs to iterate enabled clip planes.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
d8a3ac90df mesa: Use bitmask/ffs to iterate color material attributes.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
d4eb2f9cda mesa: Use bitmask/ffs to build ff fragment shader keys.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
The bitmask used here for iteration is a combination
of different enabled masks present for texture units.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
3ee409bebf mesa: Use bitmask/ffs to build ff vertex shader keys.
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
The bitmask used here for iteration is a combination
of different enabled masks present for texture units.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
b5820759de mesa: Remove the linked list of enabled lights
Clean up after conversion to bitmasks.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
21f7f67685 mesa: Switch to bitmask based enabled lights in gen_matypes.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
f0391ba6c1 radeon/r200: Use bitmask/ffs to iterate enabled lights
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
f69a400513 nouveau: Use bitmask/ffs to iterate enabled lights
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:54 +02:00
Mathias Fröhlich
9a3fcb010c tnl: Use bitmask/ffs to iterate enabled lights
Replaces loops that iterate all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
664aec4370 mesa: Use bitmask/ffs to iterate enabled lights for ff shader keys.
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
ccb1be2fab mesa: Use bitmask/ffs to iterate enabled lights
Replaces loops that iterate all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.

v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
b60c730235 mesa: Track enabled lights in a bitmask
This enables some optimizations afterwards.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
6749d77c69 mesa: Rename CoordReplaceBits back to CoordReplace.
It used to be called like that and fits better with 80 columns.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
291f00fa12 mesa: Remove the now unused CoordsReplace array.
Now that all users are converted, remove the array.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
d19c69659a i965: Convert i965 to use CoordsReplaceBits.
Switch over to use the CoordsReplaceBits bitmask.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:53 +02:00
Mathias Fröhlich
97f67be0a7 i915: Convert i915 to use CoordsReplaceBits.
Switch over to use the CoordsReplaceBits bitmask.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:52 +02:00
Mathias Fröhlich
8e01fd6396 r200: convert r200 to use CoordsReplaceBits.
Switch over to use the CoordsReplaceBits bitmask.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:52 +02:00
Mathias Fröhlich
da79d76503 gallium: Convert the state_tracker to use CoordsReplaceBits.
Switch over to use the CoordsReplaceBits bitmask.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:52 +02:00
Mathias Fröhlich
664ba9ccc9 swrast: Convert swrast to use CoordsReplaceBits.
Switch over to use the CoordsReplaceBits bitmask.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:52 +02:00
Mathias Fröhlich
1c78515d93 mesa: Add gl_point_attrib::CoordReplaceBits bitfield.
The aim is to replace the CoordReplace array by
a bitfield. Until all drivers are converted,
establish the bitfield in parallel to the
CoordReplace array.

v2: Fix bitmask logic.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-16 05:50:52 +02:00
Timothy Arceri
31dee99e05 mesa/glsl: stop using GL shader type internally
Instead use the internal gl_shader_stage enum everywhere. This
makes things more consistent and gets rid of unnecessary
conversions.

Ideally it would be nice to remove the Type field from gl_shader
altogether but currently it is used to differentiate between
gl_shader and gl_shader_program in the ShaderObjects hash table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-16 10:45:35 +10:00
Brian Paul
bb1292e226 auxilary/os: allow appending to GALLIUM_LOG_FILE
If the log file specified by the GALLIUM_LOG_FILE begins with '+', open
the file in append mode.  This is useful to log all gallium output for
an entire piglit run, for example.

v2: put GALLIUM_LOG_FILE support inside an #ifdef DEBUG block.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-15 17:16:42 -06:00
Chad Versace
c99a0a8bce anv: Fix a harmless overflow warning
anv_pipeline_binding::index is a uint8_t, but some code assigned to it
UINT16_MAX.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewd-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-15 15:34:13 -07:00
Rob Herring
067c5b10b6 vc4: fix vc4_resource_from_handle() stride calculation
The expected stride calculation is completely wrong. It should
ultimately be multiplying cpp and width rather than dividing. The width
also needs to be aligned to the tiling width first before converting to
stride bytes.

The whole stride check here is possibly pointless. Any buffers which
were allocated outside of vc4 may have strides with larger alignment
requirements.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-06-15 14:54:38 -07:00
Kenneth Graunke
c319512e16 i965: Use a uniform for gl_PatchVerticesIn in the TCS on Gen8+.
We still need to recompile the passthrough shader when this value
changes, as it also affects the output vertex count.  But otherwise,
we can eliminate recompiles on Gen8+.

We probably want to do this for Gen7 as well, but that requires
rewriting the input release code to use a loop, which is a trade-off
I'd need to consider in more detail.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-15 12:47:37 -07:00
Kenneth Graunke
2b867264d2 glsl: Optionally lower TCS gl_PatchVerticesIn to a uniform.
i965 has no special hardware for this, so the best way to implement
this is to pass it in via a uniform.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-15 12:47:37 -07:00
Kenneth Graunke
1bc194cd64 i965: Use a uniform for gl_PatchVerticesIn in the TES.
Fixes three GL44-CTS.tessellation_shader subtests:
- max_patch_vertices
- single.max_patch_vertices
- tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

These use gl_PatchVerticesIn in the TES, but don't link against a
TCS (which would allow the linker to lower it to a constant).  We had
no handling for the system value in the backend, so it would just
assert fail.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-15 12:44:44 -07:00
Kenneth Graunke
0be2105137 glsl: Optionally lower TES gl_PatchVerticesIn to a uniform.
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-15 12:44:09 -07:00
Marek Olšák
d794072b3e winsys/radeon: use the common job queue for multithreaded command submission v2
v2: fixup after renaming to util_queue_fence

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-15 21:07:34 +02:00
Marek Olšák
562cb03d76 gallium/util: import the multithreaded job queue from amdgpu winsys (v2)
v2: rename the event to util_queue_fence

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-15 21:07:34 +02:00
Nicolai Hähnle
44e0c0e6ec radeonsi: fix undefined left-shift into sign bit
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-15 09:27:56 +02:00
Nicolai Hähnle
494e4b8976 st_glsl_to_tgsi: don't read potentially uninitialized buffer variable
Found by -fsanitize=undefined. Note that this should be a harmless issue in
practice because the inst->op check always dominates anyway.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-15 09:27:40 +02:00
Nicolai Hähnle
6510e07345 mesa/main: fix integer overflows in _mesa_image_offset
Found using -fsanitize=undefined.

Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-15 09:27:30 +02:00
Timothy Arceri
a8a9d1bf41 i965: remove type_size_vec4_times_4()
type_size_vec4_times_4() was introduced as a fix in 8dcf807cb4
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-15 15:01:10 +10:00
Kenneth Graunke
8b408972ff mesa: Pass gl_constant_value union into _mesa_fetch_state().
We've had some trouble in the past with copying integers around via
float pointers, as the C compiler sometimes uses x87 floating point
registers to load values on 32-bit systems.  Passing the
gl_constant_value union should be safer.

To avoid churn, this patch creates a "GLfloat *value" variable so
existing uses can stay the same.

Not observed to fix anything, but I was in the area adding more integer
state vars, and thought it'd be wise.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-14 16:09:57 -07:00
Marek Olšák
6ef50efc10 gallium/radeon: num-cs-flushes query should display per-frame average
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
4140afd04b gallium/radeon: add driver queries for compute/dma call stats and spills
also print the average count per frame

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
8fc688c303 radeonsi: don't generate "ret void undef"
Use LLVMBuildRetVoid in epilogs and the GS copy shader and
si_llvm_build_ret otherwise.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-14 20:22:16 +02:00
Marek Olšák
4eea710b0d radeonsi: try to hit direct hw MSAA resolve by changing micro mode in clear
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.

v2: comment the magic numbers

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
373060652c radeonsi: clarify the MSAA resolve limitation with scanout
this is the correct hw requirement

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Marek Olšák
789618e3b4 gallium/radeon: add micro_tile_mode to radeon_surf
for easier access

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-14 20:22:16 +02:00
Gurchetan Singh
63c5d5c6c4 Added pbuffer hooks for surfaceless platform
This change enables the creation of pbuffer
surfaces on the surfaceless platform.

v3: Going back to single-buffered pbuffer
plus additional code review changes

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-14 08:51:02 -07:00
Roland Scheidegger
afbf5888f5 gallium/util: don't use blocksize for minify for assertions
The previous assertions required for texture sizes smaller than block_size
that src_box.x + src_box.width still be block size.
(e.g. for a texture with width 3, and src_box.x = 0, src_box.width would
have to be 4 to not assert.)
This caused some assertions with some other state tracker.
It looks though like callers aren't expected to round up widths to block sizes
(for sizes larger than block size the assertion would still have verified it
wouldn't have been rounded up) so we simply shouldn't use a minify which
rounds up to block size.
(No piglit change with llvmpipe.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-14 17:03:34 +02:00
Roland Scheidegger
f4184d5450 llvmpipe: hack-fix bugs due to bogus bind flags
The gallium contract would be that bind flags must indicate all possible
bindings a resource might get used, but fact is the mesa state tracker does
not set bind flags correctly, and this is more or less unfixable due to GL.

This caused a bug with piglit arb_uniform_buffer_object-rendering-dsa
since 6e6fd911da - the commit is correct,
but it caused us to miss updates to fs UBOs completely, since the
corresponding buffer didn't have the appropriate bind flag set (thus we
wouldn't check if it is indeed currently bound).
See the discussion about this starting here:
https://lists.freedesktop.org/archives/mesa-dev/2016-June/119829.html

So, update the bind flags when we detect such usage.
Note we update this value for now only in places which matter for us - that
is creating sampler/surface view, or binding constant buffer. There's plenty
more places (setting streamout buffers, vertex/index buffers, ...) where
things can be set with the wrong bind flags, but the bind flags there never
matter.

While here also make sure we only set dirty constant bit when it's a fs
constant buffer - totally doesn't matter if it's vs/gs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-14 17:03:34 +02:00
Rob Clark
243417810b freedreno: support start param for sampler views/states
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Rob Clark
b8eb1493a9 freedreno: only do extra vertex-buffer state logic on a2xx
Possibly this should move into an fd2 wrapper fxn, similar to the
texture state tracking done for fd3/fd4 (clamp emulation, etc)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Rob Clark
26d0efa9ce freedreno: use util_copy_constant_buffer() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-14 11:00:59 -04:00
Nayan Deshmukh
fdec8f9e42 st/vdpau: replace 0.f and 1.f with 0.0f and 1.0f respectively
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-14 15:32:04 +01:00
Tomasz Figa
e7ab358e81 i965: Check return value of screen->image.loader->getBuffers (v2)
The images struct is an uninitialized local variable on the stack. If the
callback returns 0, the struct might not have been updated and so should
be considered uninitialized. Currently the code ignores the return value,
which (depending on stack contents) might end up in reading a non-zero
value from images.image_mask and dereferencing further fields.

Another solution would be to initialize image_mask with 0, but checking
the return value seems more sensible and it is what Gallium is doing.

v2: fix typos in commit message,
    fix indentation,
    remove unnecessary parentheses and pointer dereference to keep line
    length reasonable.

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-14 15:32:04 +01:00
Michel Dänzer
9ee3f097b6 st/dri: Clear drawable texture_mask in dri2_invalidate_drawable
This makes sure that dri_set_tex_buffer2 -> dri_drawable_validate_att
will re-create the front left attachment buffer after the drawable got
invalidated.

Fixes window contents not updating until the window is resized when
using DRI2 PRIME.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-14 18:16:54 +09:00
Eduardo Lima Mitev
a93bb2e33f glsl/builtin_variables: Populate MaxCombinedShaderStorageBlocks on GLSL 4.40
Built-in variable "MaxCombinedShaderStorageBlocks" was added to GLSL 4.40
revision 9.

Section "1.2.1 Changes since revision 8 of GLSL version 4.40",
page 3 of the PDF states:

    "Bug 11734: Add gl_MaxCombinedShaderOutputResources and mark
    gl_MaxCombinedImageUnitsAndFragmentOutputs  as deprecated."

Fixes: GL44-CTS.shader_image_load_store.basic-glsl-const

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-14 10:21:26 +02:00
Julien Isorce
1cdb4da1d6 st/va: ensure linear memory for dmabuf
In order to do zero-copy between two different devices
the memory should not be tiled.

Tested with GStreamer on a laptop that has 2 GPUs:
1- gstvaapidecode:
   HW decoding and dmabuf export with nouveau driver on Nvidia GPU.
2- glimagesink:
   EGLImage imports dmabuf on Intel GPU.

TEST: DRI_PRIME=1 gst-launch vaapidecodebin ! glimagesink

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-14 08:40:33 +01:00
Dylan Baker
5a87bc7181 isl: Replace bash generator with python generator
This replaces the current bash generator with a python based generator
using mako. It's quite fast and works with both python 2.7 and python
3.5, and should work with 3.3+ and maybe even 3.2.

It produces an almost identical file except for a minor layout changes,
and the addition of a "generated file, do not edit" warning.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 22:40:52 -07:00
Mathias Fröhlich
ed2dae86ae mesa: Make use of u_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-14 05:19:10 +02:00
Mathias Fröhlich
c3b6656676 mesa/gallium: Move u_bit_scan{,64} from gallium to util.
The functions are also useful for mesa.
Introduce src/util/bitscan.{h,c}. Move ffs function
implementations from src/mesa/main/imports.{h,c}.
Move bit scan related functions from
src/gallium/auxiliary/util/u_math.h. Merge platform
handling with what is available from within mesa.

v2: Try to fix MSVC compile.

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-14 05:19:10 +02:00
Aaron Watry
fafe026dbe clover: Include generated sources in AM_CPPFLAGS
git_sha1.c is generated in $(top_builddir)/src.

Fixes out-of-tree builds since 4825264f75.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96516
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-14 12:04:42 +09:00
Stephan Bergmann
0140938b26 nv50/ir: make Graph destructor virtual
Avoid ASan new-delete-type-mismatch when Function::domTree is created as
DominatorTree in Function::convertToSSA but destroyed only as base
Graph in ~Function.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-13 22:55:11 -04:00
Jason Ekstrand
be32a21327 i965/compiler: Bring back the INTEL_PRECISE_TRIG environment variable
This was removed in d9546b0c5d and replced with the precise_trig driconf
option.  However, we still need precise trig in the Vulkan driver so this
commit brings back the environment variable and compiler->precise_trig is
effectively the logical OR of the two.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96484
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 19:54:47 -07:00
Samuel Iglesias Gonsálvez
a0ed8503b7 i965: Defeat the register stride checker in pull uniform messages.
Pulling DF uniforms from pull constant buffer generates messages like:
    send(4)         g12<1>DF        g12<0,1,0>F
         sampler ld SIMD4x2 Surface = 1 Sampler = 0 mlen 1 rlen 1

which produces GPU hangs in Cherryview/Braswell:

    "For 64-bit Align1 operation or multiplication of dwords in CHV,
     source horizontal stride must be aligned to qword."

This seems to be documented in the Cherryview PRM, Volume 7, Page 843:

    "When source or destination datatype is 64b or operation is integer
     DWord multiply, regioning in Align1 must follow these rules:

     1. Source and Destination horizontal stride must be aligned to the
        same qword."

We should set the destination type to UD, D, or F so that
the register stride checker doesn't notice.  The destination type of
send messages is basically irrelevant anyway.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-13 19:36:59 -07:00
Kenneth Graunke
ed3ba651f6 i965: Defeat the register stride checker in URB reads.
Pulling DF inputs from the URB generates messages like:

   send(8)         g23<1>DF        g1<8,8,1>UD
                   urb 3 SIMD8 read mlen 1 rlen 2      { align1 1Q };

which makes the simulator angry:

   "For 64-bit Align1 operation or multiplication of dwords in CHV,
    source horizontal stride must be aligned to qword."

This seems to be documented in the Cherryview PRM, Volume 7, Page 823:

   "When source or destination datatype is 64b or operation is integer
    DWord multiply, regioning in Align1 must follow these rules:

    1. Source and Destination horizontal stride must be aligned to the
       same qword."

Setting the source horizontal stride to QWord is insane, as it's the
message header containing 8 URB handles in a single 32-bit DWord.
Instead, we should whack the destination type to UD, D, or F so that
the register stride checker doesn't notice.  The destination type of
send messages is basically irrelevant anyway.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-13 19:36:46 -07:00
Kenneth Graunke
9f37df06da i965: Fix issues with number of VS URB entries on Cherryview/Broxton.
Cherryview/Broxton annoyingly have a minimum number of VS URB entries
of 34, which is not a multiple of 8.  When the VS size is less than 9,
the number of VS entries has to be a multiple of 8.

Notably, BLORP programmed the minimum number of VS URB entries (34), with
a size of 1 (less than 9), which is invalid.

It seemed like this could be a problem in the regular URB code as well,
so I went ahead and updated that to be safe.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-13 19:35:52 -07:00
Timothy Arceri
b010fa8567 glsl: make sure UBO arrays are sized in ES
This check was removed in 5b2675093e add it back in.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
https://bugs.freedesktop.org/show_bug.cgi?id=96349
2016-06-14 11:33:24 +10:00
Vedran Miletić
4825264f75 clover: Update OpenCL version string to match OpenGL
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.

v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-06-13 15:55:59 -07:00
Francisco Jerez
bd9f972651 i965/fs: Fix regs_written for SIMD-lowered instructions some more.
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end.  Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly.  Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):

 spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
 spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-13 15:55:59 -07:00
Francisco Jerez
a84b5d43e2 i965: Fix cross-primitive scratch corruption when changing the per-thread allocation.
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based in a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.

I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original.  Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.

Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9.  The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
Francisco Jerez
d960284e44 i965: Keep track of the per-thread scratch allocation in brw_stage_state.
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.

v2: Handle BO allocation manually instead of relying on
    brw_get_scratch_bo (Ken).

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
Francisco Jerez
013ae4a70a i965: Fix scratch overallocation if the original slot size was already a power of two.
The bitwise arithmetic trick used in brw_get_scratch_size() to clamp
the scratch allocation to 1KB has the unintended side effect that it
will cause us to allocate 2x the required amount of scratch space if
the original per-thread scratch size happened to be already a power of
two.  Instead use the obvious MAX2 idiom to clamp the scratch
allocation to the expected range.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
Kenneth Graunke
2df8f4a253 mesa: Make TexSubImage check negative dimensions sooner.
Two dEQP tests expect INVALID_VALUE errors for negative width/height
parameters, but get INVALID_OPERATION because they haven't actually
created a destination image.  This is arguably not a bug in Mesa, as
there's no specified ordering of error conditions.

However, it's also really easy to make the tests pass, and there's
no real harm in doing these checks earlier.

Fixes:
dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_width_height
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texsubimage3d_neg_width_height

v2: Drop redundant check (caught by Anuj Phogat).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-06-13 15:38:47 -07:00
Brian Paul
cf9bb9acac util: update some assertions in util_resource_copy_region()
To cope with copies of compressed images which are not multiples of
the block size.  Suggested by Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@sroland@vmware.com>
2016-06-13 13:30:19 -06:00
Kenneth Graunke
5a0d294d38 i965: Fix encode_slm_size() to take a generation, not a device info.
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct.  I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accomodate this.  But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly.  This led to crashes.

Brown paper bag fix for commit 87d062a940.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 12:23:11 -07:00
Kenneth Graunke
667e5cec76 i965: Don't leak scratch BOs for TCS/TES.
These need to be freed too.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 12:22:06 -07:00
Nanley Chery
a4a5917248 anv/pipeline: Don't dereference NULL dynamic state pointers
Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
of pCreateInfo members are moved to the earliest points at which they
should not be NULL.

This fixes a segfault seen in the McNopper demo, VKTS_Example09.

v3 (Jason Ekstrand):
   - Fix disabled rasterization check
   - Revert opaque detection of color attachment usage

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 11:35:45 -07:00
Nanley Chery
a0d84a9ef9 anv: Document and rename anv_pipeline_init_dynamic_state()
To reduce confusion, clarify that the state being copied is not dynamic.

This agrees with the Vulkan spec's usage of the term. Various sections
specify that the various pipeline state which have VkDynamicState enums
(e.g. viewport, scissor, etc.) may or may not be dynamic.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 11:35:45 -07:00
Samuel Pitoiset
7f257abc1b nvc0/ir: clamp the UBO index for compute on Kepler
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 20:12:48 +02:00
Marek Olšák
6e1b12c788 radeonsi: enable scratch coalescing
This makes one particular compute shader 8x faster.

Latest LLVM git is required.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-13 18:13:51 +02:00
Jimmy Berry
0c0f841e5d st/va: hardlink driver instances to gallium_drv_video.so
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.

Note: some versions of libva can detect the gallium name and use the
backend. Although that behaviour seems inconsistent since it only works
for some platforms/backends.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Jan Vesely
1fb4179f92 vl: Fix trivial sign compare warnings
v2: add whitepace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
[Emil Velikov: squash a few more whitespace issues]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Rob Herring
112e988329 Android: move libdrm settings to top-level Android.common.mk
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:

external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;

HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Rob Herring
54e550ab8a Android: disable some noisy warnings
Turn off warnings for -Wpointer-arith, -Wno-missing-field-initializers,
-Wno-initializer-overrides, and -Wno-mismatched-tags. These are all deemed
pointless, on purpose or no plans to fix.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Emil Velikov
db8790c0da st/mesa: inline _mesa_create_context() into its only caller
Inline the function into it's only caller. This way it's more obvious
how the classic and gallium drivers (st/mesa) use _mesa_initialize_context.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:29 +01:00
Emil Velikov
a4fa8bf819 st/mesa: remove unneeded break from st_api_create_context()
We have return on the previous line, thus the break will never be
reached.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
6406bc1592 st/mesa: use c99 initializer for st_gl_api
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
15bc7856bf gallium: remove st_api::get_proc_address hook
It has been unused for a long time, plus makes the gallium dri modules
require an extra glapi symbol relative to their classic counterparts.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
23a7fca6aa mesa: remove _mesa_init_get_hash()
The actual code of the function print_table_stats() is guarded
by a ifdef GET_DEBUG, which was not been defined in years.

The last fix in 2013 (7db6b5aa91) indicates that it's rarely
used/tested. Since the issue has gone unnoticed for a whole year
(broken with 2ad4a47547).

Let's remove it for now. We can always revive it at a later stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
b81685eb32 mesa: kill off _mesa_do_init_remap_table()
... and inline its contents in _mesa_init_remap_table().

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
bfbf286f7d mesa: use native types when possible
All of the functions and related data is internal, so there's no point
if using the GL types.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
3f80c95f35 mesa: make _mesa_map_function_spec() static
Used only locally.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
390678f27d mesa: remove used _mesa_get_function_spec() and gl_function_remap
Final user was killed with last commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
5b700059a8 mesa: remove unused _mesa_map_function_array()
Unused as of commit 5a175127f3 ("dri: Remove all extension enabling
utility functions") and the patch before the previous patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
5378ee8187 glapi: remap_helper.py: remove MESA_alt_functions
The final user was nuked with last commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
b5dd8e0cf8 mesa: remove unused function _mesa_map_static_functions()
Unused as of commit 5a175127f3 ("dri: Remove all extension enabling
utility functions")

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
07ae8c7df7 dri/common: remove unused libdri_test_stubs.la
... and associated file(s).

No longer needed since commit 057259655e ("i965: Don't link libmesa or
libdri_test_stubs into tests")

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-13 15:31:27 +01:00
Emil Velikov
fcb5a75a66 swr: automake: add missing -I flag
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.

Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:24 +01:00
Emil Velikov
f4d26856df automake: add SWR to `make distcheck' gallium drivers
Will allows us to catch missing files and build issues before getting
the tarball out for general consumption.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:24:44 +01:00
Emil Velikov
bab5ab6940 configure.ac: strip out the llvm-config -march/mtune flags
Otherwise drivers such as SWR that depend on providing their own values
will fail to build.

v2: Add -mcpu for good measure (Chuck)

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chuck Atkins <chuck.atkins@kitware.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-06-13 15:24:44 +01:00
Chuck Atkins
c86fcaca72 swr: Add missing headers for package inclusion
CC: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:24:44 +01:00
Emil Velikov
8229fe68b5 automake: get in-tree `make distclean' working again.
With earlier commit we've handled the `make distclean' out of tree
build, yet we failed to attribute that for in-tree builds the test
condition will return 1. Thus effectively the target will be considered
as "failed".

Fixes: b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building
OOT")
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-13 15:24:44 +01:00
Jan Vesely
ace70aedcf gallivm: Fix trivial sign warnings
v2: include whitespace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-13 09:23:09 -04:00
Julien Isorce
a04804746f st/va: use proper temp pipe_video_buffer template
Instead of changing the format on the existing template
which makes error handling not nice and confuses coverity.

CoverityID: 1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-13 09:14:32 +01:00
Julien Isorce
6c43e0016e st/va: it is valid to release the VABuffer of an exported resource
pipe_resource_reference(&res, NULL) will decrement reference counting,
i.e. p_atomic_dec(res->count). But the va surface still has the initial
reference since it has created the resource. So calling vaDestroyImage
on a derived image calls VaDestroyBuffer but the decrementation won't
reach 0. It is just wrong for vlVaDestroyBuffer to rely on the
export_refcount flag. Finally the vaapi intel driver has the same logic.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-13 09:14:32 +01:00
Timothy Arceri
30df78236c glsl: fix component overlap validation for doubles
This change makes sure to remove arrays when checking if type
is a double.

The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were we not reserving the last component.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-12 21:56:32 +10:00
Timothy Arceri
ad3def919e glsl: fix max varyings count for ARB_enhanced_layouts
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.

Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations instead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-12 21:56:28 +10:00
Kenneth Graunke
0fb85ac08d i965: Use the correct number of threads for compute shaders.
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.

Thanks to Curro and Jordan for helping track this down!

On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.

On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
  0.255654% (n=25).

On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
  by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
  0.432771% (n=64).

On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
  by roughly 1.03x.
- Improves performance in Synmark's G/43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:15 -07:00
Kenneth Graunke
1db37ebecf i965: Assert that the scratch spaces are in range.
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.

We should really fix this properly.  However, the pitfall has existed
for ages, so for now we should at least detect it.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:15 -07:00
Kenneth Graunke
a42a93dc12 i965: Fix CS scratch size calculations on Ivybridge and Baytrail.
These are linear, not powers of two, and much more limited.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:14 -07:00
Kenneth Graunke
147a90d82a i965: Fix Haswell CS per-thread scratch space encoding.
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB.  But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.

This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.

Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now.  A subsequent commit will take care of that.

Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:14 -07:00
Kenneth Graunke
a7d029d3df i965: Account for poor address calculations in Haswell CS scratch size.
Curro figured this out by investigating the simulator.  Apparently
there's also a workaround in the Windows driver.  I'm not sure it's
actually documented anywhere.

We were underallocating the scratch buffer by a factor of 128/70.

v2: Rename threads_per_subslice to scratch_ids_per_subslice
    (suggested by Jordan Justen).

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:39:45 -07:00
Kenneth Graunke
2213ffdb4b i965: Allocate scratch space for the maximum number of compute threads.
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.

Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing.  We need to allocate enough
for every possible thread that could run.

Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:50 -07:00
Kenneth Graunke
9cd8f95809 i965: Set subslice_total on Gen7/7.5 platforms.
We'll use this for compute shader thread counts and scratch space
calculations shortly.

Note that subslices are referred to as "half slices" on Ivybridge.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:47 -07:00
Kenneth Graunke
87d062a940 i965: Fix shared local memory size for Gen9+.
Skylake changes the representation of shared local memory size:

 Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
 -------------------------------------------------------------------
 Gen7-8 |    0 | none | none |    1 |    2 |     4 |     8 |    16 |
 -------------------------------------------------------------------
 Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |

The old formula would substantially underallocate the amount of space.
This fixes GPU hangs on Skylake when running with full thread counts.

v2: Fix the Vulkan driver too, use a helper function, and fix the table
    in the comments and commit message.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:26 -07:00
Ilia Mirkin
3f48548a6f nv50: reinstate dedicated constbuf push path
This was disabled due to occasionally incorrect behavior when trying to
upload data. It later became apparent that nvc0 also had a similar but
slightly different issue, which was resolved in commit e50c01d5. This
takes the same logic as nvc0 and applies it to nv50 (which has somewhat
different interfaces).

Unfortunately I did not note down precisely what was broken with UBOs
when removing the support from nv50, but I've tested a bunch of local
traces, and none of them appear to regress. This should hopefully
improve performance when UBOs are used, but this was not directly
verified.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-11 12:18:43 -04:00
Ilia Mirkin
f47845596b nv50: enable indirect addressing of fragment shader inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-11 11:50:42 -04:00
Ilia Mirkin
7d7e015381 mesa: add drawbuffer argument to ClearNamedFramebufferfi
This was fixed in revision 47 of the ARB_dsa spec in Oct 22, 2015. Since
it's horrible to have differing APIs across library versions, we should
attempt to minimize the impact by backporting it as far as possible and
hope no one notices.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:32:03 -04:00
Ilia Mirkin
92351a71a8 GL: update glcorearb.h to svn 32433
This brings in the fixed glClearNamedFramebufferfi definition, as well
as a lot of GLsizei -> GLsizeiptr changes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:31:53 -04:00
Ilia Mirkin
f81374fd3e GL: update glext to svn 32957
This brings in defines from GL_EXT_window_rectangles and fixes the
glClearNamedFramebufferfi definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:24:53 -04:00
Brian Paul
5cfc91624c docs: GL_ARB_copy_image done for softpipe, llvmpipe
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-06-10 15:50:55 -06:00
Brian Paul
e9b86bb92c llvmpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
2db747cf26 llvmpipe: don't use 3-component formats, except 32-bit x 3 formats
This basically disallows all 8-bit x 3 and 16-bit x 3 formats for
textures and render targets.  Some 3-component formats were already
disallowed before.  This avoids problems with GL_ARB_copy_image.

v2: the previous version of this patch disallowed all 3-component formats

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
672e92a146 softpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
d8fe6332d8 softpipe: don't use 3-component formats
Mesa and gallium don't have a complete set of matching 3-component
texture formats.  For example, 8-bit sRGB unorm.  To fully support
the GL_ARB_copy_image extension we need to have support for all of
these formats: RGB8_UNORM, RGB8_SNORM, RGB8_SRGB, RGB8_UINT, and
RGB8_SINT using the same component order.  Since we don't have that,
disable the 3-component formats for now.

v2: Simplify 3-component format check, per Marek.
Also check that target != PIPE_BUFFER.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
e295b4e800 st/mesa: tweak surface format mapping table
1. Try to choose R8G8B8A8 unorm/srgb formats before others in an
effort to try to match component ordering for UINT/SINT/etc.

2. If we can't get a format such as PIPE_FORMAT_A16_UNORM, try
PIPE_FORMAT_R16G16B16A16_UNORM before shallower formats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
dd4be2e19a util: update util_resource_copy_region() for GL_ARB_copy_image
This primarily means added support for copying between compressed
and uncompressed formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Anuj Phogat
466b320163 gallium: Fix region overlap conditions for rectangles with a shared edge
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."

So, the rectangles sharing just an edge shouldn't overlap.
 -----------
|           |
 ------- ---
|       |   |
|       |   |
 ------- ---

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 14:35:21 -07:00
Anuj Phogat
f8679badd4 mesa: Fix region overlap conditions for rectangles with a shared edge
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
 rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
 to the destination rectangle bounded by the locations (dstX0, dstY 0)
 and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
 while the upper bounds are exclusive."

So, the rectangles sharing just an edge shouldn't overlap.
     -----------
    |           |
     ------- ---
    |       |   |
    |       |   |
     ------- ---

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 14:35:21 -07:00
Dave Airlie
1584918996 gallivm: more 64-bit integer prep work.
This converts one other place to using the new helper.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:30 +10:00
Dave Airlie
f550b6d296 radeonsi: convert to 64-bitness checks instead of doubles.
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:21 +10:00
Dave Airlie
e5c57824ec gallivm: make non-float return code bitcast consistent.
This just uses the same form across the fetches.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:17 +10:00
Dave Airlie
3b97e50b9a gallium/gallivm: use 64-bit test instead of doubles.
This just makes some generic code that currently emits double
suitable for emitting 64-bit values.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:13 +10:00
Dave Airlie
213ab8db87 gallium/tgsi: add 64-bitness type check function.
Currently this just doubles, but we'll convert users to this
so making adding 64-bit integers easier.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:43:45 +10:00
Jason Ekstrand
8d37556ec9 anv/entrypoints: Rework #if guards
This reworks the #if guards a bit.  When Emil originally wrote them, he
just guarded everything.  However, part of what anv_entrypoints_gen.py
generates is a hash table for looking up entrypoints based on their name.
This table *cannot* get out of sync between C and python regardless of
preprocessor flags.  In order to prevent this, this commit makes us use
void pointers in the dispatch table for those entrypoints which aren't
available.  This means that the dispatch table size and entry order is
constant and it should never get out-of-sync with the python.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 13:21:07 -07:00
Jason Ekstrand
9ed0d9dd06 anv/entrypoints: Use the function pointer types provided by vulkan.h
This is a bit cleaner than generating the types ourselves when making the
table.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 13:21:07 -07:00
Nicolai Hähnle
42624ea837 st/mesa: use base level size as "guess" when available
When an applications specifies mip levels _before_ setting a mipmap texture
filter, we will initially guess a single texture level. When the second level
image is created, we try to allocate the full texture -- however, we get the
base level size guess wrong if that size is odd. This leads to yet another
re-allocation of the texture later during st_finalize_texture.

Even worse, this re-allocation breaks a (reasonable) assumption made by
st_generate_mipmaps, because the re-allocation in the finalization call will
again allocate a single-level pipe texture (based on the non-mipmap texture
filter!). As a result, mipmap generation fails in interesting ways.

All of this can be avoided by just using the fact that we already know the
size of the base level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95529
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-10 20:20:39 +02:00
Jason Ekstrand
a1e69930e4 anv: Remove the PhysicalDeviceLimits FINISHME
At this point, the limits are probably more-or-less correct.  If there is
an invalid limit, that's a bug not a FINSHME.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:45 -07:00
Jason Ekstrand
4f5bbf804b anv/pipeline_cache: Allow for an zero-sized cache
This gets ANV_ENABLE_PIPELINE_CACHE=false working again.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:10 -07:00
Jason Ekstrand
a1a25db699 anv/pipeline: Store the (set, binding, index) tripple in the bind map
This way the the bind map (which we're caching) is mostly independent of
the pipeline layout.  The only coupling remaining is that we pull the array
size of a binding out of the layout.  However, that size is also specified
in the shader and should always match so it's not really coupled.  This
rendering issues in Dota 2.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:07 -07:00
Jason Ekstrand
c13c5ac561 anv/descriptor_set: Ensure that bindings are always in increasing order
Since applications are allowed to specify some set of bindings which need
not be dense they also need not be in order.  For most things, this doesn't
matter, but it could result getting the wrong dynamic offsets. This adds a
quick-and-dirty sort to ensure that everything is always in increasing
order of binding index.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:03 -07:00
Jason Ekstrand
e2265926f2 anv/descriptor_set: Add a type field in debug builds
This allows for some extra validation and makes it easier to see what's
going on when poking around in gdb.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:42:59 -07:00
Jason Ekstrand
cd21015abd anv/descriptor_set: Set array_size to zero for non-existant descriptors
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:42:45 -07:00
Leo Liu
2ad443e4cc vl/dri3: support receiving new pixmap for front buffer
With glx of gstreamer-vaapi, the temporary pixmap for front buffer gets
renewed in each frame, so when we receive a new pixmap, should get a new
front buffer for it.

This also fixes Totem player playback corruption.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:24 -04:00
Leo Liu
0ef8500aab vl/dri3: get Makefile properly
From original commit, the macro "if HAVE_DRI3" was in Makefile.sources,
this file is shared with SCons, SCons is not able to parse this marco,
the SCons build failed. Jose quickly gave two approaches and quick fix
with his second approach, thanks Jose for the solutions and fixes.

This patch is Jose's first approach, and it's more proper, because the
dri3 c file should not be included to build when DRI3 is not enabled.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:19 -04:00
Jose Fonseca
2b4cee0571 gallivm: Never emit llvm.fmuladd on LLVM 3.3.
Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't
handle llvm.fmuladd and instead it fall backs to a C function.  Which in
turn causes a segfault on Windows.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 16:17:04 +01:00
Jose Fonseca
320d1191c6 gallivm: Use llvm.fmuladd.*.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Jose Fonseca
9e8edfa190 util,gallivm: Explicitly enable/disable fma attribute.
As suggested by Roland Scheidegger.

Use the same logic as f16c, since fma requires VEX encoding.

But disable FMA on LLVM 3.3 without MCJIT.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Bas Nieuwenhuizen
54f755fa0f radeonsi: Reinitialize all descriptors in CE preamble.
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.

To illustrate suppose we have two graphics IB's 1 and 2, which  are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.

The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.

Fix this by always restoring the entire CE RAM.

v2: - Just load all descriptor set buffers instead of load and store the entire
      CE RAM.
    - Leave the ce_ram_dirty tracking in place for the non-preamble case.

v3: - Fixed parameter alignment.
    - Rebased to master (Nicolai's descriptor series).

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 12:18:29 +02:00
Jose Fonseca
f93c22109e mesa: Wrap extensions.h declarations with extern "C".
This should fix the MSVC linker failures that arose with commit
5e2d25894b.

Trivial.
2016-06-10 11:00:42 +01:00
Ilia Mirkin
f48f344700 st/mesa: fix type confusion with reladdrs
The reality is that this doesn't matter, because we manually emit the
ARL to the sampler reladdr, and those arguments don't get an extra load
later, so it's effectively just a boolean. However having the types be
wrong is confusing and could trigger very odd bugs should usage change
down the line.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-09 21:01:53 -04:00
Dave Airlie
f140ed6d95 glsl/ir: remove TABs in ir_constant_expression.cpp
Adding 64-bit integers support was going to make this file worse,
just remove the tabs from it now.

Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-10 10:30:18 +10:00
Anuj Phogat
73a54e4892 i965/gen9: Don't change halign and valign to fit in fast copy blit
An update in graphics specs has deleted the halign and valign fields
from XY_FAST_COPY_BLT command. See mesa commit 97f0f91.

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-06-09 15:50:07 -07:00
Anuj Phogat
46c8967813 mesa: Add a helper function for shared code in get_tex_rgba_{un}compressed
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-09 15:50:07 -07:00
Samuel Pitoiset
5e2d25894b mesa: Let compute shaders work in compatibility profiles
The extension is already advertised in compatibility profile, but
the _mesa_has_compute_shaders only returns true in core profile.
If we advertise it, we should allow it to work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-06-09 21:03:28 +02:00
Tim Rowley
2c85128e01 swr: implement clipPlanes/clipVertex/clipDistance/cullDistance
v2: only load the clip vertex once

v3: fix clip enable logic, add cullDistance

v4: remove duplicate fields in vs jit key, fix test of clip fixup needed

v5: fix clipdistance linkage for slot!=0,4

v6: support clip+cull; passes most piglit clip (failures understood)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-09 13:28:35 -05:00
Daniel Czarnowski
cf804b4455 glx: fix crash with bad fbconfig
GLX documentation states:
	glXCreateNewContext can generate the following errors: (...)
	GLXBadFBConfig if config is not a valid GLXFBConfig

Function checks if the given config is a valid config and sets proper
error code.

Fixes currently crashing glx-fbconfig-bad Piglit test.

v2: coding style cleanups (Emil, Topi)
    use DefaultScreen macro (Emil)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-06-09 17:55:44 +03:00
Nayan Deshmukh
2d140ae70a st/vdpau: implement luma keying
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-09 14:23:24 +02:00
Nayan Deshmukh
f24eb5a178 vl: Apply luma key filter before CSC conversion
Apply the luma key filter to the YCbCr values during the CSC conversion
    in video buffer shader. The initial values of max and min luma are set
    to opposite values to disable the filter initially and will be set when
    enabling it.

    Add extra parmeters min and max luma for the luma key filter in
    vl_compositor_set_csc_matrix in va, xvmc. Setting them
    to opposite value 1.f and 0.f respectively won't effect the CSC
    conversion

    v2: -Squash 1,2 and 3 into one patch to avoid breaking build of
        other components. (Christian)
        -use ureg_swizzle. (Christian)
        -change name of the variables. (Christian)

    v3: -Squash all patches in one to avoid breaking of build. (Emil)
        -wrap functions properly. (Emil)
        -use 0.0f and 1.0f instead of 0.f and 1.f respectively. (Emil)

    v4: -Divide it in two patches one which introduces the functionality
	 and assigs dummy values to the changed functions and second which
	 implements the lumakey filter. (Christian)
	-use ureg_scalar instead ureg_swizzle. (Christian)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-09 14:23:07 +02:00
Jason Ekstrand
037ce5d734 i965: Emit surface states for extra planes prior to gen8
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier.  It all works, we just need to emit the states
for the extra planes.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-08 21:57:57 -07:00
Marc-André Lureau
dc81b3ad43 virgl: fix checking fences
When calling virgl_fence_wait() with timeout=0,
virgl_{drm,vtest}_resource_is_busy() is called. However, it returns TRUE
for a busy resource, whereace virgl_fence_wait() should return TRUE for
a completed (non-busy) resource.

This fixes running supertuxkart in a VM (I could not reproduce locally
with vtest though there is a similar fix)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 14:07:53 +10:00
Dave Airlie
15896a470b glsl/types: rename is_dual_slot_double to is_dual_slot_64bit.
In the future int64 support will have the same requirements.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 09:17:24 +10:00
Dave Airlie
45c901f7a3 st/glsl_to_tgsi: move to checking 64-bitness instead of double
This uses the new types interfaces to check for 64-bit types,
as futureproofing against int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:49 +10:00
Dave Airlie
bbbc45b8e1 st/glsl_to_tgsi: use enum glsl_base_type instead of unsigned
This is just some better type safety that I noticed while working
on 64-bit integer support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:49 +10:00
Dave Airlie
152f5eea62 mesa: use new 64-bit checks instead of explicit double checks.
This just moves to the new interfaces in advance of int64.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:47 +10:00
Dave Airlie
2df46519e4 glsl/link_varyings: switch to 64bit check instead of double.
This is prep work for int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:43 +10:00
Dave Airlie
35616a9e0e glsl: use new interfaces for 64-bit checks.
This is just prep work for int64 support, changing
places where 64-bit matters no doubles.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:19 +10:00
Dave Airlie
a82b8e8b36 compiler: use 64bit check for sizing instead of double check.
This just moves code to the new check in advance of int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:15 +10:00
Dave Airlie
246518154e compiler/types: add 64-bitness queries.
This adds an inline and type query for if a type is 64-bit.

Fow now this is equivalent to double, but int64 will change
this.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:04 +10:00
Adam Jackson
a1c5cd426c glapi/glx: Add overflow checks to the client-side indirect code
Coverity complains that the computed sizes can lead to negative lengths
passed to memcpy. If that happens we've been handed invalid arguments
anyway, so just bomb out.

The funky "0%s" is because the size string for the variable-length part
of the request is of the form "+ safe_pad() ...", and a unary + would
coerce the result to always be positive, defeating the overflow check.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-08 14:39:46 -04:00
Marek Olšák
26b69ad250 radeonsi: improve the computation and comment of scratch_waves
2% isn't much. If you think the number should be decreased, please speak up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
1d9c1d9386 radeonsi: print the number of spilled VGPRs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
2b18d67a1e gallium/radeon: remove dead code creating LLVMTargetMachine
This was for some old unsupported LLVM version.
Only si_create_context creates the target machine now.
r600g doesn't use this function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:42 +02:00
Marek Olšák
a343ab55f7 radeonsi: don't enable scratch just for SGPR spills
Diff from shader-db:
  Scratch: 3221504 -> 17408 (-99.46 %) bytes per wave

v2: add "break;"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:41 +02:00
Marek Olšák
55b097d004 st/mesa: try not to compile compute shader on the first use
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-08 19:23:41 +02:00
Marek Olšák
95288277d5 Revert "radeonsi: allow direct hw MSAA resolve for scanout surfaces"
This reverts commit ffd54d1936.

No, it doesn't work. The test case is "glxgears -samples 2".
2016-06-08 19:21:55 +02:00
Nicolai Hähnle
bd5c41fe5f st/mesa: directly compute level=0 texture size in st_finalize_texture
The width0/height0/depth0 on stObj may not have been set at this point.
Observed in a trace that set up levels 2..9 of a 2d texture, and set the base
level to 2, with height 1. This made the guess logic always bail.

Originally investigated by Ilia Mirkin, this patch gets rid of the somewhat
redundant storage of width0/height0/depth0 and makes sure we always compute
pipe texture sizes that are compatible with the base level image of the
GL texture.

Fixes the gl-1.2-texture-base-level piglit test provided by Brian Paul.

v2:
- try to re-use an existing pipe texture when possible
- handle a corner case where the base level is not level 0 and it is of
  size 1x1x1

v3:
- ptHeight = ptWidth in cube map 1x1 case (suggested by Brian)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-08 19:12:07 +02:00
Timothy Arceri
8c3ecde0e1 glsl: stop allocating memory for SSBOs and builtins
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.

These uniform types don't use this storage.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-08 13:19:32 +10:00
Ilia Mirkin
6e6fd911da st/mesa: use buffer usage history to set dirty flags for revalidation
We were previously unconditionally doing this for arrays and ubo's, and
ignoring texture/storage/atomic buffers. Instead use the usage history
to determine which atoms need to be revalidated.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-07 22:27:04 -04:00
Gurchetan Singh
d9546b0c5d i965: Integrate precise trig into configuration infrastructure
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put

<option name="precise_trig" value="true"/>

in the proper drirc file.

V2: Make option name more generic

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
2016-06-07 15:42:21 -07:00
Marek Olšák
f39439d166 radeonsi: re-enable PBO ReadPixels acceleration
disabled by 4f1cccf570

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 00:22:45 +02:00
Marek Olšák
7c6e88b643 radeonsi: allow MSAA resolving into a texture that has DCC enabled
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
9a472a3e0b gallium/radeon: move DCC clearing into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
ffd54d1936 radeonsi: allow direct hw MSAA resolve for scanout surfaces
No idea why this was disabled, but it works fine.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
4be46c7d9d radeonsi: don't allocate DCC for the temporary MSAA resolve surface
Allocating it has no effect, but it adds overhead (useless DCC clear).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
c06246501e radeonsi: don't enable DCC in the sampler if first_level doesn't have it
If first_level > 0 and DCC is disabled for that level, let's skip DCC
reads entirely.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
00389100b6 winsys/amdgpu: enable DCC for mipmapped textures
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
c65361763c gallium/radeon: don't disable DCC because of SDMA
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
2fd74a05bb radeonsi: don't flag renderbuffer feedback loop if DCC has just been disabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
aa7fe70443 radeonsi: add per-level dcc_enabled flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
60e93ddd06 radeonsi: compute DCC register parameters in si_emit_framebuffer_state
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
a01536a29f gallium/radeon: add an assertion checking the validity of PIPE_BIND_SCANOUT
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
d4d733e39d gallium/radeon: don't allocate DCC for non-renderable texture formats
R9G9B9E5 is the only uncompressed one hopefully.

This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Nicolai Hähnle
b42bc90b6a radeonsi: enable WQM in PS prolog when needed
WQM is needed when the PS prolog computes a VGPR that is consumed by a shader
with (implicit or explicit) derivatives.

Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be
effective (otherwise it's just a no-op).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Cc: 12.0 <mesa-dev@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 23:46:02 +02:00
Nicolai Hähnle
d3a584defe tgsi/scan: add uses_derivatives (v2)
v2:
- TG4 does not calculate derivatives (Ilia)
- also handle SAMPLE* instructions (Roland)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-07 23:45:17 +02:00
Nanley Chery
b7a0c0ec7f docs/devinfo: Expound on helpful extension tips
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Nanley Chery
9e7de50cab docs/devinfo: Update bullet in stale extension guide
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Nanley Chery
26b0f023d7 docs/devinfo: Add closing paragraph tag
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Tim Rowley
87f0a0448f swr: fix provoking vertex
Use rasterizer provoking vertex API.

Fix rasterizer provoking vertex for tristrips and quad list/strips.

v2: make provoking vertex tables static const

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-07 11:47:52 -05:00
Ilia Mirkin
c81b090c92 st/mesa: revalidate image atoms when a texture is updated
A texture may be redefined with _NEW_TEXTURE, which might have been
bound to a shader image slot. We have to revalidate the image atoms to
pick up on the new resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-07 10:18:34 -04:00
Ilia Mirkin
71ad8a173f gk104/ir: fix conditions for adding a texbar
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.

Based on an earlier patch by Samuel Pitoiset.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
2016-06-07 10:18:13 -04:00
Nicolai Hähnle
8239da28e8 radeonsi: keep track of dirty descriptor sets
Reduces CPU load for draw calls that change none or few of the descriptors.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:10 +02:00
Nicolai Hähnle
d152c73712 radeonsi: move si_descriptors into a per-context array
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:07 +02:00
Nicolai Hähnle
a29c4f9ebd radeonsi: pass shader stage to si_disable_shader_image
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:05 +02:00
Nicolai Hähnle
4e0fb72786 radeonsi: access descriptor sets via local variables
This will simplify moving them to a per-context array.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:02 +02:00
Nicolai Hähnle
ba4a2840c7 radeonsi: add si_set_rw_buffer to be used for internal descriptors
So that callers outside of si_descriptors.c need to worry less about the
details of descriptor handling.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:59 +02:00
Nicolai Hähnle
c615a055f4 radeonsi: pass shader stage to si_set_shader_image
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:57 +02:00
Nicolai Hähnle
e6612a3e68 radeonsi: pass shader stage to si_set_sampler_view
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:55 +02:00
Nicolai Hähnle
c32cd4b78d radeonsi: move descriptor set begin_new_cs handling into a separate function
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:39 +02:00
Nicolai Hähnle
031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00
Jason Ekstrand
d1e141a661 anv/entrypoints: Stop using the C preprocessor
Now that we emit guards for everything, we can just generate the files and
trust build flags to keep us safe.  This should also fix the tarball
problems.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Jason Ekstrand
d1a53f91ee anv/entrypoints: Emit #if guards for all platforms
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Haixia Shi
1ea233c6f3 platform_android: prevent deadlock in droid_swap_buffers
To avoid blocking other EGL calls, release the display mutex before
we enqueue buffer to android frameworks and re-acquire the mutex
upon return.

v2: moved lock/unlock inside droid_window_enqueue_buffer().

TEST=verify pinch zoom in Photos app no longer causes hangs

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Emil Velikov
b7f7ec7843 mesa: automake: distclean git_sha1.h when building OOT
In the case of out-of-tree (OOT) builds, in particular when building
from tarball, we'll end up with the file in both srcdir and builddir.

We want the former to remain intact (since we need it on rebuild) while
the latter should be removed otherwise `make distclean' gets angry at
us.

Ideally there'll be a solution that feels a bit less of a hack. Until
then this does the job exactly as expected.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:23 +01:00
Emil Velikov
2c424e00c3 mesa: automake: ensure that git_sha1.h.tmp has the right attributes
... when copied from git_sha1.h.

As the latter file can we lacking the write attribute, one should set it
explicitly. Otherwise we'll get a warning/failure at cleanup stage.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:21:46 +01:00
Emil Velikov
359d9dfec3 mesa: automake: add directory prefix for git_sha1.h
Otherwise the build will assume that we've talking about builddir, which
is not the case in the else statement.

Here the file is already generated and is part of the tarball.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:21:45 +01:00
Emil Velikov
1816c837c1 egl: android: don't add the image loader extension for !render_node
With earlier commit we introduced support for render_node devices, which
was couples with the use of the image loader extension.

As the work was inspired by egl/wayland we (erroneously) added the
extension for the !render_node path as well.

That works for wayland, as the implementations of the DRI2 and IMAGE
loader extensions converge behind the scenes. As that is not yet
the case for Android we shouldn't expose the extension.

Fixes: 34ddef39ce ("egl: android: add dma-buf fd support")

Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-07 12:21:45 +01:00
Marek Olšák
095803a37a gallium/radeon: add support for sharing textures with DCC between processes
v2: use a function for calculating WORD1 of bo metadata

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-07 11:12:26 +02:00
Marek Olšák
9e5b5fbde0 gallium/radeon: don't discard DCC if an external user can write to it
We don't import textures with DCC now, but soon we will.

v2: if we can't disable DCC for image writes, at least decompress DCC
    at bind time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-07 11:12:26 +02:00
Dave Airlie
c6b14bafa4 i915: fix typo CAP.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 18:31:14 +10:00
Jakob Sinclair
b450f29073 glsl: initialise pointer to NULL
Could cause issues if you tried to read from an uninitialised pointer.
This just initalises the pointer to null to avoid that being a problem.
Discovered by Coverity.

CID: 1343616

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-07 08:13:25 +02:00
Dave Airlie
c295923d13 i965/gen8: fix cull distance emission for tessellation shaders.
This fixes some cases of:
GL45-CTS.cull_distance.functional
on Skylake.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 11:52:17 +10:00
Ilia Mirkin
704bc0f0e9 nvc0: add support for VOTE tgsi opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
f64c36e2d7 st/mesa: expose GL_ARB_shader_group_vote when supported by backend
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
edfa7a4b25 gallium: add PIPE_CAP_TGSI_VOTE for when the VOTE ops are allowed
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
30684b50d7 gallium: add VOTE_* opcodes to implement GL_ARB_shader_group_vote
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:28 -04:00
Ilia Mirkin
5189f0243a mesa: hook up core bits of GL_ARB_shader_group_vote
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:48:46 -04:00
Kenneth Graunke
13b859de04 glsl: Make opt_copy_propagation_elements actually propagate into loops.
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.

The shader-db statistics are basically a wash:

   No change in instruction counts.

   total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
   cycles in affected programs: 2102646 -> 2097396 (-0.25%)
   helped: 48
   HURT: 83

I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-06 14:14:31 -07:00
Kenneth Graunke
0756e3a25c glsl: Make opt_copy_propagation actually propagate into loops.
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.

The shader-db statistics are not terribly impressive:

   total instructions in shared programs: 9008589 -> 9008613 (0.00%)
   instructions in affected programs: 4293 -> 4317 (0.56%)
   helped: 0
   HURT: 10

   total cycles in shared programs: 78550978 -> 78575760 (0.03%)
   cycles in affected programs: 655426 -> 680208 (3.78%)
   helped: 75
   HURT: 88

   GAINED: 2

Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one.  That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.

This patch is also useful for a couple of reasons:

1. It can clean up copies created by varying packing (previously, we
   couldn't if the uses were inside a loop).

   This fixes a bug when interpolateAt*() is used on a packed varying
   inside a loop: glsl_to_nir struggles to see through the extra copy
   and mistakenly believed the variable was not an input.

2. It will help propagate uniform array access created by
   lower_const_array_to_uniforms().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-06 14:14:31 -07:00
Samuel Pitoiset
08ddfe7b2f nv50/ir: use round toward 0 when converting doubles to integers
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.

This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 22:56:04 +02:00
Marek Olšák
00e6899ae5 gallium/radeon: don't re-set BO metadata after CMASK deallocation
CMASK has no effect on metadata, because it's not sharable.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
589d6b58c3 st/mesa: change SQRT lowering to fix the game Risen
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94627
(against nouveau)

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-06 22:50:55 +02:00
Marek Olšák
991cbfcb14 radeonsi: add a performance tweak for 4 SE parts
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
2802310c25 radeonsi: simplify PRIMGROUP_SIZE computation for tessellation
Ported from Vulkan.

v2: keep the comment

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
014c8ec770 r600g: use hw MSAA resolve for non-trivial resolves
This improves MSAA resolve performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
6b449783f6 radeonsi: use hw MSAA resolve for non-trivial resolves
This improves MSAA resolve performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Dave Airlie
07403014c3 mesa/program_resource: return -1 for index if no location.
The GL4.5 spec quote seems clear on this:
"The value -1 will be returned by either command if an error occurs,
if name does not identify an active variable on programInterface,
or if name identifies an active variable that does not have a valid
location assigned, as described above."

This fixes:
GL45-CTS.program_interface_query.output-built-in

[airlied: use _mesa_program_resource_location_index as
suggested by Eduardo]
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 06:10:19 +10:00
Nicolai Hähnle
ec2b52e2d9 radeonsi: set descriptor dirty mask on shader buffer unbind
Found randomly while skimming the code. This might have caused VM faults in
robustness tests.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 21:43:18 +02:00
Nicolai Hähnle
0f916d4ca7 st/mesa: fix resource leak in try_pbo_readpixels
Found by inspection after seeing
https://bugs.freedesktop.org/show_bug.cgi?id=96343

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 21:42:27 +02:00
Charmaine Lee
627e975896 tgsi: fix mixed data type comparison in tgsi_point_sprite.c
Cast the unsigned semantic index to integer datatype before comparing
to max_generic, otherwise, max_generic which is initialized to -1
will be converted to unsigned int before the comparison, causing a wrong
semantic index to be assigned to a shader output.

Fixes the assert running TurboCAD_gl.trace. (VMware bug 1667265)

Also tested with glretrace, mesa demos pointblast, spriteblast and pointcoord.

v2: use the original max_generic variable but add the (int) cast
    to the semantic index, as suggested by Brian.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-06 10:20:45 -06:00
Charmaine Lee
304b5a1446 svga: print shader linkage info when tgsi debug bit is on
When TGSI debug flag is enabled, print the shader linkage info as well.

Tested with mesa demos with SVGA_DEBUG=tgsi

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-06 10:20:45 -06:00
Ilia Mirkin
4f1cccf570 st/mesa: check shader image format support before using PBO download
ARB_shader_image_load_store only requires a very fixed list of formats
to be supported, while textures may be in all kinds of formats, like
BGRA which are presently not supported on at least Kepler.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 12:05:59 -04:00
Lars Hamre
4163c71010 tgsi: use truncf in micro_trunc
Switches to using truncf in micro_trunc.

Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-trunc-float
fs-trunc-vec2
fs-trunc-vec3
fs-trunc-vec4
vs-trunc-float
vs-trunc-vec2
vs-trunc-vec3
vs-trunc-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-trunc-float
gs-trunc-vec2
gs-trunc-vec3
gs-trunc-vec4

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-06 15:56:28 +02:00
Samuel Iglesias Gonsálvez
2b648ec17c i965/gs/scalar: Fix load input for doubles
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Samuel Iglesias Gonsálvez
2d6f82a294 i965/fs: fix offset when loading double vector input varyings
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.

const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Samuel Iglesias Gonsálvez
cb30727648 i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Dave Airlie
4c86399378 glsl: geom shader max_vertices layout must match.
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."

Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 18:02:19 +10:00
Jason Ekstrand
ffcef720b7 anv/pipeline: Add support for caching the push constant map
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-06-06 00:44:32 -07:00
Dave Airlie
78659ade40 glsl: use enum glsl_interface_packing in more places. (v2)
Although the glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.

This makes things a bit nicer in gdb, and improves type
safety.

v2: fix a few pieces of interface I missed that caused some
piglit regressions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-06 15:58:37 +10:00
Dave Airlie
ff2e569153 i965: don't use NumLayers for 3D textures.
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.

This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 13:07:07 +10:00
Dave Airlie
1f66a4b689 glsl: for anonymous struct matching use without_array() (v3)
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().

Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in

v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:41 +10:00
Dave Airlie
6702c15810 glsl/ast: don't crash when func_name is NULL
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types

If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:30 +10:00
Dave Airlie
4336196b7f glsl: handle ast_aggregate in has_sequence_subexpression. (v2)
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };

The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.

v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:19 +10:00
Kenneth Graunke
f657a59d98 mesa: Try to unbreak the MSVC build.
PATH_MAX is apparently not a thing on Windows.  Borrow the hack from
pipe_loader.c to try and make this work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-05 16:32:08 -07:00
Kenneth Graunke
c417c0c9c3 mesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.
This writes linked shader programs to .shader_test files to
$MESA_SHADER_CAPTURE_PATH in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).

It supports both GLSL shaders and ARB programs.  All stages that
are linked together are written in a single .shader_test file.

This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly.  It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.

We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
  all linked shaders in one file), at draw time (not link time),
  with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
  MESA_SHADER_READ_PATH) also uses separate files per shader stage,
  but allows reading in files to replace an app's shader code.

v2:  Dump ARB programs too, not just GLSL.
v3:  Don't dump bogus 0.shader_test file.
v4:  Add "GL_ARB_separate_shader_objects" to the [require] block.
v5:  Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6:  Don't hardcode /tmp/mesa.
v7:  Fix memoization of getenv().
v8:  Also print "SSO ENABLED" (suggested by Timothy).
v9:  Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
     _mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-05 13:48:57 -07:00
Ilia Mirkin
092ec3920f nv50,nvc0: fix BGR10_A2UI vertex format
This is mostly academic as this is not reachable from GL, which only has
the packed RGB10_A2UI vertex format.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 15:13:46 -04:00
Samuel Pitoiset
be365f34f0 nvc0: do not clear surfaces bins in the validate function
We should not call nouveau_bufctx_reset() inside a validate function.
This only affects Fermi where images are aliased between 3D and CP.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 19:02:59 +02:00
Samuel Pitoiset
43d3ecfb33 nvc0: re-validate images after launching a grid on Fermi
Images invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a computer shader
to help in fixing 3D<->CP interaction.

However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.

This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 18:48:02 +02:00
Marek Olšák
3b44864ab7 radeonsi: fix images with level > 0
This should fix spec@arb_shader_image_load_store@level.

Broken by:
    Commit: 95c5bbae66
    radeonsi: set some image descriptor fields at bind time

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-05 17:00:14 +02:00
Ilia Mirkin
fd6bbc2ee2 nvc0: reduce overhead from always marking images dirty
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
0f673db6f0 nvc0: reduce overhead from always marking buffers dirty
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
e8ee161b16 nvc0: fix memory barrier flag handling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
29abbeecd8 nvc0: mark bound buffer range valid
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Dave Airlie
f018456901 anv/entrypoints: don't go using wayland/xcb unless they are configured
The fix in:
anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards

breaks things if wayland headers aren't installed.

Separate things out properly to avoid that problem.

[airlied: fixed up to put in pre-existing sections].
Reported-by: Arjan van de Ven
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-05 07:03:12 +10:00
Marek Olšák
d5491a81ff gallium/radeon: don't use the DMA ring for pipelined buffer uploads
Submitting a DMA IB flushes the GFX IB and all GPU caches.

Vedran Miletić said:
  "On Tonga 380X, this improves The Talos Principle from 8.3 fps to 28.3 fps
   (all graphics settings Ultra, 4xAA, 1080p resolution with downsampling
   from 1200p)."

Some anonymous dude said:
   R9 390 results:
      Tomb Raider (normal settings): 80 -> 88 FPS
      Talos Principle (custom settings): 23 -> 56 FPS
      Metro Last Light Redux (default benchmark settings): 39 -> 40 FPS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Vedran Miletić <vedran@miletic.net>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
9c35ec2042 r600g: don't flush caches when binding shader resources
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
eff94af794 r600g: only do necessary cache flushes in cp_dma_copy_buffer
The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
9e62012c30 r600g: only do necessary cache flushes in cp_dma_clear_buffer
The main impact is that fast color clear doesn't flush TC, CONST, DB.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
c92a3ae7e9 r600g: remove a CP DMA workaround that's not needed anymore
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
5ea5ed6050 r600g: fix CP DMA hazard with index buffer fetches (v3)
v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
    otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ade16e1f5d r600g: properly sync CP with CP DMA on R6xx
This will allow removing useless cache & IB flushes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
7746903d3a r600g: write WAIT_UNTIL in the correct place
This has been wrong all along. Fixing this will allow removing useless
cache flushes.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ee0c96c11e gallium/radeon: rename allocator_so_filled_size -> allocator_zeroed_memory
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ada3d8f31e gallium/u_suballoc: allow different alignment for each allocation
Just move the alignment parameter from u_suballocator_create
to u_suballocator_alloc.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Jason Ekstrand
441194edd9 anv/blit: Use CLAMP_TO_EDGE for scaled blits
When upscaling you can end up interpolating between the edge pixel and one
past the edge.  Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case.  This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*

Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
9313a56816 anv/copy: Account for the anv_surface.offset when creating a blit2d_surf
This was causing problems if the user tried to copy to/from the stencil
portion of a combined depth/stencil image.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
526a8de22d nir/spirv: Make a decoration switch complete
Getting rid of the default case makes the compiler warn if we are missing
cases.  While we're here, we also add the one missing case.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
62c6e94bd6 nir/spirv: Make unhandled decorations and capabilities non-fatal
glslang frequently throw bogus decorations into shaders.  While we are free
to assert-fail, it's a bit nicer to the application to just warn.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
ed14d21d04 nir/spirv: Add a way to print non-fatal warnings
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
2e46a5d155 nir/spirv: Add string lookup tables for a couple of SPIR-V enums
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
5a1e56f344 nir/spirv: Complete the list of capabilities
Previously we supported a subset of capabilities and just left a default
case for the others.  It's time to stop being lazy and actually audit the
capabilities.  This should bring them up-to-date with reality.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
9fa958e95b anv/pipeline: Add support for early depth stencil
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
66bd2e1133 mesa: Get rid of _mesa_active_fragment_shader_has_side_effects
It is no longer used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
35bf4d9dc2 i965/ps_state: Use wm_prog_data.has_side_effects
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
3fb289f957 i965/fs Add a wm_prog_data bit for has_side_effects
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
4d3b8318a7 nir/info: Get rid of uses_interp_var_at_offset
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
56a178922f anv/pipeline: Silently pass tests if depth or stencil is missing
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
bc7f7e1953 anv/pipeline: Unify gen7/8 emit_ds_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
fdc3c5dd05 genxml/gen6,7,75: s/BackFace/Backface
This is more consistent with gen8+

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
1f7b54ed29 nir/spirv: Handle the WorkgroupSize builtin decoration
This fixes the 7 dEQP-VK.pipeline.spec_constant.compute.local_size.* tests
in the latest dev version of the Vulkan CTS.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
b26cdd65e8 nir/spirv: Use breaks instead of returns in constant handling
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
a19ae36ce5 anv/pipeline: Refactor specialization constant handling a bit
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
45542f554c nir/lower_indirect_derefs: Use the direct array deref for recursion
This fixes about 100 of the new Vulkan CTS tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
59f06ac389 anv/clear: Handle ClearImage on 3-D images
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Francisco Jerez
7244dc1e06 Revert "i965/fs: Allow scalar source regions on SNB math instructions."
This reverts commit c1107cec44.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode.  Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):

   es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
   es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
   es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_float_frag_xvary
   es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
   es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
   es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-06-03 18:47:29 -07:00
Francisco Jerez
a2135c6fd9 i965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 18:38:51 -07:00
Emil Velikov
7a3a0d9212 anv: add the X related and Wayland CFLAGS to VULKAN_ENTRYPOINT_CPPFLAGS
Otherwise we will fail to find the headers in some scenarios.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2016-06-04 00:52:00 +01:00
Emil Velikov
a1256c0ea7 nir: automake: add nir_search_helpers.h to the sources list(s)
Fixes: dfbae7d64f ("nir/algebraic: support for power-of-two
optimizations")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-04 00:18:40 +01:00
Rob Clark
1535519e51 freedreno/ir3: do idiv lowering after main opt loop
Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-03 16:05:03 -04:00
Rob Clark
dfbae7d64f nir/algebraic: support for power-of-two optimizations
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two.  Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 16:05:03 -04:00
Nicolai Hähnle
a64c7cd2ba radeonsi: mark buffer texture range valid for shader images
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.

This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.

Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-03 14:11:05 +02:00
Marek Olšák
8c361e84ad Revert "egl: Check if API is supported when using eglBindAPI."
This reverts commit e8b38ca202.

It broke Glamor for Gallium at least.
2016-06-03 11:33:45 +02:00
Alejandro Piñeiro
9bdbb9c0e0 mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment
For ES 3.0 NUM_SAMPLE_COUNTS spec points that some formats will be
always zero. But on ES 3.1 can be different to zero.

The current code is correctly checking exactly against version 3.0,
but the comment only mentions 3.0 spec. It is clearer mentioning both.

v2: better wording on the comment (Ian Romanick)

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-03 07:38:25 +02:00
Dave Airlie
d10ae20b96 mesa/get: return correct value for layer provoking vertex.
This fixes:
GL45-CTS.geometry_shader.layered_rendering.layered_rendering

on Skylake.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 12:33:34 +10:00
Plamena Manolova
0b67efaed2 egl: Account for default values of texture target and format
When validating attributes during surface creation we should account
for the default values of texture target and format (EGL_NO_TEXTURE)
since the user is not obligated to explicitly set both via the
attribute list passed to eglCreatePbufferSurface.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-06-02 16:07:31 -07:00
Samuel Pitoiset
28590eb949 nvc0: mark buffer texture range valid for shader images
Loosely based on radeonsi (Thanks to Nicolai).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-06-03 00:12:23 +02:00
Mauro Rossi
278c2212ac isl: add support for Android libmesa_isl static library
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.

Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}

Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-02 22:31:44 +01:00
Mauro Rossi
4143245c23 android: libmesa_glsl: add a dependency on libmesa_nir static
Fixes the following building error:

target  C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
                 from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-02 22:31:00 +01:00
Emil Velikov
af1a0ae8ce isl: automake: don't include isl_format_layout.c in two lists.
Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.

v2: Drop unrelated vulkan hunk (Jason).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 22:26:04 +01:00
Emil Velikov
af2637aa32 automake: bring back the .PHONY git_sha1.h.tmp rule
With earlier commit 3689ef32af ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.

The former is used to ensure that the header is regenerated when on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.

Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-02 22:23:12 +01:00
Kenneth Graunke
f74a29188c i965: Add _NEW_POINT to a couple of comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-02 14:11:55 -07:00
Charmaine Lee
0cf0d7c02e svga: allow copy box in svga_transfer_dma_band()
Instead of just allow copy of a rectangle in svga_transfer_dma_band(),
this patch allows it to copy a box, hence allows copy a 3d texture
in one transfer.

Fixes black screen in running Heaven after commit fb9fe35. (Bug 1663282)

Tested with Heaven, glretrace, piglit.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 15:03:41 -06:00
Rob Clark
94d8fbd217 freedreno: fix bad bitshift warnings
Coverity doesn't realize idx will never be negative.  Throw in some
assert()s to help it out.

(Hopefully assert() isn't getting compiled out for coverity build.. but
there seems to be just one way to find out.  We might have to change
these to assume())

Fixes CID 1362442, 1362443

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 16:29:32 -04:00
Rob Clark
676c77a923 freedreno: assume builtin shaders do compile
Maybe we should switch to ureg to build the builtin shaders.  But at any
rate, if they fail to compile it is because someone messed them up (or
changed TGSI syntax?).

CID 1362444

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 16:29:32 -04:00
Francisco Jerez
060c8d245d i965/fs: Reindent emit_zip().
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:24:48 -07:00
Francisco Jerez
7aa76d66a1 i965/fs: Skip SIMD lowering destination zipping if possible.
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.

v2: Don't handle partial source-destination overlap for simplicity
    (Jason).  No instruction count regressions with respect to v1 in
    either shader-db or the few FP64 shader_runner test-cases with
    partial overlap I've checked manually.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:24:48 -07:00
Anuj Phogat
75da9c9933 blorp: Fix 16x multisample scaled blits
Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled
(with added 16x sample support) now passes with this patch.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:21:26 -07:00
Anuj Phogat
59c19b7687 meta: Fix indentation in shader code
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-06-02 13:21:26 -07:00
Dave Airlie
af7bf610cf mesa/copyimage: report INVALID_VALUE for missing cube face
The specs says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.

This fixes:
GL45-CTS.copy_image.non_existent_mipmap

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 06:08:44 +10:00
Dave Airlie
c0856eacf1 mesa/copyimage: fix num samples check to handle renderbuffers.
This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.

This fixes:
GL45-CTS.copy_image.invalid_target

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 06:08:22 +10:00
Rob Clark
80c2886033 freedreno/a4xx: silence coverity warning
CID 1362451

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
9b854ce53c freedreno/a3xx+a4xx: fix potential null ptr deref
Coverity spotted the a3xx case (not sure why not the a4xx).

CID 1362452

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
27a97097e1 freedreno/ir3: fix coverity warning
CID 1362453

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
374ad2e2bd freedreno/ir3: use nir_shader_get_entrypoint() helper
Should also fix coverity warning: CID 1362454

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
df64cd6814 freedreno/a4xx: fix incorrect enum type
a4xx has it's own enum, different from a2xx/a3xx.

Spotted by coverity: CID 1362458, 1362459

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
1632b0eac0 freedreno: fix coverity negative array index warning
Never can happen, since query would not have been created in the first
place if pidx(query_type) return negative.  Lets let coverity realize
this.

CID 1362460

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
ba452d43e0 freedreno: fix dereference before null check
ptr can actually never be null so just drop the check.

CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
228b2b36f4 gallium/util: remove u_staging
Unused, and fixes a couple of coverity warnings: CID 1362171, 1362170

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-06-02 15:44:07 -04:00
Rob Clark
18fb922faa freedreno/a3xx: only update/emit bordercolor state when needed
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
11f0652404 freedreno/a4xx: only update/emit bordercolor state when needed
I noticed in stk that it was contributing to a lot of overhead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Matt Turner
0d81a684c1 i965: Add missing types to type_sz().
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-02 11:34:09 -07:00
Nanley Chery
c06cef7f9b mesa/extensions: Fix ES1 extension reporting
Commit eda15abd84 , unintentionally
advertised these extensions in ES1 contexts. Undo this error.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-02 10:46:59 -07:00
Plamena Manolova
e8b38ca202 egl: Check if API is supported when using eglBindAPI.
According to the EGL specifications before binding an API
we must check whether it's supported first. If not eglBindAPI
should return EGL_FALSE and generate a EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 07:45:19 -07:00
Eric Engestrom
17f4c723eb st/osmesa: remove double-write (overwriting)
These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.

CoverityID: 1296205

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 07:05:05 -06:00
Nayan Deshmukh
6c9a352d79 st/vdpau: check for null pointer in get/put bits.
Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-02 09:28:48 +02:00
Christian König
b3e75c3997 radeon/uvd: fix the H264 level for Tonga v2
We support 5.2 for a while now.

v2: we even support 5.2 for H264, 5.1 is for HEVC.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-02 09:27:57 +02:00
Alejandro Piñeiro
b48c42cd1f mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-02 08:54:17 +02:00
Alejandro Piñeiro
c1ceee6cc9 i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-02 08:54:10 +02:00
Alejandro Piñeiro
58617bcebe i965/eu: use simd8 when exec_size != EXECUTE_16
Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-06-02 08:08:10 +02:00
Jordan Justen
0a3acff5b5 i965: Remove old CS local ID handling
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
b1f22c6317 i965: Enable cross-thread constants and compact local IDs for hsw+
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
 * Minimize size of patch that switches from the old local ID layout
   to the new layout (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
3ba9594f32 anv: Support new local ID generation & cross-thread constants
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
30685392e0 i965: Support new local ID push constant & cross-thread constants
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
d437798ace i965: Add CS push constant info to brw_cs_prog_data
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.

When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.

For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.

v4:
 * Support older local ID push constant layout as well. (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
1b79e7ebbd i965: Store number of threads in brw_cs_prog_data
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
3ef0957dac i965: Add nir based intrinsic lowering and thread ID uniform
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.

v2:
 * Create variable during lowering pass. (Ken)

v3:
 * Don't create a variable, but instead just insert an intrisic call
   to load a uniform from the allocated location. (Jason)

v4:
 * Don't run this pass if thread_local_id_index < 0

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
04fc72501a i965: Put CS local thread ID uniform in last push register
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)

v2:
 * Add variable in intrinsics lowering pass
 * Make sure the ID is pushed last in assign_constant_locations, and
   that we save a spot for the ID in the push constants

v3:
 * Simplify code based with Jason's suggestions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
fa279dfbf0 i965: Add uniform for a CS thread local base ID
v4:
 * Force thread_local_id_index to -1 for now, and have
   fs_visitor::setup_cs_payload look at thread_local_id_index. This
   enables us to more easily cut over from the old local ID layout to
   the new layout, as suggested by Jason.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
8f48d23e0f i965: Add nir channel_num system value
v2:
 * simd16/32 fixes (curro)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
6f316c9d86 nir: Make lowering gl_LocalInvocationIndex optional
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
7b9def3583 glsl: Add glsl LowerCsDerivedVariables option
v2:
 * Move lower flag to context constants. (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jason Ekstrand
1205999c22 i965/fs: Copy the offset when lowering logical pull constant sends
This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 16:00:44 -07:00
Dave Airlie
8d4f4adfbd glsl/distance: make sure we use clip dist varying slot for lowered var.
When lowering, we always want to use the clip dist varying.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-02 07:09:21 +10:00
Nicolai Hähnle
c7877b9dab winsys/amdgpu: decay max_ib_size over time
So that memory use will eventually decrease again after a temporary peak.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
6aff6377b1 winsys/amdgpu: implement IB chaining on the gfx ring
As a consequence, CE IB size never triggers a flush anymore.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
45be461f55 winsys/amdgpu: consolidate IB size management in amdgpu_ib_finalize
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
89ba076de4 radeon/winsys: introduce radeon_winsys_cs_chunk
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
a7c26bfc0c radeonsi/sid: add packet definitions for IB chaining
While we're at it, add packet printing in si_debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
83a01cb498 winsys/amdgpu: start with smaller IBs, growing as necessary
This avoids allocating giant IBs from the outset, especially for CE and DMA.

Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.

With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.

v2:
- clean up buffer_size calculation

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
f80c6abb9e winsys/amdgpu: add amdgpu_ib and amdgpu_cs_from_ib helper functions
The latter function allows getting the containing amdgpu_cs from any IB
(including non-main ones).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
9e5ed559ba winsys/amdgpu: extract IB big buffer allocation for re-use
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
9db851b5ee winsys/amdgpu: add IB buffer in amdgpu_get_new_ib
Adding the buffer when we start using it for the IB makes the logic for
chaining a bit simpler.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
d6211a61b0 gallium/radeon: use cs_check_space throughout
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
46ad3561be radeon/winsys: add cs_check_space
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
92d5d97b10 winsys/amdgpu: simplify interface of amdgpu_get_new_ib
We'll want to have an amdgpu_cs pointer for future changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
8396ab4241 winsys/amdgpu: add amdgpu_cs_has_user_fence
v2: style change

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Kenneth Graunke
25e1b8d366 i965: Fix isoline reads in scalar TES.
Isolines aren't reversed.  commit 5b2d8c2273 fixed this for the vec4
TES backend, but not the scalar one.

Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-01 13:46:09 -07:00
Nicolai Hähnle
ed0e9862c5 st/mesa: implement PBO downloads for ReadPixels
v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of
    the texture targets, but all hardware that has shader images should also
    have this cap.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:51 +02:00
Nicolai Hähnle
f3b62d4c74 st/mesa: hook up a no-op try_pbo_readpixels
For better bisectability given that the order of some of the fallback tests
in the blit path are rearranged.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:48 +02:00
Nicolai Hähnle
1cb4be94ae st/mesa: add layer_offset to PBO fragment shader
This will be used to select a slice of a 3D texture.

v2: fix a comment (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:43 +02:00
Nicolai Hähnle
2bf6dfac8a st/mesa: create PBO download fragment shaders
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:40 +02:00
Nicolai Hähnle
852d3fcd3b st/mesa: add PBO download enable bit and fragment shaders
For downloads, the fragment shader must know the source texture target, hence
we may cache multiple fragment shaders.

v2: break long line (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:34 +02:00
Nicolai Hähnle
581c001532 st/mesa: move shareable parts of PBO upload state and draw to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:31 +02:00
Nicolai Hähnle
e16800226e st/mesa: move PBO buffer address calculation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:28 +02:00
Nicolai Hähnle
21e069f7d4 st/mesa: move PBO upload fs creation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:26 +02:00
Nicolai Hähnle
979688a027 st/mesa: rename pbo_upload to pbo
At the same time, rename members that are upload-specific to say so.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:23 +02:00
Nicolai Hähnle
be82065fbe st/mesa: move PBO vertex and geometry shader creation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:20 +02:00
Nicolai Hähnle
4ecc32b0e1 st/mesa: begin moving PBO functions into their own file
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:18 +02:00
Nicolai Hähnle
d9893feb2c gallium/cso: allow saving the first fragment shader image slot
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:15 +02:00
Nicolai Hähnle
fc0352ff9c gallium/u_inlines: allow NULL src in util_copy_image_view
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:12 +02:00
Nicolai Hähnle
57f576f1fb gallium: add PIPE_BARRIER_ALL define
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:36:48 +02:00
Ian Romanick
a428c955ce glsl: Use Geom.VerticesOut == -1 to specify unset
Because apparently layout(max_vertices=0) is a thing.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:39 -07:00
Ian Romanick
b27dfa5403 i965: If control_data_header_size_bits is zero, don't do EndPrimitive
This can occur when max_vertices=0 is explicitly specified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:39 -07:00
Ian Romanick
049bb94d2e mesa: Fix bogus strncmp
The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned.  That string has length 3.  strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.

v2: Not the same as strncmp(..., 3).  Noticed by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:25 -07:00
Marek Olšák
12740efd29 radeonsi: set correct stencil tile mode for texturing
Sadly, this doesn't affect SI and VI in any way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
ea68215c54 winsys/amdgpu: set flags correctly when allocating depth-stencil buffers
This mimics Vulkan. It also documents how to fix stencil texturing.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
532a5af47f gallium/radeon: lower memory usage during texture transfers
This improves throughput by keeping TTM overhead down.

Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.

v2: use gart_size / 4 as the threshold

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
614e3c6272 gallium/radeon: invalidate busy linear textures for whole-texture uploads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
fc1479a954 gallium/radeon: degrade tiled textures mapped often to linear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
9927c8138a gallium/radeon: clean up and better comment use_staging_texture
Next commits will add other things around this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
b033584299 radeonsi: set some colorbuffer register fields at emit time
to allow reallocating the texture storage with different parameters

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
30b2b860b0 radeonsi: implement global resetting of texture descriptors
it will be used by texture reallocation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
28de7aec0c radeonsi: move code for setting one shader image into separate function
v2: fix set_shader_images(..., NULL). Found by Christoph Haag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
95c5bbae66 radeonsi: set some image descriptor fields at bind time
mainly the fields that can change by reallocating a texture and changing
the tile mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
ef765d0789 gallium/radeon: strenghten some checking for DMA preparation
Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.

v1.1: fix the comment about DCC

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
9d881cc0ac gallium/util: add util_texrange_covers_whole_level from radeon
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Ilia Mirkin
ca135a2612 nir: allow sat on all float destination types
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 10:44:40 -04:00
Alex Deucher
bd85e4a041 radeonsi: fix the raster config setup for 1 RB iceland chips
I didn't realize there were 1 and 2 RB variants when this code
was originally added.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
2016-06-01 09:59:57 -04:00
Dave Airlie
6400144041 mesa/sampler: fix error codes for sampler parameters.
The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.

Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 17:01:19 +10:00
Dave Airlie
0ebf4257a3 glsl: define some GLES3 constants in GLSL 4.1
The GLSL 4.1 spec adds:
gl_MaxVertexUniformVectors
gl_MaxFragmentUniformVectors
gl_MaxVaryingVectors

This fixes:
GL45-CTS.gtf31.GL3Tests.uniform_buffer_object.uniform_buffer_object_build_in_constants

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 17:01:13 +10:00
Topi Pohjolainen
6ca118d2f4 i965: Add norbc debug option
This INTEL_DEBUG option disables lossless compression (also known
as render buffer compression).

v2: (Matt) Use likely(!lossless_compression_disabled) instead of
           !likely(lossless_compression_disabled)
    (Grazvydas) Update docs/envvars.html

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-01 09:16:36 +03:00
Topi Pohjolainen
30e9e6bd07 i965/gen9: Configure rbc buffers as plain for non-rbc tex views
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.

Fortunately none of tracked benchmarks showed a regression with
this.

v2 (Matt): Add missing space

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-01 09:16:36 +03:00
Kenneth Graunke
a3dc99f3d4 i965: Fix the passthrough TCS for isolines.
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants.  We at least
need to initialize them to zero.  We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).

Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.

(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2016-05-31 23:09:13 -07:00
Dave Airlie
ebb81cd683 i965/xfb: skip components in correct buffer.
The driver was adding the skip components but always for buffer 0.

This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 15:53:00 +10:00
Dave Airlie
1fe7bbb911 glsl/linker: fix multiple streams transform feedback.
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.

caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.

The problem was it was using the skip components varying to set
the stream id, when it should wait until a varying was written,
this just adds the varying checks in the right place.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 13:30:41 +10:00
Dave Airlie
e891f7cf55 mesa/bufferobj: use mapping range in BufferSubData.
According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.

So we should use the if range is mapped path.

This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 13:30:40 +10:00
Ilia Mirkin
18d11c9989 nv50/ir: fix error finding free element in bitset in some situations
This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).

Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-05-31 23:25:51 -04:00
Ilia Mirkin
d873608bcf nv50/ir: print relevant file's bitset when showing RA info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-31 23:25:50 -04:00
Timothy Arceri
98d40b4d11 Revert "glsl: fix xfb_offset unsized array validation"
This reverts commit aac90ba292.

The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom

Also the CTS test it was meant to fix seems like it may be bogus.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 10:33:57 +10:00
Francisco Jerez
c1107cec44 i965/fs: Allow scalar source regions on SNB math instructions.
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:

 "The supported regioning modes for math instructions are align16,
  align1 with the following restrictions:
   - Scalar source is supported.
  [...]
   - Source and destination offset must be the same, except the case of
     scalar source."

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-31 15:57:41 -07:00
Francisco Jerez
06d8765bc0 i965/fs: Fix constant combining for instructions that cannot accept source mods.
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
303ec22ed6 i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
4fe4f6e8a7 i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already.  The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.

Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa5 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
1898673f58 i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
485fbaff03 i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction.  In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
4b0ec9f475 i965/fs: Fix compute-to-mrf VGRF region coverage condition.
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ.  Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
bb61e24787 i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:56:54 -07:00
Francisco Jerez
88f380a2dd i965/fs: Teach regions_overlap() about COMPR4 MRF regions.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:22:04 -07:00
Dylan Baker
604010a7ed Don't use python 3
Now there are not files that require python 3, so for now just remove
the python 3 dependency and use python 2. I think the right plan is to
just get all of the python ready for python 3, and then use whatever
python is available.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
ab31817fed genxml: change chbang to python 2
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
12c1a01c72 genxml: use the isalpha method rather than str.isalpha.
This fixes gen_pack_header to work on python 2, where name[0] is unicode
not str.

Signed-off-by: Dylan Bake <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
a45a25418b genxml: require future imports for python2 compatibility.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
e5681e4d70 genxml: mark re strings as raw
This is a correctness issue.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
de2e9da2e9 genxml: Make classes descendants of object
This is the default in python3, but in python2 you get old style
classes. No one likes old-style classes.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
9f50e3572c genxml: mark gen_pack_header.py as encoded in utf-8
There is unicode in this file, and I'm actually surprised that the
python interpreter hasn't gotten grumpy.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Bas Nieuwenhuizen
35818129a6 radeonsi: Decompress DCC textures in a render feedback loop.
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:04 +02:00
Bas Nieuwenhuizen
cbe3421f05 radeonsi: Add counter to check if a texture is bound to a framebuffer.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:00 +02:00
Rhys Kidd
8cb74dd4e6 vc4: Fix compiler warnings in fail_instr path of QIR validate pass
Introduced in 8e2d0843c0.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-31 10:56:02 -07:00
Emil Velikov
b8e1f59d62 anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards
The generated sources should follow the example set by the vulkan
headers and our non-generated code. Namely: the code for all supported
platforms should be available, each one guarded by its respective
VK_USE_PLATFORM_*_KHR macro.

v2: Reword commit message.

Cc: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96285
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1 over IRC)
2016-05-31 18:41:28 +01:00
Brian Paul
6bea33008e svga: change enum pipe_resource_usage back to unsigned
This parameter is actually a bitmask of PIPE_TRANSFER_x flags.
Change it back to a simple unsigned type.  IIRC, some compilers
complain about masks of enum values.  Also, this make the function
signature match u_resource_vtbl::transfer_map() again.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-31 10:20:36 -06:00
Marek Olšák
7ca55d2da8 radeonsi: fix CP DMA hazard with index buffer fetches
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-31 16:59:32 +02:00
Marek Olšák
d427110882 r600g: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:55 +02:00
Marek Olšák
d5882bb0df radeonsi: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:54 +02:00
Marek Olšák
921ab0028e gallium/u_blitter: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:53 +02:00
Marek Olšák
8a10192b4b mesa: fix crash in driver_RenderTexture_is_safe
This just fixed the crash with the apitrace in bug report.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95246

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:43:34 +02:00
Marek Olšák
fc4896e686 radeonsi: don't flush TC at the end of IBs on DRM >= 3.2.0
It's not needed since it was fixed in the kernel.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-31 16:41:22 +02:00
Jakob Sinclair
877c00c653 gallium/radeon: fixed division by zero
Coverity is getting a false positive that a division by zero can occur
here. This change will silence the Coverity warnings as a division by zero
cannot occur in this case.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 12:51:20 +02:00
Eric Engestrom
35fd5282ea st/glsl_to_tgsi: prevent infinite loop
`unsigned j` would never fail `j >= 0`, leading to an infinite loop as
`j--` wraps around.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 11:46:30 +02:00
Dave Airlie
f87352d769 glsl/images: bounds check image unit assignment
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms, we reject this later,
but not until after we trash the contents of the
struct gl_shader.

Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-31 10:41:44 +10:00
Ilia Mirkin
4b1a167a2b nvc0/ir: fix spilling predicates to registers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-05-30 18:15:14 -04:00
Ilia Mirkin
1f895caba0 nvc0/ir: limit max number of regs based on availability in SM
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-30 18:15:10 -04:00
Ilia Mirkin
27a51ff9b4 nv50/ir: record number of threads in a compute shader
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-30 18:14:55 -04:00
Pierre Moreau
ae70879530 nv50/ir: Add missing handling of U64/S64 in inlines
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-30 16:12:12 -04:00
Emil Velikov
9074470d7b docs: rename release notes to 12.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7ad2cb6f08)
2016-05-30 20:33:30 +01:00
Ilia Mirkin
68d135011b docs: move nvc0 out of individual lines of GL 4.2, 4.3, ES 3.1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-30 15:18:32 -04:00
Emil Velikov
888cf6eea2 docs: add 12.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 20:03:19 +01:00
Marek Olšák
4291229488 docs/GL3: mark radeonsi as all done up to GL 4.3 and GLES 3.1 2016-05-30 20:48:51 +02:00
Emil Velikov
922b471777 nir: add the SConscript.nir to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 19:19:01 +01:00
1870 changed files with 153260 additions and 66868 deletions

34
.editorconfig Normal file
View File

@@ -0,0 +1,34 @@
# To use this config on you editor, follow the instructions at:
# http://editorconfig.org
root = true
[*]
charset = utf-8
insert_final_newline = true
[*.{c,h,cpp,hpp,cc,hh}]
indent_style = space
indent_size = 3
[{Makefile*,*.mk}]
indent_style = tab
[{*.py,SCons*}]
indent_style = space
indent_size = 4
[*.pl]
indent_style = space
indent_size = 4
[*.m4]
indent_style = space
indent_size = 2
[*.yml]
indent_style = space
indent_size = 2
[*.patch]
trim_trailing_whitespace = false

1
.gitignore vendored
View File

@@ -49,3 +49,4 @@ Makefile.in
.install-mesa-links
.install-gallium-links
/src/git_sha1.h
TAGS

View File

@@ -88,9 +88,11 @@ Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <s3734770@mai
Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <carli@carli-laptop.(none)>
Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <Carl-Philip.Haensch@mailbox.tu-dresden.de>
Chad Versace <chad.versace@intel.com> <chad@chad-versace.us>
Chad Versace <chad.versace@intel.com> <Chad Versace chad@chad-versace.us>
Chad Versace <chad.versace@intel.com> <chad.versace@linux.intel.com>
Chad Versace <chadversary@chromium.org> <chad@kiwitree.net>
Chad Versace <chadversary@chromium.org> <chad@chad-versace.us>
Chad Versace <chadversary@chromium.org> <Chad Versace chad@chad-versace.us>
Chad Versace <chadversary@chromium.org> <chad.versace@intel.com>
Chad Versace <chadversary@chromium.org> <chad.versace@linux.intel.com>
Chia-I Wu <olvaffe@gmail.com> <olv@lunarg.com>
Chia-I Wu <olvaffe@gmail.com> Chia-Wu <olvaffe@gmail.com>
@@ -138,6 +140,8 @@ Dmitry Cherkassov <dcherkassov@gmail.com> Dmitry Cherkasov <dcherkassov@gmail.co
Dylan Baker <dylanx.c.baker@intel.com> <baker.dylan.c@gmail.com>
Edward O'Callaghan <funfunctor@folklore1984.net> <eocallaghan@alterapraxis.com>
Emeric Grange <emeric.grange@gmail.com> Emeric <emeric.grange@gmail.com>
Emil Velikov <emil.l.velikov@gmail.com> <emil.velikov@collabora.com>
@@ -274,7 +278,7 @@ Marc Dietrich <marvin24@gmx.de> marvin24 <marvin24@gmx.de>
Marcin Ślusarz <marcin.slusarz@gmail.com> Marcin Slusarz <marcin.slusarz@gmail.com>
Marek Olšák <marek.olsak@amd.com> <maraeo@gmail.com>
Marek Olšák <maraeo@gmail.com> <marek.olsak@amd.com>
Mario Kleiner <mario.kleiner.de@gmail.com> kleinerm <mario.kleiner@tuebingen.mpg.de>
Mario Kleiner <mario.kleiner.de@gmail.com> <mario.kleiner@tuebingen.mpg.de>

View File

@@ -1,6 +1,7 @@
language: c
sudo: false
sudo: true
dist: trusty
cache:
directories:
@@ -10,12 +11,15 @@ addons:
apt:
packages:
- libdrm-dev
- libudev-dev
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libxcb-dri2-0-dev
- libx11-xcb-dev
- llvm-3.4-dev
- llvm-3.5-dev
# llvm-config is not in the dev package?
- llvm-3.5
# LLVM packaging is broken and misses this dep.
- libedit-dev
- scons
env:
@@ -41,6 +45,16 @@ install:
- export PATH="/usr/lib/ccache:$PATH"
- pip install --user mako
# Since libdrm gets updated in configure.ac regularly, try to pick up the
# latest version from there.
- for line in `grep "^LIBDRM_.*_REQUIRED=" configure.ac`; do
old_ver=`echo $LIBDRM_VERSION | sed 's/libdrm-//'`;
new_ver=`echo $line | sed 's/.*REQUIRED=//'`;
if `echo "$old_ver,$new_ver" | tr ',' '\n' | sort -Vc 2> /dev/null`; then
export LIBDRM_VERSION="libdrm-$new_ver";
fi;
done
# Install dependencies where we require specific versions (or where
# disallowed by Travis CI's package whitelisting).
@@ -78,22 +92,19 @@ install:
- wget http://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
- tar -jxvf $LIBDRM_VERSION.tar.bz2
- (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix && make install)
- (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 && make install)
- wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
- tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
- (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Disabled LLVM (and therefore r300 and r600) because the build fails
# with "undefined reference to `clock_gettime'" and "undefined
# reference to `setupterm'" in llvmpipe.
script:
- if test "x$BUILD" = xmake; then
./autogen.sh --enable-debug
--disable-gallium-llvm
--with-egl-platforms=x11,drm
--with-dri-drivers=i915,i965,radeon,r200,swrast,nouveau
--with-gallium-drivers=svga,swrast,vc4,virgl
--with-gallium-drivers=svga,swrast,vc4,virgl,r300,r600
--disable-llvm-shared-libs
;
make && make check;
elif test x$BUILD = xscons; then

View File

@@ -34,6 +34,10 @@ MESA_VERSION := $(shell cat $(MESA_TOP)/VERSION)
LOCAL_CFLAGS += \
-Wno-unused-parameter \
-Wno-date-time \
-Wno-pointer-arith \
-Wno-missing-field-initializers \
-Wno-initializer-overrides \
-Wno-mismatched-tags \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
@@ -78,6 +82,12 @@ LOCAL_CFLAGS += \
-D__STDC_LIMIT_MACROS
endif
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm
endif
LOCAL_CPPFLAGS += \
$(if $(filter true,$(MESA_LOLLIPOP_BUILD)),-D_USING_LIBCXX) \
-Wno-error=non-virtual-dtor \

View File

@@ -95,6 +95,8 @@ SUBDIRS := \
src/mesa \
src/util \
src/egl \
src/amd \
src/intel \
src/mesa/drivers/dri
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))

View File

@@ -43,8 +43,8 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--disable-llvm-shared-libs \
--with-egl-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl \
--with-vulkan-drivers=intel
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr \
--with-vulkan-drivers=intel,radeon
ACLOCAL_AMFLAGS = -I m4
@@ -62,6 +62,7 @@ noinst_HEADERS = \
include/c99_math.h \
include/c11 \
include/D3D9 \
include/GL/wglext.h \
include/HaikuGL \
include/no_extern_c.h \
include/pci_ids

View File

@@ -104,3 +104,7 @@ F: src/egl/drivers/dri2/platform_wayland.c
FREEDRENO
R: Rob Clark <robclark@freedesktop.org>
F: src/gallium/drivers/freedreno/
GLX
R: Adam Jackson <ajax@redhat.com>
F: src/glx/

View File

@@ -1 +1 @@
11.3.0-devel
13.0.0-rc2

View File

@@ -37,6 +37,8 @@ cache:
- win_flex_bison-2.4.5.zip
- llvm-3.3.1-msvc2013-mtd.7z
os: Visual Studio 2013
environment:
WINFLEXBISON_ARCHIVE: win_flex_bison-2.4.5.zip
LLVM_ARCHIVE: llvm-3.3.1-msvc2013-mtd.7z
@@ -47,11 +49,13 @@ install:
- python -m pip --version
# Install Mako
- python -m pip install --egg Mako
# Install pywin32 extensions, needed by SCons
- python -m pip install pypiwin32
# Install SCons
- python -m pip install --egg scons==2.4.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "http://downloads.sourceforge.net/project/winflexbison/%WINFLEXBISON_ARCHIVE%"
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"
- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul
- set Path=%CD%\winflexbison;%Path%
- win_flex --version

3
bin/.editorconfig Normal file
View File

@@ -0,0 +1,3 @@
[*.sh]
indent_style = space
indent_size = 2

View File

@@ -40,7 +40,7 @@ else
for i in $urls
do
id=$(echo $i | cut -d'=' -f2)
summary=$(wget --quiet -O - $i | grep -e '<title>.*</title>' | sed -e 's/ *<title>Bug [0-9]\+ &ndash; \(.*\)<\/title>/\1/')
summary=$(wget --quiet -O - $i | grep -e '<title>.*</title>' | sed -e 's/ *<title>[0-9]\+ &ndash; \(.*\)<\/title>/\1/')
echo "<li><a href=\"$i\">Bug $id</a> - $summary</li>"
echo ""
done

View File

@@ -86,7 +86,7 @@ def AddOptions(opts):
from SCons.Options.EnumOption import EnumOption
opts.Add(EnumOption('build', 'build type', 'debug',
allowed_values=('debug', 'checked', 'profile',
'release')))
'release', 'opt')))
opts.Add(BoolOption('verbose', 'verbose output', 'no'))
opts.Add(EnumOption('machine', 'use machine-specific assembly code',
default_machine,

View File

@@ -74,11 +74,11 @@ LIBDRM_AMDGPU_REQUIRED=2.4.63
LIBDRM_INTEL_REQUIRED=2.4.61
LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66
LIBDRM_FREEDRENO_REQUIRED=2.4.67
LIBDRM_FREEDRENO_REQUIRED=2.4.68
LIBDRM_VC4_REQUIRED=2.4.69
DRI2PROTO_REQUIRED=2.6
DRI3PROTO_REQUIRED=1.0
PRESENTPROTO_REQUIRED=1.0
LIBUDEV_REQUIRED=151
GLPROTO_REQUIRED=1.4.14
LIBOMXIL_BELLAGIO_REQUIRED=0.0
LIBVA_REQUIRED=0.38.0
@@ -89,7 +89,8 @@ XCBDRI2_REQUIRED=1.8
XCBGLX_REQUIRED=1.8.1
XSHMFENCE_REQUIRED=1.1
XVMC_REQUIRED=1.0.6
PYTHON_MAKO_REQUIRED=0.3.4
PYTHON_MAKO_REQUIRED=0.8.0
LIBSENSORS_REQUIRED=4.0.0
dnl Check for progs
AC_PROG_CPP
@@ -99,7 +100,6 @@ AM_PROG_CC_C_O
AM_PROG_AS
AX_CHECK_GNU_MAKE
AC_CHECK_PROGS([PYTHON2], [python2.7 python2 python])
AC_CHECK_PROGS([PYTHON3], [python3.5 python3.4 python3])
AC_PROG_SED
AC_PROG_MKDIR_P
@@ -109,6 +109,7 @@ LT_PREREQ([2.2])
LT_INIT([disable-static])
AC_CHECK_PROG(RM, rm, [rm -f])
AC_CHECK_PROG(XXD, xxd, [xxd])
AX_PROG_BISON([],
AS_IF([test ! -f "$srcdir/src/compiler/glsl/glcpp/glcpp-parse.c"],
@@ -142,12 +143,6 @@ else
fi
fi
if test -z "$PYTHON3"; then
if test ! -f "$srcdir/src/intel/genxml/gen9_pack.h"; then
AC_MSG_ERROR([Python3 not found - unable to generate sources])
fi
fi
AC_PROG_INSTALL
dnl We need a POSIX shell for parts of the build. Assume we have one
@@ -232,6 +227,7 @@ AX_GCC_FUNC_ATTRIBUTE([packed])
AX_GCC_FUNC_ATTRIBUTE([pure])
AX_GCC_FUNC_ATTRIBUTE([returns_nonnull])
AX_GCC_FUNC_ATTRIBUTE([unused])
AX_GCC_FUNC_ATTRIBUTE([visibility])
AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
AX_GCC_FUNC_ATTRIBUTE([weak])
@@ -261,15 +257,12 @@ case "$host_os" in
*-android)
android=yes
;;
linux*|*-gnu*|gnu*)
linux*|*-gnu*|gnu*|cygwin*)
DEFINES="$DEFINES -D_GNU_SOURCE"
;;
solaris*)
DEFINES="$DEFINES -DSVR4"
;;
cygwin*)
DEFINES="$DEFINES -D_XOPEN_SOURCE=700"
;;
esac
AM_CONDITIONAL(HAVE_ANDROID, test "x$android" = xyes)
@@ -308,16 +301,9 @@ if test "x$GCC" = xyes; then
# Restore CFLAGS; VISIBILITY_CFLAGS are added to it where needed.
CFLAGS=$save_CFLAGS
# Work around aliasing bugs - developers should comment this out
CFLAGS="$CFLAGS -fno-strict-aliasing"
# We don't want floating-point math functions to set errno or trap
CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
# gcc's builtin memcmp is slower than glibc's
# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
CFLAGS="$CFLAGS -fno-builtin-memcmp"
# Flags to help ensure that certain portions of the code -- and only those
# portions -- can be built with MSVC:
# - src/util, src/gallium/auxiliary, rc/gallium/drivers/llvmpipe, and
@@ -354,12 +340,8 @@ if test "x$GXX" = xyes; then
# Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
CXXFLAGS=$save_CXXFLAGS
# Work around aliasing bugs - developers should comment this out
CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
# gcc's builtin memcmp is slower than glibc's
# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
# We don't want floating-point math functions to set errno or trap
CXXFLAGS="$CXXFLAGS -fno-math-errno -fno-trapping-math"
fi
AC_SUBST([MSVC2013_COMPAT_CFLAGS])
@@ -405,6 +387,17 @@ fi
AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
AC_SUBST([SSE41_CFLAGS], $SSE41_CFLAGS)
dnl Check for new-style atomic builtins
AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
int main() {
int n;
return __atomic_load_n(&n, __ATOMIC_ACQUIRE);
}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=1)
if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" = x1; then
DEFINES="$DEFINES -DUSE_GCC_ATOMIC_BUILTINS"
fi
AM_CONDITIONAL([GCC_ATOMIC_BUILTINS_SUPPORTED], [test x$GCC_ATOMIC_BUILTINS_SUPPORTED = x1])
dnl Check for Endianness
AC_C_BIGENDIAN(
little_endian=no,
@@ -790,6 +783,7 @@ if test "x$enable_asm" = xyes; then
esac
fi
AC_HEADER_MAJOR
AC_CHECK_HEADER([xlocale.h], [DEFINES="$DEFINES -DHAVE_XLOCALE_H"])
AC_CHECK_HEADER([sys/sysctl.h], [DEFINES="$DEFINES -DHAVE_SYS_SYSCTL_H"])
AC_CHECK_FUNC([strtof], [DEFINES="$DEFINES -DHAVE_STRTOF"])
@@ -832,9 +826,21 @@ dnl to -pthread, which causes problems if we need -lpthread to appear in
dnl pkgconfig files.
test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread"
PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs)
AC_SUBST(PTHREADSTUBS_CFLAGS)
AC_SUBST(PTHREADSTUBS_LIBS)
dnl pthread-stubs is mandatory on targets where it exists
case "$host_os" in
cygwin* )
pthread_stubs_possible="no"
;;
* )
pthread_stubs_possible="yes"
;;
esac
if test "x$pthread_stubs_possible" = xyes; then
PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs)
AC_SUBST(PTHREADSTUBS_CFLAGS)
AC_SUBST(PTHREADSTUBS_LIBS)
fi
dnl SELinux awareness.
AC_ARG_ENABLE([selinux],
@@ -877,6 +883,32 @@ AC_ARG_ENABLE([dri],
[enable_dri="$enableval"],
[enable_dri=yes])
AC_ARG_ENABLE([gallium-extra-hud],
[AS_HELP_STRING([--enable-gallium-extra-hud],
[enable HUD block/NIC I/O HUD stats support @<:@default=disabled@:>@])],
[enable_gallium_extra_hud="$enableval"],
[enable_gallium_extra_hud=no])
AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = xyes)
if test "x$enable_gallium_extra_hud" = xyes ; then
DEFINES="${DEFINES} -DHAVE_GALLIUM_EXTRA_HUD=1"
fi
#TODO: no pkgconfig .pc available for libsensors.
#PKG_CHECK_MODULES([LIBSENSORS], [libsensors >= $LIBSENSORS_REQUIRED], [enable_lmsensors=yes], [enable_lmsensors=no])
AC_ARG_ENABLE([lmsensors],
[AS_HELP_STRING([--enable-lmsensors],
[enable HUD lmsensor support @<:@default=disabled@:>@])],
[enable_lmsensors="$enableval"],
[enable_lmsensors=no])
AM_CONDITIONAL(HAVE_LIBSENSORS, test "x$enable_lmsensors" = xyes)
if test "x$enable_lmsensors" = xyes ; then
DEFINES="${DEFINES} -DHAVE_LIBSENSORS=1"
LIBSENSORS_LDFLAGS="-lsensors"
else
LIBSENSORS_LDFLAGS=""
fi
AC_SUBST(LIBSENSORS_LDFLAGS)
case "$host_os" in
linux*)
dri3_default=yes
@@ -1067,6 +1099,7 @@ xno)
;;
esac
AM_CONDITIONAL(HAVE_GLX, test "x$enable_glx" != xno)
AM_CONDITIONAL(HAVE_DRI_GLX, test "x$enable_glx" = xdri)
AM_CONDITIONAL(HAVE_XLIB_GLX, test "x$enable_glx" = xxlib)
AM_CONDITIONAL(HAVE_GALLIUM_XLIB_GLX, test "x$enable_glx" = xgallium-xlib)
@@ -1107,11 +1140,20 @@ if test "x$have_libdrm" = xyes; then
DEFINES="$DEFINES -DHAVE_LIBDRM"
fi
require_libdrm() {
if test "x$have_libdrm" != xyes; then
AC_MSG_ERROR([$1 requires libdrm >= $LIBDRM_REQUIRED])
fi
}
# Select which platform-dependent DRI code gets built
case "$host_os" in
darwin*)
dri_platform='apple' ;;
gnu*|cygwin*)
cygwin*)
dri_platform='windows' ;;
gnu*)
dri_platform='none' ;;
*)
dri_platform='drm' ;;
@@ -1127,6 +1169,9 @@ AM_CONDITIONAL(HAVE_DRISW_KMS, test "x$have_drisw_kms" = xyes )
AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xapple )
AM_CONDITIONAL(HAVE_LMSENSORS, test "x$enable_lmsensors" = xyes )
AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = xyes )
AM_CONDITIONAL(HAVE_WINDOWSDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xwindows )
AC_ARG_ENABLE([shared-glapi],
[AS_HELP_STRING([--enable-shared-glapi],
@@ -1307,23 +1352,9 @@ if test "x$with_sha1" = "x"; then
fi
fi
AM_CONDITIONAL([ENABLE_SHADER_CACHE], [test x$enable_shader_cache = xyes])
case "$host_os" in
linux*)
need_pci_id=yes ;;
*)
need_pci_id=no ;;
esac
PKG_CHECK_MODULES([LIBUDEV], [libudev >= $LIBUDEV_REQUIRED],
have_libudev=yes, have_libudev=no)
AC_ARG_ENABLE([sysfs],
[AS_HELP_STRING([--enable-sysfs],
[enable /sys PCI identification @<:@default=disabled@:>@])],
[have_sysfs="$enableval"],
[have_sysfs=no]
)
if test "x$enable_shader_cache" = "xyes"; then
AC_DEFINE([ENABLE_SHADER_CACHE], [1], [Enable shader cache])
fi
if test "x$enable_dri" = xyes; then
if test "$enable_static" = yes; then
@@ -1367,9 +1398,7 @@ xdri)
if test x"$driglx_direct" = xyes; then
if test x"$dri_platform" = xdrm ; then
DEFINES="$DEFINES -DGLX_USE_DRM"
if test "x$have_libdrm" != xyes; then
AC_MSG_ERROR([Direct rendering requires libdrm >= $LIBDRM_REQUIRED])
fi
require_libdrm "Direct rendering"
PKG_CHECK_MODULES([DRI2PROTO], [dri2proto >= $DRI2PROTO_REQUIRED])
GL_PC_REQ_PRIV="$GL_PC_REQ_PRIV libdrm >= $LIBDRM_REQUIRED"
@@ -1391,6 +1420,9 @@ xdri)
if test x"$dri_platform" = xapple ; then
DEFINES="$DEFINES -DGLX_USE_APPLEGL"
fi
if test x"$dri_platform" = xwindows ; then
DEFINES="$DEFINES -DGLX_USE_WINDOWSGL"
fi
fi
# add xf86vidmode if available
@@ -1410,17 +1442,6 @@ xdri)
;;
esac
have_pci_id=no
if test "$have_libudev" = yes; then
DEFINES="$DEFINES -DHAVE_LIBUDEV"
have_pci_id=yes
fi
if test "$have_sysfs" = yes; then
DEFINES="$DEFINES -DHAVE_SYSFS"
have_pci_id=yes
fi
# This is outside the case (above) so that it is invoked even for non-GLX
# builds.
AM_CONDITIONAL(HAVE_XF86VIDMODE, test "x$HAVE_XF86VIDMODE" = xyes)
@@ -1529,10 +1550,6 @@ if test "x$enable_dri" = xyes; then
DEFINES="$DEFINES -DHAVE_DRI3"
fi
if test "x$have_pci_id" != xyes; then
AC_MSG_ERROR([libudev-dev or sysfs required for building DRI])
fi
case "$host_cpu" in
powerpc* | sparc*)
# Build only the drivers for cards that exist on PowerPC/sparc
@@ -1645,9 +1662,9 @@ esac
AC_ARG_WITH([vulkan-icddir],
[AS_HELP_STRING([--with-vulkan-icddir=DIR],
[directory for the Vulkan driver icd files @<:@${sysconfdir}/vulkan/icd.d@:>@])],
[directory for the Vulkan driver icd files @<:@${datarootdir}/vulkan/icd.d@:>@])],
[VULKAN_ICD_INSTALL_DIR="$withval"],
[VULKAN_ICD_INSTALL_DIR='${sysconfdir}/vulkan/icd.d'])
[VULKAN_ICD_INSTALL_DIR='${datarootdir}/vulkan/icd.d'])
AC_SUBST([VULKAN_ICD_INSTALL_DIR])
if test -n "$with_vulkan_drivers"; then
@@ -1664,6 +1681,13 @@ if test -n "$with_vulkan_drivers"; then
HAVE_INTEL_VULKAN=yes;
;;
xradeon)
PKG_CHECK_MODULES([AMDGPU], [libdrm_amdgpu >= $LIBDRM_AMDGPU_REQUIRED])
HAVE_RADEON_VULKAN=yes;
if test "x$with_sha1" == "x"; then
AC_MSG_ERROR([radv vulkan driver requires SHA1])
fi
;;
*)
AC_MSG_ERROR([Vulkan driver '$driver' does not exist])
;;
@@ -1733,10 +1757,6 @@ if test "x$enable_gbm" = xauto; then
esac
fi
if test "x$enable_gbm" = xyes; then
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([gbm requires udev >= $LIBUDEV_REQUIRED or sysfs])
fi
if test "x$enable_dri" = xyes; then
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([gbm_dri requires --enable-shared-glapi])
@@ -1751,11 +1771,8 @@ if test "x$enable_gbm" = xyes; then
fi
fi
AM_CONDITIONAL(HAVE_GBM, test "x$enable_gbm" = xyes)
if test "x$need_pci_id$have_libudev" = xyesyes; then
GBM_PC_REQ_PRIV="libudev >= $LIBUDEV_REQUIRED"
else
GBM_PC_REQ_PRIV=""
fi
# FINISHME: GBM has a number of dependencies which we should add below
GBM_PC_REQ_PRIV=""
GBM_PC_LIB_PRIV="$DLOPEN_LIBS"
AC_SUBST([GBM_PC_REQ_PRIV])
AC_SUBST([GBM_PC_LIB_PRIV])
@@ -2003,8 +2020,8 @@ if test "x$with_egl_platforms" != "x" -a "x$enable_egl" != xyes; then
AC_MSG_ERROR([cannot build egl state tracker without EGL library])
fi
PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland_scanner],
WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner wayland_scanner`,
PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland-scanner],
WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner wayland-scanner`,
WAYLAND_SCANNER='')
if test "x$WAYLAND_SCANNER" = x; then
AC_PATH_PROG([WAYLAND_SCANNER], [wayland-scanner])
@@ -2015,8 +2032,6 @@ egl_platforms=`IFS=', '; echo $with_egl_platforms`
for plat in $egl_platforms; do
case "$plat" in
wayland)
test "x$have_libdrm" != xyes &&
AC_MSG_ERROR([EGL platform wayland requires libdrm >= $LIBDRM_REQUIRED])
PKG_CHECK_MODULES([WAYLAND], [wayland-client >= $WAYLAND_REQUIRED wayland-server >= $WAYLAND_REQUIRED])
@@ -2032,13 +2047,9 @@ for plat in $egl_platforms; do
drm)
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
test "x$have_libdrm" != xyes &&
AC_MSG_ERROR([EGL platform drm requires libdrm >= $LIBDRM_REQUIRED])
;;
surfaceless)
test "x$have_libdrm" != xyes &&
AC_MSG_ERROR([EGL platform surfaceless requires libdrm >= $LIBDRM_REQUIRED])
;;
android)
@@ -2049,10 +2060,11 @@ for plat in $egl_platforms; do
;;
esac
case "$plat$need_pci_id$have_pci_id" in
waylandyesno|drmyesno)
AC_MSG_ERROR([cannot build $plat platform without udev >= $LIBUDEV_REQUIRED or sysfs]) ;;
esac
case "$plat" in
wayland|drm|surfaceless)
require_libdrm "Platform $plat"
;;
esac
done
# libEGL wants to default to the first platform specified in
@@ -2109,6 +2121,9 @@ AC_ARG_WITH([llvm-prefix],
strip_unwanted_llvm_flags() {
# Use \> (marks the end of the word)
echo `$1` | sed \
-e 's/-march=\S*//g' \
-e 's/-mtune=\S*//g' \
-e 's/-mcpu=\S*//g' \
-e 's/-DNDEBUG\>//g' \
-e 's/-D_GNU_SOURCE\>//g' \
-e 's/-pedantic\>//g' \
@@ -2144,7 +2159,7 @@ if test "x$enable_gallium_llvm" = xauto; then
i*86|x86_64|amd64) enable_gallium_llvm=yes;;
esac
fi
if test "x$enable_gallium_llvm" = xyes; then
if test "x$enable_gallium_llvm" = xyes || test "x$HAVE_RADEON_VULKAN" = xyes; then
if test -n "$llvm_prefix"; then
AC_PATH_TOOL([LLVM_CONFIG], [llvm-config], [no], ["$llvm_prefix/bin"])
else
@@ -2185,8 +2200,12 @@ if test "x$enable_gallium_llvm" = xyes; then
LLVM_COMPONENTS="engine bitwriter mcjit mcdisassembler"
if $LLVM_CONFIG --components | grep -q inteljitevents ; then
LLVM_COMPONENTS="${LLVM_COMPONENTS} inteljitevents"
fi
if test "x$enable_opencl" = xyes; then
llvm_check_version_for "3" "5" "0" "opencl"
llvm_check_version_for "3" "6" "0" "opencl"
LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader option objcarcopts profiledata"
@@ -2265,12 +2284,6 @@ AC_SUBST([D3D_DRIVER_INSTALL_DIR])
dnl
dnl Gallium helper functions
dnl
gallium_require_drm() {
if test "x$have_libdrm" != xyes; then
AC_MSG_ERROR([$1 requires libdrm >= $LIBDRM_REQUIRED])
fi
}
gallium_require_llvm() {
if test "x$MESA_LLVM" = x0; then
case "$host" in *gnux32) return;; esac
@@ -2280,12 +2293,6 @@ gallium_require_llvm() {
fi
}
gallium_require_drm_loader() {
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([Gallium drm loader requires libudev >= $LIBUDEV_REQUIRED or sysfs])
fi
}
dnl This is for Glamor. Skip this if OpenGL is disabled.
require_egl_drm() {
if test "x$enable_opengl" = xno; then
@@ -2310,10 +2317,7 @@ radeon_llvm_check() {
else
amdgpu_llvm_target_name='amdgpu'
fi
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
llvm_check_version_for "3" "6" "0" $1
llvm_check_version_for $2 $3 $4 $1
if test true && $LLVM_CONFIG --targets-built | grep -iqvw $amdgpu_llvm_target_name ; then
AC_MSG_ERROR([LLVM $amdgpu_llvm_target_name not enabled in your LLVM build.])
fi
@@ -2324,6 +2328,13 @@ radeon_llvm_check() {
fi
}
radeon_gallium_llvm_check() {
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
radeon_llvm_check $*
}
swr_llvm_check() {
gallium_require_llvm $1
if test ${LLVM_VERSION_INT} -lt 306; then
@@ -2334,6 +2345,45 @@ swr_llvm_check() {
fi
}
swr_require_cxx_feature_flags() {
feature_name="$1"
preprocessor_test="$2"
option_list="$3"
output_var="$4"
AC_MSG_CHECKING([whether $CXX supports $feature_name])
AC_LANG_PUSH([C++])
save_CXXFLAGS="$CXXFLAGS"
save_IFS="$IFS"
IFS=","
found=0
for opts in $option_list
do
unset IFS
CXXFLAGS="$opts $save_CXXFLAGS"
AC_COMPILE_IFELSE(
[AC_LANG_PROGRAM(
[ #if !($preprocessor_test)
#error
#endif
])],
[found=1; break],
[])
IFS=","
done
IFS="$save_IFS"
CXXFLAGS="$save_CXXFLAGS"
AC_LANG_POP([C++])
if test $found -eq 1; then
AC_MSG_RESULT([$opts])
eval "$output_var=\$opts"
return 0
fi
AC_MSG_RESULT([no])
AC_MSG_ERROR([swr requires $feature_name support])
return 1
}
dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
if test -n "$with_gallium_drivers"; then
gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2341,35 +2391,30 @@ if test -n "$with_gallium_drivers"; then
case "x$driver" in
xsvga)
HAVE_GALLIUM_SVGA=yes
gallium_require_drm "svga"
gallium_require_drm_loader
require_libdrm "svga"
;;
xi915)
HAVE_GALLIUM_I915=yes
PKG_CHECK_MODULES([INTEL], [libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
gallium_require_drm "Gallium i915"
gallium_require_drm_loader
require_libdrm "Gallium i915"
;;
xilo)
HAVE_GALLIUM_ILO=yes
PKG_CHECK_MODULES([INTEL], [libdrm_intel >= $LIBDRM_INTEL_REQUIRED])
gallium_require_drm "Gallium i965/ilo"
gallium_require_drm_loader
require_libdrm "Gallium i965/ilo"
;;
xr300)
HAVE_GALLIUM_R300=yes
PKG_CHECK_MODULES([RADEON], [libdrm_radeon >= $LIBDRM_RADEON_REQUIRED])
gallium_require_drm "Gallium R300"
gallium_require_drm_loader
require_libdrm "Gallium R300"
gallium_require_llvm "Gallium R300"
;;
xr600)
HAVE_GALLIUM_R600=yes
PKG_CHECK_MODULES([RADEON], [libdrm_radeon >= $LIBDRM_RADEON_REQUIRED])
gallium_require_drm "Gallium R600"
gallium_require_drm_loader
require_libdrm "Gallium R600"
if test "x$enable_opencl" = xyes; then
radeon_llvm_check "r600g"
radeon_gallium_llvm_check "r600g" "3" "6" "0"
LLVM_COMPONENTS="${LLVM_COMPONENTS} bitreader asmparser"
fi
;;
@@ -2377,22 +2422,19 @@ if test -n "$with_gallium_drivers"; then
HAVE_GALLIUM_RADEONSI=yes
PKG_CHECK_MODULES([RADEON], [libdrm_radeon >= $LIBDRM_RADEON_REQUIRED])
PKG_CHECK_MODULES([AMDGPU], [libdrm_amdgpu >= $LIBDRM_AMDGPU_REQUIRED])
gallium_require_drm "radeonsi"
gallium_require_drm_loader
radeon_llvm_check "radeonsi"
require_libdrm "radeonsi"
radeon_gallium_llvm_check "radeonsi" "3" "6" "0"
require_egl_drm "radeonsi"
;;
xnouveau)
HAVE_GALLIUM_NOUVEAU=yes
PKG_CHECK_MODULES([NOUVEAU], [libdrm_nouveau >= $LIBDRM_NOUVEAU_REQUIRED])
gallium_require_drm "nouveau"
gallium_require_drm_loader
require_libdrm "nouveau"
;;
xfreedreno)
HAVE_GALLIUM_FREEDRENO=yes
PKG_CHECK_MODULES([FREEDRENO], [libdrm_freedreno >= $LIBDRM_FREEDRENO_REQUIRED])
gallium_require_drm "freedreno"
gallium_require_drm_loader
require_libdrm "freedreno"
;;
xswrast)
HAVE_GALLIUM_SOFTPIPE=yes
@@ -2403,36 +2445,27 @@ if test -n "$with_gallium_drivers"; then
xswr)
swr_llvm_check "swr"
AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
AVX_CXXFLAGS="-march=core-avx-i"
AVX2_CXXFLAGS="-march=core-avx2"
swr_require_cxx_feature_flags "C++11" "__cplusplus >= 201103L" \
",-std=c++11" \
SWR_CXX11_CXXFLAGS
AC_SUBST([SWR_CXX11_CXXFLAGS])
AC_LANG_PUSH([C++])
save_CXXFLAGS="$CXXFLAGS"
CXXFLAGS="-std=c++11 $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([c++11 compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
",-mavx,-march=core-avx" \
SWR_AVX_CXXFLAGS
AC_SUBST([SWR_AVX_CXXFLAGS])
save_CXXFLAGS="$CXXFLAGS"
CXXFLAGS="$AVX_CXXFLAGS $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([AVX compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
save_CFLAGS="$CXXFLAGS"
CXXFLAGS="$AVX2_CXXFLAGS $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([AVX2 compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
AC_LANG_POP([C++])
swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
SWR_AVX2_CXXFLAGS
AC_SUBST([SWR_AVX2_CXXFLAGS])
HAVE_GALLIUM_SWR=yes
;;
xvc4)
HAVE_GALLIUM_VC4=yes
gallium_require_drm "vc4"
gallium_require_drm_loader
PKG_CHECK_MODULES([VC4], [libdrm_vc4 >= $LIBDRM_VC4_REQUIRED])
require_libdrm "vc4"
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes;
@@ -2441,8 +2474,7 @@ if test -n "$with_gallium_drivers"; then
;;
xvirgl)
HAVE_GALLIUM_VIRGL=yes
gallium_require_drm "virgl"
gallium_require_drm_loader
require_libdrm "virgl"
require_egl_drm "virgl"
;;
*)
@@ -2452,6 +2484,10 @@ if test -n "$with_gallium_drivers"; then
done
fi
if test "x$HAVE_RADEON_VULKAN" = "xyes"; then
radeon_llvm_check "radv" "3" "9" "0"
fi
dnl Set LLVM_LIBS - This is done after the driver configuration so
dnl that drivers can add additional components to LLVM_COMPONENTS.
dnl Previously, gallium drivers were updating LLVM_LIBS directly
@@ -2500,7 +2536,7 @@ if test "x$MESA_LLVM" != x0; then
AC_MSG_WARN([Building mesa with statically linked LLVM may cause compilation issues])
dnl We need to link to llvm system libs when using static libs
dnl However, only llvm 3.5+ provides --system-libs
if test $LLVM_VERSION_MAJOR -eq 3 -a $LLVM_VERSION_MINOR -ge 5; then
if test $LLVM_VERSION_MAJOR -ge 4 -o $LLVM_VERSION_MAJOR -eq 3 -a $LLVM_VERSION_MINOR -ge 5; then
LLVM_LIBS="$LLVM_LIBS `$LLVM_CONFIG --system-libs`"
fi
fi
@@ -2543,8 +2579,13 @@ AM_CONDITIONAL(HAVE_R200_DRI, test x$HAVE_R200_DRI = xyes)
AM_CONDITIONAL(HAVE_RADEON_DRI, test x$HAVE_RADEON_DRI = xyes)
AM_CONDITIONAL(HAVE_SWRAST_DRI, test x$HAVE_SWRAST_DRI = xyes)
AM_CONDITIONAL(HAVE_RADEON_VULKAN, test "x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_VULKAN, test "x$HAVE_INTEL_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_AMD_DRIVERS, test "x$HAVE_GALLIUM_R600" = xyes -o \
"x$HAVE_GALLIUM_RADEONSI" = xyes -o \
"x$HAVE_RADEON_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \
"x$HAVE_I965_DRI" = xyes)
@@ -2563,6 +2604,8 @@ fi
AM_CONDITIONAL(HAVE_LIBDRM, test "x$have_libdrm" = xyes)
AM_CONDITIONAL(HAVE_OSMESA, test "x$enable_osmesa" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_OSMESA, test "x$enable_gallium_osmesa" = xyes)
AM_CONDITIONAL(HAVE_COMMON_OSMESA, test "x$enable_osmesa" = xyes -o \
"x$enable_gallium_osmesa" = xyes)
AM_CONDITIONAL(HAVE_X86_ASM, test "x$asm_arch" = xx86 -o "x$asm_arch" = xx86_64)
AM_CONDITIONAL(HAVE_X86_64_ASM, test "x$asm_arch" = xx86_64)
@@ -2581,6 +2624,8 @@ VA_MINOR=`$PKG_CONFIG --modversion libva | $SED -n 's/.*\.\(.*\)\..*$/\1/p'`
AC_SUBST([VA_MAJOR], $VA_MAJOR)
AC_SUBST([VA_MINOR], $VA_MINOR)
AM_CONDITIONAL(HAVE_VULKAN_COMMON, test "x$VULKAN_DRIVERS" != "x")
AC_SUBST([XVMC_MAJOR], 1)
AC_SUBST([XVMC_MINOR], 0)
@@ -2594,6 +2639,8 @@ AC_SUBST([XA_MINOR], $XA_MINOR)
AC_SUBST([XA_TINY], $XA_TINY)
AC_SUBST([XA_VERSION], "$XA_MAJOR.$XA_MINOR.$XA_TINY")
AC_SUBST([TIMESTAMP_CMD], '`test $(SOURCE_DATE_EPOCH) && echo $(SOURCE_DATE_EPOCH) || date +%s`')
AC_ARG_ENABLE(valgrind,
[AS_HELP_STRING([--enable-valgrind],
[Build mesa with valgrind support (default: auto)])],
@@ -2632,6 +2679,9 @@ CXXFLAGS="$CXXFLAGS $USER_CXXFLAGS"
dnl Substitute the config
AC_CONFIG_FILES([Makefile
src/Makefile
src/amd/Makefile
src/amd/common/Makefile
src/amd/vulkan/Makefile
src/compiler/Makefile
src/egl/Makefile
src/egl/main/egl.pc
@@ -2706,10 +2756,11 @@ AC_CONFIG_FILES([Makefile
src/glx/Makefile
src/glx/apple/Makefile
src/glx/tests/Makefile
src/glx/windows/Makefile
src/glx/windows/windowsdriproto.pc
src/gtest/Makefile
src/intel/Makefile
src/intel/genxml/Makefile
src/intel/isl/Makefile
src/intel/tools/Makefile
src/intel/vulkan/Makefile
src/loader/Makefile
src/mapi/Makefile
@@ -2733,16 +2784,14 @@ AC_CONFIG_FILES([Makefile
src/mesa/drivers/x11/Makefile
src/mesa/main/tests/Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile])
src/util/tests/hash_table/Makefile
src/vulkan/wsi/Makefile])
AC_OUTPUT
# Fix up dependencies in *.Plo files, where we changed the extension of a
# source file
$SED -i -e 's/brw_blorp.cpp/brw_blorp.c/' src/mesa/drivers/dri/i965/.deps/brw_blorp.Plo
$SED -i -e 's/gen6_blorp.cpp/gen6_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen6_blorp.Plo
$SED -i -e 's/gen7_blorp.cpp/gen7_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen7_blorp.Plo
$SED -i -e 's/gen8_blorp.cpp/gen8_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen8_blorp.Plo
dnl
@@ -2841,6 +2890,19 @@ else
echo " Gallium: no"
fi
echo ""
if test "x$enable_gallium_extra_hud" != xyes; then
echo " HUD extra stats: no"
else
echo " HUD extra stats: yes"
fi
if test "x$enable_lmsensors" != xyes; then
echo " HUD lmsensors: no"
else
echo " HUD lmsensors: yes"
fi
dnl Shader cache
echo ""
echo " Shader cache: $enable_shader_cache"
@@ -2874,7 +2936,6 @@ if test "x$MESA_LLVM" = x1; then
echo ""
fi
echo " PYTHON2: $PYTHON2"
echo " PYTHON3: $PYTHON3"
echo ""
echo " Run '${MAKE-make}' to build Mesa"

View File

@@ -38,7 +38,7 @@ including:
<p>
Other companies including
<a href="http://www.intellinuxgraphics.org/index.html">Intel</a>
<a href="https://01.org/linuxgraphics">Intel</a>
and RedHat also actively contribute to the project.
Intel has recently contributed the new GLSL compiler in Mesa 7.9.
</p>

View File

@@ -252,10 +252,14 @@ check for regressions.
<h3>Mailing Patches</h3>
<p>
Patches should be sent to the Mesa mailing list for review.
When submitting a patch make sure to use git send-email rather than attaching
patches to emails. Sending patches as attachments prevents people from being
able to provide in-line review comments.
Patches should be sent to the mesa-dev mailing list for review:
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">
mesa-dev@lists.freedesktop.org<a/>.
When submitting a patch make sure to use
<a href="https://git-scm.com/docs/git-send-email">git send-email</a>
rather than attaching patches to emails. Sending patches as
attachments prevents people from being able to provide in-line review
comments.
</p>
<p>
@@ -684,9 +688,11 @@ To add a new GL extension to Mesa you have to do at least the following.
</li>
<li>
Add a new entry to the <code>gl_extensions</code> struct in mtypes.h
if the extension requires driver capabilities not already exposed by
another extension.
</li>
<li>
Update the <code>extensions.c</code> file.
Add a new entry to the src/mesa/main/extensions_table.h file.
</li>
<li>
From this point, the best way to proceed is to find another extension,
@@ -697,12 +703,18 @@ To add a new GL extension to Mesa you have to do at least the following.
If the new extension adds new GL state, the functions in get.c, enable.c
and attrib.c will most likely require new code.
</li>
<li>
To determine if the new extension is active in the current context,
use the auto-generated _mesa_has_##name_str() function defined in
src/mesa/main/extensions.h.
</li>
<li>
The dispatch tests check_table.cpp and dispatch_sanity.cpp
should be updated with details about the new extensions functions. These
tests are run using 'make check'
</li>
</ul>
</p>

View File

@@ -50,8 +50,17 @@ sometimes be useful for debugging end-user issues.
if the application generates a GL_INVALID_ENUM error, a corresponding error
message indicating where the error occurred, and possibly why, will be
printed to stderr.<br>
If the value of MESA_DEBUG is 'FP' floating point arithmetic errors will
generate exceptions.
For release builds, MESA_DEBUG defaults to off (no debug output).
MESA_DEBUG accepts the following comma-separated list of named
flags, which adds extra behaviour to just set MESA_DEBUG=1:
<ul>
<li>silent - turn off debug messages. Only useful for debug builds.</li>
<li>flush - flush after each drawing command</li>
<li>incomplete_tex - extra debug messages when a texture is incomplete</li>
<li>incomplete_fbo - extra debug messages when a fbo is incomplete</li>
</ul>
<li>MESA_LOG_FILE - specifies a file name for logging all errors, warnings,
etc., rather than stderr
<li>MESA_TEX_PROG - if set, implement conventional texture env modes with
@@ -144,11 +153,10 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>bat - emit batch information</li>
<li>pix - emit messages about pixel operations</li>
<li>buf - emit messages about buffer objects</li>
<li>reg - emit messages about regions</li>
<li>fbo - emit messages about framebuffers</li>
<li>fs - dump shader assembly for fragment shaders</li>
<li>gs - dump shader assembly for geometry shaders</li>
<li>sync - emit messages about synchronization</li>
<li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>
<li>prim - emit messages about drawing primitives</li>
<li>vert - emit messages about vertex assembly</li>
<li>dri - emit messages about the DRI interface</li>
@@ -163,9 +171,19 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>ann - annotate IR in assembly dumps</li>
<li>no8 - don't generate SIMD8 fragment shader</li>
<li>vec4 - force vec4 mode in vertex shader</li>
<li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>
<li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>
<li>cs - dump shader assembly for compute shaders</li>
<li>hex - print instruction hex dump with the disassembly</li>
<li>nocompact - disable instruction compaction</li>
<li>tcs - dump shader assembly for tessellation control shaders</li>
<li>tes - dump shader assembly for tessellation evaluation shaders</li>
<li>l3 - emit messages about the new L3 state during transitions</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>norbc - disable single sampled render buffer compression</li>
</ul>
</ul>
@@ -197,8 +215,10 @@ Mesa EGL supports different sets of environment variables. See the
<li>GALLIUM_HUD_TOGGLE_SIGNAL - toggle visibility via user specified signal.
Especially useful to toggle hud at specific points of application and
disable for unencumbered viewing the rest of the time. For example, set
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_SIGNAL_TOGGLE to 10 (SIGUSR1).
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_TOGGLE_SIGNAL to 10 (SIGUSR1).
Use kill -10 <pid> to toggle the hud as desired.
<li>GALLIUM_DRIVER - useful in combination with LIBGL_ALWAYS_SOFTWARE=1 for
choosing one of the software renderers "softpipe", "llvmpipe" or "swr".
<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.
rather than stderr.
<li>GALLIUM_PRINT_OPTIONS - if non-zero, print all the Gallium environment

View File

@@ -57,7 +57,7 @@ drivers for X.org.
<ul>
<li>See the <a href="http://dri.freedesktop.org/">DRI website</a>
for more information.</li>
<li>See <a href="http://intellinuxgraphics.org">intellinuxgraphics.org</a>
<li>See <a href="https://01.org/linuxgraphics">01.org</a>
for more information about Intel drivers.</li>
<li>See <a href="http://nouveau.freedesktop.org">nouveau.freedesktop.org</a>
for more information about Nouveau drivers.</li>

View File

@@ -107,11 +107,11 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
GL_ARB_vertex_type_2_10_10_10_rev DONE (swr)
GL 4.0, GLSL 4.00 --- all DONE: nvc0, r600, radeonsi
GL 4.0, GLSL 4.00 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL_ARB_draw_buffers_blend DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (i965, llvmpipe, softpipe, swr)
GL_ARB_gpu_shader5 DONE (i965)
GL_ARB_draw_buffers_blend DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL_ARB_gpu_shader5 DONE (i965/gen7+)
- 'precise' qualifier DONE
- Dynamically uniform sampler array indices DONE (softpipe)
- Dynamically uniform UBO array indices DONE ()
@@ -124,154 +124,214 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, r600, radeonsi
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (i965/gen8+, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50)
GL_ARB_shader_subroutine DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_tessellation_shader DONE (i965)
GL_ARB_texture_buffer_object_rgb32 DONE (i965, llvmpipe, softpipe, swr)
GL_ARB_texture_cube_map_array DONE (i965, nv50, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_gpu_shader_fp64 DONE (llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965/gen6+, nv50)
GL_ARB_shader_subroutine DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_tessellation_shader DONE (i965/gen7+)
GL_ARB_texture_buffer_object_rgb32 DONE (i965/gen6+, llvmpipe, softpipe, swr)
GL_ARB_texture_cube_map_array DONE (i965/gen6+, nv50, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_query_lod DONE (i965, nv50, softpipe)
GL_ARB_transform_feedback2 DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback2 DONE (i965/gen7+, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965/gen7+, nv50, llvmpipe, softpipe, swr)
GL 4.1, GLSL 4.10 --- all DONE: nvc0, r600, radeonsi
GL 4.1, GLSL 4.10 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL_ARB_ES2_compatibility DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (i965/gen8+, llvmpipe, softpipe)
GL_ARB_vertex_attrib_64bit DONE (llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe)
GL 4.2, GLSL 4.20 -- all DONE: radeonsi
GL 4.2, GLSL 4.20 -- all DONE: i965/gen8+, nvc0, radeonsi
GL_ARB_texture_compression_bptc DONE (i965, nvc0, r600, radeonsi)
GL_ARB_texture_compression_bptc DONE (i965, r600)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_atomic_counters DONE (i965, softpipe)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_shader_image_load_store DONE (i965, softpipe)
GL_ARB_conservative_depth DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_420pack DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_internalformat_query DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_internalformat_query DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_map_buffer_alignment DONE (all drivers)
GL 4.3, GLSL 4.30:
GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, radeonsi
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_copy_image DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_compute_shader DONE (i965, softpipe)
GL_ARB_copy_image DONE (i965, nv50, r600, softpipe, llvmpipe)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, r600, softpipe)
GL_ARB_internalformat_query2 DONE (all drivers)
GL_ARB_invalidate_subdata DONE (all drivers)
GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_multi_draw_indirect DONE (i965, r600, llvmpipe, softpipe, swr)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior DONE (i965, nvc0, radeonsi)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_robust_buffer_access_behavior DONE (i965)
GL_ARB_shader_image_size DONE (i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, softpipe)
GL_ARB_stencil_texturing DONE (i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (nv50, i965, r600, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_view DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GL 4.4, GLSL 4.40:
GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, radeonsi
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965, nv50, nvc0)
GL_ARB_enhanced_layouts in progress (Timothy)
GL_ARB_buffer_storage DONE (i965, nv50, r600)
GL_ARB_clear_texture DONE (i965, nv50, r600)
GL_ARB_enhanced_layouts DONE (i965, nv50, llvmpipe, softpipe)
- compile-time constant expressions DONE
- explicit byte offsets for blocks DONE
- forced alignment within blocks DONE
- specified vec4-slot component numbers in progress
- specified vec4-slot component numbers DONE (i965, nv50, llvmpipe, softpipe)
- specified transform/feedback layout DONE
- input/output block locations DONE
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_stencil8 DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_query_buffer_object DONE (i965/hsw+)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_texture_stencil8 DONE (i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL 4.5, GLSL 4.50:
GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi
GL_ARB_ES3_1_compatibility DONE (nvc0, radeonsi)
GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_cull_distance DONE (i965, nv50, nvc0, llvmpipe, softpipe)
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_ES3_1_compatibility DONE (i965/hsw+)
GL_ARB_clip_control DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_cull_distance DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_derivative_control DONE (i965, nv50, r600)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_texture_barrier DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, r600)
GL_ARB_texture_barrier DONE (i965, nv50, r600)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robustness DONE (i965)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
These are the extensions cherry-picked to make GLES 3.1
GLES3.1, GLSL ES 3.1
GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_compute_shader DONE (i965/gen7+, softpipe)
GL_ARB_draw_indirect DONE (i965/gen7+, r600, llvmpipe, softpipe, swr)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_framebuffer_no_attachments DONE (i965/gen7+, r600, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_atomic_counters DONE (i965/gen7+, softpipe)
GL_ARB_shader_image_load_store DONE (i965/gen7+, softpipe)
GL_ARB_shader_image_size DONE (i965/gen7+, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965/gen7+, softpipe)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_stencil_texturing DONE (nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965/gen7+, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GS5 Enhanced textureGather DONE (i965, nvc0, r600, radeonsi)
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GS5 Enhanced textureGather DONE (i965/gen7+, r600)
GS5 Packing/bitfield/conversion functions DONE (i965/gen6+, r600)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functionality not covered above:
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glGetBooleani_v - restrict to GLES enums
gl_HelperInvocation support DONE (i965, nvc0, r600, radeonsi)
gl_HelperInvocation support DONE (i965, r600)
GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+
GLES3.2, GLSL ES 3.2
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced not started
GL_KHR_blend_equation_advanced DONE (i965)
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness DONE (i965)
GL_KHR_robustness DONE (i965, nvc0, radeonsi)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image DONE (i965)
GL_OES_copy_image DONE (all drivers)
GL_OES_draw_buffers_indexed DONE (all drivers that support GL_ARB_draw_buffers_blend)
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader started (idr)
GL_OES_geometry_shader DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_gpu_shader5 DONE (all drivers that support GL_ARB_gpu_shader5)
GL_OES_primitive_bounding_box not started
GL_OES_primitive_bounding_box DONE (i965/gen7+, nvc0, radeonsi)
GL_OES_sample_shading DONE (i965, nvc0, r600, radeonsi)
GL_OES_sample_variables DONE (i965, nvc0, r600, radeonsi)
GL_OES_shader_image_atomic DONE (all drivers that support GL_ARB_shader_image_load_store)
GL_OES_shader_io_blocks DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_shader_multisample_interpolation DONE (i965, nvc0, r600, radeonsi)
GL_OES_tessellation_shader started (Ken)
GL_OES_tessellation_shader DONE (all drivers that support GL_ARB_tessellation_shader)
GL_OES_texture_border_clamp DONE (all drivers)
GL_OES_texture_buffer DONE (i965, nvc0, radeonsi)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_cube_map_array DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_texture_stencil8 DONE (all drivers that support GL_ARB_texture_stencil8)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES version:
GL_ARB_bindless_texture started (airlied)
GL_ARB_cl_event not started
GL_ARB_compute_variable_group_size DONE (nvc0, radeonsi)
GL_ARB_ES3_2_compatibility DONE (i965/gen8+)
GL_ARB_fragment_shader_interlock not started
GL_ARB_gl_spirv not started
GL_ARB_gpu_shader_int64 started (airlied for core and Gallium, idr for i965)
GL_ARB_indirect_parameters DONE (nvc0, radeonsi)
GL_ARB_parallel_shader_compile not started, but Chia-I Wu did some related work in 2014
GL_ARB_pipeline_statistics_query DONE (i965, nvc0, radeonsi, softpipe, swr)
GL_ARB_post_depth_coverage not started
GL_ARB_robustness_isolation not started
GL_ARB_sample_locations not started
GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr)
GL_ARB_shader_atomic_counter_ops DONE (nvc0, radeonsi, softpipe)
GL_ARB_shader_ballot not started
GL_ARB_shader_clock DONE (i965/gen7+)
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (nvc0)
GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+)
GL_ARB_sparse_buffer not started
GL_ARB_sparse_texture not started
GL_ARB_sparse_texture2 not started
GL_ARB_sparse_texture_clamp not started
GL_ARB_texture_filter_minmax not started
GL_ARB_transform_feedback_overflow_query not started
GL_KHR_blend_equation_advanced_coherent DONE (i965/gen9+)
GL_KHR_no_error not started
GL_KHR_texture_compression_astc_hdr DONE (core only)
GL_KHR_texture_compression_astc_sliced_3d not started
GL_OES_depth_texture_cube_map DONE (all drivers that support GLSL 1.30+)
GL_OES_EGL_image DONE (all drivers)
GL_OES_EGL_image_external_essl3 not started
GL_OES_required_internalformat not started - GLES2 extension based on OpenGL ES 3.0 feature
GL_OES_surfaceless_context DONE (all drivers)
GL_OES_texture_compression_astc DONE (core only)
GL_OES_texture_float DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
GL_OES_texture_float_linear DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
GL_OES_texture_half_float DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
GL_OES_texture_half_float_linear DONE (i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
GL_OES_texture_view not started - based on GL_ARB_texture_view
GL_OES_viewport_array DONE (i965, nvc0, radeonsi)
GLX_ARB_context_flush_control not started
GLX_ARB_robustness_application_isolation not started
GLX_ARB_robustness_share_group_isolation not started
The following extensions are not part of any OpenGL or OpenGL ES version, and
we DO NOT WANT implementations of these extensions for Mesa.
GL_ARB_geometry_shader4 Superseded by GL 3.2 geometry shaders
GL_ARB_matrix_palette Superseded by GL_ARB_vertex_program
GL_ARB_shading_language_include Not interesting
GL_ARB_shadow_ambient Superseded by GL_ARB_fragment_program
GL_ARB_vertex_blend Superseded by GL_ARB_vertex_program
More info about these features and the work involved can be found at
http://dri.freedesktop.org/wiki/MissingFunctionality

View File

@@ -56,8 +56,8 @@ You can find some further To-do lists here:
<b>Common To-Do lists:</b>
</p>
<ul>
<li><a href="http://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt">
<b>GL3.txt</b></a> - Status of OpenGL 3.x / 4.x features in Mesa.</li>
<li><a href="http://cgit.freedesktop.org/mesa/mesa/tree/docs/features.txt">
<b>features.txt</b></a> - Status of OpenGL 3.x / 4.x features in Mesa.</li>
<li><a href="http://dri.freedesktop.org/wiki/MissingFunctionality">
<b>MissingFunctionality</b></a> - Detailed information about missing OpenGL features.</li>
</ul>

View File

@@ -16,6 +16,31 @@
<h1>News</h1>
<h2>September 15, 2016</h2>
<p>
<a href="relnotes/12.0.3.html">Mesa 12.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>September 2, 2016</h2>
<p>
<a href="relnotes/12.0.2.html">Mesa 12.0.2</a> is released.
This is a bug-fix release.
</p>
<h2>July 8, 2016</h2>
<p>
<a href="relnotes/12.0.1.html">Mesa 12.0.1</a> is released.
This is a bug-fix release, resolving build issues in the r600 and
radeonsi drivers.
</p>
<p>
<a href="relnotes/12.0.0.html">Mesa 12.0.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>May 9, 2016</h2>
<p>
<a href="relnotes/11.1.4.html">Mesa 11.1.4</a> and

View File

@@ -173,6 +173,27 @@ of the OpenGL specification is implemented.
</p>
<h2>Version 12.x features</h2>
<p>
Version 12.x of Mesa implements the OpenGL 4.3 API, but not all drivers
support OpenGL 4.3.
</p>
<h2>Version 11.x features</h2>
<p>
Version 11.x of Mesa implements the OpenGL 4.1 API, but not all drivers
support OpenGL 4.1.
</p>
<h2>Version 10.x features</h2>
<p>
Version 10.x of Mesa implements the OpenGL 3.3 API, but not all drivers
support OpenGL 3.3.
</p>
<h2>Version 9.x features</h2>
<p>
Version 9.x of Mesa implements the OpenGL 3.1 API.
@@ -182,6 +203,10 @@ community contributed features required for OpenGL 3.1. The primary
features added since the Mesa 8.0 release are
GL_ARB_texture_buffer_object and GL_ARB_uniform_buffer_object.
</p>
<p>
Version 9.0 of Mesa also included the first release of the Clover state
tracker for OpenCL.
</p>
<h2>Version 8.x features</h2>

View File

@@ -21,6 +21,10 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/12.0.3.html">12.0.3 release notes</a>
<li><a href="relnotes/12.0.2.html">12.0.2 release notes</a>
<li><a href="relnotes/12.0.1.html">12.0.1 release notes</a>
<li><a href="relnotes/12.0.0.html">12.0.0 release notes</a>
<li><a href="relnotes/11.2.2.html">11.2.2 release notes</a>
<li><a href="relnotes/11.1.4.html">11.1.4 release notes</a>
<li><a href="relnotes/11.2.1.html">11.2.1 release notes</a>

View File

@@ -1,89 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.3.0 Release Notes / TBD</h1>
<p>
Mesa 11.3.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.3.1.
</p>
<p>
Mesa 11.3.0 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 4.3 on nvc0, radeonsi, i965 (Gen8+)</li>
<li>OpenGL ES 3.1 on nvc0, radeonsi</li>
<li>GL_ARB_ES3_1_compatibility on nvc0, radeonsi</li>
<li>GL_ARB_compute_shader on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_cull_distance on i965/gen6+, nv50, nvc0, llvmpipe, softpipe</li>
<li>GL_ARB_framebuffer_no_attachments on nvc0, r600, radeonsi, softpipe</li>
<li>GL_ARB_internalformat_query2 on all drivers</li>
<li>GL_ARB_query_buffer_object on i965/hsw+</li>
<li>GL_ARB_robust_buffer_access_behavior on i965, nvc0, radeonsi</li>
<li>GL_ARB_shader_atomic_counters on radeonsi, softpipe</li>
<li>GL_ARB_shader_atomic_counter_ops on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_load_store on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_size on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_storage_buffer_objects on radeonsi, softpipe</li>
<li>GL_ATI_fragment_shader on all Gallium drivers</li>
<li>GL_EXT_base_instance on all drivers that support GL_ARB_base_instance</li>
<li>GL_EXT_clip_cull_distance on all drivers that support GL_ARB_cull_distance</li>
<li>GL_KHR_robustness on i965</li>
<li>GL_OES_copy_image on i965 (Baytrail and Gen8+)</li>
<li>GL_OES_draw_buffers_indexed and GL_EXT_draw_buffers_indexed on all drivers that support GL_ARB_draw_buffers_blend</li>
<li>GL_OES_gpu_shader5 and GL_EXT_gpu_shader5 on all drivers that support GL_ARB_gpu_shader5</li>
<li>GL_OES_sample_shading on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_sample_variables on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_shader_image_atomic on all drivers that support GL_ARB_shader_image_load_store</li>
<li>GL_OES_shader_io_blocks on i965, nvc0, radeonsi</li>
<li>GL_OES_shader_multisample_interpolation on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_texture_border_clamp and GL_EXT_texture_border_clamp on all drivers that support GL_ARB_texture_border_clamp</li>
<li>GL_OES_texture_buffer and GL_EXT_texture_buffer on i965, nvc0, radeonsi</li>
<li>EGL_KHR_reusable_sync on all drivers</li>
<li>GL_ARB_stencil_texture8 and GL_OES_stencil_texture8 on i965/gen8+</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

335
docs/relnotes/12.0.0.html Normal file
View File

@@ -0,0 +1,335 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.0 Release Notes / July 8, 2016</h1>
<p>
Mesa 12.0.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 12.0.1.
</p>
<p>
Mesa 12.0.0 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
3b8fa4d86d78f8f6ec86055b92ad1afe869001483593b3dd4531184b8bc4fcfb mesa-12.0.0.tar.gz
0090c025219318935124292b482e3439bc43e8c074ad01086449fcad88547dc6 mesa-12.0.0.tar.xz
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 4.3 on nvc0, radeonsi, i965 (Gen8+)</li>
<li>OpenGL ES 3.1 on nvc0, radeonsi</li>
<li>GL_ARB_ES3_1_compatibility on nvc0, radeonsi</li>
<li>GL_ARB_compute_shader on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_cull_distance on i965/gen6+, nv50, nvc0, llvmpipe, softpipe</li>
<li>GL_ARB_framebuffer_no_attachments on nvc0, r600, radeonsi, softpipe</li>
<li>GL_ARB_internalformat_query2 on all drivers</li>
<li>GL_ARB_query_buffer_object on i965/hsw+</li>
<li>GL_ARB_robust_buffer_access_behavior on i965, nvc0, radeonsi</li>
<li>GL_ARB_shader_atomic_counters on radeonsi, softpipe</li>
<li>GL_ARB_shader_atomic_counter_ops on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_load_store on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_size on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_storage_buffer_objects on radeonsi, softpipe</li>
<li>GL_ATI_fragment_shader on all Gallium drivers</li>
<li>GL_EXT_base_instance on all drivers that support GL_ARB_base_instance</li>
<li>GL_EXT_clip_cull_distance on all drivers that support GL_ARB_cull_distance</li>
<li>GL_KHR_robustness on i965</li>
<li>GL_OES_copy_image on i965 (Baytrail and Gen8+)</li>
<li>GL_OES_draw_buffers_indexed and GL_EXT_draw_buffers_indexed on all drivers that support GL_ARB_draw_buffers_blend</li>
<li>GL_OES_gpu_shader5 and GL_EXT_gpu_shader5 on all drivers that support GL_ARB_gpu_shader5</li>
<li>GL_OES_sample_shading on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_sample_variables on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_shader_image_atomic on all drivers that support GL_ARB_shader_image_load_store</li>
<li>GL_OES_shader_io_blocks on i965, nvc0, radeonsi</li>
<li>GL_OES_shader_multisample_interpolation on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_texture_border_clamp and GL_EXT_texture_border_clamp on all drivers that support GL_ARB_texture_border_clamp</li>
<li>GL_OES_texture_buffer and GL_EXT_texture_buffer on i965, nvc0, radeonsi</li>
<li>EGL_KHR_reusable_sync on all drivers</li>
<li>GL_ARB_stencil_texture8 and GL_OES_stencil_texture8 on i965/gen8+</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=42187">Bug 42187</a> - ES 1.1 conformance pntszary.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81585">Bug 81585</a> - piglit spec_glsl-1.10_compiler_literals_invalid-float-suffix-capital-f.vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83036">Bug 83036</a> - [ILK]Piglit spec_ARB_copy_image_arb_copy_image-formats fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89607">Bug 89607</a> - Assertion hit in opt_array_splitting with recursive array indexing</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90513">Bug 90513</a> - Odd gray and red flicker in The Talos Principle on GK104</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92628">Bug 92628</a> - HTTP site for Mesa downloads</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92743">Bug 92743</a> - Centroid shouldn't have to match between the FS and the VS</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92850">Bug 92850</a> - Segfault loading War Thunder</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93054">Bug 93054</a> - [BDW] DiRT Showdown and Bioshock Infinite only render half the screen (bottom left triangle)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93524">Bug 93524</a> - Clover doesn't build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93551">Bug 93551</a> - Divinity: Original Sin Enhanced Edition(Native) crash on start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93667">Bug 93667</a> - Crash in eglCreateImageKHR with huge texture size</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93767">Bug 93767</a> - Glitches with soft shadows and MSAA in Knights of the Old Republic 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93840">Bug 93840</a> - [i965] Alien: Isolation fails with GL_ARB_compute_shader enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93962">Bug 93962</a> - [HSW, regression, bisected, CTS] ES2-CTS.gtf.GL2FixedTests.scissor.scissor - segfault/asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94081">Bug 94081</a> - [HSW] compute shader shared var + atomic op = fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94086">Bug 94086</a> - Multiple conflicting libGL libraries installed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94116">Bug 94116</a> - program interface queries not returning right data for UBO / GL_BLOCK_INDEX</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94129">Bug 94129</a> - Mesa's compiler should warn about undefined values</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94181">Bug 94181</a> - [regression] piglit.spec.ext_framebuffer_object.getteximage-formats init-by-clear-and-render</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94198">Bug 94198</a> - [HSW] segfault in copy image when copying from cubemap to 2d</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94199">Bug 94199</a> - Shader abort/crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94253">Bug 94253</a> - [llvmpipe] piglit gl-1.0-swapbuffers-behavior regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94254">Bug 94254</a> - [llvmpipe] [softpipe] piglit read-front regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94257">Bug 94257</a> - [softpipe] piglit glx-copy-sub-buffer regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94274">Bug 94274</a> - [swrast] piglit arb_occlusion_query2-render regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94284">Bug 94284</a> - [radeonsi] outlast segfault on start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94291">Bug 94291</a> - llvmpipe tests fail if built on skylake i7-6700k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94348">Bug 94348</a> - vkBindImageMemory doesn't take into account the offset when the image is used as a depth buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94383">Bug 94383</a> - build error on i386 when enabling swr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94388">Bug 94388</a> - r600_blit.c:281: r600_decompress_depth_textures: Assertion `tex-&gt;is_depth &amp;&amp; !tex-&gt;is_flushing_texture' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94412">Bug 94412</a> - Trine 3 misrender</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94447">Bug 94447</a> - glsl/glcpp/tests/glcpp-test-cr-lf regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94453">Bug 94453</a> - dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_{center,corner} fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94454">Bug 94454</a> - dEQP-GLES3.functional.clipping.point.wide_point_clip* fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94456">Bug 94456</a> - dEQP-GLES3.functional.state_query.floats.{blend_color,color_clear_value,depth_clear_value}_getinteger64 fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94458">Bug 94458</a> - dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94468">Bug 94468</a> - [HSW, regression, bisected] numerous Sascha demos render incorrectly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94485">Bug 94485</a> - dEQP-GLES3.functional.negative_api.shader.compile_shader and delete_shader broken by Meta</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94524">Bug 94524</a> - Wrong gl_TessLevelOuter interpretation for isolines</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&amp;swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94657">Bug 94657</a> - [llvmpipe] [softpipe] piglit arb_texture_view-getteximage-srgb regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94661">Bug 94661</a> - [bdw, skl] vk-cts: new test failing</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94671">Bug 94671</a> - [radeonsi] Blue-ish textures in Shadow of Mordor</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94713">Bug 94713</a> - [Gen8+] ES 3.1 Stencil texturing broken for 2DArray/Cubes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94747">Bug 94747</a> - Convert phi nodes to logical operations</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94835">Bug 94835</a> - Increase fragment shader sample limits from 16 to 32 (AMD Linux - Mesa/RadeonSi)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94847">Bug 94847</a> - [ES3.1CTS] es31-cts.draw_buffers_indexed.color_masks fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94896">Bug 94896</a> - [vulkan] new CTS tests fail on i965</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94904">Bug 94904</a> - [vulkan, BSW] dEQP-VK.api.object_management.multithreaded_per_thread_device intermittent crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94907">Bug 94907</a> - codegen/nv50_ir_ra.cpp:1330:29: error: isinf was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94909">Bug 94909</a> - [llvmpipe] piglit fs-roundEven-float regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94917">Bug 94917</a> - radeonsi supports GL_ARB_shader_storage_buffer_object with 0 GL_MAX_COMBINED_SHADER_STORAGE_BLOCKS</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94924">Bug 94924</a> - [GEN8] Ungine Valley fails to run due to &quot;intel_do_flush_locked failed: Input/output error&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94925">Bug 94925</a> - Crash in egl_dri3_get_dri_context with Dolphin EGL/X11 in single-core mode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94944">Bug 94944</a> - [regression, hswgt1] gpu hang on arb_shader_image_load_store</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94955">Bug 94955</a> - Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94969">Bug 94969</a> - build fails because install-data-local doesn't follow $DESTDIR</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94972">Bug 94972</a> - blend failures on llvmpipe with llvm 3.7 due to vector selects</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94979">Bug 94979</a> - dolphin-emu rendering broken on gallium/SWR + crashing often</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94984">Bug 94984</a> - XCom2 crashes with SIGSEGV on radeonsi</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94994">Bug 94994</a> - OSMesaGetProcAdress always fails on mangled OSMesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94997">Bug 94997</a> - [vulkan, SKL,BDW,HSW] deqp-vk.spirv_assembly.instruction.compute.opcopymemory.array regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94998">Bug 94998</a> - [vulkan] deqp-vk.pipeline.push_constant.graphics_pipeline.count_3shader_vgf regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95001">Bug 95001</a> - [vulkan] deqp-vk.binding_model.shader_access regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95005">Bug 95005</a> - Unreal engine demos segfault after shader compilation error with OpenGL 4.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95026">Bug 95026</a> - Alien Isolation segfault after initial loading screen/video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95034">Bug 95034</a> - vkResetCommandPool should not destroy the command buffers.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95071">Bug 95071</a> - [bisected] Wrong colors in KDE/Qt applications</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95133">Bug 95133</a> - X-COM Enemy Within crashes when entering tactical mission with Bonaire</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95138">Bug 95138</a> - [deqp, 32bit, gen8+] deqp-gles31.functional.draw_indirect.negative</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95142">Bug 95142</a> - [ES3.1CTS,GEN8] ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos assertion</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95158">Bug 95158</a> - glx-test compilation fails in `make check`</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95164">Bug 95164</a> - GLSL compiler (linker I think) emits assertion upon call to glAttachShader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95180">Bug 95180</a> - rasterizer/memory/Convert.h:170:9: error: __builtin_isnan is not a member of std</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95198">Bug 95198</a> - Shadow of Mordor beta has missing geometry with gl 4.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95203">Bug 95203</a> - Tonga GST/OMX/VCE encode broken since mesa: st/omx: Fix resource leak on OMX_ErrorNone</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95211">Bug 95211</a> - scons TypeError: 'tuple' object is not callable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95246">Bug 95246</a> - Segfault in glBindFramebuffer()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95251">Bug 95251</a> - vdpau decoder capabilities: not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95252">Bug 95252</a> - [deqp] deqp-gles31.functional.debug.object_labels.query_length_only crashes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95292">Bug 95292</a> - [IVB,SKL] vulkan: stride/tiling issue with vkCmdCopyBufferToImage from larger source buffer into destination image</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95296">Bug 95296</a> - nir_lower_double_packing.c:79:4: error: void function 'lower_double_pack_impl' should not return a value [-Wreturn-type]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95324">Bug 95324</a> - GL33-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo fails in one case on Haswell</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95370">Bug 95370</a> - [965GM] piglit fails many tests after a5d7e144</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95373">Bug 95373</a> - Suspicious warning in brw_blorp_clear.cpp</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95403">Bug 95403</a> - [GK110] misaligned_gpr spamming dmesg when playing victor vran</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95419">Bug 95419</a> - [HSW][regression][bisect] RPG Maker game gives &quot;invalid floating point operation&quot; at startup</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95456">Bug 95456</a> - glXGetFBConfigs has invalid screen bounds</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95462">Bug 95462</a> - [BXT,BSW] arb_gpu_shader_fp64 causes gpu hang</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95529">Bug 95529</a> - [regression, bisected] Image corruption in Chrome</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95537">Bug 95537</a> - Invalid argument in anv_ioctl called from anv_physical_device_init</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96221">Bug 96221</a> - nir/nir_lower_tex.c:202: error: unknown field f32 specified in initializer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96228">Bug 96228</a> - SSBO test regressions from mesa 5b267509</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96236">Bug 96236</a> - dri_interface.h:404: error: redefinition of typedef mesa_glinterop_device_info</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96238">Bug 96238</a> - swr fails to build outside of the main directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96239">Bug 96239</a> - [radeonsi tessellation] [R9 290/390] Random &quot;texture flickering&quot; (Shadow of Mordor, Tomb Raider, Unigine Heaven 4.0)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96258">Bug 96258</a> - [NVC0] Hang when running compute program</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96285">Bug 96285</a> - Mesa build broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96299">Bug 96299</a> - [vulkan] 64 regressions due to mesa d5f2f32</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96346">Bug 96346</a> - [SNB,CTS] es2-cts.gtf.gl.atan regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96349">Bug 96349</a> - [CTS,SKL,BSW,BDW,KBL,BXT] es31-cts.arrays_of_arrays.interactionuniformbuffers3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96351">Bug 96351</a> - [CTS,SKL,KBL,BXT] es2-cts.gtf.gl2extensiontests.egl_image.egl_image</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96358">Bug 96358</a> - SSO: wrong interface validation between GS and VS (regresion due to latest gles 3.1)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96425">Bug 96425</a> - [bisected] occasional dark render in The Talos Principle</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96504">Bug 96504</a> - [vulkancts] compute tests crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96516">Bug 96516</a> - [bisected: 482526] &quot;clover: Update OpenCL version string to match OpenGL&quot;: clover's build fails because of missing git_sha1.h</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96565">Bug 96565</a> - Clive Barker's Jericho displays strange,vivid colors when motion blur enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96607">Bug 96607</a> - [bisected] texture misrender / flicker in The Talos Principle on SKL</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96617">Bug 96617</a> - gl_SecondaryFragDataEXT doesn't work for extended blend func</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96629">Bug 96629</a> - dEQP-GLES2.functional.texture.completeness.cube.not_positive_level_0: Assertion `width &gt;= 1' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96639">Bug 96639</a> - st/mesa: transfer_map with too-high level with dEQP-GLES2.functional.texture.completeness.cube.extra_level</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96674">Bug 96674</a> - [SNB, ILK] spec.ext_image_dma_buf_import.ext_image_dma_buf_import-sample_nv1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96765">Bug 96765</a> - BindFragDataLocationIndexed on array fragment shader output.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96791">Bug 96791</a> - Cannot use image from swapchains for sampling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96825">Bug 96825</a> - anv_device.c:31:27: fatal error: anv_timestamp.h: No such file or directory</li>
</ul>
<h2>Changes</h2>
Radeon drivers (r600 and radeonsi) now require LLVm 3.6 as a minimum.
</div>
</body>
</html>

65
docs/relnotes/12.0.1.html Normal file
View File

@@ -0,0 +1,65 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.1 Release Notes / July 8, 2016</h1>
<p>
Mesa 12.0.1 is a bug fix release which fixes bugs found since the 12.0.1 release.
</p>
<p>
Mesa 12.0.1 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
28dff9c045f4305c96a875a487b9f06c7e88d910511cd6016dbddcd1f53ade0d mesa-12.0.1.tar.gz
bab24fb79f78c876073527f515ed871fc9c81d816f66c8a0b051d8d653896389 mesa-12.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96864">Bug 96864</a> - Mesa 12.0 radeon build broken</li>
</ul>
<h2>Changes</h2>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 12.0.0</li>
<li>radeon: reference the correct cdw/max_dw</li>
<li>Update version to 12.0.1</li>
<li>docs: add release notes for 12.0.1</li>
</ul>
</div>
</body>
</html>

403
docs/relnotes/12.0.2.html Normal file
View File

@@ -0,0 +1,403 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.2 Release Notes / September 2, 2016</h1>
<p>
Mesa 12.0.2 is a bug fix release which fixes bugs found since the 12.0.1 release.
</p>
<p>
Mesa 12.0.2 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
a08565ab1273751ebe2ffa928cbf785056594c803077c9719d0763da780f2918 mesa-12.0.2.tar.gz
d957a5cc371dcd7ff2aa0d87492f263aece46f79352f4520039b58b1f32552cb mesa-12.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=69622">Bug 69622</a> - eglTerminate then eglMakeCurrent crahes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89599">Bug 89599</a> - symbol 'x86_64_entry_start' is already defined when building with LLVM/clang</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92306">Bug 92306</a> - GL Excess demo renders incorrectly on nv43</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94148">Bug 94148</a> - Framebuffer considered invalid when a draw call is done before glCheckFramebufferStatus</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96274">Bug 96274</a> - [NVC0] Failure when compiling compute shader: Assertion `bb-&gt;getFirst()-&gt;serial &lt;= bb-&gt;getExit()-&gt;serial' failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96358">Bug 96358</a> - SSO: wrong interface validation between GS and VS (regresion due to latest gles 3.1)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96381">Bug 96381</a> - Texture artifacts with immutable texture storage and mipmaps</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96762">Bug 96762</a> - [radeonsi,apitrace] Firewatch: nothing rendered in scrollable (text) areas</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96835">Bug 96835</a> - &quot;gallium: Force blend color to 16-byte alignment&quot; crash with &quot;-march=native -O3&quot; causes some 32bit games to crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96850">Bug 96850</a> - Crucible tests fail for 32bit mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96908">Bug 96908</a> - [radeonsi] MSAA causes graphical artifacts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96911">Bug 96911</a> - webgl2 conformance2/textures/misc/tex-mipmap-levels.html crashes 12.1 Intel driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96971">Bug 96971</a> - invariant qualifier is not valid for shader inputs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97039">Bug 97039</a> - The Talos Principle and Serious Sam 3 GPU faults</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97207">Bug 97207</a> - [IVY BRIDGE] Fragment shader discard writing to depth</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97214">Bug 97214</a> - X not running with error &quot;Failed to make EGL context current&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97225">Bug 97225</a> - [i965 on HD4600 Haswell] xcom switch to ingame cinematics cause segmentation fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97231">Bug 97231</a> - GL_DEPTH_CLAMP doesn't clamp to the far plane</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97307">Bug 97307</a> - glsl/glcpp/tests/glcpp-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97331">Bug 97331</a> - glDrawElementsBaseVertex doesn't work in display list on i915</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97351">Bug 97351</a> - DrawElementsBaseVertex with VBO ignores base vertex on Intel GMA 9xx in some cases</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97426">Bug 97426</a> - glScissor gives vertically inverted result</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97476">Bug 97476</a> - Shader binaries should not be stored in the PipelineCache</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97567">Bug 97567</a> - [SNB, ILK] ctl, piglit regressions in mesa 12.0.2rc1</li>
</ul>
<h2>Changes</h2>
<p>Andreas Boll (1):</p>
<ul>
<li>configure.ac: Use ${datarootdir} for --with-vulkan-icddir help string too</li>
</ul>
<p>Bernard Kilarski (1):</p>
<ul>
<li>glx: fix error code when there is no context bound</li>
</ul>
<p>Brian Paul (4):</p>
<ul>
<li>svga: handle mismatched number of samplers, sampler views</li>
<li>mesa: use _mesa_clear_texture_image() in clear_texture_fields()</li>
<li>swrast: fix incorrectly positioned putImage() in swrast driver</li>
<li>mesa: fix format conversion bug in get_tex_rgba_uncompressed()</li>
</ul>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Fix miptree layout for EGLImage-based renderbuffers</li>
<li>i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()</li>
</ul>
<p>Christian König (1):</p>
<ul>
<li>st/mesa: fix reference counting bug in st_vdpau</li>
</ul>
<p>Chuck Atkins (1):</p>
<ul>
<li>swr: Refactor checks for compiler feature flags</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: Fix fixed function spot lighting on newer hardware (again)</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>anv: fix writemask on blit fragment shader.</li>
<li>st/glsl_to_tgsi: fix st_src_reg_for_double constant.</li>
</ul>
<p>Emil Velikov (15):</p>
<ul>
<li>docs: add sha256 checksums for 12.0.1</li>
<li>mesa: automake: list builddir before srcdir</li>
<li>mesa: scons: list builddir before srcdir</li>
<li>i965: store reference to the context within struct brw_fence (v2)</li>
<li>anv: remove internal 'validate' layer</li>
<li>anv: automake: use VISIBILITY_CFLAGS to restrict symbol visibility</li>
<li>anv: automake: build with -Bsymbolic</li>
<li>anv: do not export the Vulkan API</li>
<li>anv: remove dummy VK_DEBUG_MARKER_EXT entry points</li>
<li>isl: automake: use VISIBILITY_CFLAGS to restrict symbol visibility</li>
<li>cherry-ignore: temporary(?) drop "a4xx: make sure to actually clamp depth"</li>
<li>i915: Check return value of screen-&gt;image.loader-&gt;getBuffers</li>
<li>Revert "i965/miptree: Set logical_depth0 == 6 for cube maps"</li>
<li>glx/glvnd: list the strcmp arguments in correct order</li>
<li>Update version to 12.0.2</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>vc4: Close our screen's fd on screen close.</li>
<li>vc4: Disable early Z with computed depth.</li>
<li>vc4: Fix a leak of the src[] array of VPM reads in optimization.</li>
<li>vc4: Fix leak of the bo_handles table.</li>
</ul>
<p>Francisco Jerez (3):</p>
<ul>
<li>i965: Emit SKL VF cache invalidation W/A from brw_emit_pipe_control_flush.</li>
<li>i965: Make room in the batch epilogue for three more pipe controls.</li>
<li>i965: Fix remaining flush vs invalidate race conditions in brw_emit_pipe_control_flush.</li>
</ul>
<p>Haixia Shi (1):</p>
<ul>
<li>platform_android: prevent deadlock in droid_swap_buffers</li>
</ul>
<p>Ian Romanick (5):</p>
<ul>
<li>mesa: Strip arrayness from interface block names in some IO validation</li>
<li>glsl: Pack integer and double varyings as flat even if interpolation mode is none</li>
<li>glcpp: Track the actual version instead of just the version_resolved flag</li>
<li>glcpp: Only disallow #undef of pre-defined macros on GLSL ES &gt;= 3.00 shaders</li>
<li>glsl: Mark cube map array sampler types as reserved in GLSL ES 3.10</li>
</ul>
<p>Ilia Mirkin (16):</p>
<ul>
<li>mesa: etc2 online compression is unsupported, don't attempt it</li>
<li>st/mesa: return appropriate mesa format for ETC texture formats</li>
<li>mesa: set _NEW_BUFFERS when updating texture bound to current buffers</li>
<li>nv50,nvc0: srgb rendering is only available for rgba/bgra</li>
<li>vbo: allow DrawElementsBaseVertex in display lists</li>
<li>gallium/util: add helper to compute zmin/zmax for a viewport state</li>
<li>nv50,nvc0: fix depth range when halfz is enabled</li>
<li>nv50/ir: fix bb positions after exit instructions</li>
<li>vbo: add basevertex when looking up elements for vbo splitting</li>
<li>a4xx: only disable depth clipping, not all clipping, when requested</li>
<li>nv50/ir: make sure cfg iterator always hits all blocks</li>
<li>main: add missing EXTRA_END in OES_sample_variables get check</li>
<li>nouveau: always enable at least one RC</li>
<li>nv30: only bail on color/depth bpp mismatch when surfaces are swizzled</li>
<li>a4xx: make sure to actually clamp depth as requested</li>
<li>gk110/ir: fix quadop dall emission</li>
</ul>
<p>Jan Ziak (2):</p>
<ul>
<li>egl/x11: avoid using freed memory if dri2 init fails</li>
<li>loader: fix memory leak in loader_dri3_open</li>
</ul>
<p>Jason Ekstrand (31):</p>
<ul>
<li>nir/spirv: Don't multiply the push constant block size by 4</li>
<li>anv: Add a stub for CmdCopyQueryPoolResults on Ivy Bridge</li>
<li>glsl/types: Fix function type comparison function</li>
<li>glsl/types: Use _mesa_hash_data for hashing function types</li>
<li>genxml: Make gen6-7 blending look more like gen8</li>
<li>anv/pipeline: Unify blend state setup between gen7 and gen8</li>
<li>anv: Enable independentBlend on gen7</li>
<li>anv: Add an align_down_npot_u32 helper</li>
<li>anv: Handle VK_WHOLE_SIZE properly for buffer views</li>
<li>i965/miptree: Enforce that height == 1 for 1-D array textures</li>
<li>i965/miptree: Set logical_depth0 == 6 for cube maps</li>
<li>nir: Add a nir_deref_foreach_leaf helper</li>
<li>nir/inline: Constant-initialize local variables in the callee if needed</li>
<li>anv/pipeline: Set up point coord enables</li>
<li>i965/miptree: Stop multiplying cube depth by 6 in HiZ calculations</li>
<li>i965/vec4: Make opt_vector_float reset at the top of each block</li>
<li>anv/blit2d: Add a format parameter to bind_dst and create_iview</li>
<li>anv/blit2d: Add support for RGB destinations</li>
<li>anv/clear: Make cmd_clear_image take an actual VkClearValue</li>
<li>anv/clear: Clear E5B9G9R9 images as R32_UINT</li>
<li>anv: Include the pipeline layout in the shader hash</li>
<li>isl: Allow multisampled array textures</li>
<li>anv/descriptor_set: memset anv_descriptor_set_layout</li>
<li>anv/pipeline: Fix bind maps for fragment output arrays</li>
<li>anv/allocator: Correctly set the number of buckets</li>
<li>anv/pipeline: Properly handle OOM during shader compilation</li>
<li>anv: Remove unused fields from anv_pipeline_bind_map</li>
<li>anv: Add pipeline_has_stage guards a few places</li>
<li>anv: Add a struct for storing a compiled shader</li>
<li>anv/pipeline: Add support for caching the push constant map</li>
<li>anv: Rework pipeline caching</li>
</ul>
<p>José Fonseca (2):</p>
<ul>
<li>appveyor: Install pywin32 extensions.</li>
<li>appveyor: Force Visual Studio 2013 image.</li>
</ul>
<p>Kenneth Graunke (21):</p>
<ul>
<li>genxml: Add CLIPMODE_* prefix to 3DSTATE_CLIP's "Clip Mode" enum values.</li>
<li>genxml: Add APIMODE_D3D missing enum values and improve consistency.</li>
<li>anv: Fix near plane clipping on Gen7/7.5.</li>
<li>anv: Enable early culling on Gen7.</li>
<li>anv: Unify 3DSTATE_CLIP code across generations.</li>
<li>genxml: Rename "API Rendering Disable" to "Rendering Disable".</li>
<li>anv: Properly call gen75_emit_state_base_address on Haswell.</li>
<li>i965: Include VUE handles for GS with invocations &gt; 1.</li>
<li>nir: Add a base const_index to shared atomic intrinsics.</li>
<li>i965: Fix shared atomic intrinsics to pay attention to base.</li>
<li>mesa: Add GL_BGRA_EXT to the list of GenerateMipmap internal formats.</li>
<li>mesa: Don't call GenerateMipmap if Width or Height == 0.</li>
<li>glsl: Delete bogus ir_set_program_inouts assert.</li>
<li>glsl: Fix the program resource names of gl_TessLevelOuter/Inner[].</li>
<li>glsl: Fix location bias for patch variables.</li>
<li>glsl: Fix invariant matching in GLSL 4.30 and GLSL ES 1.00.</li>
<li>mesa: Fix uf10_to_f32() scale factor in the E == 0 and M != 0 case.</li>
<li>nir/builder: Add bany_inequal and bany helpers.</li>
<li>i965: Implement the WaPreventHSTessLevelsInterference workaround.</li>
<li>i965: Fix execution size of scalar TCS barrier setup code.</li>
<li>i965: Fix barrier count shift in scalar TCS backend.</li>
</ul>
<p>Leo Liu (2):</p>
<ul>
<li>st/omx/enc: check uninitialized list from task release</li>
<li>vl/dri3: fix a memory leak from front buffer</li>
</ul>
<p>Marek Olšák (7):</p>
<ul>
<li>glsl_to_tgsi: don't use the negate modifier in integer ops after bitcast</li>
<li>radeonsi: add a workaround for a compute VGPR-usage LLVM bug</li>
<li>winsys/amdgpu: disallow DCC with mipmaps</li>
<li>gallium/util: fix align64</li>
<li>radeonsi: only set dual source blending for MRT0</li>
<li>radeonsi: fix VM faults due NULL internal const buffers on CIK</li>
<li>radeonsi: disable SDMA texture copying on Carrizo</li>
</ul>
<p>Matt Turner (4):</p>
<ul>
<li>mapi: Massage code to allow clang to compile.</li>
<li>i965/vec4: Ignore swizzle of VGRF for use by var_range_end().</li>
<li>mesa: Use AC_HEADER_MAJOR to include correct header for major().</li>
<li>nir: Walk blocks in source code order in lower_vars_to_ssa.</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>glx: Don't use current context in __glXSendError</li>
</ul>
<p>Miklós Máté (1):</p>
<ul>
<li>vbo: set draw_id</li>
</ul>
<p>Nanley Chery (5):</p>
<ul>
<li>anv/descriptor_set: Fix binding partly undefined descriptor sets</li>
<li>isl: Fix assert on raw buffer surface state size</li>
<li>anv/device: Fix max buffer range limits</li>
<li>isl: Fix isl_tiling_is_any_y()</li>
<li>anv/gen7_pipeline: Set PixelShaderKillPixel for discards</li>
</ul>
<p>Nicolai Hähnle (7):</p>
<ul>
<li>radeonsi: explicitly choose center locations for 1xAA on Polaris</li>
<li>radeonsi: fix Polaris MSAA regression</li>
<li>radeonsi: ensure sample locations are set for line and polygon smoothing</li>
<li>st_glsl_to_tgsi: only skip over slots of an input array that are present</li>
<li>glsl: fix optimization of discard nested multiple levels</li>
<li>radeonsi: flush TC L2 cache for indirect draw data</li>
<li>radeonsi: add si_set_rw_buffer to be used for internal descriptors</li>
</ul>
<p>Nicolas Boichat (6):</p>
<ul>
<li>egl/dri2: dri2_make_current: Set EGL error if bindContext fails</li>
<li>egl/wayland: Set disp-&gt;DriverData to NULL on error</li>
<li>egl/surfaceless: Set disp-&gt;DriverData to NULL on error</li>
<li>egl/drm: Set disp-&gt;DriverData to NULL on error</li>
<li>egl/android: Set dpy-&gt;DriverData to NULL on error</li>
<li>egl/dri2: Add reference count for dri2_egl_display</li>
</ul>
<p>Rob Herring (3):</p>
<ul>
<li>Android: add missing u_math.h include path for libmesa_isl</li>
<li>vc4: fix vc4_resource_from_handle() stride calculation</li>
<li>vc4: add hash table look-up for exported dmabufs</li>
</ul>
<p>Samuel Pitoiset (7):</p>
<ul>
<li>nvc0/ir: fix images indirect access on Fermi</li>
<li>nvc0: fix the driver cb size when draw parameters are used</li>
<li>gm107/ir: add missing NEG modifier for IADD32I</li>
<li>gm107/ir: make use of ADD32I for all immediates</li>
<li>nvc0: upload sample locations on GM20x</li>
<li>nvc0: invalidate textures/samplers on GK104+</li>
<li>nv50/ir: always emit the NDV bit for OP_QUADOP</li>
</ul>
<p>Stefan Dirsch (1):</p>
<ul>
<li>Avoid overflow in 'last' variable of FindGLXFunction(...)</li>
</ul>
<p>Stencel, Joanna (1):</p>
<ul>
<li>egl/wayland-egl: Fix for segfault in dri2_wl_destroy_surface.</li>
</ul>
<p>Tim Rowley (2):</p>
<ul>
<li>Revert "gallium: Force blend color to 16-byte alignment"</li>
<li>swr: switch from overriding -march to selecting features</li>
</ul>
<p>Tomasz Figa (8):</p>
<ul>
<li>gallium/dri: Add shared glapi to LIBADD on Android</li>
<li>egl/android: Remove unused variables</li>
<li>egl/android: Check return value of dri2_get_dri_config()</li>
<li>egl/android: Stop leaking DRI images</li>
<li>gallium/winsys/kms: Fix double refcount when importing from prime FD (v2)</li>
<li>gallium/winsys/kms: Fully initialize kms_sw_dt at prime import time (v2)</li>
<li>gallium/winsys/kms: Move display target handle lookup to separate function</li>
<li>gallium/winsys/kms: Look up the GEM handle after importing a prime FD</li>
</ul>
</div>
</body>
</html>

71
docs/relnotes/12.0.3.html Normal file
View File

@@ -0,0 +1,71 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.3 Release Notes / September 15, 2016</h1>
<p>
Mesa 12.0.3 is a bug fix release which fixes bugs found since the 12.0.3 release.
</p>
<p>
Mesa 12.0.3 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
79abcfab3de30dbd416d1582a3cf6b1be308466231488775f1b7bb43be353602 mesa-12.0.3.tar.gz
1dc86dd9b51272eee1fad3df65e18cda2e556ef1bc0b6e07cd750b9757f493b1 mesa-12.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97781">Bug 97781</a> - [HSW, BYT, IVB] es2-cts.gtf.gl2extensiontests.depth_texture_cube_map.depth_texture_cube_map</li>
</ul>
<h2>Changes</h2>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 12.0.2</li>
<li>Revert "i965/miptree: Stop multiplying cube depth by 6 in HiZ calculations"</li>
<li>Update version to 12.0.3</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>appveyor: Update winflexbison download URL.</li>
</ul>
</div>
</body>
</html>

85
docs/relnotes/13.0.0.html Normal file
View File

@@ -0,0 +1,85 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.0 Release Notes / TBD</h1>
<p>
Mesa 13.0.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 13.0.1.
</p>
<p>
Mesa 13.0.0 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL ES 3.1 on i965/hsw</li>
<li>OpenGL ES 3.2 on i965/gen9+ (Skylake and later)</li>
<li>GL_ARB_ES3_1_compatibility on i965</li>
<li>GL_ARB_ES3_2_compatibility on i965/gen8+</li>
<li>GL_ARB_clear_texture on r600, radeonsi</li>
<li>GL_ARB_compute_variable_group_size on nvc0, radeonsi</li>
<li>GL_ARB_cull_distance on radeonsi</li>
<li>GL_ARB_enhanced_layouts on i965, nv50, nvc0, radeonsi, llvmpipe, softpipe</li>
<li>GL_ARB_indirect_parameters on radeonsi</li>
<li>GL_ARB_query_buffer_object on radeonsi</li>
<li>GL_ARB_shader_draw_parameters on radeonsi</li>
<li>GL_ARB_shader_group_vote on nvc0</li>
<li>GL_ARB_shader_viewport_layer_array on i965/gen6+</li>
<li>GL_ARB_stencil_texturing on i965/hsw</li>
<li>GL_ARB_texture_stencil8 on i965/hsw</li>
<li>GL_EXT_window_rectangles on nv50, nvc0</li>
<li>GL_KHR_blend_equation_advanced on i965</li>
<li>GL_KHR_robustness on nvc0, radeonsi</li>
<li>GL_KHR_texture_compression_astc_sliced_3d on i965</li>
<li>GL_OES_copy_image on nv50, nvc0, r600, radeonsi, softpipe, llvmpipe</li>
<li>GL_OES_geometry_shader on i965/gen8+, nvc0, radeonsi</li>
<li>GL_OES_primitive_bounding_box on i965/gen7+, nvc0, radeonsi</li>
<li>GL_OES_texture_cube_map_array on i965/gen8+, nvc0, radeonsi</li>
<li>GL_OES_tessellation_shader on i965/gen7+, nvc0, radeonsi</li>
<li>GL_OES_viewport_array on nvc0, radeonsi</li>
<li>GL_ANDROID_extension_pack_es31a on i965/gen9+</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

View File

@@ -0,0 +1,120 @@
Name
MESA_platform_surfaceless
Name Strings
EGL_MESA_platform_surfaceless
Contributors
Chad Versace <chadversary@google.com>
Haixia Shi <hshi@google.com>
Stéphane Marchesin <marcheu@google.com>
Zach Reizner <zachr@chromium.org>
Gurchetan Singh <gurchetansingh@google.com>
Contacts
Chad Versace <chadversary@google.com>
Status
DRAFT
Version
Version 2, 2016-10-13
Number
EGL Extension #TODO
Extension Type
EGL client extension
Dependencies
Requires EGL 1.5 or later; or EGL 1.4 with EGL_EXT_platform_base.
This extension is written against the EGL 1.5 Specification (draft
20140122).
This extension interacts with EGL_EXT_platform_base as follows. If the
implementation supports EGL_EXT_platform_base, then text regarding
eglGetPlatformDisplay applies also to eglGetPlatformDisplayEXT;
eglCreatePlatformWindowSurface to eglCreatePlatformWindowSurfaceEXT; and
eglCreatePlatformPixmapSurface to eglCreatePlatformPixmapSurfaceEXT.
Overview
This extension defines a new EGL platform, the "surfaceless" platform. This
platfom's defining property is that it has no native surfaces, and hence
neither eglCreatePlatformWindowSurface nor eglCreatePlatformPixmapSurface
can be used. The platform is independent of any native window system.
The platform's intended use case is for enabling OpenGL and OpenGL ES
applications on systems where no window system exists. However, the
platform's permitted usage is not restricted to this case. Since the
platform is independent of any native window system, it may also be used on
systems where a window system is present.
New Types
None
New Procedures and Functions
None
New Tokens
Accepted as the <platform> argument of eglGetPlatformDisplay:
EGL_PLATFORM_SURFACELESS_MESA 0x31DD
Additions to the EGL Specification
None.
New Behavior
To determine if the EGL implementation supports this extension, clients
should query the EGL_EXTENSIONS string of EGL_NO_DISPLAY.
To obtain an EGLDisplay on the surfaceless platform, call
eglGetPlatformDisplay with <platform> set to EGL_PLATFORM_SURFACELESS_MESA.
The <native_display> parameter must be EGL_DEFAULT_DISPLAY.
eglCreatePlatformWindowSurface fails when called with a <display> that
belongs to the surfaceless platform. It returns EGL_NO_SURFACE and
generates EGL_BAD_NATIVE_WINDOW. The justification for this unconditional
failure is that the surfaceless platform has no native windows, and
therefore the <native_window> parameter is always invalid.
Likewise, eglCreatePlatformPixmapSurface also fails when called with a
<display> that belongs to the surfaceless platform. It returns
EGL_NO_SURFACE and generates EGL_BAD_NATIVE_PIXMAP.
The surfaceless platform imposes no platform-specific restrictions on the
creation of pbuffers, as eglCreatePbufferSurface has no native surface
parameter. Specifically, if the EGLDisplay advertises an EGLConfig whose
EGL_SURFACE_TYPE attribute contains EGL_PBUFFER_BIT, then the EGLDisplay
permits the creation of pbuffers with that config.
Issues
None.
Revision History
Version 2, 2016-10-13 (Chad Versace)
- Assign enum values
- Define interfactions with EGL 1.4 and EGL_EXT_platform_base.
- Add Gurchetan as contributor, as he implemented the pbuffer support.
Version 1, 2016-09-23 (Chad Versace)
- Initial version
- Posted for review at
https://lists.freedesktop.org/archives/mesa-dev/2016-September/129549.html

View File

@@ -12,11 +12,12 @@ Contact
Status
Proposal
Superseded by the functionally identical EGL_KHR_no_config_context
extension.
Version
Version 1, February 28, 2014
Version 2, September 9, 2016
Number
@@ -121,5 +122,8 @@ Issues
Revision History
Version 2, September 9, 2016
Defer to EGL_KHR_no_config_context (Adam Jackson)
Version 1, February 28, 2014
Initial draft (Neil Roberts)

View File

@@ -0,0 +1,520 @@
Name
MESA_shader_integer_functions
Name Strings
GL_MESA_shader_integer_functions
Contact
Ian Romanick <ian.d.romanick@intel.com>
Contributors
All the contributors of GL_ARB_gpu_shader5
Status
Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later
Version
Version 2, July 7, 2016
Number
TBD
Dependencies
This extension is written against the OpenGL 3.2 (Compatibility Profile)
Specification.
This extension is written against Version 1.50 (Revision 09) of the OpenGL
Shading Language Specification.
GLSL 1.30 is required.
This extension interacts with ARB_gpu_shader5.
This extension interacts with ARB_gpu_shader_fp64.
This extension interacts with NV_gpu_shader5.
Overview
GL_ARB_gpu_shader5 extends GLSL in a number of useful ways. Much of this
added functionality requires significant hardware support. There are many
aspects, however, that can be easily implmented on any GPU with "real"
integer support (as opposed to simulating integers using floating point
calculations).
This extension provides a set of new features to the OpenGL Shading
Language to support capabilities of these GPUs, extending the capabilities
of version 1.30 of the OpenGL Shading Language. Shaders
using the new functionality provided by this extension should enable this
functionality via the construct
#extension GL_MESA_shader_integer_functions : require (or enable)
This extension provides a variety of new features for all shader types,
including:
* support for implicitly converting signed integer types to unsigned
types, as well as more general implicit conversion and function
overloading infrastructure to support new data types introduced by
other extensions;
* new built-in functions supporting:
* splitting a floating-point number into a significand and exponent
(frexp), or building a floating-point number from a significand and
exponent (ldexp);
* integer bitfield manipulation, including functions to find the
position of the most or least significant set bit, count the number
of one bits, and bitfield insertion, extraction, and reversal;
* extended integer precision math, including add with carry, subtract
with borrow, and extenended multiplication;
The resulting extension is a strict subset of GL_ARB_gpu_shader5.
IP Status
No known IP claims.
New Procedures and Functions
None
New Tokens
None
Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)
None.
Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)
None.
Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)
None.
Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)
None.
Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)
None.
Additions to the AGL/GLX/WGL Specifications
None.
Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_MESA_shader_integer_functions : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_MESA_shader_integer_functions 1
Modify Section 4.1.10, Implicit Conversions, p. 27
(modify table of implicit conversions)
Can be implicitly
Type of expression converted to
--------------------- -----------------
int uint, float
ivec2 uvec2, vec2
ivec3 uvec3, vec3
ivec4 uvec4, vec4
uint float
uvec2 vec2
uvec3 vec3
uvec4 vec4
(modify second paragraph of the section) No implicit conversions are
provided to convert from unsigned to signed integer types or from
floating-point to integer types. There are no implicit array or structure
conversions.
(insert before the final paragraph of the section) When performing
implicit conversion for binary operators, there may be multiple data types
to which the two operands can be converted. For example, when adding an
int value to a uint value, both values can be implicitly converted to uint
and float. In such cases, a floating-point type is chosen if either
operand has a floating-point type. Otherwise, an unsigned integer type is
chosen if either operand has an unsigned integer type. Otherwise, a
signed integer type is chosen.
Modify Section 5.9, Expressions, p. 57
(modify bulleted list as follows, adding support for implicit conversion
between signed and unsigned types)
Expressions in the shading language are built from the following:
* Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
types, and all matrix types.
...
* The operator modulus (%) operates on signed or unsigned integer scalars
or vectors. If the fundamental types of the operands do not match, the
conversions from Section 4.1.10 "Implicit Conversions" are applied to
produce matching types. ...
Modify Section 6.1, Function Definitions, p. 63
(modify description of overloading, beginning at the top of p. 64)
Function names can be overloaded. The same function name can be used for
multiple functions, as long as the parameter types differ. If a function
name is declared twice with the same parameter types, then the return
types and all qualifiers must also match, and it is the same function
being declared. For example,
vec4 f(in vec4 x, out vec4 y); // (A)
vec4 f(in vec4 x, out uvec4 y); // (B) okay, different argument type
vec4 f(in ivec4 x, out uvec4 y); // (C) okay, different argument type
int f(in vec4 x, out ivec4 y); // error, only return type differs
vec4 f(in vec4 x, in vec4 y); // error, only qualifier differs
vec4 f(const in vec4 x, out vec4 y); // error, only qualifier differs
When function calls are resolved, an exact type match for all the
arguments is sought. If an exact match is found, all other functions are
ignored, and the exact match is used. If no exact match is found, then
the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
applied to find a match. Mismatched types on input parameters (in or
inout or default) must have a conversion from the calling argument type
to the formal parameter type. Mismatched types on output parameters (out
or inout) must have a conversion from the formal parameter type to the
calling argument type.
If implicit conversions can be used to find more than one matching
function, a single best-matching function is sought. To determine a best
match, the conversions between calling argument and formal parameter
types are compared for each function argument and pair of matching
functions. After these comparisons are performed, each pair of matching
functions are compared. A function definition A is considered a better
match than function definition B if:
* for at least one function argument, the conversion for that argument
in A is better than the corresponding conversion in B; and
* there is no function argument for which the conversion in B is better
than the corresponding conversion in A.
If a single function definition is considered a better match than every
other matching function definition, it will be used. Otherwise, a
semantic error occurs and the shader will fail to compile.
To determine whether the conversion for a single argument in one match is
better than that for another match, the following rules are applied, in
order:
1. An exact match is better than a match involving any implicit
conversion.
2. A match involving an implicit conversion from float to double is
better than a match involving any other implicit conversion.
3. A match involving an implicit conversion from either int or uint to
float is better than a match involving an implicit conversion from
either int or uint to double.
If none of the rules above apply to a particular pair of conversions,
neither conversion is considered better than the other.
For the function prototypes (A), (B), and (C) above, the following
examples show how the rules apply to different sets of calling argument
types:
f(vec4, vec4); // exact match of vec4 f(in vec4 x, out vec4 y)
f(vec4, uvec4); // exact match of vec4 f(in vec4 x, out ivec4 y)
f(vec4, ivec4); // matched to vec4 f(in vec4 x, out vec4 y)
// (C) not relevant, can't convert vec4 to
// ivec4. (A) better than (B) for 2nd
// argument (rule 2), same on first argument.
f(ivec4, vec4); // NOT matched. All three match by implicit
// conversion. (C) is better than (A) and (B)
// on the first argument. (A) is better than
// (B) and (C).
Modify Section 8.3, Common Functions, p. 84
(add support for single-precision frexp and ldexp functions)
Syntax:
genType frexp(genType x, out genIType exp);
genType ldexp(genType x, in genIType exp);
The function frexp() splits each single-precision floating-point number in
<x> into a binary significand, a floating-point number in the range [0.5,
1.0), and an integral exponent of two, such that:
x = significand * 2 ^ exponent
The significand is returned by the function; the exponent is returned in
the parameter <exp>. For a floating-point value of zero, the significant
and exponent are both zero. For a floating-point value that is an
infinity or is not a number, the results of frexp() are undefined.
If the input <x> is a vector, this operation is performed in a
component-wise manner; the value returned by the function and the value
written to <exp> are vectors with the same number of components as <x>.
The function ldexp() builds a single-precision floating-point number from
each significand component in <x> and the corresponding integral exponent
of two in <exp>, returning:
significand * 2 ^ exponent
If this product is too large to be represented as a single-precision
floating-point value, the result is considered undefined.
If the input <x> is a vector, this operation is performed in a
component-wise manner; the value passed in <exp> and returned by the
function are vectors with the same number of components as <x>.
(add support for new integer built-in functions)
Syntax:
genIType bitfieldExtract(genIType value, int offset, int bits);
genUType bitfieldExtract(genUType value, int offset, int bits);
genIType bitfieldInsert(genIType base, genIType insert, int offset,
int bits);
genUType bitfieldInsert(genUType base, genUType insert, int offset,
int bits);
genIType bitfieldReverse(genIType value);
genUType bitfieldReverse(genUType value);
genIType bitCount(genIType value);
genIType bitCount(genUType value);
genIType findLSB(genIType value);
genIType findLSB(genUType value);
genIType findMSB(genIType value);
genIType findMSB(genUType value);
The function bitfieldExtract() extracts bits <offset> through
<offset>+<bits>-1 from each component in <value>, returning them in the
least significant bits of corresponding component of the result. For
unsigned data types, the most significant bits of the result will be set
to zero. For signed data types, the most significant bits will be set to
the value of bit <offset>+<base>-1. If <bits> is zero, the result will be
zero. The result will be undefined if <offset> or <bits> is negative, or
if the sum of <offset> and <bits> is greater than the number of bits used
to store the operand. Note that for vector versions of bitfieldExtract(),
a single pair of <offset> and <bits> values is shared for all components.
The function bitfieldInsert() inserts the <bits> least significant bits of
each component of <insert> into the corresponding component of <base>.
The result will have bits numbered <offset> through <offset>+<bits>-1
taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
directly from the corresponding bits of <base>. If <bits> is zero, the
result will simply be <base>. The result will be undefined if <offset> or
<bits> is negative, or if the sum of <offset> and <bits> is greater than
the number of bits used to store the operand. Note that for vector
versions of bitfieldInsert(), a single pair of <offset> and <bits> values
is shared for all components.
The function bitfieldReverse() reverses the bits of <value>. The bit
numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
<value>, where <bits> is the total number of bits used to represent
<value>.
The function bitCount() returns the number of one bits in the binary
representation of <value>.
The function findLSB() returns the bit number of the least significant one
bit in the binary representation of <value>. If <value> is zero, -1 will
be returned.
The function findMSB() returns the bit number of the most significant bit
in the binary representation of <value>. For positive integers, the
result will be the bit number of the most significant one bit. For
negative integers, the result will be the bit number of the most
significant zero bit. For a <value> of zero or negative one, -1 will be
returned.
(support for unsigned integer add/subtract with carry-out)
Syntax:
genUType uaddCarry(genUType x, genUType y, out genUType carry);
genUType usubBorrow(genUType x, genUType y, out genUType borrow);
The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
<y>, returning the sum modulo 2^32. The value <carry> is set to zero if
the sum was less than 2^32, or one otherwise.
The function usubBorrow() subtracts the 32-bit unsigned integer or vector
<y> from <x>, returning the difference if non-negative or 2^32 plus the
difference, otherwise. The value <borrow> is set to zero if x >= y, or
one otherwise.
(support for signed and unsigned multiplies, with 32-bit inputs and a
64-bit result spanning two 32-bit outputs)
Syntax:
void umulExtended(genUType x, genUType y, out genUType msb,
out genUType lsb);
void imulExtended(genIType x, genIType y, out genIType msb,
out genIType lsb);
The functions umulExtended() and imulExtended() multiply 32-bit unsigned
or signed integers or vectors <x> and <y>, producing a 64-bit result. The
32 least significant bits are returned in <lsb>; the 32 most significant
bits are returned in <msb>.
GLX Protocol
None.
Dependencies on ARB_gpu_shader_fp64
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
of implicit conversions supported in the OpenGL Shading Language. If more
than one of these extensions is supported, an expression of one type may
be converted to another type if that conversion is allowed by any of these
specifications.
If ARB_gpu_shader_fp64 or a similar extension introducing new data types
is not supported, the function overloading rule in the GLSL specification
preferring promotion an input parameters to smaller type to a larger type
is never applicable, as all data types are of the same size. That rule
and the example referring to "double" should be removed.
Dependencies on NV_gpu_shader5
This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
of implicit conversions supported in the OpenGL Shading Language. If more
than one of these extensions is supported, an expression of one type may
be converted to another type if that conversion is allowed by any of these
specifications.
If NV_gpu_shader5 is supported, integer data types are supported with four
different precisions (8-, 16, 32-, and 64-bit) and floating-point data
types are supported with three different precisions (16-, 32-, and
64-bit). The extension adds the following rule for output parameters,
which is similar to the one present in this extension for input
parameters:
5. If the formal parameters in both matches are output parameters, a
conversion from a type with a larger number of bits per component is
better than a conversion from a type with a smaller number of bits
per component. For example, a conversion from an "int16_t" formal
parameter type to "int" is better than one from an "int8_t" formal
parameter type to "int".
Such a rule is not provided in this extension because there is no
combination of types in this extension and ARB_gpu_shader_fp64 where this
rule has any effect.
Errors
None
New State
None
New Implementation Dependent State
None
Issues
(1) What should this extension be called?
UNRESOLVED. This extension borrows from GL_ARB_gpu_shader5, so creating
some sort of a play on that name would be viable. However, nothing in
this extension should require SM5 hardware, so such a name would be a
little misleading and weird.
Since the primary purpose is to add integer related functions from
GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
for now.
(2) Why is some of the formatting in this extension weird?
RESOLVED: This extension is formatted to minimize the differences (as
reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
specification.
(3) Should ldexp and frexp be included?
RESOLVED: Yes. Few GPUs have native instructions to implement these
functions. These are generally implemented using existing GLSL built-in
functions and the other functions provided by this extension.
(4) Should umulExtended and imulExtended be included?
RESOLVED: Yes. These functions should be implementable on any GPU that
can support the rest of this extension, but the implementation may be
complex. The implementation on a GPU that only supports 32bit x 32bit =
32bit multiplication would be quite expensive. However, many GPUs
(including OpenGL 4.0 GPUs that already support this function) have a
32bit x 16bit = 48bit multiplier. The implementation there is only
trivially more expensive than regular 32bit multiplication.
(5) Should the pack and unpack functions be included?
RESOLVED: No. These functions are already available via
GL_ARB_shading_language_packing.
(6) Should the "BitsTo" functions be included?
RESOLVED: No. These functions are already available via
GL_ARB_shader_bit_encoding.
Revision History
Rev. Date Author Changes
---- ----------- -------- -----------------------------------------
2 7-Jul-2016 idr Fix typo in #extension line
1 20-Jun-2016 idr Initial version based on GL_ARB_gpu_shader5.

View File

@@ -1,10 +1,18 @@
The definitive source for enum values and reserved ranges are the XML files in
the Khronos registry:
See the OpenGL ARB enum registry at http://www.opengl.org/registry/api/enum.spec
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/egl.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/gl.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/glx.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/wgl.xml
Blocks allocated to Mesa:
GL blocks allocated to Mesa:
0x8750-0x875F
0x8BB0-0x8BBF
EGL blocks allocated to Mesa:
0x31D0-0x31DF
0x3290-0x329F
GL_MESA_packed_depth_stencil
GL_DEPTH_STENCIL_MESA 0x8750
@@ -13,7 +21,7 @@ GL_MESA_packed_depth_stencil
GL_UNSIGNED_SHORT_15_1_MESA 0x8753
GL_UNSIGNED_SHORT_1_15_REV_MESA 0x8754
GL_MESA_trace.spec:
GL_MESA_trace:
GL_TRACE_ALL_BITS_MESA 0xFFFF
GL_TRACE_OPERATIONS_BIT_MESA 0x0001
GL_TRACE_PRIMITIVES_BIT_MESA 0x0002
@@ -24,12 +32,12 @@ GL_MESA_trace.spec:
GL_TRACE_MASK_MESA 0x8755
GL_TRACE_NAME_MESA 0x8756
MESA_ycbcr_texture.spec:
GL_MESA_ycbcr_texture:
GL_YCBCR_MESA 0x8757
GL_UNSIGNED_SHORT_8_8_MESA 0x85BA /* same as Apple's */
GL_UNSIGNED_SHORT_8_8_REV_MESA 0x85BB /* same as Apple's */
GL_MESA_pack_invert.spec
GL_MESA_pack_invert:
GL_PACK_INVERT_MESA 0x8758
GL_MESA_shader_debug.spec: (obsolete)
@@ -37,7 +45,7 @@ GL_MESA_shader_debug.spec: (obsolete)
GL_DEBUG_PRINT_MESA 0x875A
GL_DEBUG_ASSERT_MESA 0x875B
GL_MESA_program_debug.spec: (obsolete)
GL_MESA_program_debug: (obsolete)
GL_FRAGMENT_PROGRAM_CALLBACK_MESA 0x????
GL_VERTEX_PROGRAM_CALLBACK_MESA 0x????
GL_FRAGMENT_PROGRAM_POSITION_MESA 0x????
@@ -55,3 +63,24 @@ GL_MESAX_texture_stack:
GL_TEXTURE_1D_STACK_BINDING_MESAX 0x875D
GL_TEXTURE_2D_STACK_BINDING_MESAX 0x875E
EGL_MESA_drm_image
EGL_DRM_BUFFER_FORMAT_MESA 0x31D0
EGL_DRM_BUFFER_USE_MESA 0x31D1
EGL_DRM_BUFFER_FORMAT_ARGB32_MESA 0x31D2
EGL_DRM_BUFFER_MESA 0x31D3
EGL_DRM_BUFFER_STRIDE_MESA 0x31D4
EGL_MESA_platform_gbm
EGL_PLATFORM_GBM_MESA 0x31D7
EGL_MESA_platform_surfaceless
EGL_PLATFORM_SURFACELESS_MESA 0x31DD
EGL_WL_bind_wayland_display
EGL_TEXTURE_FORMAT 0x3080
EGL_WAYLAND_BUFFER_WL 0x31D5
EGL_WAYLAND_PLANE_WL 0x31D6
EGL_TEXTURE_Y_U_V_WL 0x31D7
EGL_TEXTURE_Y_UV_WL 0x31D8
EGL_TEXTURE_Y_XUXV_WL 0x31D9
EGL_WAYLAND_Y_INVERTED_WL 0x31DB

View File

@@ -199,7 +199,7 @@ This incurs a small performance penalty.
<h2>Extensions</h2>
<p>
The following MESA-specific extensions are implemented in the Xlib driver.
The following Mesa-specific extensions are implemented in the Xlib driver.
</p>
<h3>GLX_MESA_pixmap_colormap</h3>

View File

@@ -0,0 +1,2 @@
[*.h]
indent_style = tab

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -38,7 +38,7 @@ extern "C" {
#include <EGL/eglplatform.h>
#define EGL_EGLEXT_VERSION 20150508
#define EGL_EGLEXT_VERSION 20160809
/* Generated C header for:
* API: egl
@@ -99,6 +99,33 @@ EGLAPI EGLSyncKHR EGLAPIENTRY eglCreateSync64KHR (EGLDisplay dpy, EGLenum type,
#define EGL_CONTEXT_OPENGL_NO_ERROR_KHR 0x31B3
#endif /* EGL_KHR_create_context_no_error */
#ifndef EGL_KHR_debug
#define EGL_KHR_debug 1
typedef void *EGLLabelKHR;
typedef void *EGLObjectKHR;
typedef void (EGLAPIENTRY *EGLDEBUGPROCKHR)(EGLenum error,const char *command,EGLint messageType,EGLLabelKHR threadLabel,EGLLabelKHR objectLabel,const char* message);
#define EGL_OBJECT_THREAD_KHR 0x33B0
#define EGL_OBJECT_DISPLAY_KHR 0x33B1
#define EGL_OBJECT_CONTEXT_KHR 0x33B2
#define EGL_OBJECT_SURFACE_KHR 0x33B3
#define EGL_OBJECT_IMAGE_KHR 0x33B4
#define EGL_OBJECT_SYNC_KHR 0x33B5
#define EGL_OBJECT_STREAM_KHR 0x33B6
#define EGL_DEBUG_MSG_CRITICAL_KHR 0x33B9
#define EGL_DEBUG_MSG_ERROR_KHR 0x33BA
#define EGL_DEBUG_MSG_WARN_KHR 0x33BB
#define EGL_DEBUG_MSG_INFO_KHR 0x33BC
#define EGL_DEBUG_CALLBACK_KHR 0x33B8
typedef EGLint (EGLAPIENTRYP PFNEGLDEBUGMESSAGECONTROLKHRPROC) (EGLDEBUGPROCKHR callback, const EGLAttrib *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDEBUGKHRPROC) (EGLint attribute, EGLAttrib *value);
typedef EGLint (EGLAPIENTRYP PFNEGLLABELOBJECTKHRPROC) (EGLDisplay display, EGLenum objectType, EGLObjectKHR object, EGLLabelKHR label);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLint EGLAPIENTRY eglDebugMessageControlKHR (EGLDEBUGPROCKHR callback, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglQueryDebugKHR (EGLint attribute, EGLAttrib *value);
EGLAPI EGLint EGLAPIENTRY eglLabelObjectKHR (EGLDisplay display, EGLenum objectType, EGLObjectKHR object, EGLLabelKHR label);
#endif
#endif /* EGL_KHR_debug */
#ifndef EGL_KHR_fence_sync
#define EGL_KHR_fence_sync 1
typedef khronos_utime_nanoseconds_t EGLTimeKHR;
@@ -223,6 +250,16 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurface64KHR (EGLDisplay dpy, EGLSurface s
#endif
#endif /* EGL_KHR_lock_surface3 */
#ifndef EGL_KHR_mutable_render_buffer
#define EGL_KHR_mutable_render_buffer 1
#define EGL_MUTABLE_RENDER_BUFFER_BIT_KHR 0x1000
#endif /* EGL_KHR_mutable_render_buffer */
#ifndef EGL_KHR_no_config_context
#define EGL_KHR_no_config_context 1
#define EGL_NO_CONFIG_KHR ((EGLConfig)0)
#endif /* EGL_KHR_no_config_context */
#ifndef EGL_KHR_partial_update
#define EGL_KHR_partial_update 1
#define EGL_BUFFER_AGE_KHR 0x313D
@@ -402,11 +439,28 @@ EGLAPI void EGLAPIENTRY eglSetBlobCacheFuncsANDROID (EGLDisplay dpy, EGLSetBlobF
#endif
#endif /* EGL_ANDROID_blob_cache */
#ifndef EGL_ANDROID_create_native_client_buffer
#define EGL_ANDROID_create_native_client_buffer 1
#define EGL_NATIVE_BUFFER_USAGE_ANDROID 0x3143
#define EGL_NATIVE_BUFFER_USAGE_PROTECTED_BIT_ANDROID 0x00000001
#define EGL_NATIVE_BUFFER_USAGE_RENDERBUFFER_BIT_ANDROID 0x00000002
#define EGL_NATIVE_BUFFER_USAGE_TEXTURE_BIT_ANDROID 0x00000004
typedef EGLClientBuffer (EGLAPIENTRYP PFNEGLCREATENATIVECLIENTBUFFERANDROIDPROC) (const EGLint *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLClientBuffer EGLAPIENTRY eglCreateNativeClientBufferANDROID (const EGLint *attrib_list);
#endif
#endif /* EGL_ANDROID_create_native_client_buffer */
#ifndef EGL_ANDROID_framebuffer_target
#define EGL_ANDROID_framebuffer_target 1
#define EGL_FRAMEBUFFER_TARGET_ANDROID 0x3147
#endif /* EGL_ANDROID_framebuffer_target */
#ifndef EGL_ANDROID_front_buffer_auto_refresh
#define EGL_ANDROID_front_buffer_auto_refresh 1
#define EGL_FRONT_BUFFER_AUTO_REFRESH_ANDROID 0x314C
#endif /* EGL_ANDROID_front_buffer_auto_refresh */
#ifndef EGL_ANDROID_image_native_buffer
#define EGL_ANDROID_image_native_buffer 1
#define EGL_NATIVE_BUFFER_ANDROID 0x3140
@@ -424,6 +478,15 @@ EGLAPI EGLint EGLAPIENTRY eglDupNativeFenceFDANDROID (EGLDisplay dpy, EGLSyncKHR
#endif
#endif /* EGL_ANDROID_native_fence_sync */
#ifndef EGL_ANDROID_presentation_time
#define EGL_ANDROID_presentation_time 1
typedef khronos_stime_nanoseconds_t EGLnsecsANDROID;
typedef EGLBoolean (EGLAPIENTRYP PFNEGLPRESENTATIONTIMEANDROIDPROC) (EGLDisplay dpy, EGLSurface surface, EGLnsecsANDROID time);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglPresentationTimeANDROID (EGLDisplay dpy, EGLSurface surface, EGLnsecsANDROID time);
#endif
#endif /* EGL_ANDROID_presentation_time */
#ifndef EGL_ANDROID_recordable
#define EGL_ANDROID_recordable 1
#define EGL_RECORDABLE_ANDROID 0x3142
@@ -616,9 +679,13 @@ EGLAPI EGLSurface EGLAPIENTRY eglCreatePlatformPixmapSurfaceEXT (EGLDisplay dpy,
#define EGL_PLATFORM_X11_SCREEN_EXT 0x31D6
#endif /* EGL_EXT_platform_x11 */
#ifndef EGL_EXT_protected_content
#define EGL_EXT_protected_content 1
#define EGL_PROTECTED_CONTENT_EXT 0x32C0
#endif /* EGL_EXT_protected_content */
#ifndef EGL_EXT_protected_surface
#define EGL_EXT_protected_surface 1
#define EGL_PROTECTED_CONTENT_EXT 0x32C0
#endif /* EGL_EXT_protected_surface */
#ifndef EGL_EXT_stream_consumer_egloutput
@@ -697,6 +764,12 @@ EGLAPI EGLSurface EGLAPIENTRY eglCreatePixmapSurfaceHI (EGLDisplay dpy, EGLConfi
#define EGL_CONTEXT_PRIORITY_LOW_IMG 0x3103
#endif /* EGL_IMG_context_priority */
#ifndef EGL_IMG_image_plane_attribs
#define EGL_IMG_image_plane_attribs 1
#define EGL_NATIVE_BUFFER_MULTIPLANE_SEPARATE_IMG 0x3105
#define EGL_NATIVE_BUFFER_PLANE_OFFSET_IMG 0x3106
#endif /* EGL_IMG_image_plane_attribs */
#ifndef EGL_MESA_drm_image
#define EGL_MESA_drm_image 1
#define EGL_DRM_BUFFER_FORMAT_MESA 0x31D0
@@ -812,6 +885,48 @@ EGLAPI EGLBoolean EGLAPIENTRY eglPostSubBufferNV (EGLDisplay dpy, EGLSurface sur
#endif
#endif /* EGL_NV_post_sub_buffer */
#ifndef EGL_NV_robustness_video_memory_purge
#define EGL_NV_robustness_video_memory_purge 1
#define EGL_GENERATE_RESET_ON_VIDEO_MEMORY_PURGE_NV 0x334C
#endif /* EGL_NV_robustness_video_memory_purge */
#ifndef EGL_NV_stream_consumer_gltexture_yuv
#define EGL_NV_stream_consumer_gltexture_yuv 1
#define EGL_YUV_PLANE0_TEXTURE_UNIT_NV 0x332C
#define EGL_YUV_PLANE1_TEXTURE_UNIT_NV 0x332D
#define EGL_YUV_PLANE2_TEXTURE_UNIT_NV 0x332E
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALATTRIBSNVPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLAttrib *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerGLTextureExternalAttribsNV (EGLDisplay dpy, EGLStreamKHR stream, EGLAttrib *attrib_list);
#endif
#endif /* EGL_NV_stream_consumer_gltexture_yuv */
#ifndef EGL_NV_stream_metadata
#define EGL_NV_stream_metadata 1
#define EGL_MAX_STREAM_METADATA_BLOCKS_NV 0x3250
#define EGL_MAX_STREAM_METADATA_BLOCK_SIZE_NV 0x3251
#define EGL_MAX_STREAM_METADATA_TOTAL_SIZE_NV 0x3252
#define EGL_PRODUCER_METADATA_NV 0x3253
#define EGL_CONSUMER_METADATA_NV 0x3254
#define EGL_PENDING_METADATA_NV 0x3328
#define EGL_METADATA0_SIZE_NV 0x3255
#define EGL_METADATA1_SIZE_NV 0x3256
#define EGL_METADATA2_SIZE_NV 0x3257
#define EGL_METADATA3_SIZE_NV 0x3258
#define EGL_METADATA0_TYPE_NV 0x3259
#define EGL_METADATA1_TYPE_NV 0x325A
#define EGL_METADATA2_TYPE_NV 0x325B
#define EGL_METADATA3_TYPE_NV 0x325C
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDISPLAYATTRIBNVPROC) (EGLDisplay dpy, EGLint attribute, EGLAttrib *value);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSETSTREAMMETADATANVPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLint n, EGLint offset, EGLint size, const void *data);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSTREAMMETADATANVPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum name, EGLint n, EGLint offset, EGLint size, void *data);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribNV (EGLDisplay dpy, EGLint attribute, EGLAttrib *value);
EGLAPI EGLBoolean EGLAPIENTRY eglSetStreamMetadataNV (EGLDisplay dpy, EGLStreamKHR stream, EGLint n, EGLint offset, EGLint size, const void *data);
EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamMetadataNV (EGLDisplay dpy, EGLStreamKHR stream, EGLenum name, EGLint n, EGLint offset, EGLint size, void *data);
#endif
#endif /* EGL_NV_stream_metadata */
#ifndef EGL_NV_stream_sync
#define EGL_NV_stream_sync 1
#define EGL_SYNC_NEW_FRAME_NV 0x321F

View File

@@ -84,6 +84,11 @@ typedef EGLBoolean (EGLAPIENTRYP PFNEGLSWAPBUFFERSREGIONNOK) (EGLDisplay dpy, EG
#define EGL_NO_CONFIG_MESA ((EGLConfig)0)
#endif
#ifndef EGL_MESA_platform_surfaceless
#define EGL_MESA_platform_surfaceless 1
#define EGL_PLATFORM_SURFACELESS_MESA 0x31DD
#endif /* EGL_MESA_platform_surfaceless */
#ifdef __cplusplus
}
#endif

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,7 +33,7 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 27684 $ on $Date: 2014-08-11 01:21:35 -0700 (Mon, 11 Aug 2014) $
** Khronos $Revision: 32433 $ on $Date: 2016-02-10 02:02:08 -0500 (Wed, 10 Feb 2016) $
*/
#if defined(_WIN32) && !defined(APIENTRY) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__)
@@ -1160,6 +1160,22 @@ typedef unsigned short GLhalf;
#define GL_COLOR_ATTACHMENT13 0x8CED
#define GL_COLOR_ATTACHMENT14 0x8CEE
#define GL_COLOR_ATTACHMENT15 0x8CEF
#define GL_COLOR_ATTACHMENT16 0x8CF0
#define GL_COLOR_ATTACHMENT17 0x8CF1
#define GL_COLOR_ATTACHMENT18 0x8CF2
#define GL_COLOR_ATTACHMENT19 0x8CF3
#define GL_COLOR_ATTACHMENT20 0x8CF4
#define GL_COLOR_ATTACHMENT21 0x8CF5
#define GL_COLOR_ATTACHMENT22 0x8CF6
#define GL_COLOR_ATTACHMENT23 0x8CF7
#define GL_COLOR_ATTACHMENT24 0x8CF8
#define GL_COLOR_ATTACHMENT25 0x8CF9
#define GL_COLOR_ATTACHMENT26 0x8CFA
#define GL_COLOR_ATTACHMENT27 0x8CFB
#define GL_COLOR_ATTACHMENT28 0x8CFC
#define GL_COLOR_ATTACHMENT29 0x8CFD
#define GL_COLOR_ATTACHMENT30 0x8CFE
#define GL_COLOR_ATTACHMENT31 0x8CFF
#define GL_DEPTH_ATTACHMENT 0x8D00
#define GL_STENCIL_ATTACHMENT 0x8D20
#define GL_FRAMEBUFFER 0x8D40
@@ -2097,6 +2113,10 @@ GLAPI void APIENTRY glGetDoublei_v (GLenum target, GLuint index, GLdouble *data)
#ifndef GL_VERSION_4_2
#define GL_VERSION_4_2 1
#define GL_COPY_READ_BUFFER_BINDING 0x8F36
#define GL_COPY_WRITE_BUFFER_BINDING 0x8F37
#define GL_TRANSFORM_FEEDBACK_ACTIVE 0x8E24
#define GL_TRANSFORM_FEEDBACK_PAUSED 0x8E23
#define GL_UNPACK_COMPRESSED_BLOCK_WIDTH 0x9127
#define GL_UNPACK_COMPRESSED_BLOCK_HEIGHT 0x9128
#define GL_UNPACK_COMPRESSED_BLOCK_DEPTH 0x9129
@@ -2642,7 +2662,6 @@ GLAPI void APIENTRY glBindVertexBuffers (GLuint first, GLsizei count, const GLui
#define GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES 0x82FA
#define GL_TEXTURE_TARGET 0x1006
#define GL_QUERY_TARGET 0x82EA
#define GL_TEXTURE_BINDING 0x82EB
#define GL_GUILTY_CONTEXT_RESET 0x8253
#define GL_INNOCENT_CONTEXT_RESET 0x8254
#define GL_UNKNOWN_CONTEXT_RESET 0x8255
@@ -2655,25 +2674,25 @@ GLAPI void APIENTRY glBindVertexBuffers (GLuint first, GLsizei count, const GLui
typedef void (APIENTRYP PFNGLCLIPCONTROLPROC) (GLenum origin, GLenum depth);
typedef void (APIENTRYP PFNGLCREATETRANSFORMFEEDBACKSPROC) (GLsizei n, GLuint *ids);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERBASEPROC) (GLuint xfb, GLuint index, GLuint buffer);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERRANGEPROC) (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizei size);
typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERRANGEPROC) (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKIVPROC) (GLuint xfb, GLenum pname, GLint *param);
typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKI_VPROC) (GLuint xfb, GLenum pname, GLuint index, GLint *param);
typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKI64_VPROC) (GLuint xfb, GLenum pname, GLuint index, GLint64 *param);
typedef void (APIENTRYP PFNGLCREATEBUFFERSPROC) (GLsizei n, GLuint *buffers);
typedef void (APIENTRYP PFNGLNAMEDBUFFERSTORAGEPROC) (GLuint buffer, GLsizei size, const void *data, GLbitfield flags);
typedef void (APIENTRYP PFNGLNAMEDBUFFERDATAPROC) (GLuint buffer, GLsizei size, const void *data, GLenum usage);
typedef void (APIENTRYP PFNGLNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizei size, const void *data);
typedef void (APIENTRYP PFNGLCOPYNAMEDBUFFERSUBDATAPROC) (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizei size);
typedef void (APIENTRYP PFNGLNAMEDBUFFERSTORAGEPROC) (GLuint buffer, GLsizeiptr size, const void *data, GLbitfield flags);
typedef void (APIENTRYP PFNGLNAMEDBUFFERDATAPROC) (GLuint buffer, GLsizeiptr size, const void *data, GLenum usage);
typedef void (APIENTRYP PFNGLNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, const void *data);
typedef void (APIENTRYP PFNGLCOPYNAMEDBUFFERSUBDATAPROC) (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size);
typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERDATAPROC) (GLuint buffer, GLenum internalformat, GLenum format, GLenum type, const void *data);
typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizei size, GLenum format, GLenum type, const void *data);
typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, const void *data);
typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERPROC) (GLuint buffer, GLenum access);
typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizei length, GLbitfield access);
typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizeiptr length, GLbitfield access);
typedef GLboolean (APIENTRYP PFNGLUNMAPNAMEDBUFFERPROC) (GLuint buffer);
typedef void (APIENTRYP PFNGLFLUSHMAPPEDNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizei length);
typedef void (APIENTRYP PFNGLFLUSHMAPPEDNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizeiptr length);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPARAMETERIVPROC) (GLuint buffer, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPARAMETERI64VPROC) (GLuint buffer, GLenum pname, GLint64 *params);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPOINTERVPROC) (GLuint buffer, GLenum pname, void **params);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizei size, void *data);
typedef void (APIENTRYP PFNGLGETNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, void *data);
typedef void (APIENTRYP PFNGLCREATEFRAMEBUFFERSPROC) (GLsizei n, GLuint *framebuffers);
typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERRENDERBUFFERPROC) (GLuint framebuffer, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
typedef void (APIENTRYP PFNGLNAMEDFRAMEBUFFERPARAMETERIPROC) (GLuint framebuffer, GLenum pname, GLint param);
@@ -2687,7 +2706,7 @@ typedef void (APIENTRYP PFNGLINVALIDATENAMEDFRAMEBUFFERSUBDATAPROC) (GLuint fram
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERIVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLint *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERUIVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLuint *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLfloat *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFIPROC) (GLuint framebuffer, GLenum buffer, const GLfloat depth, GLint stencil);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFIPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
typedef void (APIENTRYP PFNGLBLITNAMEDFRAMEBUFFERPROC) (GLuint readFramebuffer, GLuint drawFramebuffer, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
typedef GLenum (APIENTRYP PFNGLCHECKNAMEDFRAMEBUFFERSTATUSPROC) (GLuint framebuffer, GLenum target);
typedef void (APIENTRYP PFNGLGETNAMEDFRAMEBUFFERPARAMETERIVPROC) (GLuint framebuffer, GLenum pname, GLint *param);
@@ -2698,7 +2717,7 @@ typedef void (APIENTRYP PFNGLNAMEDRENDERBUFFERSTORAGEMULTISAMPLEPROC) (GLuint re
typedef void (APIENTRYP PFNGLGETNAMEDRENDERBUFFERPARAMETERIVPROC) (GLuint renderbuffer, GLenum pname, GLint *params);
typedef void (APIENTRYP PFNGLCREATETEXTURESPROC) (GLenum target, GLsizei n, GLuint *textures);
typedef void (APIENTRYP PFNGLTEXTUREBUFFERPROC) (GLuint texture, GLenum internalformat, GLuint buffer);
typedef void (APIENTRYP PFNGLTEXTUREBUFFERRANGEPROC) (GLuint texture, GLenum internalformat, GLuint buffer, GLintptr offset, GLsizei size);
typedef void (APIENTRYP PFNGLTEXTUREBUFFERRANGEPROC) (GLuint texture, GLenum internalformat, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (APIENTRYP PFNGLTEXTURESTORAGE1DPROC) (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width);
typedef void (APIENTRYP PFNGLTEXTURESTORAGE2DPROC) (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (APIENTRYP PFNGLTEXTURESTORAGE3DPROC) (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth);
@@ -2746,6 +2765,10 @@ typedef void (APIENTRYP PFNGLGETVERTEXARRAYINDEXED64IVPROC) (GLuint vaobj, GLuin
typedef void (APIENTRYP PFNGLCREATESAMPLERSPROC) (GLsizei n, GLuint *samplers);
typedef void (APIENTRYP PFNGLCREATEPROGRAMPIPELINESPROC) (GLsizei n, GLuint *pipelines);
typedef void (APIENTRYP PFNGLCREATEQUERIESPROC) (GLenum target, GLsizei n, GLuint *ids);
typedef void (APIENTRYP PFNGLGETQUERYBUFFEROBJECTI64VPROC) (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
typedef void (APIENTRYP PFNGLGETQUERYBUFFEROBJECTIVPROC) (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
typedef void (APIENTRYP PFNGLGETQUERYBUFFEROBJECTUI64VPROC) (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
typedef void (APIENTRYP PFNGLGETQUERYBUFFEROBJECTUIVPROC) (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
typedef void (APIENTRYP PFNGLMEMORYBARRIERBYREGIONPROC) (GLbitfield barriers);
typedef void (APIENTRYP PFNGLGETTEXTURESUBIMAGEPROC) (GLuint texture, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, GLsizei bufSize, void *pixels);
typedef void (APIENTRYP PFNGLGETCOMPRESSEDTEXTURESUBIMAGEPROC) (GLuint texture, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLsizei bufSize, void *pixels);
@@ -2762,25 +2785,25 @@ typedef void (APIENTRYP PFNGLTEXTUREBARRIERPROC) (void);
GLAPI void APIENTRY glClipControl (GLenum origin, GLenum depth);
GLAPI void APIENTRY glCreateTransformFeedbacks (GLsizei n, GLuint *ids);
GLAPI void APIENTRY glTransformFeedbackBufferBase (GLuint xfb, GLuint index, GLuint buffer);
GLAPI void APIENTRY glTransformFeedbackBufferRange (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizei size);
GLAPI void APIENTRY glTransformFeedbackBufferRange (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
GLAPI void APIENTRY glGetTransformFeedbackiv (GLuint xfb, GLenum pname, GLint *param);
GLAPI void APIENTRY glGetTransformFeedbacki_v (GLuint xfb, GLenum pname, GLuint index, GLint *param);
GLAPI void APIENTRY glGetTransformFeedbacki64_v (GLuint xfb, GLenum pname, GLuint index, GLint64 *param);
GLAPI void APIENTRY glCreateBuffers (GLsizei n, GLuint *buffers);
GLAPI void APIENTRY glNamedBufferStorage (GLuint buffer, GLsizei size, const void *data, GLbitfield flags);
GLAPI void APIENTRY glNamedBufferData (GLuint buffer, GLsizei size, const void *data, GLenum usage);
GLAPI void APIENTRY glNamedBufferSubData (GLuint buffer, GLintptr offset, GLsizei size, const void *data);
GLAPI void APIENTRY glCopyNamedBufferSubData (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizei size);
GLAPI void APIENTRY glNamedBufferStorage (GLuint buffer, GLsizeiptr size, const void *data, GLbitfield flags);
GLAPI void APIENTRY glNamedBufferData (GLuint buffer, GLsizeiptr size, const void *data, GLenum usage);
GLAPI void APIENTRY glNamedBufferSubData (GLuint buffer, GLintptr offset, GLsizeiptr size, const void *data);
GLAPI void APIENTRY glCopyNamedBufferSubData (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size);
GLAPI void APIENTRY glClearNamedBufferData (GLuint buffer, GLenum internalformat, GLenum format, GLenum type, const void *data);
GLAPI void APIENTRY glClearNamedBufferSubData (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizei size, GLenum format, GLenum type, const void *data);
GLAPI void APIENTRY glClearNamedBufferSubData (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, const void *data);
GLAPI void *APIENTRY glMapNamedBuffer (GLuint buffer, GLenum access);
GLAPI void *APIENTRY glMapNamedBufferRange (GLuint buffer, GLintptr offset, GLsizei length, GLbitfield access);
GLAPI void *APIENTRY glMapNamedBufferRange (GLuint buffer, GLintptr offset, GLsizeiptr length, GLbitfield access);
GLAPI GLboolean APIENTRY glUnmapNamedBuffer (GLuint buffer);
GLAPI void APIENTRY glFlushMappedNamedBufferRange (GLuint buffer, GLintptr offset, GLsizei length);
GLAPI void APIENTRY glFlushMappedNamedBufferRange (GLuint buffer, GLintptr offset, GLsizeiptr length);
GLAPI void APIENTRY glGetNamedBufferParameteriv (GLuint buffer, GLenum pname, GLint *params);
GLAPI void APIENTRY glGetNamedBufferParameteri64v (GLuint buffer, GLenum pname, GLint64 *params);
GLAPI void APIENTRY glGetNamedBufferPointerv (GLuint buffer, GLenum pname, void **params);
GLAPI void APIENTRY glGetNamedBufferSubData (GLuint buffer, GLintptr offset, GLsizei size, void *data);
GLAPI void APIENTRY glGetNamedBufferSubData (GLuint buffer, GLintptr offset, GLsizeiptr size, void *data);
GLAPI void APIENTRY glCreateFramebuffers (GLsizei n, GLuint *framebuffers);
GLAPI void APIENTRY glNamedFramebufferRenderbuffer (GLuint framebuffer, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
GLAPI void APIENTRY glNamedFramebufferParameteri (GLuint framebuffer, GLenum pname, GLint param);
@@ -2794,7 +2817,7 @@ GLAPI void APIENTRY glInvalidateNamedFramebufferSubData (GLuint framebuffer, GLs
GLAPI void APIENTRY glClearNamedFramebufferiv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLint *value);
GLAPI void APIENTRY glClearNamedFramebufferuiv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLuint *value);
GLAPI void APIENTRY glClearNamedFramebufferfv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLfloat *value);
GLAPI void APIENTRY glClearNamedFramebufferfi (GLuint framebuffer, GLenum buffer, const GLfloat depth, GLint stencil);
GLAPI void APIENTRY glClearNamedFramebufferfi (GLuint framebuffer, GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
GLAPI void APIENTRY glBlitNamedFramebuffer (GLuint readFramebuffer, GLuint drawFramebuffer, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
GLAPI GLenum APIENTRY glCheckNamedFramebufferStatus (GLuint framebuffer, GLenum target);
GLAPI void APIENTRY glGetNamedFramebufferParameteriv (GLuint framebuffer, GLenum pname, GLint *param);
@@ -2805,7 +2828,7 @@ GLAPI void APIENTRY glNamedRenderbufferStorageMultisample (GLuint renderbuffer,
GLAPI void APIENTRY glGetNamedRenderbufferParameteriv (GLuint renderbuffer, GLenum pname, GLint *params);
GLAPI void APIENTRY glCreateTextures (GLenum target, GLsizei n, GLuint *textures);
GLAPI void APIENTRY glTextureBuffer (GLuint texture, GLenum internalformat, GLuint buffer);
GLAPI void APIENTRY glTextureBufferRange (GLuint texture, GLenum internalformat, GLuint buffer, GLintptr offset, GLsizei size);
GLAPI void APIENTRY glTextureBufferRange (GLuint texture, GLenum internalformat, GLuint buffer, GLintptr offset, GLsizeiptr size);
GLAPI void APIENTRY glTextureStorage1D (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width);
GLAPI void APIENTRY glTextureStorage2D (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height);
GLAPI void APIENTRY glTextureStorage3D (GLuint texture, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth);
@@ -2853,6 +2876,10 @@ GLAPI void APIENTRY glGetVertexArrayIndexed64iv (GLuint vaobj, GLuint index, GLe
GLAPI void APIENTRY glCreateSamplers (GLsizei n, GLuint *samplers);
GLAPI void APIENTRY glCreateProgramPipelines (GLsizei n, GLuint *pipelines);
GLAPI void APIENTRY glCreateQueries (GLenum target, GLsizei n, GLuint *ids);
GLAPI void APIENTRY glGetQueryBufferObjecti64v (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
GLAPI void APIENTRY glGetQueryBufferObjectiv (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
GLAPI void APIENTRY glGetQueryBufferObjectui64v (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
GLAPI void APIENTRY glGetQueryBufferObjectuiv (GLuint id, GLuint buffer, GLenum pname, GLintptr offset);
GLAPI void APIENTRY glMemoryBarrierByRegion (GLbitfield barriers);
GLAPI void APIENTRY glGetTextureSubImage (GLuint texture, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, GLsizei bufSize, void *pixels);
GLAPI void APIENTRY glGetCompressedTextureSubImage (GLuint texture, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLsizei bufSize, void *pixels);
@@ -2990,8 +3017,6 @@ GLAPI void APIENTRY glDispatchComputeGroupSizeARB (GLuint num_groups_x, GLuint n
#ifndef GL_ARB_copy_buffer
#define GL_ARB_copy_buffer 1
#define GL_COPY_READ_BUFFER_BINDING 0x8F36
#define GL_COPY_WRITE_BUFFER_BINDING 0x8F37
#endif /* GL_ARB_copy_buffer */
#ifndef GL_ARB_copy_image
@@ -3346,13 +3371,13 @@ GLAPI void APIENTRY glGetNamedStringivARB (GLint namelen, const GLchar *name, GL
#define GL_ARB_sparse_buffer 1
#define GL_SPARSE_STORAGE_BIT_ARB 0x0400
#define GL_SPARSE_BUFFER_PAGE_SIZE_ARB 0x82F8
typedef void (APIENTRYP PFNGLBUFFERPAGECOMMITMENTARBPROC) (GLenum target, GLintptr offset, GLsizei size, GLboolean commit);
typedef void (APIENTRYP PFNGLNAMEDBUFFERPAGECOMMITMENTEXTPROC) (GLuint buffer, GLintptr offset, GLsizei size, GLboolean commit);
typedef void (APIENTRYP PFNGLNAMEDBUFFERPAGECOMMITMENTARBPROC) (GLuint buffer, GLintptr offset, GLsizei size, GLboolean commit);
typedef void (APIENTRYP PFNGLBUFFERPAGECOMMITMENTARBPROC) (GLenum target, GLintptr offset, GLsizeiptr size, GLboolean commit);
typedef void (APIENTRYP PFNGLNAMEDBUFFERPAGECOMMITMENTEXTPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, GLboolean commit);
typedef void (APIENTRYP PFNGLNAMEDBUFFERPAGECOMMITMENTARBPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, GLboolean commit);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glBufferPageCommitmentARB (GLenum target, GLintptr offset, GLsizei size, GLboolean commit);
GLAPI void APIENTRY glNamedBufferPageCommitmentEXT (GLuint buffer, GLintptr offset, GLsizei size, GLboolean commit);
GLAPI void APIENTRY glNamedBufferPageCommitmentARB (GLuint buffer, GLintptr offset, GLsizei size, GLboolean commit);
GLAPI void APIENTRY glBufferPageCommitmentARB (GLenum target, GLintptr offset, GLsizeiptr size, GLboolean commit);
GLAPI void APIENTRY glNamedBufferPageCommitmentEXT (GLuint buffer, GLintptr offset, GLsizeiptr size, GLboolean commit);
GLAPI void APIENTRY glNamedBufferPageCommitmentARB (GLuint buffer, GLintptr offset, GLsizeiptr size, GLboolean commit);
#endif
#endif /* GL_ARB_sparse_buffer */
@@ -3360,7 +3385,7 @@ GLAPI void APIENTRY glNamedBufferPageCommitmentARB (GLuint buffer, GLintptr offs
#define GL_ARB_sparse_texture 1
#define GL_TEXTURE_SPARSE_ARB 0x91A6
#define GL_VIRTUAL_PAGE_SIZE_INDEX_ARB 0x91A7
#define GL_MIN_SPARSE_LEVEL_ARB 0x919B
#define GL_NUM_SPARSE_LEVELS_ARB 0x91AA
#define GL_NUM_VIRTUAL_PAGE_SIZES_ARB 0x91A8
#define GL_VIRTUAL_PAGE_SIZE_X_ARB 0x9195
#define GL_VIRTUAL_PAGE_SIZE_Y_ARB 0x9196
@@ -3369,9 +3394,9 @@ GLAPI void APIENTRY glNamedBufferPageCommitmentARB (GLuint buffer, GLintptr offs
#define GL_MAX_SPARSE_3D_TEXTURE_SIZE_ARB 0x9199
#define GL_MAX_SPARSE_ARRAY_TEXTURE_LAYERS_ARB 0x919A
#define GL_SPARSE_TEXTURE_FULL_ARRAY_CUBE_MIPMAPS_ARB 0x91A9
typedef void (APIENTRYP PFNGLTEXPAGECOMMITMENTARBPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLboolean resident);
typedef void (APIENTRYP PFNGLTEXPAGECOMMITMENTARBPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLboolean commit);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glTexPageCommitmentARB (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLboolean resident);
GLAPI void APIENTRY glTexPageCommitmentARB (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLboolean commit);
#endif
#endif /* GL_ARB_sparse_texture */
@@ -3479,8 +3504,6 @@ GLAPI void APIENTRY glTexPageCommitmentARB (GLenum target, GLint level, GLint xo
#ifndef GL_ARB_transform_feedback2
#define GL_ARB_transform_feedback2 1
#define GL_TRANSFORM_FEEDBACK_PAUSED 0x8E23
#define GL_TRANSFORM_FEEDBACK_ACTIVE 0x8E24
#endif /* GL_ARB_transform_feedback2 */
#ifndef GL_ARB_transform_feedback3
@@ -3537,6 +3560,11 @@ GLAPI void APIENTRY glTexPageCommitmentARB (GLenum target, GLint level, GLint xo
#define GL_KHR_debug 1
#endif /* GL_KHR_debug */
#ifndef GL_KHR_no_error
#define GL_KHR_no_error 1
#define GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR 0x00000008
#endif /* GL_KHR_no_error */
#ifndef GL_KHR_robust_buffer_access_behavior
#define GL_KHR_robust_buffer_access_behavior 1
#endif /* GL_KHR_robust_buffer_access_behavior */
@@ -3582,6 +3610,10 @@ GLAPI void APIENTRY glTexPageCommitmentARB (GLenum target, GLint level, GLint xo
#define GL_KHR_texture_compression_astc_ldr 1
#endif /* GL_KHR_texture_compression_astc_ldr */
#ifndef GL_KHR_texture_compression_astc_sliced_3d
#define GL_KHR_texture_compression_astc_sliced_3d 1
#endif /* GL_KHR_texture_compression_astc_sliced_3d */
#ifdef __cplusplus
}
#endif

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2015 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,7 +33,7 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 31811 $ on $Date: 2015-08-10 17:01:11 +1000 (Mon, 10 Aug 2015) $
** Khronos $Revision: 33061 $ on $Date: 2016-07-14 20:14:13 -0400 (Thu, 14 Jul 2016) $
*/
#if defined(_WIN32) && !defined(APIENTRY) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__)
@@ -53,7 +53,7 @@ extern "C" {
#define GLAPI extern
#endif
#define GL_GLEXT_VERSION 20150809
#define GL_GLEXT_VERSION 20160714
/* Generated C header for:
* API: gl
@@ -2654,7 +2654,7 @@ typedef void (APIENTRYP PFNGLINVALIDATENAMEDFRAMEBUFFERSUBDATAPROC) (GLuint fram
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERIVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLint *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERUIVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLuint *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFVPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLfloat *value);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFIPROC) (GLuint framebuffer, GLenum buffer, const GLfloat depth, GLint stencil);
typedef void (APIENTRYP PFNGLCLEARNAMEDFRAMEBUFFERFIPROC) (GLuint framebuffer, GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
typedef void (APIENTRYP PFNGLBLITNAMEDFRAMEBUFFERPROC) (GLuint readFramebuffer, GLuint drawFramebuffer, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
typedef GLenum (APIENTRYP PFNGLCHECKNAMEDFRAMEBUFFERSTATUSPROC) (GLuint framebuffer, GLenum target);
typedef void (APIENTRYP PFNGLGETNAMEDFRAMEBUFFERPARAMETERIVPROC) (GLuint framebuffer, GLenum pname, GLint *param);
@@ -2777,7 +2777,7 @@ GLAPI void APIENTRY glInvalidateNamedFramebufferSubData (GLuint framebuffer, GLs
GLAPI void APIENTRY glClearNamedFramebufferiv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLint *value);
GLAPI void APIENTRY glClearNamedFramebufferuiv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLuint *value);
GLAPI void APIENTRY glClearNamedFramebufferfv (GLuint framebuffer, GLenum buffer, GLint drawbuffer, const GLfloat *value);
GLAPI void APIENTRY glClearNamedFramebufferfi (GLuint framebuffer, GLenum buffer, const GLfloat depth, GLint stencil);
GLAPI void APIENTRY glClearNamedFramebufferfi (GLuint framebuffer, GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
GLAPI void APIENTRY glBlitNamedFramebuffer (GLuint readFramebuffer, GLuint drawFramebuffer, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
GLAPI GLenum APIENTRY glCheckNamedFramebufferStatus (GLuint framebuffer, GLenum target);
GLAPI void APIENTRY glGetNamedFramebufferParameteriv (GLuint framebuffer, GLenum pname, GLint *param);
@@ -4984,6 +4984,10 @@ GLAPI void APIENTRY glBlendBarrierKHR (void);
#define GL_KHR_texture_compression_astc_ldr 1
#endif /* GL_KHR_texture_compression_astc_ldr */
#ifndef GL_KHR_texture_compression_astc_sliced_3d
#define GL_KHR_texture_compression_astc_sliced_3d 1
#endif /* GL_KHR_texture_compression_astc_sliced_3d */
#ifndef GL_OES_byte_coordinates
#define GL_OES_byte_coordinates 1
typedef void (APIENTRYP PFNGLMULTITEXCOORD1BOESPROC) (GLenum texture, GLbyte s);
@@ -5597,6 +5601,10 @@ GLAPI void APIENTRY glSetMultisamplefvAMD (GLenum pname, GLuint index, const GLf
#define GL_AMD_shader_atomic_counter_ops 1
#endif /* GL_AMD_shader_atomic_counter_ops */
#ifndef GL_AMD_shader_explicit_vertex_parameter
#define GL_AMD_shader_explicit_vertex_parameter 1
#endif /* GL_AMD_shader_explicit_vertex_parameter */
#ifndef GL_AMD_shader_stencil_export
#define GL_AMD_shader_stencil_export 1
#endif /* GL_AMD_shader_stencil_export */
@@ -8637,6 +8645,20 @@ GLAPI void APIENTRY glVertexWeightPointerEXT (GLint size, GLenum type, GLsizei s
#endif
#endif /* GL_EXT_vertex_weighting */
#ifndef GL_EXT_window_rectangles
#define GL_EXT_window_rectangles 1
#define GL_INCLUSIVE_EXT 0x8F10
#define GL_EXCLUSIVE_EXT 0x8F11
#define GL_WINDOW_RECTANGLE_EXT 0x8F12
#define GL_WINDOW_RECTANGLE_MODE_EXT 0x8F13
#define GL_MAX_WINDOW_RECTANGLES_EXT 0x8F14
#define GL_NUM_WINDOW_RECTANGLES_EXT 0x8F15
typedef void (APIENTRYP PFNGLWINDOWRECTANGLESEXTPROC) (GLenum mode, GLsizei count, const GLint *box);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glWindowRectanglesEXT (GLenum mode, GLsizei count, const GLint *box);
#endif
#endif /* GL_EXT_window_rectangles */
#ifndef GL_EXT_x11_sync_object
#define GL_EXT_x11_sync_object 1
#define GL_SYNC_X11_FENCE_EXT 0x90E1
@@ -8814,6 +8836,11 @@ GLAPI void APIENTRY glBlendFuncSeparateINGR (GLenum sfactorRGB, GLenum dfactorRG
#define GL_INTERLACE_READ_INGR 0x8568
#endif /* GL_INGR_interlace_read */
#ifndef GL_INTEL_conservative_rasterization
#define GL_INTEL_conservative_rasterization 1
#define GL_CONSERVATIVE_RASTERIZATION_INTEL 0x83FE
#endif /* GL_INTEL_conservative_rasterization */
#ifndef GL_INTEL_fragment_shader_ordering
#define GL_INTEL_fragment_shader_ordering 1
#endif /* GL_INTEL_fragment_shader_ordering */
@@ -9130,6 +9157,17 @@ GLAPI void APIENTRY glBlendBarrierNV (void);
#define GL_NV_blend_square 1
#endif /* GL_NV_blend_square */
#ifndef GL_NV_clip_space_w_scaling
#define GL_NV_clip_space_w_scaling 1
#define GL_VIEWPORT_POSITION_W_SCALE_NV 0x937C
#define GL_VIEWPORT_POSITION_W_SCALE_X_COEFF_NV 0x937D
#define GL_VIEWPORT_POSITION_W_SCALE_Y_COEFF_NV 0x937E
typedef void (APIENTRYP PFNGLVIEWPORTPOSITIONWSCALENVPROC) (GLuint index, GLfloat xcoeff, GLfloat ycoeff);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glViewportPositionWScaleNV (GLuint index, GLfloat xcoeff, GLfloat ycoeff);
#endif
#endif /* GL_NV_clip_space_w_scaling */
#ifndef GL_NV_command_list
#define GL_NV_command_list 1
#define GL_TERMINATE_SEQUENCE_COMMAND_NV 0x0000
@@ -9232,6 +9270,17 @@ GLAPI void APIENTRY glConservativeRasterParameterfNV (GLenum pname, GLfloat valu
#endif
#endif /* GL_NV_conservative_raster_dilate */
#ifndef GL_NV_conservative_raster_pre_snap_triangles
#define GL_NV_conservative_raster_pre_snap_triangles 1
#define GL_CONSERVATIVE_RASTER_MODE_NV 0x954D
#define GL_CONSERVATIVE_RASTER_MODE_POST_SNAP_NV 0x954E
#define GL_CONSERVATIVE_RASTER_MODE_PRE_SNAP_TRIANGLES_NV 0x954F
typedef void (APIENTRYP PFNGLCONSERVATIVERASTERPARAMETERINVPROC) (GLenum pname, GLint param);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glConservativeRasterParameteriNV (GLenum pname, GLint param);
#endif
#endif /* GL_NV_conservative_raster_pre_snap_triangles */
#ifndef GL_NV_copy_depth_to_color
#define GL_NV_copy_depth_to_color 1
#define GL_DEPTH_STENCIL_TO_RGBA_NV 0x886E
@@ -10224,6 +10273,11 @@ GLAPI void APIENTRY glGetCombinerStageParameterfvNV (GLenum stage, GLenum pname,
#endif
#endif /* GL_NV_register_combiners2 */
#ifndef GL_NV_robustness_video_memory_purge
#define GL_NV_robustness_video_memory_purge 1
#define GL_PURGED_CONTEXT_RESET_NV 0x92BB
#endif /* GL_NV_robustness_video_memory_purge */
#ifndef GL_NV_sample_locations
#define GL_NV_sample_locations 1
#define GL_SAMPLE_LOCATION_SUBPIXEL_BITS_NV 0x933D
@@ -10256,6 +10310,10 @@ GLAPI void APIENTRY glResolveDepthValuesNV (void);
#define GL_NV_shader_atomic_float 1
#endif /* GL_NV_shader_atomic_float */
#ifndef GL_NV_shader_atomic_float64
#define GL_NV_shader_atomic_float64 1
#endif /* GL_NV_shader_atomic_float64 */
#ifndef GL_NV_shader_atomic_fp16_vector
#define GL_NV_shader_atomic_fp16_vector 1
#endif /* GL_NV_shader_atomic_fp16_vector */
@@ -10319,6 +10377,10 @@ GLAPI void APIENTRY glProgramUniformui64vNV (GLuint program, GLint location, GLs
#define GL_NV_shader_thread_shuffle 1
#endif /* GL_NV_shader_thread_shuffle */
#ifndef GL_NV_stereo_view_rendering
#define GL_NV_stereo_view_rendering 1
#endif /* GL_NV_stereo_view_rendering */
#ifndef GL_NV_tessellation_program5
#define GL_NV_tessellation_program5 1
#define GL_MAX_PROGRAM_PATCH_ATTRIBS_NV 0x86D8
@@ -11089,6 +11151,26 @@ GLAPI void APIENTRY glVideoCaptureStreamParameterdvNV (GLuint video_capture_slot
#define GL_NV_viewport_array2 1
#endif /* GL_NV_viewport_array2 */
#ifndef GL_NV_viewport_swizzle
#define GL_NV_viewport_swizzle 1
#define GL_VIEWPORT_SWIZZLE_POSITIVE_X_NV 0x9350
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_X_NV 0x9351
#define GL_VIEWPORT_SWIZZLE_POSITIVE_Y_NV 0x9352
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_Y_NV 0x9353
#define GL_VIEWPORT_SWIZZLE_POSITIVE_Z_NV 0x9354
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_Z_NV 0x9355
#define GL_VIEWPORT_SWIZZLE_POSITIVE_W_NV 0x9356
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_W_NV 0x9357
#define GL_VIEWPORT_SWIZZLE_X_NV 0x9358
#define GL_VIEWPORT_SWIZZLE_Y_NV 0x9359
#define GL_VIEWPORT_SWIZZLE_Z_NV 0x935A
#define GL_VIEWPORT_SWIZZLE_W_NV 0x935B
typedef void (APIENTRYP PFNGLVIEWPORTSWIZZLENVPROC) (GLuint index, GLenum swizzlex, GLenum swizzley, GLenum swizzlez, GLenum swizzlew);
#ifdef GL_GLEXT_PROTOTYPES
GLAPI void APIENTRY glViewportSwizzleNV (GLuint index, GLenum swizzlex, GLenum swizzley, GLenum swizzlez, GLenum swizzlew);
#endif
#endif /* GL_NV_viewport_swizzle */
#ifndef GL_OML_interlace
#define GL_OML_interlace 1
#define GL_INTERLACE_OML 0x8980

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,10 +33,10 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 27684 $ on $Date: 2014-08-11 01:21:35 -0700 (Mon, 11 Aug 2014) $
** Khronos $Revision: 32889 $ on $Date: 2016-05-31 07:09:51 -0400 (Tue, 31 May 2016) $
*/
#define GLX_GLXEXT_VERSION 20140810
#define GLX_GLXEXT_VERSION 20160531
/* Generated C header for:
* API: glx
@@ -250,6 +250,26 @@ __GLXextFuncPtr glXGetProcAddressARB (const GLubyte *procName);
#define GLX_GPU_NUM_SIMD_AMD 0x21A6
#define GLX_GPU_NUM_RB_AMD 0x21A7
#define GLX_GPU_NUM_SPI_AMD 0x21A8
typedef unsigned int ( *PFNGLXGETGPUIDSAMDPROC) (unsigned int maxCount, unsigned int *ids);
typedef int ( *PFNGLXGETGPUINFOAMDPROC) (unsigned int id, int property, GLenum dataType, unsigned int size, void *data);
typedef unsigned int ( *PFNGLXGETCONTEXTGPUIDAMDPROC) (GLXContext ctx);
typedef GLXContext ( *PFNGLXCREATEASSOCIATEDCONTEXTAMDPROC) (unsigned int id, GLXContext share_list);
typedef GLXContext ( *PFNGLXCREATEASSOCIATEDCONTEXTATTRIBSAMDPROC) (unsigned int id, GLXContext share_context, const int *attribList);
typedef Bool ( *PFNGLXDELETEASSOCIATEDCONTEXTAMDPROC) (GLXContext ctx);
typedef Bool ( *PFNGLXMAKEASSOCIATEDCONTEXTCURRENTAMDPROC) (GLXContext ctx);
typedef GLXContext ( *PFNGLXGETCURRENTASSOCIATEDCONTEXTAMDPROC) (void);
typedef void ( *PFNGLXBLITCONTEXTFRAMEBUFFERAMDPROC) (GLXContext dstCtx, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
#ifdef GLX_GLXEXT_PROTOTYPES
unsigned int glXGetGPUIDsAMD (unsigned int maxCount, unsigned int *ids);
int glXGetGPUInfoAMD (unsigned int id, int property, GLenum dataType, unsigned int size, void *data);
unsigned int glXGetContextGPUIDAMD (GLXContext ctx);
GLXContext glXCreateAssociatedContextAMD (unsigned int id, GLXContext share_list);
GLXContext glXCreateAssociatedContextAttribsAMD (unsigned int id, GLXContext share_context, const int *attribList);
Bool glXDeleteAssociatedContextAMD (GLXContext ctx);
Bool glXMakeAssociatedContextCurrentAMD (GLXContext ctx);
GLXContext glXGetCurrentAssociatedContextAMD (void);
void glXBlitContextFramebufferAMD (GLXContext dstCtx, GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
#endif
#endif /* GLX_AMD_gpu_association */
#ifndef GLX_EXT_buffer_age
@@ -297,6 +317,11 @@ void glXFreeContextEXT (Display *dpy, GLXContext context);
#endif
#endif /* GLX_EXT_import_context */
#ifndef GLX_EXT_libglvnd
#define GLX_EXT_libglvnd 1
#define GLX_VENDOR_NAMES_EXT 0x20F6
#endif /* GLX_EXT_libglvnd */
#ifndef GLX_EXT_stereo_tree
#define GLX_EXT_stereo_tree 1
typedef struct {
@@ -523,6 +548,11 @@ int glXBindVideoDeviceNV (Display *dpy, unsigned int video_slot, unsigned int vi
#endif
#endif /* GLX_NV_present_video */
#ifndef GLX_NV_robustness_video_memory_purge
#define GLX_NV_robustness_video_memory_purge 1
#define GLX_GENERATE_RESET_ON_VIDEO_MEMORY_PURGE_NV 0x20F7
#endif /* GLX_NV_robustness_video_memory_purge */
#ifndef GLX_NV_swap_group
#define GLX_NV_swap_group 1
typedef Bool ( *PFNGLXJOINSWAPGROUPNVPROC) (Display *dpy, GLXDrawable drawable, GLuint group);

View File

@@ -1094,7 +1094,7 @@ struct __DRIdri2ExtensionRec {
* extensions.
*/
#define __DRI_IMAGE "DRI_IMAGE"
#define __DRI_IMAGE_VERSION 12
#define __DRI_IMAGE_VERSION 13
/**
* These formats correspond to the similarly named MESA_FORMAT_*
@@ -1208,6 +1208,8 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_ATTRIB_FOURCC 0x2008 /* available in versions 11 */
#define __DRI_IMAGE_ATTRIB_NUM_PLANES 0x2009 /* available in versions 11 */
#define __DRI_IMAGE_ATTRIB_OFFSET 0x200A /* available in versions 13 */
enum __DRIYUVColorSpace {
__DRI_YUV_COLOR_SPACE_UNDEFINED = 0,
__DRI_YUV_COLOR_SPACE_ITU_REC601 = 0x327F,

View File

@@ -58,8 +58,8 @@ extern "C" {
#endif
/* Forward declarations to avoid inclusion of GL/glx.h */
typedef struct _XDisplay Display;
typedef struct __GLXcontextRec *GLXContext;
struct _XDisplay;
struct __GLXcontextRec;
/* Forward declarations to avoid inclusion of EGL/egl.h */
typedef void *EGLDisplay;
@@ -97,7 +97,7 @@ struct mesa_glinterop_device_info {
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
* by the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_DEVICE_INFO_VERSION macro */
uint32_t version;
@@ -125,7 +125,7 @@ struct mesa_glinterop_export_in {
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
* by the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_EXPORT_IN_VERSION macro */
uint32_t version;
@@ -190,7 +190,7 @@ struct mesa_glinterop_export_out {
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
* by the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_EXPORT_OUT_VERSION macro */
uint32_t version;
@@ -246,7 +246,7 @@ struct mesa_glinterop_export_out {
* \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
*/
int
MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
MesaGLInteropGLXQueryDeviceInfo(struct _XDisplay *dpy, struct __GLXcontextRec *context,
struct mesa_glinterop_device_info *out);
@@ -271,7 +271,7 @@ MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
* \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
*/
int
MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
MesaGLInteropGLXExportObject(struct _XDisplay *dpy, struct __GLXcontextRec *context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
@@ -286,11 +286,11 @@ MesaGLInteropEGLExportObject(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_export_out *out);
typedef int (PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(Display *dpy, GLXContext context,
typedef int (PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(struct _XDisplay *dpy, struct __GLXcontextRec *context,
struct mesa_glinterop_device_info *out);
typedef int (PFNMESAGLINTEROPEGLQUERYDEVICEINFOPROC)(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_device_info *out);
typedef int (PFNMESAGLINTEROPGLXEXPORTOBJECTPROC)(Display *dpy, GLXContext context,
typedef int (PFNMESAGLINTEROPGLXEXPORTOBJECTPROC)(struct _XDisplay *dpy, struct __GLXcontextRec *context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
typedef int (PFNMESAGLINTEROPEGLEXPORTOBJECTPROC)(EGLDisplay dpy, EGLContext context,

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,7 +33,7 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 27684 $ on $Date: 2014-08-11 01:21:35 -0700 (Mon, 11 Aug 2014) $
** Khronos $Revision: 32686 $ on $Date: 2016-04-19 21:08:44 -0400 (Tue, 19 Apr 2016) $
*/
#if defined(_WIN32) && !defined(APIENTRY) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__)
@@ -41,7 +41,7 @@ extern "C" {
#include <windows.h>
#endif
#define WGL_WGLEXT_VERSION 20140810
#define WGL_WGLEXT_VERSION 20160419
/* Generated C header for:
* API: wgl

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,12 +33,16 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 24614 $ on $Date: 2013-12-30 04:44:46 -0800 (Mon, 30 Dec 2013) $
** Khronos $Revision: 32749 $ on $Date: 2016-04-28 09:03:03 -0700 (Thu, 28 Apr 2016) $
*/
#include <GLES2/gl2platform.h>
/* Generated on date 20131230 */
#ifndef GL_APIENTRYP
#define GL_APIENTRYP GL_APIENTRY*
#endif
/* Generated on date 20160428 */
/* Generated C header for:
* API: gles2
@@ -374,6 +378,148 @@ typedef khronos_uint8_t GLubyte;
#define GL_RENDERBUFFER_BINDING 0x8CA7
#define GL_MAX_RENDERBUFFER_SIZE 0x84E8
#define GL_INVALID_FRAMEBUFFER_OPERATION 0x0506
typedef void (GL_APIENTRYP PFNGLACTIVETEXTUREPROC) (GLenum texture);
typedef void (GL_APIENTRYP PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer);
typedef void (GL_APIENTRYP PFNGLBINDFRAMEBUFFERPROC) (GLenum target, GLuint framebuffer);
typedef void (GL_APIENTRYP PFNGLBINDRENDERBUFFERPROC) (GLenum target, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLBINDTEXTUREPROC) (GLenum target, GLuint texture);
typedef void (GL_APIENTRYP PFNGLBLENDCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONSEPARATEPROC) (GLenum modeRGB, GLenum modeAlpha);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCPROC) (GLenum sfactor, GLenum dfactor);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCSEPARATEPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
typedef void (GL_APIENTRYP PFNGLBUFFERDATAPROC) (GLenum target, GLsizeiptr size, const void *data, GLenum usage);
typedef void (GL_APIENTRYP PFNGLBUFFERSUBDATAPROC) (GLenum target, GLintptr offset, GLsizeiptr size, const void *data);
typedef GLenum (GL_APIENTRYP PFNGLCHECKFRAMEBUFFERSTATUSPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLCLEARPROC) (GLbitfield mask);
typedef void (GL_APIENTRYP PFNGLCLEARCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLCLEARDEPTHFPROC) (GLfloat d);
typedef void (GL_APIENTRYP PFNGLCLEARSTENCILPROC) (GLint s);
typedef void (GL_APIENTRYP PFNGLCOLORMASKPROC) (GLboolean red, GLboolean green, GLboolean blue, GLboolean alpha);
typedef void (GL_APIENTRYP PFNGLCOMPILESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOPYTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border);
typedef void (GL_APIENTRYP PFNGLCOPYTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef GLuint (GL_APIENTRYP PFNGLCREATEPROGRAMPROC) (void);
typedef GLuint (GL_APIENTRYP PFNGLCREATESHADERPROC) (GLenum type);
typedef void (GL_APIENTRYP PFNGLCULLFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLDELETEFRAMEBUFFERSPROC) (GLsizei n, const GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLDELETEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLDELETERENDERBUFFERSPROC) (GLsizei n, const GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLDELETESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLDELETETEXTURESPROC) (GLsizei n, const GLuint *textures);
typedef void (GL_APIENTRYP PFNGLDEPTHFUNCPROC) (GLenum func);
typedef void (GL_APIENTRYP PFNGLDEPTHMASKPROC) (GLboolean flag);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEFPROC) (GLfloat n, GLfloat f);
typedef void (GL_APIENTRYP PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLDISABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSPROC) (GLenum mode, GLint first, GLsizei count);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSPROC) (GLenum mode, GLsizei count, GLenum type, const void *indices);
typedef void (GL_APIENTRYP PFNGLENABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLFINISHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFLUSHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERRENDERBUFFERPROC) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level);
typedef void (GL_APIENTRYP PFNGLFRONTFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLGENERATEMIPMAPPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGENFRAMEBUFFERSPROC) (GLsizei n, GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLGENRENDERBUFFERSPROC) (GLsizei n, GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLGENTEXTURESPROC) (GLsizei n, GLuint *textures);
typedef void (GL_APIENTRYP PFNGLGETACTIVEATTRIBPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETATTACHEDSHADERSPROC) (GLuint program, GLsizei maxCount, GLsizei *count, GLuint *shaders);
typedef GLint (GL_APIENTRYP PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETBOOLEANVPROC) (GLenum pname, GLboolean *data);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef GLenum (GL_APIENTRYP PFNGLGETERRORPROC) (void);
typedef void (GL_APIENTRYP PFNGLGETFLOATVPROC) (GLenum pname, GLfloat *data);
typedef void (GL_APIENTRYP PFNGLGETFRAMEBUFFERATTACHMENTPARAMETERIVPROC) (GLenum target, GLenum attachment, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETINTEGERVPROC) (GLenum pname, GLint *data);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETRENDERBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETSHADERPRECISIONFORMATPROC) (GLenum shadertype, GLenum precisiontype, GLint *range, GLint *precision);
typedef void (GL_APIENTRYP PFNGLGETSHADERSOURCEPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *source);
typedef const GLubyte *(GL_APIENTRYP PFNGLGETSTRINGPROC) (GLenum name);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMFVPROC) (GLuint program, GLint location, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMIVPROC) (GLuint program, GLint location, GLint *params);
typedef GLint (GL_APIENTRYP PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBFVPROC) (GLuint index, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIVPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVPROC) (GLuint index, GLenum pname, void **pointer);
typedef void (GL_APIENTRYP PFNGLHINTPROC) (GLenum target, GLenum mode);
typedef GLboolean (GL_APIENTRYP PFNGLISBUFFERPROC) (GLuint buffer);
typedef GLboolean (GL_APIENTRYP PFNGLISENABLEDPROC) (GLenum cap);
typedef GLboolean (GL_APIENTRYP PFNGLISFRAMEBUFFERPROC) (GLuint framebuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISPROGRAMPROC) (GLuint program);
typedef GLboolean (GL_APIENTRYP PFNGLISRENDERBUFFERPROC) (GLuint renderbuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISSHADERPROC) (GLuint shader);
typedef GLboolean (GL_APIENTRYP PFNGLISTEXTUREPROC) (GLuint texture);
typedef void (GL_APIENTRYP PFNGLLINEWIDTHPROC) (GLfloat width);
typedef void (GL_APIENTRYP PFNGLLINKPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLPIXELSTOREIPROC) (GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLPOLYGONOFFSETPROC) (GLfloat factor, GLfloat units);
typedef void (GL_APIENTRYP PFNGLREADPIXELSPROC) (GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, void *pixels);
typedef void (GL_APIENTRYP PFNGLRELEASESHADERCOMPILERPROC) (void);
typedef void (GL_APIENTRYP PFNGLRENDERBUFFERSTORAGEPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSAMPLECOVERAGEPROC) (GLfloat value, GLboolean invert);
typedef void (GL_APIENTRYP PFNGLSCISSORPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSHADERBINARYPROC) (GLsizei count, const GLuint *shaders, GLenum binaryformat, const void *binary, GLsizei length);
typedef void (GL_APIENTRYP PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const GLchar *const*string, const GLint *length);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCPROC) (GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCSEPARATEPROC) (GLenum face, GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKPROC) (GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKSEPARATEPROC) (GLenum face, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILOPPROC) (GLenum fail, GLenum zfail, GLenum zpass);
typedef void (GL_APIENTRYP PFNGLSTENCILOPSEPARATEPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass);
typedef void (GL_APIENTRYP PFNGLTEXIMAGE2DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFPROC) (GLenum target, GLenum pname, GLfloat param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIPROC) (GLenum target, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (GL_APIENTRYP PFNGLTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FPROC) (GLint location, GLfloat v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IPROC) (GLint location, GLint v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FPROC) (GLint location, GLfloat v0, GLfloat v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IPROC) (GLint location, GLint v0, GLint v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IPROC) (GLint location, GLint v0, GLint v1, GLint v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IPROC) (GLint location, GLint v0, GLint v1, GLint v2, GLint v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUSEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVALIDATEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FPROC) (GLuint index, GLfloat x);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FPROC) (GLuint index, GLfloat x, GLfloat y);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const void *pointer);
typedef void (GL_APIENTRYP PFNGLVIEWPORTPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
GL_APICALL void GL_APIENTRY glActiveTexture (GLenum texture);
GL_APICALL void GL_APIENTRY glAttachShader (GLuint program, GLuint shader);
GL_APICALL void GL_APIENTRY glBindAttribLocation (GLuint program, GLuint index, const GLchar *name);

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2015 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,14 +33,14 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 32120 $ on $Date: 2015-10-15 04:27:13 -0700 (Thu, 15 Oct 2015) $
** Khronos $Revision: 33080 $ on $Date: 2016-08-05 04:09:22 -0700 (Fri, 05 Aug 2016) $
*/
#ifndef GL_APIENTRYP
#define GL_APIENTRYP GL_APIENTRY*
#endif
/* Generated on date 20151015 */
/* Generated on date 20160805 */
/* Generated C header for:
* API: gles2
@@ -52,6 +52,10 @@ extern "C" {
* Extensions removed: _nomatch_^
*/
#ifndef GL_ARB_sparse_texture2
#define GL_ARB_sparse_texture2 1
#endif /* GL_ARB_sparse_texture2 */
#ifndef GL_KHR_blend_equation_advanced
#define GL_KHR_blend_equation_advanced 1
#define GL_MULTIPLY_KHR 0x9294
@@ -752,6 +756,34 @@ GL_APICALL GLboolean GL_APIENTRY glIsVertexArrayOES (GLuint array);
#define GL_INT_10_10_10_2_OES 0x8DF7
#endif /* GL_OES_vertex_type_10_10_10_2 */
#ifndef GL_OES_viewport_array
#define GL_OES_viewport_array 1
#define GL_MAX_VIEWPORTS_OES 0x825B
#define GL_VIEWPORT_SUBPIXEL_BITS_OES 0x825C
#define GL_VIEWPORT_BOUNDS_RANGE_OES 0x825D
#define GL_VIEWPORT_INDEX_PROVOKING_VERTEX_OES 0x825F
typedef void (GL_APIENTRYP PFNGLVIEWPORTARRAYVOESPROC) (GLuint first, GLsizei count, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVIEWPORTINDEXEDFOESPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat w, GLfloat h);
typedef void (GL_APIENTRYP PFNGLVIEWPORTINDEXEDFVOESPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLSCISSORARRAYVOESPROC) (GLuint first, GLsizei count, const GLint *v);
typedef void (GL_APIENTRYP PFNGLSCISSORINDEXEDOESPROC) (GLuint index, GLint left, GLint bottom, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSCISSORINDEXEDVOESPROC) (GLuint index, const GLint *v);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEARRAYFVOESPROC) (GLuint first, GLsizei count, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEINDEXEDFOESPROC) (GLuint index, GLfloat n, GLfloat f);
typedef void (GL_APIENTRYP PFNGLGETFLOATI_VOESPROC) (GLenum target, GLuint index, GLfloat *data);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glViewportArrayvOES (GLuint first, GLsizei count, const GLfloat *v);
GL_APICALL void GL_APIENTRY glViewportIndexedfOES (GLuint index, GLfloat x, GLfloat y, GLfloat w, GLfloat h);
GL_APICALL void GL_APIENTRY glViewportIndexedfvOES (GLuint index, const GLfloat *v);
GL_APICALL void GL_APIENTRY glScissorArrayvOES (GLuint first, GLsizei count, const GLint *v);
GL_APICALL void GL_APIENTRY glScissorIndexedOES (GLuint index, GLint left, GLint bottom, GLsizei width, GLsizei height);
GL_APICALL void GL_APIENTRY glScissorIndexedvOES (GLuint index, const GLint *v);
GL_APICALL void GL_APIENTRY glDepthRangeArrayfvOES (GLuint first, GLsizei count, const GLfloat *v);
GL_APICALL void GL_APIENTRY glDepthRangeIndexedfOES (GLuint index, GLfloat n, GLfloat f);
GL_APICALL void GL_APIENTRY glGetFloati_vOES (GLenum target, GLuint index, GLfloat *data);
#endif
#endif /* GL_OES_viewport_array */
#ifndef GL_AMD_compressed_3DC_texture
#define GL_AMD_compressed_3DC_texture 1
#define GL_3DC_X_AMD 0x87F9
@@ -1086,6 +1118,21 @@ GL_APICALL void GL_APIENTRY glBufferStorageEXT (GLenum target, GLsizeiptr size,
#endif
#endif /* GL_EXT_buffer_storage */
#ifndef GL_EXT_clip_cull_distance
#define GL_EXT_clip_cull_distance 1
#define GL_MAX_CLIP_DISTANCES_EXT 0x0D32
#define GL_MAX_CULL_DISTANCES_EXT 0x82F9
#define GL_MAX_COMBINED_CLIP_AND_CULL_DISTANCES_EXT 0x82FA
#define GL_CLIP_DISTANCE0_EXT 0x3000
#define GL_CLIP_DISTANCE1_EXT 0x3001
#define GL_CLIP_DISTANCE2_EXT 0x3002
#define GL_CLIP_DISTANCE3_EXT 0x3003
#define GL_CLIP_DISTANCE4_EXT 0x3004
#define GL_CLIP_DISTANCE5_EXT 0x3005
#define GL_CLIP_DISTANCE6_EXT 0x3006
#define GL_CLIP_DISTANCE7_EXT 0x3007
#endif /* GL_EXT_clip_cull_distance */
#ifndef GL_EXT_color_buffer_float
#define GL_EXT_color_buffer_float 1
#endif /* GL_EXT_color_buffer_float */
@@ -1412,6 +1459,15 @@ GL_APICALL void GL_APIENTRY glGetIntegeri_vEXT (GLenum target, GLuint index, GLi
#define GL_ANY_SAMPLES_PASSED_CONSERVATIVE_EXT 0x8D6A
#endif /* GL_EXT_occlusion_query_boolean */
#ifndef GL_EXT_polygon_offset_clamp
#define GL_EXT_polygon_offset_clamp 1
#define GL_POLYGON_OFFSET_CLAMP_EXT 0x8E1B
typedef void (GL_APIENTRYP PFNGLPOLYGONOFFSETCLAMPEXTPROC) (GLfloat factor, GLfloat units, GLfloat clamp);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glPolygonOffsetClampEXT (GLfloat factor, GLfloat units, GLfloat clamp);
#endif
#endif /* GL_EXT_polygon_offset_clamp */
#ifndef GL_EXT_post_depth_coverage
#define GL_EXT_post_depth_coverage 1
#endif /* GL_EXT_post_depth_coverage */
@@ -1425,6 +1481,12 @@ GL_APICALL void GL_APIENTRY glPrimitiveBoundingBoxEXT (GLfloat minX, GLfloat min
#endif
#endif /* GL_EXT_primitive_bounding_box */
#ifndef GL_EXT_protected_textures
#define GL_EXT_protected_textures 1
#define GL_CONTEXT_FLAG_PROTECTED_CONTENT_BIT_EXT 0x00000010
#define GL_TEXTURE_PROTECTED_EXT 0x8BFA
#endif /* GL_EXT_protected_textures */
#ifndef GL_EXT_pvrtc_sRGB
#define GL_EXT_pvrtc_sRGB 1
#define GL_COMPRESSED_SRGB_PVRTC_2BPPV1_EXT 0x8A54
@@ -1604,6 +1666,10 @@ GL_APICALL void GL_APIENTRY glProgramUniformMatrix4x3fvEXT (GLuint program, GLin
#define GL_FRAGMENT_SHADER_DISCARDS_SAMPLES_EXT 0x8A52
#endif /* GL_EXT_shader_framebuffer_fetch */
#ifndef GL_EXT_shader_group_vote
#define GL_EXT_shader_group_vote 1
#endif /* GL_EXT_shader_group_vote */
#ifndef GL_EXT_shader_implicit_conversions
#define GL_EXT_shader_implicit_conversions 1
#endif /* GL_EXT_shader_implicit_conversions */
@@ -1616,6 +1682,10 @@ GL_APICALL void GL_APIENTRY glProgramUniformMatrix4x3fvEXT (GLuint program, GLin
#define GL_EXT_shader_io_blocks 1
#endif /* GL_EXT_shader_io_blocks */
#ifndef GL_EXT_shader_non_constant_global_initializers
#define GL_EXT_shader_non_constant_global_initializers 1
#endif /* GL_EXT_shader_non_constant_global_initializers */
#ifndef GL_EXT_shader_pixel_local_storage
#define GL_EXT_shader_pixel_local_storage 1
#define GL_MAX_SHADER_PIXEL_LOCAL_STORAGE_FAST_SIZE_EXT 0x8F63
@@ -1623,6 +1693,21 @@ GL_APICALL void GL_APIENTRY glProgramUniformMatrix4x3fvEXT (GLuint program, GLin
#define GL_SHADER_PIXEL_LOCAL_STORAGE_EXT 0x8F64
#endif /* GL_EXT_shader_pixel_local_storage */
#ifndef GL_EXT_shader_pixel_local_storage2
#define GL_EXT_shader_pixel_local_storage2 1
#define GL_MAX_SHADER_COMBINED_LOCAL_STORAGE_FAST_SIZE_EXT 0x9650
#define GL_MAX_SHADER_COMBINED_LOCAL_STORAGE_SIZE_EXT 0x9651
#define GL_FRAMEBUFFER_INCOMPLETE_INSUFFICIENT_SHADER_COMBINED_LOCAL_STORAGE_EXT 0x9652
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERPIXELLOCALSTORAGESIZEEXTPROC) (GLuint target, GLsizei size);
typedef GLsizei (GL_APIENTRYP PFNGLGETFRAMEBUFFERPIXELLOCALSTORAGESIZEEXTPROC) (GLuint target);
typedef void (GL_APIENTRYP PFNGLCLEARPIXELLOCALSTORAGEUIEXTPROC) (GLsizei offset, GLsizei n, const GLuint *values);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glFramebufferPixelLocalStorageSizeEXT (GLuint target, GLsizei size);
GL_APICALL GLsizei GL_APIENTRY glGetFramebufferPixelLocalStorageSizeEXT (GLuint target);
GL_APICALL void GL_APIENTRY glClearPixelLocalStorageuiEXT (GLsizei offset, GLsizei n, const GLuint *values);
#endif
#endif /* GL_EXT_shader_pixel_local_storage2 */
#ifndef GL_EXT_shader_texture_lod
#define GL_EXT_shader_texture_lod 1
#endif /* GL_EXT_shader_texture_lod */
@@ -1888,11 +1973,39 @@ GL_APICALL void GL_APIENTRY glTextureViewEXT (GLuint texture, GLenum target, GLu
#define GL_UNPACK_SKIP_PIXELS_EXT 0x0CF4
#endif /* GL_EXT_unpack_subimage */
#ifndef GL_EXT_window_rectangles
#define GL_EXT_window_rectangles 1
#define GL_INCLUSIVE_EXT 0x8F10
#define GL_EXCLUSIVE_EXT 0x8F11
#define GL_WINDOW_RECTANGLE_EXT 0x8F12
#define GL_WINDOW_RECTANGLE_MODE_EXT 0x8F13
#define GL_MAX_WINDOW_RECTANGLES_EXT 0x8F14
#define GL_NUM_WINDOW_RECTANGLES_EXT 0x8F15
typedef void (GL_APIENTRYP PFNGLWINDOWRECTANGLESEXTPROC) (GLenum mode, GLsizei count, const GLint *box);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glWindowRectanglesEXT (GLenum mode, GLsizei count, const GLint *box);
#endif
#endif /* GL_EXT_window_rectangles */
#ifndef GL_FJ_shader_binary_GCCSO
#define GL_FJ_shader_binary_GCCSO 1
#define GL_GCCSO_SHADER_BINARY_FJ 0x9260
#endif /* GL_FJ_shader_binary_GCCSO */
#ifndef GL_IMG_framebuffer_downsample
#define GL_IMG_framebuffer_downsample 1
#define GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE_AND_DOWNSAMPLE_IMG 0x913C
#define GL_NUM_DOWNSAMPLE_SCALES_IMG 0x913D
#define GL_DOWNSAMPLE_SCALES_IMG 0x913E
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_SCALE_IMG 0x913F
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DDOWNSAMPLEIMGPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level, GLint xscale, GLint yscale);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYERDOWNSAMPLEIMGPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer, GLint xscale, GLint yscale);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glFramebufferTexture2DDownsampleIMG (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level, GLint xscale, GLint yscale);
GL_APICALL void GL_APIENTRY glFramebufferTextureLayerDownsampleIMG (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer, GLint xscale, GLint yscale);
#endif
#endif /* GL_IMG_framebuffer_downsample */
#ifndef GL_IMG_multisampled_render_to_texture
#define GL_IMG_multisampled_render_to_texture 1
#define GL_RENDERBUFFER_SAMPLES_IMG 0x9133
@@ -1944,6 +2057,11 @@ GL_APICALL void GL_APIENTRY glFramebufferTexture2DMultisampleIMG (GLenum target,
#define GL_CUBIC_MIPMAP_LINEAR_IMG 0x913B
#endif /* GL_IMG_texture_filter_cubic */
#ifndef GL_INTEL_conservative_rasterization
#define GL_INTEL_conservative_rasterization 1
#define GL_CONSERVATIVE_RASTERIZATION_INTEL 0x83FE
#endif /* GL_INTEL_conservative_rasterization */
#ifndef GL_INTEL_framebuffer_CMAA
#define GL_INTEL_framebuffer_CMAA 1
typedef void (GL_APIENTRYP PFNGLAPPLYFRAMEBUFFERATTACHMENTCMAAINTELPROC) (void);
@@ -2120,6 +2238,17 @@ GL_APICALL void GL_APIENTRY glSubpixelPrecisionBiasNV (GLuint xbits, GLuint ybit
#endif
#endif /* GL_NV_conservative_raster */
#ifndef GL_NV_conservative_raster_pre_snap_triangles
#define GL_NV_conservative_raster_pre_snap_triangles 1
#define GL_CONSERVATIVE_RASTER_MODE_NV 0x954D
#define GL_CONSERVATIVE_RASTER_MODE_POST_SNAP_NV 0x954E
#define GL_CONSERVATIVE_RASTER_MODE_PRE_SNAP_TRIANGLES_NV 0x954F
typedef void (GL_APIENTRYP PFNGLCONSERVATIVERASTERPARAMETERINVPROC) (GLenum pname, GLint param);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glConservativeRasterParameteriNV (GLenum pname, GLint param);
#endif
#endif /* GL_NV_conservative_raster_pre_snap_triangles */
#ifndef GL_NV_copy_buffer
#define GL_NV_copy_buffer 1
#define GL_COPY_READ_BUFFER_NV 0x8F36
@@ -2307,6 +2436,109 @@ GL_APICALL void GL_APIENTRY glRenderbufferStorageMultisampleNV (GLenum target, G
#define GL_NV_geometry_shader_passthrough 1
#endif /* GL_NV_geometry_shader_passthrough */
#ifndef GL_NV_gpu_shader5
#define GL_NV_gpu_shader5 1
typedef khronos_int64_t GLint64EXT;
typedef khronos_uint64_t GLuint64EXT;
#define GL_INT64_NV 0x140E
#define GL_UNSIGNED_INT64_NV 0x140F
#define GL_INT8_NV 0x8FE0
#define GL_INT8_VEC2_NV 0x8FE1
#define GL_INT8_VEC3_NV 0x8FE2
#define GL_INT8_VEC4_NV 0x8FE3
#define GL_INT16_NV 0x8FE4
#define GL_INT16_VEC2_NV 0x8FE5
#define GL_INT16_VEC3_NV 0x8FE6
#define GL_INT16_VEC4_NV 0x8FE7
#define GL_INT64_VEC2_NV 0x8FE9
#define GL_INT64_VEC3_NV 0x8FEA
#define GL_INT64_VEC4_NV 0x8FEB
#define GL_UNSIGNED_INT8_NV 0x8FEC
#define GL_UNSIGNED_INT8_VEC2_NV 0x8FED
#define GL_UNSIGNED_INT8_VEC3_NV 0x8FEE
#define GL_UNSIGNED_INT8_VEC4_NV 0x8FEF
#define GL_UNSIGNED_INT16_NV 0x8FF0
#define GL_UNSIGNED_INT16_VEC2_NV 0x8FF1
#define GL_UNSIGNED_INT16_VEC3_NV 0x8FF2
#define GL_UNSIGNED_INT16_VEC4_NV 0x8FF3
#define GL_UNSIGNED_INT64_VEC2_NV 0x8FF5
#define GL_UNSIGNED_INT64_VEC3_NV 0x8FF6
#define GL_UNSIGNED_INT64_VEC4_NV 0x8FF7
#define GL_FLOAT16_NV 0x8FF8
#define GL_FLOAT16_VEC2_NV 0x8FF9
#define GL_FLOAT16_VEC3_NV 0x8FFA
#define GL_FLOAT16_VEC4_NV 0x8FFB
#define GL_PATCHES 0x000E
typedef void (GL_APIENTRYP PFNGLUNIFORM1I64NVPROC) (GLint location, GLint64EXT x);
typedef void (GL_APIENTRYP PFNGLUNIFORM2I64NVPROC) (GLint location, GLint64EXT x, GLint64EXT y);
typedef void (GL_APIENTRYP PFNGLUNIFORM3I64NVPROC) (GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z);
typedef void (GL_APIENTRYP PFNGLUNIFORM4I64NVPROC) (GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z, GLint64EXT w);
typedef void (GL_APIENTRYP PFNGLUNIFORM1I64VNVPROC) (GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2I64VNVPROC) (GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3I64VNVPROC) (GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4I64VNVPROC) (GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UI64NVPROC) (GLint location, GLuint64EXT x);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UI64NVPROC) (GLint location, GLuint64EXT x, GLuint64EXT y);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UI64NVPROC) (GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UI64NVPROC) (GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z, GLuint64EXT w);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UI64VNVPROC) (GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UI64VNVPROC) (GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UI64VNVPROC) (GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UI64VNVPROC) (GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMI64VNVPROC) (GLuint program, GLint location, GLint64EXT *params);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1I64NVPROC) (GLuint program, GLint location, GLint64EXT x);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2I64NVPROC) (GLuint program, GLint location, GLint64EXT x, GLint64EXT y);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3I64NVPROC) (GLuint program, GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4I64NVPROC) (GLuint program, GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z, GLint64EXT w);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1I64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2I64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3I64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4I64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1UI64NVPROC) (GLuint program, GLint location, GLuint64EXT x);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2UI64NVPROC) (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3UI64NVPROC) (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4UI64NVPROC) (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z, GLuint64EXT w);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1UI64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2UI64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3UI64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4UI64VNVPROC) (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glUniform1i64NV (GLint location, GLint64EXT x);
GL_APICALL void GL_APIENTRY glUniform2i64NV (GLint location, GLint64EXT x, GLint64EXT y);
GL_APICALL void GL_APIENTRY glUniform3i64NV (GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z);
GL_APICALL void GL_APIENTRY glUniform4i64NV (GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z, GLint64EXT w);
GL_APICALL void GL_APIENTRY glUniform1i64vNV (GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform2i64vNV (GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform3i64vNV (GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform4i64vNV (GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform1ui64NV (GLint location, GLuint64EXT x);
GL_APICALL void GL_APIENTRY glUniform2ui64NV (GLint location, GLuint64EXT x, GLuint64EXT y);
GL_APICALL void GL_APIENTRY glUniform3ui64NV (GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z);
GL_APICALL void GL_APIENTRY glUniform4ui64NV (GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z, GLuint64EXT w);
GL_APICALL void GL_APIENTRY glUniform1ui64vNV (GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform2ui64vNV (GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform3ui64vNV (GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glUniform4ui64vNV (GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glGetUniformi64vNV (GLuint program, GLint location, GLint64EXT *params);
GL_APICALL void GL_APIENTRY glProgramUniform1i64NV (GLuint program, GLint location, GLint64EXT x);
GL_APICALL void GL_APIENTRY glProgramUniform2i64NV (GLuint program, GLint location, GLint64EXT x, GLint64EXT y);
GL_APICALL void GL_APIENTRY glProgramUniform3i64NV (GLuint program, GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z);
GL_APICALL void GL_APIENTRY glProgramUniform4i64NV (GLuint program, GLint location, GLint64EXT x, GLint64EXT y, GLint64EXT z, GLint64EXT w);
GL_APICALL void GL_APIENTRY glProgramUniform1i64vNV (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform2i64vNV (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform3i64vNV (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform4i64vNV (GLuint program, GLint location, GLsizei count, const GLint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform1ui64NV (GLuint program, GLint location, GLuint64EXT x);
GL_APICALL void GL_APIENTRY glProgramUniform2ui64NV (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y);
GL_APICALL void GL_APIENTRY glProgramUniform3ui64NV (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z);
GL_APICALL void GL_APIENTRY glProgramUniform4ui64NV (GLuint program, GLint location, GLuint64EXT x, GLuint64EXT y, GLuint64EXT z, GLuint64EXT w);
GL_APICALL void GL_APIENTRY glProgramUniform1ui64vNV (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform2ui64vNV (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform3ui64vNV (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
GL_APICALL void GL_APIENTRY glProgramUniform4ui64vNV (GLuint program, GLint location, GLsizei count, const GLuint64EXT *value);
#endif
#endif /* GL_NV_gpu_shader5 */
#ifndef GL_NV_image_formats
#define GL_NV_image_formats 1
#endif /* GL_NV_image_formats */
@@ -2713,6 +2945,10 @@ GL_APICALL void GL_APIENTRY glResolveDepthValuesNV (void);
#define GL_NV_sample_mask_override_coverage 1
#endif /* GL_NV_sample_mask_override_coverage */
#ifndef GL_NV_shader_atomic_fp16_vector
#define GL_NV_shader_atomic_fp16_vector 1
#endif /* GL_NV_shader_atomic_fp16_vector */
#ifndef GL_NV_shader_noperspective_interpolation
#define GL_NV_shader_noperspective_interpolation 1
#endif /* GL_NV_shader_noperspective_interpolation */
@@ -2779,6 +3015,26 @@ GL_APICALL GLboolean GL_APIENTRY glIsEnablediNV (GLenum target, GLuint index);
#define GL_NV_viewport_array2 1
#endif /* GL_NV_viewport_array2 */
#ifndef GL_NV_viewport_swizzle
#define GL_NV_viewport_swizzle 1
#define GL_VIEWPORT_SWIZZLE_POSITIVE_X_NV 0x9350
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_X_NV 0x9351
#define GL_VIEWPORT_SWIZZLE_POSITIVE_Y_NV 0x9352
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_Y_NV 0x9353
#define GL_VIEWPORT_SWIZZLE_POSITIVE_Z_NV 0x9354
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_Z_NV 0x9355
#define GL_VIEWPORT_SWIZZLE_POSITIVE_W_NV 0x9356
#define GL_VIEWPORT_SWIZZLE_NEGATIVE_W_NV 0x9357
#define GL_VIEWPORT_SWIZZLE_X_NV 0x9358
#define GL_VIEWPORT_SWIZZLE_Y_NV 0x9359
#define GL_VIEWPORT_SWIZZLE_Z_NV 0x935A
#define GL_VIEWPORT_SWIZZLE_W_NV 0x935B
typedef void (GL_APIENTRYP PFNGLVIEWPORTSWIZZLENVPROC) (GLuint index, GLenum swizzlex, GLenum swizzley, GLenum swizzlez, GLenum swizzlew);
#ifdef GL_GLEXT_PROTOTYPES
GL_APICALL void GL_APIENTRY glViewportSwizzleNV (GLuint index, GLenum swizzlex, GLenum swizzley, GLenum swizzlez, GLenum swizzlew);
#endif
#endif /* GL_NV_viewport_swizzle */
#ifndef GL_OVR_multiview
#define GL_OVR_multiview 1
#define GL_FRAMEBUFFER_ATTACHMENT_TEXTURE_NUM_VIEWS_OVR 0x9630

View File

@@ -1,7 +1,7 @@
#ifndef __gl2platform_h_
#define __gl2platform_h_
/* $Revision: 10602 $ on $Date:: 2010-03-04 22:35:34 -0800 #$ */
/* $Revision: 23328 $ on $Date:: 2013-10-02 02:28:28 -0700 #$ */
/*
* This document is licensed under the SGI Free Software B License Version

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -33,17 +33,21 @@ extern "C" {
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
**
** Khronos $Revision: 24614 $ on $Date: 2013-12-30 04:44:46 -0800 (Mon, 30 Dec 2013) $
** Khronos $Revision: 32749 $ on $Date: 2016-04-28 09:03:03 -0700 (Thu, 28 Apr 2016) $
*/
#include <GLES3/gl3platform.h>
/* Generated on date 20131230 */
#ifndef GL_APIENTRYP
#define GL_APIENTRYP GL_APIENTRY*
#endif
/* Generated on date 20160428 */
/* Generated C header for:
* API: gles2
* Profile: common
* Versions considered: [23]\.[0-9]
* Versions considered: 2\.[0-9]|3\.0
* Versions emitted: .*
* Default extensions included: None
* Additional extensions included: _nomatch_^
@@ -374,6 +378,148 @@ typedef khronos_uint8_t GLubyte;
#define GL_RENDERBUFFER_BINDING 0x8CA7
#define GL_MAX_RENDERBUFFER_SIZE 0x84E8
#define GL_INVALID_FRAMEBUFFER_OPERATION 0x0506
typedef void (GL_APIENTRYP PFNGLACTIVETEXTUREPROC) (GLenum texture);
typedef void (GL_APIENTRYP PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer);
typedef void (GL_APIENTRYP PFNGLBINDFRAMEBUFFERPROC) (GLenum target, GLuint framebuffer);
typedef void (GL_APIENTRYP PFNGLBINDRENDERBUFFERPROC) (GLenum target, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLBINDTEXTUREPROC) (GLenum target, GLuint texture);
typedef void (GL_APIENTRYP PFNGLBLENDCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONSEPARATEPROC) (GLenum modeRGB, GLenum modeAlpha);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCPROC) (GLenum sfactor, GLenum dfactor);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCSEPARATEPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
typedef void (GL_APIENTRYP PFNGLBUFFERDATAPROC) (GLenum target, GLsizeiptr size, const void *data, GLenum usage);
typedef void (GL_APIENTRYP PFNGLBUFFERSUBDATAPROC) (GLenum target, GLintptr offset, GLsizeiptr size, const void *data);
typedef GLenum (GL_APIENTRYP PFNGLCHECKFRAMEBUFFERSTATUSPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLCLEARPROC) (GLbitfield mask);
typedef void (GL_APIENTRYP PFNGLCLEARCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLCLEARDEPTHFPROC) (GLfloat d);
typedef void (GL_APIENTRYP PFNGLCLEARSTENCILPROC) (GLint s);
typedef void (GL_APIENTRYP PFNGLCOLORMASKPROC) (GLboolean red, GLboolean green, GLboolean blue, GLboolean alpha);
typedef void (GL_APIENTRYP PFNGLCOMPILESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOPYTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border);
typedef void (GL_APIENTRYP PFNGLCOPYTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef GLuint (GL_APIENTRYP PFNGLCREATEPROGRAMPROC) (void);
typedef GLuint (GL_APIENTRYP PFNGLCREATESHADERPROC) (GLenum type);
typedef void (GL_APIENTRYP PFNGLCULLFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLDELETEFRAMEBUFFERSPROC) (GLsizei n, const GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLDELETEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLDELETERENDERBUFFERSPROC) (GLsizei n, const GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLDELETESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLDELETETEXTURESPROC) (GLsizei n, const GLuint *textures);
typedef void (GL_APIENTRYP PFNGLDEPTHFUNCPROC) (GLenum func);
typedef void (GL_APIENTRYP PFNGLDEPTHMASKPROC) (GLboolean flag);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEFPROC) (GLfloat n, GLfloat f);
typedef void (GL_APIENTRYP PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLDISABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSPROC) (GLenum mode, GLint first, GLsizei count);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSPROC) (GLenum mode, GLsizei count, GLenum type, const void *indices);
typedef void (GL_APIENTRYP PFNGLENABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLFINISHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFLUSHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERRENDERBUFFERPROC) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level);
typedef void (GL_APIENTRYP PFNGLFRONTFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLGENERATEMIPMAPPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGENFRAMEBUFFERSPROC) (GLsizei n, GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLGENRENDERBUFFERSPROC) (GLsizei n, GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLGENTEXTURESPROC) (GLsizei n, GLuint *textures);
typedef void (GL_APIENTRYP PFNGLGETACTIVEATTRIBPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETATTACHEDSHADERSPROC) (GLuint program, GLsizei maxCount, GLsizei *count, GLuint *shaders);
typedef GLint (GL_APIENTRYP PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETBOOLEANVPROC) (GLenum pname, GLboolean *data);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef GLenum (GL_APIENTRYP PFNGLGETERRORPROC) (void);
typedef void (GL_APIENTRYP PFNGLGETFLOATVPROC) (GLenum pname, GLfloat *data);
typedef void (GL_APIENTRYP PFNGLGETFRAMEBUFFERATTACHMENTPARAMETERIVPROC) (GLenum target, GLenum attachment, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETINTEGERVPROC) (GLenum pname, GLint *data);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETRENDERBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETSHADERPRECISIONFORMATPROC) (GLenum shadertype, GLenum precisiontype, GLint *range, GLint *precision);
typedef void (GL_APIENTRYP PFNGLGETSHADERSOURCEPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *source);
typedef const GLubyte *(GL_APIENTRYP PFNGLGETSTRINGPROC) (GLenum name);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMFVPROC) (GLuint program, GLint location, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMIVPROC) (GLuint program, GLint location, GLint *params);
typedef GLint (GL_APIENTRYP PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBFVPROC) (GLuint index, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIVPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVPROC) (GLuint index, GLenum pname, void **pointer);
typedef void (GL_APIENTRYP PFNGLHINTPROC) (GLenum target, GLenum mode);
typedef GLboolean (GL_APIENTRYP PFNGLISBUFFERPROC) (GLuint buffer);
typedef GLboolean (GL_APIENTRYP PFNGLISENABLEDPROC) (GLenum cap);
typedef GLboolean (GL_APIENTRYP PFNGLISFRAMEBUFFERPROC) (GLuint framebuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISPROGRAMPROC) (GLuint program);
typedef GLboolean (GL_APIENTRYP PFNGLISRENDERBUFFERPROC) (GLuint renderbuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISSHADERPROC) (GLuint shader);
typedef GLboolean (GL_APIENTRYP PFNGLISTEXTUREPROC) (GLuint texture);
typedef void (GL_APIENTRYP PFNGLLINEWIDTHPROC) (GLfloat width);
typedef void (GL_APIENTRYP PFNGLLINKPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLPIXELSTOREIPROC) (GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLPOLYGONOFFSETPROC) (GLfloat factor, GLfloat units);
typedef void (GL_APIENTRYP PFNGLREADPIXELSPROC) (GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, void *pixels);
typedef void (GL_APIENTRYP PFNGLRELEASESHADERCOMPILERPROC) (void);
typedef void (GL_APIENTRYP PFNGLRENDERBUFFERSTORAGEPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSAMPLECOVERAGEPROC) (GLfloat value, GLboolean invert);
typedef void (GL_APIENTRYP PFNGLSCISSORPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSHADERBINARYPROC) (GLsizei count, const GLuint *shaders, GLenum binaryformat, const void *binary, GLsizei length);
typedef void (GL_APIENTRYP PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const GLchar *const*string, const GLint *length);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCPROC) (GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCSEPARATEPROC) (GLenum face, GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKPROC) (GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKSEPARATEPROC) (GLenum face, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILOPPROC) (GLenum fail, GLenum zfail, GLenum zpass);
typedef void (GL_APIENTRYP PFNGLSTENCILOPSEPARATEPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass);
typedef void (GL_APIENTRYP PFNGLTEXIMAGE2DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFPROC) (GLenum target, GLenum pname, GLfloat param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIPROC) (GLenum target, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (GL_APIENTRYP PFNGLTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FPROC) (GLint location, GLfloat v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IPROC) (GLint location, GLint v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FPROC) (GLint location, GLfloat v0, GLfloat v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IPROC) (GLint location, GLint v0, GLint v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IPROC) (GLint location, GLint v0, GLint v1, GLint v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IPROC) (GLint location, GLint v0, GLint v1, GLint v2, GLint v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUSEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVALIDATEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FPROC) (GLuint index, GLfloat x);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FPROC) (GLuint index, GLfloat x, GLfloat y);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const void *pointer);
typedef void (GL_APIENTRYP PFNGLVIEWPORTPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
GL_APICALL void GL_APIENTRY glActiveTexture (GLenum texture);
GL_APICALL void GL_APIENTRY glAttachShader (GLuint program, GLuint shader);
GL_APICALL void GL_APIENTRY glBindAttribLocation (GLuint program, GLuint index, const GLchar *name);
@@ -705,6 +851,22 @@ typedef unsigned short GLhalf;
#define GL_COLOR_ATTACHMENT13 0x8CED
#define GL_COLOR_ATTACHMENT14 0x8CEE
#define GL_COLOR_ATTACHMENT15 0x8CEF
#define GL_COLOR_ATTACHMENT16 0x8CF0
#define GL_COLOR_ATTACHMENT17 0x8CF1
#define GL_COLOR_ATTACHMENT18 0x8CF2
#define GL_COLOR_ATTACHMENT19 0x8CF3
#define GL_COLOR_ATTACHMENT20 0x8CF4
#define GL_COLOR_ATTACHMENT21 0x8CF5
#define GL_COLOR_ATTACHMENT22 0x8CF6
#define GL_COLOR_ATTACHMENT23 0x8CF7
#define GL_COLOR_ATTACHMENT24 0x8CF8
#define GL_COLOR_ATTACHMENT25 0x8CF9
#define GL_COLOR_ATTACHMENT26 0x8CFA
#define GL_COLOR_ATTACHMENT27 0x8CFB
#define GL_COLOR_ATTACHMENT28 0x8CFC
#define GL_COLOR_ATTACHMENT29 0x8CFD
#define GL_COLOR_ATTACHMENT30 0x8CFE
#define GL_COLOR_ATTACHMENT31 0x8CFF
#define GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE 0x8D56
#define GL_MAX_SAMPLES 0x8D57
#define GL_HALF_FLOAT 0x140B
@@ -826,7 +988,111 @@ typedef unsigned short GLhalf;
#define GL_MAX_ELEMENT_INDEX 0x8D6B
#define GL_NUM_SAMPLE_COUNTS 0x9380
#define GL_TEXTURE_IMMUTABLE_LEVELS 0x82DF
GL_APICALL void GL_APIENTRY glReadBuffer (GLenum mode);
typedef void (GL_APIENTRYP PFNGLREADBUFFERPROC) (GLenum src);
typedef void (GL_APIENTRYP PFNGLDRAWRANGEELEMENTSPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
typedef void (GL_APIENTRYP PFNGLTEXIMAGE3DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLCOPYTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXIMAGE3DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLGENQUERIESPROC) (GLsizei n, GLuint *ids);
typedef void (GL_APIENTRYP PFNGLDELETEQUERIESPROC) (GLsizei n, const GLuint *ids);
typedef GLboolean (GL_APIENTRYP PFNGLISQUERYPROC) (GLuint id);
typedef void (GL_APIENTRYP PFNGLBEGINQUERYPROC) (GLenum target, GLuint id);
typedef void (GL_APIENTRYP PFNGLENDQUERYPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGETQUERYIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETQUERYOBJECTUIVPROC) (GLuint id, GLenum pname, GLuint *params);
typedef GLboolean (GL_APIENTRYP PFNGLUNMAPBUFFERPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPOINTERVPROC) (GLenum target, GLenum pname, void **params);
typedef void (GL_APIENTRYP PFNGLDRAWBUFFERSPROC) (GLsizei n, const GLenum *bufs);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLBLITFRAMEBUFFERPROC) (GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
typedef void (GL_APIENTRYP PFNGLRENDERBUFFERSTORAGEMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYERPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer);
typedef void *(GL_APIENTRYP PFNGLMAPBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length, GLbitfield access);
typedef void (GL_APIENTRYP PFNGLFLUSHMAPPEDBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length);
typedef void (GL_APIENTRYP PFNGLBINDVERTEXARRAYPROC) (GLuint array);
typedef void (GL_APIENTRYP PFNGLDELETEVERTEXARRAYSPROC) (GLsizei n, const GLuint *arrays);
typedef void (GL_APIENTRYP PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint *arrays);
typedef GLboolean (GL_APIENTRYP PFNGLISVERTEXARRAYPROC) (GLuint array);
typedef void (GL_APIENTRYP PFNGLGETINTEGERI_VPROC) (GLenum target, GLuint index, GLint *data);
typedef void (GL_APIENTRYP PFNGLBEGINTRANSFORMFEEDBACKPROC) (GLenum primitiveMode);
typedef void (GL_APIENTRYP PFNGLENDTRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERRANGEPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERBASEPROC) (GLenum target, GLuint index, GLuint buffer);
typedef void (GL_APIENTRYP PFNGLTRANSFORMFEEDBACKVARYINGSPROC) (GLuint program, GLsizei count, const GLchar *const*varyings, GLenum bufferMode);
typedef void (GL_APIENTRYP PFNGLGETTRANSFORMFEEDBACKVARYINGPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLsizei *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBIPOINTERPROC) (GLuint index, GLint size, GLenum type, GLsizei stride, const void *pointer);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIIVPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIUIVPROC) (GLuint index, GLenum pname, GLuint *params);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4IPROC) (GLuint index, GLint x, GLint y, GLint z, GLint w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4UIPROC) (GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4IVPROC) (GLuint index, const GLint *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4UIVPROC) (GLuint index, const GLuint *v);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMUIVPROC) (GLuint program, GLint location, GLuint *params);
typedef GLint (GL_APIENTRYP PFNGLGETFRAGDATALOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UIPROC) (GLint location, GLuint v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UIPROC) (GLint location, GLuint v0, GLuint v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERIVPROC) (GLenum buffer, GLint drawbuffer, const GLint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERUIVPROC) (GLenum buffer, GLint drawbuffer, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERFVPROC) (GLenum buffer, GLint drawbuffer, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERFIPROC) (GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
typedef const GLubyte *(GL_APIENTRYP PFNGLGETSTRINGIPROC) (GLenum name, GLuint index);
typedef void (GL_APIENTRYP PFNGLCOPYBUFFERSUBDATAPROC) (GLenum readTarget, GLenum writeTarget, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMINDICESPROC) (GLuint program, GLsizei uniformCount, const GLchar *const*uniformNames, GLuint *uniformIndices);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMSIVPROC) (GLuint program, GLsizei uniformCount, const GLuint *uniformIndices, GLenum pname, GLint *params);
typedef GLuint (GL_APIENTRYP PFNGLGETUNIFORMBLOCKINDEXPROC) (GLuint program, const GLchar *uniformBlockName);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKIVPROC) (GLuint program, GLuint uniformBlockIndex, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKNAMEPROC) (GLuint program, GLuint uniformBlockIndex, GLsizei bufSize, GLsizei *length, GLchar *uniformBlockName);
typedef void (GL_APIENTRYP PFNGLUNIFORMBLOCKBINDINGPROC) (GLuint program, GLuint uniformBlockIndex, GLuint uniformBlockBinding);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSINSTANCEDPROC) (GLenum mode, GLint first, GLsizei count, GLsizei instancecount);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSINSTANCEDPROC) (GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount);
typedef GLsync (GL_APIENTRYP PFNGLFENCESYNCPROC) (GLenum condition, GLbitfield flags);
typedef GLboolean (GL_APIENTRYP PFNGLISSYNCPROC) (GLsync sync);
typedef void (GL_APIENTRYP PFNGLDELETESYNCPROC) (GLsync sync);
typedef GLenum (GL_APIENTRYP PFNGLCLIENTWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout);
typedef void (GL_APIENTRYP PFNGLWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout);
typedef void (GL_APIENTRYP PFNGLGETINTEGER64VPROC) (GLenum pname, GLint64 *data);
typedef void (GL_APIENTRYP PFNGLGETSYNCIVPROC) (GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values);
typedef void (GL_APIENTRYP PFNGLGETINTEGER64I_VPROC) (GLenum target, GLuint index, GLint64 *data);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPARAMETERI64VPROC) (GLenum target, GLenum pname, GLint64 *params);
typedef void (GL_APIENTRYP PFNGLGENSAMPLERSPROC) (GLsizei count, GLuint *samplers);
typedef void (GL_APIENTRYP PFNGLDELETESAMPLERSPROC) (GLsizei count, const GLuint *samplers);
typedef GLboolean (GL_APIENTRYP PFNGLISSAMPLERPROC) (GLuint sampler);
typedef void (GL_APIENTRYP PFNGLBINDSAMPLERPROC) (GLuint unit, GLuint sampler);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERIPROC) (GLuint sampler, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERIVPROC) (GLuint sampler, GLenum pname, const GLint *param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERFPROC) (GLuint sampler, GLenum pname, GLfloat param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERFVPROC) (GLuint sampler, GLenum pname, const GLfloat *param);
typedef void (GL_APIENTRYP PFNGLGETSAMPLERPARAMETERIVPROC) (GLuint sampler, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSAMPLERPARAMETERFVPROC) (GLuint sampler, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBDIVISORPROC) (GLuint index, GLuint divisor);
typedef void (GL_APIENTRYP PFNGLBINDTRANSFORMFEEDBACKPROC) (GLenum target, GLuint id);
typedef void (GL_APIENTRYP PFNGLDELETETRANSFORMFEEDBACKSPROC) (GLsizei n, const GLuint *ids);
typedef void (GL_APIENTRYP PFNGLGENTRANSFORMFEEDBACKSPROC) (GLsizei n, GLuint *ids);
typedef GLboolean (GL_APIENTRYP PFNGLISTRANSFORMFEEDBACKPROC) (GLuint id);
typedef void (GL_APIENTRYP PFNGLPAUSETRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLRESUMETRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMBINARYPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLenum *binaryFormat, void *binary);
typedef void (GL_APIENTRYP PFNGLPROGRAMBINARYPROC) (GLuint program, GLenum binaryFormat, const void *binary, GLsizei length);
typedef void (GL_APIENTRYP PFNGLPROGRAMPARAMETERIPROC) (GLuint program, GLenum pname, GLint value);
typedef void (GL_APIENTRYP PFNGLINVALIDATEFRAMEBUFFERPROC) (GLenum target, GLsizei numAttachments, const GLenum *attachments);
typedef void (GL_APIENTRYP PFNGLINVALIDATESUBFRAMEBUFFERPROC) (GLenum target, GLsizei numAttachments, const GLenum *attachments, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLTEXSTORAGE2DPROC) (GLenum target, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLTEXSTORAGE3DPROC) (GLenum target, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth);
typedef void (GL_APIENTRYP PFNGLGETINTERNALFORMATIVPROC) (GLenum target, GLenum internalformat, GLenum pname, GLsizei bufSize, GLint *params);
GL_APICALL void GL_APIENTRY glReadBuffer (GLenum src);
GL_APICALL void GL_APIENTRY glDrawRangeElements (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
GL_APICALL void GL_APIENTRY glTexImage3D (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
GL_APICALL void GL_APIENTRY glTexSubImage3D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -38,12 +38,16 @@ extern "C" {
#include <GLES3/gl3platform.h>
/* Generated on date 20140317 */
#ifndef GL_APIENTRYP
#define GL_APIENTRYP GL_APIENTRY*
#endif
/* Generated on date 20160428 */
/* Generated C header for:
* API: gles2
* Profile: common
* Versions considered: 2.[0-9]|3.[01]
* Versions considered: 2\.[0-9]|3\.[01]
* Versions emitted: .*
* Default extensions included: None
* Additional extensions included: _nomatch_^
@@ -374,6 +378,148 @@ typedef khronos_uint8_t GLubyte;
#define GL_RENDERBUFFER_BINDING 0x8CA7
#define GL_MAX_RENDERBUFFER_SIZE 0x84E8
#define GL_INVALID_FRAMEBUFFER_OPERATION 0x0506
typedef void (GL_APIENTRYP PFNGLACTIVETEXTUREPROC) (GLenum texture);
typedef void (GL_APIENTRYP PFNGLATTACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLBINDATTRIBLOCATIONPROC) (GLuint program, GLuint index, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERPROC) (GLenum target, GLuint buffer);
typedef void (GL_APIENTRYP PFNGLBINDFRAMEBUFFERPROC) (GLenum target, GLuint framebuffer);
typedef void (GL_APIENTRYP PFNGLBINDRENDERBUFFERPROC) (GLenum target, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLBINDTEXTUREPROC) (GLenum target, GLuint texture);
typedef void (GL_APIENTRYP PFNGLBLENDCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLBLENDEQUATIONSEPARATEPROC) (GLenum modeRGB, GLenum modeAlpha);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCPROC) (GLenum sfactor, GLenum dfactor);
typedef void (GL_APIENTRYP PFNGLBLENDFUNCSEPARATEPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
typedef void (GL_APIENTRYP PFNGLBUFFERDATAPROC) (GLenum target, GLsizeiptr size, const void *data, GLenum usage);
typedef void (GL_APIENTRYP PFNGLBUFFERSUBDATAPROC) (GLenum target, GLintptr offset, GLsizeiptr size, const void *data);
typedef GLenum (GL_APIENTRYP PFNGLCHECKFRAMEBUFFERSTATUSPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLCLEARPROC) (GLbitfield mask);
typedef void (GL_APIENTRYP PFNGLCLEARCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
typedef void (GL_APIENTRYP PFNGLCLEARDEPTHFPROC) (GLfloat d);
typedef void (GL_APIENTRYP PFNGLCLEARSTENCILPROC) (GLint s);
typedef void (GL_APIENTRYP PFNGLCOLORMASKPROC) (GLboolean red, GLboolean green, GLboolean blue, GLboolean alpha);
typedef void (GL_APIENTRYP PFNGLCOMPILESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOPYTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLint x, GLint y, GLsizei width, GLsizei height, GLint border);
typedef void (GL_APIENTRYP PFNGLCOPYTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef GLuint (GL_APIENTRYP PFNGLCREATEPROGRAMPROC) (void);
typedef GLuint (GL_APIENTRYP PFNGLCREATESHADERPROC) (GLenum type);
typedef void (GL_APIENTRYP PFNGLCULLFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLDELETEBUFFERSPROC) (GLsizei n, const GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLDELETEFRAMEBUFFERSPROC) (GLsizei n, const GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLDELETEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLDELETERENDERBUFFERSPROC) (GLsizei n, const GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLDELETESHADERPROC) (GLuint shader);
typedef void (GL_APIENTRYP PFNGLDELETETEXTURESPROC) (GLsizei n, const GLuint *textures);
typedef void (GL_APIENTRYP PFNGLDEPTHFUNCPROC) (GLenum func);
typedef void (GL_APIENTRYP PFNGLDEPTHMASKPROC) (GLboolean flag);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEFPROC) (GLfloat n, GLfloat f);
typedef void (GL_APIENTRYP PFNGLDETACHSHADERPROC) (GLuint program, GLuint shader);
typedef void (GL_APIENTRYP PFNGLDISABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLDISABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSPROC) (GLenum mode, GLint first, GLsizei count);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSPROC) (GLenum mode, GLsizei count, GLenum type, const void *indices);
typedef void (GL_APIENTRYP PFNGLENABLEPROC) (GLenum cap);
typedef void (GL_APIENTRYP PFNGLENABLEVERTEXATTRIBARRAYPROC) (GLuint index);
typedef void (GL_APIENTRYP PFNGLFINISHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFLUSHPROC) (void);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERRENDERBUFFERPROC) (GLenum target, GLenum attachment, GLenum renderbuffertarget, GLuint renderbuffer);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURE2DPROC) (GLenum target, GLenum attachment, GLenum textarget, GLuint texture, GLint level);
typedef void (GL_APIENTRYP PFNGLFRONTFACEPROC) (GLenum mode);
typedef void (GL_APIENTRYP PFNGLGENBUFFERSPROC) (GLsizei n, GLuint *buffers);
typedef void (GL_APIENTRYP PFNGLGENERATEMIPMAPPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGENFRAMEBUFFERSPROC) (GLsizei n, GLuint *framebuffers);
typedef void (GL_APIENTRYP PFNGLGENRENDERBUFFERSPROC) (GLsizei n, GLuint *renderbuffers);
typedef void (GL_APIENTRYP PFNGLGENTEXTURESPROC) (GLsizei n, GLuint *textures);
typedef void (GL_APIENTRYP PFNGLGETACTIVEATTRIBPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLint *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETATTACHEDSHADERSPROC) (GLuint program, GLsizei maxCount, GLsizei *count, GLuint *shaders);
typedef GLint (GL_APIENTRYP PFNGLGETATTRIBLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETBOOLEANVPROC) (GLenum pname, GLboolean *data);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef GLenum (GL_APIENTRYP PFNGLGETERRORPROC) (void);
typedef void (GL_APIENTRYP PFNGLGETFLOATVPROC) (GLenum pname, GLfloat *data);
typedef void (GL_APIENTRYP PFNGLGETFRAMEBUFFERATTACHMENTPARAMETERIVPROC) (GLenum target, GLenum attachment, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETINTEGERVPROC) (GLenum pname, GLint *data);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMIVPROC) (GLuint program, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMINFOLOGPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETRENDERBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERIVPROC) (GLuint shader, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSHADERINFOLOGPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLGETSHADERPRECISIONFORMATPROC) (GLenum shadertype, GLenum precisiontype, GLint *range, GLint *precision);
typedef void (GL_APIENTRYP PFNGLGETSHADERSOURCEPROC) (GLuint shader, GLsizei bufSize, GLsizei *length, GLchar *source);
typedef const GLubyte *(GL_APIENTRYP PFNGLGETSTRINGPROC) (GLenum name);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERFVPROC) (GLenum target, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETTEXPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMFVPROC) (GLuint program, GLint location, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMIVPROC) (GLuint program, GLint location, GLint *params);
typedef GLint (GL_APIENTRYP PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBFVPROC) (GLuint index, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIVPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBPOINTERVPROC) (GLuint index, GLenum pname, void **pointer);
typedef void (GL_APIENTRYP PFNGLHINTPROC) (GLenum target, GLenum mode);
typedef GLboolean (GL_APIENTRYP PFNGLISBUFFERPROC) (GLuint buffer);
typedef GLboolean (GL_APIENTRYP PFNGLISENABLEDPROC) (GLenum cap);
typedef GLboolean (GL_APIENTRYP PFNGLISFRAMEBUFFERPROC) (GLuint framebuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISPROGRAMPROC) (GLuint program);
typedef GLboolean (GL_APIENTRYP PFNGLISRENDERBUFFERPROC) (GLuint renderbuffer);
typedef GLboolean (GL_APIENTRYP PFNGLISSHADERPROC) (GLuint shader);
typedef GLboolean (GL_APIENTRYP PFNGLISTEXTUREPROC) (GLuint texture);
typedef void (GL_APIENTRYP PFNGLLINEWIDTHPROC) (GLfloat width);
typedef void (GL_APIENTRYP PFNGLLINKPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLPIXELSTOREIPROC) (GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLPOLYGONOFFSETPROC) (GLfloat factor, GLfloat units);
typedef void (GL_APIENTRYP PFNGLREADPIXELSPROC) (GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, void *pixels);
typedef void (GL_APIENTRYP PFNGLRELEASESHADERCOMPILERPROC) (void);
typedef void (GL_APIENTRYP PFNGLRENDERBUFFERSTORAGEPROC) (GLenum target, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSAMPLECOVERAGEPROC) (GLfloat value, GLboolean invert);
typedef void (GL_APIENTRYP PFNGLSCISSORPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLSHADERBINARYPROC) (GLsizei count, const GLuint *shaders, GLenum binaryformat, const void *binary, GLsizei length);
typedef void (GL_APIENTRYP PFNGLSHADERSOURCEPROC) (GLuint shader, GLsizei count, const GLchar *const*string, const GLint *length);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCPROC) (GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILFUNCSEPARATEPROC) (GLenum face, GLenum func, GLint ref, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKPROC) (GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILMASKSEPARATEPROC) (GLenum face, GLuint mask);
typedef void (GL_APIENTRYP PFNGLSTENCILOPPROC) (GLenum fail, GLenum zfail, GLenum zpass);
typedef void (GL_APIENTRYP PFNGLSTENCILOPSEPARATEPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass);
typedef void (GL_APIENTRYP PFNGLTEXIMAGE2DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFPROC) (GLenum target, GLenum pname, GLfloat param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERFVPROC) (GLenum target, GLenum pname, const GLfloat *params);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIPROC) (GLenum target, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLTEXPARAMETERIVPROC) (GLenum target, GLenum pname, const GLint *params);
typedef void (GL_APIENTRYP PFNGLTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FPROC) (GLint location, GLfloat v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IPROC) (GLint location, GLint v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM1IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FPROC) (GLint location, GLfloat v0, GLfloat v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IPROC) (GLint location, GLint v0, GLint v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM2IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IPROC) (GLint location, GLint v0, GLint v1, GLint v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM3IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FPROC) (GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4FVPROC) (GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IPROC) (GLint location, GLint v0, GLint v1, GLint v2, GLint v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM4IVPROC) (GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUSEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVALIDATEPROGRAMPROC) (GLuint program);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FPROC) (GLuint index, GLfloat x);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB1FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FPROC) (GLuint index, GLfloat x, GLfloat y);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB2FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB3FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FPROC) (GLuint index, GLfloat x, GLfloat y, GLfloat z, GLfloat w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIB4FVPROC) (GLuint index, const GLfloat *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBPOINTERPROC) (GLuint index, GLint size, GLenum type, GLboolean normalized, GLsizei stride, const void *pointer);
typedef void (GL_APIENTRYP PFNGLVIEWPORTPROC) (GLint x, GLint y, GLsizei width, GLsizei height);
GL_APICALL void GL_APIENTRY glActiveTexture (GLenum texture);
GL_APICALL void GL_APIENTRY glAttachShader (GLuint program, GLuint shader);
GL_APICALL void GL_APIENTRY glBindAttribLocation (GLuint program, GLuint index, const GLchar *name);
@@ -705,6 +851,22 @@ typedef unsigned short GLhalf;
#define GL_COLOR_ATTACHMENT13 0x8CED
#define GL_COLOR_ATTACHMENT14 0x8CEE
#define GL_COLOR_ATTACHMENT15 0x8CEF
#define GL_COLOR_ATTACHMENT16 0x8CF0
#define GL_COLOR_ATTACHMENT17 0x8CF1
#define GL_COLOR_ATTACHMENT18 0x8CF2
#define GL_COLOR_ATTACHMENT19 0x8CF3
#define GL_COLOR_ATTACHMENT20 0x8CF4
#define GL_COLOR_ATTACHMENT21 0x8CF5
#define GL_COLOR_ATTACHMENT22 0x8CF6
#define GL_COLOR_ATTACHMENT23 0x8CF7
#define GL_COLOR_ATTACHMENT24 0x8CF8
#define GL_COLOR_ATTACHMENT25 0x8CF9
#define GL_COLOR_ATTACHMENT26 0x8CFA
#define GL_COLOR_ATTACHMENT27 0x8CFB
#define GL_COLOR_ATTACHMENT28 0x8CFC
#define GL_COLOR_ATTACHMENT29 0x8CFD
#define GL_COLOR_ATTACHMENT30 0x8CFE
#define GL_COLOR_ATTACHMENT31 0x8CFF
#define GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE 0x8D56
#define GL_MAX_SAMPLES 0x8D57
#define GL_HALF_FLOAT 0x140B
@@ -826,7 +988,111 @@ typedef unsigned short GLhalf;
#define GL_MAX_ELEMENT_INDEX 0x8D6B
#define GL_NUM_SAMPLE_COUNTS 0x9380
#define GL_TEXTURE_IMMUTABLE_LEVELS 0x82DF
GL_APICALL void GL_APIENTRY glReadBuffer (GLenum mode);
typedef void (GL_APIENTRYP PFNGLREADBUFFERPROC) (GLenum src);
typedef void (GL_APIENTRYP PFNGLDRAWRANGEELEMENTSPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
typedef void (GL_APIENTRYP PFNGLTEXIMAGE3DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);
typedef void (GL_APIENTRYP PFNGLCOPYTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXIMAGE3DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const void *data);
typedef void (GL_APIENTRYP PFNGLGENQUERIESPROC) (GLsizei n, GLuint *ids);
typedef void (GL_APIENTRYP PFNGLDELETEQUERIESPROC) (GLsizei n, const GLuint *ids);
typedef GLboolean (GL_APIENTRYP PFNGLISQUERYPROC) (GLuint id);
typedef void (GL_APIENTRYP PFNGLBEGINQUERYPROC) (GLenum target, GLuint id);
typedef void (GL_APIENTRYP PFNGLENDQUERYPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGETQUERYIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETQUERYOBJECTUIVPROC) (GLuint id, GLenum pname, GLuint *params);
typedef GLboolean (GL_APIENTRYP PFNGLUNMAPBUFFERPROC) (GLenum target);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPOINTERVPROC) (GLenum target, GLenum pname, void **params);
typedef void (GL_APIENTRYP PFNGLDRAWBUFFERSPROC) (GLsizei n, const GLenum *bufs);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX2X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4X2FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX3X4FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLUNIFORMMATRIX4X3FVPROC) (GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLBLITFRAMEBUFFERPROC) (GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, GLbitfield mask, GLenum filter);
typedef void (GL_APIENTRYP PFNGLRENDERBUFFERSTORAGEMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERTEXTURELAYERPROC) (GLenum target, GLenum attachment, GLuint texture, GLint level, GLint layer);
typedef void *(GL_APIENTRYP PFNGLMAPBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length, GLbitfield access);
typedef void (GL_APIENTRYP PFNGLFLUSHMAPPEDBUFFERRANGEPROC) (GLenum target, GLintptr offset, GLsizeiptr length);
typedef void (GL_APIENTRYP PFNGLBINDVERTEXARRAYPROC) (GLuint array);
typedef void (GL_APIENTRYP PFNGLDELETEVERTEXARRAYSPROC) (GLsizei n, const GLuint *arrays);
typedef void (GL_APIENTRYP PFNGLGENVERTEXARRAYSPROC) (GLsizei n, GLuint *arrays);
typedef GLboolean (GL_APIENTRYP PFNGLISVERTEXARRAYPROC) (GLuint array);
typedef void (GL_APIENTRYP PFNGLGETINTEGERI_VPROC) (GLenum target, GLuint index, GLint *data);
typedef void (GL_APIENTRYP PFNGLBEGINTRANSFORMFEEDBACKPROC) (GLenum primitiveMode);
typedef void (GL_APIENTRYP PFNGLENDTRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERRANGEPROC) (GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
typedef void (GL_APIENTRYP PFNGLBINDBUFFERBASEPROC) (GLenum target, GLuint index, GLuint buffer);
typedef void (GL_APIENTRYP PFNGLTRANSFORMFEEDBACKVARYINGSPROC) (GLuint program, GLsizei count, const GLchar *const*varyings, GLenum bufferMode);
typedef void (GL_APIENTRYP PFNGLGETTRANSFORMFEEDBACKVARYINGPROC) (GLuint program, GLuint index, GLsizei bufSize, GLsizei *length, GLsizei *size, GLenum *type, GLchar *name);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBIPOINTERPROC) (GLuint index, GLint size, GLenum type, GLsizei stride, const void *pointer);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIIVPROC) (GLuint index, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETVERTEXATTRIBIUIVPROC) (GLuint index, GLenum pname, GLuint *params);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4IPROC) (GLuint index, GLint x, GLint y, GLint z, GLint w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4UIPROC) (GLuint index, GLuint x, GLuint y, GLuint z, GLuint w);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4IVPROC) (GLuint index, const GLint *v);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBI4UIVPROC) (GLuint index, const GLuint *v);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMUIVPROC) (GLuint program, GLint location, GLuint *params);
typedef GLint (GL_APIENTRYP PFNGLGETFRAGDATALOCATIONPROC) (GLuint program, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UIPROC) (GLint location, GLuint v0);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UIPROC) (GLint location, GLuint v0, GLuint v1);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UIPROC) (GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
typedef void (GL_APIENTRYP PFNGLUNIFORM1UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM2UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM3UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLUNIFORM4UIVPROC) (GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERIVPROC) (GLenum buffer, GLint drawbuffer, const GLint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERUIVPROC) (GLenum buffer, GLint drawbuffer, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERFVPROC) (GLenum buffer, GLint drawbuffer, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLCLEARBUFFERFIPROC) (GLenum buffer, GLint drawbuffer, GLfloat depth, GLint stencil);
typedef const GLubyte *(GL_APIENTRYP PFNGLGETSTRINGIPROC) (GLenum name, GLuint index);
typedef void (GL_APIENTRYP PFNGLCOPYBUFFERSUBDATAPROC) (GLenum readTarget, GLenum writeTarget, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size);
typedef void (GL_APIENTRYP PFNGLGETUNIFORMINDICESPROC) (GLuint program, GLsizei uniformCount, const GLchar *const*uniformNames, GLuint *uniformIndices);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMSIVPROC) (GLuint program, GLsizei uniformCount, const GLuint *uniformIndices, GLenum pname, GLint *params);
typedef GLuint (GL_APIENTRYP PFNGLGETUNIFORMBLOCKINDEXPROC) (GLuint program, const GLchar *uniformBlockName);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKIVPROC) (GLuint program, GLuint uniformBlockIndex, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETACTIVEUNIFORMBLOCKNAMEPROC) (GLuint program, GLuint uniformBlockIndex, GLsizei bufSize, GLsizei *length, GLchar *uniformBlockName);
typedef void (GL_APIENTRYP PFNGLUNIFORMBLOCKBINDINGPROC) (GLuint program, GLuint uniformBlockIndex, GLuint uniformBlockBinding);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSINSTANCEDPROC) (GLenum mode, GLint first, GLsizei count, GLsizei instancecount);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSINSTANCEDPROC) (GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount);
typedef GLsync (GL_APIENTRYP PFNGLFENCESYNCPROC) (GLenum condition, GLbitfield flags);
typedef GLboolean (GL_APIENTRYP PFNGLISSYNCPROC) (GLsync sync);
typedef void (GL_APIENTRYP PFNGLDELETESYNCPROC) (GLsync sync);
typedef GLenum (GL_APIENTRYP PFNGLCLIENTWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout);
typedef void (GL_APIENTRYP PFNGLWAITSYNCPROC) (GLsync sync, GLbitfield flags, GLuint64 timeout);
typedef void (GL_APIENTRYP PFNGLGETINTEGER64VPROC) (GLenum pname, GLint64 *data);
typedef void (GL_APIENTRYP PFNGLGETSYNCIVPROC) (GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values);
typedef void (GL_APIENTRYP PFNGLGETINTEGER64I_VPROC) (GLenum target, GLuint index, GLint64 *data);
typedef void (GL_APIENTRYP PFNGLGETBUFFERPARAMETERI64VPROC) (GLenum target, GLenum pname, GLint64 *params);
typedef void (GL_APIENTRYP PFNGLGENSAMPLERSPROC) (GLsizei count, GLuint *samplers);
typedef void (GL_APIENTRYP PFNGLDELETESAMPLERSPROC) (GLsizei count, const GLuint *samplers);
typedef GLboolean (GL_APIENTRYP PFNGLISSAMPLERPROC) (GLuint sampler);
typedef void (GL_APIENTRYP PFNGLBINDSAMPLERPROC) (GLuint unit, GLuint sampler);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERIPROC) (GLuint sampler, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERIVPROC) (GLuint sampler, GLenum pname, const GLint *param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERFPROC) (GLuint sampler, GLenum pname, GLfloat param);
typedef void (GL_APIENTRYP PFNGLSAMPLERPARAMETERFVPROC) (GLuint sampler, GLenum pname, const GLfloat *param);
typedef void (GL_APIENTRYP PFNGLGETSAMPLERPARAMETERIVPROC) (GLuint sampler, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETSAMPLERPARAMETERFVPROC) (GLuint sampler, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBDIVISORPROC) (GLuint index, GLuint divisor);
typedef void (GL_APIENTRYP PFNGLBINDTRANSFORMFEEDBACKPROC) (GLenum target, GLuint id);
typedef void (GL_APIENTRYP PFNGLDELETETRANSFORMFEEDBACKSPROC) (GLsizei n, const GLuint *ids);
typedef void (GL_APIENTRYP PFNGLGENTRANSFORMFEEDBACKSPROC) (GLsizei n, GLuint *ids);
typedef GLboolean (GL_APIENTRYP PFNGLISTRANSFORMFEEDBACKPROC) (GLuint id);
typedef void (GL_APIENTRYP PFNGLPAUSETRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLRESUMETRANSFORMFEEDBACKPROC) (void);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMBINARYPROC) (GLuint program, GLsizei bufSize, GLsizei *length, GLenum *binaryFormat, void *binary);
typedef void (GL_APIENTRYP PFNGLPROGRAMBINARYPROC) (GLuint program, GLenum binaryFormat, const void *binary, GLsizei length);
typedef void (GL_APIENTRYP PFNGLPROGRAMPARAMETERIPROC) (GLuint program, GLenum pname, GLint value);
typedef void (GL_APIENTRYP PFNGLINVALIDATEFRAMEBUFFERPROC) (GLenum target, GLsizei numAttachments, const GLenum *attachments);
typedef void (GL_APIENTRYP PFNGLINVALIDATESUBFRAMEBUFFERPROC) (GLenum target, GLsizei numAttachments, const GLenum *attachments, GLint x, GLint y, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLTEXSTORAGE2DPROC) (GLenum target, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height);
typedef void (GL_APIENTRYP PFNGLTEXSTORAGE3DPROC) (GLenum target, GLsizei levels, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth);
typedef void (GL_APIENTRYP PFNGLGETINTERNALFORMATIVPROC) (GLenum target, GLenum internalformat, GLenum pname, GLsizei bufSize, GLint *params);
GL_APICALL void GL_APIENTRY glReadBuffer (GLenum src);
GL_APICALL void GL_APIENTRY glDrawRangeElements (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
GL_APICALL void GL_APIENTRY glTexImage3D (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
GL_APICALL void GL_APIENTRY glTexSubImage3D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);
@@ -1107,6 +1373,74 @@ GL_APICALL void GL_APIENTRY glGetInternalformativ (GLenum target, GLenum interna
#define GL_MAX_VERTEX_ATTRIB_RELATIVE_OFFSET 0x82D9
#define GL_MAX_VERTEX_ATTRIB_BINDINGS 0x82DA
#define GL_MAX_VERTEX_ATTRIB_STRIDE 0x82E5
typedef void (GL_APIENTRYP PFNGLDISPATCHCOMPUTEPROC) (GLuint num_groups_x, GLuint num_groups_y, GLuint num_groups_z);
typedef void (GL_APIENTRYP PFNGLDISPATCHCOMPUTEINDIRECTPROC) (GLintptr indirect);
typedef void (GL_APIENTRYP PFNGLDRAWARRAYSINDIRECTPROC) (GLenum mode, const void *indirect);
typedef void (GL_APIENTRYP PFNGLDRAWELEMENTSINDIRECTPROC) (GLenum mode, GLenum type, const void *indirect);
typedef void (GL_APIENTRYP PFNGLFRAMEBUFFERPARAMETERIPROC) (GLenum target, GLenum pname, GLint param);
typedef void (GL_APIENTRYP PFNGLGETFRAMEBUFFERPARAMETERIVPROC) (GLenum target, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMINTERFACEIVPROC) (GLuint program, GLenum programInterface, GLenum pname, GLint *params);
typedef GLuint (GL_APIENTRYP PFNGLGETPROGRAMRESOURCEINDEXPROC) (GLuint program, GLenum programInterface, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMRESOURCENAMEPROC) (GLuint program, GLenum programInterface, GLuint index, GLsizei bufSize, GLsizei *length, GLchar *name);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMRESOURCEIVPROC) (GLuint program, GLenum programInterface, GLuint index, GLsizei propCount, const GLenum *props, GLsizei bufSize, GLsizei *length, GLint *params);
typedef GLint (GL_APIENTRYP PFNGLGETPROGRAMRESOURCELOCATIONPROC) (GLuint program, GLenum programInterface, const GLchar *name);
typedef void (GL_APIENTRYP PFNGLUSEPROGRAMSTAGESPROC) (GLuint pipeline, GLbitfield stages, GLuint program);
typedef void (GL_APIENTRYP PFNGLACTIVESHADERPROGRAMPROC) (GLuint pipeline, GLuint program);
typedef GLuint (GL_APIENTRYP PFNGLCREATESHADERPROGRAMVPROC) (GLenum type, GLsizei count, const GLchar *const*strings);
typedef void (GL_APIENTRYP PFNGLBINDPROGRAMPIPELINEPROC) (GLuint pipeline);
typedef void (GL_APIENTRYP PFNGLDELETEPROGRAMPIPELINESPROC) (GLsizei n, const GLuint *pipelines);
typedef void (GL_APIENTRYP PFNGLGENPROGRAMPIPELINESPROC) (GLsizei n, GLuint *pipelines);
typedef GLboolean (GL_APIENTRYP PFNGLISPROGRAMPIPELINEPROC) (GLuint pipeline);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMPIPELINEIVPROC) (GLuint pipeline, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1IPROC) (GLuint program, GLint location, GLint v0);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2IPROC) (GLuint program, GLint location, GLint v0, GLint v1);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3IPROC) (GLuint program, GLint location, GLint v0, GLint v1, GLint v2);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4IPROC) (GLuint program, GLint location, GLint v0, GLint v1, GLint v2, GLint v3);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1UIPROC) (GLuint program, GLint location, GLuint v0);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2UIPROC) (GLuint program, GLint location, GLuint v0, GLuint v1);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3UIPROC) (GLuint program, GLint location, GLuint v0, GLuint v1, GLuint v2);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4UIPROC) (GLuint program, GLint location, GLuint v0, GLuint v1, GLuint v2, GLuint v3);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1FPROC) (GLuint program, GLint location, GLfloat v0);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2FPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3FPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1, GLfloat v2);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4FPROC) (GLuint program, GLint location, GLfloat v0, GLfloat v1, GLfloat v2, GLfloat v3);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1IVPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2IVPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3IVPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4IVPROC) (GLuint program, GLint location, GLsizei count, const GLint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1UIVPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2UIVPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3UIVPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4UIVPROC) (GLuint program, GLint location, GLsizei count, const GLuint *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM1FVPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM2FVPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM3FVPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORM4FVPROC) (GLuint program, GLint location, GLsizei count, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2X3FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3X2FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX2X4FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4X2FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX3X4FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLPROGRAMUNIFORMMATRIX4X3FVPROC) (GLuint program, GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
typedef void (GL_APIENTRYP PFNGLVALIDATEPROGRAMPIPELINEPROC) (GLuint pipeline);
typedef void (GL_APIENTRYP PFNGLGETPROGRAMPIPELINEINFOLOGPROC) (GLuint pipeline, GLsizei bufSize, GLsizei *length, GLchar *infoLog);
typedef void (GL_APIENTRYP PFNGLBINDIMAGETEXTUREPROC) (GLuint unit, GLuint texture, GLint level, GLboolean layered, GLint layer, GLenum access, GLenum format);
typedef void (GL_APIENTRYP PFNGLGETBOOLEANI_VPROC) (GLenum target, GLuint index, GLboolean *data);
typedef void (GL_APIENTRYP PFNGLMEMORYBARRIERPROC) (GLbitfield barriers);
typedef void (GL_APIENTRYP PFNGLMEMORYBARRIERBYREGIONPROC) (GLbitfield barriers);
typedef void (GL_APIENTRYP PFNGLTEXSTORAGE2DMULTISAMPLEPROC) (GLenum target, GLsizei samples, GLenum internalformat, GLsizei width, GLsizei height, GLboolean fixedsamplelocations);
typedef void (GL_APIENTRYP PFNGLGETMULTISAMPLEFVPROC) (GLenum pname, GLuint index, GLfloat *val);
typedef void (GL_APIENTRYP PFNGLSAMPLEMASKIPROC) (GLuint maskNumber, GLbitfield mask);
typedef void (GL_APIENTRYP PFNGLGETTEXLEVELPARAMETERIVPROC) (GLenum target, GLint level, GLenum pname, GLint *params);
typedef void (GL_APIENTRYP PFNGLGETTEXLEVELPARAMETERFVPROC) (GLenum target, GLint level, GLenum pname, GLfloat *params);
typedef void (GL_APIENTRYP PFNGLBINDVERTEXBUFFERPROC) (GLuint bindingindex, GLuint buffer, GLintptr offset, GLsizei stride);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBFORMATPROC) (GLuint attribindex, GLint size, GLenum type, GLboolean normalized, GLuint relativeoffset);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBIFORMATPROC) (GLuint attribindex, GLint size, GLenum type, GLuint relativeoffset);
typedef void (GL_APIENTRYP PFNGLVERTEXATTRIBBINDINGPROC) (GLuint attribindex, GLuint bindingindex);
typedef void (GL_APIENTRYP PFNGLVERTEXBINDINGDIVISORPROC) (GLuint bindingindex, GLuint divisor);
GL_APICALL void GL_APIENTRY glDispatchCompute (GLuint num_groups_x, GLuint num_groups_y, GLuint num_groups_z);
GL_APICALL void GL_APIENTRY glDispatchComputeIndirect (GLintptr indirect);
GL_APICALL void GL_APIENTRY glDrawArraysIndirect (GLenum mode, const void *indirect);

1817
include/GLES3/gl32.h Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,3 @@
[*.h]
indent_style = space
indent_size = 4

View File

@@ -184,7 +184,7 @@ mtx_destroy(mtx_t *mtx)
* Thus the linker will be happy and things don't clash when building
* with -O1 or greater.
*/
#ifdef HAVE_FUNC_ATTRIBUTE_WEAK
#if defined(HAVE_FUNC_ATTRIBUTE_WEAK) && !defined(__CYGWIN__)
__attribute__((weak))
int pthread_mutexattr_init(pthread_mutexattr_t *attr);

View File

@@ -36,8 +36,8 @@
*/
#if defined(_MSC_VER)
# if _MSC_VER < 1800
# error "Microsoft Visual Studio 2013 or higher required"
# if _MSC_VER < 1800 || (_MSC_FULL_VER < 180031101 && !defined(__clang__))
# error "Microsoft Visual Studio 2013 Update 4 or higher required"
# endif
/*

View File

@@ -0,0 +1,3 @@
[*.h]
indent_style = space
indent_size = 4

View File

@@ -137,6 +137,7 @@ CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4e)")
CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5908, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
@@ -149,13 +150,10 @@ CHIPSET(0x591B, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591D, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591E, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593A, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593D, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")

View File

@@ -0,0 +1,3 @@
[*.h]
indent_style = space
indent_size = 4

View File

@@ -13,8 +13,8 @@ all-local : .install-gallium-links
fi; \
$(MKDIR_P) $$link_dir; \
file_list="$(dri_LTLIBRARIES:%.la=.libs/%.so)"; \
file_list+="$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
file_list+="$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
file_list="$$file_list$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
file_list="$$file_list$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
for f in $$file_list; do \
if test -h .libs/$$f; then \
cp -d $$f $$link_dir; \

View File

@@ -1,72 +0,0 @@
# ===========================================================================
# http://www.gnu.org/software/autoconf-archive/ax_check_compile_flag.html
# ===========================================================================
#
# SYNOPSIS
#
# AX_CHECK_COMPILE_FLAG(FLAG, [ACTION-SUCCESS], [ACTION-FAILURE], [EXTRA-FLAGS])
#
# DESCRIPTION
#
# Check whether the given FLAG works with the current language's compiler
# or gives an error. (Warnings, however, are ignored)
#
# ACTION-SUCCESS/ACTION-FAILURE are shell commands to execute on
# success/failure.
#
# If EXTRA-FLAGS is defined, it is added to the current language's default
# flags (e.g. CFLAGS) when the check is done. The check is thus made with
# the flags: "CFLAGS EXTRA-FLAGS FLAG". This can for example be used to
# force the compiler to issue an error when a bad flag is given.
#
# NOTE: Implementation based on AX_CFLAGS_GCC_OPTION. Please keep this
# macro in sync with AX_CHECK_{PREPROC,LINK}_FLAG.
#
# LICENSE
#
# Copyright (c) 2008 Guido U. Draheim <guidod@gmx.de>
# Copyright (c) 2011 Maarten Bosmans <mkbosmans@gmail.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
#
# As a special exception, the respective Autoconf Macro's copyright owner
# gives unlimited permission to copy, distribute and modify the configure
# scripts that are the output of Autoconf when processing the Macro. You
# need not follow the terms of the GNU General Public License when using
# or distributing such scripts, even though portions of the text of the
# Macro appear in them. The GNU General Public License (GPL) does govern
# all other use of the material that constitutes the Autoconf Macro.
#
# This special exception to the GPL applies to versions of the Autoconf
# Macro released by the Autoconf Archive. When you make and distribute a
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.
#serial 2
AC_DEFUN([AX_CHECK_COMPILE_FLAG],
[AC_PREREQ(2.59)dnl for _AC_LANG_PREFIX
AS_VAR_PUSHDEF([CACHEVAR],[ax_cv_check_[]_AC_LANG_ABBREV[]flags_$4_$1])dnl
AC_CACHE_CHECK([whether _AC_LANG compiler accepts $1], CACHEVAR, [
ax_check_save_flags=$[]_AC_LANG_PREFIX[]FLAGS
_AC_LANG_PREFIX[]FLAGS="$[]_AC_LANG_PREFIX[]FLAGS $4 $1"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
[AS_VAR_SET(CACHEVAR,[yes])],
[AS_VAR_SET(CACHEVAR,[no])])
_AC_LANG_PREFIX[]FLAGS=$ax_check_save_flags])
AS_IF([test x"AS_VAR_GET(CACHEVAR)" = xyes],
[m4_default([$2], :)],
[m4_default([$3], :)])
AS_VAR_POPDEF([CACHEVAR])dnl
])dnl AX_CHECK_COMPILE_FLAGS

View File

@@ -103,8 +103,14 @@ def python_scan(node, env, path):
# http://www.scons.org/doc/0.98.5/HTML/scons-user/c2781.html#AEN2789
# https://docs.python.org/2/library/modulefinder.html
contents = node.get_contents()
source_dir = node.get_dir()
finder = modulefinder.ModuleFinder()
# Tell ModuleFinder to search dependencies in the script dir, and the glapi
# dirs
source_dir = node.get_dir().abspath
GLAPI = env.Dir('#src/mapi/glapi/gen').abspath
path = [source_dir, GLAPI] + sys.path
finder = modulefinder.ModuleFinder(path=path)
finder.run_script(node.abspath)
results = []
for name, mod in finder.modules.iteritems():

View File

@@ -256,7 +256,7 @@ def generate(env):
if env['build'] == 'profile':
env['debug'] = False
env['profile'] = True
if env['build'] == 'release':
if env['build'] in ('release', 'opt'):
env['debug'] = False
env['profile'] = False
@@ -301,6 +301,8 @@ def generate(env):
cppdefines += ['NDEBUG']
if env['build'] == 'profile':
cppdefines += ['PROFILE']
if env['build'] in ('opt', 'profile'):
cppdefines += ['VMX86_STATS']
if env['platform'] in ('posix', 'linux', 'freebsd', 'darwin'):
cppdefines += [
'_POSIX_SOURCE',
@@ -450,7 +452,7 @@ def generate(env):
ccflags += [
'/O2', # optimize for speed
]
if env['build'] == 'release':
if env['build'] in ('release', 'opt'):
if not env['clang']:
ccflags += [
'/GL', # enable whole program optimization
@@ -561,7 +563,7 @@ def generate(env):
shlinkflags += ['-Wl,--enable-stdcall-fixup']
#shlinkflags += ['-Wl,--kill-at']
if msvc:
if env['build'] == 'release' and not env['clang']:
if env['build'] in ('release', 'opt') and not env['clang']:
# enable Link-time Code Generation
linkflags += ['/LTCG']
env.Append(ARFLAGS = ['/LTCG'])
@@ -650,7 +652,6 @@ def generate(env):
env.PkgCheckModules('XCB', ['x11-xcb', 'xcb-glx >= 1.8.1', 'xcb-dri2 >= 1.8'])
env.PkgCheckModules('XF86VIDMODE', ['xxf86vm'])
env.PkgCheckModules('DRM', ['libdrm >= 2.4.38'])
env.PkgCheckModules('UDEV', ['libudev >= 151'])
if env['x11']:
env.Append(CPPPATH = env['X11_CPPPATH'])

View File

@@ -865,7 +865,7 @@ sub top_of_mesa_tree {
$lk_path .= "/";
}
if ( (-f "${lk_path}docs/mesa.css")
&& (-f "${lk_path}docs/GL3.txt")
&& (-f "${lk_path}docs/features.txt")
&& (-f "${lk_path}src/mesa/main/version.c")
&& (-f "${lk_path}REVIEWERS")
&& (-d "${lk_path}scripts")) {

View File

@@ -19,22 +19,65 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
git_sha1.h:
.PHONY: git_sha1.h.tmp
git_sha1.h.tmp:
@# Don't assume that $(top_srcdir)/.git is a directory. It may be
@# a gitlink file if $(top_srcdir) is a submodule checkout or a linked
@# worktree.
@# If we are building from a release tarball copy the bundled header.
@touch git_sha1.h.tmp
@if test -e $(top_srcdir)/.git; then \
if which git > /dev/null; then \
git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
> git_sha1.h ; \
> git_sha1.h.tmp ; \
fi \
fi
git_sha1.h: git_sha1.h.tmp
@echo "updating git_sha1.h"
@if ! cmp -s git_sha1.h.tmp git_sha1.h; then \
mv git_sha1.h.tmp git_sha1.h ;\
else \
rm git_sha1.h.tmp ;\
fi
BUILT_SOURCES = git_sha1.h
CLEANFILES = $(BUILT_SOURCES)
SUBDIRS = . gtest util mapi/glapi/gen mapi
if HAVE_OPENGL
gldir = $(includedir)/GL
gl_HEADERS = \
$(top_srcdir)/include/GL/gl.h \
$(top_srcdir)/include/GL/glext.h \
$(top_srcdir)/include/GL/glcorearb.h \
$(top_srcdir)/include/GL/gl_mangle.h
endif
if HAVE_GLX
glxdir = $(includedir)/GL
glx_HEADERS = \
$(top_srcdir)/include/GL/glx.h \
$(top_srcdir)/include/GL/glxext.h \
$(top_srcdir)/include/GL/glx_mangle.h
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = mesa/gl.pc
endif
if HAVE_COMMON_OSMESA
osmesadir = $(includedir)/GL
osmesa_HEADERS = $(top_srcdir)/include/GL/osmesa.h
endif
# include only conditionally ?
SUBDIRS += compiler
if HAVE_AMD_DRIVERS
SUBDIRS += amd
endif
if HAVE_INTEL_DRIVERS
SUBDIRS += intel
endif
@@ -68,17 +111,32 @@ if HAVE_EGL
SUBDIRS += egl
endif
if HAVE_INTEL_DRIVERS
SUBDIRS += intel/tools
endif
if HAVE_VULKAN_COMMON
SUBDIRS += vulkan/wsi
endif
## Requires the i965 compiler (part of mesa) and wayland-drm
if HAVE_INTEL_VULKAN
SUBDIRS += intel/vulkan
endif
# Requires wayland-drm
if HAVE_RADEON_VULKAN
SUBDIRS += amd/common
SUBDIRS += amd/vulkan
endif
if HAVE_GALLIUM
SUBDIRS += gallium
endif
EXTRA_DIST = \
getopt hgl SConscript git_sha1.h
getopt hgl SConscript \
$(top_srcdir)/include/GL/mesa_glinterop.h
AM_CFLAGS = $(VISIBILITY_CFLAGS)
AM_CXXFLAGS = $(VISIBILITY_CXXFLAGS)
@@ -87,12 +145,15 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/include/ \
-I$(top_srcdir)/src/mapi/ \
-I$(top_srcdir)/src/mesa/ \
-I$(top_srcdir)/src/gallium/include \
-I$(top_srcdir)/src/gallium/auxiliary \
$(DEFINES)
noinst_LTLIBRARIES = libglsl_util.la
libglsl_util_la_SOURCES = \
mesa/main/extensions_table.c \
mesa/main/imports.c \
mesa/program/prog_hash_table.c \
mesa/program/prog_parameter.c \
mesa/program/symbol_table.c \
mesa/program/dummy_errors.c

View File

@@ -1,5 +1,8 @@
Import('*')
import filecmp
import os
import subprocess
Import('*')
if env['platform'] == 'windows':
SConscript('getopt/SConscript')
@@ -12,6 +15,50 @@ if env['hostonly']:
# compilation
Return()
def write_git_sha1_h_file(filename):
"""Mesa looks for a git_sha1.h file at compile time in order to display
the current git hash id in the GL_VERSION string. This function tries
to retrieve the git hashid and write the header file. An empty file
will be created if anything goes wrong."""
args = [ 'git', 'rev-parse', '--short=10', 'HEAD' ]
try:
(commit, foo) = subprocess.Popen(args, stdout=subprocess.PIPE).communicate()
except:
print "Warning: exception in write_git_sha1_h_file()"
# git log command didn't work
if not os.path.exists(filename):
dirname = os.path.dirname(filename)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
# create an empty file if none already exists
f = open(filename, "w")
f.close()
return
# note that commit[:-1] removes the trailing newline character
commit = '#define MESA_GIT_SHA1 "git-%s"\n' % commit[:-1]
tempfile = "git_sha1.h.tmp"
f = open(tempfile, "w")
f.write(commit)
f.close()
if not os.path.exists(filename) or not filecmp.cmp(tempfile, filename):
# The filename does not exist or it's different from the new file,
# so replace old file with new.
if os.path.exists(filename):
os.remove(filename)
os.rename(tempfile, filename)
return
# Create the git_sha1.h header file
write_git_sha1_h_file("git_sha1.h")
# and update CPPPATH so the git_sha1.h header can be found
env.Append(CPPPATH = ["#" + env['build_dir']])
if env['platform'] != 'windows':
SConscript('loader/SConscript')

View File

@@ -0,0 +1,44 @@
# Copyright © 2016 Red Hat.
# Copyright © 2016 Mauro Rossi <issor.oruam@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
# ---------------------------------------
# Build libmesa_amdgpu_addrlib
# ---------------------------------------
include $(CLEAR_VARS)
LOCAL_MODULE := libmesa_amdgpu_addrlib
LOCAL_SRC_FILES := $(ADDRLIB_FILES)
LOCAL_CFLAGS := -DBRAHMA_BUILD=1
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src \
$(MESA_TOP)/src/amd/common \
$(MESA_TOP)/src/amd/addrlib \
$(MESA_TOP)/src/amd/addrlib/core \
$(MESA_TOP)/src/amd/addrlib/inc/chip/r800 \
$(MESA_TOP)/src/amd/addrlib/r800/chip
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

28
src/amd/Android.mk Normal file
View File

@@ -0,0 +1,28 @@
# Copyright © 2016 Red Hat.
# Copyright © 2016 Mauro Rossi <issor.oruam@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
LOCAL_PATH := $(call my-dir)
# Import variables
include $(LOCAL_PATH)/Makefile.sources
include $(LOCAL_PATH)/Android.addrlib.mk

View File

@@ -0,0 +1,38 @@
# Copyright 2016 Red Hat Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
ADDRLIB_LIBS = addrlib/libamdgpu_addrlib.la
addrlib_libamdgpu_addrlib_la_CPPFLAGS = \
-I$(top_srcdir)/src/ \
-I$(srcdir)/common \
-I$(srcdir)/addrlib \
-I$(srcdir)/addrlib/core \
-I$(srcdir)/addrlib/inc/chip/r800 \
-I$(srcdir)/addrlib/r800/chip \
-DBRAHMA_BUILD=1
addrlib_libamdgpu_addrlib_la_CXXFLAGS = \
$(VISIBILITY_CXXFLAGS)
noinst_LTLIBRARIES += $(ADDRLIB_LIBS)
addrlib_libamdgpu_addrlib_la_SOURCES = $(ADDRLIB_FILES)

27
src/amd/Makefile.am Normal file
View File

@@ -0,0 +1,27 @@
# Copyright © 2016 Red Hat.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
include Makefile.sources
noinst_LTLIBRARIES =
EXTRA_DIST = $(COMMON_HEADER_FILES)
include Makefile.addrlib.am

27
src/amd/Makefile.sources Normal file
View File

@@ -0,0 +1,27 @@
COMMON_HEADER_FILES = \
common/sid.h \
common/r600d_common.h \
common/amd_family.h \
common/amd_kernel_code_t.h \
common/amdgpu_id.h
ADDRLIB_FILES = \
addrlib/addrinterface.cpp \
addrlib/addrinterface.h \
addrlib/addrtypes.h \
addrlib/core/addrcommon.h \
addrlib/core/addrelemlib.cpp \
addrlib/core/addrelemlib.h \
addrlib/core/addrlib.cpp \
addrlib/core/addrlib.h \
addrlib/core/addrobject.cpp \
addrlib/core/addrobject.h \
addrlib/inc/chip/r800/si_gb_reg.h \
addrlib/inc/lnx_common_defs.h \
addrlib/r800/chip/si_ci_vi_merged_enum.h \
addrlib/r800/ciaddrlib.cpp \
addrlib/r800/ciaddrlib.h \
addrlib/r800/egbaddrlib.cpp \
addrlib/r800/egbaddrlib.h \
addrlib/r800/siaddrlib.cpp \
addrlib/r800/siaddrlib.h

View File

@@ -0,0 +1,51 @@
# Copyright © 2016 Bas Nieuwenhuizen
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
include Makefile.sources
# TODO cleanup these
AM_CPPFLAGS = \
$(VALGRIND_CFLAGS) \
$(DEFINES) \
-I$(top_srcdir)/include \
-I$(top_builddir)/src \
-I$(top_srcdir)/src \
-I$(top_builddir)/src/compiler \
-I$(top_builddir)/src/compiler/nir \
-I$(top_srcdir)/src/compiler \
-I$(top_srcdir)/src/mapi \
-I$(top_srcdir)/src/mesa \
-I$(top_srcdir)/src/mesa/drivers/dri/common \
-I$(top_srcdir)/src/gallium/auxiliary \
-I$(top_srcdir)/src/gallium/include
AM_CFLAGS = $(VISIBILITY_CFLAGS) \
$(PTHREAD_CFLAGS) \
$(LLVM_CFLAGS) \
$(LIBELF_CFLAGS)
AM_CXXFLAGS = \
$(VISIBILITY_CXXFLAGS) \
$(LLVM_CXXFLAGS)
noinst_LTLIBRARIES = libamd_common.la
libamd_common_la_SOURCES = $(AMD_COMPILER_SOURCES)

View File

@@ -0,0 +1,29 @@
# Copyright © 2016 Bas Nieuwenhuizen
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
AMD_COMPILER_SOURCES := \
ac_binary.c \
ac_binary.h \
ac_llvm_helper.cpp \
ac_llvm_util.c \
ac_llvm_util.h \
ac_nir_to_llvm.c \
ac_nir_to_llvm.h

288
src/amd/common/ac_binary.c Normal file
View File

@@ -0,0 +1,288 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors: Tom Stellard <thomas.stellard@amd.com>
*
* Based on radeon_elf_util.c.
*/
#include "ac_binary.h"
#include "util/u_math.h"
#include "util/u_memory.h"
#include <gelf.h>
#include <libelf.h>
#include <stdio.h>
#include <sid.h>
#define SPILLED_SGPRS 0x4
#define SPILLED_VGPRS 0x8
static void parse_symbol_table(Elf_Data *symbol_table_data,
const GElf_Shdr *symbol_table_header,
struct ac_shader_binary *binary)
{
GElf_Sym symbol;
unsigned i = 0;
unsigned symbol_count =
symbol_table_header->sh_size / symbol_table_header->sh_entsize;
/* We are over allocating this list, because symbol_count gives the
* total number of symbols, and we will only be filling the list
* with offsets of global symbols. The memory savings from
* allocating the correct size of this list will be small, and
* I don't think it is worth the cost of pre-computing the number
* of global symbols.
*/
binary->global_symbol_offsets = CALLOC(symbol_count, sizeof(uint64_t));
while (gelf_getsym(symbol_table_data, i++, &symbol)) {
unsigned i;
if (GELF_ST_BIND(symbol.st_info) != STB_GLOBAL ||
symbol.st_shndx == 0 /* Undefined symbol */) {
continue;
}
binary->global_symbol_offsets[binary->global_symbol_count] =
symbol.st_value;
/* Sort the list using bubble sort. This list will usually
* be small. */
for (i = binary->global_symbol_count; i > 0; --i) {
uint64_t lhs = binary->global_symbol_offsets[i - 1];
uint64_t rhs = binary->global_symbol_offsets[i];
if (lhs < rhs) {
break;
}
binary->global_symbol_offsets[i] = lhs;
binary->global_symbol_offsets[i - 1] = rhs;
}
++binary->global_symbol_count;
}
}
static void parse_relocs(Elf *elf, Elf_Data *relocs, Elf_Data *symbols,
unsigned symbol_sh_link,
struct ac_shader_binary *binary)
{
unsigned i;
if (!relocs || !symbols || !binary->reloc_count) {
return;
}
binary->relocs = CALLOC(binary->reloc_count,
sizeof(struct ac_shader_reloc));
for (i = 0; i < binary->reloc_count; i++) {
GElf_Sym symbol;
GElf_Rel rel;
char *symbol_name;
struct ac_shader_reloc *reloc = &binary->relocs[i];
gelf_getrel(relocs, i, &rel);
gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &symbol);
symbol_name = elf_strptr(elf, symbol_sh_link, symbol.st_name);
reloc->offset = rel.r_offset;
strncpy(reloc->name, symbol_name, sizeof(reloc->name)-1);
reloc->name[sizeof(reloc->name)-1] = 0;
}
}
void ac_elf_read(const char *elf_data, unsigned elf_size,
struct ac_shader_binary *binary)
{
char *elf_buffer;
Elf *elf;
Elf_Scn *section = NULL;
Elf_Data *symbols = NULL, *relocs = NULL;
size_t section_str_index;
unsigned symbol_sh_link = 0;
/* One of the libelf implementations
* (http://www.mr511.de/software/english.htm) requires calling
* elf_version() before elf_memory().
*/
elf_version(EV_CURRENT);
elf_buffer = MALLOC(elf_size);
memcpy(elf_buffer, elf_data, elf_size);
elf = elf_memory(elf_buffer, elf_size);
elf_getshdrstrndx(elf, &section_str_index);
while ((section = elf_nextscn(elf, section))) {
const char *name;
Elf_Data *section_data = NULL;
GElf_Shdr section_header;
if (gelf_getshdr(section, &section_header) != &section_header) {
fprintf(stderr, "Failed to read ELF section header\n");
return;
}
name = elf_strptr(elf, section_str_index, section_header.sh_name);
if (!strcmp(name, ".text")) {
section_data = elf_getdata(section, section_data);
binary->code_size = section_data->d_size;
binary->code = MALLOC(binary->code_size * sizeof(unsigned char));
memcpy(binary->code, section_data->d_buf, binary->code_size);
} else if (!strcmp(name, ".AMDGPU.config")) {
section_data = elf_getdata(section, section_data);
binary->config_size = section_data->d_size;
binary->config = MALLOC(binary->config_size * sizeof(unsigned char));
memcpy(binary->config, section_data->d_buf, binary->config_size);
} else if (!strcmp(name, ".AMDGPU.disasm")) {
/* Always read disassembly if it's available. */
section_data = elf_getdata(section, section_data);
binary->disasm_string = strndup(section_data->d_buf,
section_data->d_size);
} else if (!strncmp(name, ".rodata", 7)) {
section_data = elf_getdata(section, section_data);
binary->rodata_size = section_data->d_size;
binary->rodata = MALLOC(binary->rodata_size * sizeof(unsigned char));
memcpy(binary->rodata, section_data->d_buf, binary->rodata_size);
} else if (!strncmp(name, ".symtab", 7)) {
symbols = elf_getdata(section, section_data);
symbol_sh_link = section_header.sh_link;
parse_symbol_table(symbols, &section_header, binary);
} else if (!strcmp(name, ".rel.text")) {
relocs = elf_getdata(section, section_data);
binary->reloc_count = section_header.sh_size /
section_header.sh_entsize;
}
}
parse_relocs(elf, relocs, symbols, symbol_sh_link, binary);
if (elf){
elf_end(elf);
}
FREE(elf_buffer);
/* Cache the config size per symbol */
if (binary->global_symbol_count) {
binary->config_size_per_symbol =
binary->config_size / binary->global_symbol_count;
} else {
binary->global_symbol_count = 1;
binary->config_size_per_symbol = binary->config_size;
}
}
static
const unsigned char *ac_shader_binary_config_start(
const struct ac_shader_binary *binary,
uint64_t symbol_offset)
{
unsigned i;
for (i = 0; i < binary->global_symbol_count; ++i) {
if (binary->global_symbol_offsets[i] == symbol_offset) {
unsigned offset = i * binary->config_size_per_symbol;
return binary->config + offset;
}
}
return binary->config;
}
static const char *scratch_rsrc_dword0_symbol =
"SCRATCH_RSRC_DWORD0";
static const char *scratch_rsrc_dword1_symbol =
"SCRATCH_RSRC_DWORD1";
void ac_shader_binary_read_config(struct ac_shader_binary *binary,
struct ac_shader_config *conf,
unsigned symbol_offset)
{
unsigned i;
const unsigned char *config =
ac_shader_binary_config_start(binary, symbol_offset);
bool really_needs_scratch = false;
/* LLVM adds SGPR spills to the scratch size.
* Find out if we really need the scratch buffer.
*/
for (i = 0; i < binary->reloc_count; i++) {
const struct ac_shader_reloc *reloc = &binary->relocs[i];
if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
!strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
really_needs_scratch = true;
break;
}
}
for (i = 0; i < binary->config_size_per_symbol; i+= 8) {
unsigned reg = util_le32_to_cpu(*(uint32_t*)(config + i));
unsigned value = util_le32_to_cpu(*(uint32_t*)(config + i + 4));
switch (reg) {
case R_00B028_SPI_SHADER_PGM_RSRC1_PS:
case R_00B128_SPI_SHADER_PGM_RSRC1_VS:
case R_00B228_SPI_SHADER_PGM_RSRC1_GS:
case R_00B848_COMPUTE_PGM_RSRC1:
conf->num_sgprs = MAX2(conf->num_sgprs, (G_00B028_SGPRS(value) + 1) * 8);
conf->num_vgprs = MAX2(conf->num_vgprs, (G_00B028_VGPRS(value) + 1) * 4);
conf->float_mode = G_00B028_FLOAT_MODE(value);
break;
case R_00B02C_SPI_SHADER_PGM_RSRC2_PS:
conf->lds_size = MAX2(conf->lds_size, G_00B02C_EXTRA_LDS_SIZE(value));
break;
case R_00B84C_COMPUTE_PGM_RSRC2:
conf->lds_size = MAX2(conf->lds_size, G_00B84C_LDS_SIZE(value));
break;
case R_0286CC_SPI_PS_INPUT_ENA:
conf->spi_ps_input_ena = value;
break;
case R_0286D0_SPI_PS_INPUT_ADDR:
conf->spi_ps_input_addr = value;
break;
case R_0286E8_SPI_TMPRING_SIZE:
case R_00B860_COMPUTE_TMPRING_SIZE:
/* WAVESIZE is in units of 256 dwords. */
if (really_needs_scratch)
conf->scratch_bytes_per_wave =
G_00B860_WAVESIZE(value) * 256 * 4;
break;
case SPILLED_SGPRS:
conf->spilled_sgprs = value;
break;
case SPILLED_VGPRS:
conf->spilled_vgprs = value;
break;
default:
{
static bool printed;
if (!printed) {
fprintf(stderr, "Warning: LLVM emitted unknown "
"config register: 0x%x\n", reg);
printed = true;
}
}
break;
}
if (!conf->spi_ps_input_addr)
conf->spi_ps_input_addr = conf->spi_ps_input_ena;
}
}

View File

@@ -0,0 +1,88 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Authors: Tom Stellard <thomas.stellard@amd.com>
*
*/
#pragma once
#include <stdint.h>
struct ac_shader_reloc {
char name[32];
uint64_t offset;
};
struct ac_shader_binary {
/** Shader code */
unsigned char *code;
unsigned code_size;
/** Config/Context register state that accompanies this shader.
* This is a stream of dword pairs. First dword contains the
* register address, the second dword contains the value.*/
unsigned char *config;
unsigned config_size;
/** The number of bytes of config information for each global symbol.
*/
unsigned config_size_per_symbol;
/** Constant data accessed by the shader. This will be uploaded
* into a constant buffer. */
unsigned char *rodata;
unsigned rodata_size;
/** List of symbol offsets for the shader */
uint64_t *global_symbol_offsets;
unsigned global_symbol_count;
struct ac_shader_reloc *relocs;
unsigned reloc_count;
/** Disassembled shader in a string. */
char *disasm_string;
};
struct ac_shader_config {
unsigned num_sgprs;
unsigned num_vgprs;
unsigned spilled_sgprs;
unsigned spilled_vgprs;
unsigned lds_size;
unsigned spi_ps_input_ena;
unsigned spi_ps_input_addr;
unsigned float_mode;
unsigned scratch_bytes_per_wave;
};
/*
* Parse the elf binary stored in \p elf_data and create a
* ac_shader_binary object.
*/
void ac_elf_read(const char *elf_data, unsigned elf_size,
struct ac_shader_binary *binary);
void ac_shader_binary_read_config(struct ac_shader_binary *binary,
struct ac_shader_config *conf,
unsigned symbol_offset);

View File

@@ -0,0 +1,46 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
*/
/* based on Marek's patch to lp_bld_misc.cpp */
// Workaround http://llvm.org/PR23628
#if HAVE_LLVM >= 0x0307
# pragma push_macro("DEBUG")
# undef DEBUG
#endif
#include "ac_nir_to_llvm.h"
#include <llvm-c/Core.h>
#include <llvm/Target/TargetOptions.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
extern "C" void
ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
{
llvm::Argument *A = llvm::unwrap<llvm::Argument>(val);
llvm::AttrBuilder B;
B.addDereferenceableAttr(bytes);
A->addAttr(llvm::AttributeSet::get(A->getContext(), A->getArgNo() + 1, B));
}

View File

@@ -0,0 +1,142 @@
/*
* Copyright 2014 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
*/
/* based on pieces from si_pipe.c and radeon_llvm_emit.c */
#include "ac_llvm_util.h"
#include <llvm-c/Core.h>
#include "c11/threads.h"
#include <assert.h>
#include <stdio.h>
static void ac_init_llvm_target()
{
#if HAVE_LLVM < 0x0307
LLVMInitializeR600TargetInfo();
LLVMInitializeR600Target();
LLVMInitializeR600TargetMC();
LLVMInitializeR600AsmPrinter();
#else
LLVMInitializeAMDGPUTargetInfo();
LLVMInitializeAMDGPUTarget();
LLVMInitializeAMDGPUTargetMC();
LLVMInitializeAMDGPUAsmPrinter();
#endif
}
static once_flag ac_init_llvm_target_once_flag = ONCE_FLAG_INIT;
static LLVMTargetRef ac_get_llvm_target(const char *triple)
{
LLVMTargetRef target = NULL;
char *err_message = NULL;
call_once(&ac_init_llvm_target_once_flag, ac_init_llvm_target);
if (LLVMGetTargetFromTriple(triple, &target, &err_message)) {
fprintf(stderr, "Cannot find target for triple %s ", triple);
if (err_message) {
fprintf(stderr, "%s\n", err_message);
}
LLVMDisposeMessage(err_message);
return NULL;
}
return target;
}
static const char *ac_get_llvm_processor_name(enum radeon_family family)
{
switch (family) {
case CHIP_TAHITI:
return "tahiti";
case CHIP_PITCAIRN:
return "pitcairn";
case CHIP_VERDE:
return "verde";
case CHIP_OLAND:
return "oland";
case CHIP_HAINAN:
return "hainan";
case CHIP_BONAIRE:
return "bonaire";
case CHIP_KABINI:
return "kabini";
case CHIP_KAVERI:
return "kaveri";
case CHIP_HAWAII:
return "hawaii";
case CHIP_MULLINS:
return "mullins";
case CHIP_TONGA:
return "tonga";
case CHIP_ICELAND:
return "iceland";
case CHIP_CARRIZO:
return "carrizo";
#if HAVE_LLVM <= 0x0307
case CHIP_FIJI:
return "tonga";
case CHIP_STONEY:
return "carrizo";
#else
case CHIP_FIJI:
return "fiji";
case CHIP_STONEY:
return "stoney";
#endif
#if HAVE_LLVM <= 0x0308
case CHIP_POLARIS10:
return "tonga";
case CHIP_POLARIS11:
return "tonga";
#else
case CHIP_POLARIS10:
return "polaris10";
case CHIP_POLARIS11:
return "polaris11";
#endif
default:
return "";
}
}
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family)
{
assert(family >= CHIP_TAHITI);
const char *triple = "amdgcn--";
LLVMTargetRef target = ac_get_llvm_target(triple);
LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
target,
triple,
ac_get_llvm_processor_name(family),
"+DumpCode,+vgpr-spilling",
LLVMCodeGenLevelDefault,
LLVMRelocDefault,
LLVMCodeModelDefault);
return tm;
}

View File

@@ -0,0 +1,31 @@
/*
* Copyright 2016 Bas Nieuwenhuizen
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
*/
#pragma once
#include <llvm-c/TargetMachine.h>
#include "amd_family.h"
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family);

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,119 @@
/*
* Copyright © 2016 Bas Nieuwenhuizen
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#pragma once
#include <stdbool.h>
#include "llvm-c/Core.h"
#include "llvm-c/TargetMachine.h"
#include "amd_family.h"
struct ac_shader_binary;
struct ac_shader_config;
struct nir_shader;
struct radv_pipeline_layout;
struct ac_vs_variant_key {
uint32_t instance_rate_inputs;
};
struct ac_fs_variant_key {
uint32_t col_format;
uint32_t is_int8;
};
union ac_shader_variant_key {
struct ac_vs_variant_key vs;
struct ac_fs_variant_key fs;
};
struct ac_nir_compiler_options {
struct radv_pipeline_layout *layout;
union ac_shader_variant_key key;
bool unsafe_math;
enum radeon_family family;
enum chip_class chip_class;
};
struct ac_shader_variant_info {
unsigned num_user_sgprs;
unsigned num_input_sgprs;
unsigned num_input_vgprs;
union {
struct {
unsigned param_exports;
unsigned pos_exports;
unsigned vgpr_comp_cnt;
uint32_t export_mask;
bool writes_pointsize;
uint8_t clip_dist_mask;
uint8_t cull_dist_mask;
} vs;
struct {
unsigned num_interp;
uint32_t input_mask;
unsigned output_mask;
uint32_t flat_shaded_mask;
bool has_pcoord;
bool can_discard;
bool writes_z;
bool writes_stencil;
bool early_fragment_test;
bool writes_memory;
} fs;
struct {
unsigned block_size[3];
} cs;
};
};
void ac_compile_nir_shader(LLVMTargetMachineRef tm,
struct ac_shader_binary *binary,
struct ac_shader_config *config,
struct ac_shader_variant_info *shader_info,
struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
bool dump_shader);
/* SHADER ABI defines */
/* offset in dwords */
#define AC_USERDATA_DESCRIPTOR_SET_0 0
#define AC_USERDATA_DESCRIPTOR_SET_1 2
#define AC_USERDATA_DESCRIPTOR_SET_2 4
#define AC_USERDATA_DESCRIPTOR_SET_3 6
#define AC_USERDATA_PUSH_CONST_DYN 8
#define AC_USERDATA_VS_VERTEX_BUFFERS 10
#define AC_USERDATA_VS_BASE_VERTEX 12
#define AC_USERDATA_VS_START_INSTANCE 13
#define AC_USERDATA_PS_SAMPLE_POS 10
#define AC_USERDATA_CS_GRID_SIZE 10
#ifdef __cplusplus
extern "C"
#endif
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes);

111
src/amd/common/amd_family.h Normal file
View File

@@ -0,0 +1,111 @@
/*
* Copyright 2008 Corbin Simpson <MostAwesomeDude@gmail.com>
* Copyright 2010 Marek Olšák <maraeo@gmail.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE. */
#ifndef AMD_FAMILY_H
#define AMD_FAMILY_H
enum radeon_family {
CHIP_UNKNOWN = 0,
CHIP_R300, /* R3xx-based cores. */
CHIP_R350,
CHIP_RV350,
CHIP_RV370,
CHIP_RV380,
CHIP_RS400,
CHIP_RC410,
CHIP_RS480,
CHIP_R420, /* R4xx-based cores. */
CHIP_R423,
CHIP_R430,
CHIP_R480,
CHIP_R481,
CHIP_RV410,
CHIP_RS600,
CHIP_RS690,
CHIP_RS740,
CHIP_RV515, /* R5xx-based cores. */
CHIP_R520,
CHIP_RV530,
CHIP_R580,
CHIP_RV560,
CHIP_RV570,
CHIP_R600,
CHIP_RV610,
CHIP_RV630,
CHIP_RV670,
CHIP_RV620,
CHIP_RV635,
CHIP_RS780,
CHIP_RS880,
CHIP_RV770,
CHIP_RV730,
CHIP_RV710,
CHIP_RV740,
CHIP_CEDAR,
CHIP_REDWOOD,
CHIP_JUNIPER,
CHIP_CYPRESS,
CHIP_HEMLOCK,
CHIP_PALM,
CHIP_SUMO,
CHIP_SUMO2,
CHIP_BARTS,
CHIP_TURKS,
CHIP_CAICOS,
CHIP_CAYMAN,
CHIP_ARUBA,
CHIP_TAHITI,
CHIP_PITCAIRN,
CHIP_VERDE,
CHIP_OLAND,
CHIP_HAINAN,
CHIP_BONAIRE,
CHIP_KAVERI,
CHIP_KABINI,
CHIP_HAWAII,
CHIP_MULLINS,
CHIP_TONGA,
CHIP_ICELAND,
CHIP_CARRIZO,
CHIP_FIJI,
CHIP_STONEY,
CHIP_POLARIS10,
CHIP_POLARIS11,
CHIP_LAST,
};
enum chip_class {
CLASS_UNKNOWN = 0,
R300,
R400,
R500,
R600,
R700,
EVERGREEN,
CAYMAN,
SI,
CIK,
VI,
};
#endif

View File

@@ -0,0 +1,534 @@
/*
* Copyright 2015,2016 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
#ifndef AMDKERNELCODET_H
#define AMDKERNELCODET_H
//---------------------------------------------------------------------------//
// AMD Kernel Code, and its dependencies //
//---------------------------------------------------------------------------//
// Sets val bits for specified mask in specified dst packed instance.
#define AMD_HSA_BITS_SET(dst, mask, val) \
dst &= (~(1 << mask ## _SHIFT) & ~mask); \
dst |= (((val) << mask ## _SHIFT) & mask)
// Gets bits for specified mask from specified src packed instance.
#define AMD_HSA_BITS_GET(src, mask) \
((src & mask) >> mask ## _SHIFT) \
/* Every amd_*_code_t has the following properties, which are composed of
* a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),
* bit width (AMD_CODE_PROPERTY_*_WIDTH, and bit shift amount
* (AMD_CODE_PROPERTY_*_SHIFT) for convenient access. Unused bits must be 0.
*
* (Note that bit fields cannot be used as their layout is
* implementation defined in the C standard and so cannot be used to
* specify an ABI)
*/
enum amd_code_property_mask_t {
/* Enable the setup of the SGPR user data registers
* (AMD_CODE_PROPERTY_ENABLE_SGPR_*), see documentation of amd_kernel_code_t
* for initial register state.
*
* The total number of SGPRuser data registers requested must not
* exceed 16. Any requests beyond 16 will be ignored.
*
* Used to set COMPUTE_PGM_RSRC2.USER_SGPR (set to total count of
* SGPR user data registers enabled up to 16).
*/
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER_SHIFT = 0,
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR_SHIFT = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR_SHIFT = 2,
AMD_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR_SHIFT = 3,
AMD_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID_SHIFT = 4,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT_SHIFT = 5,
AMD_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_SHIFT = 6,
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X_SHIFT = 7,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y_SHIFT = 8,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y_SHIFT,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z_SHIFT = 9,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z = ((1 << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z_SHIFT,
AMD_CODE_PROPERTY_RESERVED1_SHIFT = 10,
AMD_CODE_PROPERTY_RESERVED1_WIDTH = 6,
AMD_CODE_PROPERTY_RESERVED1 = ((1 << AMD_CODE_PROPERTY_RESERVED1_WIDTH) - 1) << AMD_CODE_PROPERTY_RESERVED1_SHIFT,
/* Control wave ID base counter for GDS ordered-append. Used to set
* COMPUTE_DISPATCH_INITIATOR.ORDERED_APPEND_ENBL. (Not sure if
* ORDERED_APPEND_MODE also needs to be settable)
*/
AMD_CODE_PROPERTY_ENABLE_ORDERED_APPEND_GDS_SHIFT = 16,
AMD_CODE_PROPERTY_ENABLE_ORDERED_APPEND_GDS_WIDTH = 1,
AMD_CODE_PROPERTY_ENABLE_ORDERED_APPEND_GDS = ((1 << AMD_CODE_PROPERTY_ENABLE_ORDERED_APPEND_GDS_WIDTH) - 1) << AMD_CODE_PROPERTY_ENABLE_ORDERED_APPEND_GDS_SHIFT,
/* The interleave (swizzle) element size in bytes required by the
* code for private memory. This must be 2, 4, 8 or 16. This value
* is provided to the finalizer when it is invoked and is recorded
* here. The hardware will interleave the memory requests of each
* lane of a wavefront by this element size to ensure each
* work-item gets a distinct memory memory location. Therefore, the
* finalizer ensures that all load and store operations done to
* private memory do not exceed this size. For example, if the
* element size is 4 (32-bits or dword) and a 64-bit value must be
* loaded, the finalizer will generate two 32-bit loads. This
* ensures that the interleaving will get the work-item
* specific dword for both halves of the 64-bit value. If it just
* did a 64-bit load then it would get one dword which belonged to
* its own work-item, but the second dword would belong to the
* adjacent lane work-item since the interleaving is in dwords.
*
* The value used must match the value that the runtime configures
* the GPU flat scratch (SH_STATIC_MEM_CONFIG.ELEMENT_SIZE). This
* is generally DWORD.
*
* USE VALUES FROM THE AMD_ELEMENT_BYTE_SIZE_T ENUM.
*/
AMD_CODE_PROPERTY_PRIVATE_ELEMENT_SIZE_SHIFT = 17,
AMD_CODE_PROPERTY_PRIVATE_ELEMENT_SIZE_WIDTH = 2,
AMD_CODE_PROPERTY_PRIVATE_ELEMENT_SIZE = ((1 << AMD_CODE_PROPERTY_PRIVATE_ELEMENT_SIZE_WIDTH) - 1) << AMD_CODE_PROPERTY_PRIVATE_ELEMENT_SIZE_SHIFT,
/* Are global memory addresses 64 bits. Must match
* amd_kernel_code_t.hsail_machine_model ==
* HSA_MACHINE_LARGE. Must also match
* SH_MEM_CONFIG.PTR32 (GFX6 (SI)/GFX7 (CI)),
* SH_MEM_CONFIG.ADDRESS_MODE (GFX8 (VI)+).
*/
AMD_CODE_PROPERTY_IS_PTR64_SHIFT = 19,
AMD_CODE_PROPERTY_IS_PTR64_WIDTH = 1,
AMD_CODE_PROPERTY_IS_PTR64 = ((1 << AMD_CODE_PROPERTY_IS_PTR64_WIDTH) - 1) << AMD_CODE_PROPERTY_IS_PTR64_SHIFT,
/* Indicate if the generated ISA is using a dynamically sized call
* stack. This can happen if calls are implemented using a call
* stack and recursion, alloca or calls to indirect functions are
* present. In these cases the Finalizer cannot compute the total
* private segment size at compile time. In this case the
* workitem_private_segment_byte_size only specifies the statically
* know private segment size, and additional space must be added
* for the call stack.
*/
AMD_CODE_PROPERTY_IS_DYNAMIC_CALLSTACK_SHIFT = 20,
AMD_CODE_PROPERTY_IS_DYNAMIC_CALLSTACK_WIDTH = 1,
AMD_CODE_PROPERTY_IS_DYNAMIC_CALLSTACK = ((1 << AMD_CODE_PROPERTY_IS_DYNAMIC_CALLSTACK_WIDTH) - 1) << AMD_CODE_PROPERTY_IS_DYNAMIC_CALLSTACK_SHIFT,
/* Indicate if code generated has support for debugging. */
AMD_CODE_PROPERTY_IS_DEBUG_SUPPORTED_SHIFT = 21,
AMD_CODE_PROPERTY_IS_DEBUG_SUPPORTED_WIDTH = 1,
AMD_CODE_PROPERTY_IS_DEBUG_SUPPORTED = ((1 << AMD_CODE_PROPERTY_IS_DEBUG_SUPPORTED_WIDTH) - 1) << AMD_CODE_PROPERTY_IS_DEBUG_SUPPORTED_SHIFT,
AMD_CODE_PROPERTY_IS_XNACK_SUPPORTED_SHIFT = 22,
AMD_CODE_PROPERTY_IS_XNACK_SUPPORTED_WIDTH = 1,
AMD_CODE_PROPERTY_IS_XNACK_SUPPORTED = ((1 << AMD_CODE_PROPERTY_IS_XNACK_SUPPORTED_WIDTH) - 1) << AMD_CODE_PROPERTY_IS_XNACK_SUPPORTED_SHIFT,
AMD_CODE_PROPERTY_RESERVED2_SHIFT = 23,
AMD_CODE_PROPERTY_RESERVED2_WIDTH = 9,
AMD_CODE_PROPERTY_RESERVED2 = ((1 << AMD_CODE_PROPERTY_RESERVED2_WIDTH) - 1) << AMD_CODE_PROPERTY_RESERVED2_SHIFT
};
/* AMD Kernel Code Object (amd_kernel_code_t). GPU CP uses the AMD Kernel
* Code Object to set up the hardware to execute the kernel dispatch.
*
* Initial Kernel Register State.
*
* Initial kernel register state will be set up by CP/SPI prior to the start
* of execution of every wavefront. This is limited by the constraints of the
* current hardware.
*
* The order of the SGPR registers is defined, but the Finalizer can specify
* which ones are actually setup in the amd_kernel_code_t object using the
* enable_sgpr_* bit fields. The register numbers used for enabled registers
* are dense starting at SGPR0: the first enabled register is SGPR0, the next
* enabled register is SGPR1 etc.; disabled registers do not have an SGPR
* number.
*
* The initial SGPRs comprise up to 16 User SRGPs that are set up by CP and
* apply to all waves of the grid. It is possible to specify more than 16 User
* SGPRs using the enable_sgpr_* bit fields, in which case only the first 16
* are actually initialized. These are then immediately followed by the System
* SGPRs that are set up by ADC/SPI and can have different values for each wave
* of the grid dispatch.
*
* SGPR register initial state is defined as follows:
*
* Private Segment Buffer (enable_sgpr_private_segment_buffer):
* Number of User SGPR registers: 4. V# that can be used, together with
* Scratch Wave Offset as an offset, to access the Private/Spill/Arg
* segments using a segment address. It must be set as follows:
* - Base address: of the scratch memory area used by the dispatch. It
* does not include the scratch wave offset. It will be the per process
* SH_HIDDEN_PRIVATE_BASE_VMID plus any offset from this dispatch (for
* example there may be a per pipe offset, or per AQL Queue offset).
* - Stride + data_format: Element Size * Index Stride (???)
* - Cache swizzle: ???
* - Swizzle enable: SH_STATIC_MEM_CONFIG.SWIZZLE_ENABLE (must be 1 for
* scratch)
* - Num records: Flat Scratch Work Item Size / Element Size (???)
* - Dst_sel_*: ???
* - Num_format: ???
* - Element_size: SH_STATIC_MEM_CONFIG.ELEMENT_SIZE (will be DWORD, must
* agree with amd_kernel_code_t.privateElementSize)
* - Index_stride: SH_STATIC_MEM_CONFIG.INDEX_STRIDE (will be 64 as must
* be number of wavefront lanes for scratch, must agree with
* amd_kernel_code_t.wavefrontSize)
* - Add tid enable: 1
* - ATC: from SH_MEM_CONFIG.PRIVATE_ATC,
* - Hash_enable: ???
* - Heap: ???
* - Mtype: from SH_STATIC_MEM_CONFIG.PRIVATE_MTYPE
* - Type: 0 (a buffer) (???)
*
* Dispatch Ptr (enable_sgpr_dispatch_ptr):
* Number of User SGPR registers: 2. 64 bit address of AQL dispatch packet
* for kernel actually executing.
*
* Queue Ptr (enable_sgpr_queue_ptr):
* Number of User SGPR registers: 2. 64 bit address of AmdQueue object for
* AQL queue on which the dispatch packet was queued.
*
* Kernarg Segment Ptr (enable_sgpr_kernarg_segment_ptr):
* Number of User SGPR registers: 2. 64 bit address of Kernarg segment. This
* is directly copied from the kernargPtr in the dispatch packet. Having CP
* load it once avoids loading it at the beginning of every wavefront.
*
* Dispatch Id (enable_sgpr_dispatch_id):
* Number of User SGPR registers: 2. 64 bit Dispatch ID of the dispatch
* packet being executed.
*
* Flat Scratch Init (enable_sgpr_flat_scratch_init):
* Number of User SGPR registers: 2. This is 2 SGPRs.
*
* For CI/VI:
* The first SGPR is a 32 bit byte offset from SH_MEM_HIDDEN_PRIVATE_BASE
* to base of memory for scratch for this dispatch. This is the same offset
* used in computing the Scratch Segment Buffer base address. The value of
* Scratch Wave Offset must be added by the kernel code and moved to
* SGPRn-4 for use as the FLAT SCRATCH BASE in flat memory instructions.
*
* The second SGPR is 32 bit byte size of a single work-item's scratch
* memory usage. This is directly loaded from the dispatch packet Private
* Segment Byte Size and rounded up to a multiple of DWORD.
*
* \todo [Does CP need to round this to >4 byte alignment?]
*
* The kernel code must move to SGPRn-3 for use as the FLAT SCRATCH SIZE in
* flat memory instructions. Having CP load it once avoids loading it at
* the beginning of every wavefront.
*
* Private Segment Size (enable_sgpr_private_segment_size):
* Number of User SGPR registers: 1. The 32 bit byte size of a single
* work-item's scratch memory allocation. This is the value from the dispatch
* packet. Private Segment Byte Size rounded up by CP to a multiple of DWORD.
*
* \todo [Does CP need to round this to >4 byte alignment?]
*
* Having CP load it once avoids loading it at the beginning of every
* wavefront.
*
* \todo [This will not be used for CI/VI since it is the same value as
* the second SGPR of Flat Scratch Init.
*
* Grid Work-Group Count X (enable_sgpr_grid_workgroup_count_x):
* Number of User SGPR registers: 1. 32 bit count of the number of
* work-groups in the X dimension for the grid being executed. Computed from
* the fields in the HsaDispatchPacket as
* ((gridSize.x+workgroupSize.x-1)/workgroupSize.x).
*
* Grid Work-Group Count Y (enable_sgpr_grid_workgroup_count_y):
* Number of User SGPR registers: 1. 32 bit count of the number of
* work-groups in the Y dimension for the grid being executed. Computed from
* the fields in the HsaDispatchPacket as
* ((gridSize.y+workgroupSize.y-1)/workgroupSize.y).
*
* Only initialized if <16 previous SGPRs initialized.
*
* Grid Work-Group Count Z (enable_sgpr_grid_workgroup_count_z):
* Number of User SGPR registers: 1. 32 bit count of the number of
* work-groups in the Z dimension for the grid being executed. Computed
* from the fields in the HsaDispatchPacket as
* ((gridSize.z+workgroupSize.z-1)/workgroupSize.z).
*
* Only initialized if <16 previous SGPRs initialized.
*
* Work-Group Id X (enable_sgpr_workgroup_id_x):
* Number of System SGPR registers: 1. 32 bit work group id in X dimension
* of grid for wavefront. Always present.
*
* Work-Group Id Y (enable_sgpr_workgroup_id_y):
* Number of System SGPR registers: 1. 32 bit work group id in Y dimension
* of grid for wavefront.
*
* Work-Group Id Z (enable_sgpr_workgroup_id_z):
* Number of System SGPR registers: 1. 32 bit work group id in Z dimension
* of grid for wavefront. If present then Work-group Id Y will also be
* present
*
* Work-Group Info (enable_sgpr_workgroup_info):
* Number of System SGPR registers: 1. {first_wave, 14'b0000,
* ordered_append_term[10:0], threadgroup_size_in_waves[5:0]}
*
* Private Segment Wave Byte Offset
* (enable_sgpr_private_segment_wave_byte_offset):
* Number of System SGPR registers: 1. 32 bit byte offset from base of
* dispatch scratch base. Must be used as an offset with Private/Spill/Arg
* segment address when using Scratch Segment Buffer. It must be added to
* Flat Scratch Offset if setting up FLAT SCRATCH for flat addressing.
*
*
* The order of the VGPR registers is defined, but the Finalizer can specify
* which ones are actually setup in the amd_kernel_code_t object using the
* enableVgpr* bit fields. The register numbers used for enabled registers
* are dense starting at VGPR0: the first enabled register is VGPR0, the next
* enabled register is VGPR1 etc.; disabled registers do not have an VGPR
* number.
*
* VGPR register initial state is defined as follows:
*
* Work-Item Id X (always initialized):
* Number of registers: 1. 32 bit work item id in X dimension of work-group
* for wavefront lane.
*
* Work-Item Id X (enable_vgpr_workitem_id > 0):
* Number of registers: 1. 32 bit work item id in Y dimension of work-group
* for wavefront lane.
*
* Work-Item Id X (enable_vgpr_workitem_id > 0):
* Number of registers: 1. 32 bit work item id in Z dimension of work-group
* for wavefront lane.
*
*
* The setting of registers is being done by existing GPU hardware as follows:
* 1) SGPRs before the Work-Group Ids are set by CP using the 16 User Data
* registers.
* 2) Work-group Id registers X, Y, Z are set by SPI which supports any
* combination including none.
* 3) Scratch Wave Offset is also set by SPI which is why its value cannot
* be added into the value Flat Scratch Offset which would avoid the
* Finalizer generated prolog having to do the add.
* 4) The VGPRs are set by SPI which only supports specifying either (X),
* (X, Y) or (X, Y, Z).
*
* Flat Scratch Dispatch Offset and Flat Scratch Size are adjacent SGRRs so
* they can be moved as a 64 bit value to the hardware required SGPRn-3 and
* SGPRn-4 respectively using the Finalizer ?FLAT_SCRATCH? Register.
*
* The global segment can be accessed either using flat operations or buffer
* operations. If buffer operations are used then the Global Buffer used to
* access HSAIL Global/Readonly/Kernarg (which are combine) segments using a
* segment address is not passed into the kernel code by CP since its base
* address is always 0. Instead the Finalizer generates prolog code to
* initialize 4 SGPRs with a V# that has the following properties, and then
* uses that in the buffer instructions:
* - base address of 0
* - no swizzle
* - ATC=1
* - MTYPE set to support memory coherence specified in
* amd_kernel_code_t.globalMemoryCoherence
*
* When the Global Buffer is used to access the Kernarg segment, must add the
* dispatch packet kernArgPtr to a kernarg segment address before using this V#.
* Alternatively scalar loads can be used if the kernarg offset is uniform, as
* the kernarg segment is constant for the duration of the kernel execution.
*/
typedef struct amd_kernel_code_s {
uint32_t amd_kernel_code_version_major;
uint32_t amd_kernel_code_version_minor;
uint16_t amd_machine_kind;
uint16_t amd_machine_version_major;
uint16_t amd_machine_version_minor;
uint16_t amd_machine_version_stepping;
/* Byte offset (possibly negative) from start of amd_kernel_code_t
* object to kernel's entry point instruction. The actual code for
* the kernel is required to be 256 byte aligned to match hardware
* requirements (SQ cache line is 16). The code must be position
* independent code (PIC) for AMD devices to give runtime the
* option of copying code to discrete GPU memory or APU L2
* cache. The Finalizer should endeavour to allocate all kernel
* machine code in contiguous memory pages so that a device
* pre-fetcher will tend to only pre-fetch Kernel Code objects,
* improving cache performance.
*/
int64_t kernel_code_entry_byte_offset;
/* Range of bytes to consider prefetching expressed as an offset
* and size. The offset is from the start (possibly negative) of
* amd_kernel_code_t object. Set both to 0 if no prefetch
* information is available.
*/
int64_t kernel_code_prefetch_byte_offset;
uint64_t kernel_code_prefetch_byte_size;
/* Number of bytes of scratch backing memory required for full
* occupancy of target chip. This takes into account the number of
* bytes of scratch per work-item, the wavefront size, the maximum
* number of wavefronts per CU, and the number of CUs. This is an
* upper limit on scratch. If the grid being dispatched is small it
* may only need less than this. If the kernel uses no scratch, or
* the Finalizer has not computed this value, it must be 0.
*/
uint64_t max_scratch_backing_memory_byte_size;
/* Shader program settings for CS. Contains COMPUTE_PGM_RSRC1 and
* COMPUTE_PGM_RSRC2 registers.
*/
uint64_t compute_pgm_resource_registers;
/* Code properties. See amd_code_property_mask_t for a full list of
* properties.
*/
uint32_t code_properties;
/* The amount of memory required for the combined private, spill
* and arg segments for a work-item in bytes. If
* is_dynamic_callstack is 1 then additional space must be added to
* this value for the call stack.
*/
uint32_t workitem_private_segment_byte_size;
/* The amount of group segment memory required by a work-group in
* bytes. This does not include any dynamically allocated group
* segment memory that may be added when the kernel is
* dispatched.
*/
uint32_t workgroup_group_segment_byte_size;
/* Number of byte of GDS required by kernel dispatch. Must be 0 if
* not using GDS.
*/
uint32_t gds_segment_byte_size;
/* The size in bytes of the kernarg segment that holds the values
* of the arguments to the kernel. This could be used by CP to
* prefetch the kernarg segment pointed to by the dispatch packet.
*/
uint64_t kernarg_segment_byte_size;
/* Number of fbarrier's used in the kernel and all functions it
* calls. If the implementation uses group memory to allocate the
* fbarriers then that amount must already be included in the
* workgroup_group_segment_byte_size total.
*/
uint32_t workgroup_fbarrier_count;
/* Number of scalar registers used by a wavefront. This includes
* the special SGPRs for VCC, Flat Scratch Base, Flat Scratch Size
* and XNACK (for GFX8 (VI)). It does not include the 16 SGPR added if a
* trap handler is enabled. Used to set COMPUTE_PGM_RSRC1.SGPRS.
*/
uint16_t wavefront_sgpr_count;
/* Number of vector registers used by each work-item. Used to set
* COMPUTE_PGM_RSRC1.VGPRS.
*/
uint16_t workitem_vgpr_count;
/* If reserved_vgpr_count is 0 then must be 0. Otherwise, this is the
* first fixed VGPR number reserved.
*/
uint16_t reserved_vgpr_first;
/* The number of consecutive VGPRs reserved by the client. If
* is_debug_supported then this count includes VGPRs reserved
* for debugger use.
*/
uint16_t reserved_vgpr_count;
/* If reserved_sgpr_count is 0 then must be 0. Otherwise, this is the
* first fixed SGPR number reserved.
*/
uint16_t reserved_sgpr_first;
/* The number of consecutive SGPRs reserved by the client. If
* is_debug_supported then this count includes SGPRs reserved
* for debugger use.
*/
uint16_t reserved_sgpr_count;
/* If is_debug_supported is 0 then must be 0. Otherwise, this is the
* fixed SGPR number used to hold the wave scratch offset for the
* entire kernel execution, or uint16_t(-1) if the register is not
* used or not known.
*/
uint16_t debug_wavefront_private_segment_offset_sgpr;
/* If is_debug_supported is 0 then must be 0. Otherwise, this is the
* fixed SGPR number of the first of 4 SGPRs used to hold the
* scratch V# used for the entire kernel execution, or uint16_t(-1)
* if the registers are not used or not known.
*/
uint16_t debug_private_segment_buffer_sgpr;
/* The maximum byte alignment of variables used by the kernel in
* the specified memory segment. Expressed as a power of two. Must
* be at least HSA_POWERTWO_16.
*/
uint8_t kernarg_segment_alignment;
uint8_t group_segment_alignment;
uint8_t private_segment_alignment;
/* Wavefront size expressed as a power of two. Must be a power of 2
* in range 1..64 inclusive. Used to support runtime query that
* obtains wavefront size, which may be used by application to
* allocated dynamic group memory and set the dispatch work-group
* size.
*/
uint8_t wavefront_size;
int32_t call_convention;
uint8_t reserved3[12];
uint64_t runtime_loader_kernel_symbol;
uint64_t control_directives[16];
} amd_kernel_code_t;
#endif // AMDKERNELCODET_H

View File

@@ -32,7 +32,7 @@
#ifndef AMDGPU_ID_H
#define AMDGPU_ID_H
#include "pipe/p_config.h"
#include "util/u_endian.h"
#if defined(PIPE_ARCH_LITTLE_ENDIAN)
#define LITTLEENDIAN_CPU

View File

@@ -203,6 +203,12 @@
#define S_028BDC_LAST_PIXEL(x) (((unsigned)(x) & 0x1) << 10)
#define G_028BDC_LAST_PIXEL(x) (((x) >> 10) & 0x1)
#define C_028BDC_LAST_PIXEL 0xFFFFFBFF
#define S_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((unsigned)(x) & 0x1) << 11)
#define G_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((x) >> 11) & 0x1)
#define C_028BDC_PERPENDICULAR_ENDCAP_ENA 0xFFFFF7FF
#define S_028BDC_DX10_DIAMOND_TEST_ENA(x) (((unsigned)(x) & 0x1) << 12)
#define G_028BDC_DX10_DIAMOND_TEST_ENA(x) (((x) >> 12) & 0x1)
#define C_028BDC_DX10_DIAMOND_TEST_ENA 0xFFFFEFFF
#define CM_R_028BE0_PA_SC_AA_CONFIG 0x28be0
#define S_028BE0_MSAA_NUM_SAMPLES(x) (((unsigned)(x) & 0x7) << 0)
#define S_028BE0_AA_MASK_CENTROID_DTMN(x) (((unsigned)(x) & 0x1) << 4)
@@ -216,7 +222,6 @@
#define EG_S_028C70_FAST_CLEAR(x) (((unsigned)(x) & 0x1) << 17)
#define SI_S_028C70_FAST_CLEAR(x) (((unsigned)(x) & 0x1) << 13)
#define VI_S_028C70_DCC_ENABLE(x) (((unsigned)(x) & 0x1) << 28)
/*CIK+*/
#define R_0300FC_CP_STRMOUT_CNTL 0x0300FC
@@ -241,5 +246,7 @@
#define S_028254_BR_Y(x) (((unsigned)(x) & 0x7FFF) << 16)
#define G_028254_BR_Y(x) (((x) >> 16) & 0x7FFF)
#define C_028254_BR_Y 0x8000FFFF
#define R_0282D0_PA_SC_VPORT_ZMIN_0 0x0282D0
#define R_0282D4_PA_SC_VPORT_ZMAX_0 0x0282D4
#endif

Some files were not shown because too many files have changed in this diff Show More