Compare commits

...

2308 Commits

Author SHA1 Message Date
Emil Velikov
623f64efc1 util: use RTLD_LOCAL with util_dl_open()
Otherwise we risk things blowing up due to conflicting symbols.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
8943a562e2 targets/nine: remove unused static functions
Dead code since commit 8f50614910

Cc: Axel Davy <axel.davy@ens.fr>
Cc: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
42dde5aa24 targets/nine: add note about messy header inclusion order
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
0942250781 targets/nine: add note about fd owndership
v2:
 - move autotools hunk into correct patch
 - correct the note based on Axel's feedback

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
f8a1665542 auxiliary/vl: Don't close the drm fd on failure
Ported from an identically named commit in st/xa

commit 35cf3831d7
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Thu Jul 3 02:07:36 2014 -0700

    st/xa: Don't close the drm fd on failure v2

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
e43a771dfa st/dri: NULL check the pscreen earlier
We delay the null check only to jump through hoops to work around that.
Check early to make our lives easier.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
13bccee87d st/dri: Don't close the drm fd on failure
Ported from an identically named commit in st/xa

commit 35cf3831d7
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Thu Jul 3 02:07:36 2014 -0700

    st/xa: Don't close the drm fd on failure v2

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
b7f5c2ee48 target-helpers: remove inline_drm_helper.h
As of earlier all the targets use the non inline version. Don't forget
to remove the function prototypes/declarations.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
dddedbec0e {st,targets}/nine: use static/dynamic pipe-loader
Analogous to previous commits.

v2: add the missing winsys libs linkage

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
611ef64ed5 {st,targets}/xa: use static/dynamic pipe-loader
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
1eb6e8a23c {auxiliary,targets}/vl: use static/dynamic pipe-loader
Analogous to previous commit.

v2: rebase on top of vl_winsys_drm.c addition

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
23fb11455b {st,targets}/dri: use static/dynamic pipe-loader
Covert DRI to use only the pipe-loader interface.

With drisw_create_screen and kms_swrast_create_screen replaced by their
pipe-loader equivalent, we can now drop them.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
c4d337146a pipe-loader: add preliminary Android support
Add a 'static' pipe-loader build, which will be used with follow-up
commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
234b03cc23 pipe-loader: add preliminary scons support
Add a 'static' pipe-loader build, which will be used with follow-up
commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
7999e6ddba pipe-loader: don't mix code and variable declarations
We cannot use this C99 feature here quite yet, as the code needs to be
build with MSVC prior to 2013.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
17d3a5f857 target-helpers: add a non-inline drm_helper.h
Unlike the inline ones, here we'd want to have an extern definition of
the functions. This is required as with follow-up commits, we'll
gradually start using the static pipe-loader, with the latter needing
the symbols.

These are direct copy from the inline version.

v2:
 - rebase on top of virgl support
 - add "driver missing" printfs (Nicolai)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
af031deed6 target-helpers: move the DRI specifics to the target
Rather than having all targets include the file, with only some defining
the relevant guard macro, just move things where they are used.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
950e06a29b automake: remove no longer needed HAVE_LOADER_GALLIUM conditional
As of last few commits we have a static and dynamic pipe-loader. Either
of which will be used with (almost) all targets..

We can look into allowing the user to select which way the targets are
built, be that 'static for all' or 'per target' in follow up commits.
After which we can look into building only the static or dynamic
version, although building both shouldn't cause any issues.

Hack/workaround alert:
Control the standalone pipe-drivers via HAVE_CLOVER. Will need to be
fixed as the targets are converted/configure knobs are in.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
be78f73b37 pipe-loader: wire up the 'static' sw pipe-loader
Analogous to previous commit with a small catch.

As the sw inline helpers are mere wrappers, and the screen <> winsys
split is more prominent (with the latter not being part of the final
pipe-driver), things will just work.

v2: rebase on top of earlier 'consolitate teardown' changes

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
1b589207de pipe-loader: wire up the 'static' drm pipe-loader
Add a list of driver descriptors and select one from the list, during
probe time.

As we'll need to have all the driver pipe_foo_screen_create() functions
provided externally (i.e. from another static lib) we need a separate
(non-inline) drm_helper, which contains the function declarations.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
0f39f9cb7a pipe-loader: add a dummy 'static' pipe-loader
It is to be used in contrast of the dynamic one. The state-tracker does
not need to know if the pipe-driver is built into the final blob or
a separate object. This will allow us to move the logic to the final
step (in target) where the appropriate pipe-loader will be chosen.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
ad12027d8f gallium: rename libpipe_loader to libpipe_loader_dynamic
With the next commits we'll introduce a 'static' version, which will
essentially load the statically linked-in pipe-drivers, rather than the
standalone pipe-$foo.so ones.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
3ca12ee976 pipe-loader: dlopen/dlsym the pipe-driver at probe time
Rather than giving false hopes that things might work, just check at
probe time. This allows us to remove the duplication and consolidate
the code wrt the upcomming static pipe-loader.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
e465de5a51 pipe-loader: annotate the ops as const data
Already defined as such in struct pipe_loader_device::ops.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
46991ab9aa pipe-loader: teardown the winsys, if create_screen fails
i.e. plug some (hard to hit) memory leaks.

v2: fix rebase fallout - really teardown the winsys (Brian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
d54ca54faa pipe-loader: rework the sw backend
Move the winsys into the pipe-target, similar to the hardware
pipe-driver.

v2:
 - move int declaration outside of loop (Brian)
 - fold the teardown into a goto + separate function.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
f58a6f7be3 gallium: keep the libdrm link alongside libkmsdri.la
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
ff9cd8a67c pipe-loader: directly use pipe_loader_sw_probe_null() at probe time
Due to the nature of the other sw winsys' we cannot use them during the
generic probe stage. As such there is little point in keeping the
abstraction layer.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
4e3c06a501 pipe-loader: add pipe_loader_sw_probe_init_common() helper
Allows us to fold the duplication in pipe_loader_sw_probe_*().

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
6d68d714c0 gallium/tests: remove unneeded include paths
The tests don't (and shouldn't) need to have anything driver and/or
winsys specific.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
74d41a32bc gallium: remove library_path argument from pipe_loader_create_screen()
Currently the location is determined at configure/build time and
consistently copied across gallium. Just remove the extra argument, and
use PIPE_SEARCH_DIR where appropriate.

This will allow us to remove the duplication in the *configuration and
*screen_create APIs by moving util_dl_get_proc_address() and friends to
probe time.

v2: rebase on top of vl_winsys_drm.c addition

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
cbc4d9730a targets/nine: remove the custom pipe-driver path management
Since the up-streaming of nine, the static target was used by default.
The dynamic pipe-drivers being available only via manual tweak of
configure.ac.

As we'll be removing the library_path argument from the pipe-loader with
follow-up commits, we can remove D3D9_DRIVERS_PATH/D3D9_DRIVERS_DIR.
Everyone doing local hacking on nine, or wishing to have a env override
can bring them back within the pipe-loader.

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
149454bb13 pipe-loader: remove HAVE_DRM_LOADER_GALLIUM and HAVE_PIPE_LOADER_DRM
... in favour of HAVE_LIBDRM. After all we solely want to build the code
when the latter is available.

In the not too distant future we will remove the libudev/sysfs
dependency and simplify configure.ac even further.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
33f1db1eb4 pipe-loader: add pipe_loader_sw_probe_kms() implementation
Will be used as a counterpart for target-helpers'
kms_swrast_create_screen().

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
be430726e2 configure: use HAVE_DRISW_KMS when handling kms swrast
Using HAVE_DRI2 to manage it seems counter-intuitive.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
f9c9471b76 targets/nine: use the existing sw_screen_wrap() over our custom version
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
6bcd5f0d02 automake: use GALLIUM_PIPE_LOADER_DEFINES only where applicable
As of last commit we no longer need the defines in order to have the
function prototypes.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
b7875ca493 pipe-loader: remove HAVE_PIPE_LOADER_foo function prototype guards
They serve little to no purpose, as we don't need any additional
dependencies (headers and/or symbols). On the other hand dropping them
will allow us to use GALLIUM_PIPE_LOADER_DEFINES in only one single
place - the pipe-loader.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
c751d33a20 gallium/trace: remove useless NULL check from trace_screen_create()
Currently every target makes sure that the screen is non-null prior to
using the debug (trace including) wrappers. If that no longer holds true
we want to know and fix this ASAP rather than silently bailing out.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
e762a46a07 configure: remove obsolete _CLIENT comment
The referenced variable(s) have been removed with commit abc20120e4
(automake: pipe-loader: remove the 'client' pipe-loader)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
1a18457a52 docs: add news item and link release notes for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 12:42:48 +00:00
Emil Velikov
da2cb8a2ee docs: add sha256 checksums for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2555e000fc)
2015-11-21 12:41:22 +00:00
Emil Velikov
380aec1703 docs: add release notes for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 04fd3a6f62)
2015-11-21 12:41:21 +00:00
Ilia Mirkin
d8c26969d5 freedreno/a4xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev
Same as commit 84d087aea but for a4xx. The RE'd enums had the same issue
too.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 20:41:39 -05:00
Matt Turner
f6986a81c9 i965: Test that nonrepresentable floats cannot be converted to VF.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-20 17:39:34 -08:00
Matt Turner
f450030f66 i965: Use ldexpf() in VF float test set up.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-20 17:39:34 -08:00
Matt Turner
0684aed8ab i965/vec4: Initialize nir_inputs with src_reg().
nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not
that that makes a whole lot of sense to me), and only nir_inputs is a
src_reg.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-20 17:39:34 -08:00
Matt Turner
c875e3cdd2 i965/fs: Add support for gl_HelperInvocation system value.
In most cases (when the negate is copy propagated and the MOV removed),
this is two instructions on Gen >= 8 and only two instructions on
earlier platforms -- and it doesn't use the flag register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-20 17:39:33 -08:00
Matt Turner
4b15281295 i965: Add brw_imm_uv(). 2015-11-20 17:39:33 -08:00
Matt Turner
ce11d4f369 i965: Don't bother setting regioning on immediates.
The region fields are unioned with the immediate storage.
2015-11-20 17:39:33 -08:00
Matt Turner
c28b574170 nir: Add support for gl_HelperInvocation system value.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-20 17:39:33 -08:00
Ilia Mirkin
fe29330406 freedreno/a4xx: use hardware RGTC texture samplers
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
39fa5c8419 freedreno/a4xx: hook up RGB565 format
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
3b77826cc1 freedreno/a4xx: logic op handling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
e1319dcdd6 freedreno/a4xx: add 16-bit unorm/snorm format texturing/rendering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
ff9450ecd1 freedreno/a4xx: point regid to "red" even for alpha-only rb formats
Looks like a4xx hw does this in a more standard way and we don't need to
hack around it like we do on a3xx. Fixes GL_ALPHA formats in
fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-20 18:15:15 -05:00
Ilia Mirkin
4fd24caf92 ttn: add TEX2 support
This fixes CubeArrayShadow tests (where the shadow comes in via a second
arg to the TEX2 instruction).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-20 17:45:08 -05:00
Ilia Mirkin
c1babbd85c freedreno: always set all border colors
Instead of playing the guessing game as to which texture format reads
from which border color encoding type, just write both of them always.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 17:44:10 -05:00
Ilia Mirkin
ec106e9f62 freedreno/a4xx: fix dst_alpha blend for RGBX render targets
There are not native RGBX render formats, so we must manually force
dst_alpha to be one, same as for a3xx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 17:44:10 -05:00
Nicolai Hähnle
5bda3d0958 radeon: re-prepare query buffers on begin_query for predicate queries
The point of prepare_buffer is to ensure that the query buffer contains valid
initial data for conditional rendering: as long as the buffer is initialized
correctly, the GPU is able to tell whether query results have been written
already (and wait or fall back to unconditional rendering if desired).

This means prepare_buffer needs to be called again when a buffer is reused.

Conversely, for queries that cannot be used for conditional rendering
(notably pipeline statistics), we can re-use buffers immediately, and they
do not need to be initialized.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
2015-11-20 22:46:11 +01:00
Nicolai Hähnle
6f4fe8e76a radeon: reset query buffers for PIPE_QUERY_TIMESTAMP
Since begin_query is not called for this query type, we need to reset the
query buffer state in end_query instead.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93015
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Tested-by: Mathias Tillman <master.homer@gmail.com>
2015-11-20 22:46:11 +01:00
Brian Paul
47fae842d0 mesa: update some old-style (K&R?) function pointer calls
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 14:09:15 -07:00
Brian Paul
1def5ef958 docs: mention GL 3.3 support for VMware driver in Mesa 11.1 relnotes
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-20 14:06:25 -07:00
Brian Paul
527466d9a1 svga: add num-bytes-uploaded HUD query
To graph the number of bytes uploaded to GPU per frame (vertex buffer data,
constant buffer data, texture data, etc).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-20 13:40:06 -07:00
Brian Paul
e96d7a1489 svga: add some sanity check assertions in svga_buffer_transfer_map()
Make sure y and z values of buffers are as expected.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-20 13:40:06 -07:00
Timothy Arceri
b109cd3c27 docs: mark compile-time constant expressions as done
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:30:18 +11:00
Timothy Arceri
f7af69c350 glsl: add subroutine index qualifier support
ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.

This patch reduces the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().

ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
is available.

A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defied, before this patch this
would cause a segfault.

Finally a new variable is added to ir_function_signature to store the
index. This value is validated and the non explicit values assigned in
link_assign_subroutine_types().

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-21 07:30:12 +11:00
Timothy Arceri
02d2ab2378 glsl: add support for complie-time constant expressions
This patch replaces the old interger constant qualifiers with either
the new ast_layout_expression type if the qualifier requires merging
or ast_expression if the qualifier can't have mulitple declarations
or if all but the newest qualifier is simply ignored.

We also update the process_qualifier_constant() helper to be
similar to the one in the ast_layout_expression class, but in
this case it will be used to process the ast_expression qualifiers.

Global shader layout qualifier validation is moved out of the parser
in this change as we now need to evaluate any constant expression
before doing the validation.

V2: Fix minimum value check for vertices (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:28:06 +11:00
Timothy Arceri
0954b813a3 glsl: add new type for compile time constants
In this patch we introduce a new ast type for holding the new
compile-time constant expressions. The main reason for this is that
we can no longer do merging of layout qualifiers before they have been
converted into GLSL IR so we need to store them to be proccessed later.

The new type has two helper functions:

- process_qualifier_constant()

 Used to merge and then evaluate qualifier expressions

- merge_qualifier()

 Simply appends a qualifier to a list to be merged later by
 process_qualifier_constant()

In order to avoid cascading error messages the process_qualifier_constant()
helpers return a bool

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:56 +11:00
Timothy Arceri
4196af4ce7 glsl: call set_shader_inout_layout() earlier
This will allow us to add error checking to this function
in a later patch, if we don't move it the error messages
will go missing.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:49 +11:00
Timothy Arceri
e74fe2a844 glsl: replace binding layout min boundary check
Use new helper that will in a later patch allow for
compile time constants.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:42 +11:00
Timothy Arceri
64710db664 glsl: encapsulate binding validation and setting
This change moves the binding layout handing code into an apply
function to be consistent with other helper functions in the ast
code, and to encapsulate the code so that when we introduce
compile time constants the code will be much cleaner.

One small downside is for unnamed interface blocks we will now
be revalidating the binding for each member its applied to.
However this seems a small sacrifice in order to have code which
is readable.

We also remove the incorrect comment in the named interface code
about propagating bindings to members which seems to have been
copied from the unnamed interface code.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:30 +11:00
Timothy Arceri
db3c36aedf glsl: move stream layout max validation
This validation is moved later so we can validate the
max value when compile time constant support is added in a
later patch.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:21 +11:00
Timothy Arceri
17e224e8ec glsl: move stream layout qualifier validation
We are moving this out of the parser in preparation for compile
time constant support.

The reason a validation function is used rather than an apply
function like what is used with bindings is because glsl allows
streams to be defined on members of blocks even though they must
match the stream thats associated with the current block, this
means we need access to the value after validation to do this
comparision.

V2: Fix typo in comment (Emil)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:15 +11:00
Timothy Arceri
efa34e4a1d glsl: replace index layout min boundary check
Use new helper that will in a later patch allow for
compile time constants.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:09 +11:00
Timothy Arceri
1d87d6f9ca glsl: remove duplicate validation for index layout qualifier
The minimum value for index is validated in apply_explicit_location()
and we want to remove validation from the parser so we can add
compile time constant support.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:04 +11:00
Timothy Arceri
d1f23545a1 glsl: move location layout qualifier validation
We are moving this out of the parser in preparation for compile
time constant support.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:00 +11:00
Timothy Arceri
de8f0c9ab9 glsl: add process_qualifier_constant() helper
For now this just validates that a qualifier is inside its
minimum boundary, in a later patch we will expand it to
evaluate compile time constants.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:26:55 +11:00
Samuel Pitoiset
f57285c8fc docs: mark GL_AMD_performance_monitor for nv50
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 21:03:14 +01:00
Samuel Pitoiset
aede8ca9a7 nv50: expose two groups of compute-related MP perf counters
This turns on GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 21:03:14 +01:00
Ben Widawsky
0288f92e7b i965/gen9: Support fast clears for 32b float
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).

v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:44 -08:00
Ben Widawsky
7c690da29c Revert "i965/gen9: Enable rep clears on gen9"
This reverts commit 8a0c85b258.

It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
f838e53c70 Revert "i965/gen9: Disable MCS for 1x color surfaces"
This reverts commit dcd59a9e32.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
c4edc048c6 i965/meta/gen9: Individually fast clear color attachments
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).

It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0

v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)

v3: Patch from Neil whose feedback I originally missed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
6fa1130cd2 i965/skl: skip fast clears for certain surface formats
Some of the information originally in this commit message is now in the patch
before this.

SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".

The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.

v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)

v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.

v4: Use better quote from spec (Topi)

Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
9d94eeb8a4 i965: Add lossless compression to surface format table
Background: Prior to Skylake and since Ivybridge Intel hardware has had the
ability to use a MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth.  This
hardware was either used for MSAA compression, or fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".

Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display features adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.

Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.

v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
  - related changes to the code to reflect this.
- remove excess ccs (Chad)

v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)

Requested-by: Chad Versace <chad.versace@intel.com>
Requested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
d23aa634e0 i965/skl: Add fast color clear infrastructure
Patch was originally called:
i965/skl: Enable fast color clears on SKL

Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff and separately reviewable work.

Based on a patch originally from Kristian.

Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
   --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
          /* In release builds, fall through */
       case I915_TILING_Y:
          *width_px = 32 / mt->cpp;
   -      *height = 4;
   +      if (brw->gen >= 9)
   +         *height = 2;
   +      else
   +         *height = 4;

v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:32 -08:00
Ian Romanick
2f7d2fd997 docs: Add GL_EXT_shader_samples_identical to the release notes
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-20 11:38:11 -08:00
Leo Liu
8762570cc5 radeon/vce: disable two pipe mode for stoney
Only one encoding pipe available for Stoney

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-20 13:21:54 -05:00
Leo Liu
99d92de5d0 radeon/vce: add new firmware interface support
Add new interface to create and encode

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-20 13:21:54 -05:00
Emil Velikov
8fdb548799 egl: don't forget to ship platform_x11_dri3.h into the tarball
Should have been a part of f35198bade

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:08:04 +00:00
Emil Velikov
ae6d6941f6 glsl: move builtin_type_macros.h into the correct list
Commit b9b40ef9b7 moved the file, but forgot to update the reference in
the makefile. Thus the out of tree build was busted :\

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:07:58 +00:00
Emil Velikov
c45b4257c2 automake: use static llvm for make distcheck
With llvm 3.7 semi-dropping the autoconf build, we rely on their cmake
build. With the latter of which annoyingly using another (busted?)
SONAME.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:07:52 +00:00
Brian Paul
0743e14aee mesa: remove unused var in _mesa_PushDebugGroup()
Trivial.
2015-11-20 09:35:18 -07:00
Brian Paul
108013b8e5 mesa: whitespaces fixes in _mesa_one_time_init_extension_overrides()
Trivial.
2015-11-20 09:35:05 -07:00
Nicolai Hähnle
8a125afa6e radeon: ensure that timing/profiling queries are suspended on flush
The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-20 17:27:40 +01:00
Nicolai Hähnle
6a14a39fab st/mesa: add support for batch driver queries to perfmon
v2 + v3: forgot null-pointer checks (spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:36 +01:00
Nicolai Hähnle
424a614ff1 gallium/hud: add support for batch queries
v2 + v3: be more defensive about allocations

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:32 +01:00
Nicolai Hähnle
d61d4df02e gallium: add the concept of batch queries
Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.

Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.

v3: adjust recently added nv50 queries

v2: documentation for create_batch_query

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:28 +01:00
Nicolai Hähnle
c235300bfc st/mesa: maintain active perfmon counters in an array
It is easy enough to pre-determine the required size, and arrays are
generally better behaved especially when they get large.

v2: make sure init_perf_monitor returns true when no counters are active
(spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:23 +01:00
Nicolai Hähnle
afa6121b4e st/mesa: use BITSET_FOREACH_SET to loop through active perfmon counters
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:18 +01:00
Nicolai Hähnle
0aea83dc4a st/mesa: store mapping from perfmon counter to query type
Previously, when a performance monitor was initialized, an inner loop through
all driver queries with string comparisons for each enabled performance
monitor counter was used. This hurts when a driver exposes lots of queries.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:09 +01:00
Nicolai Hähnle
4e1339691d st/mesa: map semantic driver query types to underlying type
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:59 +01:00
Nicolai Hähnle
050db20d37 gallium/hud: remove unused field in query_info
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:50 +01:00
Nicolai Hähnle
ddf27a3dd0 gallium: remove pipe_driver_query_group_info field type
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says

  A performance monitor consists of a number of hardware and software
  counters that can be sampled by the GPU and reported back to the
  application.

I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.

v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:39 +01:00
Roland Scheidegger
24dc0316b4 gallivm: use sampler index 0 for texel fetches
texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-20 17:00:15 +01:00
Ilia Mirkin
9a93da4e83 freedreno/a4xx: add BPTC support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 09:25:39 -05:00
François Tigeot
8a94ba5e0c xmlconfig: Add support for DragonFly
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:18:47 +00:00
Mauro Rossi
480ba46bcb android: export the path of glsl nir headers
The change is necessary to avoid building errors in glsl and i965
modules due to missing glsl_types.h header

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:18:19 +00:00
Boyan Ding
b8547a5063 mesa: re-enable KHR_debug for ES contexts
With the earlier issues resolved we can expose the extension.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Boyan Ding
ab7294668c main: Don't restrict several KHR_debug enum to desktop GL
In preparation for supporting GL_KHR_debug in OpenGL ES

v2: add a missing hunk in _mesa_IsEnabled (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Emil Velikov
af27236854 mesa: use the correct string for the ES GL_KHR_debug functions
As defined in the spec

    when implemented in an OpenGL ES context, all entry points defined
    by this extension must have a "KHR" suffix.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Gregory Hainaut
9108a785a0 glsl: avoid linker and user varying location to overlap
Current behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR0 by the linker

New behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR1 by the linker

v4:
* Fix variable name in assert

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-20 22:04:02 +11:00
Emil Velikov
3afb253e9b auxiliary/vl/dri2: coding style fixes
Rewrap long(ish) lines, add space between struct foo and *.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
b31f092bfb auxiliary/vl/dri2: hide internal functions
Analogous to previous commit. While we're here prefix all functions
identically -> vl_dri2_foo

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
4533c022f4 auxiliary/vl/drm: hide internal functions
As of last commit everyone is using the vl_screen dispatch, thus we can
hide this function from the headers and make it static.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
abbfda60d8 st/vdpau: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
4307155127 st/xvmc: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
422356ed2f st/va: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
9eb109f4d3 st/omx: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
32094979f7 auxiliary/vl/dri2: setup the dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
6150d8d4bd auxiliary/vl/drm: use a label for the error path
... just like every other place in gallium.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
d03d9ecafa auxiliary/vl/drm: setup the dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
6b152ee7b6 auxiliary/vl: add dispatch table
As mentioned previously, it will allow us to use different vl backend in
a generic way from either video state-tracker.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:41 +00:00
Emil Velikov
2bd9116b82 auxiliary/vl: rename vl_screen_create to vl_dri2_screen_create
In a preparation of having proper multi-platform/backend handling in VL.

With follow up commits we'll introduce a dispatch within vl_screen
similar to the one in pipe_screen. This way any VL state-tracker can
operate seamlessly, considering the backend/platform is properly setup.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:56:34 +00:00
Emil Velikov
c31218cdb3 st/va: trivial cleanup
Drop the temporary variable and fold the two conditional.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:56:17 +00:00
Emil Velikov
a8f45e0161 st/omx: straighten get/put_screen
The current code is busted in a number of ways.

 - initially checks for omx_display (rather than omx_screen), which may
or may not be around.
 - blindly feeds the empty env variable string to loader_open_device()
 - reads the env variable every time get_screen is called
 - the latter manifests into memory leaks, and other issues as one sets
the variable between two get_screen calls.

Additionally it cleans up a couple of extra bits
 - drops unneeded set/check of omx_display.
 - make the teardown (put_screen) order was not symmetrical to the setup
(get_screen)

v2: Drop the "is empty string" check (Leo)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-20 10:56:10 +00:00
Emil Velikov
7157085140 automake: loader: don't create an empty dri3 helper
Seems that creating an empty one does not fair too well with MacOSX's
ar. Considering that all the users of the helper include it only when
needed, let's reshuffle the makefile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92985
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:35 +00:00
Emil Velikov
115f179852 automake: loader: honour the XCB_DRI3 cflags
Without this the compilation will fail, as the headers are installed in
a non-default location.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:29 +00:00
Emil Velikov
166314dd88 automake: egl: add symbols test
Should help us catch issues where we expose any extra symbols by
mistake. Just like the ones fixes with previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:23 +00:00
Emil Velikov
5a79e0a8e3 automake: loader: rework the CPPFLAGS
Rather than duplicating things, just use the generic AM_CPPFLAGS. This
has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
helper. The latter of which was erroneously exposing some internal
symbols.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:11 +00:00
Ian Romanick
99840eb983 i965: Enable EXT_shader_samples_identical
On the vec4 backend, textureSamplesIdentical() will always return
false.  There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation.  We also don't think
anyone is likely to miss it.

v2: Handle immediate value for MCS smarter.  Rebase on changes to
nir_texop_sampels_identical (missing second parameter).  Suggested by
Jason.

v3: Add Neil's code to handle 16x MSAA in the FS.  Also rebase on top of
f9a9ba5e.  Stub out the vec4 implementation.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v2]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
2015-11-19 20:17:16 -08:00
Ian Romanick
84b6c64efc i965/vec4: Handle nir_tex_src_ms_index more like the scalar
v2: Rebase on top of f9a9ba5e.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
457bb290ef nir: Add nir_texop_samples_identical opcode
This is the NIR analog to GLSL IR ir_samples_identical.

v2: Don't add the second nir_tex_src_ms_index parameter.  Suggested by
Ken and Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
06c56f443a glsl: Add textureSamplesIdenticalEXT built-in functions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
8343583557 glsl: Add ir_samples_identical opcode
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
ef54434c52 glsl: Extension tracking for EXT_shader_samples_indentical
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
ff59700d29 mesa: Extension tracking for EXT_shader_samples_indentical
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:15 -08:00
Ian Romanick
b1b9f68d4c Import current draft of EXT_shader_samples_identical spec
v2: Add Neil to the list of contributors.  I meant to do that before,
but Matt reminded me.

v3: Fix typos noticed by Nicolai.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:15 -08:00
Rob Clark
acca6c65d3 nir: add nir_ssa_for_alu_src()
Using something like:

   numer = nir_ssa_for_src(bld, alu->src[0].src,
                           nir_ssa_alu_instr_src_components(alu, 0));

for alu src's with swizzle, like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_2 = udiv ssa_10.xx, ssa_11

ends up turning into something like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10
   ...

because nir_ssa_for_src() ignore's the original nir_alu_src's swizzle.
Instead for alu instructions, nir_src_for_alu_src() should be used to
ensure the original alu src's swizzle doesn't get lost in translation:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10.xx
   ...

v2: check for abs/neg, and re-use existing nir_alu_src

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
c73f40c473 nir: fix missing increments of num_inputs/num_outputs
Note: not quite perfect, we should use type_size vfunc (in
compiler_options or nir_shader?) to determine how much we
increment num_inputs/outputs/uniforms.  But we don't have
that yet, so let's at least fix things for the existing
users of these passes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
fec9367deb nir/print: show # of uniforms/inputs/outputs
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-19 20:03:32 -05:00
Rob Clark
01e94d8d5d nir/print: show shader name/label if set
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
006e4f070f nir: add nir_var_all enum
Otherwise, passing -1 gets you:

  error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Ilia Mirkin
769b3ab6c5 freedreno/a4xx: fix 5_5_5_1 texture sampler format
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-19 19:00:18 -05:00
Ilia Mirkin
a05e5491c3 freedreno/a4xx: add depth clamp and halfz clip
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 19:00:18 -05:00
Ilia Mirkin
b17a405609 freedreno/a4xx: allow seamless cubemap filtering to be enabled per-texture
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 19:00:18 -05:00
Ilia Mirkin
0a4462ad6e freedreno/a4xx: support lod_bias
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-19 19:00:17 -05:00
Samuel Pitoiset
0cfc1304be nv50: allow using inline vertex data submit when gl_VertexID is used
The hardware can actually generates vertexid when vertices come from
a client-side buffer like when glDrawElements is used.

This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")

The only disadvantage is that only works on G84+, but we don't really
care of that weird and old NV50 chipset.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 21:11:38 +01:00
Samuel Pitoiset
9e40a621c1 nv50: add NV84_3D macro
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 21:11:27 +01:00
Matt Turner
a5b3115f0a i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.
The previous two commits make this unnecessary.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
f9a9ba5eac i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().
Cuts 1.5k of .text.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
9b978046eb i965/fs: Use brw_imm_uw().
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.

Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
3ccc41ecfc i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

   text     data      bss      dec      hex  filename
5204535   214112    27784  5446431   531b1f  i965_dri.so before
5193977   214112    27784  5435873   52f1e1  i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
c15a407eb4 i965: Make brw_imm_vf4() take 8-bit restricted floats.
This partially reverts commit bbf8239f92.

I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.

Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.

brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Nanley Chery
e8c5ef3eca mesa: Add test for sorted extension table
Enable developers to know if the table's alphabetical sorting
is maintained or lost.

v2: Move "*" next to pointer name (Matt)
    Include extensions_table.h instead of extensions.h (Ian)
    Remove extra " *" in comment (Ian)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-19 11:12:45 -08:00
Nanley Chery
f030227f46 mesa/extensions: Sort the extension table alphabetically
Make it easier to determine where to add new extensions.
Performed with the vim sort command.

v2: Insert newline after last #define (Matt)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-19 10:35:20 -08:00
Ilia Mirkin
bcda79676a docs: GL3.1 for a3xx and a4xx
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 12:26:28 -05:00
Ryan Houdek
0ec218d167 mesa: enable EXT_blend_func_extended if the driver supports the ARB version
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
f7c23f225f mesa: allow MAX_DUAL_SOURCE_DRAW_BUFFERS to be available to ES
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
4b549f0d8c mesa: enable usage of blend_func_extended blend factors in GLES2
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
33ddc8e865 glsl: add a parse check to check for the index layout qualifier
This can only be used if EXT_blend_func_extended is enabled

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
ef9e6d1ec8 glsl: add GL_EXT_blend_func_extended preprocessor define
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
1d1d02f2ac glsl: add support for EXT_blend_func_extended builtins
gl_MaxDualSourceDrawBuffersEXT - Maximum dual-source draw buffers supported

For ESSL 1.0, it provides two builtins since you can't have user-defined
color output variables:
  gl_SecondaryFragColorEXT
  gl_SecondaryFragDataEXT[MaxDSDrawBuffers]

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
ceecb0876f glsl: add EXT_blend_func_extended parser enables
This adds a state for the maximum dual source draw variables available
and the variable for determining if the extension has been enabled
in the program shaders.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
625414f78c glapi: add EXT_blend_func_extended XML definitions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Brian Paul
15f8dc7b23 os: check for GALLIUM_PROCESS_NAME to override os_get_process_name()
Useful for debugging and for glretrace.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-19 09:23:04 -07:00
Connor Abbott
f1ba0a5ea0 glsl: fix ir_constant::equals() for doubles
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Connor Abbott
84ed3819a4 glsl: fix isinf() for doubles
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Connor Abbott
7820b2c071 nir: fix constant folding of bfi
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Brian Paul
1cfffb95eb hud: fix Windows build break
Protect signal-related code with PIPE_OS_UNIX test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-19 07:57:09 +00:00
Ian Romanick
2f55476153 glsl: Fix off-by-one error in array size check assertion
Apparently, this has been a bug since 2010 (c30f6e5d).

Also use ARRAY_SIZE instead of open coding it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-11-18 18:35:56 -08:00
Ian Romanick
0aded03046 mesa: Don't expose GL_EXT_shader_integer_mix in GLES 1.x
There are no shaders, so it doesn't even make sense to expose the
extension.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
2015-11-18 18:35:56 -08:00
Ian Romanick
37c2cfa6bc glsl: Silence unused parameter warnings
builtin_functions.cpp:5289:52: warning: unused parameter 'num_arguments' [-Wunused-parameter]
                                           unsigned num_arguments,
                                                    ^
builtin_functions.cpp:5290:52: warning: unused parameter 'flags' [-Wunused-parameter]
                                           unsigned flags)
                                                    ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-18 18:35:56 -08:00
Ian Romanick
c82498c4da glsl: Silence ignored qualifier warning
I think the intention was to mark the "this" parameter as const, but
const goes on the other end to do that.

In file included from glsl_symbol_table.cpp:26:0:
ast.h:339:35: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
    const bool is_single_dimension()
                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-18 18:35:56 -08:00
Kenneth Graunke
fc19a0d2e4 i965: Allow indirect GS input indexing in the scalar backend.
This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.

All indirects are handled via the pull model.  We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data.  Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly.  We'd need to be careful.

v2: Use updated MOV_INDIRECT opcode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-18 15:42:36 -08:00
Jimmy Berry
09d610796c gallium/hud: document GALLIUM_HUD_PERIOD in envvars.html.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-11-19 00:02:34 +01:00
Jimmy Berry
56a1c10bb8 gallium/hud: control visibility at startup and runtime.
- env GALLIUM_HUD_VISIBLE: control default visibility
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-11-19 00:02:33 +01:00
Jason Ekstrand
0bee3acc2a i965/nir: Add hooks for testing nir_shader_clone
This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader.  Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-18 12:28:55 -08:00
Jason Ekstrand
9fbd390dd4 nir: Add support for cloning shaders
This commit is heavily based on one by Rob Clark <robdclark@gmail.com> but
reworked to re-use nir_create functions and do less hashing.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 12:28:32 -08:00
Kenneth Graunke
9ff71b649b i965/nir: Validate that NIR passes call nir_metadata_preserve().
Failing to call nir_metadata_preserve() can have nasty consequences:
some pass breaks dominance information, but leaves it marked as valid,
causing some subsequent pass to go haywire and probably crash.

This pass adds a simple validation mechanism to ensure passes handle
this properly.  We add a new bogus metadata flag that isn't used for
anything in particular, set it before each pass, and ensure it *isn't*
still set after the pass.  nir_metadata_preserve will reset the flag,
so correct passes will work, and bad passes will assert fail.

(I would have made these functions static inline, but nir.h is included
in C++, so we can't bit-or enums without lots of casting...)

Thanks to Dylan Baker for the idea.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Kenneth Graunke
7bc0978999 i965/nir: Add OPT() and OPT_V() macros for invoking NIR passes.
OPT() is the normal macro for passes that return booleans, while OPT_V()
is a variant that works for passes that don't properly report progress.
(Such passes should be fixed to return a boolean, eventually.)

These macros take care of calling nir_validate_shader() and setting
progress appropriately.  In the future, it would be easy to add shader
dumping similar to INTEL_DEBUG=optimizer by extending the macro.

v2 (Jason Ekstrand):
 - Fix an unused variable warning

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
d27ae2cf8c nir: add array length field
This will simplify things somewhat in clone.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
624ec66653 nir: remove nir_variable::max_ifc_array_access
No users.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
4671c13852 freedreno/a4xx: add fake RGTC support (required for GL3)
The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support
(required for GL3)'

TODO some more r/e.. maybe we get lucky and hw supports some of this
directly?  For now this will help us enable gl3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
2379cc9fe0 freedreno/a4xx: add compressed texture formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
fadd39442b freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
4607b2b9b6 freedreno: expose GLSL 140 and fake MSAA for GL3.0/3.1 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
9c409c8df3 freedreno/a3xx: fix texture buffers, enable offsets
The main issue is that the current logic looked into cso->u.tex, which
is the wrong side of the union to look into for texture buffers. While I
was at it, it was easy enough to add the logic to handle offsets
(first_element).

 - reduce texture buffer size limit (determined experimentally)
 - don't look at first/last levels, instead look at first/last element
 - include the first element offset
 - set offset alignment to 16 (determined experimentally)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
d69e557f2a freedreno: add support for conditional rendering, required for GL3.0
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
059da344ec freedreno/a3xx: add fake RGTC support (required for GL3)
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.

Lastly fix up Z32S8 transfers to non-0 layers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
84d087aea2 freedreno/a3xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev
The previously RE'd formats were from an ES driver implementing
OES_vertex_type_10_10_10_2 and thus backwards. A future change could add
the 2_10_10_10 support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
8106fec74c freedreno/a3xx+a4xx: fix for stk binning pass hang
We'd end up in a state where shader uses no inputs, yet num_elements is
greater than zero.  Triggered by a TF vertex shader which did:

  gl_Position = vec4(0.0, 0.0, 0.0, 0.0);

resulting in a binning pass variant with no inputs.

Includes equiv fix in a4xx, even though we don't have binning-pass
enabled yet on a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
b24c9a8aee freedreno/a3xx+a4xx: fix GL_POINTS lockup w/ GLES
point_size_per_vertex is always TRUE for GLES, causing us to configure
the hw as if gl_PointSize was written, even if it was not.  Which makes
for grumpy hw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
b40e144a66 nir: fix typo in idiv lowering, causing large-udiv-udiv failures
In nv50, and in the python script that Rob circulated, we do:

   bld.mkCmp(OP_SET, CC_GE, TYPE_U32, (s = bld.getSSA()), TYPE_U32, m, b);

Do the same in the nir div lowering pass. This fixes the large-udiv-udiv
piglit tests on freedreno.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Oded Gabbay
4581f8428e llvmpipe: disable VSX in ppc due to LLVM PPC bug
This patch disables the use of VSX instructions, as they cause some
piglit tests to fail

For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7

With this patch, ppc64le reaches parity with x86-64 as far as piglit test
suite is concerned.

v2:
- Added check that we have at least LLVM 3.4
- Added the LLVM bug URL as a comment in the code

v3:

- Only disable VSX if Altivec is supported, because if Altivec support
is missing, then VSX support doesn't exist anyway.

- Change original patch description.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-18 21:27:29 +02:00
Ilia Mirkin
8e68113c1a nvc0/ir: actually emit AFETCH on kepler
Looks like this was forgotten in the commit which added the AFETCH
logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-18 14:26:16 -05:00
Kenneth Graunke
2631bfd62c nir: Store the size of the TCS output patch in nir_shader_info.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-18 10:49:18 -08:00
Kenneth Graunke
b196f1fff3 i965: Add enums for 3DSTATE_TE field values.
3DSTATE_TE has partitioning, output topology, and domain fields,
each of which has several enumerated values.  We'll also need to
switch on the domain, so enums (rather than #defines) seem like a
natural fit.

I chose to put these in brw_compiler.h because they'll be stored
in struct brw_tes_prog_data, which will live there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-18 10:49:18 -08:00
Ian Romanick
72e232374e meta/generate_mipmap: Don't leak the framebuffer object
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-18 09:38:21 -08:00
Brian Paul
1a48326a84 svga: use more VGPU10 formats
We always want to prefer the VGPU10 formats over the VGPU9 ones when
we have VGPU10 support.

Original patch by Jose and updated by Brian.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:16:12 -07:00
Brian Paul
1a90e3e1e3 svga: add/use new svga_sampler_format() function
This is important for the case of sampling from a depth texture.  In
that case, we need to sample the texture as if it were a single-channel
color texture.  For other/color formats, we can use the format as-is.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:15:54 -07:00
Nicolai Hähnle
27ce75ed12 radeon: count cs dwords separately for query begin and end
This will be important for perfcounter queries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
ffd01b7781 radeon: expose r600_query_hw functions for reuse
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
50f0f938e3 radeon: implement r600_query_hw_get_result via function pointers
We will need the clear_result override for the batch query implementation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
c207c55fc0 radeon: split hw query buffer handling from cs emit
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
1d10b3d01e radeon: convert hardware queries to the new style
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
019106760d radeon: convert software queries to the new style
Software queries are all queries that do not require suspend/resume
and explicit handling of result buffers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
829a9808a9 radeon: add query handler function pointers
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
50cab4788d radeon: move R600_QUERY_* constants into a new query header file
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
c56e83e518 radeon: cleanup driver query list
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
e117e74baf radeon: move get_driver_query_info to r600_query.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:11 +01:00
Neil Roberts
5dfb4dbc05 i965: Prevent fast clears for MSRTs on SKL
There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
prevents fast clears for MSRTs on SKL altogether so that when fast
clears are eventually enabled it will only be for single-sampled
surfaces. The assumption is that clears are probably more likely to be
used in single-sampled applications anyway so we can at least get them
working and we can enable MSRTs later once we understand the problem
better.

This patch should have no functional effect other than perhaps
receiving fewer perf_debug messages on SKL+.

v2: Improve the commit message to avoid saying the patch disables fast
    clears because it will be merged before fast clears are enabled
    for any surfaces so it doesn't actually disable anything.
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-18 10:29:07 +01:00
Eric Anholt
dd05ffebfc vc4: Don't bother lowering uniforms when the same value is used twice.
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered.  The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up register
allocation failng due to this on 5 of the tests (More tests still fail in
RA, which look like we'll need to reduce lowered uniform lifetimes to
fix).

No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).
2015-11-17 17:45:23 -08:00
Eric Anholt
dffe7260cd vc4: Fix uniform reordering to support reading the same uniform twice.
This does actually happen in the wild (particularly fabs of a uniform), so
we'd like to support it.
2015-11-17 17:45:23 -08:00
Eric Anholt
d18d1ba587 vc4: Fix documentation on vc4_qir_lower_uniforms.c. 2015-11-17 17:45:23 -08:00
Eric Anholt
a4bf28178f vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 17:45:23 -08:00
Kenneth Graunke
27b1d34438 i965: Fix PIPE_CONTOL typo.
PIPE_CONTOL!!!
2015-11-17 16:33:48 -08:00
Ben Widawsky
c531d40927 i965: Add assertion for src_stencil payload size
This helps address a coverity warning and prevents future questions about this
code.

Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-17 14:47:33 -08:00
Kenneth Graunke
2bec154b47 i965: Implement ARB_pipeline_statistics_query tessellation counters.
We basically just need to uncomment Ben's code.

v2: Fix obvious bugs caught by Ben.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-17 14:23:26 -08:00
Timothy Arceri
d4fbf11b58 glsl: rename location layout helper
Change name from validate -> apply to more accurately describe what
the function does.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-18 07:30:23 +11:00
Timothy Arceri
03bbddd139 glsl: don't validate binding when its not needed
Checking that the flag has been set is all the validation thats
needed here.

Also not calling the binding validation function will make things
much simpler when adding compile time constant support as we
won't need to resolve the binding value.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:30:19 +11:00
Timothy Arceri
4f4ca6b90a glsl: remove temp variable to make code easier to read
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:30:07 +11:00
Timothy Arceri
a01b8c7e77 glsl: cleanup and fix validate matrix function for arrays
Previously if the member was an array of matrices then a
warning message would be incorrectly given.

Also the struct case could never be met so it has been removed.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:58 +11:00
Timothy Arceri
f8b5cc827e glsl: use better location in struct and block error messages
Previously we only gave the location for some members and never
gave the variable location. In those cases we were just giving
the location of the struct/block.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:53 +11:00
Timothy Arceri
c54865db78 glsl: only do type and qualifier validation once per declaration
For struct and block members previously we were doing it for
every variable declaration.

So for example

struct S {
  atomic_uint x, y, z;
};

Would previously generate three error messages when one is sufficient.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:47 +11:00
Timothy Arceri
14d343b024 glsl: rename function that processes struct and iface members
As of the previous commit this function handles only struct/iface
members.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:37 +11:00
Timothy Arceri
8cf795dc7c glsl: move block validation outside function that validates members
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:32 +11:00
Timothy Arceri
649803742d glsl: move ast layout qualifier handling code into its own function
We now also only apply these rules to variables rather than also
trying to apply them to function params.

V2: move code for handling stream layout qualifier

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:25 +11:00
Kenneth Graunke
5b596f3878 i965: Add INTEL_DEBUG=shader_time support for tessellation shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:34:04 -08:00
Kenneth Graunke
df87cb837f i965: Add INTEL_DEBUG=tcs,tes and hs,ds flags for tessellation shaders.
Even though both tessellation shader stages must be used together, I
still think it makes sense to add separate debug flags for each stage.
It makes it possible to read the TCS/HS, rule out problems, then read
the TES/DS separately, without sifting through as much printed text.

I decided to add both the GL names (tcs/tes) and hardware names (hs/ds)
so they can be used interchangeably.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:33:54 -08:00
Kenneth Graunke
e9b0fa496c i965: Add more MAX_*_URB_ENTRY_SIZE_BYTES #defines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-11-17 10:18:08 -08:00
Kenneth Graunke
874a1ed813 i965: Add missing stdio.h include to brw_compiler.h.
This is needed for the FILE * type in brw_print_vue_map().

Apparently, all files that include brw_compiler.h already pick this up
via some include chain, so this isn't actually a build fix.  However,
I have patches which introduce new consumers of brw_compiler.h that
fail to build because of the missing #include.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-17 10:18:08 -08:00
Martin Peres
4518eea065 egl: make it clear which platform x11 backend is being used (dri2 or 3)
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
fcdc798515 egl/x11_dri3: Implement EGL_KHR_image_pixmap
v2: from Martin Peres
 - Replace a tab with spaces

v3: from Martin Peres
 - disable EGL_KHR_image_pixmap when is_different_gpu is set (Axel Davy)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
bd6131a8d1 loader/dri3: Expose function to create __DRIimage from pixmap
Used to support EGL_KHR_image_pixmap.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
f35198bade egl/x11: Implement dri3 support with loader's dri3 helper
v2: From Martin Peres
 - Tell we are compiling the dri3 backend in configure.ac
 - Update the Makefile.am
 - get rid of the LIBDRM_HAS_RENDERNODE_SUPPORT macro
 - fix some warnings related to EGLuint64KHR to int64_t conversions
 - use dri2_get_dri_config to get the __DRIconfig instead of open-coding it
 - replace the occasional tabs with spaces

v3: From Martin Peres
 - fix and indent problem (Matt Turner)
 - drop the authenticate function, use NULL in the vtable instead (Emil)
 - drop some useless includes (Emil Velikov)
 - mandate libdrm (Emil Velikov)
 - link to xcb-dri3 (Kristian Høgsberg)
 - convert to the new loader interface for drwable (Kristian)
 - remove some dead code after the dropping of some vfuncs (Kristian)
 - add a comment on the topic of rendering to the frontbuffer

v4: From Martin Peres
 - do not expose the preserved swap behavior (Acked by Eric Anholt)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
a25df54571 egl_dri2: Add a function to let platform code return dri drawable from _EGLSurface
dri3 for EGL will use different struct other than dri2_egl_surface for
an EGL surface, the common code only uses __DRIdrawable from that
struct, so instead of converting _EGLSurface to dri2_egl_surface, let
the platform code return the __DRIdrawable by its own (although the
current platforms use the same function).

v2: From Martin Peres
 - convert to the new drawable interface (Kristian)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
fdacbc439e glx/dri3: Convert to use dri3 helper in loader library
v2: From Martin Peres
 - convert to the new drawable interface
 - delete dead code after the dropping of some vfuncs
 - delete the width and height attributes since they are found in the helper

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
6bd9ba7d07 loader: Add dri3 helper
v2: From Martin Peres
 - Try to fit in the 80-col limit as much as possible

v3: From Martin Peres
 - introduce loader_dri3_helper.la to avoid dragging the xcb dep everywhere (Kristian & Emil)
 - get rid of the width, height, dri_screen and is_different_gpu vfuncs (Kristian)
 - replace the create/destroy functions with init/fini for dri3 drawables
 - prefix static functions with dri3_ and exported ones with loader_dri3 (Emil)
 - keep the function definition consistent (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Eduardo Lima Mitev
252b143e9e i965: Return the correct value type from brw_compile_gs()
brw_compile_gs() should return a pointer to unsigned, but it is returning the
bool 'false' at some point, hence annoying us with a compiler warning:

In function 'const unsigned int* brw::brw_compile_gs(const brw_compiler*,
   void*, void*, const brw_gs_prog_key*, brw_gs_prog_data*, const nir_shader*,
   gl_shader_program*, int, unsigned int*, char**)':

brw_vec4_gs_visitor.cpp:776:14: warning: converting 'false' to pointer type
                                'const unsigned int*' [-Wconversion-null]
                                return false;
                                       ^
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-17 12:50:09 +01:00
Samuel Iglesias Gonsálvez
dfa60e7057 glsl: copy each field's precision information in glsl_types's structure constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
688b58c40c glsl: copy each field's precision information from the old gl_PerVertex interface block
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
cfe32cfa8e glsl: copy each field's precision information when generating varying variables
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
91eefe8505 glsl: initialize data.precision value in ir_variable constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
58954e4daa glsl/nir: initialize precision field in glsl_struct_field constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
a96afaced8 nir: reduce memory footprint of glsl_struct_field's precision
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:41 +01:00
Tapani Pälli
f4f30ad730 mesa: do runtime validation of precision varyings only on ES
Precision qualifier should be ignored on desktop OpenGL.

v2: include spec quote (Samuel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-17 09:23:54 +02:00
Tapani Pälli
023fd58fd6 glsl: initialize precision when adding per vertex record fields
Fixes issues with tessellation builtin variables since precision was
introduced to IR with commit f84bc57d7d.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-17 07:37:13 +02:00
Kenneth Graunke
292df19401 i965: Set MaxCombinedUniformBlocks properly.
Up until now, we've been letting core Mesa initialize it to 36 for us
(which is presumably BRW_MAX_UBO (12) * (VS+GS+FS stages -> 3)).

With compute and tessellation, we need to increase this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Kenneth Graunke
5ee5dfddea i965: Clean up context constant initialization code.
This was getting pretty out of hand, and with compute partially in place
and tessellation on the way, it was only going to get worse.

This patch makes a "stage exists?" predicate and a "number of stages"
count and uses them to clean up a lot of calculations.  We can just
loop over shader stages and set things for the ones that exist.  For
combined counts, we can just multiply by the number of stages.

It also tries to organize a little bit.

We should probably use _mesa_has_geometry_shaders/tessellation/compute
here, but we can't because ctx->Version isn't initialized yet.  Perhaps
that could be fixed in the future.

No change in "glxinfo -l" on Broadwell.

v2: Drop stray compute shader hunk.  Mark stage_exists as const.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Kenneth Graunke
44d6c0c805 i965: Convert scalar_* flags to a scalar_stage array.
I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array.  Simpler.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Roland Scheidegger
a2611ffe4b r200: fix bgrx8/xrgb8 blits
Since 779cabfc7d the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there).
This is untested but essentially addressing the same bug as for radeon.
(I don't think that the second entry per le/be table is actually necessary,
but shouldn't hurt...)

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 01:04:09 +01:00
Roland Scheidegger
983614dbed radeon: fix bgrx8/xrgb8 blits
Since d21320f625 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there). This caused lots of piglit regressions (and probably lots of
trouble outside piglit too).
This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 01:01:38 +01:00
Ian Romanick
c40a88b6c5 meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required
Previously GL_FRAMEBUFFER was used.  However, if GL_EXT_framebuffer_blit
is supported (note: it is supported by every Mesa driver), this is
*sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes*
an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER
(setters).  As a result, the code saved one binding but modified both.
If the bindings were different, the GL_READ_FRAMEBUFFER would be
incorrect on exit.

Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.

Ideally this function would use DSA functions and not modify the binding
at all.  However, that would be a much more intrusive change because
_mesa_meta_bind_fbo_image would also need to be modified.
_mesa_meta_bind_fbo_image has a lot of callers.  Much of this code is
about to get a major rework due to bug #92363, so I don't think it
matters too much.  In fact, I discovered this bug while working on the
other bug.  Le bon temps!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-16 10:30:10 -08:00
Matt Turner
d564b5b58e nir/glsl: Fix copy-n-paste mistakes from commit 213f864.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-16 09:05:53 -08:00
Alex Deucher
00f554abba radeonsi: enable optimal raster config setting for fiji (v2)
Requires proper kernel tiling configuration so check the tiling
config registers.

v2: send the right version of the patch

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-16 10:09:47 -05:00
Alex Deucher
5b37d8b50c radeonsi: use proper GRBM_GFX_INDEX offset for CI+
The offset is different on CI and newer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-16 10:09:34 -05:00
Neil Roberts
2ca018cb65 docs: Add 16x MSAA on i965 to the release notes
Signed-off-by: Neil Roberts <neil@linux.intel.com>
2015-11-16 14:36:27 +01:00
Emil Velikov
1780a562bc nv50: add missing header into the sources list
Otherwise it won't end up in the tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-16 10:49:14 +00:00
Juan A. Suarez Romero
40c2acef5c nir/glsl_to_nir: use _mesa_fls() to compute num_textures
Replace the current loop by a direct call to _mesa_fls() function.

It also fixes an implicit bug in the current code where num_textures
seems to be one value less than it should be when sh->Program->SamplersUsed > 0.

For instance, num_textures is 0 instead of 1 when
sh->Program->SamplersUsed is 1.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-16 09:24:28 +01:00
Iago Toral Quiroga
3f34afa0aa nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.

v2: remove the check for source modifiers from is_move() (Jason)

v3: Put the check for source modifiers back into is_move() since
    this function is called from copy_prop_alu_src(). Add source
    modifiers checks to is_vec() instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-16 08:11:13 +01:00
Ilia Mirkin
ff17b3ccf4 nv50,nvc0: disable render condition around clear_* functions
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 20:15:22 -05:00
Kenneth Graunke
d2f089ba17 i965: Introduce a MOV_INDIRECT opcode.
The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex).  The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read.  This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
    a bit more generic.  Use regs_read() instead of hacking up the
    register allocator.  (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
    Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
2015-11-14 16:41:37 -08:00
Samuel Pitoiset
848fa3101d nv50: add support for performance metrics on G84+
Currently only one metric is exposed but more will be added later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:46 +01:00
Samuel Pitoiset
6a9c151dbb nv50: add compute-related MP perf counters on G84+
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.

As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.

Only G84+ is supported because G80 is an old and weird card.

Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:42 +01:00
Samuel Pitoiset
ff72440b40 nv50: implement a basic compute support
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.

This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute

I did some improvements on the original code, like fixing using both 3D
and COMPUTE simultaneously, improving global buffers binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.

Speaking about this, it seems like compute programs overlap fragment
programs when they are used both. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.

Note that, textures, samplers and surfaces still need to be implemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:15 +01:00
Samuel Pitoiset
7167a058ba nv50: free interpolation parameters in nv50_program_destroy()
As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:16:12 +01:00
Samuel Pitoiset
69271bba06 nvc0: reduce the number of GPR used when reading MP perf counters
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-14 17:38:57 +01:00
Ilia Mirkin
f94e1d9738 nouveau: don't expose HEVC decoding support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-14 10:32:10 -05:00
Vinson Lee
3a0fef0005 nir: Silence GCC maybe-uninitialized warnings.
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_after = after;
               ^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_before = before;
                ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-13 16:19:11 -08:00
Kenneth Graunke
5480bbd90e i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.
We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index.  We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-11-13 16:11:02 -08:00
Kenneth Graunke
511de1a80c glsl: Allow implicit int -> uint conversions for the % operator.
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them.  (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)

This allows expressions such as:

   int foo;
   uint bar;
   uint mod = foo % bar;

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-13 16:09:58 -08:00
Kenneth Graunke
a4ba476c30 i965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems.  I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.

It's not that much spam, so I figure we may as well just print it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-13 16:08:51 -08:00
Kenneth Graunke
f88c175a29 i965: Make convert_attr_sources_to_hw_regs handle stride == 0.
This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.

Nobody uses this today, but I plan to.

v2: Rebase on Matt's changes; simplify.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2015-11-13 15:17:58 -08:00
Kenneth Graunke
26f9469a46 nir: Add helpers for getting input/output intrinsic sources.
With the many variants of IO intrinsics, particular sources are often in
different locations.  It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:46 -08:00
Kenneth Graunke
d12bde0944 nir: Don't lower TCS outputs to temporaries.
We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs.  For now, disable it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:46 -08:00
Kenneth Graunke
134728fdae nir: Allow outputs reads and add the relevant intrinsics.
Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program.  However, this is infeasible for tessellation
control shader outputs.

Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs.  These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.

Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.

Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.

v2: Rebase to accomodate Jason's review feedback.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:41 -08:00
Kenneth Graunke
c51d7d5fe3 nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.
Similar to nir_load_per_vertex_input, but for outputs.  This is not
useful in geometry shaders, but will be useful in tessellation shaders.

v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
    taking a nir_variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:10 -08:00
Kenneth Graunke
0df452cd0d nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.
Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs.  There aren't per-patch TCS inputs.

Tessellation evaluation shaders have both per-vertex and per-patch
inputs.  Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.

v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
    which takes a variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:10 -08:00
Ian Romanick
1cb49eedb5 i965: Silence unused parameter warnings in get_buffer_rect
brw_meta_fast_clear.c: In function 'get_buffer_rect':
brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter]
 get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb,
                                     ^
brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter]
                 struct intel_renderbuffer *irb, struct rect *rect)
                                            ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-13 12:29:57 -08:00
Ian Romanick
758f12fd98 meta/generate_mipmap: Don't leak the sampler object
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-13 12:29:56 -08:00
Matt Turner
7a879e422b i965: Remove unneeded #includes.
Some of these are no longer needed since all the backends switched to
NIR.
2015-11-13 12:16:48 -08:00
Matt Turner
386759b02d i965: Silence warning.
intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
       ann->error = ralloc_strdup(annotation->mem_ctx, error);
                         ^

I initially tried changing the type of ann_count to unsigned (is
currently int), since that in addition to the check that it's non-zero
at the beginning of the function seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.
2015-11-13 12:13:14 -08:00
Juha-Pekka Heikkila
8b145d6a3d i965: Don't write beyond allocated memory.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-13 12:06:11 -08:00
Matt Turner
0eb3db117b i965: Use BRW_MRF_COMPR4 macro in more places.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
49b3215d70 i965: Combine register file field.
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b3315a6f56 i965: Replace HW_REG with ARF/FIXED_GRF.
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
4b0fbebf02 i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).
The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1.  This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).

The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).

In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b99e1fd547 i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b163aa0148 i965: Rename GRF to VGRF.
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
5a23b31c75 i965: Move BAD_FILE from the beginning of enum register_file.
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
dba309fc14 i965: Initialize registers.
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero.  The next commit will move
BAD_FILE in the enum so that it's no longer zero.

In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting is to zero.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
7638e75cf9 i965: Use brw_reg's nr field to store register number.
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
3048053908 i965: Unwrap some lines.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
58fa9d47b5 i965/vec4: Remove swizzle/writemask fields from src/dst_reg.
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
94b1031703 i965: Remove fixed_hw_reg field from backend_reg.
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
1392e45bfb i965: Use immediate storage in inherited brw_reg.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
d74dd703f8 i965: Add and use enum brw_reg_file.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
977df90d65 i965: Reorganize brw_reg fields.
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
e42fb0c2a6 i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.
Generated by

   sed -i -e 's/\.bits\././g' *.c *.h *.cpp
   sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
182f137521 i965: Delete type field from backend_reg.
Switching from an implicitly-sized type field to field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
433df2e03c i965: Delete abs/negate fields from backend_reg.
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
c7ed5d1d1c i965: Make backend_reg inherit from brw_reg.
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
88f349c4e1 i965/fs: Replace nested ternary with if ladder.
Since the types of the expression were

   bool ? src_reg : (bool ? brw_reg : brw_reg)

the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,

   bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)

In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,

   src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)

Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-13 11:27:50 -08:00
Marek Olšák
3694d58e6c radeonsi: remove dead code after ES-GS linkage change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
d79a3449a7 radeonsi: link ES-GS just like LS-HS
This reduces the shader key for ES.

Use a fixed attrib location based on (semantic name,  index).

The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
2f5d911ba2 radeonsi: rename si_update_gs_rings
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
4acd856088 radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
a0cf589961 radeonsi: move maximum gs stream calculation into create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3ab0c49f04 radeonsi: clean up small duplication in si_shader_gs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
eb0d3e8a90 gallium/radeon: shorten render_cond variable names
and ..._cond -> ..._invert

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
70c40cc989 gallium/radeon: remove predicate_drawing flag
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
12596cfd4c gallium/radeon: atomize render condition (SET_PREDICATION)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3521907622 gallium/radeon: simplify restoring render condition after flush
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
600e212d87 gallium/radeon: don't use PREDICATION_OP_CLEAR
Not setting the predication bit is sufficient.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6eff5415e4 gallium/radeon: simplify disabling render condition for u_blitter
just disable it by not setting the predication bit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8dd1ee6ff3 r600g: don't set predication on non-draw packets
This has no effect.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6cc8f6c6a7 gallium/radeon: inline the r600_rings structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8569f9a87e gallium/radeon: remove the IB flushing flag
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
81d412e02c gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_space
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
10130ccd8c radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as well
I missed this in commit c3e527f93d
    radeonsi: only enable write confirmation on the last CP DMA packet

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
40912dd91e radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney
otherwise the SX or CB blocks can go bananas

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-13 19:54:41 +01:00
Marek Olšák
f7757100f2 radeonsi: add glClearBufferSubData acceleration
8-bit and 16-bit clears which are not aligned to dwords are done in software.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19773f9805 radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag
Buffer clears via transform feedback won't set this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19a9c1ecc7 gallium/u_blitter: add support for multi-dword clear values in clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
e15c5c7a06 radeonsi: fix a future crash in emit_cb_target_mask
This can't crash currently, but it would crash if clear_buffer
from u_blitter were used with a clean context.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
65d0c558d5 radeonsi: fix unaligned clear_buffer fallback
This is unreachable currently, but it will be used by unaligned 8-bit and
16-bit fills.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:40 +01:00
Marek Olšák
7f1e34e6c8 r600g: fix clear_buffer fallback with offset != 0
Discovered by luck. This code path hasn't been exercised since transform
feedback was implemented.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:40 +01:00
Marek Olšák
01526136ba gallium/radeon: fix PIPE_QUERY_GPU_FINISHED
Broken by the addition of r600_multi_fence
in 3b37155a68

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89014

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:40 +01:00
Brian Paul
40663864d2 mesa: minor comment fix in blend.c 2015-11-13 08:02:19 -07:00
Brian Paul
5a5efbf804 docs: add link to Coverity on developer utilities page
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-13 08:02:19 -07:00
Brian Paul
00046393f8 docs: update VMware driver instructions
Use a LIBDIR variable, set per-platform.
Update the Mesa configuration flags.
Run update-initramfs or dracut, update /etc/modules

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-13 08:02:19 -07:00
Daniel Stone
d1314de293 egl/wayland: Ignore rects from SwapBuffersWithDamage
eglSwapBuffersWithDamage accepts damage-region rectangles to hint the
compositor that it only needs to redraw certain areas, which was passed
through the wl_surface_damage request, as designed.

Wayland also offers a buffer transformation interface, e.g. to allow
users to render pre-rotated buffers. Unfortunately, there is no way to
query buffer transforms, and the damage region was provided in surface,
rather than buffer, co-ordinate space.

Users could in theory account for this themselves, but EGL also requires
co-ordinates to be passed in GL/mathematical co-ordinate space, with an
inversion to Wayland's natural/scanout co-ordinate space, so
transformations other than a 180-degree rotation will fail as EGL
attempts to subtract the region from (its view of the) surface height.

Pending creation and acceptance of a wl_surface.buffer_damage request,
which will accept co-ordinates in buffer co-ordinate space, pessimise to
always sending full-surface damage.

bce64c6c provides the explanation for why we send maximum-range damage,
rather than the full size of the surface: in the presence of buffer
transformations, full-surface damage may not actually cover the entire
surface.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 10:09:23 +00:00
Iago Toral Quiroga
a29d922c1a Revert "nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers"
The change proposed in the review leads to piglit regressions because
is_move() is used in other places and relies on the checks for source
modifiers to be there.

Revert this until we agree on a better solution.
2015-11-13 08:53:10 +01:00
Samuel Iglesias Gonsálvez
5f004fd197 glsl: fix 'shared' layout qualifier related regressions
Commit 8b28b35 added 'shared' as a keyword for compute shaders
but it broke the existing 'shared' layout qualifier support for
uniform and shader storage blocks.

This patch fixes 578 dEQP-GLES31.functional.ssbo.* tests.

v2:
- Move SHARED to interface_block_layout_qualifier (Timothy)
- Don't remove "shared" case insensitive check (Timothy)
- Remove the clearing of shared_storage flag (Timothy)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-13 08:04:49 +01:00
Iago Toral Quiroga
8610cd6b8c nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.

v2: remove the check for source source modifiers from is_move() (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 07:54:33 +01:00
Jason Ekstrand
5f43e074d4 nir/vars_to_ssa: Delete dead output set code
This was a remnant of an early attempt to handle output reads in
vars_to_ssa.  That attempt was abandon a long time ago but these few lines
were aparently left in the pass and managed to evade review.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Jason Ekstrand
226ba889a0 nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets.  This commit changes this to only remove it from the other
node's copy set (not the one we're lowering).  At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies.  This has two advantages:

 1) It's more efficient because we're doing potentially half as many set
    search operations.

 2) It now properly handles copies from a node to itself.  Perviously, it
    would delete the copy from the set when processing the destinatioon and
    then assert-fail when we couldn't find it for the source.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Jason Ekstrand
4bbf2ac06e nir/validate: Allow subroutine types for the tails of derefs
The shader-subroutine code creates uniforms of type SUBROUTINE for
subroutines that are then read as integers in the backends.  If we ever
want to do any optimizations on these, we'll need to come up with a better
plan where they are actual scalars or something, but this works for now.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92859
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Nanley Chery
79f68306d2 mesa: Replace gl_extensions::EXT_texture3D with ::dummy_true
Mesa unconditionally sets this driver flag to true in
_mesa_init_extensions(). There is therefore no need for
the driver to communicate support for this extension.
Replace the driver capability flag with ::dummy_true.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 21:31:05 -08:00
Brian Paul
2de2e1702b mesa: fix MSVC build break in extensions.h
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-12 16:57:18 -07:00
Ilia Mirkin
39f51ec96f nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATION
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-12 17:58:42 -05:00
Ilia Mirkin
e3d9dbe304 gallium: add support for gl_HelperInvocation semantic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-11-12 17:58:23 -05:00
Ilia Mirkin
20748318c5 glsl: add gl_HelperInvocation system value
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-12 17:58:23 -05:00
Jordan Justen
b52cb9ec6a glsl: Correctly handle vector extract on function parameter
This commit accidentally used a '==' when '=' was intended.

commit 96b22fb080
Author: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Date:   Wed Nov 4 14:58:54 2015 -0800

    glsl: Use array deref for access to vector components

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-12 14:11:16 -08:00
Nanley Chery
a16ffb743c mesa: In helpers, only check driver capability for meta
Make API context and version checks done by the helper functions pass
unconditionally while meta is in progress. This transparently makes
extension checks solely dependent on struct gl_extensions while in meta.

v2: Use an 8-bit data type instead of a GLuint

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
5645770742 mesa/extensions: Prefix global struct and extension type
Rename the following types and variables:
* struct extension -> struct mesa_extension,
  like the mesa_format type.
* extension_table -> _mesa_extension_table,
  like the _mesa_extension_override_{enables,disables} structs.

Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
ab129a44ae mesa: Generate a helper function for each extension
Generate functions which determine if an extension is supported in the
current context. Initially, enums were going to be explicitly used with
_mesa_extension_supported(). The idea to embed the function and enums
into generated helper functions was suggested by Kristian Høgsberg.

For performance, the function body no longer uses
_mesa_extension_supported() and, as suggested by Chad Versace, the
functions are also declared static inline.

v2: Place function qualifiers on separate line (Chad)
v3: Move function curly brace to new line (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
eda15abd84 mesa/extensions: Replace extension::api_set with ::version
The api_set field has no users outside of _mesa_extension_supported().
Remove it and allow the version field to take its place.

The brunt of the transformation was performed with the following vim commands:
s/\(GL [^,]\+\),\s*\d*,\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1, GLL, GLC\2\3/g
s/\(GLL [^,]\+\)\,\s*\d*/\1, GLL/g
s/\(GLC [^,]\+\)\(,\s*\d*\),\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1\2, GLC\3\4/g
s/\( ES1[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4, ES1/g
s/\( ES2[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4\6, ES2/g

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
a82bc779af mesa/extensions: Use _mesa_extension_supported()
Replace open-coded checks for extension support with
_mesa_extension_supported().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
f6a818e76d mesa/extensions: Create _mesa_extension_supported()
Create a function which determines if an extension is supported in the
current context.

v2: Use common variable names (Emil)
    Insert new line between variables and return statement (Chad)
    Rename api_set variable to api_bit (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
f47df8f729 mesa/extensions: Add extension::version
Enable limiting advertised extension support by context version with
finer granularity. This new field is currently unused and is set to
0 everywhere. When it is used, a value of 0 will indicate that the
extension is supported for any version of a context.

v2: Use uint*t type for version and note the expected values (Emil)
    Use an 8-bit data type
    Reformat macro for better readability (Chad)

v3: Note preparatory nature of commit (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
8bd82a91c0 mesa/extensions: Move entries entries to separate file
With this infrastructure set in place, we can now reuse the entries to
generate useful code.

v2: Add the new file into Makefile.sources (Emil)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
c0b568f3db mesa/extensions: Wrap array entries in macros
Now that we're using macros, remove the redundant text from each entry.

Remove comments between the entries to make editing easier and separate
the sections with blank lines. Structure the EXT macros in a way that
helps reviewers verify that no meaning has been altered.

v2: Indent the entries (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
e5af09f9ba mesa/extensions: Remove array sentinel
Simplify future updates to the extension struct array by removing
the sentinel.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Matt Turner
74e48e9544 i965: Check instructions appear only on supported hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:06:08 -08:00
Matt Turner
0b45d47f71 i965: Add initial assembly validation pass.
Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:06:04 -08:00
Matt Turner
34ed45557e i965: Add annotation_insert_error() and support for printing errors.
Will allow annotations to contain error messages (indicating an
instruction violates a rule for instance) that are printed after the
disassembly of the block.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
a280e83d71 i965: Combine assembly annotations if possible.
Often annotations are identical between sets of consecutive
instructions. We can perhaps avoid some memory allocations by reusing
the previous annotation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
93e371c140 i965: Set annotation_info's mem_ctx.
It was being memset to 0 previously.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-12 11:00:10 -08:00
Matt Turner
9ab45b4df9 i965: Don't consider control flow instructions to have sources.
And why did IFF have a destination?

I suspect that once upon a time the disassembler used this information
to know which fields to find the jump targets in. The jump targets have
moved, so the disassembler has to know how to handle these
per-generation anyway.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
0865e743c1 i965: Fill out instruction list.
Add some instructions: illegal, movi, sends, sendsc.

Remove some instructions with reused opcodes: msave, mrestore, push,
pop, goto. I did have some gross code for disassembling opcodes
per-generation, but there's very little meaningful overlap so it's
probably not needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
238877207e ralloc: Set *start in ralloc_vasprintf_rewrite_tail() if str is NULL.
We were leaving it undefined, even though we were writing a string to
*str.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
903050694b i965: Consolidate is_3src() functions.
Otherwise I'll have to add another later in this series.
2015-11-12 11:00:10 -08:00
Brian Paul
3e74038280 st/wgl: add a comment about recursive locking in stw_make_current()
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
f45b644e11 st/wgl: add a lock assertion in stw_framebuffer_from_hwnd_locked()
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
José Fonseca
a1c9feafd5 st/wgl: add some mutex checking code
This would have caught the locking bug that was fixed in the earlier
"st/wgl: fix locking issue in stw_st_framebuffer_present_locked()"
patch.

v2: minor coding style changes by Brian.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
166769fe4b st/wgl: rename stw_framebuffer_release() to stw_framebuffer_unlock()
To match the new stw_framebuffer_lock() function.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
dabc423ed0 st/wgl: reimplement stw_framebuffer::mutex with CRITICAL_SECTION
v2: update comments on the stw_framebuffer::mutex field regarding locking
order.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
f71508ae79 st/wgl: include u_debug.h
To get declaration for debug_printf() directly instead of getting it
indirectly through os_thread.h

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
fce68832c5 st/wgl: reimplement stw_device::fb_mutex with CRITICAL_SECTION
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
fa30de7643 st/wgl: re-implement stw_device::ctx_mutex with CRITICAL_SECTION
This is Windows-only code so we can use the native Win32 functions for
critical sections.  This will also allow us to (cleanly) add some mutex
check/debug code in subsequent patches.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:24 -07:00
Brian Paul
a02385cd69 gallium/hud: add cpu graph support for Windows
We support "cpu" but not "cpu#" because there's no good way of querying
per-cpu usage.  Also, the cpu usage is for the process, not the whole
system.

Original code cobbled together by Brian and then fixed/polished by Jose.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-12 09:11:15 -07:00
Tapani Pälli
f2fe607261 glsl: set matrix_stride for non matrices with atomic counter buffers
Patch sets matrix_stride as 0 for non matrix uniforms that are in a
atomic counter buffer. Matrix stride calculation for actual matrix
uniforms is done during link_assign_uniform_locations.

From ARB_program_interface_query specification:

GL_MATRIX_STRIDE:

   "For active variables not declared as a matrix or array of matrices,
   zero is written to <params>.  For active variables not backed by a
   buffer object, -1 is written to <params>, regardless of the variable
   type."

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-12 14:15:29 +02:00
Tapani Pälli
7e6dac1186 mesa: validate precision of varyings during ValidateProgramPipeline
Fixes following failing ES3.1 CTS tests:

   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingFloat
   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingInt
   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingUInt

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-12 09:50:14 +02:00
Tapani Pälli
5bd122cad9 glsl: do not lose precision information when packing varyings
This information will be used by cross stage validation of varyings
for pipeline objects.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-12 09:50:14 +02:00
Iago Toral Quiroga
f84bc57d7d glsl: Add precision information to ir_variable
We will need this later on when we implement proper support for
precision qualifiers in the drivers and also to do link time checks for
uniforms as indicated by the spec.

This patch also adds compile-time checks for variables without precision
information (currently, Mesa only checks that a default precision is set
for floats in fragment shaders).

As indicated by Ian, the addition of the precision information to
ir_variable has been done using a bitfield and pahole to identify an
available hole so that memory requirements for ir_variable stay the
same.

v2 (Ian):
  - Avoid if-ladders by defining arrays of supported sampler names and
    indexing
    into them with type->sampler_array + 2 * type->sampler_shadow
  - Make the code that selects the precision qualifier to use an utility
    function
  - Fix a typo

v3 (Tapani):
  - rebased
  - squashed in "Precision qualifiers are not allowed on structs"
  - fixed select_gles_precision for sampler arrays
  - fixed precision_qualifier_allowed for arrays of structs

v4 (Tapani):
  - add atomic_uint handling
  - do not allow precision qualifier on images
  (issues reported by Marta)

v5 (Tapani):
  - support precision qualifier on image types

v6 (Tapani):
  - set precision qualifier on interface block members

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
9a00e1a69d glsl: Move the definition of precision_qualifier_allowed
We will need this to build later patches

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
e6629d814f glsl: Add user-defined default precision qualifiers to the symbol table
Notice that the spec requires that a default precision has been set for every
type used by a shader that can use a precision qualifier and does not have a
predefined precision, however, at the moment, Mesa only checks this for floats
in the fragment shader. This is probably because the GLSL ES 1.0 specs mentions
this case specifically, but GLSL ES 3.0 clarifies that the same applies to
other types:

"The fragment language has no default precision qualifier for floating point
 types. Hence for float, floating point vector and matrix variable
 declarations, either the declaration must include a precision qualifier or
 the default float precision must have been previously declared. Similarly,
 there is no default precision qualifier for the following sampler types in
 either the vertex or fragment language:

 sampler3D;
 samplerCubeShadow;
 sampler2DShadow;
 sampler2DArray;
 sampler2DArrayShadow;
 isampler2D;
 isampler3D;
 isamplerCube;
 isampler2DArray;
 usampler2D;
 usampler3D;
 usamplerCube;
 usampler2DArray;"

we will fix this in a later patch.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
e3082fb273 glsl: Add default precision qualifiers to the symbol table
The GLSL ES spec specifies default precision qualifiers for certain types,
so populate the symbol table with these.

Notice that the desktop GLSL spec also indicates defaults for some types
but this is not really useful since precision qualifiers are completely
ignored in desktop GLSL.

v2: simplify and add samplerExternalOES, specified by
    OES_EGL_image_external (Tapani)

v3: add atomic_uint (reported missing by Marta)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
d6a6167354 glsl: Add API to put default precision qualifiers in the symbol table
These have scoping rules that match the ones defined for other things such
as variables, so we want them in the symbol table.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Samuel Iglesias Gonsálvez
d4fdb84f80 i965/fs/nir: fix the number of register written by FS_OPCODE_GET_BUFFER_SIZE
FS_OPCODE_GET_BUFFER_SIZE is calculated with a resinfo's sampler message.

This patch adjusts the number of registers written by the opcode
following what the PRM spec says about the number of registers written
by the SIMD8 and SIMD16's writeback messages for sampler messages.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-12 08:39:14 +01:00
Ben Widawsky
55314c5be4 i965/skl/gt4: Fix URB programming restriction.
The comment in the code details the restriction. Thanks to Ken for having a very
helpful conversation with me, and spotting the blurb in the link I sent him :P.

There are still stability problems for me on GT4, but this definitely helps with
some of the failures.

v2: Comment fixes

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-11 18:13:19 -08:00
Ilia Mirkin
c4182bb9b0 nv50,nvc0: add ARB_clear_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-11 19:20:41 -05:00
Ilia Mirkin
ae39b0fda8 st/mesa: implement ARB_clear_texture
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 19:20:41 -05:00
Ilia Mirkin
3695b253f9 gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 19:20:41 -05:00
Timothy Arceri
725fcdfbb1 glsl: add helper to check for enhanced layouts support
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-12 10:18:14 +11:00
Timothy Arceri
82e4f22d1e mesa: add ARB_enhanced_layouts
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-12 10:18:08 +11:00
Dave Airlie
df8af7d751 r600: initialised PGM_RESOURCES_2 for ES/GS
This fixes the corruption on rendering that we are seeing in
certain geometry shaders.

Fixes:  https://bugs.freedesktop.org/show_bug.cgi?id=91780
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested / Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-12 09:03:13 +10:00
Kenneth Graunke
918bda23dd i965: Split nir_emit_intrinsic by stage with a general fallback.
Many intrinsics only apply to a particular stage (such as discard).
In other cases, we may want to interpret them differently based on
the stage (such as load_primitive_id or load_input).

The current method isn't that pretty - we handle all intrinsics in
one giant function.  Sometimes we assert on stage, sometimes we forget.
Different behaviors are handled via if-ladders based on stage.

This commit introduces new nir_emit_<stage>_intrinsic() functions,
and makes nir_emit_instr() call those.  In turn, those fall back to
the generic nir_emit_intrinsic() function for cases they don't want
to handle specially.

This makes it clear which intrinsics only exist in one stage, and makes
it easy to handle inputs/outputs differently for various stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-11 11:57:37 -08:00
Ilia Mirkin
912babba7b mesa/copyimage: allow width/height to not be multiples of block
For compressed textures, the image size is not necessarily a multiple of
the block size (e.g. the last mip levels). Section 18.3.2 (Copying
Between Images) of the OpenGL 4.5 Core Profile spec says:

    An INVALID_VALUE error is generated if the dimensions of either
    subregion exceeds the boundaries of the corresponding image
    object, or if the image format is compressed and the dimensions of
    the subregion fail to meet the alignment constraints of the
    format.

and Section 8.7 (Compressed Texture Images) says:

    An INVALID_OPERATION error is generated if any of the following
    conditions occurs:

      * width is not a multiple of four, and width + xoffset is not
        equal to the value of TEXTURE_WIDTH.
      * height is not a multiple of four, and height + yoffset is not
        equal to the value of TEXTURE_HEIGHT.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-11 14:37:55 -05:00
Jason Ekstrand
80890eb0d3 i965/brw_reg: Add a brw_VxH_indirect helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-11 10:52:30 -08:00
Brian Paul
68993f77cd mesa: remove old comments in arrayobj.c 2015-11-11 09:38:22 -07:00
Brian Paul
9870a5c6c9 st/wgl: clarify code in stw_framebuffer_from_hwnd_locked()
Just a minor code change to make it obvious that NULL is returned when
we don't find the given HWND.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-11 09:38:22 -07:00
Brian Paul
004ed6f4a9 st/wgl: improve some function comments
In particular, explain when stw_framebuffer objects are
locked/unlocked/etc.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-11 09:38:22 -07:00
Brian Paul
b93cb6c1dc st/wgl: whitespace/formatting fixes 2015-11-11 09:38:22 -07:00
Brian Paul
eb812921ac st/wgl: fix locking issue in stw_st_framebuffer_present_locked()
When stw_st_framebuffer_present_locked() is called, the
stw_framebuffer's mutex will already be locked.  Normally, the
stw_framebuffer_present_locked() function calls
stw_framebuffer_release() to unlock the mutex when it's done.  But if
for some reason the 'resource' pointer in
stw_st_framebuffer_present_locked() is null, we'd return without
unlocking the stw_framebuffer.  This fixes that to avoid potential
deadlocks.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-11 09:38:22 -07:00
Kenneth Graunke
e42a29531a i965: Print force_writemask_all in dump_instructions().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-11 08:35:15 -08:00
Kenneth Graunke
ecb5e0a986 i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits.
A while back, we moved to directly emitting the Gen7+ state when
constructing the binding tables.  These flags are only used on
Gen4-6, which emit all the binding table pointers at once.

We gain nothing by having separate flags, so combine them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-11 08:33:58 -08:00
Kenneth Graunke
a2987ff57f i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n.
Inspired by a patch by Fabian Bieler.

Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid
topology type); I instead chose to make a macro that takes an argument.
He also took the number of patch vertices from _mesa_prim (which was set
to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to
avoid the need for the VBO patch.

v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match
    the documentation (suggested by Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-11 08:33:48 -08:00
Emil Velikov
cbb7d90e57 docs: add news item and link release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-11 11:18:32 +00:00
Emil Velikov
6435d8ac5a docs: add sha256 checksums for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 66c949d0a1)
2015-11-11 11:16:43 +00:00
Emil Velikov
07948b03fb docs: add release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ee57c22141)
2015-11-11 11:16:42 +00:00
Glenn Kennard
3f45d29fe4 r600g: Pass conservative depth parameters to hw
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-11 09:06:25 +10:00
Dave Airlie
b3e793f2db Revert "r600g: Pass conservative depth parameters to hw"
This reverts commit a1fc78911e.

I pushed the wrong patch.
2015-11-11 09:05:50 +10:00
Glenn Kennard
c878d61124 r600g: Implement ARB_texture_view
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-11 08:36:08 +10:00
Glenn Kennard
a1fc78911e r600g: Pass conservative depth parameters to hw
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 08:32:35 +10:00
Eduardo Lima Mitev
de51676b41 i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
the constants will be progagated as immediate operands of fmul and fadd.

This patch detects these situations and prevents fusing fmul+fadd into ffma.

Shader-db results on i965 Haswell:

total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
instructions in affected programs:     1124094 -> 1114154 (-0.88%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                7612
HURT:                                  843
GAINED:                                4
LOST:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-10 21:13:35 +01:00
Eduardo Lima Mitev
fb3b5669ce util: Add list_is_singular() helper function
Returns whether the list has exactly one element.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-10 21:13:35 +01:00
Eduardo Lima Mitev
94ff35204d nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver
Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.

This is safe because i965 is the only consumer.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-10 21:13:35 +01:00
Kristian Høgsberg Kristensen
96b22fb080 glsl: Use array deref for access to vector components
We've assumed that we could lower per-component vector access from

  vec[i] = scalar

to

  vec = ir_triop_vector_insert(vec, scalar, i)

but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.

Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.

The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Kristian Høgsberg Kristensen
60dd5287ff glsl: Lower UBO and SSBO access in glsl linker
All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Kristian Høgsberg Kristensen
f0e95c2500 glsl: Drop exec_list argument to lower_ubo_reference
We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just a exec_list or a
shader, so this seems more consistent.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Connor Abbott
213f86416f nir/glsl: switch to using the builder
v2: use nir_bulder_cf_insert (Ken)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:56:43 -05:00
Connor Abbott
fbbfb7c025 nir/glsl: make emit() take nir_ssa_def * sources
Again, this matches what the builder will have to do.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:56:35 -05:00
Connor Abbott
a60e990dd2 nir/glsl: convert nir_visitor::result to a nir_ssa_def *
Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:55:54 -05:00
Connor Abbott
30fe8eaa8e nir/glsl: make evaluate_rvalue() return a nir_ssa_def *
A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:55:14 -05:00
Jose Fonseca
6f42162329 st/mesa: Destroy buffer object's mutex.
Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resided in a single
place.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-10 11:04:28 +00:00
Kenneth Graunke
db54673b54 nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.
These tessellation shader related fields need plumbing through NIR.

v2: Use uint32_t instead of uint64_t to match the source type of
    GLbitfield (caught by Iago Toral).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-10 01:03:43 -08:00
Eric Anholt
437d7b6119 vc4: Avoid loading undefined (newly-allocated) FBO contents.
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear.  For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
2015-11-09 19:17:36 -08:00
Eric Anholt
5980389bbf vc4: Return NULL when we can't make our shadow for a sampler view.
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 19:17:36 -08:00
Eric Anholt
eb8fb0064d vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 19:17:36 -08:00
Eric Anholt
84608e07e7 vc4: Add CL dumping for GL_ARRAY_PRIMITIVE. 2015-11-09 19:17:36 -08:00
Eric Anholt
855a3ca598 vc4: Fix a compiler warning. 2015-11-09 19:17:36 -08:00
Jordan Justen
fb3da129d1 glsl: Use shared storage variable type for shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:24 -08:00
Jordan Justen
32746fc9b4 glsl: Add shared variable type
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.

These variables are similar to OpenCL's local/__local variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:24 -08:00
Jordan Justen
c0ac4740a7 glsl: Add space to shader_storage in print_visitor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:17 -08:00
Jordan Justen
007d96730e glsl: Align comments on variables types
v2:
 * Split from patch to add ir_var_shader_shared (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:17 -08:00
Jordan Justen
8b28b35531 glsl: Parse shared keyword for compute shader variables
v2:
 * Move shared parsing under storage qualifiers (tarceri)
 * Fail to compile if shared is used in non-compute shader (tarceri)
 * Use separate shared_storage bit for shared variables (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:12 -08:00
Timothy Arceri
a4a46fe3fa glsl: simplify interface block stream qualifier validation
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.

Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-10 12:02:30 +11:00
Ilia Mirkin
3ea3727998 docs: note that ARB_copy_image was added to nv50, nvc0 in this release
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-09 07:14:07 -05:00
Brian Paul
28f6faca51 st/wgl: add null pointer check for HUD texture
Fixes crash when using HUD with Nobel Clinician Viewer.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
Brian Paul
75d1e363ff st/wgl: fix double-present on swapbuffers bug
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers.  First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given.  Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.

Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.

Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
Brian Paul
8083943e2e st/wgl: reorder pixel formats to put MSAA formats last
And put 8-bit/channel formats before 5/6/5 formats.

The ChoosePixelFormat() function seems to be finicky about format
selection.  Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format.  Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.

VMware bug 1455030

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
José Fonseca
e524df5ef3 st/wgl: Don't rely on GDI to bookkeep pixelformat for us.
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.

See also e4a4f15f5b
2015-11-09 11:08:27 +00:00
Michel Dänzer
24abbaff9a winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3
Fixes GPUVM conflicts with non-4K page size.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738

v2: Replace sanitization of VM base address alignment with comment why
    that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
    (Marek)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-09 17:24:32 +09:00
Christian König
df4f9b0236 radeon/uvd: add H.265/HEVC to legal notes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-08 18:16:01 -05:00
Leo Liu
519502d08f st/omx: add headless support
This will allow dec/enc/transcode without X

v2:  use env override even with X,
     use loader_open_device instead of open
v3:  clean up

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
25526d77b1 st/va: use vl screen drm support from vl_wys_drm
v2: move the dup to vl_wys_drm for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
7da86e0ec0 vl: add drm support for vl_screen
This will allow the state trackers to use render nodes
with screen creation

v2: dup fd for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
d115e47099 st/va: fix build fails with pipe loader
There is no dev in drv, and dev should be from vl_screen here

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Samuel Pitoiset
ffb60e7788 nvc0: enable compute support on Fermi
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.

This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-08 16:47:59 +01:00
Ilia Mirkin
e06238cb9e nv50/ir: fix emission of s[] args in certain situations
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Ilia Mirkin
af218217d7 nv50/ir: only take abs value when computing high result
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Ilia Mirkin
53cbb11707 nouveau: avoid queueing too much work onto a single fence
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.

This fixes crashes in teximage-colors --benchmark with too many active
maps.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Dave Airlie
0f5b1409fd llvmpipe: disable front updates for now
As pointed out by Emil, this sometimes hangs, appears to be due to threading

need to rethink how this stuff works for llvmpipe.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-08 07:55:17 +10:00
Dave Airlie
87711183ac virgl: wrap ret assignment with braces to do correct thing
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-08 06:27:02 +10:00
Jason Ekstrand
6c731d8566 nir: Add a nir_deref_tail helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 12:09:44 -08:00
Jason Ekstrand
7d90e570f3 nir/types: Add an is_vector_or_scalar helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 12:09:38 -08:00
Jason Ekstrand
d43e16b163 i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16).  This isn't actually true.  In
particular, the PLN instruction reads 2 logical registers in one of the
components.  This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:48 -08:00
Jason Ekstrand
c839174d55 nir/validate: Add better validation of load/store types
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:35 -08:00
Marek Olšák
d57ede92b7 radeonsi: add register definitions for Stoney
There are a few non-stoney changes too.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
2658777f46 radeonsi: add workarounds for CP DMA to stay on the fast path
v2: set emit_scratch_reloc, add a NULL check

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
fc0416ef5d radeonsi: unify CP DMA preparation logic
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
89da3b4458 radeonsi: unify CP DMA code determining various flags
v2: don't call get_flush_flags twice per function

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Marek Olšák
c3e527f93d radeonsi: only enable write confirmation on the last CP DMA packet
This should improve performance for big copies that need to be split.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Ilia Mirkin
8e9ade7eb3 nv50/ir: allow emission of immediates in imul/imad ops
Nothing actually uses this yet (due to complications), but the emission
logic is right.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 00:42:15 -05:00
Ilia Mirkin
393d0c336b nv50/ir: properly set the type of the constant folding result
This removes the hack used for merge, which only covers a fraction of
the cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin
2f9aaed749 nv50/ir: add support for const-folding OP_CVT with F64 source/dest
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin
76957389fc nv50/ir: add fp64 opcode emission support for G200 (NVA0)
Need to emulate rcp/rsq before providing full fp64 support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:36:25 -05:00
Hans de Goede
f979d3cfec nv50/ir: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
9f2f8bda6e nvc0/ir: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"

This turns the following (nvc0) code:
  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:
  1: add f64 $r0d $r0d 0.500000 (8)

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
428506ece2 nv50/ir: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.

If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin
2437f00853 nv50/ir: disallow 64-bit immediates on nv50 targets
No instructions are able to load short immediates like nvc0 can.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin
11e3dac36e nv50/ir: allow movs with TYPE_F64 destinations to be split
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
b487b55f7d gm107/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
Hans de Goede
12c850d01c nvc0/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
Francisco Jerez
5169407221 i965/nir/fs: Add comment for no-op memory barrier functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-06 13:19:56 -08:00
Jordan Justen
faa1193070 i965/nir/fs: Implement new barrier functions for compute shaders
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:

 * nir_intrinsic_memory_barrier_atomic_counter
 * nir_intrinsic_memory_barrier_buffer
 * nir_intrinsic_memory_barrier_image

We treat these nir intrinsics as no-ops:
 * nir_intrinsic_group_memory_barrier
 * nir_intrinsic_memory_barrier_shared

v3:
 * Add comment for no-op cases (curro)

v4:
 * Moving comment to a separate patch authored by curro

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:16:11 -08:00
Jordan Justen
9d65f3208b nir: Add new barrier functions for compute shaders
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:15:16 -08:00
Jordan Justen
91f188710a glsl: Add new barrier functions for compute shaders
When these functions are called in GLSL code, we create an intrinsic
function call:

 * groupMemoryBarrier => __intrinsic_group_memory_barrier
 * memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
 * memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
 * memoryBarrierImage => __intrinsic_memory_barrier_image
 * memoryBarrierShared => __intrinsic_memory_barrier_shared

v2:
 * Consolidate with memoryBarrier function/intrinsic creation (curro)

v3:
 * Instead of add_memory_barrier_function, add an intrinsic_name
   parameter to _memory_barrier (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:14:44 -08:00
Boyuan Zhang
6bad554d98 radeon/uvd: fix VC-1 simple/main profile decode v2
We just needed to set the extra width/height fields to get this working.

v2 (chk): rebased, CC stable added, commit message added, fixed coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Boyuan Zhang
ed55def44f st/vaapi: fix vaapi VC-1 simple/main corruption v2
Apply the start code fix only to advanced profile.

v2 (chk): add commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Julien Isorce
cc1e5c972e st/va: add support for RGBX and BGRX in VPP
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.

This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:45 +00:00
Julien Isorce
42a5e143a8 vl/buffers: add RGBX and BGRX to the supported formats
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:38 +00:00
Julien Isorce
bf6acbb2db st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:16 +00:00
Julien Isorce
bfc245e9ac st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:01 +00:00
Rob Clark
6459e780ae freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:19:04 -05:00
Rob Clark
7465e16124 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:18:47 -05:00
Guillaume Charifi
6f5e0c08a4 freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:58 -05:00
Boyan Ding
8f55ebe802 freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:53 -05:00
Rob Clark
99597d033a nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so lets make this more explicit.  Plus couple other corrections/
clarifications.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-06 11:15:41 -05:00
Ilia Mirkin
d68226087c nvc0: reintroduce BGRA4 format support
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.

The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 00:47:44 -05:00
Brian Paul
581111c4d6 mesa: report enum name in glClientActiveTexture() error string
As we do for glActiveTexture().  Trivial.
2015-11-05 20:12:33 -07:00
Julien Isorce
497bde6727 st/va: fix memory leak on error in vlVaCreateSurfaces2
Found by coverity: CID #1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:45 +00:00
Julien Isorce
e0b896c86c st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:43 +00:00
Kenneth Graunke
8dcf807cb4 i965: Fix scalar VS float[] and vec2[] output arrays.
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken).  Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.  However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4.  This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them.  Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s.  So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
    by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4
    rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 15:26:07 -08:00
Roland Scheidegger
5ae37ae615 llvmpipe: disable texture cache
There are some weird problems with 8-wide vectors.
2015-11-05 18:00:42 +01:00
Ilia Mirkin
ba093a099a nouveau: send back a debug message when waiting for a fence to complete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
4f6cd5fad0 nv50,nvc0: provide debug messages with shader compilation stats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
4335b28840 nouveau: add support for sending debug messages via KHR_debug
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
6706cc1671 st/clover: provide a path for drivers to call through to pfn_notify
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]
2015-11-05 11:22:19 -05:00
Ilia Mirkin
c93c9d220b st/mesa: set debug callback for debug contexts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
fc76cc05e3 gallium: expose a debug message callback settable by context owner
This will allow gallium drivers to send messages to KHR_debug endpoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Ilia Mirkin
e587590a83 st/mesa: account for texture views when doing CopyImageSubData
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Iago Toral Quiroga
eea3c907cc i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
eca4c43a33 i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
6105d1d0a0 i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
d7013988fb i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
027b64a55a i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Neil Roberts
6c5f371a27 i965/skl+: Enable support for 16x multisampling
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:21 +01:00
Neil Roberts
aa3f9aaf31 mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
    [Ian Romanick]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
b080b3d54d meta/blit: Always try to enable GL_ARB_sample_shading
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
2dd76ec16e meta: Support 16x MSAA in the multisample scaled blit shader
v2: Fix the x_scale in the shader. Remove the doubts in the commit
    message.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
1a22b12fc5 i965/meta: Support 16x MSAA in the meta stencil blit
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
a680465428 i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
In order to accomodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
bf6bd7eaf0 i965: Support allocating the MCS buffer for 16x MSAA
When 16 samples are used the MCS buffer needs 64 bits per pixel.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
b4c2e6054f i965: Support calculating the bits needed to set up 16x MSAA
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
1a97cac767 i965/fs: Add a sampler program key for whether the texture is 16x MSAA
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.

Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
4ef27745c8 i965/vec4/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
e386fb0dee i965/fs/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
20250e854e i965: Program 16x MSAA sample positions.
This is the standard pattern used by the other 3D graphics API.

BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.

The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.

(Based on a patch by Kenneth Graunke)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
5048da974e i965: Handle 16x MSAA in IMS dimension munging code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
b9f8e729c8 nir: Rename nir_live_variables.c to nir_liveness.c.
It doesn't actually operate on variables.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Kenneth Graunke
5c6f21579d nir: Rename live_variables to live_ssa_defs.
This computes liveness of SSA values, not nir_variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Alejandro Piñeiro
56774e6302 i965/vec4: select predicate based on writemask for sel emissions
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.

This patch helps on cases like this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.

So, with this patch, the previous case becames this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code
eliminate to update the writemask at the second instruction:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs:     219850 -> 212023 (-3.56%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                1192
HURT:                                  0
2015-11-05 08:57:23 +01:00
Ilia Mirkin
bb73fc4cb8 nouveau: relax fence emit space assert
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2015-11-04 22:43:56 -05:00
Eric Anholt
6d3a24bce8 vc4: When the create ioctl fails, free our cache and try again.
This greatly increases the pressure you can put on the driver before
create fails.  Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-04 14:04:14 -08:00
Eric Anholt
3f7c96c36c vc4: Print the rounded shader size in debug output.
It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be suprising.
2015-11-04 13:32:07 -08:00
Eric Anholt
4a951f1c08 vc4: Fix dumping the size of BOs allocated/cached.
60MB of cached BOs are a lot less scary than 600MB.
2015-11-04 13:32:07 -08:00
Ilia Mirkin
5bbd522452 mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.

Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 14:28:57 -05:00
Brian Paul
bdf6cef033 vbo: fix another GL_LINE_LOOP bug
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.

The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this.  Even 'lineloop -count 100000' doesn't
trigger the bug.

For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.

Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 11:51:59 -07:00
Brian Paul
d31481e70a svga: implement 'white_fragments' option for VGPU10 fragment shaders
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white.  We had this implemented
for VGPU9, but not VGPU10.

VMware bug 1545492.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:41 -07:00
Brian Paul
149ac1fe43 u_vbuf: minor code reformatting / line wrapping
Trivial.
2015-11-04 11:51:41 -07:00
Brian Paul
e450d4371a u_vbuf: add some const qualifiers
Trivial.
2015-11-04 11:51:40 -07:00
Brian Paul
3f98c812b3 svga: use new enum indices_mode type
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Brian Paul
fa6efbd27d util/indices: replace #define tokens with enum type
To ease debugging in gdb.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Alejandro Piñeiro
c3d7caa1e0 i965: check inst->predicate when clearing flag_live at dead code eliminate
Detected by Matt Turner while reviewing commit
a59359ecd2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-04 19:33:56 +01:00
Roland Scheidegger
c19443bc8b gallivm: fix sampling for s3tc srgb formats when using texture cache
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
2015-11-04 14:21:43 +01:00
Ben Widawsky
d56a1478a8 i965/meta: Assert fast clears and rep clears never overlap
There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 21:54:11 -08:00
Ryan Houdek
13b19aa815 mesa: expose support for GL_EXT_buffer_storage
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 00:01:03 -05:00
Timothy Arceri
8e4cf900f0 glsl: make sure to only add subroutines to resource list
Over looked in 763cd8c080.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-04 15:43:12 +11:00
Timothy Arceri
f6b3c163f9 glsl: remove old TODO
SSBO support now exists as of commits f24e5e and f408a13dd3.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-04 15:40:38 +11:00
Timothy Arceri
6e3b380387 docs: Mark AoA as done for i965
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-04 13:41:16 +11:00
Timothy Arceri
5b75dbd7be i965: enable ARB_arrays_of_arrays
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-04 13:39:08 +11:00
Timothy Arceri
fb77da89f5 i965: add support for image AoA
V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.

V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-04 13:38:32 +11:00
Roland Scheidegger
9285ed98f7 llvmpipe: add cache for compressed textures
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-04 02:51:02 +01:00
Oded Gabbay
39b4dfe6ab llvmpipe: use simple coeffs calc for 128bit vectors
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.

The decision which method to use is determined by the size of the vector
that is used by the platform.

For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

   - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
   - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

   - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
   - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with my change: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 02:38:53 +01:00
Kenneth Graunke
59bbe2681b nir: Properly invalidate metadata in nir_opt_remove_phis().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
bc3942e297 nir: Properly invalidate metadata in nir_lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
0f037bd71f nir: Properly invalidate metadata in nir_opt_copy_prop().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
4cb7546066 nir: Properly invalidate metadata in nir_remove_dead_variables().
v2: Preserve live_variables too (Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Kenneth Graunke
8bb44510fc nir: Properly invalidate metadata in nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
aea40091f0 nir: Properly invalidate metadata in nir_lower_global_vars_to_local().
v2: Preserve nir_metadata_live_variables as well (caught by Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Jason Ekstrand
531be601d5 nir: Unexpose _impl versions of copy_prop and dce
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 17:06:48 -08:00
Jordan Justen
4bc16ad217 mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Iago Toral <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-03 16:44:22 -08:00
Matt Turner
cf3121ed18 i965/vec4: Send from GRF in atomic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 16:38:36 -08:00
Marek Olšák
3b37155a68 gallium/radeon: allow returning SDMA fences from pipe->flush
pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Marek Olšák
7f9122c968 gallium/radeon: always return the last SDMA fence on SDMA flush if needed
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Kenneth Graunke
36fd653817 i965: Add scalar geometry shader support.
This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy.  But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
  tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Kenneth Graunke
c9541a74e4 i965: Add scalar GS input lowering code.
We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it.  We can improve that later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Kenneth Graunke
4861835d1c i965: Fix the fs_visitor GS constructor to take shader_time_index.
Jason reworked this so it isn't simply ST_GS anymore...it's either -1
(not enabled) or an actual offset.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Ben Widawsky
5d4b019d2a i965/gen8+: Extract color clear surface state
On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 13:49:21 -08:00
Ben Widawsky
f3223ebd6c i965/gen8+: Remove redundant zeroing of surface state
The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if requires.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 13:49:21 -08:00
Samuel Pitoiset
e887407491 nvc0: add missing compute parameters required by clover
This fixes crashes with some piglit OpenCL tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:17:00 +01:00
Samuel Pitoiset
e640ba41ed nvc0: handle NULL pointer in nvc0_get_compute_param()
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:16:45 +01:00
Ben Widawsky
dde33fc23c i965/skl: PCI ID cleanup and brand strings
A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.

v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-11-03 10:00:17 -08:00
Ben Widawsky
7cbd6608f5 i965/skl: Add GT4 PCI IDs
Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.

One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.

v2: No longer relevant.

v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-11-03 09:45:04 -08:00
Jordan Justen
55365a7ad5 mesa: Add spec citations for DispatchCompute*
Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 15:25:37 -08:00
Jordan Justen
44c399f20a mesa: Update DispatchComputeIndirect errors for indirect parameter
There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.

The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.

Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 15:25:37 -08:00
Matt Turner
0b19f65195 i965/fs: Clean up FBH code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
c22d62f599 i965/vec4: Clean up FBH code.
It did a bunch of unnecessary stuff, emitting an extra MOV included.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
7c81a6a647 i965: Replace default case with list of enum values.
If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
d9b09f8a30 i965/vec4: Don't disable channels in any/all comparisons.
We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)

So, if we emit

               cmp.z.f0(8) null.xy<1>D  g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
               mov(8)      g12<1>.xUD   0x00000000UD
   (+f0.all4h) mov(8)      g12<1>.xUD   0xffffffffUD

where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.

We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-02 09:33:31 -08:00
Tapani Pälli
f4466c856f mesa: fix uniforms calculation in glGetProgramiv
Since introduction of SSBO, UniformStorage contains not just uniforms
but also buffer variables, this needs to be taken in to account when
calculating active uniforms with GL_ACTIVE_UNIFORMS and
GL_ACTIVE_UNIFORM_MAX_LENGTH.

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-02 11:22:10 +02:00
Tapani Pälli
efb333acb7 mesa: fix program resource queries for atomic counter buffers
gl_active_atomic_buffer contains index to UniformStorage, we need to
calculate resource index for that gl_uniform_storage.

Fixes following CTS tests:
   ES31-CTS.program_interface_query.atomic-counters
   ES31-CTS.program_interface_query.atomic-counters-one-buffer

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 11:22:06 +02:00
Juha-Pekka Heikkila
c2c124f891 glsl: join calculate_array_size() and calculate_array_stride()
These helpers are ran for same case the same loop. Here joined
their operation so the loop is ran just once. Also fixed
out-of-memory condition here.

v2: Make the loop simpler to read as per Tapani's suggestion

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-02 10:03:32 +02:00
Ryan Houdek
af7c98a9c7 mesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES
This has been tested with the piglits in the mailing list and
on the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-01 23:02:06 -05:00
Ilia Mirkin
985b51551a nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-01 20:15:15 -05:00
Samuel Pitoiset
00bb524716 nv50: use correct heaps for FP and GP code segments
This is just a cosmetic change. Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-01 23:29:20 +01:00
Jordan Justen
39bb59a566 mesa/sso: Add compute shader support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
[itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 01:10:01 -07:00
Jordan Justen
6e11855050 mesa/sso: Add MESA_VERBOSE=api trace support
v2:
 * Use %u for unsigned values (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 01:09:20 -07:00
Jordan Justen
5bfe2835c2 i965: Setup pull constant state for compute programs
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 00:35:12 -07:00
Jordan Justen
a4a416f567 main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-01 00:11:42 -07:00
Jordan Justen
218e94906d glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules
The OpenGLES GLSL 3.1 specification uses the precision qualifier
ordering rules from ARB_shading_language_420pack.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-31 23:17:06 -07:00
Jordan Justen
b6e9b2b7a0 glsl: Add compute shader builtin variables for OpenGLES 3.1
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-31 23:08:09 -07:00
Ilia Mirkin
67635a0a71 nouveau: get rid of tabs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-31 19:58:14 -04:00
Connor Abbott
0ef8c5cb96 i965/sched: don't calculate live intervals for post-RA scheduling
For some reason, this causes assertions on gm965 only. In any case, it's
unnecessary since we don't need liveness information in the post-RA
scheduler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92744
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-31 08:05:52 -07:00
Dave Airlie
425d8c2578 virgl/vtest: fix extra malloc
This somehow got added twice, drop the first one.

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 18:05:33 +10:00
Dave Airlie
8d731ebd33 virgl: free sampler view on failure path
Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
7153b12651 gallium/swrast: fixup build breakage and warnings
The front buffer rendering changes broke an interface, I didn't
fix up all of them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
2b67657096 gallium/swrast: fix front buffer blitting. (v2)
So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.

This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:04:36 +10:00
Timothy Arceri
103de0225b glsl: set image access qualifiers for AoA
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-31 08:37:08 +11:00
Ian Romanick
7b3684877c i965: Do legacy userclipping in OpenGL ES 1.x contexts.
Commit fba4823a disabled user clipping for everything except
compatibility profile.  Core profile and OpenGL ES 2.0+ have all removed
the classic, OpenGL 1.0 user clip planes.  ES 1.x, however, still has
them.

Fixes OpenGL ES 1.1 conformance mustpass.c and userclip.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Olivier Berthier <olivierx.berthier@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92639
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92641
2015-10-30 14:25:33 -07:00
Emmanuel Gil Peyrot
f3d4d10a1d gbm.h: Add a missing stddef.h include for size_t.
This was causing compilation issues when one of its providers wasn’t
already included before gbm.h.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-30 19:12:14 +00:00
Emil Velikov
7bac333508 winsys/virgl: rework line wrapping/indent
Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
493e410d55 virgl: unwrap the includes
Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7154d48c6e winsys/virgl: remove temporary ret variable
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
bdcb005788 winsys/virgl: always memset prior to ioctl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
e992715da2 winsys/virgl: use MALLOC to match FREE
The uppercase versions are wrappers which must be matched.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
72d7d1e224 winsys/virgl: remove calloc/malloc casts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
1ce685f05e winsys/virgl: throw in some inline wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
78be78b681 virgl: introduce virgl_query() inline wrapper
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
dafcb21405 virgl: use virgl_screen/surface upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7af46b9c74 virgl: introduce and use virgl_transfer/texture/resource inline wrappers
The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
6b123fa07f virgl: add virgl_context/sampler_view/so_target() upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
1f43e4e1a3 winsys/virgl/drm: drop unneeded forward declaration
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
e0056228f6 virgl: remove sw_winsys pointer from virgl_screen
The screen already has a pointer to the (base) winsys object.
With the latter of which implemented/sub-classed as either drm or sw
based one, depending on the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
0c82c2fb0b virgl: rename virgl.h to virgl_screen.h
Provide a more meaningful name considering it's purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
87f7d61e19 virgl: move virgl_hw.h into the driver dir
Strictly speaking virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
014f8ef2ff virgl: straighten the includes confusion
Use the relevant GALLIUM_foo_CFLAGS which has all the requirements
(not to mention VISIBITY_CFLAGS) and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
2c705d2220 virgl: remove the _FILE_OFFSET_BITS defines
The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
a05648fd7e winsys/virgl/drm: add all files to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
8b9e69e2ea winsys/virgl/vtest: list all files in Makefile.sources
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:36:46 +00:00
Emil Velikov
73308ca802 virgl: move sources list to Makefile.sources
... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:33:11 +00:00
Emil Velikov
c1bf71f77c virgl: fix drm.h include path
The drm/ prefix is required, if using the kernel provided headers. As
most distros don't ship them it and we already depend on libdrm (which
adds the relevant -I flag) just drop the drm/ from the include.

Once a libdrm release with the virtgpu_drm.h header is released, we can
drop our local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:29:01 +00:00
Emil Velikov
60418a28ea i965: enable ARB_shader_clock on gen7+
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:23:18 +00:00
Emil Velikov
4379ca22f1 i965: Implement nir_intrinsic_shader_clock
v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:40 +00:00
Emil Velikov
6a15517242 i965/fs: move the fs_reg::smear() from get_timestamp() to the callers
We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:36 +00:00
Emil Velikov
7682844f34 nir: add shader_clock intrinsic
v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:32 +00:00
Emil Velikov
f1d98fc90a glsl: add support for the clock2x32ARB function
v2: correctly set the return type

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:29 +00:00
Emil Velikov
51265c1b85 glsl: add ARB_shader_clock infrastructure
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:27 +00:00
Emil Velikov
e916d5e013 mesa: add infra for ARB_shader_clock
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:23 +00:00
Samuel Pitoiset
0d0329df8f nv50: do not create an invalid HW query type
While we are at it, store the rotate offset for occlusion queries to
nv50_hw_query like on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
5f1eeb799b nv50: move HW queries to nv50_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
76b48ceee9 nv50: move nva0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
2e3fe0379e nv50: add a header file for nv50_query
Like for nvc0, this will allow to split different types of queries and
to prepare the way for both global performance counters and MP counters.

While we are at it, make use of nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-30 17:57:15 +01:00
Julien Isorce
e7ed3963ed st/va: add support to export a surface as dmabuf
I.e. implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

And apply relatives change to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer

Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:20 +01:00
Julien Isorce
802ba6f865 st/va: implement VaDeriveImage
And apply relatives change to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper va buffer type and struct
for this. Only possible to use VAImageBufferType which is normally
used for normal user data array.
On of the consequences is that it is only possible VaDeriveImage
is only useful on surfaces backed with contiguous planes.
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:11 +01:00
Julien Isorce
5e763aaa21 st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:41 +01:00
Julien Isorce
86eb4131a9 st/va: add headless support, i.e. VA_DISPLAY_DRM
This patch allows to use gallium vaapi without requiring
a X server running for your second graphic card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:35 +01:00
Julien Isorce
1bdea0e579 st/va: handle Video Post Processing for configs
Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the 4 following functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:29 +01:00
Julien Isorce
0b868807e4 st/va: add colospace conversion through Video Post Processing
Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:10 +01:00
Julien Isorce
05b6ce4209 st/va: implement dmabuf import for VaCreateSurfaces2
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:03 +01:00
Julien Isorce
adf1133118 st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.

The avaible pixel formats can be query with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:54 +01:00
Julien Isorce
d42029d2d9 st/va: do not destroy old buffer when new one failed
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:47 +01:00
Julien Isorce
87109e5f88 st/va: properly defines VAImageFormat formats and improve VaCreateImage
Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.

Note that gstreamer-vaapi check all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:05:23 +01:00
Samuel Iglesias Gonsalvez
7b8cc37585 main: fix basename match's check if it's an array or struct
Commit 4565b6f did not update the basename match's check for
the case that string would exactly match the name of the
variable if the suffix "[0]" were appended to it.

Fixes two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
  read. Reported by Tapani Pälli.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-30 08:12:53 +01:00
Kristian Høgsberg
f7f1bc6cca i965: Fix invalid memory accesses after resizing brw_codegen's store table
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-30 07:49:10 +01:00
Connor Abbott
73caa26e43 i965/sched: use liveness analysis for computing register pressure
Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.

Shader-db results on bdw:

total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1

total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873

LOST:   8
GAINED: 0

v2: use regs_read() in more places.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:43 -04:00
Connor Abbott
c1860299b8 i965/fs: split out calculation of payload live ranges
We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:33 -04:00
Connor Abbott
45cd76e342 i965: dump scheduling cycle estimates
The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:24 -04:00
Connor Abbott
486268bdb0 i965: always run the post-RA scheduler
Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the setting the latency to 1 part, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
2015-10-30 02:19:00 -04:00
Connor Abbott
85fce2d2f5 i965/sched: write-after-read dependencies are free
Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:56 -04:00
Connor Abbott
6f231fddff i965: fix cycle estimates when there's a pipeline stall
The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:53 -04:00
Eric Anholt
04c42f3ab5 vc4: Allow user index buffers, to avoid slow readback for shadow IBs.
Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).
2015-10-29 22:58:01 -07:00
Ilia Mirkin
06fa2e864a nv50: mark contexts shareable, compile at creation time
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 23:25:08 -04:00
Ilia Mirkin
f768eaa87d nv50: allow per-sample interpolation to be forced via rast
Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 22:42:38 -04:00
Matt Turner
85ee2f7fcf i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
93268939e4 i965: Add INTEL_DEBUG=hex to print the hex with the disassembly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
18b194925f i965: Print the type and writemask on null destinations.
These are often useful in debugging, and the writemask (actually
"Channel Enables") determines more than just what goes into the
destination.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
bcdf664682 i965: Test fixed_hw_reg.file against BRW_IMMEDIATE_VALUE, not IMM.
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
ee46c1e626 i965/vec4: Test against BRW_IMMEDIATE_VALUE, not IMM.
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
8c4151b866 i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
9115fa1d13 i965/cfg: Handle no-idom case in cfg_t::dump_domtree().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
5916b073f6 i965/disasm: Remove unused _addr_mode argument from src_ia1().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
e09e5f992e i965: Set correct field for indirect align16 addrimm.
This has been wrong since the initial import of the i965 driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
fa142773d9 i965/vec4: Drop brw_set_default_* before popping insn state.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
11a7b6bbaa i965/vec4: Remove unnecessary #includes from the generator.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-29 17:51:16 -07:00
Dave Airlie
744cc036b9 r600: enable SB for geom shaders on pre-evergreen
I've checked with piglit and one tests fails, but it fails
on evergreen as well, so will get fixed later.

Otherwise SB seems to be working fine for geom shaders on my
rv635.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-30 10:40:59 +10:00
Kenneth Graunke
c6b24448b5 i965/vec4: Eliminate the vec4_generator class altogether.
We really weren't taking advantage of vec4_generator being a class.
By adding a "p" parameter to the helper methods, and "prog_data" to
ones which need binding table information, we can convert everything
to static functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
1a094a2ee2 i965/vec4: Move vec4_generator class definition into the .cpp file.
The public API for the generator is brw_vec4_generate_code(); nobody
actually needs to use the class.  This means we can extend it without
triggering the recompiles associated with altering brw_vec4.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
4cba8f5d21 i965/vec4: Wrap vec4_generator in a C function.
vec4_generator is a class for convenience, but only exports a single
method as its public API.  It makes much more sense to just export a
single function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
73ff0ead36 i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.
This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging.  After that, everything is in fixed brw_regs.

This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files.  In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.

Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Ivan Kalvachev
f75f21a24a r600g: Fix special negative immediate constants when using ABS modifier.
Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using NEG modifier.

However some shaders define -1.0 constant and want to use it as 1.0.
They do so by using ABS modifier. But r600_bytecode_special_constants()
set NEG in addition to ABS. Since NEG modifier have priority over ABS one,
we get -|1.0| as result, instead of |1.0|.

The patch simply prevents the additional switching of NEG when ABS is set.

[According to Ivan Kalvachev, this bug was fond via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
2015-10-29 23:56:57 +01:00
Nicolai Hähnle
24c90888ae st/mesa: fix mipmap generation for immutable textures with incomplete pyramids
Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .

A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).

v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-29 23:56:57 +01:00
Nanley Chery
65f6caf43e mesa: Enable ASTC in GLES' [NUM_]COMPRESSED_TEXTURE_FORMATS queries
In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.

This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-29 15:40:37 -07:00
Nanley Chery
8090a1c326 mesa/texcompress: Restrict FXT1 format to desktop GL subset
In agreement with the extension spec and commit
dd0eb00487, filter FXT1 formats to the
desktop GL profiles. Now we no longer advertise such formats as supported
in an ES context and then throw an INVALID_ENUM error when the client
tries to use such formats with CompressedTexImage2D.

Fixes the following 26 dEQP tests:
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d

v2. Use _mesa_is_desktop_gl() (Ilia, Ian)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-10-29 15:31:59 -07:00
Samuel Pitoiset
0260620ab3 nvc0: expose a group of performance metrics on Fermi
This allows to monitor those performance metrics through
GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-29 23:01:29 +01:00
Ilia Mirkin
6166a8e369 st/mesa: create temporary textures with the same nr_samples as source
Not sure if this is actually reachable in practice (to have a complex
copy with MS textures).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-29 13:20:45 -04:00
Tapani Pälli
afbe8b6085 glsl: add fragdata arrays to program resource list
This makes sure that user is still able to query properties about
variables that have gotten removed by opt_dead_builtin_varyings pass.

Fixes following OpenGL ES 3.1 test:
   ES31-CTS.program_interface_query.output-layout

No Piglit regressions.

v2: cleanup, drop extra parenthesis (Topi)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-29 17:17:42 +02:00
Tapani Pälli
6ce0857e30 mesa: add fragdata_arrays list to gl_shader
This is required to store information about fragdata arrays, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.

Patch also modifies opt_dead_builtin_varyings pass to build list when
lowering fragdata arrays. This is identical approach as taken with
packed varyings pass.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-29 17:16:22 +02:00
Samuel Iglesias Gonsalvez
85f1f04413 glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsize arrays
From ARB_program_interface_query:

"For the property of BUFFER_DATA_SIZE, then the implementation-dependent
 minimum total buffer object size, in basic machine units, required to hold
 all active variables associated with an active uniform block, shader
 storage block, or atomic counter buffer is written to <params>.  If the
 final member of an active shader storage block is array with no declared
 size, the minimum buffer size is computed assuming the array was declared
 as an array with one element."

Fixes the following dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array

v2:
- Fix comment's indentation and explain that the parser already
  checked that unsized array is in last element of a shader
  storage block (Iago).
- Add assert (Iago).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-29 08:29:06 +01:00
Kenneth Graunke
cf93251bed docs: Mark GL_ARB_fragment_layer_viewport as done on i965. 2015-10-28 22:05:08 -07:00
Kenneth Graunke
8c902a580a i965: Implement ARB_fragment_layer_viewport.
Normally, we could read gl_Layer from bits 26:16 of R0.0.  However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying.  We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
5392328a32 i965: Make calculate_attr_overrides return the URB read offset.
Traditionally, we've hardcoded "URB Entry Read Offset" to 1 (which
represents 2 vec4 varying slots) to skip over the 8 DWord VUE header.

In order to support ARB_fragment_layer_viewport, we'll need to read
from that header.  This patch adds the basic plumbing necessary to
calculate a value dynamically and hook it up in the SBE packets.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
b3d19d20f2 glsl: Mark gl_ViewportIndex and gl_Layer varyings as flat.
Integer varyings need to be flat qualified - all others were already.
I think we just missed this.  Presumably some hardware passes this via
sideband and ignores attribute interpolation, so no one has noticed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
b94cdcdada i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.
Commit 268008f98c changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT.  I missed
updating this.  It also means that commit message was wrong, as some
code *did* rely slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything.  We might also just have no tests for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
6ae47a3eb4 i965: Update stale comment about unused VUE map slots.
I changed this from COUNT to PAD in commit 268008f98c.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Ilia Mirkin
5227e91580 nv50/ir: adapt to new method for passing in cull/clip distance masks
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
a5bae7b31d nvc0: share shaders between contexts and build immediately
Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
b75fff70d8 nvc0: do upload-time fixups for interpolation parameters
Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.

While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.

Fixes the remaining failing glsl-1.30/execution/interpolation piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Kenneth Graunke
77f58c04cc nir: Copy "patch" flag from ir_variable to nir_variable.
This was introduced in GLSL IR after NIR development had branched.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:15:29 -07:00
Kenneth Graunke
9c8208f2c1 nir: Add intrinsics for tessellation shader system values.
nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.

nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.

nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:14:53 -07:00
Kenneth Graunke
bf05af3f0e i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.
Consider the case of two nearly identical GLSL fragment shaders:

   out vec4 color;
   void main() { color = vec4(1); }

and

   layout(early_fragment_tests) in;
   out vec4 color;
   void main() { color = vec4(1); }

These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.

Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.

When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL.  Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.

However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset.  With reuse, both
programs have the same offset, but prog_data changed.  We have to
flag it, but failed to.

To fix this, we simply need to check if the aux (prog_data) pointer
changed.  If either the assembly or the prog_data differs, flag it.

This fixes a regression since 1bba29ed40,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly.  Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!

Fixes GPU hangs in Dolphin on Broadwell.

Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
2015-10-28 21:13:54 -07:00
Laurent Carlier
37402014e8 clover: fix building fix clang-3.8
https://bugs.freedesktop.org/show_bug.cgi?id=92705

v2.1: use Linker::Flags::None instead of 0 and emplace_back()

Signed-off-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-29 12:35:37 +09:00
Ilia Mirkin
d0693d7515 nv50: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:53:30 -04:00
Ilia Mirkin
ebbd7b41c0 nvc0: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:42:59 -04:00
Julien Isorce
3bbb8715ac nvc0: fix crash when nv50_miptree_from_handle fails
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-28 18:26:20 +01:00
Brian Paul
2bf224b3f9 vbo: replace assertion with conditional in vbo_compute_max_verts()
With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test).  Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-28 11:03:27 -06:00
Brian Paul
8e9c3070bf mesa: minor formatting fix in get_tex_rgba_compressed() 2015-10-28 11:03:21 -06:00
Marek Olšák
f04f13622f st/mesa: implement ARB_copy_image
I wonder if the craziness was worth it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
ce9db16e1c gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS
For ARB_copy_image.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
e82c527f1f radeonsi: allow copying between compatible compressed and uncompressed formats
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
     DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
6a4dc1ad49 mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)
We initialized Target, but not TargetIndex.
This is required since 7d7dd18711.

v2: do it in the right place. Noticed by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Emil Velikov
bfc73ff10e i965: remove unneeded src_reg copy in emit_shader_time_write
The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Emil Velikov
0325f68228 i965: remove cache_aux_free_func array
There is only one function that can be called, which is well known at
compilation time.

The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate, cut down the size of
struct brw_cache.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Samuel Iglesias Gonsalvez
74fcc4c41f main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks
The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.

Fixes 3 dEQP-GLES31.functional.* tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-28 08:40:51 +01:00
Boyuan Zhang
03c92ffbf6 st/vdpau: disable RefPicList for Vdpau HEVC
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
ad2752e94b st/va: add VAAPI HEVC decode support
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
38c3d7cfc4 radeon/uvd: implement and add flag for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
231605d14d vl: add RefPicList defines for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Marta Lofstedt
16c49da63a mesa: Draw indirect is not allowed if the default VAO is bound.
From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure,  be in buffer objects,  and may not be called when
the default vertex array object is bound."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 12:16:23 +01:00
Marek Olšák
93eb4f9287 winsys/amdgpu: remove the dcc_enable surface flag
dcc_size is sufficient and doesn't need a further comment in my opinion.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
3aebc596b3 radeonsi: add debug flags that disable DCC and DCC fast clear
For debugging, bug reports, etc.
This is not in the radeonsi directory, but it is about radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
235d38584c radeonsi: properly check if DCC is enabled and allocated
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
5bc5dca0cb radeonsi: simplify DCC handling in si_initialize_color_surface
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marta Lofstedt
3daa7e5147 mesa: Draw indirect is not allowed when xfb is active and unpaused
OpenGL ES 3.1 specification, section 10.5:
"An INVALID_OPERATION error is generated if
transform feedback is active and not paused."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:49:21 +01:00
Marta Lofstedt
2c91e08656 mesa: Draw Indirect return wrong error code on unalinged
From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."

However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if <indirect> is not word aligned."

V2: After discussions on the list, it was suggested to
only keep the INVALID_VALUE error.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-27 08:49:21 +01:00
Samuel Iglesias Gonsalvez
4565b6f4fb main: Remove interface block array index for doing the name comparison
From ARB_program_query_interface spec:

"uint GetProgramResourceIndex(uint program, enum programInterface,
                                   const char *name);
 [...]
 If <name> exactly matches the name string of one of the active resources
 for <programInterface>, the index of the matched resource is returned.
 Additionally, if <name> would exactly match the name string of an active
 resource if "[0]" were appended to <name>, the index of the matched
 resource is returned. [...]"

"A string provided to GetProgramResourceLocation or
 GetProgramResourceLocationIndex is considered to match an active variable
 if:
[...]
   * if the string identifies the base name of an active array, where the
     string would exactly match the name of the variable if the suffix
     "[0]" were appended to the string;
[...]
"

Fixes the following two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Add AoA support (Timothy)
- Apply it too for GetUniformLocation(), GetUniformName() and others
  because ARB_program_interface_query says that they are equivalent
  to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:10:04 +01:00
Eric Anholt
3359ad6cda vc4: Add support for copy propagation with unpack flags present.
total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs:     52971 -> 51582 (-2.62%)
2015-10-26 16:48:34 -07:00
Eric Anholt
01ca4f207e vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Another step in reducing the special-casing of instructions.
2015-10-26 16:48:34 -07:00
Eric Anholt
72fa2ae20b vc4: Move dst pack setup out to a helper function with more asserts. 2015-10-26 16:48:34 -07:00
Eric Anholt
99a9a5a345 vc4: Switch the unpack ops to being unpack flags on a mov.
This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
2015-10-26 16:48:34 -07:00
Eric Anholt
548b05d53f vc4: Drop some confused code about pack/unpack handling.
At one point I thought packs and unpacks were in the same field of the
instruction.  They aren't.  These instructions therefore never cause a
pack.

total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs:     15261 -> 15179 (-0.54%)
2015-10-26 16:48:34 -07:00
Eric Anholt
a7b424e835 vc4: Reduce MOV special-casing in QIR-to-QPU.
I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.
2015-10-26 16:48:34 -07:00
Eric Anholt
652a864b25 vc4: Fix up the test for whether the unpack can be from r4.
We can do 16a/16b from float as well.  No difference on shader-db.
2015-10-26 16:48:34 -07:00
Eric Anholt
3d7a088608 vc4: Don't try to follow MOVs across a pack. 2015-10-26 16:48:34 -07:00
Eric Anholt
6eb0760f48 vc4: Only copy propagate raw MOVs.
No problems being fixed, but needed for the new unpack changes.
2015-10-26 16:48:34 -07:00
Eric Anholt
0ccacfa017 vc4: If a QIR source has an unpack set, print it.
Not used yet, but will be.
2015-10-26 16:48:34 -07:00
Kenneth Graunke
8034e7d6f1 glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.
When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers.  So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.

If the TCS is absent, we can't do this optimization.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-26 16:37:07 -07:00
Ian Romanick
8f84a8e257 i965: Add missing close-parenthesis in error messages
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-26 16:15:55 -07:00
Ian Romanick
7070c8879a i965: Fix is-renderable check in intel_image_target_renderbuffer_storage
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

    ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-26 16:15:55 -07:00
Timothy Arceri
a3d0359aff glsl: keep track of intra-stage indices for atomics
This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
2015-10-27 07:03:05 +11:00
Roland Scheidegger
711489648b gallivm: disable f16c when not using AVX
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919a it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-26 16:45:49 +01:00
Julien Isorce
a61be1a798 st/va: pass picture desc to begin and decode
At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.

https://bugs.freedesktop.org/show_bug.cgi?id=92634

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-26 13:53:10 +01:00
Tapani Pälli
8ae4317c36 mesa: add additional checks for uniform location query
Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.

From page 79 of the OpenGL 4.2 spec:
    "A valid name cannot be a structure, an array of structures, or any
    portion of a single vector or a matrix."

v2: use without-array() to simplify code (Timothy)

No Piglit or CTS regressions observed.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-26 12:52:17 +02:00
Emil Velikov
a305d59baa docs: add news item and link release notes for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-25 10:17:14 +00:00
Emil Velikov
47dd80a35d docs: add sha256 checksums for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ec14e6f8fd)
2015-10-25 10:14:04 +00:00
Emil Velikov
bddb7a51c3 docs: add release notes for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 31bf247031)
2015-10-25 10:14:03 +00:00
Kenneth Graunke
fcb39f5b6a i965: Make brw_varying_to_offset take a const pointer to the VUE map.
It doesn't modify it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-24 20:30:14 -07:00
Eric Anholt
a2eba3362f vc4: Fix names of the 16-bit unpacks
They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.
2015-10-24 17:55:55 -07:00
Eric Anholt
a238ad372d vc4: Don't try to register coalesce into the VPM across non-raw MOVs.
No known bugs, just something I noticed while updating optimization code
for other changes.
2015-10-24 17:55:38 -07:00
Eric Anholt
ae1d3322cc vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.
One instruction instead of four, and it turns out you do this a lot for
the Over operator.

total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs:     318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs:     6434 -> 6076 (-5.56%)
2015-10-24 17:55:22 -07:00
Eric Anholt
f09ed63f43 vc4: Fix the test for skipping raw MOVs.
I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c.  No change to shader-db.
2015-10-24 17:55:22 -07:00
Ben Widawsky
9ecfc6baf1 i965: Remove unused devinfo revision
I left the function to obtain the revision because it is, and will continue to
be useful in the future. I'd rather not have to dig it up every time we need it.
Comments left at the implementation to say as much.

This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Fri Aug 7 13:58:37 2015 -0700

    i965/skl: Remove early platform support

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-24 12:48:01 -07:00
Fabio Pedretti
b0342f48d0 docs/index.html: fix typo
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-24 19:27:24 +01:00
Rob Clark
1e8d0cc628 freedreno: remove unnecessary null checks
According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null).  And the null checks were in-
consistent anyways, so remove them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-24 12:38:33 -04:00
Bas Nieuwenhuizen
6529daca39 radeonsi: Implement DCC fast clear.
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 17:46:08 +02:00
Roland Scheidegger
205a3ce5c1 gallivm: fix tex offsets with mirror repeat linear
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
71ff5af5dd gallivm: fix sampling with texture offsets in SoA path
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).

This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
fb586e1edb softpipe: fix using non-zero layer in non-array view from array resource
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
fe707c0373 llvmpipe: fix using non-zero layer in non-array view from array resource
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources neither
at the resource nor view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Alex Deucher
830e57b82d radeonsi: add Stoney to si_init_gs_info()
This patch was originally written before stoney support
was merged.  Add stoney.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-23 18:56:45 -04:00
Bas Nieuwenhuizen
48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Bas Nieuwenhuizen
81ebd6a882 radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:28 +02:00
Bas Nieuwenhuizen
bb77467df9 radeonsi: Disable operations that do not work with DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:24 +02:00
Bas Nieuwenhuizen
afa357c3b0 radeonsi: Allocate buffers for DCC.
As the alignment requirements can be 32 KiB or more, also adding
an aligned buffer creation function.

DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:01 +02:00
Marek Olšák
edf6a4537c radeonsi: only apply the SNORM blit workaround to *8_SNORM
Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
e1c098f238 util/format: add helper util_format_is_snorm8
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
0d2cb35f68 radeonsi: merge two ifs setting WD_SWITCH_ON_EOP
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ca18f12dbb radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOI
This catches the other cases that enable SWITCH_ON_EOI.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
2070af2fb1 radeonsi: add one more SWITCH_ON_EOI requirement for Hawaii and VI
The VI condition depends on geometry shaders and MAX_PRIMGRP_IN_WAVE.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
a6b5684e99 radeonsi: only apply the instancing bug workaround to Bonaire
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
96d5879d38 radeonsi: add SWITCH_ON_EOI requirement for 4 SE parts
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
7e056f872f radeonsi: remove unnecessary PARTIAL_VS_WAVE setting for streamout
hardware does this automatically

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
3a157e6e68 radeonsi: allow unbinding vertex shaders
Draw calls without a vertex shader are skipped.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
50bb2decf7 radeonsi: add draw_vbo check for a NULL pixel shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ed95cb3a31 radeonsi: add checks for a NULL pixel shader
This will allow removing the dummy PS.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
d842d2f251 gallium/util: add a test for NULL fragment shaders
Just to validate that radeonsi doesn't crash.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
dd05824b89 st/mesa: don't load state parameters if there are none
Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-24 00:01:20 +02:00
Samuel Li
98546bfd03 radeonsi: add Stoney pci ids
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:48 -04:00
Samuel Li
bf0d0ce0d5 radeonsi: add support for Stoney asics (v3)
v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:14 -04:00
Ilia Mirkin
e05021ff72 nvc0: respect edgeflag attribute width
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.

Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 16:43:06 -04:00
Jose Fonseca
ea421e919a gallivm: Explicitly disable unsupported CPU features.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-23 20:25:19 +01:00
Eric Anholt
70b06fb5d5 vc4: Convert blending to being done in 4x8 unorm normally.
We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.

total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs:     327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs:     15580 -> 12766 (-18.06%)

Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
2015-10-23 18:11:21 +01:00
Eric Anholt
8e701fda49 vc4: Add QIR/QPU support for the 8-bit vector instructions. 2015-10-23 18:11:21 +01:00
Eric Anholt
817a7eb588 vc4: Don't try to CSE non-SSA instructions.
This can happen when we're doing destination packing -- we don't know
what's in the rest of the register.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-10-23 18:11:21 +01:00
Eric Anholt
5b2fb138bc nir: Add opcodes for saturated vector math.
This corresponds to instructions used on vc4 for its blending inside of
shaders.  I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.

v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
    by jekstrand)
2015-10-23 18:11:21 +01:00
Eric Anholt
1066a372d8 vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE. 2015-10-23 18:11:21 +01:00
Eric Anholt
7d7fbcdf4e vc4: Add a workaround for HW-2116 (state counter wrap fails).
I haven't proven that this happens (I've got other GPU hangs in the
way), but the closed driver also does this and it's documented as an
errata.
2015-10-23 18:11:21 +01:00
Eric Anholt
73f6104532 vc4: Fix missing \n in a perf_debug(). 2015-10-23 18:11:21 +01:00
Kristian Høgsberg Kristensen
8f60dc83f7 i965/fs: Allow copy propagating into new surface access opcodes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0cb7d7b4b7 i965/fs: Optimize ssbo stores
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Write groups of enabled components together.

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
feff21d1a6 i965/fs: Drop offset_reg temporary in ssbo load
Now that we don't read each component one-by-one, we don't need the
temoprary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0a5a738252 i965/fs: Avoid scalar destinations in emit_uniformize()
The scalar destination registers break copy propagation. Instead compute
the results to a regular register and then reference a component when we
later use the result as a source.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
a19bf6d3cc i965/fs: Don't uniformize surface index twice
The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
aedc0aab19 i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL
The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register.  When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
24a3a697e5 i965/fs: Read all components of a SSBO field with one send
Instead of looping through single-component reads, read all components
in one go.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
de5a450bd3 i965: Don't use message headers for untyped reads
We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.

v2: Only remove header for untyped reads. Typed reads always need the
    header.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Alejandro Piñeiro
2f1bc1da86 i965/vec4: check opcode on vec4_instruction::reads_flag(channel)
Commit f17b78 added an alternative reads_flag(channel) that returned
if the instruction was reading a specific channel flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, but the flag
are used.

That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-23 18:11:09 +02:00
Eric Anholt
fb064901e9 vc4: Use Rob's NIR-based user clip lowering. 2015-10-23 14:30:15 +01:00
Eric Anholt
b3797a8f88 vc4: Also dump the decimation mode for resolved stores. 2015-10-23 14:30:15 +01:00
Eric Anholt
7516cbd261 vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG. 2015-10-23 14:30:15 +01:00
Eric Anholt
b0963ce758 vc4: Add a sentinel after simulator buffers for buffer overflow detection.
This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead.  The downside is that only catches
writes, not reads.

It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
2015-10-23 14:29:07 +01:00
Samuel Iglesias Gonsalvez
f408a13dd3 glsl: fix shader storage block member rules when adding program resources
Commit f24e5e did not take into account arrays of named shader
storage blocks.

Fixes 20 dEQP-GLES31.functional.ssbo.* tests:

dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.2
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.29
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.33
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.3

V2:
- Rename some variables (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-10-23 13:12:43 +02:00
Chia-I Wu
582ecb3b91 ilo: add support for scratch spaces
When a kernel reports a non-zero per-thread scratch space size, make sure the
hardware state is correctly set up, and a scratch bo is allocated.
2015-10-23 17:29:58 +08:00
Chia-I Wu
4a7d18296a ilo: fix scratch space setup in core
Move scratch_size out of ilo_state_shader_kernel_info and
ilo_state_compute_interface_info.  A scratch space is shared by all
kernels/interfaces.  Update builder to emit relocs for scratch bos.
2015-10-23 17:29:58 +08:00
Timothy Arceri
3994ef5f1b glsl: remove excess location qualifier validation
Location has never been able to be a negative value because it has
always been validated in the parser.

Also the linker doesn't check for negatives like the comment claims.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-23 17:05:56 +11:00
Dave Airlie
5145243018 docs: update relnotes to mention virgl driver. 2015-10-23 14:40:07 +10:00
Dave Airlie
b3b82fe8ea virgl/vtest: add vtest driver
virgl/vtest is a swrast driver that allows the
virgl acceleration to be tested without having
a virtual machine.

The backend has a unix socket server that
this connects to.

This is run by setting
LIBGL_ALWAYS_SOFTWARE=y
GALLIUM_DRIVER=virpipe

In this mode all renderering is sent over
a socket to the remote renderer, and the
results are readback and copies to the screen
using drisw. This works well enough to develop
new features and to help debug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 14:40:07 +10:00
Dave Airlie
a8987b88ff virgl: add driver for virtio-gpu 3D (v2)
virgl is the 3D acceleration backend for the
virtio-gpu shipping with qemu.

The 3D acceleration is designed around gallium
and TGSI as the virtualisation layer. The backend
renderer translates the virgl interface into
OpenGL currently.

This is the initial import of the driver to mesa.

The kernel driver portions are lined up for drm-next.

Currently this driver supports up to GL3.3 and some
misc extensions if the host driver exposes it. It is
planned to iterate the virgl API to new GL levels
as mesa host drivers gain features.

v2: fix resource tracking across flushes to avoid
->bind hack in mapping.
consolidate mapping and waiting code for transfers.
use u_range for dirt tracking.
handle larger shaders in protocol.
include virtgpu_drm.h in mesa for now.
add translation layer for gallium tgsi to virgl tgsi.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 14:40:07 +10:00
Dave Airlie
531f5d1270 tgsi: try and handle overflowing shaders. (v2)
This is used to detect error in virgl if we overflow the shader
dumping buffers.

v2: return a bool.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 11:57:56 +10:00
Dave Airlie
041081dc21 tgsi: add option to dump floats as hex values
This adds support to the parser to accept hex values as floats,
and then adds support to the dumper to allow the user to select
to dump float as 32-bit hex numbers.

This is required to get accurate values for virgl use of TGSI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 11:55:02 +10:00
Sinclair Yeh
231d539239 svga: Condition preemptive flush on draw emission
On ultra high resolution modes, the preemptive flush flag can be
set midway through command submission, a condition that cannot be
recovered from a flush-retry, causing rendering artifacts.

This patch prevents a preemtive_flush until a draw has been
emitted.

Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
99effaa965 svga: try to avoid index generation for some primitive types
The svga device doesn't directly support quads, quad strips or polygons
so we have to convert those types to indexed triangle lists.  But we
can sometimes avoid that if we're drawing flat/constant-colored prims
and we don't have to worry about provoking vertex.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
129d34da49 svga: avoid provoking vertex conversion when possible
Provoking vertex comes into play when doing flat shading.  But if we know
that all fragments in a primitive are the same color, the provoking vertex
doesn't matter.  Check for that case and use whichever provoking vertex
convention is supported by the device.

This avoids generating an index buffer to do the PV conversion.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
1082735bb6 svga: detect constant color writes in fragment shaders
Examine the fragment shader to try to detect TGSI shaders which use
"MOV OUT[0], CONST[i]" to write a constant value for the fragment color.
In this case, all fragments will have the same color (unless blending is
enabled).

This is a common case for OpenGL code such as: glColor(), glBegin(),
glVertex(), ..., glEnd() when lighting/fog/etc are disabled.  In this
case, the Mesa/gallium state tracker actually generates a simple
"MOV OUT[0], CONST[i]" fragment shader.

This will be used by the next commit to avoid provoking vertex conversion
(creating/rewriting an index buffer) when drawing flat-shaded primitives.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
df0f817e31 mesa: check for unchanged line width before error checking
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 17:19:20 -06:00
Brian Paul
990afdc045 st/mesa: use _mesa_RasterPos() when possible
The st_RasterPos() function goes to great pains to implement the
rasterpos transformation.  It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.

But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.

This can add up for legacy apps that make a lot of calls to glRasterPos.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
af0399a1ce tnl: remove t_rasterpos.c
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
234d5320bb drivers/common: use _mesa_RasterPos instead of _tnl_RasterPos
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
614a743767 mesa: copy rasterpos evaluation code into core Mesa
We'll remove it from the tnl module next.  By lifting this code into core
Mesa we can use it from the gallium state tracker.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
9919f56099 vbo: optimize vertex copying when 'wrapping'
Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 17:19:20 -06:00
Alex Deucher
7b63658125 radeon/uvd: don't expose HEVC on old UVD hw (v3)
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.

v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well.  Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-22 16:22:44 -04:00
Alejandro Piñeiro
8cf84a7e47 i965/vec4: print predicate control at brw_vec4 dump_instruction
v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
    a copy on brw_vec4.c, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
92ae101ed0 i965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests
The complete way to do this would be parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new one) is present
(see intel_debug.c).

But that seems like an overkill for the unit tests, that
after all, the most common use case is being run when
calling make check.

v2: use the same idea for the fs counterpart too, as suggested by
    Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
8fc8fcc04f i965/vec4: Add unit tests for cmod propagation pass
This include the same tests coming from test_fs_cmod_propagation, (non
vector glsl types included) plus some new with vec4 types, inspired on
the regressions found while the optimization was a work in progress.

Additionally, the check of number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to
avoid a crash on failing tests that expected no optimization, as after
checking the number of instructions, there were some checks related to
this last instruction opcode/conditional mod.

v2: update tests after Matt Turner's review of the optimization pass
v3: tweaks on the tests (mostly on the comments), after Matt Turner's
    review

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
627f94b72e i965/vec4: adding vec4_cmod_propagation optimization
vec4 port of fs_cmod_propagation.

Shader-db results (no vec4 grepping):
total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
instructions in affected programs:     401933 -> 397361 (-1.14%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                2265
HURT:                                  0

v2: remove extra space and combine two if blocks, as suggested by
    Matt Turner
v3: add condition check to bail out if current inst and inst being
    scanned has different writemask, as pointed by Matt Turner
v3: updated shader-db numbers
v4: remove block from foreach_inst_in_block_*_starting_from after
    commit 801f151917

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
a59359ecd2 i965/vec4: track and use independently each flag channel
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.

v2: added support to track each flag channel independently at vec4
    live_variables, as v1 assumed that it was already doing it, as
    pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
8ac3b525c7 i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
v2: changed comment, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-22 21:58:03 +02:00
Matt Turner
1095d837dc i965/vec4/gs: Fix signed/unsigned comparison warning. 2015-10-22 12:27:04 -07:00
Matt Turner
e2707c8765 i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+.
Gen8+ lifted the register region restriction that an instruction whose
destination spans two registers must have sources that also span two
registers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:27:00 -07:00
Matt Turner
0f74796e33 i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:57 -07:00
Matt Turner
e2344e11ce i965/fs: Trim unneeded channels in SampleID setup.
The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:54 -07:00
Matt Turner
e10fc055e7 i965/fs: Use type-W for immediate in SampleID setup.
Not a functional difference, but register is loaded with a signed
immediate (V) and added to a signed type (D) producing a signed result
(D).

Also change the type of g0 to allow for compaction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:49 -07:00
Matt Turner
cfb67c3d06 i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().
We implement textureQueryLevels (which takes no arguments, save the
sampler) using the resinfo message (which takes an argument of LOD).
Without initializing it, we'd generate a MOV from the null register to
load the LOD argument.

Essentially the same logic applies to texture. A vertex shader cannot
compute derivatives and so cannot produce an LOD, so TXL with an LOD of
0.0 is used.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-22 10:16:52 -07:00
Matt Turner
65ffaf2740 i965: Note that the UV immediate type is Gen6+. 2015-10-22 10:16:52 -07:00
Jose Fonseca
718249843b gallivm: Translate all util_cpu_caps bits to LLVM attributes.
This should prevent disparity between features Mesa and LLVM
believe are supported by the CPU.

http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990

Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.

v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-22 11:11:40 +01:00
Jordan Justen
627c15cde4 i965/fs: Disable CSE optimization for untyped & typed surface reads
An untyped surface read is volatile because it might be affected by a
write.

In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r3 = untyped_surface_read(ssbo_float)
  r4 = r3 + 1
  untyped_surface_write(ssbo_float, r4)

And after CSE, we had:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r4 = r1 + 1
  untyped_surface_write(ssbo_float, r4)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-22 00:36:37 -07:00
Chia-I Wu
13a5805b64 ilo: make sure there is HiZ before resolving
We do not want to perform a depth resolve on an MCS enabled surface.
2015-10-22 14:06:21 +08:00
Chia-I Wu
0b6f6ee50f ilo: fix max thread count for HS on Gen8
It is in DW2 on Gen8.
2015-10-22 14:06:21 +08:00
Ben Widawsky
8eefdacb38 i965: Advertise ARB_shader_stencil_export (gen9+)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Ben Widawsky
1db44252d0 i965: Implement ARB_shader_stencil_export (gen9+)
v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
  v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Ben Widawsky
5fa7114652 i965/fs: Enumerate logical fb writes arguments
Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner <mattst88@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Brian Paul
18a631eb90 svga: fix clip plane regression after recent tgsi_scan change
Before the change "tgsi/scan: use properties for clip/cull distance
writemasks", the tgsi_shader_info::num_written_clipdistance field
was a multiple of four, now it's an accurate count.  In the svga
driver, we need a minor change to the loop test.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-21 17:12:19 -06:00
Kenneth Graunke
48c76eae8e i965: Implement gl_InvocationID.
It's stored in bits 31:27 of g1 (along with the URB handles).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:58 -07:00
Kenneth Graunke
c5ae34f38f i965: Implement nir_intrinsic_load_primitive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:56 -07:00
Kenneth Graunke
b3ebf03b84 i965: Add a fs_visitor constructor that takes a brw_gs_compile.
Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains
the input VUE map and information about the control data headers.
Passing this in allows us to share that code in brw_gs.c, and calculate
them before deciding on vec4 vs. scalar mode, as it's independent of
that choice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:54 -07:00
Kenneth Graunke
55dfd39b5f i965: Add a brw->scalar_gs flag controlled by INTEL_SCALAR_GS=1.
This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs,
which controls whether or not to use SIMD8 geometry shaders.

For now, we control it via a new environment variable, INTEL_SCALAR_GS.
This provides a convenient way to try it out.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:53 -07:00
Kenneth Graunke
ac0a33666b i965: Make emit_urb_writes() reserve space for GS header information.
Geometry shaders have additional header data at the beginning of their
output URB entries.  Shaders that use EndPrimitive() or multiple streams
have a control data header; shaders with a dynamic vertex count have an
additional vec4 slot to hold the 32-bit vertex count (and 96 bits of
padding).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:52 -07:00
Kenneth Graunke
cb755996d9 i965: Make emit_urb_writes() only set EOT for the VS.
The GS will emit a bunch of vertices, and we don't want to do an EOT
prematurely.  We'll emit GS_OPCODE_THREAD_END when we want to terminate
the thread.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:50 -07:00
Kenneth Graunke
6ae419b94d i965: Make fs_visitor::emit_urb_writes reusable for scalar GS.
GS doesn't have ClampVertexColor, and we don't want to go through VS
structures.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:49 -07:00
Kenneth Graunke
72d84ae7ce i965: Introduce a brw_vue_prog_data::include_vue_handles flag.
Tessellation shaders and SIMD8 geometry shaders may need to resort to
the pull model for inputs at times.  When set, the state upload code
will tell the hardware to provide URB handles for input data.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:48 -07:00
Kenneth Graunke
ac98888afd i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode.
In scalar mode, geometry shader inputs can easily take up hundreds of
registers.  This makes pushing VUE entries impractical; we'll need to
resort to the pull model in some cases.

To support this, we introduce a new opcode corresponding to the "URB
Read SIMD8" message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:46 -07:00
Kenneth Graunke
bea7522782 i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.
In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
There are many kinds of flags for SIMD4x2 messages.

However, there are really only two (per-slot offset, use channel masks)
for SIMD8 messages.  Rather than adding a boolean flag for per-slot
offsets (polluting all instructions), I decided to just make three new
opcodes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:41 -07:00
Jason Ekstrand
0e57694745 i965/gs: Do prog_data setup and other calculations in brw_compile_gs
This commit moves the large pile of setup calculations we have to do for
geometry shaders out of brw_gs_emit and into brw_compile_gs.  This has a
couple of nice implications.  First, it's less work that the caller of
brw_compile_gs has to do.  Second, it's consistent with the vertex and
fragment stages.  Finally, it allows us to put brw_gs_compile back behind
the API boundary where it belongs.

v2 (Jason Ekstrand):
 - Pull the changes to use nir info into a separate patch
 - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h
   so that we can use it for scalar GS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
f3bc73073a i965/gs: Use NIR info for setting up prog_data
Previously, we were pulling bits from GL data structures in order to set up
the prog_data.  However, in this brave new world of NIR, we want to be
pulling it out of the NIR shader whenever possible.  This way, we can move
all this setup code into brw_compile_gs without depending on the old GL
stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
fac9b21e03 i965/gs: Pull prog_data out of brw_gs_compile
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
6ac2bbec16 i965/gs: Use NIR instead of the brw_geometry_program for GS metadata
With this, we can remove the geometry program from brw_gs_compile.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
72148de217 i965/gs: Move the mem_ctx argument to brw_compile_gs
This makes it better match the other brw_compile_* functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
8e8b527b27 i965/gs: Set static_vertex_count unconditionally on GEN8+
We always have NIR, so there's no reason for the check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
2686477d37 nir: Constify nir_gs_count_vertices
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
4eb84a03be nir/info: Add more information about geometry shaders
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Ben Widawsky
3c5d24363a i965: (trivial) rename computes stencil to gen9
All the documentation I can find says that this bit (and functionality) only
exists on SKL+. Since the bit isn't yet used, there is no real impact here.

The original code was added by Ken here (a surprisingly long time ago):
commit f3c6d6f1e1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Nov 29 21:00:27 2012 -0800

    i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 11:00:03 -07:00
Ben Widawsky
c643518452 i965: Correct the comment about fb write payload
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-21 11:00:00 -07:00
Nanley Chery
f1147a238a mesa/glformats: Undo code changes from _mesa_base_tex_format() move
The refactoring commit, c6bf1cd, accidentally reverted cd49b97
and 99b1f47. These changes caused more code to be added to the
function and removed the existing support for ASTC. This patch
reverts those modifications.

v2. Actually include ASTC support again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92221
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-21 10:36:31 -07:00
Matt Turner
2ce659b5e4 i965: Mark compacted 3-src instructions as Gen8+.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
05cc56cca3 i965: Add const to brw_compact_inst_bits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
b29f92daec i965: Add mask_control_ex field and handle it in compaction.
Documentation is sparse, but it appears to have existed on G45 and ILK
as a second bit extension of the mask_control field. Setting the pair of
bits to 0b11 enables "NoCMask".

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
3ec9d96d43 i965: Add devinfo->gen assertions for acc_wr_control.
... and for flag_subreg_nr since it's right near by.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
d14907b946 i965: Prepare for next commit by adding more whitespace.
We're going to add a field with a longer name that wouldn't align with
the rest.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
35f3f06c8a i965: Compact acc_wr_control only on Gen6+.
It only exists on Gen6+, and the next patches will add compaction
support for the (unused) field in the same location on earlier
platforms.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Matt Turner
ee868c46e8 i965: Add devinfo parameter to brw_compact_inst_* funcs.
The next commit will add assertions dependent on devinfo->gen.

Use compact()/uncompact() macros where possible, like the 3-src code
does.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Matt Turner
4a132349c3 i965/vec4: Don't emit MOVs for unused URB slots.
Otherwise we'd emit a MOV from the null register (which isn't allowed).

Helps 24 programs in shader-db (the geometry shaders in GSCloth):

instructions in affected programs:     302 -> 262 (-13.25%)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Nigel Stewart
04703762e5 osmesa: Expose GL entry points for Windows build via DEF file.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92437
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-21 14:06:58 +01:00
Jonathan Gray
99c4079c37 configure.ac: ensure RM is set
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set.  This fixes "make clean" on OpenBSD.

v2: use AC_CHECK_PROG

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-21 14:09:38 +01:00
Neil Roberts
ee77796a5c i965/fs: Disable opt_sampler_eot for more message types
In bfdae9149e I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-21 11:08:37 +02:00
Neil Roberts
801f151917 i965: Remove block arg from foreach_inst_in_block_*_starting_from
Since 49374fab5d these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-21 11:07:04 +02:00
Timothy Arceri
38ceeeadaa glsl: check for arrays of arrays when assigning explicit locations
This fixes assigning explicit locations in the CTS test:

ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-21 15:49:32 +11:00
Timothy Arceri
9a04057ef1 glsl: add is_array_of_arrays() helper
As suggested by Ian Romanick

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-21 15:49:17 +11:00
Kenneth Graunke
156b7d3113 glsl: Fix bad indentation in bit_logic_result_type().
The first level of indentation was using 4 spaces.  Mesa uses 3.

Trivial.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-20 21:25:11 -07:00
Timothy Arceri
fd01840c0b glsl: add AoA support to subroutines
process_parameters() will now be called earlier because we need
actual_parameters processed earlier so we can use it with
match_subroutine_by_name() to get the subroutine variable, we need
to do this inside the recursive function generate_array_index() because
we can't create the ir_dereference_array() until we have gotten to the
outermost array.

For the remainder of the array dimensions the type doesn't matter so we
can just use the existing _mesa_ast_array_index_to_hir() function to
process the ast.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-21 14:56:57 +11:00
Tapani Pälli
a59c1adcc6 glsl: fix record type detection in explicit location assign
Check current_var directly instead of using the passed in record_type.

This fixes following failing CTS test:
	ES31-CTS.explicit_uniform_location.uniform-loc-types-structs

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:12:15 +03:00
Tapani Pälli
1f48ea1193 glsl: do not try to reserve explicit locations for buffer variables
Explicit locations are only used with uniform variables.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:11:38 +03:00
Tapani Pälli
96bbb3707f glsl: skip buffer variables when filling UniformRemapTable
UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer
variables should not utilize uniform locations.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:10:52 +03:00
Brian Paul
f1682fdafa svga: add switch case for PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT
A third instance of this was needed but missed in the previous commit.
Return 32 as for the two other cases.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-20 19:14:51 -06:00
Brian Paul
b48e16fa2f draw: fix splitting of line loops (v2)
When the draw module splits long line loops, the sections are emitted
as line strips.  But the primitive type wasn't set correctly so each
section was being drawn as a loop, introducing extra line segments.

To fix this, we pass a new DRAW_LINE_LOOP_AS_STRIP flag to the run()
function.  The linear/elt_run() functions have to check for this flag
and set their primitive type accordingly.

No piglit regressions.  Fixes piglit's lineloop with -count 4097 or
higher.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-20 19:14:51 -06:00
Anuj Phogat
876d07d837 i965/gen9: Remove temporary variable 'bpp' in tr_mode_..._texture_alignment()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
06ec19bca4 i965/gen9: Remove temporary variable 'align_yf' in tr_mode_..._texture_alignment()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
8f8c450bc7 i965/gen9: Remove parameter 'brw' from tr_mode_..._texture_alignment()
V2: Rebased on master.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
a5a00bd747 i965/gen9: Reuse YF alignment tables in tr_mode_..._texture_alignment()
Patch just does some refactoring to make the code look better. No
functional changes in here.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Brian Paul
f221580937 vbo: convert display list GL_LINE_LOOP prims to GL_LINE_STRIP
When a long GL_LINE_LOOP prim was split across primitives we drew
stray lines.  See previous commit for details.

This patch converts GL_LINE_LOOP prims into GL_LINE_STRIP prims so
that drivers don't have to worry about the _mesa_prim::begin/end flags.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d79595bf02 vbo: fix GL_LINE_LOOP stray line bug
When long GL_LINE_LOOP primitives don't fit in one vertex buffer they
have to be split across buffers.  The code to do this was basically correct
but drivers had to pay special attention to the _mesa_prim::begin,end flags
in order to draw the sections of the line loop properly.  Apparently, the
only drivers to do this were those using the old 'tnl' module for software
vertex processing.

Now we convert the split pieces of GL_LINE_LOOP prims into GL_LINE_STRIP
primitives so that drivers don't have to worry about the special begin/end
flags.  The only time a driver will get a GL_LINE_LOOP prim is when the
whole thing fits in one vertex buffer.

Mostly fixes bug 81174, but not completely.  There's another bug somewhere
in the src/gallium/auxiliary/draw/ code.  If the piglit lineloop test is
run with -count 4096, rendering is correct, but with -count 4097 there are
stray lines.  4096 is a magic number in the draw code (search for "4096").

Also note that this does not fix long line loops in display lists.  The
next patch fixes that.

v2: fix incorrect -1 in vbo_compute_max_verts(), per Charmaine.  Remove
incorrect assertion which was added in vbo_copy_vertices().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49779
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28130

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
03d2f08539 vbo: add new vbo_compute_max_verts() helper function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
002c5c1da3 vbo: simplify some code in vbo_exec_End()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d916175c4d vbo: simplify some code in vbo_copy_vertices()
As before, use a new 'last_prim' pointer to simplify things.  Plus, add
some const qualifiers.

v2: use 'sz' in another place, per Sinclair.  And update subject line.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d24c3a680e vbo: simplify some code in vbo_exec_wrap_buffers()
Use a new 'last_prim' pointer to simplify things.

v2: remove unneeded assert(exec->vtx.prim_count > 0)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
1637cec8f8 vbo: replace the comment on vbo_copy_vertices()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
e05ffcf1d9 vbo: make vbo_exec_vtx_wrap() static
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
971b56c643 vbo: remove unneeded ctx parameter for merge_prims()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
6cc596c66b tnl: add some comments in render_line_loop code
And remove '(void) flags' line which is not needed.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
f7272032be mesa: simple whitespace fix in texstore.c 2015-10-20 12:52:40 -06:00
Brian Paul
f6d4e20d10 vbo: reduce number of vertex buffer mappings for vertex attributes
Whenever we got a glColor, glNormal, glTexCoord, etc. call outside a
glBegin/End pair, we'd immediately map a vertex buffer to begin
accumulating vertex data.  In some cases, such as with display lists,
this led to excessive vertex buffer mapping.  For example, if we have
a display list such as:

glNewList(42, GL_COMPILE);
glBegin(prim);
glVertex2f();
...
glVertex2f();
glEnd();
glEndList();

Then did:

glColor3f();
glCallList(42);

We'd map a vertex buffer as soon as we saw glColor3f but we'd never
actually write anything to it.  Note that the vertex position data
was put into a vertex buffer during display list compilation.

With this change, we delay mapping the vertex buffer until we actually
have a vertex to write to it (triggered by a glVertex() call).  In the
above case, we no longer map a vertex buffer when setting the color and
calling the list.

For drivers such as VMware's, reducing buffer mappings gives improved
performance.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
d11fefa961 st/mesa: optimize 4-component ubyte glDrawPixels
If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format code.

That slow path can be avoided in some common cases by using the
pipe_samper_view's swizzle terms to do the swizzling at texture sampling
time instead.

For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.

v2: fix incorrect swizzle setup (need to invert the tex format's swizzle)

Reviewed by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
cf405922eb mesa: make memcpy_texture() non-static
So that we can use it directly from the mesa/gallium state tracker.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
31ae52acce st/mesa: check for out-of-memory in st_DrawPixels()
Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.

v2: also check if allocation of sv[1] fails, per Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
c5de38abc9 st/mesa: use MAX3() instead of MAX2(MAX2) in draw_textured_quad()
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
e24d04e436 mesa: fix incorrect opcode in save_BlendFunci()
Fixes assertion failure with new piglit
arb_draw_buffers_blend-state_set_get test.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
b1f8ef5ae3 mesa: add more cases to print_list() in dlist.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Emil Velikov
6994d8ec01 i965: silence incompatible pointer type warning
src/mesa/drivers/dri/i965/brw_program.c:94:39:
warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible
pointer type [-Wincompatible-pointer-types]
          return _mesa_init_gl_program(&prog->program, target, id);

                                       ^

Runtime was unaffected as brw_geometry_program is subclassed from
gl_geometry_program, thus the address passed was the same.

Fixes: bcb56c2c69 (program: convert _mesa_init_gl_program() to take
struct gl_program *)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-20 18:37:22 +01:00
Marek Olšák
814f31457e gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.

I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720

Cc: 11.0 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 18:27:11 +02:00
Eric Anholt
921feb8782 vc4: Switch our vertex attr lowering to being NIR-based.
This exposes more information to NIR's optimization, and should be
particularly useful when we do range-based optimization.

total uniforms in shared programs: 32066 -> 32065 (-0.00%)
uniforms in affected programs:     21 -> 20 (-4.76%)
total instructions in shared programs: 93104 -> 92630 (-0.51%)
instructions in affected programs:     31901 -> 31427 (-1.49%)
2015-10-20 12:47:27 +01:00
Eric Anholt
85b946478c vc4: Add limited support for ibfe/ubfe.
This is just enough to cover our unpack modes, which will be used by some
new NIR-based lowering in the next commit.
2015-10-20 12:47:27 +01:00
Marek Olšák
8910ebd8e8 tgsi/scan: use properties for clip/cull distance writemasks
No changes needed for drivers already relying on tgsi_shader_info.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
7c75f23cb9 st/mesa: pass the clip distance array size to drivers
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
e70c66197e gallium: add new properties for clip and cull distance usage
The TGSI usage mask can't be used, because these are declared as an output
array of 2 elements.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
67f489ded3 mesa: replace UsesClipDistance with ClipDistanceArraySize
This is more practical and needed by gallium.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
8339585b12 radeonsi: enable BC_OPTIMIZE if centroid isn't used
This solution was recommended by a Catalyst developer.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:56:46 +02:00
Marek Olšák
38391835b5 radeonsi: fix the export_prim_id field size in the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:56:40 +02:00
Marek Olšák
9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák
e57dd7a08b st/mesa: create shaders which have only one variant immediatelly (v2)
v2: fix the condition when lacking sample shading

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
b99645f819 st/mesa: negate the can_force_persample_interp flag
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
f4e938e9ae st/mesa: decouple shaders from contexts if they are shareable
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
d74e7b6fb9 gallium: add PIPE_CAP_SHAREABLE_SHADERS
I'll let drivers figure out how to do it.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
12321966ae radeonsi: add support for ARB_texture_view
All tests pass. We don't need to do much - just set CUBE if the view
target is CUBE or CUBE_ARRAY, otherwise set the resource target.

The reason this can be so simple is that texture instructions
have a greater effect on the target than the sampler view.

Thanks Glenn for the piglit test.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:25:19 +02:00
Boyan Ding
6bd9e03512 vc4: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-20 09:54:53 +01:00
Timothy Arceri
2832ca95ec glsl: fix stream qualifier for blocks with an instance name
This also removes the validation from the parser as it is not required
and once arb_enhanced_layouts comes along we wont be able to do validation
on the stream qualifier in the parser anyway as it adds constant expression
support to the stream qualifier.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-10-20 11:58:28 +11:00
Timothy Arceri
aa9f06b3ea glsl: fix regression when building interface field name for SSBOs
Fixes regression cased by bb5aeb8549

We don't care about the swizzle when building the name so just skip over it.

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-20 11:54:09 +11:00
Leo Liu
867284a8f0 st/omx/dec/h264: fix field picture type 0 poc disorder
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-19 20:43:03 -04:00
Anuj Phogat
2eed9e6b75 i965/gen9: Handle the GL_TEXTURE_{1D, 1D_ARRAY} targets inside switch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 13:43:44 -07:00
Matt Turner
de862f03ac i965/fs: Localize variables' scopes. 2015-10-19 10:19:32 -07:00
Matt Turner
35a2d259f2 i965/fs: Consider type mismatches in saturate propagation.
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform

   (+f0) sel      g20:UD, g30:UD, g40:UD
         mov.sat  g50:F,  g20:F

into

   (+f0) sel.sat  g20:UD, g30:UD, g40:UD
         mov      g50:F,  g20:F

But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.

Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.

Fixes bad code gen in 158 programs.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-19 10:19:32 -07:00
Matt Turner
9e17c36b8b i965: Extract can_change_source_types() functions.
Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-19 10:19:32 -07:00
Jason Ekstrand
41c474df53 i965/vs: Move URB entry_size and read_length calculations to compile_vs
Reviewed-By: Eduardo Lima Mitev <elima@igalia.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
6980372010 i965: Move the entire compiler API into a single file
At this point, the compiler API has been substantially simplified.  In the
spirit of Kristian's making a compiler library, this commit makes a single
header file that contains, more-or-less, the entire compiler API.

There's still a bit of cleanup to do particularly in the area of geometry
shaders.  However, this gets us much closer to having a separate compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4467344c82 i965: Rename brw_foo_emit to brw_compile_foo
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
67db9072b9 i965/fs: Move some of the prog_data setup into brw_wm_emit
This commit moves the common/modern stuff.  Some legacy stuff such as
setting use_alt_mode was left because it needs to know whether or not we're
an ARB program.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4e711872d0 i965/cs: Rework cs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
657863bb5c i965/gs: Rework gs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.  Unfortunately, we still
have to pass in the gl_shader_program for gen6 because it's needed for
transform feedback.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5d8bf6de61 i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

v2 (Jason Ekstrand):
   - Patch use_legacy_snorm_formula through as a function argument rather
     than trying to go through the shader key.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
22ad44910e i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
0ca401327e i965: Use a const nir_shader in backend_shader
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
8f1d968704 i965/vec4: Remove gl_program and gl_shader_program from the generator
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5e86f5b3d2 i965/fs: Remove the gl_program from the generator
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
688d2e4585 nir/info: Add a few bits of info for fragment shaders
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4889c73dd1 nir/info: Add compute shader local size to nir_shader_info
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
fe399f3a69 nir/info: Move the GS info into a stage-specific info union
This way we can have other stage-specific info without consuming too much
extra space.  While we're at it, we make sure that the geometry info is
only set if we're actually a goemetry shader.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
16619477bc mesa: Move gl_frag_depth_layout from mtypes.h to shader_enums.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5d4bc5ec13 nir: Add a label to nir_shader_info
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:45:14 -07:00
Jason Ekstrand
e00314bc57 i965/asm: Explicitly use a nir_instr for IR annotations
Now that everything goes through NIR, we don't need this to be a void
pointer anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:45:14 -07:00
Jose Fonseca
b23a4859f4 scons: Build nir/glsl_types.cpp once.
Undoes early hacks, and ensures nir/glsl_types.cpp is built once, and
only once.

The root problem is that SCons doesn't know about NIR nor any source
file in the NIR_FILES source list.

Tested with libgl-gdi and libgl-xlib scons targets.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-19 15:59:59 +01:00
Brian Paul
530eb39c71 svga: fix incorrect round-down arithmetic
Spotted by Roland.  Luckily, this code should never really be hit
since the const buffer size and offset should already be multiples
of 16.  I could probably add more assertions to that effect, but
let's just fix the arithmetic for now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-19 08:54:42 -06:00
Samuel Iglesias Gonsalvez
6f3954618b glsl: fix segfault when indirect indexing a buffer variable which is an array
Fixes a regression added by bb5aeb8549.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-19 11:16:50 +02:00
Indrajit Das
b0a44f1017 st/va: Added support for NV12 to IYUV conversion in vlVaGetImage
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-19 09:47:33 +02:00
Indrajit Das
381c17d695 st/va: Used correct parameter to derive the value of the "h" variable in vlVaCreateImage
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-19 09:47:24 +02:00
Iago Toral Quiroga
36c93e9659 glsl_to_tgsi: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
5a9ff87d0f st/mesa: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
55403665b6 i965: Do not use NumBufferInterfaceBlocks
This is the only place in the driver where we use this. Since we now work
with separate index spaces, always use NumUniformBlocks and
NumShaderStorageBlocks instead of NumBufferInterfaceBlocks to be more
consistent with the rest of the code.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
14c3db7bc5 main: GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH is about UBOS, not SSBOs
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
fba582efc7 main: Use NumUniformBlocks to count UBOs
Now that we have separate index spaces for UBOs and SSBOs we do not need
to iterate through BufferInterfaceBlocks any more, we can just take the
UBO count directly from NumUniformBlocks.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Chia-I Wu
86ccb2a16f ilo: set VME for 3DSTATE_PS
When the bit is not set, we can see sampling artifacts on triangle edges when
the mip filter is not GEN6_MIPFILTER_NONE.
2015-10-18 21:35:16 +08:00
Chia-I Wu
d04126a773 ilo: ignore prefer_linear_threshold when zero
This was the intended behavior but it did not work as intended until now.
2015-10-18 21:04:52 +08:00
Chia-I Wu
a445e0f7ef ilo: remove some unused kernel params 2015-10-18 21:04:52 +08:00
Chia-I Wu
6e132f4730 ilo: remove unused ilo_shader_get_type() 2015-10-18 21:04:52 +08:00
Chia-I Wu
29a0f7479d ilo: remove u_debug.h inclusion from ilo_core.h
Move it to ilo_debug.h.
2015-10-18 21:04:52 +08:00
Chia-I Wu
3fe568e2a4 ilo: remove u_memory.h inclusion from ilo_core.h
We do not make allocations generally in the core.
2015-10-18 21:04:52 +08:00
Samuel Pitoiset
fc5ae0c13f nvc0: do not bind input params at compute state init on Fermi
It looks like binding a constant buffer on compute overwrites the 3D
state. To avoid that, we already re-bind all the 3D constant buffers
after launching a compute grid but this is not enough.

Binding the constant buffer of input parameters for the compute state at
initialization corrupts the 3D constant buffers, and it's just useless
to bind it because this is not needed until we really launch a grid.

This fixes some piglit regressions related to interpolation tests
introduced in "nvc0: enable compute support by default on Fermi".

Fixes: 00d6186 (nvc0: enable compute support by default on Fermi)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-18 14:25:05 +02:00
Kenneth Graunke
ca2b807ca3 i965/vs: Drop hack that created NIR for fixed function vertex programs.
Marek made core Mesa call ProgramStringNotify(), which solves this
properly.  The hack is no longer needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-17 17:26:11 -07:00
Kenneth Graunke
dbac0a6352 i965/nir: Switch on shader stage in nir_lower_outputs().
VS, GS, and FS continue doing the same thing they did before.  We can
simplify the FS code a bit because it is always scalar.

Compute shaders now assert that there are no outputs instead of doing
a loop over 0 outputs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-17 17:26:11 -07:00
Marek Olšák
7c10af6425 radeonsi: don't use the AMDGPU intrinsic for CMP
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
f2cdb68c8b radeonsi: use LRP from gallivm
Totals:
SGPRS: 344552 -> 344368 (-0.05 %)
VGPRS: 197132 -> 197552 (0.21 %)
Code Size: 7375376 -> 7366304 (-0.12 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

Totals from affected shaders:
SGPRS: 47736 -> 47552 (-0.39 %)
VGPRS: 27952 -> 28372 (1.50 %)
Code Size: 1392724 -> 1383652 (-0.65 %) bytes
LDS: 39 -> 39 (0.00 %) blocks
Scratch: 513024 -> 449536 (-12.38 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
eb11efc989 radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max
No difference according to shader-db. (with the new S_ABS_I32 pattern)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
d72a26ec5d radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
6660ca7121 radeonsi: initialize output, temp, and address registers to "undef"
This removes "v_mov v0, 0" which typically occurs before exports.

Totals:
SGPRS: 345216 -> 344552 (-0.19 %)
VGPRS: 197684 -> 197132 (-0.28 %)
Code Size: 7390408 -> 7375376 (-0.20 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

Totals from affected shaders:
SGPRS: 101336 -> 100672 (-0.66 %)
VGPRS: 53920 -> 53368 (-1.02 %)
Code Size: 2170176 -> 2155144 (-0.69 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
529c5e7740 gallivm: implement the correct version of LRP
The previous version has precision issues. This can be a problem
with tessellation. Sadly, I can't find the article where I read it
anymore. I'm not sure if the unsafe-fp-math flag would be enough to revert
this.

v2: added the comment
2015-10-17 21:40:03 +02:00
Marek Olšák
a2197cac7f gallivm: set correct opcode info from unary/binary/ternary emits
and clear the emit_data structure.

The new radeonsi min/max opcode implementation requires this.

(it looks good according to Roland S.)
2015-10-17 21:40:03 +02:00
Marek Olšák
5bc871a4ca radeonsi: implement vertex color clamping
This is only supported in the compatibility profile (without GS and tess).

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
208d1ed38d radeonsi: implement fragment color clamping
using the shader key for now.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
acc6a07874 radeonsi: clean up other scratch buffer functions
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
9098d7e9bd radeonsi: clean up copy-pasted scratch buffer updates
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
938a1bee34 radeonsi: unify shader create functions
The shader specifies the processor type, so use that instead.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
b0167809f1 radeonsi: unify shader delete functions
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
aa060e276c radeonsi: fix a GS copy shader leak
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
c4f086f399 radeonsi: remove an unused ctx parameter in si_shader_destroy
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
4f4f477d6d radeonsi: print export_prim_id from the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
b11edf8872 radeonsi: disable NaNs for LS and HS
They're disabled for all other shaders except compute, but I forgot
to do this for tess stages.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
73e3fba335 radeonsi: clean up si_llvm_init_export_args
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
82335978bb tgsi: move pipe_shader_from_tgsi_processor function to util
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Brian Paul
8c5647db5e mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-17 19:36:46 +02:00
Marek Olšák
3c6156a4a7 st/mesa: fix clip state dependencies
This allows removing FLUSH_VERTICES in MatrixMode.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-17 19:36:44 +02:00
Marek Olšák
006fcc0da6 gallium/hud: fix possible NULL pointer dereference
Trivial.
2015-10-17 19:06:27 +02:00
Brian Paul
3272f632ee scons: fix MSVC, MinGW build
Duplicate the glsl_types_hack.cpp work-around from the libgl-xlib target.
2015-10-17 10:06:49 -06:00
Rob Clark
7e6aafd6ab build: fix make-check after a6a6a71
commit a6a6a71092
   Author:     Rob Clark <robclark@freedesktop.org>
   AuthorDate: Sat Oct 10 14:13:50 2015 -0400

       glsl: (mostly) remove libglsl_util

Was a bit too ambitious on removal of libglsl_util.. it is still needed
by some of the tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-17 09:51:29 -04:00
Rob Clark
b7963b6926 build: fix out-of-tree build after b9b40ef
commit b9b40ef9b7
   Author:     Rob Clark <robclark@freedesktop.org>
   AuthorDate: Sat Oct 10 13:55:07 2015 -0400

       nir: remove dependency on glsl

broke things for i965 out of tree build.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-17 09:51:29 -04:00
Samuel Pitoiset
c188235d1b nvc0: add support for performance monitoring metrics on Fermi
As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-17 10:50:00 +02:00
Rob Clark
a6a6a71092 glsl: (mostly) remove libglsl_util
Now that NIR does not depend on glsl, we can (mostly[*]) get rid of the
libglsl_util hack.

[*] glsl_compiler is the one remaining user of libglsl_util

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:38 -04:00
Rob Clark
b9b40ef9b7 nir: remove dependency on glsl
Move glsl_types into NIR, now that the dependency on glsl_symbol_table
has been split out.

Possibly makes sense to rename things at this point, but if we do that
I'd like to keep it split out into a separate patch to make git history
easier to follow (IMHO).

v2: fix android build
v3: I f***ing hate scons.. but at least it builds

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:38 -04:00
Rob Clark
183db3a645 glsl: move half<->float convertion to util
Needed in NIR too, so move out of mesa/main/imports.c

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Rob Clark
60690cb3b3 glsl: move builtin vector types to glsl_types.cpp
First step at untangling NIR's dependency on glsl_types without bringing
in the dependency on glsl_symbol_table.  The builtin types are now in
glsl_types (which will end up in NIR), but adding them to the symbol-
table stays in builtin_types.cpp (which will not be part of NIR).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Rob Clark
33de998230 glsl: couple shader_enums cleanups
Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.

v2: add STATIC_ASSERT()'s

Reported-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Timothy Arceri
698cdbf492 glsl: initialise record array count to 1
This was only being done in one of the two process methods.

Fixes an issue with samplers using the array size of a previous record.

Tested-by: Marek Olšák <marek.olsak@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2015-10-17 08:50:40 +11:00
Timothy Arceri
3c87377d0b nir: add atomic lowering support for AoA
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-17 08:43:21 +11:00
Timothy Arceri
2e1798f183 nir: wrapper for glsl_type arrays_of_arrays_size()
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-17 08:43:15 +11:00
Ilia Mirkin
fd5e0581dd configure: show which gallium drivers/sts are built
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-16 17:18:43 -04:00
Brian Paul
2023906667 tgsi: initialize ctx.file in tgsi_dump_instruction()
Fixes segfault because of uninitialized file pointer.
Trivial.
2015-10-16 14:32:09 -06:00
Samuel Pitoiset
a3b1757551 nvc0: add a note about MP counters on GF100/GF110
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
0461260d77 nvc0: add MP counters variants for GF100/GF110
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. That's why, some MP counters
are different between these chipsets and we need to handle variants.

Signed-off-by: Samuel Pitoiet <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
ec5001d25b nvc0: move SW/HW queries info to their respective files
This will help for handling HW SM queries variants on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
00d61869a5 nvc0: enable compute support by default on Fermi
Compute support was not enabled by default because weird effects
on 3D state happened, but I can't reproduce them anymore.

This also enables MP performance counters by default on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
8cd4b8478a nvc0: allow only one active query for the MP counters group
Because we can't expose the number of hardware counters needed for each
different query, we don't want to allow more than one active query
simultaneously to avoid failure when the maximum number of counters
is reached. Note that these groups of GPU counters are currently only
used by AMD_performance_monitor.

Like for Kepler, this limits the maximum number of active queries
to 1 on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
cef22f3490 nvc0: read MP counters of all GPCs on Fermi
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.

Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
1825898e04 nvc0: store the number of GPCs to nvc0_screen
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.

This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
c4896c99cb nvc0: fix unaligned mem access when reading MP counters on Fermi
Memory access have to be aligned to 128-bits. Note that this
doesn't happen when the card only has TPC.

This patch fixes the following dmesg fail:

gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
7abd707251 nvc0: fix monitoring multiple MP counters queries on Fermi
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offseted by the
slot id!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
4fcb661711 nvc0: fix queries which use multiple MP counters on Fermi
Queries which use more than one MP counters was misconfigured and
computing the final result was also wrong because sources need to
be configured on different hardware counters instead.

According to the blob, computing the result is now as follows:

FOR  i..n
val += ctr[i] * pow(2, i)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
6353f620cd nvc0: allow to use 8 MP counters on Fermi
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
cac897197b nvc0: fix sequence field init for MP counters on Fermi
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
409658c367 nvc0: correctly enable the MP counters' multiplexer on Fermi
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x80000000.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
c3570c3fb9 nvc0: rip off the kepler MP-enabling logic from the Fermi codepath
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and
has no effect on Fermi (although this enables MP counters on Kepler).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
dab7e0ed09 nvc0: split out begin_query() hook used by MP counters
The way we configure MP performance counters is going to pretty
different between Fermi and Kepler. Having two separate functions
is much better.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
d4ecc2bce4 nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Brian Paul
efe37519b0 svga: only count hardware buffer mappings for HUD
Don't count client memory buffer mappings since they're basically free.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:44:00 -06:00
Neha Bhende
9bc7e3105a svga: add new GALLIUM_HUD queries
Add new GALLIUM_HUD queries for:
    num-shaders
    num-resources
    num-state-objects
    num-validations
    map-buffer-time
    num-surface-views
    num-resources-mapped
    num-flushes

Most of this patch was originally written by Neha.  Additional clean-ups
and num-flushes counter added by Brian Paul.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
f413f1a17c svga: use new svga_new_shader_variant() function
To simplify upcoming new HUD shader count implementation.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
8d0d5dca5b svga: pass context to svga_tgsi_vgpu9_translate()
Will be used for upcoming change.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
615b37a0e2 svga: remove svga_tgsi_vgpu9_translate() call in GS path
We can never have geometry shaders with vgpu9.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
cb473c46fe glsl: silence warning about unhandled ast_unsized_array_dim case in switch
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-16 11:34:05 -06:00
Brian Paul
afff809fea st/mesa: fix incorrect pointer type arguments in st_new_program()
Silences 5 warnings of the type:
state_tracker/st_cb_program.c: In function 'st_new_program':
state_tracker/st_cb_program.c:108:7: warning: passing argument 1 of
'_mesa_init_gl_program' from incompatible pointer type [enabled by default]
       return _mesa_init_gl_program(&prog->Base, target, id);
       ^
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-16 09:15:05 -06:00
Brian Paul
4627e8058e Revert "mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()"
This reverts commit 0de5e0f3fb.

Michel Dänzer spotted two piglit regressions from the change.  I suspect
that removing the FLUSH_VERTICES() actually exposed a bug elsewhere but
I don't have time to hunt down the root issue at this time.
2015-10-16 09:10:22 -06:00
Samuel Iglesias Gonsalvez
ccbb52ac11 glsl: fix check SSBOs support for builtin functions
has_shader_storage_buffer_objects() returns true also if the OpenGL
context is 4.30 or ES 3.1.

Previously, we were saying that all atomic*() GLSL builtin functions
for SSBOs were not available when OpenGL ES 3.1 context was in use.

Fixes 48 dEQP-GLES31 tests:

dEQP-GLES31.functional.ssbo.atomic.*

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-16 12:47:58 +02:00
Tapani Pälli
dc8c221e28 mesa: Set api prefix to version string when overriding version
Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).

This makes it currently impossible to run glslparsertest tests for
OpenGL ES when using version override.

Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.

Before:
	"3.1 Mesa 11.1.0-devel (git-24a1a15)"

After:
	"OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"

v2: only include api prefix for OpenGL ES (Boyan Ding)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-16 12:58:52 +03:00
Iago Toral Quiroga
c8f5274b52 nir: Get the number of SSBOs and UBOs right
Before d31f98a272 and 56e2bdbca3 we had a sigle index space for UBOs
and SSBOs, so NumBufferInterfaceBlocks would contain the combined number of
blocks, not just one kind. This means that for shader programs using both
UBOs and SSBOs, we were setting num_ssbos and num_ubos to a larger number than
we should. Since the above commits  we have separate index spaces for each
so we can just get the right numbers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Iago Toral Quiroga
f534f331ca i965/vec4: Use the right number of UBOs
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Iago Toral Quiroga
6f9ca30266 i965/fs: use the right number of UBOs
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Rob Clark
ef7a563829 freedreno: add debug option to dirty state after draw
Similar to "dclear", "ddraw" will mark all state dirty after each draw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-15 18:04:18 -04:00
Rob Clark
6206da736c freedreno/a3xx: cache-flush is needed after MEM_WRITE
Otherwise the mem2gmem blit would see potentially bogus texture
coordinates.  Fixes an issue that shows up with glamor.

CC: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-15 18:04:17 -04:00
Rob Clark
fefffdc2b2 gallium/util: fix debug_get_flags_option on 32-bit harder
(yes, we want PRI?64, but we want the x version rather than the u
version)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-15 18:04:17 -04:00
Chih-Wei Huang
7599f8b167 nv30: include the header of ffs prototype
It fixes a building error of the android 6.0 64-bit target.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-15 15:00:14 -04:00
Chih-Wei Huang
d31005e3e5 nv50/ir: use C++11 standard std::unordered_map if possible
Note Android version before Lollipop is not supported.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-15 14:59:59 -04:00
Jason Ekstrand
5f106153f5 nir/prog: Don't double-insert the fog-coord variable
nir_variable_create already inserts it in the right list for us so
inserting it again causes a linked list corruption.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-15 10:48:21 -07:00
Jason Ekstrand
b705005584 nir/glsl: Use shader_prog->Name for naming the NIR shader
This has the better name to use. Aparently, sh->Name is usually 0.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-15 07:31:09 -07:00
Jason Ekstrand
eb893c220c nir: Add helpers for creating variables and adding them to lists
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15 07:31:09 -07:00
Jason Ekstrand
635daef76e nir/prog: Use nir_foreach_variable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15 07:31:09 -07:00
Brian Paul
5d954fd5cb mesa: wrap a ridiculously long line in es1_conversion.c
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
d8c23d156d mesa: add num_buffers() helper in blend.c
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
dfbd62e772 mesa: optimize _UsesDualSrc blend flag setting
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag
will be the same for all buffers, so no need to compute it N times.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
d21e17f48f mesa: fix incorrect error string in _mesa_BlendEquationiARB()
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
1d75165501 mesa: move validate_blend_factors() call after no-change check
A redundant call to glBlendFuncSeparateiARB() is more likely than getting
invalid values, so do the no-op check first.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
34de3c4c16 mesa: optimize no-change check in _mesa_BlendEquationSeparate()
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
2dfedf105d mesa: optimize no-change check in _mesa_BlendEquation()
Same story as preceeding change to _mesa_BlendFuncSeparate().

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
6fd29e6c31 mesa: optimize no-change check in _mesa_BlendFuncSeparate()
Streamline the checking for no state change in _mesa_BlendFuncSeparate()
(and _mesa_BlendFunc()).  If _BlendFuncPerBuffer is false, we only need
to check the 0th buffer state.  Move argument validation after the no-op
check.

I'm looking at an app that issues about 1000 redundant glBlendFunc()
calls per frame!

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
083b3f5cb4 mesa: short-cut new_state == _NEW_LINE in _mesa_update_state_locked()
We can skip to the end of _mesa_update_state_locked() if only the
_NEW_LINE flag is set since none of the derived state depends on it
(just like _NEW_CURRENT_ATTRIB).  Note that we still call the
ctx->Driver.UpdateState() function, of course.

v2: use bitmask-based test, per Eric.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
0de5e0f3fb mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Chih-Wei Huang
67d8518a0e mesa: android: Fix the incorrect path of sse_minmax.c
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes: 669cfc267a (android: mesa: fix the path of the SSE4_1
optimisations)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15 13:41:02 +01:00
Mauro Rossi
45f0392ceb i965: android: add the i965_compile_FILES sources to the driver
i965_compile_FILES are needed otherwise we'll error out as below:

target SharedLib: i915_dri (out/target/product/x86/obj/SHARED_LIBRARIES/i915_dri_intermediates/LINKED/i915_dri.so)
external/mesa/src/mesa/drivers/dri/i965/brw_ir_fs.h:181: error: undefined reference to 'fs_inst::~fs_inst()'
...
...
external/mesa/src/mesa/drivers/dri/i965/intel_screen.c:1484: error: undefined reference to 'brw_compiler_create'
collect2: error: ld returned 1 exit status
build/core/shared_library.mk:81: recipe for target 'out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so' failed
make: *** [out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so] Error 1

[Emil Velikov: tweak commit message]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15 13:35:19 +01:00
Emil Velikov
bcb56c2c69 program: convert _mesa_init_gl_program() to take struct gl_program *
Rather than accepting a void pointer, only to down and up cast around
it, convert the function to take the base (struct gl_program) pointer.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-15 13:30:52 +01:00
Emil Velikov
2034bdd46c nir: include nir_instr_set.h in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-15 13:30:22 +01:00
Timothy Arceri
8da9e154b7 glsl: Allow arrays of arrays in GLSL ES 3.10 and GLSL 4.30
V3: use a check_*_allowed style function for requirements checking
rather than has_* which doesn't encapsulate the error message

V2: add missing 's' to the extension name in error messages
 and add decimal place in version string

Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
f22b7933e2 glsl: allow for AoA in calculating offset to ubo start region
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
bb5aeb8549 glsl: build ubo name and indexing offset for AoA
V2: split out unrelated change as suggested by Samuel

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
8cf1333b18 glsl: link uniform block arrays of arrays
This adds support for setting up the UniformBlock structures for AoA
and also adds support for resizing AoA blocks with a packed layout.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
d9f1f2bbc6 glsl: Add AoA support when checking for non-const index
When checking for non-const indexing of interfaces
take into account arrays of arrays

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
082b1ca2fe glsl: Add support for lowering interface block arrays of arrays
V2: make array processing functions static

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
132b9e9dd9 glsl: add AoA support for an inteface with unsized array members
Add support for setting the max access of an unsized member
of an interface array of arrays.

For example ifc[j][k].foo[i] where foo is unsized.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
d1d05c0f85 glsl: add AoA support for linking interface blocks with unsized members
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
dd89880dc0 glsl: avoid hitting assert for arrays of arrays
Also add TODO comment about adding proper support

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:21:33 +11:00
Timothy Arceri
2d7a98de18 glsl: add AoA support for atomic counters
This marks all counters in an AoA as active.

For AoA all but the innermost array are treated as separate
counters/uniforms. The Nvidia binary also goes further and
finds inactive counters in the AoA, in future we should do
this too, however this gets things working for the time being.

This change also removes the use of UniformHash for atomic counters,
this avoids having to generate name strings used as hash keys.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:21:27 +11:00
Timothy Arceri
261a434996 glsl: add std140 layout support for AoA
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:44:33 +11:00
Timothy Arceri
176e6930e6 i965: add arrays of arrays support for varyings
V2: get the correct vector elements value for outputs

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:44:26 +11:00
Timothy Arceri
be822b89ac glsl: calculate AoA uniform offset correctly for structs
This allows the correct offset to be calculated for use in indirect
indexing of samplers.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:44:20 +11:00
Timothy Arceri
410609c968 glsl: remove dead code in a single pass
Currently only one ir assignment is removed for each var in a single
dead code optimisation pass. This means if a var has more than one
assignment, then it requires all the glsl optimisations to be run again
for each additional assignment to be removed.
Another pass is also required to remove the variable itself.

With this change all assignments and the variable are removed in a single
pass.

Some of the arrays of arrays conformance tests that were looping
through 8 dimensions ended up with a var with hundreds of assignments.

This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 3 min 20 sec -> 2 min

ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 went from
around 9 min 20 sec to 7 min 30 sec

I had difficulty getting the public shader-db to give a consistent result
with or without this change but the results seemed unchanged at between
15-20 seconds.

Thomas Helland measured change with shader-db on his machine from
approx 117 secs to 112 secs.

V3: Simplify freeing of list as suggested by Ian, and spelling fixes.

V2: Add assert to be sure references are counted before assignments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-By: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:36:14 +11:00
Timothy Arceri
d337da81f2 glsl: dont allow gl_PerVertex to be redeclared as an array of arrays
V3: move patch after fixes to ast for AoA and add const to helper
as suggested by Ian

V2: move single dimensional array detection into a helper

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:36:01 +11:00
Timothy Arceri
dea0af8f82 glsl: check that only the outermost array is unsized
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:35:54 +11:00
Timothy Arceri
3129359ed7 glsl: allow AoA to be sized by initializer or constructor
V2: Split out unsized array validation to its own patch as
suggested by Samuel.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:35:45 +11:00
Timothy Arceri
296a7ea471 glsl: add support for initialising sampler AoA
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:35:40 +11:00
Timothy Arceri
db280e951a glsl: Add support for linking uniform arrays of arrays
V3: Fix setting of data.location for struct AoA UBO members

V2: Handle arrays of arrays in the same way structures are handled

The ARB_arrays_of_arrays spec doesn't give very many details on how
AoA uniforms are intended to be implemented. However in the
ARB_program_interface_query spec there are details that show AoA are
intended to be handled in a similar way to structs.

Issues 7 from the ARB_program_interface_query spec:

 We define rules consistent with our enumeration rules for
 other complex types.  For existing one-dimensional arrays, we enumerate
 a single entry if the array is an array of basic types, or separate
 entries for each array element if the array is an array of structures.
 We follow similar rules here.  For a uniform array such as:

   uniform vec4 a[5][4][3];

 we enumerate twenty different entries ("a[0][0][0]" through
 "a[4][3][0]"), each of which is treated as an array with three elements.
 This is morally equivalent to what you'd get if you worked around the
 limitation in current GLSL via:

    struct ArrayBottom { vec4 c[3]; };
    struct ArrayMid    { ArrayBottom b[3]; };
    uniform ArrayMid   a[5];

 which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:35:35 +11:00
Kenneth Graunke
ff31c243e3 i965: Don't hardcode FS in "validation failed!" message.
Instead, print "Scalar VS" or "Scalar FS".  Otherwise it's really
confusing which stage is broken.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 14:07:47 -07:00
Jordan Justen
a274eff9ff glsl: Support uint index in lower_vector_insert
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14 13:16:35 -07:00
Jordan Justen
ab04adcf63 glsl: Support uint index in do_vec_index_to_cond_assign
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14 13:16:34 -07:00
Jordan Justen
0d1eef536b i965/fs: Ignore compute shaders in brw_nir_lower_inputs
The commit shown below caused compute shaders to hit the unreachable
in the default of the switch block. Since compute shaders don't have
any inputs, we can make brw_nir_lower_inputs a no-op for CS.

commit 2953c3d761
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Aug 14 15:15:11 2015 -0700

    i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 13:16:30 -07:00
Jordan Justen
63728dac57 i965/fs: Simplify FS in brw_nir_lower_inputs to only support scalar mode
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 13:16:29 -07:00
Brian Paul
9abbf65d0a mesa: remove unused functions in program.c
replace_registers() and adjust_param_indexes() were unused.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-14 12:47:15 -06:00
Brian Paul
9d4ce80736 mesa: minor indentation fix in _mesa_BindTextureUnit() 2015-10-14 12:47:15 -06:00
Brian Paul
77eef81370 mesa: remove unused texUnit local in _mesa_BindTextureUnit()
The texture unit is error-checked before this and the texUnit var
is unused, so remove it.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-14 12:47:15 -06:00
Krzysztof Sobiecki
14f7ce4248 st/fbo: use pipe_surface_release instead of pipe_surface_reference
pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.

Fixes Wasteland 2 Director's Cut crash on start.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-14 12:47:07 -06:00
Marta Lofstedt
93267887a0 glsl: Enable split of lower UBOs and SSBO also for compute shaders
The split of Uniform blocks and shader storage block only loops
up to MESA_SHADER_FRAGMENT and igonres compute shaders.
This cause segfault when running the OpenGL ES 3.1 CTS tests
with GL_ARB_compute_shader enabled.

V2: Changed to use MESA_SHADER_STAGES instead of
MESA_SHADER_COMPUTE

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
2015-10-14 16:05:42 +02:00
Jose Fonseca
5423c1e855 glsl: Include util/strndup.h.
Fixes Windows builds.

Trivial.
2015-10-14 11:50:06 +01:00
Tapani Pälli
ac257f1070 glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources
Patch moves existing calculation code from shader_query.cpp to happen
during program resource list creation.

No Piglit or CTS regressions were observed during testing.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-14 12:39:04 +03:00
Tapani Pälli
b76159b096 glsl: add top level array size and stride to gl_uniform_storage
Patch adds 2 new fields to gl_uniform_storage so that we don't need to
calculate these values during runtime shader queries. This is required by
upcoming changes to free GLSL IR after linking.

Patch moves 3 booleans inside structure so that structure size stays the
same after this change.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-14 09:32:58 +03:00
Iago Toral Quiroga
d3f4588804 i965: Adapt SSBOs to work with their own separate index space
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
56e2bdbca3 glsl/lower_ubo_reference: lower UBOs and SSBOs to separate index spaces
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
d31f98a272 mesa: Add {Num}UniformBlocks and {Num}ShaderStorageBlocks to gl_shader{_program}
These arrays provide backends with separate index spaces for UBOS and SSBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
27dccf097d mesa: Rename {Num}UniformBlocks to {Num}BufferInterfaceBlocks
Currently, these arrays in gl_shader and gl_shader_program hold both
UBOs and SSBOs, so this looks like a better name. We were already
using NumBufferInterfaceBlocks in gl_shader_program, so this makes
things more consistent as well.

In a later patch we will add {Num}UniformBlocks and
{Num}ShaderStorageBlocks which will contain only references to
UBOs and SSBOs respectively that will provide backends with
a separate index space for both types of objects.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
9de651b261 glsl: Fix variable_referenced() for vector_{extract,insert} expressions
We get these when we operate on vector variables with array accessors
(i.e. things like a[0] where 'a' is a vec4). When we call variable_referenced()
on these expressions we want to return a reference to 'a' instead of NULL.

This fixes a problem where we pass a[0] as the first argument to an atomic
SSBO function that expects a buffer variable. In order to check this, we use
variable_referenced(), but that is currently returning NULL in this case, since
the underlying rvalue is a vector_extract expression.

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:08:12 +02:00
Iago Toral Quiroga
baee16bf02 nir: split SSBO min/max atomic instrinsics into signed/unsigned versions
NIR is typeless so this is the only way to keep track of the
type to select the proper atomic to use.

v2:
  - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
  - Change message for unreachable paths (Michael Schellenberger)

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 08:03:58 +02:00
Iago Toral Quiroga
be800ef6d8 i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-14 08:01:58 +02:00
Iago Toral Quiroga
9d2bbca98d i965/fs: Fix indentation in fs_live_variables::compute_start_end
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-14 08:01:46 +02:00
Brian Paul
4a168ad797 mesa: clean up comments for gl_current_attrib struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a7b6e6192a vbo: make void vbo_exec_BeginVertices() static
Not called from any other file.  Rename and move before use.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
84719ad9df vbo: document vbo_exec_context fields
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
d65b029dc2 vbo: minor clean-ups for vbo_exec_fixup_vertex()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
7f67bfaa74 vbo: add assertion in ATTR_UNION macro
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
3491ec5930 vbo: add comments, braces in ATTR_UNION() in vbo_exec_api.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
e729f36c09 vbo: fix whitespace in vbo_exec_draw.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
8fbb72c297 vbo: move 'tmp' var initialization
Improve readability a bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a1cbf85de0 vbo: improve fprintf() formatting
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a639bbf098 vbo: simplify vertex array initializations in vbo_context.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
20f31ae37c vbo: get rid of needless NR_MAT_ATTRIBS constant
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
dd293d8aae vbo: fix incorrect switch statement in init_mat_currval()
The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
switch clause for all loop iterations.

This doesn't fix any known issues but was clearly incorrect.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:22 -06:00
Brian Paul
c73c481c4a mesa: pass caller name to create_textures()
Simpler than the dsa flag approach.
2015-10-13 08:28:22 -06:00
Samuel Iglesias Gonsalvez
6a506689db glsl: fix matrix stride calculation for std430's row_major matrices with two columns
This is the result of applying several rules:

From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout":

"2. If the member is a two- or four-component vector with components
consuming N basic machine units, the base alignment is 2N or 4N,
respectively."
[...]
"4. If the member is an array of scalars or vectors, the base alignment
and array stride are set to match the base alignment of a single array
element, according to rules (1), (2), and (3), and rounded up to the
base alignment of a vec4."
[...]
"7. If the member is a row-major matrix with C columns and R rows, the
matrix is stored identically to an array of R row vectors with C
components each, according to rule (4)."
[...]
"When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4."

In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is
stored like an array of Y row vectors with 2 components each. Because
of std430 storage layout, the base alignment of the array of vectors
is not rounded up to vec4, so it is still 2*N.

Fixes 15 dEQP tests:

dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x4

v2:
- Add spec quote in both commit log and code (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-13 15:58:54 +02:00
Christian König
685335639a r600/vce: enable VCE for trinity/richland
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-13 14:32:52 +02:00
Christian König
83de93309e r600/uvd: disable UVD tiling by default
It has only minimal advantages for post processing and doesn't work with VCE.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-13 14:32:48 +02:00
Glenn Kennard
24a1a157a6 r600g: Enable GL_ARB_gpu_shader5 extension
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:55:42 +10:00
Glenn Kennard
1befb7ed98 r600g/sb: SB support for UBO indexing
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:55:33 +10:00
Glenn Kennard
80c5062abf r600g/sb: Support gs5 sampler indexing (v2)
[airlied: v2 cayman fixups]

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:53:35 +10:00
Kenneth Graunke
bd198b9f0a i965/vs: Simplify fs_visitor's ATTR file.
Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.

This patch moves the transslation earlier, making ATTR work in terms of
GRF units from the beginning.  assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number.  (We can't do this earlier as those values aren't
known.)

ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 14:33:26 -07:00
Ilia Mirkin
bf97f8d467 nouveau: avoid double-emitting fence
The act of ensuring that there is space can cause a flush to happen,
which will emit the current screen fence. If that is the fence we're
trying to wait on, then it will have been emitted as a result of doing
the PUSH_SPACE. Don't attempt to emit it a second time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 8053c9208f (nouveau: avoid emitting new fences unnecessarily)
Cc: mesa-stable@lists.freedesktop.org
2015-10-12 17:21:29 -04:00
Ian Romanick
eeb444bc99 glsl: Never allow the sequence operator anywhere in an array size
Fixes:

    spec/glsl-1.20/compiler/structure-and-array-operations/array-size-sequence-in-parenthesis.vert
    spec/glsl-es-1.00/compiler/array-sized-by-sequence-in-parenthesis.vert
    spec/glsl-es-3.00/compiler/array-sized-by-sequence-in-parenthesis.vert

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 10:15:14 -07:00
Ian Romanick
92635a84a7 glsl: In later GLSL versions, sequence operator is cannot be a constant expression
Fixes:
    ES3-CTS.shaders.negative.constant_sequence

    spec/glsl-es-3.00/compiler/global-initializer/from-sequence.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-sequence.frag

v2: Fix a couple copy-and-paste mistake in the spec quotations.
Suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:14 -07:00
Ian Romanick
05e4601c6b glsl: Add method to determine whether an expression contains the sequence operator
This will be used in the next patch to enforce some language sematics.

v2: Fix inverted logic in
ast_function_expression::has_sequence_subexpression.  The method
originally had a different name and a different meaning.  I fixed the
logic in ast_to_hir.cpp, but I only changed the names in
ast_function.cpp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
bb329f2ff6 glsl: Restrict initializers for global variables to constant expression in ES
v2: Combine this check with the existing const and uniform checks.  This
change depends on the previous patch (glsl: Only set
ir_variable::constant_value for const-decorated variables).

Fixes:

    ES2-CTS.shaders.negative.initialize
    ES3-CTS.shaders.negative.initialize

    spec/glsl-es-1.00/compiler/global-initializer/from-attribute.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-uniform.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-uniform.frag
    spec/glsl-es-1.00/compiler/global-initializer/from-global.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-global.frag
    spec/glsl-es-1.00/compiler/global-initializer/from-varying.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-uniform.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-uniform.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-in.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-in.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-global.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-global.frag

Note: spec/glsl-es-3.00/compiler/global-initializer/from-sequence.*
still fail because the result of a sequence operator is still considered
to be a constant expression.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92304
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
3524d6df33 glsl: Only set ir_variable::constant_value for const-decorated variables
Right now we're also setting for uniforms, and that doesn't seem to hurt
things.  The next patch will make general global variables in GLSL ES,
and those definitely should not have constant_value set!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
5bc68f0f2b glsl: Use constant_initializer instead of constant_value to determine whether to keep an unused uniform
This even matches the comment "uniform initializers are precious, and
could get used by another stage."

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
313372cae8 glsl/linker: Use constant_initializer instead of constant_value to initialize uniforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
8acce5d53a ff_fragment_shader: Use binding to set the sampler unit
This is the way layout(binding=xxx) works from GLSL.  The old method
just happened to work (and significantly predated support for
layout(binding=xxx)), but future changes will break this.

v2: Remove some stale comments.  Suggested by Matt and Chris Forbes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
43b07eb60f glsl: Allow built-in functions as constant expressions in OpenGL ES 1.00
In d4a24745 (August 2012), Paul made functions calls not be constant
expressions in GLSL ES 1.00.  Since this feature was added in desktop
GLSL 1.20, we believed that it was added in GLSL ES 3.00.  That turns
out to be completely wrong.  Built-in functions have always been allowed
as constant expressions in GLSL ES, and the patch adds the (many) spec
quotations to prove it.

While we never previously encountered this, a later patch enforces a GLSL
ES 1.00 rule that global variable initializers must be constant
expressions.  Without this fix, several dEQP tests fail.

Fixes:

    tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.frag
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.vert
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.frag
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.vert

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0 10.1 10.2 10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>

Yes, I know we don't maintain stable branches that far back, but that
*is* how far back this bug goes!
2015-10-12 10:15:13 -07:00
Nicolai Hähnle
45ed627d89 u_vbuf: fix vb slot assignment for translated buffers
Vertex attributes of different categories (constant/per-instance/
per-vertex) go into different buffers for translation, and this is now
properly reflected in the vertex buffers passed to the driver.

Fixes e.g. piglit's point-vertex-id divisor test.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-12 16:46:30 +02:00
Iago Toral Quiroga
7a1143f29e glsl: include variable name in error messages about initializers
Also fix style / wrong indentation along the way and make the messages
more uniform.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 08:31:08 +02:00
Iago Toral Quiroga
f09c229cc6 glsl: shader outputs cannot have initializers
GLSL Spec 4.20.8, 4.3 Storage Qualifiers:

"Initializers in global declarations may only be used in declarations of
 global variables with no storage qualifier, with a const qualifier or
 with a uniform qualifier."

We do this for input variables, but not for output variables. AMD and NVIDIA
proprietary drivers don't allow this either.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 08:31:08 +02:00
Iago Toral Quiroga
8281a7c533 i965: Fix unsafe pointer when dumping VS/FS IR
For the VS and FS stages that use ARB_vertex_program or
ARB_fragment_program we don't have a shader program, however,
when debuging is enabled, we call brw_dump_ir like this:

brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base);

where vs will be NULL (since prog is NULL).

As pointed out by Chris, this &vs->base is not really a dereference,
it simply computes a new address that just happens to be 0x0 because
the offset of base in brw_shader is 0. Then brw_dump_ir will see a
NULL pointer and not do anything. This is why this does not crash at
the moment. However, this does not look very safe (it would crash
for any location of base that is not the first in brw_shader), so
patch it to prevent a potential (even if unlikely) problem in the
future.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-12 08:30:57 +02:00
Dave Airlie
bcfaab3885 mesa/uniforms: fix get_uniform for doubles (v2)
The initial glGetUniformdv support didn't cover all the
casting cases that are apparantly legal, and cts seems to
test for them.

I've updated the piglit test to cover these cases now.

v2: fix indentation - it's all broken in this file (Ilia)
fix src/dst index tracking in light of fp64 support (Ilia)

cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-12 13:10:59 +10:00
Chia-I Wu
c8083b1adc ilo: improve Gen8 defines based on its PRMs 2015-10-12 10:15:28 +08:00
Matt Turner
4642d53a03 i965/vec4: Implement b2f and b2i using negation.
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
added) but it was missed in the new NIR backend. Add it there as well.

instructions in affected programs:     1857 -> 1810 (-2.53%)
helped:                                15

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-11 16:19:52 -07:00
Ilia Mirkin
9fe458335f nv50,nvc0: don't base decisions on available pushbuf space
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
2015-10-11 17:57:04 -04:00
Ilia Mirkin
8053c9208f nouveau: avoid emitting new fences unnecessarily
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
2015-10-11 17:57:04 -04:00
Samuel Pitoiset
06abd1a25e nvc0: make use of NVC0_COMPUTE_CLASS for GF110
In theory, GF110+ should also support NVC8_COMPUTE_CLASS but, in practice,
a ILLEGAL_CLASS dmesg fail appears when using it.

This fixes compute support and MP performance counters on GF110.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-10 22:11:03 +02:00
Kenneth Graunke
a23bdd1fae i965/gs: Make MAX_GS_INPUT_VERTICES a #define in brw_context.h.
For scalar VS, I'll need this in brw_fs.cpp as well.  It seems silly to
redeclare it in three places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
2953c3d761 i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers.  Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries.  Virtually all of these were copy propagated away,
but it's still ugly.

This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.

No changes in shader-db.

v2: Fix unreachable() message (Ken), update commit message (Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
6842ad7912 i965/vs: Fix a subtlety in the nr_attributes == 0 workaround.
nr_attributes is used to compute first_non_payload_grf, which is the
first register we're allowed to use for ordinary register allocation.

The hardware requires us to read at least one pair of values, but we're
completely free to overwrite that garbage register with whatever we like.

Instead of altering nr_attributes, we should alter urb_read_length, which
only affects the amount we ask the VF to read.  This should save us a
register in trivial cases (which admittedly isn't very useful).

While we're at it, improve the explanation in the comments.

v2: Actually do what I said (caught by Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
031d350132 i965/vs: Unify URB entry size/read length calculations between backends.
Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations.  We can move those up a level to
backend-agnostic code and reuse it for both.

Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data.  We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.

There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
a4e988f481 i965/cfg: Fix cfg_t::dump() when a block has no immediate dominator.
Switch statements introduce a bogus loop with an unconditional break at
the end of the loop, just before the while...so the while is unreachable
and has no immediate dominator.

v2: With less exuberance

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Emil Velikov
2496cfd771 docs: add news item and link release notes for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-10 17:10:26 +01:00
Emil Velikov
55a8f072ea docs: add sha256 checksums for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b4bfea0094)
2015-10-10 17:10:26 +01:00
Emil Velikov
8337a31bcc docs: add release notes for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 914966befc)
2015-10-10 17:10:26 +01:00
Chad Versace
82b324c24b i965/gen8: Remove gen<8 checks in gen8 code
Some assertions in gen8_surface_state.c checked for gen < 8.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-09 14:24:12 -07:00
Chad Versace
8a0c85b258 i965/gen9: Enable rep clears on gen9
The (gen < 9) check in brw_clear() was too broad. It disabled all types
of fast color clears:
    a. singlesample rep clears
    b. singlesample MCS fast clears
    c. multisample MCS fast clears

The MCS clears are still buggy, but the rep clear works well. So let's
enable it.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-09 14:24:12 -07:00
Chad Versace
dcd59a9e32 i965/gen9: Disable MCS for 1x color surfaces
Fast color clears are disabled for gen9 (see the checks in
brw_meta_fast_clear), so there is no reason to allocate the MCS and
track its clear/resolve state.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-09 14:24:12 -07:00
Roland Scheidegger
4c4ba5a8c3 tgsi: (trivial) kill c99-ism. 2015-10-09 23:12:14 +02:00
Marek Olšák
d695c676ea program: remove _mesa_init_*_program wrappers
They didn't do anything useful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:19 +02:00
Marek Olšák
092f0427dc program: remove other unused functions
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
5042a3eef8 program: remove unused cloning and combining functions
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c947a3a4c4 program: remove unused function _mesa_find_line_column
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
ee01942eb5 st/mesa: release the glsl_to_tgsi visitor after translation
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
e5073e8d0c st/mesa: translate tessellation shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
897177020b st/mesa: translate geometry shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
a907b5dd16 st/mesa: translate fragment shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
46021ace51 st/mesa: translate vertex shaders into TGSI when we get them
The translate functions is split into two:
- translation to TGSI
- creating the variant (TGSI transformations only)

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
de6a004035 st/mesa: fix glDrawPixels with a texture
The samplers for DrawPixels data and the pixel map are assigned to slots
which don't overlap with the existing sampler slots.

The texture coordinates for the user texture are uploaded as a constant.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
f15bb3e633 st/mesa: implement DrawPixels shader transformation using tgsi_transform_shader
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
b55b986dc9 st/mesa: make Z/S drawpix shaders independent of variants, don't use Mesa IR v2
- there is no connection to user fragment shaders, so having these as
  shader variants makes no sense
- don't use Mesa IR, use TGSI
- don't create gl_fragment_program, just create the shader CSO

v2: generate exactly the same shader as before to fix llvmpipe

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
f4ec81032b st/mesa: implement glBitmap shader transformation using tgsi_transform_shader
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
3eedb63371 st/mesa: remove old emulation for VS and FS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c04e91a0e9 st/mesa: use TGSI utility to emulate features for FS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
941721ee2a st/mesa: use TGSI utility to emulate features for VS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
4bbe418b4b st/mesa: decrease the size of st_vertex_program
The other variables can't be moved.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
4a21edf067 st/mesa: inline st_prepare_vertex_program
No other shader stage has a "prepare" function.
This will allow removing some variables from st_vertex_program.

Also, prepare_fragment_program was a dead prototype.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c80c19a9d5 tgsi/scan: add info about declared samplers (v2)
v2: get it from declarations, not instructions
2015-10-09 22:02:18 +02:00
Marek Olšák
417927ebde tgsi: add a utility for emulating some GL features
st/mesa will use this, but drivers can use it too.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
9ea2a86809 mesa: call ProgramStringNotify for fixed-function vertex programs
Drivers weren't notified about this at all.
This allows disabling on-demand compilation in drivers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Rob Clark
c9b982b72d glsl: move shader_enums into nir
First step towards inverting the dependency between glsl and nir (so nir
can be used without glsl).  Also solves this issue with 'make distclean'

  Making distclean in mesa
  make[2]: Entering directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
  Makefile:2486: ../glsl/.deps/shader_enums.Plo: No such file or directory
  make[2]: *** No rule to make target '../glsl/.deps/shader_enums.Plo'. Stop.
  make[2]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
  Makefile:684: recipe for target 'distclean-recursive' failed
  make[1]: *** [distclean-recursive] Error 1
  make[1]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src'
  Makefile:615: recipe for target 'distclean-recursive' failed
  make: *** [distclean-recursive] Error 1

Reported-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-09 15:03:28 -04:00
Francisco Jerez
7e441bf025 mesa: Get rid of texture-dependent image unit derived state.
The point is to avoid having to re-validate all image units when
_NEW_TEXTURE is flagged, which can be expensive if the driver exposes
a large number of image units.  This has been reported to fix a 36%
performance regression in the Synmark2 Multithread benchmark on the
i965 driver which exposes 192 image units.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788
Reported-by: Wendy Wang <wendy.wang@intel.com>
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:49:01 +03:00
Francisco Jerez
2d97a78b37 i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.
gl_image_unit::_Valid will be removed in a future commit.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:52 +03:00
Francisco Jerez
25d3338be3 mesa: Skip redundant texture completeness checking during image validation.
The call to _mesa_test_texobj_completeness() is unnecessary if the
texture is already known to be complete.  If the texture object is
dirtied in the meantime _BaseComplete and _MipmapComplete will be
reset to false.  _mesa_is_image_unit_valid() will start to be called
more frequently in a future commit, so it seems desirable to avoid the
unnecessary work.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:46 +03:00
Francisco Jerez
5152db415f mesa: Expose function to calculate whether a shader image unit is valid.
A future commit will remove all texture object-dependent derived state
from the image unit struct to make validation unnecessary on texture
state changes.  Instead of checking gl_image_unit::_Valid drivers will
be required to call this function when needed to find out whether an
image unit is in a valid state and whether access from the shader is
allowed.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:28 +03:00
Francisco Jerez
5346c11670 i965: Don't tell the hardware about our UAV access.
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives.  The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.

Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage.  An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required".  The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.

The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.

None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled.  We do set the "Accesses UAV" bits
though since my commit ac7664e493 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:

> 3DSTATE_GS
>[...]
> 12 Accesses UAV
>    Format: Enable
>    This field must be set when GS has a UAV access.

There are similar comments in the documentation for the other
3DSTATE_*S commands.  The "must" part is misleading and unjustified
AFAIK.  Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty.  There is an exception though.  On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.

The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, what will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).

The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only.  Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-10-09 17:48:26 +03:00
Connor Abbott
bb59ba8634 nir/instr_set: remove unnecessary check in nir_instrs_equal()
This was originally added to nir_instrs_equal() instead of
nir_instr_can_cse() incorrectly, but this was fixed when moving to the
instruction set API (as it had to be, otherwise hashing wouldn't work).
Now, this is dead code since instr_can_rewrite() will only return true
for texture instructions that use an index, so we can turn the check into
an assert. This also means that now nir_instrs_equal(instr, instr) will
always return true unless it assert-fails.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:15:28 -04:00
Connor Abbott
bf5f931aee nir: make nir_instrs_equal() static
This was previously tied to CSE, since it would only work for
instructions where nir_can_cse() (now instr_can_rewrite()) returned
true. Now that CSE uses the instruction set abstraction which only uses
this internally, we can make it local to nir_instr_set.c.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:15:15 -04:00
Connor Abbott
e8308d0523 nir/cse: use the instruction set API
This replaces an O(n^2) algorithm with an O(n) one, while allowing us to
import most of the infrastructure required for GVN. The idea is to walk
the dominance tree depth-first, similar when converting to SSA, and
remove the instructions from the set when we're done visiting the
sub-tree of the dominance tree so that the only instructions in the set
are the instructions that dominate the current block.

No piglit regressions. No shader-db changes.

Compilation time for full shader-db:

Difference at 95.0% confidence
        -35.826 +/- 2.16018
        -6.2852% +/- 0.378975%
        (Student's t, pooled s = 3.37504)

v2:
- rebase on start_block removal
- remove useless state struct
- change commit message

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:42 -04:00
Connor Abbott
523a28d3fe nir: add an instruction set API
This will replace direct usage of nir_instrs_equal() in the CSE pass,
which reduces an O(n^2) algorithm with an effectively O(n) one. It'll
also be useful for implementing GVN on top of GCM.

v2:
- Add texture support.
- Add more comments.
- Rename instr_can_hash() to instr_can_rewrite() since it's really more
about whether its uses can be rewritten, and it's implicitly used by
nir_instrs_equal() as well.
- Rename nir_instr_set_add() to nir_instr_set_add_or_rewrite() (Jason).
- Make the HASH() macro less magical (Topi).
- Rewrite the commit message.

v3:
- For sorting phi sources, use a VLA, store pointers to the sources, and
compare the predecessor pointer directly (Jason).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:35 -04:00
Connor Abbott
005c2efb7b nir: constify instruction comparison functions
v2: rebase, don't constify nir_srcs_equal() as it's pass-by-value
anyways

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:28 -04:00
Connor Abbott
d6bc35934f nir: constify nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:20 -04:00
Connor Abbott
20d6d812dc nir: split out instruction comparison functions
Right now nir_instrs_equal() is tied pretty tightly to CSE, but we're
going to introduce the idea of an instruction set and tie it to that
instead.  In anticipation of that, move this into its own file where
we'll add the rest of the instruction set implementation later.

v2: Rebase on texture support.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:13:27 -04:00
Neil Roberts
da361acd1c i965/fs: Handle non-const sample number in interpolateAtSample
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.

The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.

This fixes the following two Piglit tests:

arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform

v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
    Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
    number.
v5: Don't special case immediate values in
    brw_pixel_interpolator_query. Make a better wrapper for the
    function to set up the PI send instruction. Ensure that the SHL
    instructions are scalar. (Francisco Jerez).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-09 15:13:40 +02:00
Neil Roberts
728d7bc85f i965: Add a second successor to BRW_OPCODE_WHILE
It is possible to directly predicate the WHILE instruction. In this
case there will be a second successor block because the execution can
resume from the instruction after the loop. This will be used in a
subsequent patch.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-09 15:13:22 +02:00
Neil Roberts
886d46b089 nir: Add a function to determine if a source is dynamically uniform
Adds nir_src_is_dynamically_uniform which returns true if the source
is known to be dynamically uniform. This will be used in a later patch
to add a workaround for cases that only work with dynamically uniform
sources. Note that the function is not definitive, it can return false
negatives (but not false positives). Currently it only detects
constants and uniform accesses. It could easily be extended to include
more cases.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-09 15:10:40 +02:00
Samuel Pitoiset
7129cbf5f4 nvc0: move HW SM queries to nvc0_query_hw_sm.c/h files
Global performance counters (PCOUNTER) will be added to
nvc0_query_hw_pm.c/h files.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
224fec05ea nvc0: move HW queries to nvc0_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
77b6990d14 nvc0: move SW queries to nvc0_query_sw.c/h files
Loosely based on freedreno driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
0678530b9e nvc0: move nvc0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
0644196ab1 nvc0: add a header file for nvc0_query
This will allow to split SW and HW queries in an upcoming patch.

While we are at it, make use of nvc0_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Iglesias Gonsalvez
3da58730ee main: fix length of values written to glGetProgramResourceiv() for ACTIVE_VARIABLES
Return the number of values written.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
Samuel Iglesias Gonsalvez
d0992fa15a main: buffer array variables can have array size of 0 if they are unsized
From ARB_program_query_interface:

  For the property ARRAY_SIZE, a single integer identifying the number of
  active array elements of an active variable is written to <params>. The
  array size returned is in units of the type associated with the property
  TYPE. For active variables not corresponding to an array of basic types,
  the value one is written to <params>. If the variable is a shader
  storage block member in an array with no declared size, the value zero
  is written to <params>.

v2:
- Unsized arrays of arrays have an array size different than zero

v3:
- Arrays and unsized arrays will have an array_stride > 0. Use it
  instead of is_unsized_array flag (Timothy).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
Samuel Iglesias Gonsalvez
66ca8e6632 main: consider that unsized arrays have at least one active element
From ARB_shader_storage_buffer_object:

"When using the ARB_program_interface_query extension to enumerate the
 set of active buffer variables, only the first element of arrays (sized
 or unsized) will be enumerated"

_mesa_program_resource_array_size() is used when getting the name (and
name length) of the active variables. When it is an unsized array,
we want to indicate it has one active element so the returned name
would have "[0]" at the end.

v2:
- Use array_stride > 0 and array_elements == 0 to detect unsized
  arrays. Because of that, we don't need is_unsized_array flag
  (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
Samuel Iglesias Gonsalvez
77c0b64ce3 main: fix TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE
When the active variable is an array which is already a top-level
shader storage block member, don't return its array size and stride
when querying TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE
respectively.

Fixes the following 12 dEQP-GLES31 tests:

dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.column_major_mat3x4

v2:
- Fix check when the shader storage block is instanced
- Write auxiliary function to do the check.

v3:
- Check if full_instanced_name is NULL just after allocation (Ilia)
- Remove () from one strcmp() in the if statement (Ilia)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-09 08:13:49 +02:00
Samuel Iglesias Gonsalvez
5be9bf2746 main: fix goto in program_resource_top_level_array_stride
Use found_top_level_array_stride instead of found_top_level_array_size.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-09 08:12:10 +02:00
Tapani Pälli
d8d0e4a81e mesa: add GL_UNSIGNED_INT_24_8 to _mesa_pack_depth_span
Patch adds missing type (used with NV_read_depth) so that it gets
handled correctly. This fixes errors seen with following CTS test:

   ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-09 09:11:14 +03:00
Brian Paul
7d7dd18711 mesa,meta: move gl_texture_object::TargetIndex initializations
Before, we were unconditionally assigning the TargetIndex field in
_mesa_BindTexture(), even if it was already set properly.  Now we
initialize TargetIndex wherever we initialize the Target field, in
_mesa_initialize_texture_object(), finish_texture_init(), etc.

v2: also update the meta_copy_image code.  In make_view() the
view_tex_obj->Target field was set, but not the TargetIndex field.
Also, remove a second, redundant assignment to view_tex_obj->Target.
Add sanity check assertions too.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Brian Paul
d61f492aba mesa: remove unused _mesa_create_nameless_texture()
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Brian Paul
b373c77693 mesa: remove unneeded error check in create_textures()
Callers of create_texture() will either pass target=0 or a validated
GL texture target enum so no need to do another error check inside
the loop.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Kristian Høgsberg Kristensen
c71f0d45e6 i965: Link compiler unit tests to libi965_compiler.la
We can now link the unit tests against just libi965_compiler.la. This
lets us drop a lot of DRI driver dependencies, but we still pull in all
of libmesa and more.

This also provides a few standalone users of libi965_compiler.la, which
will help us accidentally using i965_dri.so functions from the compiler.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
08d890d3bb i965: Break out backend compiler to its own library
This introduces a new libtool helper library, libi965_compiler.la.  This
library is moderately self-contained, but still needs to link to all of
libmesa.la among other things.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
9a2573e5fc i965/cs: Get max_cs_threads from brw_compiler devinfo
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
ee0f0108c8 i965: Move brw_get_shader_time_index() call out of emit functions
brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks the Jasons recent
refactoring, we can just get the index and pass to the emit functions
instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
ffc841cae5 i965: Move brw_select_clip_planes() to brw_shader.cpp
We call this from the compiler so move it to brw_shader.cpp.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
365e5d7892 i965: Use util_next_power_of_two() for brw_get_scratch_size()
This function computes the next power of two, but at least 1024. We can
do that by bitwise or'ing in 1023 and calling util_next_power_of_two().

We use brw_get_scratch_size() from the compiler so we need it out of
brw_program.c. We could move it to brw_shader.cpp, but let's make it a
small inline function instead.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
cc4683992b i965: Move brw_mark_surface_used() to brw_shader.cpp
brw_program.c won't be part of the compiler library, but we need
brw_mark_surface_used() in the compiler. Move to brw_shader.cpp.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
469d0e449b i965/cs: Split out helper for building local id payload
The initial motivation for this patch was to avoid calling
brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the
compiler. This commit ends up refactoring things a bit more so as to
split out the logic to build the local id payload to brw_fs.cpp. This
moves the payload building closer to the compiler code that uses the
payload layout and makes it available to other users of the compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:02 -07:00
Kristian Høgsberg Kristensen
4f33700f5a i965: Move brw_link_shader() and friends to new file brw_link.cpp
We want to use the rest of brw_shader.cpp with the rest of the compiler
without pulling in the GLSL linking code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:14:44 -07:00
Kristian Høgsberg Kristensen
99ca2256c1 i965: Configure bufmgr debug options from intel_screen.c
We need the debug flag parsing and INTEL_DEBUG in the compiler, but we
don't want the dependency on bufmgr (libdrm_intel) in there. Move to
intel_screen.c.

There are now only two lines left in brw_process_intel_debug_variable(),
but we keep it in intel_debug.h to avoid having to expose
'debug_control' as a global variable.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Kristian Høgsberg Kristensen
04158fb0f6 util: Move DRI parse_debug_string() to util
We want to use intel_debug.c in code that doesn't link to dri common.

v2: Remove unnecessary stddef.h include (Topi), use util/debug.h
    in all DRI driver and remove driParseDebugString() (Iago).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Kristian Høgsberg Kristensen
ba71d581ae i965: Move brw_dump_ir() out of brw_*_emit() functions
We move these calls one level up into the codegen functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Emil Velikov
1fda56cdb2 gallium/ddebug: add missing dd_util.h to sources list
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-08 18:13:24 +01:00
Emil Velikov
62741ff052 gallium/ddebug: automake: sort sources alphabetically
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-08 18:13:24 +01:00
Jason Ekstrand
9c528f5dfa nir/sweep: Reparent the shader name
Previously the name of the nir shader was being freed prematurely during
nir_sweep. Since 756613ed35 the name was later being used to generate
filenames for the optimiser debug output and these would end up with
garbage from the dangling pointer.

Co-authored-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-08 08:20:31 -07:00
Jan Vesely
c8031a879a c11/threads: initialize timeout structure
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-08 14:05:57 +01:00
Boyan Ding
89ae41ab4c docs/relnotes: document EGL_KHR_create_context on llvmpipe and softpipe
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-10-08 14:05:36 +01:00
Iago Toral Quiroga
1efbb8151b i965/gs/gen6: Maximum allowed size of SEND messages is 15 (4 bits)
Comit d48ac93066 addressed this for VS, but we forgot to do the same for
URB writes generated by the gen6 GS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
3141906fa3 i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place
That should make tracking where we do spills and pull loads a bit easier.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
36e82b137d i965: make pull constant loads in gen6 start at MRFs 16/17
So they do not conflict with our (un)spills (MRF 21..23) or our
URB writes (MRF 1..15)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
0c2add7751 i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Tapani Pälli
aee28a0aa3 mesa: include bad type in error string of _mesa_pack_depth_span
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-08 09:25:16 +03:00
Tapani Pälli
4e7fd66cf0 glsl: add varyings to resource list only with SSO
Varyings can be considered inputs or outputs of a program only when
SSO is in use. With multi-stage programs, inputs contain only inputs
for first stage and outputs contains outputs of the final shader stage.

I've tested that fix works for Assault Android Cactus (demo version)
and does not cause Piglit or CTS regressions in glGetProgramiv tests.

Following ES 3.1 CTS separate shader tests that do query properties
of varyings in SSO shader programs pass:

   ES31-CTS.program_interface_query.separate-programs-vertex
   ES31-CTS.program_interface_query.separate-programs-fragment

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92122
2015-10-08 07:43:11 +03:00
Jason Ekstrand
6ad9ebb073 mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks
The EXT_texture_format_BGRA8888 extension (which mesa supports
unconditionally) adds a new format and internal format called GL_BGRA_EXT.
Previously, this was not really handled at all in
_mesa_ex3_error_check_format_and_type.  When the checks were tightened in
commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT
would always cause an error to be thrown.

There were two primary issues here.  First, is that
_mesa_es3_effective_internal_format_for_format_and_type didn't handle the
GL_BGRA_EXT format.  Second is that it blindly uses _mesa_base_tex_format
which returns GL_RGBA for GL_BGRA_EXT.  This commit fixes both of these
issues as well as adds explicit checks that GL_BGRA_EXT is only ever used
with GL_BGRA_EXT and GL_UNSIGNED_BYTE.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-07 20:32:53 -07:00
Emil Velikov
bbf728f11b Revert "mesa: enable KHR_debug for ES contexts"
This reverts commit b69cfbdf18.

This isn't quite baked yet. Seems that despite building the ES piglits,
none of them got executed.
2015-10-07 21:49:50 +01:00
Matt Turner
164c8277f0 egl/dri2: Properly dereference array.
Fixes a regression that broke EGL since

commit 858f2f2ae6
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date:   Sun Sep 13 12:25:27 2015 +0100

    egl/dri2: ease srgb __DRIconfig conditionals
2015-10-07 11:48:49 -07:00
Marek Olšák
13e69805ea radeonsi: fix a GS hang on VI
Broken by one of the cleanups: 0d46c3bc9d
Not applicable to stable.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Marek Olšák
5749676d03 radeonsi: remove TC L2 cache flush for index buffers on VI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Brian Paul
6ed8fd3d67 svga: whitespace fixes in svga_sampler_view.c 2015-10-07 08:45:56 -06:00
Brian Paul
70c4cde453 svga: whitespace fixes in svga_resource_buffer.c 2015-10-07 08:45:56 -06:00
Stefan Dösinger
a2bc4a7b04 mesa: Remove GL_ARB_sampler_object depth compare error checking.
Version 3: Simplify the code comment, word wrap commit description.

Version 2: Return GL_FALSE if ARB_shadow is unsupported instead of
pretending to store the value as suggested by Brian Paul.

This fixes a GL error warning on r200 in Wine.

The GL_ARB_sampler_objects extension does not specify a dependency on
GL_ARB_shadow or GL_ARB_depth_texture for setting the depth texture
compare mode and function. Silently ignore attempts to change these
settings. They won't matter without a depth texture being assigned
anyway.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-07 08:45:56 -06:00
Brian Paul
2bad030ac9 svga: round UBO constant buffer size up/down to multiple of 16 bytes
The svga3d device requires constant buffers to be a multiple of 16 bytes
in size.  OpenGL UBOs may not fit that restriction.  As a work-around,
round the size up if possible, else round down.

Note that this patch only effects UBO constant buffers (index 1 or higher),
not the 0th/default constant buffer.

Fixes the game Grim Fandango Remastered.  VMware bug 1510130.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-07 08:45:56 -06:00
Emil Velikov
4ea5ed9f51 egl/dri2: enable EGL_KHR_gl_colorspace for swrast
No driver changes needed for softpipe/llvmpipe - things just work.

v2: Whitespace fixes.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 15:18:03 +01:00
Emil Velikov
858f2f2ae6 egl/dri2: ease srgb __DRIconfig conditionals
One can simplify the if-else chain, by declaring the driconfigs as a
two sized array, whist using srgb as a index to the correct entry.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 15:17:57 +01:00
Emil Velikov
b69cfbdf18 mesa: enable KHR_debug for ES contexts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:08:50 +01:00
Matthew Waters
70643a1389 main/get: make KHR_debug enums available everywhere
Move all the enums but CONTEXT_FLAGS. The spec seems quite explicit
about the latter (wrt OpenGL ES)

    "In OpenGL ES versions prior to and including ES 3.1 there is no
    CONTEXT_FLAGS state and therefore the CONTEXT_FLAG_DEBUG_BIT cannot
    be queried."

v2 [Emil Velikov] Rebase.
v3 [Emil Veliokv] Drop the CONTEXT_FLAGS hunk - not applicable for GLES

Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:07:01 +01:00
Matthew Waters
ae6ff72f5a glapi: add function pointers for KHR_debug for gles
v2 [Emil Velikov]
 - Rebase.
 - Correct version in gles11 dispatch_sanity.
 - Move the extension enable to a separate patch.

Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:07:01 +01:00
Varad Gautam
deb1765ec6 egl: move memcpy to bring conf->base operations together
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:05:28 +01:00
Varad Gautam
f988eff379 egl: restore surface type before linking config to its display
commit c2c2e9a (egl: implement EGL_KHR_gl_colorspace (v2)) leaves
_EGLConfig->SurfaceType set incorrectly before calling _eglLinkConfig(),
and the bad value is passed around to platform_android. set it to zero
as earlier.

v2: Set SurfaceType to 0, rather than surface_type (Suggested by Emil)

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91596
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:05:20 +01:00
Ilia Mirkin
47d11990b2 nouveau: make sure there's always room to emit a fence
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-07 04:30:05 -04:00
Boyan Ding
64d9d4b730 vc4: use nir two-sided-color lowering
Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering).
No piglit regression.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-06 16:34:07 -07:00
Eric Anholt
b6cd39fc47 vc4: Fix a leak of the last color read/write surface on context destroy. 2015-10-06 16:32:03 -07:00
Eric Anholt
922e0680f9 vc4: Fix a memory leak in the simulator case.
We validate per draw call, and need to free the shader per draw call, too.
2015-10-06 16:29:14 -07:00
Mark Janes
3861010213 mesa: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
3475b68abd radeon/r200: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
eb6b80842f i965: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
83f9f911b2 i915: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Ville Syrjälä
3bcc780126 i915: Drop broken front_buffer_reading/drawing optimization
Bring the following commit over to i915:
 commit ec542d7457
 Author: Eric Anholt <eric@anholt.net>
 Date:   Mon Mar 3 10:43:10 2014 -0800

    i965: Drop broken front_buffer_reading/drawing optimization.

Not sure if it might fix anything, but since the i965 and i915 used to
share a bunch of that code, it would seem reasonable the same problems
could be present in the i915 code still, and the i965 approach is well
tested by now so bringing it over seems fairly safe.

No piglit regressions on 855.

v2: Rebase on _mesa_is_front_buffer_* refactor.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
ea8b77e892 mesa/i965: Refactor brw_is_front_buffer_{drawing,reading} to common code
There are multiple similar implementations of these functions, and a
later patch was going to add another.

v2: Move removing intel_framebuffer to a different patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
5c4ef9f1d2 st/mesa: Don't override NewFramebuffer just to call _mesa_new_framebuffer
v2: Since state_tracker does not call _mesa_init_driver_functions, we
need to initialize the dd::NewFramebuffer pointer to
_mesa_new_framebuffer here.  Suggested by Brian.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
df75babf74 radeon: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-06 11:36:32 -07:00
Ian Romanick
e32a6590a4 i915: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:28:00 -07:00
Ian Romanick
ed7f00f564 i965: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:27:45 -07:00
Ville Syrjälä
021f15816e i830: Fix culling with user fbos on gen2
Flip the cull bits when rendering to a user fbo on gen2. This
was already done on gen3 (since before git history starts)
but was missing from the gen2 code.

Fixes rendering of the driver+kart model in supertuxkart kart
selection screen.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
3e2c7ca773 i915: Adjust line size limits
The hardware can draw lines 0.5 to 7.5 pixels wide. Adjust the limits
to 1.0-7.0. The old limits seems to be from the era when i915 and i965
were sharing this code.

Not really sure if 1.0-7.0 is correct. Maybe it could be 0.5.7.5 as
those are the hw limits, or maybe some combination of the two?

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
00ee403883 i915: Enable intel_render path for points
The sub-pixel adjustment for points was killed off in
 commit 60d762aa62
 Author: Xiang, Haihao <haihao.xiang@intel.com>
 Date:   Wed Jan 2 11:38:51 2008 +0800

    i915: Needn't adjust pixel centers. fix #12944

so if we don't need it in intel_tris.c we don't need it in
intel_render.c either, which means we can allow intel_render.c to render
points.

No apparent regressions on PNV in ES1 or ES2 conformance.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
0febd0ecfd i915: Use COPY_DWORDS for points
The sub-pixel adjustment for points was killed off in
 commit 60d762aa62
 Author: Xiang, Haihao <haihao.xiang@intel.com>
 Date:   Wed Jan 2 11:38:51 2008 +0800

    i915: Needn't adjust pixel centers. fix #12944

so we can just as well use COPY_DWORDS().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
bcf650496f i915: Use _tnl_RenderClippedPolygon and _tnl_RenderClippedLine
_tnl_RenderClippedPolygon and _tnl_RenderClippedLine already do most of
what we want so use them.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
303895655c i915: Handle provoking vertex in intelFastRenderClippedPoly()
intelFastRenderClippedPoly() renders the polygon using triangles. For
polygons the provoking vertex is always the first one, and currently
this function assumes that the provoking vertex for triangles is the
last one. In case the user changed the provoking vertex convention,
the hardware may be configured to treat the first vertex of triangles
as the provoking vertex. So check the convention and emit the triangles
in the appropriate order to avoid having to change the hardware
provoking vertex convention for rendering polygons.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
0886426503 t_dd_dmatmp: Check provoking vertex convention when rendering quads
When drawing quads using triangles we need to be careful to make
the provoking vertices match when flat shading.

v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
83d511e190 t_dd_dmatmp: Disallow flat shading when rendering quad strips via tri strips
When rendering quad strips via tri strips we can't get the provoking
vertex right, so disallow flat shading.

v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
b15b4581d1 t_dd_dmatmp: Allow flat shaded polygons with tri fans
We can allow rendering flat shaded polygons using tri fans if we check
the provoking vertex convention.

v2 (idr): Remove _EXT suffixes from GL_FIRST_VERTEX_CONVENTION.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ian Romanick
5ca00e0b8d t_dd_dmatmp: Replace fprintf with unreachable
From http://lists.freedesktop.org/archives/mesa-dev/2015-May/084883.html:

    "There are no real error cases here, just dead code.
    validate_render() is supposed to make sure we never call these
    functions if the code can't actually render the primitives. The
    fprintf()+return branches should really just contain assert(0) or
    equivalent."

I also rearranged the if-else-block in render_quad_strip_verts to look
more like the other functions.  A future patch is going to change a
bunch of that code anyway.

v2: Make "unreachable" message more descriptive.  Suggested by Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-06 10:44:00 -07:00
Ian Romanick
46b13666d8 radeon: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Ian Romanick
68976a5a00 i965: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Ville Syrjälä
fad5fd3a25 i915: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Brian Paul
3801fa65c1 tgsi: add const qualifier to silence warning
Trivial.
2015-10-06 08:51:33 -06:00
Brian Paul
b7766a95e1 glsl: whitespace/formatting/typo fixes in link_uniforms.cpp 2015-10-06 08:51:33 -06:00
Samuel Iglesias Gonsalvez
50d5a36f35 main: array stride for unsized arrays of arrays are calculated like records
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-06 14:28:26 +02:00
Samuel Iglesias Gonsalvez
82db642042 glsl: add std430 layout support for AoA
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-06 14:02:13 +02:00
Timothy Arceri
6483183279 docs: Mark GL_ARB_enhanced_layouts as in progress 2015-10-06 14:04:23 +11:00
Ilia Mirkin
dbae576f7f i965: add EXT_polygon_offset_clamp support to gen4/gen5
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-05 14:39:38 -07:00
Matt Turner
833fa9a8cd meta: Update comment about unsupported texture types.
Ken added support for 2DArray (commit ec23d5197e) and 1DArray (commit
14ca61125) last year.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 14:35:13 -07:00
Matt Turner
d4ff638504 glx: Drop CRAY support.
It couldn't have worked anyway. There were calls to undefined functions.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-05 14:34:16 -07:00
Matt Turner
617eb5e6c3 glsl: Remove CSE pass.
With NIR, it actually hurts things.

total instructions in shared programs: 6529329 -> 6528888 (-0.01%)
instructions in affected programs:     14833 -> 14392 (-2.97%)
helped:                                299
HURT:                                  1

In all affected programs I inspected (including the single hurt one) the
pass CSE'd some multiplies and caused some reassociation (e.g., caused
(A * B) * C to be A * (B * C)) when the original intermediate result was
reused elsewhere.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 14:31:26 -07:00
Matt Turner
5a360dcad1 i965: Generalize predicated break pass for use in vec4 backend.
instructions in affected programs:     44204 -> 43762 (-1.00%)
helped:                                221

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
4098a756b5 i965/fs: Use backend_instruction in predicated break peephole.
We're not using any fs_inst fields, and the next commit will make the
peephole used by the vec4 backend.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
5964419921 i965/fs: Remove SNB embedded-comparison support from optimizations.
We never emit IF instructions with an embedded comparison (lost in the
switch to NIR), so this code is not used. If we want to readd support,
we should have a pass that merges a CMP instruction with an IF or a
WHILE instruction after other optimizations have run.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
36ea9922ad mesa: Add missing _mm_mfence() before streaming loads.
According to the Intel Software Development Manual (Volume 1: Basic
Architecture, 12.10.3 Streaming Load Hint Instruction):

   Streaming loads may be weakly ordered and may appear to software to
   execute out of order with respect to other memory operations.
   Software must explicitly use fences (e.g. MFENCE) if it needs to
   preserve order among streaming loads or between streaming loads and
   other memory operations.

That is, a memory fence is needed to preserve the order between the GPU
writing the buffer and the streaming loads reading it back.

Reported-by: Joseph Nuzman <joseph.nuzman@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-10-05 12:06:33 -07:00
Chad Versace
93161be9e7 i965: Fix intel_miptree_is_fast_clear_capable()
There are three types of fast clears:
  a. fast depth clears
  b. fast singlesample color clears
  c. fast multisample color clears
Function intel_miptree_is_fast_clear_capable() checks if a miptree
supports fast clears of type (b).

Rename the function to disambiguate what it does:
  old: intel_miptree_is_fast_clear_capable
  new: intel_miptree_supports_non_msrt_fast_clear

The functionally accidentally rejected multisampled color surfaces
because it thought they were singlesample array surfaces. Fix that by
explicitly rejecting surfaces with samples > 1.

This fix would have been needed before we enabled layered fast
singlesample color clears (introduced in gen8), which we want to do
eventually. For now, though, this patch changes no behavior; it just
fixes how the driver chooses its behavior.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 11:14:04 -07:00
Chad Versace
125a04b474 i965/mt: Declare some functions as static
intel_tiling_supports_non_msrt_mcs() and
intel_miptree_is_fast_clear_capable() are not used outside of
intel_mipmap_tree.c.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 11:10:11 -07:00
Iago Toral Quiroga
73e0dfbaca i965: Make vec4_visitor's destructor virtual
We need a virtual destructor when at least one of the class' methods is virtual.
Failure to do so might lead to undefined behavior when destructing derived classes.
Fixes the following warning:

brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
    delete gs;

Curro: This shouldn't be causing any actual bugs at the moment because
gen6_gs_visitor is the only subclass of vec4_visitor destroyed through
a pointer of a base class (vec4_gs_visitor *) and its destructor is
basically the same as its parent's. Anyway it seems sensible to change
this so it doesn't bite us in the future.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-05 13:50:15 +02:00
Tapani Pälli
a90feb581a glsl: set glsl error if binding qualifier used on global scope
Fixes following Piglit test:
	global-scope-binding-qualifier.frag

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-05 14:44:24 +03:00
Iago Toral Quiroga
102f6c446b i965: Assert on the number of combined UBO and SSBO binding table entries
In theory we can't break this assertion since the compiler frontend checks
that we don't exceed any of the individual limits, but it does not hurt to
be extra safe.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:19:34 +02:00
Iago Toral Quiroga
20cbe3688a i965: Reserve binding table space for SSBO surfaces
These share the space with UBO surfaces but we need to make sure we
allocate enough space for both sets (12 of each)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Iago Toral Quiroga
41c4d45e08 i965: Define BRW_MAX_SSBO
Instead of using hard-coded values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Iago Toral Quiroga
440f9348c1 i965: Define BRW_MAX_UBO
Instead of using hard-coded values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Matt Turner
4caa10193f i965/vec4: Remove more dead visitor/vertex program code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-04 23:03:59 -07:00
Matt Turner
cd7fa1034a i965: Don't print line numbers with INTEL_DEBUG=optimizer.
The thing you want to do with the output files is diff them, which is
made more difficult by line numbers changing.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2015-10-04 23:03:59 -07:00
Ilia Mirkin
78ec9e28ec nv30: always go through translate module on big-endian
It seems like things are either coming in slighly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-04 21:50:41 -04:00
Ilia Mirkin
1fec05d114 nv30: pretend to have packed texture/surface formats
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-04 21:50:41 -04:00
Michel Dänzer
87c3c9acd2 st/dri: Use packed RGB formats
Fixes Gallium based DRI drivers failing to load on big endian hosts
because they can't find any matching fbconfigs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71789
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-04 21:50:31 -04:00
Timothy Arceri
763cd8c080 glsl: reduce memory footprint of uniform_storage struct
The uniform will only be of a single type so store the data for
opaque types in a single array.

Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-05 10:53:24 +11:00
Kenneth Graunke
b85757bc72 i965: Remove shader_prog from vec4_gs_visitor.
Unfortunately it has to stay in gen6_gs_visitor.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
21585048a2 i965: Use nir->has_transform_feedback_varyings to avoid shader_prog.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
7768b802e5 nir: Add a nir_shader_info::has_transform_feedback_varyings flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
5d7f8cb5a5 nir: Introduce new nir_intrinsic_load_per_vertex_input intrinsics.
Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number.  While GLSL makes
this look like a normal array, it can be very different behind the
scenes.

On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct.  This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.

NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset.  We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.

This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case.  They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.

v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
f2a4b40cf1 nir/lower_io: Make get_io_offset() return a nir_ssa_def * for indirects.
get_io_offset() already walks the dereference chain and discovers
whether or not we have an indirect; we can just return that rather than
computing it a second time via deref_has_indirect().  This means moving
the call a bit earlier.

By returning a nir_ssa_def *, we can pass back both an existence flag
(via NULL checking the pointer) and the value in one parameter.  It
also simplifies the code somewhat.  nir_lower_samplers works in a
similar fashion.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-04 14:00:01 -07:00
Timothy Arceri
6994ca20aa glsl: fix whitespace
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-04 17:42:41 +11:00
Marek Olšák
814b7d1ab9 radeonsi: enable PIPE_CAP_FORCE_PERSAMPLE_INTERP
Now st/mesa won't generate 2 variants for this state.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
b3c55fc669 radeonsi: do force_persample_interp in shaders for non-trivial cases
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
9652bfcf2d radeonsi: implement the simple case of force_persample_interp
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
214de2d815 radeonsi: move SPI_PS_INPUT_ENA/ADDR registers to a separate state
This will be a derived state used for changing center->sample and
centroid->sample at runtime.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
55d406b71e tgsi/scan: add interpolation info into tgsi_shader_info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
6b0f21cb28 st/mesa: automatically set per-sample interpolation if using SampleID/Pos
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
4e9fc7e4e2 st/mesa: set force_persample_interp if ARB_sample_shading is used
This is only a half of the work. The next patch will handle
gl_SampleID/SamplePos, which is the other half of ARB_sample_shading.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
f3b37e321f gallium: add per-sample interpolation control into rasterizer statOAe
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
d8932a355d st/mesa: add ST_DEBUG=precompile support for tessellation shaders
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
dd340b34f3 mesa: remove Driver.BindImageTexture
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
92709dcb9b mesa: remove Driver.DeleteSamplerObject
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
00f6beed02 mesa: remove Driver.EndCallList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
ef6c0714af mesa: remove Driver.BeginCallList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
f457964885 mesa: remove Driver.EndList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
55735cad00 mesa: remove Driver.NewList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
7a54939728 mesa: remove Driver.NotifySaveBegin
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
4b8bb2f559 mesa: remove Driver.SaveFlushVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
72a5dff9cb mesa: remove Driver.FlushVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
91799880b3 mesa: remove Driver.BeginVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
82a950f187 mesa: remove Driver.BindArrayObject
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
d1269a844f mesa: remove Driver.DeleteArrayObject
Nothing reimplements it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
7401807e8d mesa: remove Driver.NewArrayObject
Nothing reimplements it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
1044f99812 mesa: remove Driver.Hint
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
8de82faf95 mesa: remove Driver.ColorMaskIndexed
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
379255298f mesa: remove some Driver.Blend* hooks
Nothing sets them.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
a6cc895e93 mesa: remove Driver.Accum
Nothing calls it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
a4fca24484 mesa: remove Driver.ResizeBuffers
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
6863d5b02a mesa: remove Driver.DeleteShaderProgram
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
b37dcb8c18 mesa: remove Driver.NewShaderProgram
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
95e0303312 mesa: remove Driver.DeleteShader
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
18123a732b egl/dri2: don't require a context for ClientWaitSync (v2)
The spec doesn't require it. This fixes a crash on Android.

v2: don't set any flags if ctx == NULL
v3: add the spec note

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
b78336085b st/dri: don't use _ctx in client_wait_sync
Not needed and it can be NULL.

v2: fix dri2_get_fence_from_cl_event - thanks Albert

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
27b102e7fd r600g: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
5804c6adf8 gallium/radeon: add separate stencil level dirty flags
We will only do depth-only or stencil-only decompress blits, whichever is
needed by textures, instead of always doing both.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
cc92b90375 radeonsi: dump buffer lists while debugging
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
eb55610c89 winsys/radeon: implement cs_get_buffer_list
This is more complicated, because tracking priority_usage needed changing
the relocs_bo type.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
6f48e2bee1 winsys/amdgpu: add winsys function cs_get_buffer_list
For debugging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
93641f4341 gallium/radeon: stop using "reloc" in a few places
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
2edb060639 gallium/radeon: tell the winsys the exact resource binding types
Use the priority flags and expand them.
This information will be used for debugging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
9bd7928a35 radeonsi: add an option for debugging VM faults
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
4502d0bf88 radeonsi: move dumping the last IB into its own function
v2: indentation fix

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
89f73827d0 ddebug: separate creation of debug files
This will be used by radeonsi for logging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Emil Velikov
3cd5395206 docs: add news item and link release notes for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-03 13:23:13 +01:00
Emil Velikov
61c35ce4f9 docs: add sha256 checksums for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8957b696f9)
2015-10-03 13:20:08 +01:00
Emil Velikov
b2a987fc12 docs: add release notes for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit ab9aacce2d)
2015-10-03 13:20:06 +01:00
Matthew Waters
11cabc45b7 egl: rework handling EGL_CONTEXT_FLAGS
As of version 15 of the EGL_KHR_create_context spec, debug contexts
are allowed for ES contexts.  We should allow creation instead of
erroring.

While we're here provide a more comprehensive checking for the other two
flags - ROBUST_ACCESS_BIT_KHR and FORWARD_COMPATIBLE_BIT_KHR

v2 [Emil Velikov] Rebase. Minor tweak in commit message.

Cc: Boyan Ding <boyan.j.ding@gmail.com>
Cc: Chad Versace <chad.versace@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91044
Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-03 12:30:13 +01:00
Jason Ekstrand
443d3bf340 i965/wm: Make compute_barycentric_interp_modes take a nir_shader and a devinfo
Now that everything comes in through NIR, we can pick this directly out of
the shader source and don't need to reference the gl_fragment_program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:20 -07:00
Jason Ekstrand
1e3c1b107e i965: Use nir_foreach_variable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:18 -07:00
Jason Ekstrand
050e4787d3 nir: Add a nir_foreach_variable macro
This is a common enough operation that it's nice to not have to think about
the arguments to foreach_list_typed every time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:16 -07:00
Jason Ekstrand
ca941799ce i965/nir: Remove the prog parameter from brw_nir_lower_inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:00 -07:00
Tom Stellard
a2e1e3d325 radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2
This fixes a race condition in the glx-multithreaded-shader-compile
test.

v2:
  - Replace gallivm_init_llvm_{begin,end}() with gallivm_init_llvm_targets().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:41:27 +00:00
Tom Stellard
76cfd6f1da gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2
Drivers and state trackers that use LLVM for generating code, must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.

When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multihreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).

The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.

This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.

v2:
  - Use call_once and remove mutexes and static initializations.
  - Replace gallivm_init_llvm_{begin,end}() with
    gallivm_init_llvm_targets().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:41:26 +00:00
Tom Stellard
3219b48ae5 gallium/radeon: Use call_once() when initailizing LLVM targets
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:19:01 +00:00
Jason Ekstrand
bf7b6fd3fd i965/shader: Get rid of the shader, prog, and shader_prog fields
Unfortunately, we can't get rid of them entirely.  The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE.  The GS vec4 backend still
needs gl_shader_program for handling transfom feedback.  However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:54 -07:00
Jason Ekstrand
404419ee1a i965/fs,vec4: Get rid of the sanity_param_count
It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip.  Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
ca6a436f12 i965/vec4: Use nir info instead of pulling things out of [shader_]prog
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
756613ed35 i965/fs: Use the nir info instead of pulling things out of [shader_]prog
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
b62e36d18f i965/fs: Move sampler unit lookup into rescale_texcoord
The texunit variable we create and assign in nir_emit_texture gets passed
through two more layers of function calls before it gets to its sole use in
rescale_texcoord.  The best part is that we already pass the sampler into
rescale_texcoord so we can just look it up there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7b974c5f90 i965/cs: Remove the prog argument from local_id_payload_dwords
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7926c3ea7d i965/backend_shader: Add a field to store the NIR shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7a8d06b6dd nir: Move GS data to nir_shader_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
e4fea486da nir: Add a a nir_shader_info struct
This commit also adds code to glsl_to_nir and prog_to_nir to fill it out.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
cd1ae6ebfa nir/glsl: Take a gl_shader_program and a stage rather than a gl_shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
30c6357113 i965: Move prog_data uniform setup to the codegen level
As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor.  This makes uniform setup more of a
language/API thing than a backend compiler thing.  This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
ea006c4cb5 i965: Move binding table setup to codegen time.
Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data.  This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
28709e37d9 i965/shader: Pull assign_common_binding_table_offsets out of backend_shader
This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:52 -07:00
Jason Ekstrand
cdf314cb21 i965/nir: Simplify uniform setup
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
7fee8b6f05 i965/nir: Pull GLSL uniform handling into a common function
The way we deal with GLSL uniforms and builtins is basically the same in
both the vec4 and the fs backend.  This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
03c4171b57 i965/nir: Pull common ARB program uniform handling into a common function
The way we deal with ARB program uniforms is basically the same in both the
vec4 and the fs backend.  This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
390b48fc4a i965/vec4: Use the uniform count from nir_assign_var_locations
Previously, we were counting up uniforms as we set them up.  However, this
count should be exactly identical to shader->num_uniforms provided by
nir_assign_var_locations.  (If it's not, we're in trouble anyway because
that means that locations don't match up.)  This matches what the fs
backend is already doing.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
3de81508ea i965/shader: Get rid of the setup_vec4_uniform_value helper
It's not used by anything anymore

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
58cea0c2b6 i965/shader: Pull setup_image_uniform_values out of backend_shader
I tried to do this once before but Curro pointed out that having it in
backend_shader meant it could use the setup_vec4_uniform_values helper
which did different things in vec4 and fs.  Now the setup_uniform_values
function differs only by an assert in the two backends so there's no real
good reason to be using it anymore.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
5609e0d7b4 i965/vec4: Get rid of the uniform_vector_size array
The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
ea35fb0fbe i965/vec4: Use the actual channels used in pack_uniform_registers
Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms.  However, we have to
walk through the uniforms and figure out liveness anyway, so we migh as
well record the number of channels used as we go.  This may also allow us
to pack things tighter in a few cases.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
cd2132f45b glsl/types: Make subroutine types have a single matrix column
That way, if we do the usual thing of multiplying vector_elements by
matrix_columns we get the actual number of components in the type as per
component_slots().

While we're at it, we also switch to using the actual C++ field
initializers for vector_elements and matrix_columns.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
a7e0f755bc i965: Pull stage_prog_data.nr_params out of the NIR shader
Previously, we had a bunch of code in each stage to figure out how many
slots we needed in stage_prog_data.param.  This code was mostly identical
across the stages and had been copied and pasted around.  Unfortunately,
this meant that any time you did something special, you had to add code for
it to each of these places.  In particular, none of the stages took
subroutines into account; they were working entirely by accident.  By
taking this data from the NIR shader, we know the exact number of entries
we need and everything goes a bit smoother.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
fc3f45234b i965/vs: Move lazy NIR creation to codegen_vs_prog
The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases.  It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:38 -07:00
Jason Ekstrand
64b145422b i965/vec4: Delete the old vec4_vp code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:19:36 -07:00
Jason Ekstrand
1153f12076 i965/vec4: Delete the old ir_visitor code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:19:34 -07:00
Jason Ekstrand
b85761d11d i965/vec4: Always use NIR
GLSL IR vs. NIR shader-db results for vec4 programs on i965:

   total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
   instructions in affected programs:     1245199 -> 1134225 (-8.91%)
   helped:                                7469
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on G4x:

   total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
   instructions in affected programs:     1205599 -> 1094625 (-9.20%)
   helped:                                7469
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:

   total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
   instructions in affected programs:     1205503 -> 1094531 (-9.21%)
   helped:                                7468
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:

   total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
   instructions in affected programs:     1850547 -> 1621331 (-12.39%)
   helped:                                14856
   HURT:                                  1481

GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:

   Test                        master-glsl     master-nir     diff
   bench_OglGeomPoint          461.556         463.006        1.450
   bench_OglTerrainFlyInst     184.484         187.574        3.090
   bench_OglTerrainPanInst     132.412         136.307        3.895
   bench_OglTexFilterAniso     19.653          19.645         -0.008
   bench_OglTexFilterTri       58.333          58.009         -0.324
   bench_OglVSInstancing       65.049          65.327         0.278
   bench_trexoff               69.474          69.694         0.220
   bench_valley                40.708          41.125         0.417

v2 (Jason Ekstrand):
 - Remove more uses of NirOptions as a switch
 - New shader-db numbers
 - Added benchmark numbers

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:18:46 -07:00
Ilia Mirkin
4e0a8e0a50 i965: don't forget to free image_param on prog_data free
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Ilia Mirkin
19598aaa5d glsl: avoid leaking hiddenUniforms map when there are no uniforms
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Ilia Mirkin
da2fdf950f mesa: avoid leaking closure when iterating over a string_to_uint_map
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Chris Wilson
6b7036498a nir: Fix uninitialized 'progress' variable in nir_lower_system_values.
Commit 0a1adaf11d (nir: Report progress
from nir_lower_system_values().) introduced a bug caught by Valgrind:

==823== Conditional jump or move depends on uninitialised value(s)
==823==    at 0xB09020C: convert_block (nir_lower_system_values.c:68)
==823==    by 0xB079FB8: foreach_cf_node (nir.c:1310)
==823==    by 0xB07A0AF: nir_foreach_block (nir.c:1336)
==823==    by 0xB09026B: convert_impl (nir_lower_system_values.c:79)
...
==823==  Uninitialised value was created by a stack allocation
==823==    at 0xB090249: convert_impl (nir_lower_system_values.c:76)

which is trivially fixed by initializing progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 10:44:28 -07:00
Connor Abbott
33da78adee nir/remove_phis: handle trivial back-edges
Some loops may have phi nodes that look like:

foo = ...
loop {
    bar = phi(foo, bar)
    ...
}

in which case we can remove the phi node and replace all uses of 'bar'
with 'foo'. In particular, there are some L4D2 vertex shaders with loops
that, after optimization, look like:

        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 block_4 */
                vec1 ssa_2195 = phi block_0: ssa_2136, block_4: ssa_994
                vec1 ssa_7321 = phi block_0: ssa_8195, block_4: ssa_7321
                vec1 ssa_7324 = phi block_0: ssa_8198, block_4: ssa_7324
                vec1 ssa_7327 = phi block_0: ssa_8174, block_4: ssa_7327
                vec1 ssa_8139 = intrinsic load_uniform () () (232)
                vec1 ssa_588 = ige ssa_2195, ssa_8139
                /* succs: block_2 block_3 */
                if ssa_588 {
                        block block_2:
                        /* preds: block_1 */
                        break
                        /* succs: block_5 */
                } else {
                        block block_3:
                        /* preds: block_1 */
                        /* succs: block_4 */
                }
                block block_4:
                /* preds: block_3 */
                vec1 ssa_994 = iadd ssa_2195, ssa_2150
                /* succs: block_1 */
        }

where after removing the second, third, and fourth phi nodes, the loop becomes
entirely dead, and this patch will cause the loop to be deleted entirely.

No piglit regressions.

Shader-db results on bdw:

instructions in affected programs:     5824 -> 5664 (-2.75%)
total loops in shared programs:        2234 -> 2202 (-1.43%)
helped:                                32

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-02 13:19:45 -04:00
Kyle Brenneman
d35391cfda glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)
Add a macro GL_LIB_NAME to hold the filename that configure comes up with
based on the --with-gl-lib-name and --enable-mangling options.

In driOpenDriver, use the GL_LIB_NAME macro instead of hard-coding
"libGL.so.1".

v2: Add an #ifndef/#define for GL_LIB_NAME so that non-autoconf builds will
    work.
v3: Fix the library filename in the Makefile.

Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 13:25:05 +01:00
Kyle Brenneman
798f260a2f mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.

This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 13:23:18 +01:00
Kyle Brenneman
a27f2d991b glx: Fix build errors with --enable-mangling (v2)
Rearranged the GLX_ALIAS macro in glextensions.h so that it will pick up
the renames from glx_mangle.h.

Fixed the alias attribute for glXGetProcAddress when USE_MGL_NAMESPACE is
defined.

v2: Add a comment clarifying why GLX_ALIAS needs two macros.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-02 13:22:46 +01:00
Tapani Pälli
85313ff8ab glsl: validate binding qualifier on block members
Fixes following Piglit test:
	member-invalid-binding-qualifier.frag

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-02 10:50:42 +03:00
Samuel Iglesias Gonsalvez
f42466322a glsl: emit row_major matrix's SSBO stores only for components in writemask
When writing to a column of a row-major matrix, each component of the
vector is stored to non-consecutive memory addresses, so we generate
one instruction per component.

This patch skips the disabled components in the writemask, saving some
store instructions plus avoid storing wrong data on each disabled
component.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 08:34:25 +02:00
Tapani Pälli
a552b77dcc glsl: error out if non-constant indexing of SSBO arrays with GLSL ES
Fixes a failing subtest in:
	ES31-CTS.shader_storage_buffer_object.negative-glsl-compileTime

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-02 08:37:02 +03:00
Daniel Scharrer
b3f9c5cc0f mesa: Add abs input modifier to base for POW in ffvertex_prog
The result of POW for a negative base is undefined. Even when the result
is multiplied by zero (which is the case here whenever the base is
negative), the Inf and NaNs can propagate past that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91342
Signed-off-by: Daniel Scharrer <daniel@constexpr.org>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-01 16:37:55 -04:00
Kenneth Graunke
604ce8253a i965/fs: Print reg and reg_offset separately for ATTR files.
Reading this output was really confusing.  reg represents attribute
slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 11:01:58 -07:00
Kenneth Graunke
193d29516d i965/nir: Refactor input/output lowering setup into helpers.
The code for input lowering is going to get significantly more
complicated shortly, so I wanted to pull it out.  Vertex shader inputs
are handled nearly identically regardless of vec4/scalar mode, so I
opted to not split that.

I thought about having each function actually do the lowering, but one
pass through nir_lower_io that handles all types (which weren't handled
earlier) is probably more efficient.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 10:58:30 -07:00
Kenneth Graunke
39a1d36a67 nir: Allow nir_lower_io() to only lower one type of variable.
We may want to use different type_size functions for (e.g.) inputs
vs. uniforms.  Passing in -1 for mode ignores this, handling all
modes as before.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 10:58:30 -07:00
Brian Paul
1c6689bf03 mesa: fix incorrect error in _mesa_BindTextureUnit()
If the texture object exists, but the Name field is zero, it means
the object was created but never bound to a target.  Trying to bind it
in _mesa_BindTextureUnit() should generate GL_INVALID_OPERATION.

Fixes piglit's arb_direct_state_access-bind-texture-unit test.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
a9408f3ca1 mesa: remove _mesa_get_tex_unit_err() and fix error handling
This helper was only called from _mesa_BindTextureUnit().  It's simpler
to just inline it.

The error check / code / message in the helper was incorrect.  It was
written for glBindTextures(), not glBindTextureUnit().  The correct
error for a bad texture unit number is GL_INVALID_VALUE.  The error
message now reports the unit number rather than a GL_TEXTUREi enum.

Fixes a failure in piglit's arb_direct_state_access-bind-texture-unit test.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
c277fa3940 mesa: consolidate texture binding code
Before, we were doing the actual _mesa_reference_texobj() call and
ctx->Driver.BindTexture() and misc housekeeping in three different
places.  This consolidates the common code in a new bind_texture()
function.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
78f908c54b mesa: fix indentation in _mesa_create_nameless_texture() 2015-10-01 07:45:43 -06:00
Brian Paul
aa249190a5 st/mesa: clean up #includes in st_draw.c
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
82e3d8ba8b mesa: clean up #includes in sampler.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
32a4999ee7 mesa: clean up #includes in ir_to_mesa.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
b9b13d873a mesa: clean up #includes in uniforms.h
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
e13b515044 mesa: clean up #includes in uniform_query.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Brian Paul
85ea125620 mesa: clean up #includes in pipelineobj.c
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Brian Paul
1a22550725 mesa: clean up #includes in ff_fragment_shader.cpp
Get rid of "../glsl/" paths.  Sort alphabetically.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Iago Toral Quiroga
7455324030 main: Fix block index when mixing UBO and SSBO blocks
Since we store both in UniformBlocks, we can't just compute the index by
subtracting the array address start, we need to count the number of
buffers of the approriate type.

v2:
  - Just fall back to calc_resource_index (Tapani)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 09:25:30 +02:00
Tapani Pälli
ca2e16d26e mesa: use strtok_s for strtok_r on windows
https://msdn.microsoft.com/en-us/library/ftsafwz3.aspx

v2: use _WIN32 instead of _MSC_VER (Brian Paul)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92183
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-01 08:01:03 +03:00
Ian Romanick
9bd9cf1fa4 meta: Handle array textures in scaled MSAA blits
The old code had some significant problems with respect to
sampler2DArray textures.  The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2.  The resulting shader would not even compile.
Since there were not tests for this path, nobody noticed.

The input to the fragment shader is always treated as a vec3.  If the
source data is only vec2, the vertex puller will supply 0 for the .z
component.  The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer.  If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.

Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.

Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 16:22:56 -07:00
Chad Versace
b217e6f035 i965/miptree: Add PRM references for most struct members (v2)
Add comments that link the driver's miptree structures to the hardware
structures documented in the PRM.  This provides sorely needed
orientation to developers new to the miptree code. And for miptree
veterans, this clarifies some of the more obscure miptree data.

For each driver struct field that closely corresponds to a
hardware struct field, add a PRM reference to that hardware field's
name. For example,

    struct intel_mipmap_tree {
       ...
       /**
        * @brief One of GL_TEXTURE_2D, GL_TEXTURE_2D_ARRAY, etc.
        *
        * @see RENDER_SURFACE_STATE.SurfaceType
        * @see RENDER_SURFACE_STATE.SurfaceArray
        * @see 3DSTATE_DEPTH_BUFFER.SurfaceType
        */
       GLenum target;
       ...
    };

Also annotate the INTEL_MSAA_LAYOUT_* enums with the name of the PRM
sections that documents the layout.

v2: Replace "2D subimage" with "slice", and define what a "slice" is.
    For Ben.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
2015-09-30 15:32:03 -07:00
Chad Versace
f7fe9fb0f1 i965/miptree: Rename align_w,align_h -> halign,valign
The values of intel_mipmap_tree::align_w and ::align_h correspond to the
hardware enums HALIGN_* and VALIGN_*.

See the confusion?
    align_h != HALIGN
    align_h == VALIGN

Reduce the confusion by renaming the variables to match the hardware
enum names:
    git ls-files |
    xargs sed -i -e 's/align_w/halign/g' \
                 -e 's/align_h/valign/g'

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-30 15:31:06 -07:00
Chad Versace
56367b0290 i965/miptree: Rename intel_miptree_map::mt -> ::linear_mt (v2)
Because that's what it is. It's an untiled, *linear* miptree.

v2:
  - Add space after /*.
  - Use one comment per function argument.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:31:04 -07:00
Chad Versace
b7882ae677 i965/miptree: Fix comments for map mode
The comment for intel_miptree_map::mode claimed that it was a bitmask of
GL_MAP_{READ,WRITE,INVALIDATE}_BIT. In reality, the bitmask may include
any of {GL,BRW}_MAP_*_BIT.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:31:03 -07:00
Chad Versace
bd191b7cc6 i965/miptree: More comments for BRW_MAP_DIRECT_BIT (v2)
Clarify that this bit extends the set of GL_MAP_*_BIT enums.
Also fix typo of "temporary".

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:30:55 -07:00
Kenneth Graunke
651395b6e8 i965: Remove duplicate copy of is_scalar_shader_stage().
Jason open coded this in 60befc63 when cleaning up some ugly code;
using our existing helper tidies it up a bit more.

v2: Drop inline (suggested by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-30 13:56:24 -07:00
Ville Syrjälä
a1a3f0961b i915: Remember to call intel_prepare_render() before blitting
Bring over the following fix from i965:
 commit fb3d62fe3d
 Author: Kenneth Graunke <kenneth@whitecape.org>
 Date:   Tue Aug 6 14:36:09 2013 -0700

    i965: Remember to call intel_prepare_render() before blitting.

Fixes a crash in the following piglit tests:
 bin/fbo-sys-blit -auto
 bin/fbo-sys-sub-blit -auto

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 13:10:03 -07:00
Ville Syrjälä
c349031c27 i915: Fix texcoord vs. varying collision in fragment programs
i915 fragment programs utilize the texture coordinate registers
for both texture coordinates and varyings. Unfortunately the
code doesn't check if the same index might be in use for both.
It just naively uses the index to pick a texture unit, which
could lead to collisions.

Add an extra mapping step to allocate non conflicting texture
units for both uses.

The issue can be reproduced with a pair of simple shaders like
these:
 attribute vec4 in_mod;
 varying vec4 mod;
 void main() {
   mod = in_mod;
   gl_TexCoord[0] = gl_MultiTexCoord0;
   gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
 }

 varying vec4 mod;
 uniform sampler2D tex;
 void main() {
   gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod;
 }

Fixes many piglit tests on i915:

    glsl-link-varyings-2
    glsl-orangebook-ch06-bump
    interpolation-none-gl_frontcolor-smooth-fixed
    interpolation-none-gl_frontcolor-smooth-none
    interpolation-none-gl_frontcolor-smooth-vertex
    interpolation-none-gl_frontsecondarycolor-smooth-fixed
    interpolation-none-gl_frontsecondarycolor-smooth-vertex
    interpolation-none-gl_frontsecondarycolor-smooth-none
    interpolation-none-other-flat-fixed
    interpolation-none-other-flat-none
    interpolation-none-other-flat-vertex
    interpolation-none-other-smooth-fixed
    interpolation-none-other-smooth-none
    interpolation-none-other-smooth-vertex

v2 [idr]: Minor formatting tweaks.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 13:10:03 -07:00
Ville Syrjälä
9504740f3e i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)
I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy
the same bit. Move the texture bits upwards a bit to make room for
I830_UPLOAD_RASTER_RULES.

Now the driver will actually upload the raster rules which is rather
important to get the provoking vertex right. Fixes the appearance
of glxgears teeth on gen2.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 12:49:28 -07:00
Jordan Justen
7b391142e9 i965/cs: Upload UBO/SSBO surfaces
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-30 11:28:12 -07:00
Rhys Kidd
83018f5c20 mesa: Fix format specifier warning in mesa_DispatchComputeIndirect()
Commit 1665d29ee3 introduced an incorrect
format specifier that operates on GLintptr indirect within the function
_mesa_DispatchComputeIndirect().

This patch mitigates the introduced GCC warning:

src/mesa/main/compute.c: In function '_mesa_DispatchComputeIndirect':
src/mesa/main/compute.c:53:7: warning: format '%d' expects argument of type 'int', but argument 3 has type 'GLintptr' [-Wformat=]
       _mesa_debug(ctx, "glDispatchComputeIndirect(%d)\n", indirect);
           ^

v2: Amend for Boyan Ding <boyan.j.ding@gmail.com> feedback.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-30 10:13:41 -07:00
Jason Ekstrand
3948ac19a4 i965: Get rid of prog_data compare functions
They are no longer used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Jason Ekstrand
bfdc76c133 i965/state_cache: Remove the aux_compare fields
They haven't been used since 1bba29ed40 so
there's no good reason to keep them around.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Jason Ekstrand
a4734b34b3 i965/copy_image: Fix a copy+past error
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Chris Wilson
70e91d61fd i965: Remove early release of DRI2 miptree
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-30 10:52:30 +03:00
Samuel Iglesias Gonsalvez
e21bb9e7bd glsl: assert base_alignment > 0 for records
From GLSL 1.50 spec, section 4.1.8 "Structures":

"Structures must have at least one member declaration."

So the base_alignment should be higher than zero.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-30 08:13:07 +02:00
Samuel Iglesias Gonsalvez
f3afcbecc6 util: use strnlen() in strndup() implementations
If the string being copied is not NULL-terminated the result of
strlen() is undefined.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-30 08:13:07 +02:00
Samuel Iglesias Gonsalvez
023165a734 i965/vec4/nir: add nir_intrinsic_memory_barrier support
Fix OpenGL ES 3.1 conformance tests: advanced-readWrite-case1-vsfs
and advanced-matrix-vsfs.

v2:
- Fix SHADER_OPCODE_MEMORY_FENCE emission and the allocation of 'tmp'
  (Francisco).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-30 08:13:07 +02:00
Samuel Iglesias Gonsalvez
f24e5e68d6 glsl: apply shader storage block member rules when adding program resources
From ARB_program_interface_query:

"For an active shader storage block member declared as an array, an
 entry will be generated only for the first array element, regardless
 of its type. For arrays of aggregate types, the enumeration rules are
 applied recursively for the single enumerated array element."

v2:
- Simplify 'if' conditions and return true if it is not a buffer
  variable, because these rules only apply to buffer variables (Timothy).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-30 08:13:07 +02:00
Jordan Justen
4810d02112 nir: Don't set dest in SSBO store glsl_to_nir conversion
This matches the function signature created in
lower_ubo_reference_visitor::ssbo_store which has a void return.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-29 17:17:20 -07:00
Kenneth Graunke
476e6d732f nir: Use a system value for gl_PrimitiveIDIn.
At least on Intel hardware, gl_PrimitiveIDIn comes in as a special part
of the payload rather than a normal input.  This is typically what we
use system values for.  Dave and Ilia also agree that a system value
would be nicer.

At some point, we should change it at the GLSL IR level as well.  But
that requires changing most of the drivers.  For now, let's at least
make NIR do the right thing, which is easy.

v2: Add a comment about not creating a temporary (suggested by Iago).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-29 14:19:32 -07:00
Brian Paul
cb758b892a st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats
For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag
to try to get a hardware format which also supports rendering (for FBO
textures).  Do the same thing for floating point formats.

This allows the Redway3D Flat demo to run.

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-29 11:52:22 -06:00
Brian Paul
daf23bd4cb st/mesa: add some debugging code in st_ChooseTextureFormat()
I've temporarily added code like this many times.  Wrap it in a
conditional that can be enabled when needed.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-29 11:52:03 -06:00
Brian Paul
7147f7098e mesa: clean up #includes in shaderapi.c
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:56 -06:00
Brian Paul
b24c6d3fef mesa: clean up the #includes in shader_query.cpp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:51 -06:00
Brian Paul
3bbff1e26e mesa: remove an extern "C" wrapper in shader_query.cpp
The shaderapi.h header already has the extern "C" wrapper.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:38 -06:00
Jordan Justen
681b4badae i965/cs: Generate code to load gl_NumWorkGroups
This code also sets cs_prog_data->uses_num_work_groups which is later
used by state setup to indicate that the gl_NumWorkGroups surface
needs to be setup.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
4c6ddd3397 nir: Convert SYSTEM_VALUE_NUM_WORK_GROUPS to a nir intrinsic
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
f6ae914069 glsl/cs: Add gl_NumWorkGroups as a system value
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
63d7b33f51 i965/cs: Setup surface binding for gl_NumWorkGroups
This will only be setup when the prog_data uses_num_work_groups
boolean is set.

At this point nothing will set uses_num_work_groups, but soon code
will set it when emitting code for the intrinsic that loads
gl_NumWorkGroups.

We can't emit this surface information earlier at the start of the
DispatchCompute* call because we may not have generated the program
yet. Until we generate the program, we don't know if the
gl_NumWorkGroups variable is accessed.

We also can't emit the surface as part of the brw_cs_state atom,
because we might not need the surface if gl_NumWorkGroups is not used
by the program.

Lastly, we cannot emit the surface later (after state upload) in the
DispatchCompute* call, because it needs to be run before the
brw_cs_state atom is emitted, since it changes the surface state.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
d1be9d2126 i965/cs: Add a binding table entry for gl_NumWorkGroups
If glDispatchComputeIndirect is used, then the value for this variable
must be read from the indirect BO.

To allow the same generated code to support indirect and
glDispatchCompute, we will also setup a BO for the number of work
groups using the intel_upload_data mechanism. This will only be
required if the gl_NumWorkGroups variable is accessed.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
d57a85f32b i965/cs: Store compute invocation information in brw context
We will need this in an atom to setup a surface to read the
gl_NumWorkGroups values from.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
60cf84dea7 i965/cs: Re-emit cs_state when surfaces have changed
Unlike rendering (BINDING_TABLE_POINTERS_*S), compute doesn't have a
binding table pointers command. Instead it is part of the
MEDIA_INTERFACE_DESCRIPTOR structure loaded by the brw_cs_state atom.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
2ec5f3e1d5 i965/cs: Re-emit push constants and cs_state on new batches
We need to re-emit push constansts when a new batch is started since
the push constants are stored in the batch. We also need to re-emit
the MEDIA_INTERFACE_DESCRIPTOR (in brw_cs_state) since it is stored in
the batch.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
1665d29ee3 mesa/cs: Add MESA_VERBOSE=api support in DispatchCompute*
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jose Fonseca
952366a60e util: Fix strndup prototype on C++.
Trivial.
2015-09-29 16:01:56 +01:00
Tapani Pälli
c0722be9f5 mesa: fix ARRAY_SIZE query for GetProgramResourceiv
Patch also refactors name length queries which were using array size
in computation, this has to be done in same time to avoid regression in
arb_program_interface_query-resource-query Piglit test.

Fixes rest of the failures with
   ES31-CTS.program_interface_query.no-locations

v2: make additional check only for GS inputs
v3: create helper function for resource name length
    so that it gets calculated only in one place

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
2015-09-29 12:46:28 +03:00
Iago Toral Quiroga
12d510ab74 glsl: Fix forward NULL dereference coverity warning
The comment says that it should be impossible for decl_type to be NULL
here, so don't try to handle the case where it is, simply add an assert.

>>>     CID 1324977:  Null pointer dereferences  (FORWARD_NULL)
>>>     Comparing "decl_type" to null implies that "decl_type" might be null.

No piglit regressions observed.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:53:08 +02:00
Iago Toral Quiroga
1dc2db7a4d glsl: Fix null return coverity warning
Add an assert on the result of as_dereference() not being NULL:

>>>     CID 1324978:  Null pointer dereferences  (NULL_RETURNS)
>>>     Dereferencing a null pointer "deref_record->record->as_dereference()".

Since we are introducing a new variable to hold the result of
as_dereference(), take the opportunity to rename deref_record_type to
interface_type and just name the new variable interface_deref, which is
less confusing.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:53:08 +02:00
Iago Toral Quiroga
6bf718fec2 glsl: Fix unused value warning reported by Coverity
We don't use param in this part of the code, so no point in advancing
the pointer forward:

>>>     CID 1324983:  Code maintainability issues  (UNUSED_VALUE)
>>>     Assigning value from "param->get_next()" to "param" here, but that stored value is overwritten before it can be used.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:53:08 +02:00
Samuel Iglesias Gonsalvez
bea66d22f2 util: implement strndup for WIN32
v2:
- Add strndup.h to Makefile.sources (Emil)
- Use calloc instead of malloc (Emil).
- Check if allocation fails (Emil, Jose)
- Add '#pragma once' and include stdlib.h to strndup.h (Jose)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92124
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
7efb235019 glsl: use correct number of uniform blocks in error message
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
6668eb5a45 mesa: rename gl_shader_program's NumUniformBlocks to NumBufferInterfaceBlocks
Because it counts shader storage blocks too.

v2:
- Use NumBufferInterfaceBlocks instead (Jordan).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
38004eb17c main: fix ACTIVE_UNIFORM_BLOCKS value
NumUniformBlocks also counts shader storage blocks.
NumUniformBlocks variable will be renamed in a later patch to avoid
misunderstandings.

v2:

- Modify the condition to use !IsShaderStorage and the list of
  uniform blocks (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:03:47 +02:00
Emil Velikov
589249a792 docs: add news item and link release notes for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-29 00:22:32 +01:00
Emil Velikov
dda02d202e docs: add sha256 checksums for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4c0b484612)
2015-09-29 00:21:14 +01:00
Emil Velikov
58e02b2a4e docs: add release notes for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 51e0b06d99)
2015-09-29 00:21:12 +01:00
Anuj Phogat
945592f92c i965/gen9: Add a condition for starting pixel in fast copy blit
This condition restricts the use of fast copy blit to cases
where starting pixel of src and dst is oword (16 byte) aligned.

Many piglit tests (if using fast copy blit in Mesa) failed earlier
because I missed adding this condition.Fast copy blit is currently
enabled for use only with Yf/Ys tiling.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 15:00:53 -07:00
Ilia Mirkin
1d8cba9b51 nouveau: wait to unref the transfer's bo until it's no longer used
The bo will often come from a slab in which case it doesn't matter. But
for larger allocations this will be in its own bo, and we have to make
sure to wait until it's no longer used in order for it to be freed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Ilia Mirkin
3a6b9a7830 nouveau: delay deleting buffer with unflushed fence
If there is an unflushed fence on the bo, then the resource may still be
used in commands built up in the local pushbuf. Flushing can cause all
sorts of unwanted effects, so just free the bo when the relevant fence
is hit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Ilia Mirkin
d4e650b07b nouveau: be more careful about freeing temporary transfer buffers
Deleting a buffer does not flush the command stream. Make sure that we
wait for the copies to finish before deleting the temporary bo.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Anuj Phogat
4c5308bbf4 i965: Rename intel_miptree_get_dimensions_for_image()
This function isn't specific to miptrees. So, drop the "miptree"
from function name.

V3: Add a comment explaining how the 1D Array texture height and
    depth is interpreted by Intel hardware.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
0bfd914f9f i965/gen9: Fix {src, dst}_pitch alignment check for XY_FAST_COPY_BLT
I misinterpreted the alignmnet restriction in XY_FAST_COPY_BLT earlier.
Instead of checking pitch for 64KB alignmnet we need to check it for
tile widh alignment.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
0fa39bff19 i965: Fix {src, dst}_pitch alignment check for XY_SRC_COPY_BLT
Current code checks the alignment restrictions only for Y tiling.
From Broadwell PRM vol 10:

 "pitch is of 512Byte granularity for Tile-X: This means the tiled-x
  surface pitch can be (512, 1024, 1536, 2048...)/4 (in Dwords)."

This patch adds the restriction for X tiling as well.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
e83b07aa7b i965: Move conversion of {src, dst}_pitch to dwords outside if/else
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
485285498f i965: Delete temporary variable 'src_pitch'
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
bbbc9fd8e5 i965: Use helper function intel_get_tile_dims() in surface setup
It takes care of using the correct tile width if we later use other
tiling patterns for aux miptree.

V2: Remove the comment about using Yf for aux miptree.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
1dc41be9eb i965: Use intel_get_tile_dims() to get tile masks
This will require change in the parameters passed to
intel_miptree_get_tile_masks().

V2: Rearrange the order of parameters. (Ben)
    Change the name to intel_get_tile_masks(). (Topi)

V3: Use temporary variables in intel_get_tile_masks()
    for clarity. Fix mask_y computation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
21fdc59d34 i965: Add a helper function intel_get_tile_dims()
V2:
- Do the tile width/height computations in the new helper
  function and use it later in intel_miptree_get_tile_masks().
- Change the name to intel_get_tile_dims().

V3: Return the tile_h in number of rows in place of bytes.
    Document the units of tile_w, tile_h parameters.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Eduardo Lima Mitev
5edd9961c1 mesa: Use the effective internal format instead for validation
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:

    "if internalformat is a base internal format, the effective internal
     format is a sized internal format that is derived from the format and
     type for internal use by the GL. Table 3.12 specifies the mapping of
     format and type to effective internal formats. The effective internal
     format is used by the GL for purposes such as texture completeness or
     type checks for CopyTex* commands. In these cases, the GL is required
     to operate as if the effective internal format was used as the
     internalformat when specifying the texture data."

v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsize format
instead.

v4: * Improved comments in
      _mesa_es3_effective_internal_format_for_format_and_type().
    * Splitted patch to separate chunk about reordering of
      error_check_subtexture_dimensions() error check, which is not directly
      related with this patch.
v5: Dropped the splitted patch because it was actually a work around 3
    dEQP tests that are buggy:

    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
Eduardo Lima Mitev
c6bf1cd146 mesa: Move _mesa_base_tex_format() from teximage to glformats files
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.

Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.

The function is moved as-is, without any modification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
Eduardo Lima Mitev
15ab968f62 mesa: Fix order of format+type and internal format checks for glTexImageXD ops
The more specific GLES constrains should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.

v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
Alexander von Gluck IV
7cdd818d2a egl: Fix missing Haiku include path 2015-09-28 13:58:25 -04:00
Alexander von Gluck IV
255a225265 state_trackers/hgl: Fix missing include path 2015-09-28 13:58:24 -04:00
Francisco Jerez
b61292296b i965/fs: Fix hang on IVB and VLV with image format mismatch.
IVB and VLV hang sporadically when an untyped surface read or write
message is used to access a surface of format other than RAW, as may
happen when there is a mismatch between the format qualifier of the
image uniform and the format of the actual image bound to the
pipeline.  According to the spec this condition gives undefined
results but may not lead to program termination (which is one of the
possible outcomes of the hang).  Fix it by checking at runtime whether
the surface is of the right type.

Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
subtest.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-28 18:10:39 +03:00
Serge Martin
2518645f63 clover: Implement clCreateImage?D w/ clCreateImage.
Remplace clCreateImage2D and clCreateImage3D implementation with call
to clCreateImage.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 18:10:39 +03:00
Serge Martin
f2c52e392b clover: Implement CL1.2 clCreateImage().
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 18:10:39 +03:00
Francisco Jerez
92666b90c0 clover: Move down canonicalization of memory object flags into validate_flags().
This will be used to share the same logic between buffer and image
creation.

v2: Make memory flag set constants local to validate_flags. (Serge
    Martin)
2015-09-28 18:10:39 +03:00
Samuel Iglesias Gonsalvez
2b9248dc58 docs: mention ARB_shader_storage_buffer_object on 11.1.0 release notes
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-28 16:34:24 +02:00
Iago Toral Quiroga
e7ae6d9e14 glsl: revert "glsl: atomic counters can be declared as buffer-qualified variables"
This reverts commit 586142658e.

The specs are not explicit about any restrictions related to the types allowed
on buffer variables, however, the description of opaque types (like atomic
counters) is in conclict with the purpose of buffer variables:

"The opaque types declare variables that are effectively opaque
 handles to other objects. These objects are
 accessed through built-in functions, not through direct reading or
 writing of the declared variable.
 (...)
 Opaque variables cannot be treated as l-values;(...)"

Also, Mesa is already disallowing opaque types in interface blocks anyway, so
that commit was not really achieving anything.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 14:23:26 +02:00
Ilia Mirkin
5bff12ecb4 gallium/util: avoid unreferencing random memory on buffer alloc failure
Found by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-28 02:38:58 -04:00
Ilia Mirkin
6dd059fefe mesa: don't leak interface_name
Found by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-28 02:38:58 -04:00
Timothy Arceri
e413d2fbc4 glsl: fix component size calculation for tessellation and geom shaders
Broken in commit abdab88b30 when adding arrays of arrays support

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-09-28 11:31:50 +10:00
Boyan Ding
3c63a2d2f0 docs/GL3.txt: fix typo
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-09-27 17:54:23 -07:00
Kenneth Graunke
d6a41b5f70 i965/gs: Optimize away the EOT write on Gen8+ with static vertex count.
With static vertex counts, the final EOT write doesn't actually write
any data - it's just there to end the thread.  Typically, the last
thing before ending the thread will be an EmitVertex() call, resulting
in a URB write.  We can just set EOT on that.

Note that this isn't always possible - there might be an intervening
SSBO write/image store, or the URB write may have been in a loop.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3173 -> 3149 (-0.76%)
instructions in affected programs:     176 -> 152 (-13.64%)
helped:                                8

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 12:02:34 -07:00
Kenneth Graunke
08fe5799e6 i965/gs: Allow src0 immediates in GS_OPCODE_SET_WRITE_OFFSET.
GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special
strides.  We can easily make the generator handle constant src[0]
arguments by instead generating a MOV with the product of both operands.

This isn't necessarily a win in and of itself - instead of a MUL, we
generate a MOV, which should be basically the same cost.  However, we
can probably avoid the earlier MOV to put src[0] into a register.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3207 -> 3173 (-1.06%)
instructions in affected programs:     3207 -> 3173 (-1.06%)
helped:                                11

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 12:02:31 -07:00
Kenneth Graunke
f0a618ee7c i965: Implement "Static Vertex Count" geometry shader optimization.
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization.  Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly.  However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.

Thanks to the new NIR GS intrinsics, this is easy.  We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.

Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs:     242 -> 222 (-8.26%)
helped:                                10

v2: Don't break non-NIR paths (just skip this optimization).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:58 -07:00
Kenneth Graunke
bcef2abad7 i965: Move GS_THREAD_END mlen calculations out of the generator.
The visitor was setting a mlen that was wrong for Broadwell, but the
generator was ignoring it and doing the right thing regardless.  We may
as well move the logic fully into the visitor.  This will be useful in
the next commit as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:57 -07:00
Kenneth Graunke
02530c5dc5 nir: Add a function to count the number of vertices a GS emits.
Some hardware (such as Broadwell) can run geometry shaders more
efficiently when the number of vertices emitted is statically known.

This pass provides a way to obtain the constant vertex count, or
-1 indicating that the vertex count is unknown/non-constant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:53 -07:00
Kenneth Graunke
df221f65e2 i965: Simplify handling of VUE map changes.
The old code was disasterously complex - spread across multiple atoms
which may not even run, inspecting the dirty bits to try and decide
whether it was necessary to do checks...storing VS information in
brw_context...extra flagging...

This code tripped me and Carl up very badly when working on the
shader cache code.  It's very fragile and hard to maintain.

Now that geometry shaders only depend on their inputs and don't have
to worry about the VS VUE map, we can dramatically simplify this:
just compute the VUE map coming out of the geometry shader stage
in brw_upload_programs.  If it changes, flag it.  Done.

v2: Also check vue_map.separable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
6301af22bb i965/gs: Remove the dependency on the VS VUE map.
Because we only support geometry shaders in core profile, we can safely
ignore any driver-extending of VS outputs.

Those are:
- Legacy userclipping (doesn't exist in core profile)
- Edgeflag copying (Gen4-5 only, no GS support)
- Point coord replacement (Gen4-5 only, no GS support)
- front/back color hacks (Gen4-5 only, no GS support)

v2: Rebase; leave a comment about why SSO works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
99df02ca26 i965: Don't re-layout varyings for separate shader programs.
Previously, our VUE map code always assigned slots to varyings
sequentially, in one contiguous block.

This was a bad fit for separate shaders - the GS input layout depended
or the VS output layout, so if we swapped out vertex shaders, we might
have to recompile the GS on the fly - which rather defeats the point of
using separate shader objects.  (Tessellation would suffer from this
as well - we could have to recompile the HS, DS, and GS.)

Instead, this patch makes the VUE map for separate shaders use a fixed
layout, based on the input/output variable's location field.  (This is
either specified by layout(location = ...) or assigned by the linker.)
Corresponding inputs/outputs will match up by location; if there's a
mismatch, we're allowed to have undefined behavior.

This may be less efficient - depending what locations were chosen, we
may have empty padding slots in the VUE.  But applications presumably
use small consecutive integers for locations, so it hopefully won't be
much worse in practice.

3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions.
This seems like a small price to pay for avoiding recompiles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
1e5180316c i965/vue: Make assign_vue_map() take an explicit slot.
Our plan of assigning consecutive slots doesn't work properly for
separate shader objects - at least, if we want to avoid recompiling them
whenever the interface changes.

As a first step, make assign_vue_map take an explicit slot parameter,
rather than implicitly incrementing it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
268008f98c i965: Initialize unused VUE map slots to BRW_VARYING_SLOT_PAD.
Nothing actually relies on unused slots being initialized to
BRW_VARYING_SLOT_COUNT.  Soon, we're going to have VUE maps with holes
in them, at which point pre-filling with BRW_VARYING_SLOT_PAD make a lot
more sense.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
39d4b553a8 i965: Fix BRW_VARYING_SLOT_PAD handling in the scalar VS backend.
We can't just break for padding slots.  Instead, treat them like
unwritten output variables, so we handle flushing and incrementing
urb_offset correctly.

Paul introduced the concept of padding slots back in 2011, but we've
never actually used them for anything.  So it's unsurprising that the
scalar VS backend didn't handle them quite right.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Samuel Iglesias Gonsalvez
511a86383b main/tests: Enable glShaderStorageBlockBinding() check in dispatch_sanity test
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 16:54:02 +02:00
Emil Velikov
d2d4f00a2c docs: add news item and link release notes for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-26 14:25:19 +01:00
Emil Velikov
5d08669e2f docs: add sha256 checksums for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 7f1a77ae66)
2015-09-26 14:23:00 +01:00
Emil Velikov
aeec994954 docs: add release notes for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit bcb9e1d26b)
2015-09-26 14:22:59 +01:00
Timothy Arceri
abdab88b30 glsl: calculate component size for arrays of arrays when varying packing disabled
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:48:49 +10:00
Timothy Arceri
1d401f9ce4 glsl: validate binding qualifier for AoA
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:28:05 +10:00
Timothy Arceri
9bad7afbc2 glsl: add helper for calculating size of AoA
V2: return 0 if not array rather than -1

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:27:47 +10:00
Timothy Arceri
776a3845d6 glsl: clean-up link uniform code
These changes are also needed to allow linking of
struct and interface arrays of arrays.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-26 22:27:24 +10:00
Marek Olšák
9932142192 radeonsi: add scratch buffer to the buffer list when it's re-allocated
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-09-26 01:51:05 +02:00
Leo Liu
1e97b41893 radeon/vce: fix vui time_scale zero error
if app pass 0 as frame_rate_num, it should not be encoded to the VUI.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-25 18:47:14 -04:00
Matt Turner
1dd943d7fb mesa: Add locking to programs.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
3c57a102eb mesa: Add locking to sampler objects.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
d4b0e0b717 mesa: Remove debugging code from _mesa_reference_*.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
c8dc04d4c0 c11/threads: Assert that mtx is non-NULL and check return values.
Passing NULL to C11 threads functions isn't safe, so there's no need for
our implementation to handle it. Cuts about 1k of .text.

   text     data      bss      dec      hex  filename
5009514   198440    26328  5234282   4fde6a  i965_dri.so before
5008346   198440    26328  5233114   4fd9da  i965_dri.so after

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Tapani Pälli
266d05a3a0 glsl: fix packed varyings interface type and add default case
fixes Piglit test:
   arb_program_interface_query/linker/query-varyings.shader_test

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-25 12:19:36 +03:00
Antia Puentes
e92c35a872 glsl: Mark as active all elements of shared/std140 block arrays
Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140' blocks
or block members) considered as active 'shared' and 'std140' uniform
blocks and uniform block arrays, but did not include the block array
elements. Because of that, it was possible to have an active uniform
block array without any elements marked as used, making the assertion
   ((b->num_array_elements > 0) == b->type->is_array())
in link_uniform_blocks() fail.

Fixes the following 5 dEQP tests:

 * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18
 * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24
 * dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19
 * dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49
 * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83508
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
065e7d37f1 docs: Mark ARB_shader_storage_buffer_object as done for i965
v2:
- Mark it too for GLES 3.1

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
614b5307fd i965: Enable ARB_shader_storage_buffer_object extension for gen7+
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
5b080e3ddf mesa: enable ARB_shader_storage_buffer_object extension for GLES 3.1
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
10b5c6491f mesa: Add getters for the GL_ARB_shader_storage_buffer_object max constants
v2:
- Add tessellation shader constants support

v3:
- Add GLES 3.1 support.

v4:
- Move the getters to the proper place

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
91191af6d6 glapi: add ARB_shader_storage_block_buffer_object
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
26011fa22a main/tests: add ARB_shader_storage_buffer_object tokens to enum_strings
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
9b477ad49d main: Add SHADER_STORAGE_BLOCK and BUFFER_VARIABLE support for ARB_program_interface_query
Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE queries.

v2:
- Use std430_array_stride() to get top level array stride following std430's rules.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
0f18945cb6 glsl: Do not allow reads from write-only buffer variables
The error location won't be right, but fixing that would require to check
for this as we process each type of AST node that can involve a variable
read.

v2:
  - Limit the check to buffer variables, image variables have different
    semantics involved.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
995a719499 glsl: Do not allow assignments to read-only buffer variables
v2:
  - Merge the error check for the readonly qualifier with the already
    existing check for variables flagged as readonly (Timothy).
  - Limit the check to buffer variables, image variables have different
    semantics involved (Curro).

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
6ef82f039c glsl: Allow memory qualifiers on shader storage buffer blocks
v2:
  - Memory qualifiers on shader storage buffer objects do not come in the form
    of layout qualifiers, they are block-level qualifiers.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
f1b647fdd1 glsl: Apply memory qualifiers to buffer variables
v2:
  - Save memory qualifier info in the top level members of a shader
    storage block.
  - Add a checks to record_compare() which is used when comparing
    shader storage buffer declarations in different shaders.
  - Always report an error for incompatible readonly/writeonly
    definitions, whether they are present at block or field level.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
f4c8c01a3d glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
3b2037f88c glsl: fix UNIFORM_BUFFER_START or UNIFORM_BUFFER_SIZE query when no buffer object is bound
According to ARB_uniform_buffer_object spec:

"If the parameter (starting offset or size) was not specified when the
 buffer object was bound (e.g. if bound with BindBufferBase), or if no
 buffer object is bound to <index>, zero is returned."

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
2e16dd1350 mesa: Add queries for GL_SHADER_STORAGE_BUFFER
These handle querying the buffer name attached to a giving binding point
as well as the start offset and size of that buffer.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
4b7b1cf3c0 mesa: add glShaderStorageBlockBinding()
Defined in ARB_shader_storage_buffer_object extension.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
a07d0c2657 glsl: First argument to atomic functions must be a buffer variable
v2:
  - Add ssbo_in the names of the static functions so it is clear that this
    is specific to SSBO atomics.

v3:
  - Move the check after the loop (Kristian Høgsberg)

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
5ef169034c i965/nir/vec4: Implement nir_intrinsic_ssbo_atomic_*
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
14af6f4698 i965/nir/fs: Implement nir_intrinsic_ssbo_atomic_*
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
9d5c0be5d5 nir: Implement lowered SSBO atomic intrinsics
The original GLSL IR intrinsics have been lowered to an internal
version that accepts a block index and an offset instead of a
SSBO reference.

v2 (Connor):
  - Document the sources used by the atomic intrinsics.

Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
d2719b6e4f glsl: lower SSBO atomic intrinsics
The first argument to SSBO atomics is a reference to a SSBO buffer variable
so we want to compute its block index and offset and provide these values
to an internal version of the intrinsic that takes them instead of the
buffer variable reference.

v2:
- Support single components of integer vectors to be passed in as arguments.
- Get interface packing information from interface's type.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
da659087b9 glsl: use ir_rvalue instead of ir_dereference in auxiliary functions
In a later commit we will need to handle ir_swizzle nodes too, which are
not an ir_dereference. That can happen, for example, when we pass a
component of an integer vector as argument to any of the SSBO atomic
functions.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
ea0a1f5beb glsl: Add atomic functions from ARB_shader_storage_buffer_object
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
2cacebaad3 glsl: Rename atomic counter functions
Shader Storage Buffer Object will add new atomic functions that are not
associated with counters, so better have atomic counter-specific functions
explicitly include the word "counter" in their names.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
586142658e glsl: atomic counters can be declared as buffer-qualified variables
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
475d9c32d1 nir/glsl_to_nir: ignore an instruction's dest if it hasn't any
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
e3f9c7829c i965/nir/vec4: Implement nir_intrinsic_load_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
5b186aafe7 i965/nir/fs: Implement nir_intrinsic_load_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
e59ae238b6 nir: Implement __intrinsic_load_ssbo
v2:
- Fix ssbo loads with boolean variables.

v3:
- Simplify the changes (Kristian)

Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
3e70c968de nir: modify the instruction insertion in nir_visitor::visit(ir_call *ir)
This patch moves nir_instr_insert_after_cf_list call into each case
in the intrinsics switch at nir_visitor::visit(ir_call *ir) and
define a nir_dest variable which will be used when handling
ir->return_deref after the switch.

This patch simplifies the code for nir_intrinsic_load_ssbo
implementation changes we are going to do next.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
922b3d1bb1 i965/nir/vec4: Implement nir_intrinsic_store_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
337dad8cee i965/nir/fs: Implement nir_intrinsic_store_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
9bb7d9ecf8 nir: Implement __intrinsic_store_ssbo
v2 (Connor):
 - Make the STORE() macro take arguments for the extra sources (and their
   size) and any extra indices required.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
f17c6b9066 i965/vec4: Import surface message builder functions.
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit.

v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
d5503ce39f i965/vec4: Import helpers to convert vectors into arrays and back.
These functions handle the conversion of a vec4 into the form expected
by the dataport unit in message and message return payloads.  The
conversion is not always trivial because some messages don't support
SIMD4x2 for some generations, in which case a strided copy may be
necessary.

v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
402cb7ce13 i965/vec4: Introduce VEC4 IR builder.
See "i965/fs: Introduce FS IR builder." for the rationale.

v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument.  Improve handling
    of debug annotations and execution control flags.  Rename "instr"
    variable.  Initialize cursor to NULL by default and add method to
    explicitly point the builder at the end of the program.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
203cd1bf28 glsl: shader storage blocks use different max block size values than uniforms
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
eb9a9b62b1 glsl: ignore buffer variables when counting uniform components
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
138e4ae8ae glsl: number of active shader storage blocks must be within allowed limits
Notice that we should differentiate between shader storage blocks and
uniform blocks, since they have different limits.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
a7b4ab45d0 glsl: a shader storage buffer must be smaller than the maximum size allowed
Otherwise, generate a link time error as per the
ARB_shader_storage_buffer_object spec.

v2:
- Fix error message (Jordan)

v3:
- Move std140_size() changes to its own patch (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
e854a98001 glsl: add std430 interface packing support to ssbo related operations
v2:
- Get interface packing information from interface's type, not the
  variable type.
- Simplify is_std430 condition in emit_access() for readability (Jordan)
- Add a commment explaing why array of three-component vector case is
  different in std430 than the rest of cases.
- Add calls to std430_array_stride().

v3:
- Simplify size_mul change for std430's case (Jordan)
- Fix commit log lines length (Jordan)
- Pass 'packing' instead of 'is_std430' to emit_access() (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
1be180b941 glsl: Add std430 support to program_resource_visitor's member functions
They are used to calculate the offset, array stride of uniform/shader
storage buffer variables. Take into account this info to get the right
value for std430.

v2:
- Fix commit log line length and indention. (Jordan)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
8f0167c65b glsl: Add parser/compiler support for std430 interface packing qualifier
v2:
- Fix a missing check in has_layout()

v3:
- Mention shader storage block in error message for layout qualifiers
  (Kristian).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
35476c2bae glsl: Add std430 related member functions to glsl_type class
They are used to calculate size, base alignment and array stride values
for a glsl_type following std430 rules.

v2:
- Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
a40f917c4b glsl: allow default qualifiers for shader storage block definitions
This kind of definitions:

    layout(xxx) buffer;

was not supported by commit 84fc5fece0.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
3763a0e0a7 glsl: Move interface block processing to glsl_parser_extras.cpp
No functional changes.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
9c1f10b1bc glsl: ignore default qualifier declarations when checking for duplicate layout qualifiers
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
130031168d glsl: layout qualifier can appear more than once since OpenGL 4.20
Also if GL_ARB_shading_language_420pack extension is enabled.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
5bb5eeea00 i965/wm: surfaces should have the API buffer size, not the drm buffer size
The returned drm buffer object has a size multiple of 4096 but that should not
be exposed to the API user, which is working with a different size.

As far as I can see this problem is only visible in the calculation of the
length of unsized arrays used in SSBOs, as the implementation of this needs
to query the underlying buffer size via a message.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
eaa6f01c8d i965/wm: emit null buffer surfaces when null buffers are attached
Otherwise we can expect odd things to happen if, for example, we ask
for the size of the attached buffer from shader code, since that
might query this value from the surface we uploaded and get random
results.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f5dd2c1822 i965/fs/nir: implement nir_intrinsic_get_buffer_size
v2:
- Remove inst->regs_written assignment as the instruction only
  writes to one register.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
b23eb643eb i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZE
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
65d7f5fe9f i965/vec4/nir: implement nir_intrinsic_get_buffer_size
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
6485880232 i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZE
Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
003ce30e36 nir: Implement ir_unop_get_buffer_size
This is how backends provide the buffer size required to compute
the size of unsized arrays in the previous patch

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
750c694474 glsl: implement unsized array length
v2:
- Reduce the number of lines over 80 character line width
  limit. (Thomas Hellan)

v3:
- Inject the formula to compute the array length in the IR, backends
  only need to provide the buffer size (Curro)
- Create an auxiliary function to simplify code (Jordan Justen)
- Rename variables (Jordan Justen)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
273f61a005 glsl: Add parser/compiler support for unsized array's length()
The unsized array length is computed with the following formula:

array.length() =
   max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Of these, only the buffer size needs to be provided by the backends, the
frontend already knows the values of the two other variables.

This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.

It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.

v2:
- Do not define a triop that will force backends to implement the
  entire formula, they should only need to provide the buffer size
  since the other values are known by the frontend (Curro).

v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
  of using state->ARB_shader_storage_buffer_object_enable (Tapani).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
1440d2a683 glsl: Add unsized array support to glsl_type::std140_size()
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
68f5a4e6d2 glsl: fix indention in glsl_types.cpp
No functional changes.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f3f64cd0c4 glsl: add support for unsized arrays in shader storage blocks
They only can be defined in the last position of the shader
storage blocks.

When an unsized array is used in different shaders, it might be
converted in different sized arrays, avoid get a linker error
in that case.

v2:
- Rework error condition and error messages (Timothy Arceri)

v3:
- Move OpenGL ES check to its own patch.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f45d39f6af glsl: return error if unsized arrays are found in OpenGL ES
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
6335c79236 i965/fs: Do not split buffer variables
Buffer variables are the same as uniforms, only that read/write, so we want
the same treatment.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
2773a7cf1d i965: handle visiting of ir_var_shader_storage variables
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
37da6a2acd i965: Upload Shader Storage Buffer Object surfaces
Since these are a special kind of UBOs we emit them together reusing the
same infrastructure, however, we use a RAW surface so we can reuse
existing untyped read/write/atomic messages which include a pixel mask
header that we need to set to obtain correct behavior with helper
invocations of the fragment shader.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
bdbabc57e3 i965: Set MaxShaderStorageBuffers for compute shaders
v2:
- Set it after the driver's MaxShaderStorageBuffers value assignment.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Samuel Iglesias Gonsalvez
36f392c4ef i965: set ARB_shader_storage_buffer_object related constant values
v2:
- Add tessellation shader constants assignment

v3:
- Set MaxShaderStorageBufferBindings to 36.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
dfdeb94a5a i965: Implement DriverFlags.NewShaderStorageBuffer
We use the same dirty state for SSBOs and UBOs because they share the
same infrastructure.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
332ff009ff i965: Use 64-byte offset alignment for shader storage buffers
This should be a cacheline (64 bytes) so that we can safely have the
CPU and GPU writing the same SSBO on non-cachecoherent systems (our
Atom CPUs). With UBOs, the GPU never writes, so there's no
problem. For an SSBO, the GPU and the CPU can be updating disjoint
regions of the buffer simultaneously and that will break if the
regions overlap the same cacheline.

v2:
- Use cacheline size (64 bytes) instead of 16 bytes (Kristian).
- Update commit log and add a comment in the code explaining
  why we use cacheline size (Ben).

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Samuel Iglesias Gonsalvez
4cf908f9cb mesa: set MAX_SHADER_STORAGE_BUFFERS to 16.
v2:
- Set the value to 16 and drop the comment. (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Tapani Pälli
4639cea292 glsl: add packed varyings to program resource list
This makes sure that user is still able to query properties about
variables that have gotten packed by lower_packed_varyings pass.

Fixes following OpenGL ES 3.1 test:
   ES31-CTS.program_interface_query.separate-programs-vertex

v2: fix 'name included in packed list' check (Ilia Mirkin)
v3: iterate over instances of name using strtok_r (Ilia Mirkin)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-25 08:14:41 +03:00
Tapani Pälli
a6b55beb78 mesa: add packed_varyings list to gl_shader
This is required to store information about packed varyings, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-25 08:05:59 +03:00
Jordan Justen
ebbe6cdad7 i965/cs: Implement DispatchComputeIndirect support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
d11d018ce3 mesa/cs: Implement glDispatchComputeIndirect
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
12cf91db02 mesa/cs: Support GL_DISPATCH_INDIRECT_BUFFER
v2:
 * Use _mesa_has_compute_shaders (Ilia)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
4a1ba7e6bd mesa/cs: Add _mesa_validate_DispatchCompute
Move API validation to _mesa_validate_DispatchCompute in
api_validate.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Roland Scheidegger
19604d30e1 mesa: fix mipmap generation for immutable, compressed textures
If the immutable compressed texture didn't have the full mip pyramid,
this didn't work, because it tried to generate mip levels for non-existing
levels. _mesa_prepare_mipmap_level() would correctly handle this by returning
FALSE if the mip level didn't exist, however we actually created the
non-existing mip level right before that because we used _mesa_get_tex_image()
before calling _mesa_prepare_mipmap_level(). It would then proceed to crash
(we allocated the mip level, which is a bad idea on an immutable texture,
but didn't initialize the values, leading to assertion failures or segfaults).
Fix this by using _mesa_select_tex_image() instead and call it after
_mesa_prepare_mipmap_level(), as that function will allocate missing mip levels
for non-immutable textures already.
This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu
packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip
chains - I believe this app not doing it is actually unintentional, always one
level less than full mip chain...).

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 00:06:10 +02:00
Matt Turner
d6bb46bbe8 glsl: Expose gl_MaxTess{Control,Evaluation}AtomicCounters.
... with only ARB_shader_atomic_counters.

I expected to see interactions with ARB_tessellation_shader in the
ARB_shader_atomic_counters spec, but they do not exist. It seems that we
should unconditionally expose these variables in the presence of
ARB_shader_atomic_counters:

   gl_MaxTessControlAtomicCounters
   gl_MaxTessEvaluationAtomicCounters

This partially reverts commit da7adb99e8. The commit also affected
gl_MaxTessControlImageUniforms and gl_MaxTessEvaluationImageUniforms
similarly but the ARB_shader_image_load_store spec does list an
interaction with ARB_tessellation_shader.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92095
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-24 12:15:47 -07:00
Alejandro Piñeiro
7fee23569b i965/vec4: check swizzle before discarding a uniform on a 3src operand
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.

For example, this is what vec4_visitor::fix_3src_operand() is already
doing:

   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
      return src;

Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs:     812755 -> 795090 (-2.17%)
helped:                                7930
HURT:                                  0

Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs:     296630 -> 292596 (-1.36%)
helped:                                2533
HURT:                                  0

v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
  Ekstrand suggested
- Clean up the condition

v3:
- Move the check back to the original place, to keep things
  tidy, as suggested by Jason Ekstrand

v4:
- Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-24 21:12:53 +02:00
Mauro Rossi
1d040160f8 android: radeonsi: fix sid_tables.h missing LOCAL_MODULE_CLASS
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 20:05:41 +02:00
Benjamin Bellec
ebcc886d87 gallium/radeon: remove the percentage symbol from HUD temperature
The HUD adds '%' if max == 100.

Signed-off-by: Benjamin Bellec <b.bellec@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 19:54:50 +02:00
Marek Olšák
7bbce21e45 gallium/u_blitter: handle allocation failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
ae418a7b56 radeonsi: handle dummy constant buffer allocation failure
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
b737d9c1dc radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
d556346b35 radeonsi: skip drawing if updating the scratch buffer fails
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
1f99b0be7e radeonsi: skip drawing if PS fails to compile or upload
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
237d7cccce radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
9b6d9dd7d8 radeonsi: handle fixed-func TCS shader create failure
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
5dbadb0257 radeonsi: handle shader precompile failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
263f5a2cf9 radeonsi: skip drawing if GS ring allocations fail
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
22d3ccf5a8 radeonsi: skip drawing if the tess factor ring allocation fails
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
5c219ab552 radeonsi: add malloc fail paths to si_create_shader_state
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
394d67a58f radeonsi: report alloc failure from si_shader_binary_read
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
dea834e639 gallium/radeon: add a fail path for depth MSAA texture readback
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
f95e695059 gallium/radeon: handle buffer alloc failures in r600_draw_rectangle
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
282b378012 gallium/radeon: handle buffer_map staging buffer failures better
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
cd27ff6a0f radeonsi: handle constant buffer alloc failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
29dff6f676 radeonsi: handle index buffer alloc failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
f3a0819533 st/mesa: fix front buffer regression after dropping st_validate_state in Blit
Broken by: d082c53249
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-24 19:51:42 +02:00
Kristian Høgsberg Kristensen
21c1c7ff81 wayland: Add copyright notice for wayland-egl.c
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-24 10:51:10 -07:00
Kristian Høgsberg Kristensen
2ea16966ae i965: Respect stride and subreg_offset for ATTR registers
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses unusual stride and subreg_offset
combination breaks when one source is an attribute.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-24 10:17:27 -07:00
Brian Paul
200aee4247 mesa: rework Driver.CopyImageSubData() and related code
Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures
to wrap renderbuffer sources/destinations.  This caused a bit of a mess in
the Mesa/gallium state tracker because we had to basically undo that
wrapping.

Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer
and gl_texture_image src/dst pointers (one being null, the other non-null)
so the driver can handle renderbuffer vs. texture as needed.

For the i965 driver, we basically moved the code that wrapped textures
around renderbuffers from copyimage.c down into the met and driver code.

The old code in copyimage.c also made some questionable calls to
_mesa_BindTexture(), etc. which weren't undone at the end.

v2 (Jason Ekstrand): Rework the intel bits
v3 (Brian Paul): Update the temporary st_CopyImageSubData() function.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
2015-09-24 07:52:42 -06:00
Thomas Hellstrom
c8cb5ed93c st/xa: Fixups for PIPE_FORMAT_R8_UNORM A8 usage v2.
Check for PIPE_FORMAT_R8_UNORM when setting up the copy shader.
Also re-enable the dest alpha blending with A8 destination that
actually turned out to be correct.

Verified using rendercheck that the composite operators
overreverse, in, out, atop, atopreverse and xor seem to work fine
with a8 destiation.

v2: Fix a copy-paste error.

Reported-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-24 04:47:48 -07:00
Ilia Mirkin
1614c39a8f st/mesa: keep track of saturated writes when eliminating dead code
It doesn't matter whether a write is saturated or not, in another
implementation it might even have been a separate opcode. This code was
most likely copied from the copy-propagation pass (where one does have
to distinguish saturation).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 00:19:55 -04:00
Timothy Arceri
827d794834 glsl: correctly detect inactive UBO arrays
Previously the code was trying to get the packing type from the array not the
interface.

Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-24 10:07:42 +10:00
Ilia Mirkin
71e187430c i965: add ARB_texture_barrier support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 15:49:54 -04:00
Kenneth Graunke
31a36ffbc8 i965/gs: Fix extra level of indentation left by the previous commit.
I left a bunch of code indented a level in the previous patch to make
the diff easier to read.  But now we should fix that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
df31c1850d i965/gs: Use new NIR intrinsics.
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:

total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs:     3491 -> 3259 (-6.65%)
helped:                                11
HURT:                                  0

Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.

This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
542d40d698 nir: Add new GS intrinsics that maintain a count of emitted vertices.
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones.  See the comments above that for the
rationale behind the new intrinsics.

This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.

v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
  Ekstrand - I'd mistakenly used nir_after_block when rebasing this
  code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
  the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
  Michael Schellenberger Costa).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
0a040975ec nir: Add unit tests for control flow graphs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
fbaa1b19d7 nir/cf: Fix dominance metadata in the dead control flow pass.
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on.  Preserving
information about block dominance is hard (and probably not worthwhile).

This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.

We then make the dead control flow pass compute dominance information
right before it uses it.  This is necessary because earlier work by the
pass may have invalidated it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
6560838703 nir/cf: Fix unlink_block_successors to actually unlink the second one.
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successor[0].  So the successors[1] != NULL check will
always fail.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
024e5ec977 nir/cf: Alter block successors before adding a fake link.
Consider the case of "while (...) { break }".  Or in NIR:

        block block_0 (0x7ab640):
        ...
        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 */
                break
                /* succs: block_2 */
        }
        block block_2:

Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.

Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set.  This is illegal: a block cannot specify the
same successor twice.  In particular, adding the predecessor would have
no effect, as it was already present in the set.

We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set.  It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.

The fix appears to be simple: simply unlink the block's successors and
recreate them to point at the correct blocks first.  Then, add the fake
link.  In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), so adding
the fake link won't cause a duplicate successor.

v2: Add comments (requested by Connor Abbott) and fix commit message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
0991b2eb35 nir/cf: Conditionally do block_add_normal_succs() in unlink_jump();
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges.  In order to
fix that, I need everything in one routine.

However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
9674c76c0e nir/cf: Don't break outer-block successors in split_block_beginning().
Consider the following NIR:

   block block_0;
   /* succs: block_1 block_2 */
   if (...) {
      block block_1;
      ...
   } else {
      block block_2;
   }

Calling split_block_beginning() on block_1 would break block_0's
successors:  link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succ: new_block */.  This is invalid: the
block before an if statement must have two successors.

Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successor[1] to successor[0], it may not preserve the original
order.  NIR maintains a convention that successor[0] must point to the
"then" block, while successor[1] points to the "else" block, so we need
to take care to preserve this ordering.

This patch creates a new function that swaps out one successor for
another, preserving the ordering.  It then uses this to fix the issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
e2637db618 nir/cf: Make a helper function for removing a predecessor.
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
6a67ede6b3 nir: Validate that a block doesn't have two identical successors.
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Jason Ekstrand
8dcbca5957 nir/lower_vec_to_movs: Don't emit unneeded movs
It's possible that, if a vecN operation is involved in a phi node, that we
could end up moving from a register to itself.  If swizzling is involved,
we need to emit the move but.  However, if there is no swizzling, then the
mov is a no-op and we might as well not bother emitting it.

Shader-db results on Haswell:

   total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
   instructions in affected programs:     184780 -> 181802 (-1.61%)
   helped:                                838
   HURT:                                  0

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Jason Ekstrand
65e80ce5b5 nir/lower_vec_to_movs: Properly handle source modifiers on vecN ops
I don't know of any piglit tests that are currently broken.  However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Ville Syrjälä
aae0c88797 i915: Make hw_prim[] const
The table used to map the GL primitive to the hw primitive never
changes so make it const.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-23 09:57:46 -07:00
Ville Syrjälä
84fec757de t_dd_dmatmp: Make the render_tab[]s const
These tables hold function pointers and they never change so
make them const.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-23 09:57:46 -07:00
Ian Romanick
abbaf3301f mesa: Remove unused HAVE_TRI_STRIP_1 defines
Defined to 0 in a few places, but it's not used anywhere.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:42 -07:00
Ian Romanick
d830965057 t_dd_dmatmp: Constify dmasz
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:38 -07:00
Ian Romanick
8e9968f184 t_dd_dmatmp: Silence comparison between signed and unsigned integer expression warnings
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                                                           ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                                                           ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                                                        ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:26 -07:00
Ian Romanick
d663d8f5d4 t_dd_dmatmp: Use stdbool.h
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:24 -07:00
Ian Romanick
b7259fc6b0 t_dd_dmatmp: General indentation and formatting fixes
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:22 -07:00
Ian Romanick
57ae5c237d t_dd_dmatmp: Indentation and formatting fixes after HAVE_ELTS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:20 -07:00
Ian Romanick
25b42f13bd t_dd_dmatmp: Remove HAVE_ELTS support
Two drivers use this file, and neither supports ELTs.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:17 -07:00
Ian Romanick
1f374958fd t_dd_dmatmp: Indentation and formatting fixes after HAVE_TRI_FANS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:15 -07:00
Ian Romanick
03c3208c18 t_dd_dmatmp: Require HAVE_TRI_FANS
Two drivers use this file, and both support triangle fans.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:13 -07:00
Ian Romanick
2e19ed3cb5 t_dd_dmatmp: Indentation and formatting fixes after HAVE_TRI_STRIPS change
v2: Fix '- nr' typo noticed by Marius.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
2015-09-23 09:57:11 -07:00
Ian Romanick
fd97a05508 t_dd_dmatmp: Require HAVE_TRI_STRIPS
Two drivers use this file, and both support triangle strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:08 -07:00
Ian Romanick
22b73f3c2a t_dd_dmatmp: Require HAVE_TRIANGLES
Two drivers use this file, and both support triangles.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:06 -07:00
Ian Romanick
dcd8e49962 t_dd_dmatmp: Indentation and formatting fixes after HAVE_LINE_STRIPS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:04 -07:00
Ian Romanick
1ecdf956ac t_dd_dmatmp: Require HAVE_LINE_STRIPS
Two drivers use this file, and both support line strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:01 -07:00
Ian Romanick
1ab8a69a3b t_dd_dmatmp: Indentation and formatting fixes after HAVE_LINES change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:59 -07:00
Ian Romanick
b8461e03f0 t_dd_dmatmp: Require HAVE_LINES
Two drivers use this file, and both support lines.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:56 -07:00
Ian Romanick
265624c5af t_dd_dmatmp: Indentation and formatting fixes after HAVE_QUADS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:53 -07:00
Ian Romanick
4ecc387a93 t_dd_dmatmp: Remove HAVE_QUADS support
Two drivers use this file, and neither supports quads.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:51 -07:00
Ian Romanick
249ba09f59 t_dd_dmatmp: Remove HAVE_QUAD_STRIPS support
Two drivers use this file, and neither supports quad strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:48 -07:00
Ian Romanick
25543d8ec5 t_dd_dmatmp: Use addition instead of subtraction in loop bounds
This is used everywhere else in this file because it avoids problems
when count is zero (due to trimming).

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: Marius Predut <marius.predut@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:46 -07:00
Ian Romanick
c0b3b2f760 t_dd_dmatmp: Pull out common 'count -= count & 3' code
This was missing in the HAVE_TRIANGLES path, and that could cause
incorrect rendering.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: Marius Predut <marius.predut@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:43 -07:00
Ian Romanick
0d475ee2b9 t_dd_dmatmp: Use '& 3' instead of '% 4' everywhere
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:36 -07:00
Ian Romanick
fad8d54de7 t_dd_dmatmp: Clean up improper code formatting from previous patch
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:34 -07:00
Ian Romanick
d7bf7969b9 t_dd_dmatmp: Make "count" actually be the count
The value passed in count previously was "vertex after the last vertex
to be processed."  Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count.  That's just silly.

If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.

NOTE: There is some strange formatting left by this patch.  That was
done to make it more obvious that the before and after code is
equivalent.  These will be fixed in the next patch.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

v2: Fix a remaining (count-start) in render_quad_strip_verts.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:01 -07:00
Antia Puentes
f2e75ac88a i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed
Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 13:12:25 +02:00
Iago Toral Quiroga
cf439951b7 mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.
From section 9.2. Binding and Managing Framebuffer Objects:

"Upon successful return from Get*FramebufferAttachmentParameteriv, if
pname is FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, then params will contain
one of NONE, FRAMEBUFFER_DEFAULT, TEXTURE, or RENDERBUFFER, identifying
the type of object which contains the attached image."

And then it clarifies further:

"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default framebuffer is
bound, attachment is DEPTH or STENCIL, and the number of depth or stencil
bits, respectively, is zero"

Currently, if the default framebuffer is bound, we always return
GL_FRAMEBUFFER_DEFAULT for FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, but
according to the spec, when GL_DEPTH or GL_STENCIL attachments are
the ones being queried, we should return GL_NONE if they don't exist.

Fixes the following dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
2015-09-23 12:50:00 +02:00
Tapani Pälli
89524e7171 glsl: bail out early in _mesa_ShaderSource if no shaderobj
Patch fixes a crash in conformance test that tries out different
invalid arguments for glShaderSource and glGetShaderSource:

   ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle

This is a regression from commit:
   04e201d0c0

Additions in v2 also fix following failing deqp test:
   dEQP-GLES[2|3].functional.negative_api.shader.shader_source

v2: cleanup function, do check earlier (Iago Toral)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-23 08:45:00 +03:00
Matt Turner
10da96887c i965/vec4: Detect and delete useless MOVs.
With NIR:

instructions in affected programs:     111508 -> 109193 (-2.08%)
helped:                                507

Without NIR:

instructions in affected programs:     28763 -> 28474 (-1.00%)
helped:                                186

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 21:20:29 -07:00
Jason Ekstrand
e7496fed2a prog_to_nir: Use nir_op_dph
Shader-db results on HSW:

   instructions in affected programs:     72 -> 56 (-22.22%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
999ff3c77d nir/lower_alu_to_scalar: Add support for nir_op_fdph
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
2e5423ad63 i965/vec4: Add support for fdph_replicated
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
e5a9346d00 nir: Add fdph and fdph_replicated opcodes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
0f9bf64770 nir/lower_alu_to_scalar: Return after lower_reduction
We don't use any of the code after the switch anyway.  Since we check for
num_components == 1 and early-return, it doesn't get executed so
everything's ok.  However, it makes it much clearer what's going on if we
simply do an early return.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
2b79db2c02 nir/lower_alu_to_scalar: Use the builder
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Chris Forbes
f5991ebf34 i965: Add defines for tessellation stages
v2 (Ken):
- Squash together commits for HS, DS, and TE, as well as fixes.
- Add INTEL_MASK variants so we can use SET_FIELD if we want.
- Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match
  the documentation.
- Add some more fields from the PRMs.
- Add Broadwell variants.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:23:46 -07:00
Grazvydas Ignotas
8ae8feca84 r600g: update num_dw in scissor_enable workaround
"r600g: apply disable workaround on all scissors" forgot to update
num_dw, fix it.

Fixes: fbb423b433 "r600g: apply disable workaround on all scissors"
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-23 09:09:04 +10:00
Alejandro Piñeiro
1bd89db921 i965/vec4: refactor brw_vec4_copy_propagation.
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:

1) Build up the value we are propagating as if it were the source of a
single MOV:
2) Check that we can propagate that value
3) Build the final value

Previously everything was somewhat messed up, making the
implementation on some specific cases, like knowing if you can
propagate from a previous instruction even with type mismatches, even
messier (for example, with the need of maintaining more of one
has_source_modifiers). The refactoring clears stuff, and gives
support to this mentioned use case without doing anything extra
(for example, only one has_source_modifiers is used).

Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs:     739837 -> 725032 (-2.00%)
helped:                                6237
HURT:                                  0

v2: using 'arg' index to get the from inst was wrong
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
    only set the source on a direct copy
v5: change the approach for a refactoring
v6: tweaked comments

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-22 19:30:18 +02:00
Brian Paul
4a03066e5a st/mesa: remove st_bind_framebuffer()
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer
is null, Mesa won't try to use it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:32 -06:00
Brian Paul
b590ffd0f9 mesa: const-qualify _mesa_is_legal_tex_storage_format ctx param
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:32 -06:00
Brian Paul
acee1a322d mesa: const-qualify _mesa_base_tex_format() ctx param
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:31 -06:00
Brian Paul
4879b76601 mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:30 -06:00
Brian Paul
76dbab0a69 mesa: whitespace, comment fixes in texstorage.c 2015-09-22 09:10:10 -06:00
Marta Lofstedt
419210005a mesa/es3.1: Enable GL_ARB_vertex_attrib_binding functionality for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:22:13 +02:00
Marta Lofstedt
cf293e518e mesa/es3.1: Allow query of Vertex bindings for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:22:06 +02:00
Marta Lofstedt
6c3de8996f mesa/es3.1 : Align OpenGL ES 3.1 glBindVertexBuffer error handling with OpenGL Core
According to OpenGL ES 3.1 specification 10.3.1:
"An INVALID_OPERATION error is generated if buffer is not zero
or a name returned from a previous call to GenBuffers,
or if such a name has since been deleted with DeleteBuffers."
This error check was previously limited to OpenGL Core.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:21:59 +02:00
Tapani Pälli
7f8815bcb9 i965: fix textureGrad for cubemaps
Fixes bugs exposed by commit
2b1cdb0edd in:
   ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_frag

No regressions observed in deqp, CTS or Piglit.

v2: address review feedback from Iago Toral:
   - move rho calculation to else branch
   - optimize dx and dy calculation
   - fix documentation inconsistensies

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91114
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-22 08:14:20 +03:00
Kenneth Graunke
5cede90f62 nir: Report progress from nir_normalize_cubemap_coords().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:54:34 -07:00
Kenneth Graunke
d7ffd90ecb nir: Add braces around multi-line loop.
This was correct but not our usual style.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:01 -07:00
Kenneth Graunke
0a1adaf11d nir: Report progress from nir_lower_system_values().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:00 -07:00
Kenneth Graunke
dc18b9357b nir: Report progress from nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:59 -07:00
Kenneth Graunke
cfae0f8a3a nir: Report progress from nir_lower_locals_to_regs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:57 -07:00
Kenneth Graunke
1adde5b87e nir: Report progress from nir_remove_dead_variables().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:55 -07:00
Jason Ekstrand
9f5e7ae9d8 nir: Report progress from lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:54 -07:00
Kenneth Graunke
967a5ddb88 nir: Report progress from nir_lower_globals_vars_to_local().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:45 -07:00
Jason Ekstrand
60befc6347 i965: Clean up GLSL compiler option setup
The only functional change here is that we now set EmitNoIndirectOutput and
EmitNoIndirectTemp for compute shaders.  Compute shaders don't have outputs
per-se and we should have been setting EmitNoIndirectTemp all along.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-21 13:26:52 -07:00
Jeremy Huddleston
6dfc5e28f7 configure.ac: Add support to enable read-only text segment on x86.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.gentoo.org/240956
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-21 12:47:09 -07:00
Ben Widawsky
c1e38ad370 i965/skl: Use larger URB size where available.
All SKL SKUs except the lowest one which has half the L3 size actually have 384K
of URB per slice.

For once, I can explain how this mistake was made and how it was missed in
review...  Historically when we enable a platform and put the production sizes,
you can simply look at the "smallest" SKU and see what its URB size is (and we
assumed it was the 1 slice variant). Since on newer platforms the URB sizes are
scaled automatically by HW, this was sufficient. On SKL, this is a bit different
as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not
GT1) variant and it has 384K.

There are no Jenkins tests fixed (or regressions) and we don't expect any fixes
here because you can always run with less URB size.

Thanks to Sarah for bringing this to my attention.

Cc: Sarah Sharp <sarah.a.sharp@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-21 11:27:08 -07:00
Jason Ekstrand
46362db4a6 nir/builder: Don't use designated initializers
Designated initializers are not allowed in C++ (not even C++11).  Since
nir_lower_samplers is now using nir_builder, and nir_lower_samplers is in
C++, this breaks the build on some compilers.  Aparently, GCC 5 allows it
in some limited extent because mesa still builds on my system without this
patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92052
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 10:41:43 -07:00
Jason Ekstrand
d513388c8a nir: Move system value -> intrinsic mapping into nir.c
This way they're right next to the map going the other direction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 09:49:40 -07:00
Emil Velikov
de7ffdb383 nir: rename nir_lower_samplers.c{pp,}
With the only C++ function having its own wrapper we can 'demote' this
file to a normal C one. This allows us to get rid of extern C { #include
<foo.h> } 'hacks'. Plus some of the headers may use C99 initializers,
which are not supported by the ISO standard.

This may cause build issue on incremental builds. If so run the
following:

sed -i -e 's|samplers\.cpp|samplers.c|' src/glsl/nir/.deps/nir_lower_samplers.Plo

Fixes: ef8eebc6ad5(nir: support indirect indexing samplers in struct arrays)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Gottfried Haider <gottfried.haider@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:02:06 +01:00
Emil Velikov
d130cda453 nir: add C wrapper around glsl_type::record_location_offset
This will allow us to convert nir_lower_sampler.cpp to C.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:56 +01:00
Emil Velikov
bdb1faf44e nir: move stdio.h inclusion before extern C
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:32 +01:00
Kenneth Graunke
c1070550c2 i965: Fix MRF register number assertions for compr4.
compr4 is represented by setting the high bit on the MRF number.
We need to mask it out before sanity checking the register number.

Fixes ~8000 assert fails on Ironlake and G45.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 07:45:14 -07:00
Ilia Mirkin
72ebd532a1 radeonsi: implement TXQS support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fredrik Bruhn <f@unibap.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-21 08:31:29 -04:00
Ilia Mirkin
7d5162bdc0 radeonsi: load fmask ptr relative to the resources array
res_ptr already contains the resource values. fmask_ptr needs to be
looked up relative to the start of the resource params.

Note that this only affects indirect loads of MS sampler arrays.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-21 08:30:51 -04:00
Iago Toral Quiroga
5d23ce2f15 i965/vec4: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:48:05 +02:00
Iago Toral Quiroga
6789a32075 i965/fs: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:56 +02:00
Iago Toral Quiroga
f50645d05c i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631

Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).

Notice that the hardware docs are not clear about this fact:

SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"

However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:

"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."

Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:

Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461

Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461

This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:45 +02:00
Iago Toral Quiroga
0858610836 i965: Move MRF register asserts out of brw_reg.h
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.

Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:35 +02:00
Iago Toral Quiroga
d48ac93066 i965: Maximum allowed size of SEND messages is 15 (4 bits)
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow to use MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.

Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:03 +02:00
Rob Clark
b65f91dd32 nir/print: fix coverity error
Not something actually hit in real life (now state is never non-null,
but only case state->syms is null is if nir_print_instr() path).  But it
was something I overlooked the first time, so might as well fix it.

    *** CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    /src/glsl/nir/nir_print.c: 299 in print_var_decl()
    293
    294           fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
    295        }
    296
    297        fprintf(fp, "\n");
    298
    >>>     CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    >>>     Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    299        if (state) {
    300           _mesa_set_add(state->syms, name);
    301           _mesa_hash_table_insert(state->ht, var, name);
    302        }
    303     }
    304

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-20 14:04:06 -04:00
Eduardo Lima Mitev
6ba291db4b i965/vec4/nir: Remove all "this->" snippets
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so lets get rid of them all.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-20 17:11:49 +02:00
Marcin Ślusarz
8f6fd57db2 dri/common: fix gbm-symbols-check regression
Broken by commit c228514c72
"dri/common: use sysconfdir when looking for drirc".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92054
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-20 13:44:07 +02:00
Emil Velikov
1e01db0fa9 docs: add news item and link release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-20 11:59:24 +01:00
Emil Velikov
278a32374c docs: add sha256 checksums for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 02387926ad)
2015-09-20 11:58:04 +01:00
Emil Velikov
72d407da10 docs: add release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 91c6302734)
2015-09-20 11:58:03 +01:00
Nanley Chery
99b1f4751f mesa/teximage: reuse compressed format utility functions for base_format
Reuse utility functions instead of reimplementing the same logic.

* _mesa_is_compressed_format() performs the required checking to
  determine format support in the current context.
* _mesa_gl_compressed_format_base_format() returns the base format.

As a side effect, we now check that we're in a desktop context when
determining support for the FXT1 and RGTC formats. This is in agreement
with our extension table and the glext headers.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:27:15 -07:00
Nanley Chery
db2777091d mesa/texcompress: add compressed formats to base format utility function
Add S3TC and PALETTE formats.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:27:10 -07:00
Nanley Chery
29835fe19e mesa/glformats: refactor compressed format support function
Instead of case statements, use _mesa_get_format_layout() to
determine if a GL format is part of a family of compressed formats.

v2. restrict LATC formats to API_OPENGL_COMPAT (Ilia).
    rename the variable mFormat to m_format.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:26:55 -07:00
Nanley Chery
31a5135cd7 mesa/formats: add MESA_LAYOUT_LATC
This enables us to predicate statments on a compressed format being
a type of LATC format. Also, remove the comment that lists the enum
(it was getting a tad long).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:25:59 -07:00
Marcin Ślusarz
c228514c72 dri/common: use sysconfdir when looking for drirc
Useful when locally installed mesa has more quirks than the system one.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-19 19:17:34 +02:00
Rob Clark
9ffc1049ca freedreno/ir3: use nir two-sided-color lowering
With this, we completely switch over to nir lowering passes instead of
tgsi_lowering.  So one step closer to supporting direct glsl or spirv to
nir support for freedreno a3xx/a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-18 21:07:50 -04:00
Rob Clark
e13ed3ffb4 nir: add two-sided-color lowering pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-18 21:07:50 -04:00
Rob Clark
e4dfcdcbec nir/build: add nir_vec() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-18 21:07:50 -04:00
Rob Clark
c71cb670ba freedreno/ir3: lower txp/clamp in NIR
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-18 21:07:50 -04:00
Rob Clark
3745c38425 nir/lower_tex: add support to clamp texture coords
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP.  This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.

v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
1ce8060c25 nir/lower_tex: support for lowering RECT textures
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
faf5f174dd nir/lower_tex: support projector lowering per sampler type
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types.  In this case we want more fine grained control over
which texture projectors get lowered.

v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
f83ba7bc41 nir/lower_tex: split out project_src() helper
Split this out to reduce noise in later patches.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
d9b9ff76f1 nir: rename nir_lower_tex_projector
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Alejandro Piñeiro
06d31dceae i965/vec4: Change types as needed to propagate source modifiers using current instruction
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around.  So those kind of instruction could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.

This commit adds support for copy propagation using current instruction
as reference.

Equivalent to commit 472ef9 but for vec4.

v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
    solve (among others) the following deqp regression with v2:
    dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-19 00:31:25 +02:00
Iago Toral Quiroga
f7ca52dd6d i965/fs: Fix comparison between signed and unsigned integer expressions
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-18 13:37:25 +02:00
Tapani Pälli
afa1efdc85 mesa: fix errors when reading depth with glReadPixels
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().

Fix regression caused by 81d2fd91a9 in:
   ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels

Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.

v2: change check in _mesa_error_check_format_and_type to be explicit
    for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
    this function for both glReadPixels and glDrawPixels validation.
    (No Piglit regressions seen with v2.)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-18 07:41:47 +03:00
Rob Clark
2e4ab489b5 nir/builder: fix c++11 compiler warning
Fixes:

   In file included from nir/nir_lower_samplers.cpp:27:0:
   nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
   nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
       unsigned swizzle[4] = {c, c, c, c};

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:08:25 -04:00
Rob Clark
7c72f593ad nir: really actually fix comment this time
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:06:11 -04:00
Rob Clark
5305603b9d nir/print: print variable names
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:26:12 -04:00
Rob Clark
ba78260b0f nir: some comment fixups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:25:33 -04:00
Rob Clark
c70ed86172 freedreno/ir3: add --gpu arg to cmdline compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
c970ec0577 freedreno/a4xx: wire up ucp support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
91ec210ea8 freedreno/ir3: add support for ucp
Use nir_lower_clip pass for adding the VS/FS instructions to handle
user-clip-planes and CLIPDIST.  Wire up support for load_user_clip_plane
intrinsic to fetch ucp[plane] values as driver-params (passed as const's
to the shader).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
509e0c4505 nir: add lowering stage for user-clip-planes / clipdist
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).

Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them).  But it's better than no UCP
support at all for drivers that don't have this in hw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-17 19:57:21 -04:00
Rob Clark
53671a3723 nir: add sysval for user-clip-planes
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-09-17 19:55:43 -04:00
Rob Clark
c4572b7dfe freedreno/ir3: convert from tgsi semantic/index to varying-slot
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Rob Clark
4a121e1a90 glsl: add SYSTEM_VALUE_VERTEX_CNT
Used internally in freedreno/ir3 to calc stream-out position.  Seems
like a generic enough way to implement stream-out (using str instrs),
plus it avoids compiler warnings by sneaking in a non-enum value in
switch statements.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Rob Clark
e523f69b1d freedreno/ir3: switch to shader_enums.h interp constants
A small step towards un-TGSI'ifying ir3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Ilia Mirkin
e844e1007d nv50,nvc0: flush texture cache in presence of coherent bufs
This fixes the newly-added arb_texture_buffer_object-bufferstorage
piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-17 19:50:47 -04:00
Ilia Mirkin
323c912506 nv50,nvc0: detect underlying resource changes and update tic
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.

This fixes:
  arb_direct_state_access-texture-buffer
  arb_texture_buffer_object-data-sync

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-17 19:50:47 -04:00
Boyan Ding
8d3b92af21 vc4: Try to pair up instructions when only one of them has PM bit
Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.

total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs:     11688 -> 10939 (-6.41%)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-17 14:57:46 -04:00
Jason Ekstrand
fc11dbe13f i965/vec4: Use nir_move_vec_src_uses_to_dest
The idea here is not that it gives register coalescing a little bit of a
helping hand.  It doesn't actually fix the coalescing problems, but it
seems to help a good bit.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
   instructions in affected programs:     1259166 -> 1196845 (-4.95%)
   helped:                                11363
   HURT:                                  148

v2 (Jason Ekstrand):
 - Run nir_move_vec_src_uses_to_dest after going out of SSA
 - New shader-db numbers

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-17 08:21:31 -07:00
Jason Ekstrand
a6c467d6c5 nir: Add a pass to rewrite uses of vecN sources to the vecN destination
v2 (Jason Ekstrand):
 - Handle non-SSA sources and destinations

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-17 08:19:48 -07:00
Jason Ekstrand
ddffe30f40 nir: Add comments to nir_index_instrs and nir_index_ssa_defs
The provided indices have the very nice property that if A dominates B then
A->index <= B->index.  We should document that somewhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Jason Ekstrand
8ecaef967d nir: Add a generic instruction index
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Ulrich Weigand
bd016a2601 mesa: Fix texture compression on big-endian systems
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.

The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order.  However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.

This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer.  This fixes
about 100 piglit test case failures on s390x for me.

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-09-17 21:23:45 +10:00
Thomas Hellstrom
7e28650649 st/xa: Use PIPE_FORMAT_R8_UNORM when available
XA has been using L8_UNORM for a8 and yuv component surfaces.
This commit instead makes XA prefer R8_UNORM since it's assumed to have a
higher availability.

Also neither of these formats are suitable as destination formats using
destination alpha blending, so reject those operations.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-17 00:03:00 -07:00
Tapani Pälli
ba02f7a3b6 mesa: return initial value for VALIDATE_STATUS if pipe not bound
From OpenGL 4.5 Core spec (7.13):

    "If pipeline is a name that has been generated (without subsequent
    deletion) by GenProgramPipelines, but refers to a program pipeline
    object that has not been previously bound, the GL first creates a
    new state vector in the same manner as when BindProgramPipeline
    creates a new program pipeline object."

I interpret this as "If GetProgramPipelineiv gets called without a
bound (but valid) pipeline object, the state should reflect initial
state of a new pipeline object." This is also expected behaviour by
ES31-CTS.sepshaderobjs.PipelineApi conformance test.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Tapani Pälli
d9689be5c6 mesa: return initial value for PROGRAM_SEPARABLE when not linked
From OpenGL ES 3.1 spec (7.12):

    "Most properties set within program objects are specified not to
    take effect until the next call to LinkProgram or ProgramBinary.
    Some properties further require a successful call to either of
    these commands before taking effect. GetProgramiv returns the
    properties currently in effect for program, which may differ from
    the properties set within program since the most recent call to
    LinkProgram or ProgramBinary, which have not yet taken effect. If
    there has been no such call putting changes to pname into effect,
    initial values are returned."

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Tapani Pälli
8f1ae9abeb mesa: enable query of PROGRAM_PIPELINE_BINDING for ES 3.1
Specified in OpenGL ES 3.1 spec, Table 23.32: Program Object State.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Timothy Arceri
ef8eebc6ad nir: support indirect indexing samplers in struct arrays
As a bonus we get indirect support for arrays of arrays for free.

V5: couple of small clean-ups suggested by Jason.

V4: fix struct member location caclulation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason

V3: Use nir_instr_rewrite_src() with empty src rather then clearing
the use_link list directly for the old indirects as suggested by Jason

V2: Fixed validation error in debug build

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:34 +10:00
Timothy
0ad44ce373 glsl: add helper for calculating offsets for struct members
V2: update comments

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:27 +10:00
Timothy Arceri
12af915e27 glsl: make variables private
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:21 +10:00
Timothy Arceri
dcd9cd0383 glsl: store uniform slot id in var location field
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.

V3: remove line wrap change from this patch

V2: store slot number for all non-UBO uniforms to make code more
consitent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:14 +10:00
Timothy Arceri
9788700caf glsl: assign hidden uniforms their slot id earlier
This is required so that the next patch can safely assign the slot id
to the var.

The ids are now assigned in the order we want before allocating storage
so there is no need to sort the storage array and move things around.

V2: rename variable to make code easier to follow as suggested by Jason

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:26:45 +10:00
Timothy Arceri
874a0217fd glsl: order indices for samplers inside a struct array
This allows the correct offset to be easily calculated for indirect
indexing when a struct array contains multiple samplers, or any crazy
nesting.

The indices for the folling struct will now look like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 1 Name: s[1].tex
Sampler index: 2 Name: s[0].si.tex
Sampler index: 3 Name: s[1].si.tex
Sampler index: 4 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2

Before this change it looked like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 3 Name: s[1].tex
Sampler index: 1 Name: s[0].si.tex
Sampler index: 4 Name: s[1].si.tex
Sampler index: 2 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2

struct S_inner {
   sampler2D tex;
   sampler2D tex2;
};

struct S {
   sampler2D tex;
   S_inner si;
};

uniform S s[2];

V3: Update comments with suggestions from Jason

V2: rename struct array counter to have better name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:26:39 +10:00
Dave Airlie
b5df52b112 Revert "mesa/extensions: restrict GL_OES_EGL_image to GLES"
This reverts commit 48961fa3ba.

glamor/Xwayland use this, the spec saying something when it
was written, and the fact that the comment says Mesa relies on it
hasn't changed.

I also don't have a copy of this patch in my mail archive, which
seems wierd, did it get posted to mesa-dev?

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-17 06:58:51 +10:00
Eric Anholt
f5b26b4744 vc4: Only build in simulator mode if we find pkg-config for it.
This will let other developers build it x86 for build-testing purposes.
2015-09-16 15:54:00 -04:00
Ilia Mirkin
37d0becfd9 freedreno/a3xx: use NUM_USER_CLIP_PLANES helper instead of magic number
Use the helper from the newly-updated generated header file.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-16 15:42:55 -04:00
Ilia Mirkin
545a3cbb01 freedreno/a3xx: fix blending of L8 format
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-16 15:42:55 -04:00
Ilia Mirkin
ee6b95c82c freedreno/a3xx: add support for dual-source blending
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-16 15:42:54 -04:00
Eric Anholt
cfa980f493 vc4: convert from tgsi semantic/index to varying-slot
(originally part of previous patch, split out to separate patch by Rob)

v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:07:08 -04:00
Eric Anholt
8fd3e53f3d gallium/ttn: Convert to using VARYING_SLOT_* / FRAG_RESULT_*.
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.

v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
    helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
    frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:03:53 -04:00
Ilia Mirkin
7a275fcda8 nv50, nvc0: fix max texture buffer size to 128M elements
This is what the hardware supports, there never was any sort of 64K
limit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-16 12:51:58 -04:00
Ilia Mirkin
eb081681df st/mesa: avoid integer overflows with buffers >= 512MB
This fixes failures with the newly-submitted max-size texture buffer
piglit test for GPUs exposing >= 128M max texels.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-16 12:51:58 -04:00
Brian Paul
1aff899a87 mesa: move GL_APPLE_object_purgeable functions to new file
Move this code out of bufferobj.c since it's not strongly connected to
buffer objects.

Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-16 09:02:40 -06:00
Brian Paul
8faed71830 mesa: remove trailing whitespace in bufferobj.c
Trivial.
2015-09-16 08:53:21 -06:00
Brian Paul
edc01c6704 mesa: whitespace, line wrap fixes in varray.c
Trivial.
2015-09-16 08:53:21 -06:00
Rob Clark
aecbc93f2d nir/print: print symbolic names from shader-enum
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 10:15:35 -04:00
Rob Clark
840df72f93 nir/print: bit of state refactoring
Rename print_var_state to print_state, and stuff FILE ptr into the state
object.  This avoids passing around an extra parameter everywhere.

v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-16 10:15:17 -04:00
Rob Clark
f2533f2f8c glsl: shader-enum to name debug fxns
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 10:04:13 -04:00
Rob Clark
5bb41d9094 freedreno: one screen to rule them all
Similar to fee0686c21, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.

Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 09:14:39 -04:00
Rob Clark
b3958f9f83 freedreno/ir3: use NIR to lower ffract instead of tgsi_lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 08:28:18 -04:00
Rob Clark
d9efe40dc9 nir: add lowering for ffract
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 08:27:36 -04:00
Jordan Justen
47e18a5957 i965/fs: The barrier send uses only 1 payload register
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.

fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.

These issues were flagged by:

commit cadd7dd384
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Jul 2 15:41:02 2015 -0700

    i965/fs: Add a very basic validation pass

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-15 15:41:07 -07:00
Jason Ekstrand
cb503c3227 nir/builder: Use a normal temporary array in nir_channel
C++ gets cranky if we take references of temporaries.  This isn't a problem
yet in master because nir_builder is never used from C++.  However, it will
be in the future so we should fix it now.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 14:51:05 -07:00
Rob Clark
18385bc3ac freedreno/a4xx: more texture formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
d85267c4bb freedreno/a4xx: border-color support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
f8222724f5 freedreno/a4xx: wire up texture clamp lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
9124a49d54 freedreno: helper for a3xx/a4xx border-colors
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
76977222af freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:00 -04:00
Jason Ekstrand
29348631fe nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions.  We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction.  With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
   instructions in affected programs:     143274 -> 141801 (-1.03%)
   helped:                                667
   HURT:                                  0

It turns out that dot-products matter...

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
a88ce0c1c4 i965/vec4: Use the replicated fdot instruction in NIR
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
47739c7df4 nir: Add a fdot instruction that replicates the result to a vec4
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
2458ea95c5 nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed.  This adds code to try and
coalesce into the destination of the instruction providing the value.

Shader-db results for vec4 shaders on Haswell:

   total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
   instructions in affected programs:     231230 -> 224563 (-2.88%)
   helped:                                1017
   HURT:                                  2

This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>.  Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:07 -07:00
Jason Ekstrand
2b2f1f16a0 nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable.  I think this was a relic
of the way that GLSL IR implements writemasks.  In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing.  In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone.  We can simplify things a good deal (and
make them correct) by removing this "compacting" step.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-15 11:13:48 -07:00
Jason Ekstrand
c951bb8305 i965/vec4_nir: Use partial SSA form rather than full non-SSA
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier.  In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:48 -07:00
Jason Ekstrand
c3f8cde964 nir/lower_vec_to_movs: Handle partially SSA shaders
v2 (Jason Ekstrand):
 - Use nir_instr_rewrite_dest
 - Pass the impl directly into lower_vec_to_movs_block

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:45 -07:00
Jason Ekstrand
b7eeced3c7 nir/lower_vec_to_movs: Pass the shader around directly
Previously, we were passing the shader around, we were just calling it
"mem_ctx".  However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:40 -07:00
Jason Ekstrand
cadd7dd384 i965/fs: Add a very basic validation pass
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's.  We can add more as
we find it to be useful.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-15 11:11:50 -07:00
Jason Ekstrand
0c6df7a1cb i965/fs_surface_builder: Only apply predicate to components that exist
In certain conditions, we have to do bounds-checking in the shader for
image_load_store.  The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds.  However, we were hard-coding 4 components which may not be
correct.  Instead, we should be using size which is the number of
components read.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-15 11:09:48 -07:00
Jason Ekstrand
5182400054 i965/fs: Only read output_components many components when writing an output
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 11:08:12 -07:00
Jason Ekstrand
f55836f567 i965/fs: Set output_components for lowered clip distance outputs
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 11:07:54 -07:00
Nanley Chery
8200793649 mesa/teximage: restrict GL_ETC1_RGB8_OES support to GLES
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:11:14 -07:00
Nanley Chery
48961fa3ba mesa/extensions: restrict GL_OES_EGL_image to GLES
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:00:00 -07:00
Nanley Chery
fe796a1831 mesa/extensions: restrict luminance alpha formats to API_OPENGL_COMPAT
According the GL 3.1 spec, luminance alpha formats are deprecated.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:00:00 -07:00
Thomas Hellstrom
edfb7ed109 gallium/svga: Enable PIPE_FORMAT_L8_UNORM for vgpu10
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-15 09:25:02 -07:00
Emil Velikov
a1ac742f70 egl/dri2: don't leak the fd on dri2_terminate
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:39:02 +01:00
Emil Velikov
bd5bcb5b8c egl/dri2/drm: compact existing device mgmt
Move the fcntl(dupfd_cloexec) to the else branch where it belongs.
Otherwise it's not immediately obvious that the code is hit, only when
an existing device is used.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:37:27 +01:00
Matt Turner
e4f0d26c8c egl/dri2: Close file descriptor on error.
v2: [Emil Velikov]
Rework the error path to a common goto, close only if we own the fd.
v3; [Emil Velikov]
Always close the fd (we either opened the device or dup'd) (Boyan, Ian)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:37:26 +01:00
Ray Strode
4bf151e662 gbm: convert gbm bo format to fourcc format on dma-buf import
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)

This commit addresses the problem in two ways:

1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created

2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.

Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 12:27:45 +01:00
Alejandro Piñeiro
a26e82b81d docs: document INTEL_DEBUG 'optimizer' envvar
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-15 08:33:35 +02:00
Kristian Høgsberg Kristensen
a548c75e31 i965: Move perf_debug code to brw_codegen_*_prog()
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:56:59 -07:00
Kristian Høgsberg Kristensen
84f2ed2cfd i965: Move brw_fs_precompile() to brw_wm.c
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:55:49 -07:00
Kristian Høgsberg Kristensen
dc70c86b9b i965: Move compute shader code around
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:

    brw_vs.c, brw_wm.c brw_gs.c:

        code to drive code generation and implement precompiling and
        cache search.

    genX_<stage>_state.c

        gen specific implementation of the state emission for the shader
        stage.

The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).

To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files.  Finally, move state setup
and atoms to gen7_cs_state.c.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:52:42 -07:00
Anuj Phogat
64e25167ed meta: Abort meta pbo path if TexSubImage need signed unsigned conversion
See similar fix for Readpixels in mesa commit 0d20790. Jason suggested
we need that for TexSubImage as well.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 15:22:37 -07:00
Ilia Mirkin
5877a594d5 nvc0/ir: start offset at texBindBase for txq, like regular texturing
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-14 17:26:25 -04:00
Eric Anholt
64aee8fe9f vc4: Fix build from recent NIR cleanups. 2015-09-14 11:21:07 -04:00
Antia Puentes
b8d2263c83 i965/vec4_nir: Load constants as integers
Loads constants using integer as their register type, like it is
done in FS backend.

No shader-db changes in HSW.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91716
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 12:11:46 +02:00
Antia Puentes
79f1a7ae28 i965/vec4: Fix saturation errors when coalescing registers
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.

This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.

For example,
   mov      vgrf7:D vgrf2:D
   mov.sat  m4:F vgrf7:F

is coalesced to:
   mov.sat  m4:D vgrf2:D

The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.

Shader-db results in HSW (only vec4 instructions shown):

total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs:     74 -> 75 (1.35%)
helped:                                0
HURT:                                  1
GAINED:                                0
LOST:                                  0

Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 12:11:46 +02:00
Tapani Pälli
d1bce52e13 docs: cleanups + mark some work as done
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-14 09:29:30 +03:00
Ilia Mirkin
f0b9d53262 docs: only astc ldr required for ES3.2, not hdr
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-14 02:08:42 -04:00
Ilia Mirkin
67d2d3ba43 st/mesa: emit TXQS, support ARB_shader_texture_image_samples
The image component of the ext is a no-op since there is no image support
in gallium (yet).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 18:24:45 -04:00
Ilia Mirkin
ec3fe42b3a r600g: add support for TXQS tgsi opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-13 18:24:44 -04:00
Ilia Mirkin
4294db90b1 nv50/ir: add support for TXQS tgsi opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-13 18:24:44 -04:00
Ilia Mirkin
f46a53ffa5 gallium: add PIPE_CAP_TGSI_TXQS to let st know if TXQS is supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-13 18:24:37 -04:00
Ilia Mirkin
d173c5e77d tgsi: add a TXQS opcode to retrieve the number of texture samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-09-13 18:24:01 -04:00
Jordan Justen
c4cf824658 glsl/cs: Initialize gl_LocalInvocationIndex in main()
We initialize gl_LocalInvocationIndex based on the extension spec
formula:

    gl_LocalInvocationIndex =
        gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
        gl_LocalInvocationID.y * gl_WorkGroupSize.x +
        gl_LocalInvocationID.x;

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:17 -07:00
Jordan Justen
6823e12d5a glsl/cs: Exclude gl_LocalInvocationIndex from builtin variable stripping
We lower gl_LocalInvocationIndex based on the extension spec formula:

    gl_LocalInvocationIndex =
        gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
        gl_LocalInvocationID.y * gl_WorkGroupSize.x +
        gl_LocalInvocationID.x;

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

We need to set this variable in main(), even if gl_LocalInvocationIndex
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate it as a dead variable.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
2b6cc0395b glsl/cs: Initialize gl_GlobalInvocationID in main()
We initialize gl_GlobalInvocationID based on the extension spec
formula:

    gl_GlobalInvocationID =
        gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
c4d049f646 glsl: Move link_get_main_function_signature to a common location
Also rename to _mesa_get_main_function_signature.

We will call it near the end of compilation to insert some code into
main for initializing some compute shader global variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
34e187ec38 glsl/cs: Don't strip gl_GlobalInvocationID and dependencies
We lower gl_GlobalInvocationID based on the extension spec formula:

    gl_GlobalInvocationID =
        gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

We need to set this variable in main(), even if gl_GlobalInvocationID
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate these as dead variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
c5743a5d7f i965/nir: Support gl_WorkGroupID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
4e454cb7c6 i965/cs: Initialize gl_WorkGroupID variable from payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
4f178f0d8b nir: Add gl_WorkGroupID system variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
f5bb5a1bf1 glsl/cs: Add gl_WorkGroupID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
49f999b9cb i965/nir: Support gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
43624361df i965/cs: Initialize gl_LocalInvocationID from payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
b94b57f7c5 i965/cs: Initialize gl_LocalInvocationID in push constant data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
c7161a3c35 i965/cs: Reserve local invocation id in payload regs
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
62e011d593 nir: Add gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
bf8d6e501c glsl/cs: Add gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Krzesimir Nowak
08ceb5e076 softpipe: Change faces type to uint
This is to avoid needless float<->int conversions, since all
face-related computations are made on integers. Spotted by Emil
Velikov.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 09:50:21 -06:00
Rob Clark
59519c2283 freedreno/ir3: fix compile warn after 1807a08e
New enum to add to switch so compiler doesn't complain.

   commit 1807a08e4f
   Author:     Ilia Mirkin <imirkin@alum.mit.edu>
   AuthorDate: Thu Aug 27 23:05:03 2015 -0400
   Commit:     Ilia Mirkin <imirkin@alum.mit.edu>
   CommitDate: Thu Sep 10 17:38:33 2015 -0400

       nir: add nir_texop_texture_samples and convert from glsl

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-13 11:31:45 -04:00
Rob Clark
bf45a7d28e freedreno/ir3: fix compile break after a4aa25be
Following commit dropped the unused memctx arg:

   commit a4aa25be1e
   Author:     Jason Ekstrand <jason.ekstrand@intel.com>
   AuthorDate: Wed Sep 9 13:24:35 2015 -0700
   Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
   CommitDate: Fri Sep 11 09:21:20 2015 -0700

       nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-13 11:31:30 -04:00
Rob Clark
b88aeff4f5 nir: add nir_channel() to get at single components of vec's
Rather than make yet another copy of channel(), let's move it into nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-13 11:08:27 -04:00
Rob Clark
86358e949e tgsi/scan: add support to figure out max nesting depth
Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
know.  And pretty trivial for scan to figure this out for us.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 11:08:27 -04:00
Kai Wasserbäch
d6fbcf6ee2 r600: Fix llvm build since const buffer changes
In commit f9caabe8f1:

One place in r600_llvm.c was forgotten when replacing
R600_UCP_CONST_BUFFER with R600_BUFFER_INFO_CONST_BUFFER.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91985
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-09-13 07:09:08 +10:00
Jason Ekstrand
1037e0a84f i965/vec4: Don't reswizzle hardware registers
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-12 10:46:26 -07:00
Jason Ekstrand
dd7290cf59 i965/emit: Add assertions for accumulator restrictions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-12 10:46:26 -07:00
Emil Velikov
7852a44e3c docs: add news item and link release notes for 11.0.0
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-12 13:50:33 +01:00
Emil Velikov
c34ed46217 docs: add sha256 checksums for 11.0.0
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c4bae5792b)
2015-09-12 13:48:15 +01:00
Emil Velikov
09223bfa9b docs: Update 11.0.0 release notes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4f1e500150)
2015-09-12 13:48:14 +01:00
Glenn Kennard
ce34048b57 r600: Enable fp64 on chips with native support
Cypress/Cayman/Aruba, earlier r6xx/r7xx chips only support a subset
of the needed fp64 ops, and don't do GL4 anyway.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 07:32:08 +01:00
Glenn Kennard
d2ca9afd5d r600g: Support I2D/U2D/D2I/D2U
Only for Cypress/Cayman/Aruba, older chips have only partial fp64 support.
Uses float intermediate values so only accurate for int24 range, which
matches what the blob does.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 07:30:10 +01:00
Dave Airlie
f9caabe8f1 r600g: lower number of driver const buffers
I'm going to want a driver constant buffer for tess to coordinate
LDS storage, so before I go tackling that I decided to merge the
clip/samplepos and texture info buffers into one. So I can steal
the spare one.

This creates a single constant buffer between the two, with
clip/samplepos taking up a reserved 128 bytes at the start.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 06:56:58 +01:00
Dave Airlie
0337a9b2af r600: define some values for the fetch constant offsets.
This just puts these in one place and #defines them.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 06:56:51 +01:00
Thomas Helland
2e7e3fe55f docs: Update with GLES3.2 entries and status
V2: -Change to "not started" for most entries
    -Add status for multisample_2d_array
    -Change shader_multisample_interpolation to "not_stared"

V3 (idr): Move the GLES 3.2 section after the "Additional functions"
section from GLES 3.1.  Note that GL_KHR_texture_compression_astc_hdr is
done for i965 on gen9+ hardware.  Note that GL_OES_shader_io_blocks is
based on some features from GLSL 1.50.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v2]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-11 18:46:43 -07:00
Krzesimir Nowak
2135aba8d9 softpipe: Constify variables
This commit makes a lot of variables constant - this is basically done
by moving the computation to variable definition. Some of them are
moved into lower scopes (like in img_filter_2d_ewa).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:37:00 -06:00
Krzesimir Nowak
231687c19b softpipe: Constify sp_tgsi_sampler
Add a small inline function doing the casting - this is to make sure
we don't do a cast from some completely unrelated type. This commit
does not make tgsi_sampler parameters const in vfuncs themselves for
now - probably llvmpipe would need looking at before making such a
change.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:54 -06:00
Krzesimir Nowak
ac23116de5 softpipe: Constify sampler and view parameters in mip filters
Those functions actually could always take them as constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:47 -06:00
Krzesimir Nowak
ea764baa61 softpipe: Constify sampler and view parameters in img filters
Those functions actually could always take them as constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:43 -06:00
Krzesimir Nowak
ba72e6cfb8 tgsi, softpipe: Constify tgsi_sampler in query_lod vfunc
A followup from previous commit - since all functions called by
query_lod take pointers to const sp_sampler_view and const sp_sampler,
which are taken from tgsi_sampler subclass, we can the tgsi_sampler as
const itself now.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:38 -06:00
Krzesimir Nowak
ea0fecd1a3 softpipe: Constify some sampler and view parameters
This is to prepare for making tgsi_sampler parameter in query_lod a
const too. These functions do not modify anything in either sampler or
view anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:32 -06:00
Krzesimir Nowak
4ca2896e8e softpipe: Move the faces array from view to filter_args
With that, sp_sampler_view instances are not abused anymore as a local
storage, so we can later make them constant.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:23 -06:00
Jason Ekstrand
ca11c3c0a4 nir/from_ssa: Use instr_rewrite_dest
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
cee29220e3 nir: Add a function for rewriting instruction destinations
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
106a3b2cc3 nir: Only unlink sources that are actually valid
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
a4aa25be1e nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
8c8fc5f833 nir: Fix a bunch of ralloc parenting errors
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader.  In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.

We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:04 -07:00
Jason Ekstrand
794355e771 nir/lower_outputs_to_temporaries: Reparent the output name
We copy the output, make the old output the temporary, and give the
temporary a new name.  The copy keeps the pointer to the old name.  This
works just fine up until the point where we lower things to SSA and delete
the old variable and, with it, the name.  Instead, we should re-parent to
the copy.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 08:55:51 -07:00
Alejandro Piñeiro
d4e29af234 i965/vec4: check writemask when bailing out at register coalesce
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs:     1238390 -> 1191754 (-3.77%)
helped:                                12782
HURT:                                  0
GAINED:                                0
LOST:                                  0

v2: removed some parenthesis, fixed indentation, as suggested by
    Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-11 17:43:22 +02:00
Brian Paul
2c52c794d7 tgsi,softpipe: capitalize the tgsi_sampler_control enum values
We use capitalized enum values everywhere else.
This improves understanding a bit too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 08:50:10 -06:00
Kenneth Graunke
b811085b79 nir: Store some geometry shader data in nir_shader.
This makes it possible for NIR shaders to know the number of output
vertices and the number of invocations.  Drivers could also access
these directly without going through gl_program.

We should probably add InputType and OutputType here too, but currently
those are stored as GL_* enums, and I wanted to avoid using those in
NIR, as I suspect Vulkan/SPIR-V will use different enums.  (We should
probably make our own.)

We could add VerticesIn, but it's easily computable from the input
topology, so I'm not sure whether it's worth it.  It's also currently
not stored in gl_shader (only gl_shader_program), which would require
changes to the glsl_to_nir interface or require us to store it there.

This is a bit of duplication of data...ideally, we would factor these
substructs out of gl_program, gl_shader_program, and nir_shader, creating
a gl_geometry_info class...but it would need to go in a new place (in
src/glsl?) that isn't mtypes.h nor nir.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:05:09 -07:00
Kenneth Graunke
cb2b118e40 nir/builder: Add nir_load_var() and nir_store_var() helpers.
These provide a convenient way to do simple variable loads and stores.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:04:17 -07:00
Kenneth Graunke
4654439fdd glsl: Use hash tables for opt_constant_propagation() kill sets.
Cuts compile/link time of the fragment shader in #91857 by 19%
(16.28 -> 13.05).

I didn't bother with the acp sets because they're smaller, but it
might be worth doing as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Kenneth Graunke
e20f30eb51 i965: Use hash tables for brw_fs_vector_splitting().
Cuts compile/link time of the fragment shader in #91857 by 25%
(21.64 -> 16.28).

v2: Drop unnecessary _mesa_hash_table_destroy call, and use
    refs.ht->entries == 0 rather than ad-hoc checking (suggested by
    Timothy Arceri).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Kenneth Graunke
2fc0ce293a glsl: Use hash tables in opt_constant_variable().
Cuts compile/link time of the fragment shader in bug #91857 by 31%
(31.79 -> 21.64).  It has over 8,000 variables so linked lists are
terrible.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Ian Romanick
4603723722 meta: Use result of texture coordinate clamping operation
Previously the result of the complicated clamp() expression just dropped
on the floor: clamp does not modify any of its parameters.  Looking at
the surrounding code, I believe this is supposed to modify the value of
tex_coord.

This change (along with a change to avoid the use of
brw_blorp_framebuffer) does not affect any existing piglit tests.  I'm
not sure what this clamp is trying to accomplish, so I'm not sure how to
write a test to exercise this path.

I also noticed another bug in this code.  There is no way the array
texture case could possibly work.  This will generate code for the
TEXEL_FETCH macro like:

    #define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord), sample_map[int(2 * fract(coord.x))]);

Since the coord parameter of this macro is a vec2 at all invocations, no
expansion of this macro will even compile.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
767c33e881 meta: Always bind the texture
We may have been called from glGenerateTextureMipmap with CurrentUnit
still set to 0, so we don't know when we can skip binding the texture.
Assume that _mesa_BindTexture will be fast if we're rebinding the same
texture.

v2: Remove currentTexUnitSave because it is now unused.  Suggested by
both Neil and Anuj.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91847
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
86c0a2d574 i915, i965: Silence unused parameter warnings in intel_batchbuffer_advance
These only occurred in release builds, but they occurred in every file
that included intel_batchbuffer.h.  Lots of spam. :(

intel_batchbuffer.h: In function 'intel_batchbuffer_advance':
intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_batchbuffer_advance(struct brw_context *brw)
                                               ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
307d5e5849 i915: Silence unused parameter warning in intel_miptree_create_layout
The for_bo parameter of intel_miptree_create_layout appears to be unused
since 27eedca when Eric removed some Gen5 code (after the i915 and i965
drivers parted ways).

intel_mipmap_tree.c: In function 'old_intel_miptree_create_layout':
intel_mipmap_tree.c:77:35: warning: unused parameter 'for_bo' [-Wunused-parameter]
                             bool for_bo)
                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
5c8aa21309 i915, i965: Silence unused parameter warnings in intel_miptree_unmap_gtt
intel_mipmap_tree.c: In function 'intel_miptree_unmap_gtt':
intel_mipmap_tree.c:777:34: warning: unused parameter 'map' [-Wunused-parameter]
    struct intel_miptree_map *map,
                                  ^
intel_mipmap_tree.c:778:17: warning: unused parameter 'level' [-Wunused-parameter]
    unsigned int level,
                 ^
intel_mipmap_tree.c:779:17: warning: unused parameter 'slice' [-Wunused-parameter]
    unsigned int slice)
                 ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
0412231266 i915: Silence unused parameter warnings
intel_mipmap_tree.c: In function 'old_intel_miptree_unmap_raw':
intel_mipmap_tree.c:726:51: warning: unused parameter 'intel' [-Wunused-parameter]
 intel_miptree_unmap_raw(struct intel_context *intel,
                                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
20915dd2e0 i915: Remove prototype for nonexistent brw_miptree_layout
Hasn't existed in the i915 source since the i915 and i965 drivers parted
ways.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
31f0967fb5 i965: Make intel_miptree_map_raw static
This hasn't been used outside intel_mipmap_tree.c since d5d4ba9 started
using meta instead of the blitter for PBO TexSubImage.  While we're
here, remove the unused brw parameter from the function formerly known
as intel_miptree_unmap_raw.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
68b44dd5b2 i915, i965: Silence unused parameter warnings in intel_mipmap_tree.h
These only occurred in release builds, but they occurred in every file
that included intel_mipmap_tree.h.  Lots of spam. :(

intel_mipmap_tree.h: In function 'intel_miptree_check_level_layer':
intel_mipmap_tree.h:595:59: warning: unused parameter 'mt' [-Wunused-parameter]
 intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
                                                           ^
intel_mipmap_tree.h:596:42: warning: unused parameter 'level' [-Wunused-parameter]
                                 uint32_t level,
                                          ^
intel_mipmap_tree.h:597:42: warning: unused parameter 'layer' [-Wunused-parameter]
                                 uint32_t layer)
                                          ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
094877f9d2 i965: Silence unused parameter warnings in intel_mipmap_tree.c
The target parameter of compute_msaa_layout appears to be unused since
83b83fb when support for CMS textures was added for Gen7.

The brw parameter of intel_get_non_msrt_mcs_alignment appears to be
unused since e92fbdc when the GEN check (along with the "can we fast
clear" decision) was moved to a different function.

intel_mipmap_tree.c: In function 'compute_msaa_layout':
intel_mipmap_tree.c:62:73: warning: unused parameter 'target' [-Wunused-parameter]
 compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
                                                                         ^
intel_mipmap_tree.c: In function 'intel_get_non_msrt_mcs_alignment':
intel_mipmap_tree.c:143:54: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
                                                      ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-10 20:29:50 -07:00
Ian Romanick
38e412d548 i965: Silence unused parameter warnings in intel_fbo.c
intel_fbo.c: In function 'intel_alloc_window_storage':
intel_fbo.c:415:48: warning: unused parameter 'ctx' [-Wunused-parameter]
 intel_alloc_window_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
                                                ^
intel_fbo.c: In function 'intel_nop_alloc_storage':
intel_fbo.c:428:74: warning: unused parameter 'rb' [-Wunused-parameter]
 intel_nop_alloc_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
                                                                          ^
intel_fbo.c:429:32: warning: unused parameter 'internalFormat' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                ^
intel_fbo.c:429:55: warning: unused parameter 'width' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                                       ^
intel_fbo.c:429:69: warning: unused parameter 'height' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                                                     ^
intel_fbo.c: In function 'intel_blit_framebuffer_with_blitter':
intel_fbo.c:790:61: warning: unused parameter 'filter' [-Wunused-parameter]
                                     GLbitfield mask, GLenum filter)
                                                             ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:50 -07:00
Dave Airlie
b46cbc3607 st/mesa: set the vbuffer to NULL if we are skipping it
If we skip a vbuffer we need to make sure we NULL out
the contents, otherwise when it gets passed to the driver
it will get confused.

This was hit by:
GL41-CTS.gpu_shader_fp64.varyings

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-11 03:05:42 +01:00
Jordan Justen
34cff76fc2 i965/cs: Enable barrier in MEDIA_INTERFACE_DESCRIPTOR
Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
barrier() GLSL function.

On Ivy Bridge and Haswell, this allows the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass. On gen8, this enables a similar test with a local group size
of 896 to pass.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
b01d047391 i965/cs: Emit texture surfaces to enable CS sampling
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
1180b79487 i965: Set up sampler state for compute shaders
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
af48612b88 i965/fs: Set first_non_payload_grf in assign_curb_setup
first_non_payload_grf may be updated in assign_urb_setup for FS or
assign_vs_urb_setup for VS.

We need to set this in assign_curb_setup for compute shaders since cs
does not have an assign_cs_urb_setup like assign_urb_setup (fs) or
assign_vs_urb_setup (vs).

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
75d04e561b i965: Support compute shaders in is_scalar_shader_stage()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
2b9c35945a i965: Support CS in update_stage_texture_surfaces
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Ilia Mirkin
bfc5ace5bd i965: enable ARB_shader_texture_image_samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:39:46 -04:00
Ilia Mirkin
55ebaa6d00 i965: add handling for imageSamples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:55 -04:00
Ilia Mirkin
56238305e5 nir: convert glsl imageSamples into a new intrinsic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:52 -04:00
Ilia Mirkin
37c5c86281 glsl: add support for the imageSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:49 -04:00
Ilia Mirkin
0b91bcea98 i965: add support for textureSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:45 -04:00
Ilia Mirkin
0c7fbcb844 glsl: add support for the textureSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:41 -04:00
Ilia Mirkin
fb18ee9ba6 glsl: add ARB_shader_texture_image_samples infrastructure
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:37 -04:00
Ilia Mirkin
1807a08e4f nir: add nir_texop_texture_samples and convert from glsl
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:33 -04:00
Ilia Mirkin
f9052914e9 glsl: add ir_texture_samples texture opcode
Will be used for textureSamples()

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:29 -04:00
Ilia Mirkin
6efae687b7 mesa: add infra for ARB_shader_texture_image_samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:37:05 -04:00
Ian Romanick
284dcad20a i965: Fix typos in license
grep -lr 'sub license' | while read f; do \
    sed --in-place -e 's/sub license/sublicense/' $f ;\
    done

grep -lr 'NON-INFRINGEMENT' | while read f; do \
    sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\
    done

As noted by Matt, both of these changes match the MIT license text found
at http://opensource.org/licenses/MIT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-10 11:36:30 -07:00
Ian Romanick
aa1a5c0c9e i965: Remove horizontal bars from file header comments
Why was that ever a thing?

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-10 11:36:03 -07:00
Brian Paul
a9b143a648 svga: clean up the compile_vs/gs/fs() functions
Sipmlify structure and remove gotos.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
289804515f svga: fix shader variant memory leak
Fixes a small leak in a seldom-hit corner case for VS/FS compilation.
Found with coverity.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
ece33f9687 svga: remove useless MAX2() call
The sum of two unsigned ints is always >= 0.  Found with Coverity.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
bc75fe214d winsys/svga: remove useless assertion
An unsigned int is always >= 0.  Found with Coverity.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Emil Velikov
9de62819c9 docs: add news item and link release notes for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 19:12:38 +01:00
Emil Velikov
ded289e348 docs: add sha256 checksums for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8789dd627c)
2015-09-10 19:10:58 +01:00
Emil Velikov
e3c5aeee71 docs: add release notes for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 32efdc87cb)
2015-09-10 19:10:57 +01:00
Krzesimir Nowak
423a1dca2f docs: Update wrt. textureQueryLod on softpipe
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
60905f2b19 softpipe: Implement and enable textureQueryLod
Passes the shader piglit tests and introduces no regressions.

This commit finally makes use of the refactoring in previous
commits.

v2:
  - adapted the code to changes in previous commits (renames,
    need_cube_convert stuff)
  - splitted too long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
263d4a7406 tgsi: Add code for handling lodq opcode
This introduces new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.

v2:
  - splitted too long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
d71a3be860 softpipe: Add functions for computing relative mipmap level
These functions will be used by textureQueryLod.

v2:

  - renamed mip_level_* funcs to mip_rel_level_* to indicate that
    these functions return mip level relative to base level and
    documented them
  - renamed a level member in sp_filter_funcs struct to relative_level
  - changed mip_rel_level_none and mip_rel_level_nearest to return mip
    level relative to base level, mip_rel_level_linear already did
    that
  - documented clamp_lod function

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
ac3637dda0 softpipe: Split 3D to 2D coords conversion into separate function
This is to avoid tying the conversion to the sampling -
textureQueryLod will need to do the conversion too, but it does not do
any sampling.

So instead of a "get_samples" vfunc, there is just a bool saying
whether the conversion is needed or not. This solution keeps a nice
property of not adding any overhead for the common case (2D textures).

v2:
  - replaced the "convert_coords" vfunc with a "need_cube_convert"
    boolean to avoid overhead of copying arrays in common case
  - removed an unused typedef
  - splitted too long lines in convert_cube
  - const fixes in convert_cube

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
380a3c0804 softpipe: Split code getting a filter into separate function
This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.

v2:
  - adapted to changes in previous commit (renames)
  - simplified conditions a bit
  - updated docs
  - splitted too long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
b9bc6c42c9 softpipe: Put mip_filter_func inside a struct
Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning relative mip level.

v2:
  - renamed sp_mip struct to sp_filter_funcs
  - renamed sp_filter_funcs instances from mip_foo to funcs_foo
  - splitted too long lines
  - sp_sampler now holds a pointer to sp_filter_funcs instead of an
    instance of it
  - some const fixes

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
16084cd2cf softpipe: Split compute_lambda_lod into two functions
textureQueryLod returns a vec2 with a mipmap information and a
LOD. The latter needs to be not clamped.

v2:
  - changed the "not_clamped" part to "unclamped"
  - corrected "clamp into" to "clamp to"
  - splitted too long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
bdc69552ca softpipe: Fix textureLod with nonzero GL_TEXTURE_LOD_BIAS value
The level-of-detail bias wasn't simply added in the explicit LOD case.
This case seems to be tested only in piglit's
fs-texturequerylod-nearest-biased test, which is currently skipped, as
softpipe does not support textureQueryLod at the moment.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:13 -06:00
Krzesimir Nowak
85500fe2e1 tgsi: Remove trailing backslash in comment
It clearly is here by accident.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:13 -06:00
Marek Olšák
b409524fef gallium/radeon: handle PIPE_TRANSFER_FLUSH_EXPLICIT
Basically, do the same thing as for buffer_unmap, but use the explicit range
instead. It's for apps which want to map a whole buffer and mark touched
ranges explicitly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
60ec8fb448 radeonsi: don't update polygon offset state if it has no effect
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
afa752d3f0 radeonsi: decrease the size of si_pm4_state
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
6a684ff67e radeonsi/compute: add buffers to the CS directly
Packets are emitted immediately anyway.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
2176b3b09f radeonsi: only use new versions of LLVM image and sample intrinsics
Just a cleanup I had made a long time ago and forgot about.

v2: use tgsi_is_shadow_target

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
e6d3846dd0 gallium/radeon: drop support for LLVM 3.4
This allows using the new tex instrinsics unconditionally.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
5fbfd8dd23 r600/llvm: remove dead code for LLVM 3.3
LLVM 3.3 has been unsupported for quite a while.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
5c6c5b5246 r600g: use pipe_resource::width0 instead pb_buffer::size
pb_buffer::size was aligned by 29aaab2b5f,
which broke the CMASK code I think.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91881

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
7956eae1c7 radeonsi: enable VGPR spilling on VI
This fixes corruption in Unigine Heaven on VI

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
c6502e880b winsys/amdgpu: calculate the maximum number of compute units
Required for register spilling.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-10 17:14:15 +02:00
Jon TURNEY
adeba943e1 Use IMP_LIB_EXT when checking for LLVM shared libraries
When checking for LLVM shared libraries, use IMP_LIB_EXT for the extension for
shared libraries appropriate to the target, rather than hardcoding '.so'

Also add some comments to explain why we have this circus of pain.

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 15:09:30 +01:00
Rhys Kidd
2c3007652d i965: Resolve GCC sign-compare warning.
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_control_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:805:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) {
                      ^
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_source_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:839:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) {
                      ^
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c: In function 'dump_sampler_state':
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c:382:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (i = 0; i < size / 16; i++) {
                  ^
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c: In function 'brw_pipeline_state_finished':
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:801:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (i != pipeline) {
             ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen7_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1544:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                               ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen8_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1638:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                            ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_hiz':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1771:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                            ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1775:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int layer = 0; layer < mt->level[level].depth; ++layer) {
                                 ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Rhys Kidd
1c194840fd mesa: Resolve GCC sign-compare warning.
mesa/src/mesa/program/prog_to_nir.c: In function 'setup_registers_and_variables':
/mesa/src/mesa/program/prog_to_nir.c:1059:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < c->prog->NumTemporaries; i++) {
                      ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Rhys Kidd
32cdb49fe2 glsl: Resolve GCC sign-compare warning.
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:63:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = 0; i < tex->num_srcs; i++) {
                         ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:114:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = proj_index + 1; i < tex->num_srcs; i++) {
                                      ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:53:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (proj_index = 0; proj_index < tex->num_srcs; proj_index++) {
                                       ^
mesa/src/glsl/nir/nir_lower_tex_projector.c:57:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (proj_index == tex->num_srcs)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < num_components; ++i)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:110:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          for (int i = 0; i < num_components; ++i) {
                            ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:139:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             if (i < num_components)
                   ^
mesa/src/glsl/nir/nir_opt_peephole_ffma.c: In function 'get_mul_for_src':
mesa/src/glsl/nir/nir_opt_peephole_ffma.c:130:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (unsigned i = 0; i < num_components; i++)
                           ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Rhys Kidd
548bf70fd2 mesa: Resolve GCC missing field initializer warning.
Resolve a series of missing field initializer warnings within get_hash_params.py

Of the form:
In file included from mesa/src/mesa/main/get.c:495:0:
mesa/src/mesa/main/get_hash.h:180:5: warning: missing initializer for field
'extra' of 'const struct value_desc' [-Wmissing-field-initializers]
     { GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES, LOC_CUSTOM, TYPE_INT, 0 },
     ^
mesa/src/mesa/main/get.c:165:15: note: 'extra' declared here
    const int *extra;
               ^

This patch addresses some likely code rot around the *extra field, where the
initialization is via C code generated indirectly from a Python script.
It resolves a number of warnings reported by GCC when configured to be pedantic.

$ gcc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2

No piglit regressions on Ironlake.

v2:
- Squash series into a single patch.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Albert Freeman
1691ead1b8 clover: Avoid using typename to allow compilation of clover by clang
When parsing an variable declaration qualified with the typename
keyword, clang attempted to declare a variable with the type of non
type member "enum type type" of module::argument (within the header
file clover/core/module.hpp) instead of the typed member of
module::argument "enum type".

Replaced "typename" with "enum" to force clang to declare the variable
marg_type with type "enum type" of module::argument.

CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-09-10 14:56:40 +01:00
Kenneth Graunke
bf58a2c362 i965: Advertise 65536 for GL_MAX_UNIFORM_BLOCK_SIZE.
Our old value of 16384 is the minimum value.  DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise.  AMD advertises MAX_INT.

Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit.  Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders.  While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.

We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger.  For now, matching
nVidia and Intel/Windows seems like a good plan.

We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.

According to Tapani, there's an unreleased game that asserts on this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 02:26:26 -07:00
Ilia Mirkin
74b86b971f nv50/ir: don't fold immediate into mad if registers are too high
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 05:03:24 -04:00
Ilia Mirkin
ce28ca7133 nv50/ir: fix emission of 8-byte wide interp instruction
This can come up if the target register number is > 63, which is fairly
rare.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 04:30:45 -04:00
Ilia Mirkin
641eda0c79 nv50/ir: r63 is only 0 if we are using less than 63 registers
It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 04:30:45 -04:00
Ilia Mirkin
a072ef8748 nv50/ir: make edge splitting fix up phi node sources
Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so
the mapping between source and the actual BB is by inbound edge order.
So when manipulating edges one has to be extremely careful. We were
insufficiently careful when splitting critical edges which resulted in
the phi nodes being confused as to where their sources were coming from.

This primarily manifests itself with the TXL-lowering logic on nv50,
when it is inside of a conditional. I've been unable to trigger the
issue anywhere else so far. This resolves rendering failures
in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2,
XCOM:Enemy Unknown, Stacking. It also improves the situation in
Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief.
However more work needs to be done there (splitting a lot more edges
solves it, so it's some other sort of RA-related issue).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 03:11:31 -04:00
Ian Romanick
13a974f9ae glsl: Remove ADD_VARYING macro
The purpose of the macro was to create the name_as_gs_input from name.
The previous commit removed the name_as_gs_input from add_varying, so
the macro is unnecessary.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-09 19:15:15 -07:00
Ian Romanick
bd0245b8b2 glsl: Silence unused parameter warnings
builtin_variables.cpp:1062:53: warning: unused parameter 'name_as_gs_input' [-Wunused-parameter]
                                         const char *name_as_gs_input)
                                                     ^
builtin_functions.cpp:4774:47: warning: unused parameter 'intrinsic_name' [-Wunused-parameter]
                                   const char *intrinsic_name,
                                               ^
builtin_functions.cpp:4907:66: warning: unused parameter 'state' [-Wunused-parameter]
 _mesa_glsl_find_builtin_function_by_name(_mesa_glsl_parse_state *state,
                                                                  ^
builtin_functions.cpp:4915:49: warning: unused parameter 'num_arguments' [-Wunused-parameter]
                                        unsigned num_arguments,
                                                 ^
builtin_functions.cpp:4916:49: warning: unused parameter 'flags' [-Wunused-parameter]
                                        unsigned flags)
                                                 ^
ir_print_visitor.cpp:589:37: warning: unused parameter 'ir' [-Wunused-parameter]
 ir_print_visitor::visit(ir_barrier *ir)
                                     ^
linker.cpp:3212:48: warning: unused parameter 'ctx' [-Wunused-parameter]
 build_program_resource_list(struct gl_context *ctx,
                                                ^
standalone_scaffolding.cpp:65:57: warning: unused parameter ‘id’ [-Wunused-parameter]
 _mesa_shader_debug(struct gl_context *, GLenum, GLuint *id,
                                                         ^

v2: Rebase on top of GL_ARB_shader_image_size work (especially
58a86897).  Silence more warnings added by that work.

v3: Remove mention of the removed parameter from comments.  Suggested by
Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "Martin Peres <martin.peres@linux.intel.com>"
2015-09-09 19:15:15 -07:00
Ilia Mirkin
342e68dc60 nvc0: remove BGRA4 format support
Something is wrong with the support somewhere. I couldn't get the blob
driver to use it either, although it happily used RGB5_A1.
teximage-colors works, but WoW seems to fail in the menus for drawing
text.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91526
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 21:54:47 -04:00
Rob Clark
9ce2e30726 gallium/ttn: fix cursor handling vs builder
After inserting instructions the cursor.option becomes _after_instr
(even if it started life as an _after_block).  So we cannot simply stash
the current cursor on the if/loop_stack.  Otherwise we end up inserting
instructions after the endif/endloop in the block preceeding the if/
loop.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-09 17:34:47 -04:00
Ilia Mirkin
e50c01d5af nvc0: keep track of cb bindings per buffer, use for upload settings
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go in to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.

So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.

This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 16:29:21 -04:00
Jason Ekstrand
b828f7a27b nir/glsl: Use lower_outputs_to_temporaries instead of relying on GLSL IR
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:38 -07:00
Jason Ekstrand
1dbe4af9c9 nir: Add a pass to lower outputs to temporary variables
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:21 -07:00
Jason Ekstrand
f5e08ab6b1 nir/cursor: Add a constructor for the end of a block but before the jump
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:28:51 -07:00
Hans de Goede
3e9df0e3af nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:

 [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
 [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
 [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12

Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.

To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 12:10:20 -04:00
Hans de Goede
ac066bf65c nv30: Fix color resolving for nv3x cards
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.

This commit divides the sources and dest surfaces in to tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 11:57:34 -04:00
Rob Clark
30a915bd17 gallium/docs: clairify dmabuf fd ownership
Since debugging issues w/ fd's close()d at the wrong time can be quite
fun, this should probably be made more explicit in the docs.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-09 11:24:56 -04:00
Mauro Rossi
c12ffb30b4 android: radeonsi: add support for sid_tables.h generated sources
This patch is necessary to avoid building error on android,
due to missing sid_tables.h generated sources

v2:[Emil Velikov] Correctly split the lists.

Fixes: fbbebeae10f(radeonsi: inline si_cmd_context_control)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:27:31 +01:00
Mauro Rossi
8056b3ffeb android: Always define __STDC_LIMIT_MACROS.
Analogous to commit 02a4fe22b1 (configure.ac: Always define
__STDC_LIMIT_MACROS.)

v2: [Emil Velikov] keep the LLVM specific __STDC_FORMAT_MACROS

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:26:46 +01:00
Mauro Rossi
5235bfe7b7 android: rename LLVM_VERSION_PATCH to MESA_LLVM_VERSION_PATCH
Fixes: 797f4eacea8(configure.ac: rename LLVM_VERSION_PATCH to avoid
conflict with llvm-config.h)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:26:06 +01:00
Mauro Rossi
e838d91b94 nouveau: android: add space before PRIx64 macro
Otherwise the android build fails with

   error : unable to find string literal operator ‘operator"" PRIx64’

There are several resources referring to the problem, which is related
to c++11, in our case used when building mesa for lollipop.

http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883

I've not investigated all the semantics, some people even suggested a
bug in the gcc compiler,
I just saw the building error was solved with one little space for
lollipop and no side effect when c+11 not used.

v2: [Emil Velikov] add an alternative commit message from Mauro.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:25:35 +01:00
Emil Velikov
d9df8c2fa2 svga: pick all the files into the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-09-09 14:52:34 +01:00
Emil Velikov
0d39279448 auxiliary: rework the python generated sources rules
There are a few bits this commit aims to resolve:

One can generalise the mkdir rule to a simple MKDIR_P $(@D) which will
expand appropriately for even if we change the subdir name, and/or add
new rules. We can also drop the explicit $(srcdir) prefix for the
dependency rules, they they are not strictly required, nor used
elsewhere in mesa.

Finally replace $< with explicit filename to be consistent through the
file, and honour PYTHON_FLAGS.

v2: Add comprehensive commit summary/message (Ian, Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:50 +01:00
Emil Velikov
c373eaedfc glsl: build: remove bogus dependency
v2: rebase on top of the previous commit - don't touch the LOCAL_PATH
prefix for nir_constant_expressions.h

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:47 +01:00
Emil Velikov
a3b05e0492 glsl: build: use makefile.sources variables when possible
Rather than folding one variable within the other only to unwrap them,
just use the ones we need.

v2: bring back LOCAL_PATH prefix for nir_constant_expressions,h

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
2015-09-09 12:48:43 +01:00
Emil Velikov
da5e4559ee glsl: automake: reuse $(NIR_GENERATED_FILES) where possible
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:39 +01:00
Emil Velikov
9e0594418d glsl: automake: rework the sources generation rules
The glsl equivalent of "mesa: automake: rework the source generation
rules". Plus let's make things consistent and always explicitly provide
the header name.

v2: Rebase on top of reverted "remove custom AM_V_LEX/YACC" (Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:33 +01:00
Emil Velikov
fd913f47b7 mesa: automake: rework the source generation rules
Same logic as previous commit applies.

Additionally remove the odd (set -e/mv/INDENT) from the rules.
The last one is the only one we remotely care about, if reading the
generated sources.

Upcoming work from DylanB which will replace the existing python
scripts with ones that produce more readable output anyway.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:29 +01:00
Emil Velikov
96509aa804 mapi: automake: rework the source generation rules
Same logic as previous commit applies. Also fix bogus MESA_MAPI_DIR -
the sources are located in the source dir (duh).

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:25 +01:00
Emil Velikov
449ce5d64f mapi: automake: rework the *api/glapi_mapi_tmp.h rules
Same logic as previous commit applies.

v2: Merge with "inline glapi_gen_mapi define" (Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:18 +01:00
Emil Velikov
d65bd7a7be util: automake: rework the format_srgb.c rule
A handful of changes/cleanups paving the way to bmake support:
 - Remove optional $(srcdir)/ prefix for files in the prereq list.
 - Drop the space after the AM_V_GEN variable.
 - Using $< in a non-suffix rule is a GNU make idiom.
 - Use $(@D) over $(dir $@). The latter is a POSIX standard.

v2: Cosmetic tweaks in the commit summary.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
2015-09-09 12:48:09 +01:00
Emil Velikov
c8984a7a46 xmlpool: 'promote' LOCALEDIR variable
This is the only place in mesa that uses this constuct which seems
to be GNUmake-ism. Attempting to build with POSIX make implementations
(bmake) would fail as below.

--- options.h ---
LOCALEDIR := .
sh: line 2: LOCALEDIR: command not found
*** [options.h] Error code 127

So let's keep things consistent and compatible by making the variable
non target specific.

v2:
 - Bring back LOCALEDIR.
 - Reword the commit message
 - Change mesa-stable tag 10.6 > 11.0

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:04 +01:00
Boyan Ding
63c4b7ee1e egl_dri2: Add support for EGL_KHR_create_contest when using swrast
This requires swrast version >= 3. Also EGL_EXT_create_context_robostness
is supported if __DRI2_ROBUSTNESS extension is found.

Reference: https://bugs.freedesktop.org/show_bug.cgi?id=80821
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:26:48 +01:00
Boyan Ding
6345d2da60 egl_dri2: Use createContextAttribs if swrast version >= 3
v2: Change return type of the new function from int to bool

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:25:55 +01:00
Boyan Ding
b9ea608c1a egl_dri2: Move filling context_attrib array in a separate function
v2: Change return type of the new function from int to bool

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:25:18 +01:00
Marta Lofstedt
b8d6de87f6 mesa: Allow query of GL_VERTEX_BINDING_BUFFER
According to OpenGL ES 3.1 specification table : 20.2 and
OpenGL specification 4.4 table 23.4. The glGetIntegeri_v
functions should report the name  of the buffer bound
when called with GL_VERTEX_BINDING_BUFFER.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-09 09:29:04 +02:00
Marta Lofstedt
ea69ae04db mesa/es3.1: Enable GL_MAX_VERTEX_ATTRIB enums for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-09 09:29:04 +02:00
Kenneth Graunke
0cc331dddd i965/nir: Use nir_system_value_from_intrinsic to reduce duplication.
This code is all pretty much identical.  We just needed the translation
from one enum value to the other.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-08 18:02:16 -07:00
Kenneth Graunke
d5d74d0b86 nir: Add a nir_system_value_from_intrinsic() function.
This converts NIR intrinsics that load system values into Mesa's
SYSTEM_VALUE_* enumerations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-08 18:02:08 -07:00
Kenneth Graunke
8fbc4ae330 i965: Mark topologies with adjacency information as G45+.
These didn't exist on the original 965.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-08 18:00:42 -07:00
Kenneth Graunke
aa18fa30c5 i965: Fix value of _3DPRIM_TRIFAN_NOSTIPPLE.
TRIFAN_NOSTIPPLE has always been 0x16 - 0x15 is marked "Reserved" on all
platforms.  See the 965 PRM, Volume 2, Table 3-1, "3D Primitive Topology
Type Encoding" for a list.

We don't currently use this, and I don't expect we will, but we may as
well not leave the bogus value around.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-08 18:00:40 -07:00
Chris Forbes
70650094ef i965: Add 64-bit dirty flag handling to brw_upload_pull_constants
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-08 18:00:36 -07:00
Chris Forbes
a9df772e0e i965: Add defines for all new Gen7/8 URB opcodes
Tessellation needs to emit URB reads and atomics;

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-08 17:57:54 -07:00
Ben Widawsky
e8a219ab46 i965/gen8+: Skip depth stalls on state change
Docs suggest this is no longer required starting with Gen8.

Perf (no regressions in n=20)
OglMultithread       0.67%
OglTerrainPanInst    0.12%
trex                 0.45%
warsow               0.64%

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-09-08 16:09:52 -07:00
Dave Airlie
6d2ceb10cd r600: don't use shader key without verifying shader type (v2)
Since 7a32652231
r600: Turn 'r600_shader_key' struct into union

we were accessing key fields that might be aliased in the union
with other fields, so we should check what shader type we are
compiling for before using key values from it.

v1.1: make it compile
v2: have caffeine, make it work - we don't set type
until later, so don't reference it until we've set it.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-09 08:42:06 +10:00
Ben Widawsky
f5509874aa i965/skl: Use more compact hiz dimensions
I meant to do this here, but it was in the wrong place:

commit c1151b18f2
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Jun 24 20:07:54 2015 -0700

   i965/skl: Use more compact hiz dimensions

NOTE: Jordan did go back and look at the original mailing list post. I mailed
the right thing, and pushed the wrong one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-09-08 15:36:01 -07:00
Ilia Mirkin
458e55d7c5 st/mesa: increase viewport bounds limits for GL4 hw
According to the ARB_viewport_array spec, GL4 limit is higher than the
GL3 limit. Also take this opportunity to fix the GL3 limit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-08 17:15:02 -04:00
Ilia Mirkin
39df725f73 nvc0: always emit a full shader colormask
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.

Fixes the following piglit on Fermi (Kepler is unaffected):
  fbo-stencil blit GL_DEPTH32F_STENCIL8

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-08 17:13:12 -04:00
Brian Paul
a3b0b3fda5 docs: fix date formatting in index.html 2015-09-08 08:47:01 -06:00
Iago Toral Quiroga
205ff843ff nir: UBO loads no longer use const_index[1]
Commit 2126c68e5c killed the array elements parameter on load/store
intrinsics that was stored in const_index[1]. It looks like that
patch missed to remove this assignment in the UBO path.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-08 09:06:34 +02:00
Hans de Goede
87073c69f3 nv30: Fix max width / height checks in nv30 sifm code
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever resulting
in it seeing a surface of e.g 1024x256 being outside of the input size
limit.

This commit fixes this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-07 16:10:23 -04:00
Chris Wilson
be519c2d50 i965: Disallow fast blit paths for CopyTexImage with PixelTransfer ops
glCopyTexImage behaves similarly to glReadPixels with respect to the
pixel transfer operations. Therefore if any are set we cannot use the
simple blit-only fast paths.

(Though if would be possible to relax the blorp path to handle
pixel zoom, or we can just enhance meta.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviwewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-09-07 20:50:07 +01:00
Jon TURNEY
a1575b55c2 mesa/tests: Remove unneeded X11_CFLAGS
X11_CFLAGS is never defined.  Path to X11 headers is not needed here, so
just remove.

Future work: Using AM_CFLAGS here looks wrong, as this Makefile only builds
C++ files

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-07 10:43:32 +01:00
Jon TURNEY
5f9c72ad23 glxl/tests: Use X11_INCLUDES instead of X11_CFLAGS
X11_CFLAGS is undefined, so these tests will fail to build if x11proto is
installed in a non-standard location.

(See also commits 35189d76, bc93c3798, 54b028ba, d901d7e08, etc.)

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-07 10:43:32 +01:00
Thomas Hellstrom
f1ef89eaab svga: Fix surface view error handling
Make sure errors are correcly propagated.
Also don't flush during state emission if emission fails.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-07 01:25:08 -07:00
Rob Clark
1432a18241 xa: add xa_surface_from_handle2 v2
Like xa_surface_from_handle(), but takes a handle type, rather than
hard-coding 'shared' handle.  This is needed to fix bugs seen with
xf86-video-freedreno with xrandr rotation, for example.  The root issue
is that doing a GEM_OPEN ioctl on a bo that already has a GEM handle
associated with the drm_file will result in two unique handles for the
same bo.  Which causes all sorts of follow-on fail.

v2:
- Add support for for fd handles.
- Avoid duplicating code.
- Bump xa version minor.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2015-09-07 01:25:08 -07:00
Alejandro Piñeiro
00c568f679 i965/nir/vec4: removed unneeded tex src swizzle set
At that point the swizzle should be correct.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-07 10:10:42 +02:00
Ilia Mirkin
ae535cb0bf util: make mesa-sha1.c completely empty when there are no SHA1 impls
My earlier attempt to fix this missed the fact that there was a #else
clause that assumes that you have openssh. This moves the whole thing
under #ifdef HAVE_SHA1 which should avoid this issue.

Fixes: 13bfa5201 (util: always include sha1 into the build)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91898
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
2015-09-07 00:18:12 -04:00
Ilia Mirkin
13bfa52011 util: always include sha1 into the build
SHA1 is now used in all builds when HAVE_SHA1 is defined. Adjust src to
do the same thing, rather than predicating on shader cache.

Fixes: 04e201d0c0 ("mesa: change 'SHADER_SUBST' facility to work with env variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
2015-09-06 16:11:24 -04:00
Ilia Mirkin
e40f32d562 st/mesa: don't fall back to 16F when 32F is requested
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-06 14:15:59 -04:00
Ilia Mirkin
bfd3d5244b st/mesa: properly handle u_upload_alloc failure
vbuf is never null. We want to make sure that a resource was allocated
for the vbuf, which is *vbuf.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-06 11:32:07 -04:00
Ilia Mirkin
a778831735 nouveau: don't mark full range as used on unmap with explicit flush
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:23 -04:00
Ilia Mirkin
c830d193db nv50: avoid using inline vertex data submit when gl_VertexID is used
The hardware only generates vertexid when vertices come from a VBO. This
fixes:

  vertexid-drawelements
  vertexid-drawarrays

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-05 23:04:21 -04:00
Ilia Mirkin
4a025c6bc8 nv50: don't flush vertex arrays when index buffer changes
The index buffer is fed in inline over a pushbuf. It's not related to
vertices or any caching that might be done on them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:18 -04:00
Ilia Mirkin
1f62d36ae2 nv50: rebind bo to bufctx when invalidating idxbuf storage
There is nothing to be done on a dirty idxbuf, but the bo may have
changed, so we have to rebind it to the bufctx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:15 -04:00
Ilia Mirkin
114cc18b98 nv50: clear buffer status on all vertex bufs, not just the first one
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:08 -04:00
Ilia Mirkin
75e34d1df8 nv50: fix drawing from tfb, direct-to-pushbuf submits
The stride was being set to 0, which is illegal (and also non-sensical).
Also we must wait for the buffer to become available for reading as
otherwise a wrong value may be prefetched. Since we must wait for the
buffer anyways, and it's mapped and in GART, we may as well avoid the
annoyance of the indirect pushbuf submit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:03:52 -04:00
Ben Widawsky
5165e464f2 i965: Remove base miplevel from sampler state.
Gen9 changes the meaning of this to coarse LOD quality mode. Although that's a
desirable thing to be setting, it doesn't match the gen8 behavior and this was
unintentional. More importantly, we don't ever use this field. So instead of
getting it "wrong" drop it entirely.

This is a respin of a patch which only [incorrectly] tried to address gen9.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-04 16:05:02 -07:00
Emil Velikov
509ba61d5a docs: add news item and link release notes for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 23:11:40 +01:00
Emil Velikov
f39bc1c828 docs: add sha256 checksums for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e3e2a3e0e5)
2015-09-04 23:10:09 +01:00
Emil Velikov
5685ed72b8 docs: add release notes for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4b05739e9d)
2015-09-04 23:10:07 +01:00
Oded Gabbay
4f2290d161 llvmpipe: convert double to long long instead of unsigned long long
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is an undefined behavior. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.

This bug manifests itself in POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-04 17:37:17 -04:00
Hans de Goede
3c6c4d4f29 nv30: Implement color resolve for msaa
Note this is not ideal. Since the sifm can only do source sizes upto
1024x1024 we end up using the blitter on nv4x, which is not that fast.

And on nv3x we end up using the cpu which is really slow.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 16:07:08 -04:00
Hans de Goede
3329703eb1 nv30: Fix creation of scanout buffers
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.

These constrains have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.

nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.

This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather then by the ddx, and the
wrong stride of these buffers was causing the garbled display.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 16:07:08 -04:00
Boyan Ding
48de40ce9c vc4: Initialize pack field of qreg to 0 in qir_get_temp
This avoids generation of undefined packing in qir and qpu instructions,
fixing a lot of rendering errors.

Fixes 8b36d107fd (vc4: Pack the unorm-packing bits into a src MUL
instruction when possible.)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 12:16:07 -07:00
Chris Wilson
099f5b3a62 i965: Disallow PixelTransfer operations for tiled-memcpy TexImage/ReadPixels
The tiled memcpy fast paths perform a simple blit (with only a couple of
trivial pixel conversion routines) and do not accommodate PixelTransfer
operations. Therefore if any are set, fallback to the regular routines.
Note that PixelTransfer only applies to TexImage and ReadPixels, not to
GetTexImage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-09-04 20:11:15 +01:00
Iago Toral Quiroga
96ea166308 i965/vec4: Don't unspill the same register in consecutive instructions
If we have spilled/unspilled a register in the current instruction, avoid
emitting unspills for the same register in the same instruction or consecutive
instructions following the current one as long as they keep reading the spilled
register. This should allow us to avoid emitting costy unspills that come with
little benefit to register allocation.

v2:
  - Apply the same logic when evaluating spilling costs (Curro).

v3:
  - Abstract the logic that decides if a register can be reused in a function.
    that can be used from both spill_reg and evaluate_spill_costs (Curro).

v4:
  - Do not disallow reusing scratch_reg in predicated reads (Curro).
  - Track if previous sources in the same instruction read scratch_reg (Curro).
  - Return prev_inst_read_scratch_reg at the end (Curro).
  - No need to explicitily skip scratch read/write opcodes in spill_reg (Curro).
  - Fix the comments explaining what happens when we hit an instruction that
    does not read or write scratch_reg (Curro)
  - Return true early when the current or previous instructions read
    scratch_reg with a compatible mask.

v5:
  - Do not return true early, the loop should not be expensive anyway
    and this adds more complexity (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-04 15:13:49 +02:00
Iago Toral Quiroga
bd6e516fc2 i965: Add a debug option for spilling everything in vec4 code
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-04 12:49:36 +02:00
Francisco Jerez
6cf4142db8 dri/common: Tokenize driParseDebugString() argument before matching debug flags.
Fixes debug string parsing when one of the supported flags is a
substring of another.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-04 12:49:36 +02:00
Francisco Jerez
3d4f75506c dri/common: Fix codestyle of driParseDebugString().
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-04 12:49:36 +02:00
Tapani Pälli
08e9049e3d glsl: error out on ES 3.1 if VS or FS present but not both
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 09:22:24 +03:00
Tapani Pälli
69678953d1 glsl: error on linking if no shaders are attached to program
This applies to OpenGL Core >= 4.5 and OpenGL ES >= 3.1.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 09:01:00 +03:00
Kenneth Graunke
4323e78d3f i965: Improve disassembly of data port read messages.
We now print out the name of the message instead of its numerical
value, and label the message control and surface numbers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
0e23c246c0 i965: Optimize VUE map comparisons.
The entire VUE map is computed based on the slots_valid bitfield;
calling brw_compute_vue_map on the same bitfield will return the
same result.  So we can simply compare those.

struct brw_vue_map is 136 bytes; doing a single 8-byte comparison is
much cheaper and should work just as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
6e03377daf i965/gs: Don't reserve space for clip plane uniforms.
These were only for legacy userclipping, which we no longer support
in geometry shaders.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
fba4823a91 i965: Don't do legacy userclipping in non-compatibility contexts.
According to the GLSL 1.50 specification, page 76:
"The shader must also set all values in gl_ClipDistance that have been
 enabled via the OpenGL API, or results are undefined."

With this patch, we only enable clip distance writes when the shader
actually writes them.  We no longer force a value to be written when
clip planes are enabled in the API.  This could mean the first varying
slot would be used as clip distances - I believe it should be the safe
kind of undefined behavior.

Empirically, it doesn't seem to cause a problem.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
4f4b7c4711 i965: Remove the brw_vue_prog_key base class.
The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys.  So there's no need for a "VUE" key base class.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
3239621825 i965: Virtualize vec4_visitor::emit_urb_slot().
This avoids a downcast of key, which won't exist in the base class soon.

I'm not a huge fan of this patch, but given that we're currently using
inheritance, this seems like the "right" way to do it.  The alternative
is to make key a void pointer in the parent class and continue
downcasting.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
27e83b62bb i965: Store a key_tex pointer in vec4_visitor.
I'm about to remove the base class for VS/GS/HS/DS program keys, at
which point we won't be able to use key->tex anymore.  Instead, we'll
need to store a direct pointer (like we do in the FS backend).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
014b90221a i965: Move legacy clip plane handling to vec4_vs_visitor.
This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.

Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter.  This eliminates the bogus NULL parameter in the
GS case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
082b7f1876 i965: Delete the brw_vue_program_key::userclip_active flag.
There are two uses of this flag.

The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances.  In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value.  Checking if it's
> 0 works for detecting this case.

Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags.  Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
294282aaa6 i965: Remove legacy clip plane handling from geometry shaders.
We only support geometry shaders in core profiles, where gl_ClipVertex
doesn't exist.  Presumably the even older behavior of clipping to
gl_Position isn't supported either.  In fact, GLSL 1.50 page 76 claims:

"The shader must also set all values in gl_ClipDistance that have been
 enabled via the OpenGL API, or results are undefined."

So we don't need to handle legacy clipping in geometry shaders.  I think
Paul added this back when we were considering supporting the old
GL_ARB_geometry_shader4 extension.

This removes a non-orthagonal state dependency on GS compilation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
a2151560b8 i965: Move brw_setup_tex_for_precompile to brw_program.[ch].
This living in brw_fs.{h,cpp} is a historical artifact of us supporting
texturing for fragment shaders before any other stages.  It's kind of
awkward given that we use it for all stages.

This avoids having to include brw_fs.h in geometry shader code in order
to access this function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Tapani Pälli
04e201d0c0 mesa: change 'SHADER_SUBST' facility to work with env variables
Patch modifies existing shader source and replace functionality to work
with environment variables rather than enable dumping on compile time.
Also instead of _mesa_str_checksum, _mesa_sha1_compute is used to avoid
collisions.

Functionality is controlled via two environment variables:

MESA_SHADER_DUMP_PATH - path where shader sources are dumped
MESA_SHADER_READ_PATH - path where replacement shaders are read

v2: cleanups, add strerror if fopen fails, put all functionality
    inside HAVE_SHA1 since sha1 is required

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-04 08:22:37 +03:00
Tapani Pälli
0db323a624 build: add HAVE_SHA1 define when using --with-sha1 option
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2015-09-04 08:05:24 +03:00
Kenneth Graunke
2ace64fd59 i965: Fix copy propagation type changes.
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around".  It didn't account for type conversion moves,
however.  So it would happily turn this:

   mov(8) vgrf6:D, -vgrf5:D
   mov(8) vgrf7:F, vgrf6:UD

into this:

   mov(8) vgrf6:D, -vgrf5:D
   mov(8) vgrf7:D, -vgrf5:D

which erroneously drops the conversion to float.

Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-03 21:12:54 -07:00
Dave Airlie
5fa5a012b1 r600: fix loop overrun in cayman_mul_double_instr
Coverity warned about this. Ilia pointed it out.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-04 08:02:14 +10:00
Ben Widawsky
b05619c627 i965/gen9: Annotate input coverage mask change
As far as I can tell, the behavior is preserved from the previous generations.
Before we set a single bit to tell the FS whether or not we'll be using an input
coverage mask. Now we have some options which are implementing various
extensions. These bits are used for the various conservative rasterization
mechanisms (for collision detection, binning, and whatever else).

I believe that the behavior is preserved because the problem which conservative
rasterization is attempting to fix would go away with the "NORMAL" mode (at the
cost of performance, I believe).

This patch serves as documentation of the change by creating the enums, as well
as giving some of the history with the links here so that the next person who
comes along and looks at it doesn't spend as long as I had to in order to
determine if there is an issue or not.

Previously, this algorithm had been done in software, and this can still be used
as long as we don't export an extension stating otherwise.

References: https://www.opengl.org/registry/specs/NV/conservative_raster.txt
References: https://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-03 11:55:31 -07:00
Brian Paul
70dbdca15f svga: update call to u_upload_alloc()
u_upload_alloc() no longer returns a return value.

Trivial.
2015-09-03 11:24:24 -06:00
Marek Olšák
efea7c3a3f winsys/radeon: remove exported buffers from the cache
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:45 +02:00
Marek Olšák
54964c7751 winsys/amdgpu: remove exported buffers from the cache
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:42 +02:00
Marek Olšák
35d0f12797 gallium/pb_bufmgr_cache: add a way to remove buffers from the cache explicitly
This must be done before exporting a buffer as dmabuf fds, because
we lose track of who is using it and can't trust the reference counter.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:40 +02:00
Marek Olšák
44dbaa1746 u_upload_mgr: remove the return value from u_upload_data
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:50 +02:00
Marek Olšák
0c5df863ba u_upload_mgr: remove the return value from u_upload_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:48 +02:00
Marek Olšák
b4f7639955 u_upload_mgr: remove the return value from u_upload_alloc_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:43 +02:00
Marek Olšák
8c6ff05517 u_upload_mgr: remove the return value from u_upload_alloc
The return buffer or the returned pointer can be used instead.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:09 +02:00
Marek Olšák
6c1e368cf3 u_upload_mgr: optimize u_upload_alloc
This is probably the most called util function. It does almost nothing,
yet it can consume 10% of the CPU on the profile. This drops it down to 5%.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:09:13 +02:00
Grazvydas Ignotas
722ce74743 gallium/radeon: remove 'dirty' member from r600_atom
It's no longer used by both r600 and radeonsi now.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:51 +02:00
Grazvydas Ignotas
ccbc7952a4 r600g: simplify dirty atom tracking
Now that R600_NUM_ATOMS is below 64, dirty atom tracking can be
simplified.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:42 +02:00
Grazvydas Ignotas
6ef4572937 r600g: start numbering atoms from 1
There doesn't seem any reason to start from 4.
Start from 1 instead (0 is left reserved to catch uninitialized atoms).

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:29 +02:00
Grazvydas Ignotas
4d9af438bc r600g: make all viewport states use single atom
Similarly to scissor states, we can use single atom to track all viewport
states. This will allow to simplify dirty atom handling later.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:14 +02:00
Grazvydas Ignotas
fbb423b433 r600g: apply disable workaround on all scissors
During review of the "r600g: make all scissor states use single atom" patch
Marek Olšák noticed that scissor disable workaround should be applied on
all scissor states and not just first one, so let's do so.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:05:58 +02:00
Grazvydas Ignotas
7d475bad66 r600g: make all scissor states use single atom
As suggested by Marek Olšák, we can use single atom to track all scissor
states. This will allow to simplify dirty atom handling later.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:05:54 +02:00
Neil Roberts
ce181aea6c mesa/pbo: Handle zero width, height or depth when validating access
It's legal to call glTexSubImage with zero values for the width,
height or depth. Previously this was breaking the PBO access
validation because it tries to work out the last pixel accessed by
getting the pixel at height-1 and depth-1 which would end up with
bogus values.

This was causing GL errors to be generated during the Piglit
texsubimage test, although the test was passing anyway.

v2: Also check for width == 0. Don't validate the start pointer if any
    of the dimensions are zero.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-03 17:00:54 +01:00
Kenneth Graunke
30e84530a0 glsl: Remove unused total_attribs_size variable.
Accidentally left behind by my previous patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-03 00:56:18 -07:00
Kenneth Graunke
c3294ca5a1 glsl: Handle attribute aliasing in attribute storage limit check.
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.

So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes.  Instead, we need to look at the
enabled slots.

This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots).  We then
count normal attributes once, and count the double-size attributes
a second time.

Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.

No Piglit changes on llvmpipe (which actually supports dvecs).

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-02 23:28:20 -07:00
Ian Romanick
6e37304521 i965/meta: Fix typo in comment
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-02 16:24:18 -07:00
Ian Romanick
7237c937af mesa: Don't allow wrong type setters for matrix uniforms
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4.  Both are illegal.  That later also
overwrites the storage for the mat4 and causes bad things to happen.

Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-02 16:24:17 -07:00
Ian Romanick
a6976f0972 mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type
This matches _mesa_uniform, and it enables the bug fix in the next
patch.

v2: s/type/basicType/ in the assert in _mesa_uniform_matrix.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au> [v1]
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-02 16:24:17 -07:00
Ian Romanick
882aab00ab mesa: Silence unused parameter warnings in bufferobj.c
main/bufferobj.c: In function 'count_buffer_size':
main/bufferobj.c:520:26: warning: unused parameter 'key' [-Wunused-parameter]
 count_buffer_size(GLuint key, void *data, void *userData)
                          ^
main/bufferobj.c: In function 'flush_mapped_buffer_range_fallback':
main/bufferobj.c:740:56: warning: unused parameter 'index' [-Wunused-parameter]
                                    gl_map_buffer_index index)
                                                        ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
8ba3b7661b mesa: Remove target parameter from _mesa_handle_bind_buffer_gen
main/bufferobj.c: In function '_mesa_handle_bind_buffer_gen':
main/bufferobj.c:915:37: warning: unused parameter 'target' [-Wunused-parameter]
                              GLenum target,
                                     ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
1e4d3d25ff i965: Make gen7_enable_hw_binding_tables static
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-09-02 16:24:17 -07:00
Ian Romanick
97ce8bd437 i965: Make gen8_upload_state_base_address static
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-09-02 16:24:17 -07:00
Ian Romanick
4ff9e599cb linker: Silence GCC unused parameter warnings
linker.cpp:320:55: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_function *ir)
                                                       ^
linker.cpp:327:53: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_return *ir)
                                                     ^
linker.cpp:333:49: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_enter(ir_if *ir)
                                                 ^
linker.cpp:339:49: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_if *ir)
                                                 ^
linker.cpp:345:51: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_enter(ir_loop *ir)
                                                   ^
linker.cpp:351:51: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_loop *ir)
                                                   ^
linker.cpp:2824:53: warning: unused parameter 'ctx' [-Wunused-parameter]
 link_calculate_subroutine_compat(struct gl_context *ctx, struct gl_shader_program *prog)
                                                     ^
linker.cpp:2854:47: warning: unused parameter 'ctx' [-Wunused-parameter]
 check_subroutine_resources(struct gl_context *ctx, struct gl_shader_program *prog)
                                               ^
linker.cpp:3368:49: warning: unused parameter 'ctx' [-Wunused-parameter]
 link_assign_subroutine_types(struct gl_context *ctx,
                                                 ^

Also make link_assign_subroutine_types static since it is only called
from this file.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
8fafb0a67f mesa: Fix warning about static being in the wrong place
Because the compiler already has enough things to complain about.

    grep -rl 'const static' src/ | while read f
    do
        sed --in-place -e 's/const static/static const/g' $f
    done

brw_eu_emit.c: In function 'brw_reg_type_to_hw_type':
brw_eu_emit.c:98:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
       const static int imm_hw_types[] = {
       ^
brw_eu_emit.c:120:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
       const static int hw_types[] = {
       ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Jordan Justen
06ada493fb i965/cs: Setup push constant data for uniforms
brw_upload_cs_push_constants was based on gen6_upload_push_constants.

v2:
 * Add FINISHME comments about more efficient ways to push uniforms

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-09-02 14:17:24 -07:00
Jordan Justen
4bdd5e09c3 meta: Save/restore compute shaders
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-02 14:17:24 -07:00
Charmaine Lee
4a9480b64a svga: fix referencing a NULL framebuffer cbuf
Check for a valid framebuffer cbuf pointer before accessing its
associated surface.

Fix piglit test fbo-drawbuffers-none.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-02 13:22:42 -06:00
Charmaine Lee
5a5e5e3959 svga: increment texture age when surface is to be marked as dirty
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture(). This implies a surface can now be reused for a
render buffer. Currently, when we render to a texture, we mark the
surface as dirty. But in svga_mark_surface_dirty(), if the surface
is already marked as dirty, it does not increment the texture age.
Any view to this texture might not be updated properly then.

With this patch, the texture age is incremented regardless of whether
the surface is already marked as dirty or not.

Fix bug 1499181.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-09-02 13:22:42 -06:00
Charmaine Lee
b2fd41ce46 svga: fix backed surface view regression
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture() and exposes a bug in the backed surface view
creation.  Currently a backed surface view for a conflicted surface view
is created at framebuffer emit time. But if shader sampler views are changed
but framebuffer surface views remain unchanged, emit_framebuffer() will not
be called and conflicted surface views will not be detected.

To fix this, also check for conflicted surface views when setting sampler
views. If there is any conflicted surface views, enable the
framebuffer dirty bit so that the framebuffer emit code has a chance to
create a backed surface view for the conflicted surface view.

Fix cinebench-r11-test regression.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-02 13:22:42 -06:00
Matt Turner
9390cb8459 i965/fs: Handle MRF destinations in lower_integer_multiplication().
The lowered code reads from the destination, which isn't possible from
message registers.

Fixes the following dEQP tests on SNB:

    dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
    dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
    dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by:  Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-02 11:52:10 -07:00
Brian Paul
4fd314852c docs: document VMware OpenGL 3.3 support
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
e054251ed1 svga: update driver for version 10 GPU interface
This is a squash commit of roughly two years of development work.
Authors include:
  Brian Paul
  Charmaine Lee
  Thomas Hellstrom
  Jakob Bornecrantz
  Sinclair Yeh
  Mingcheng Chen
  Kai Ninomiya
  MengLin Wu

The driver supports OpenGL 3.3.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
656dac120d svga: add new version 10 device command prototypes
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
e8c20d97eb svga: add new svga_streamout.h file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:24 -06:00
Brian Paul
8ddf98d671 svga: add new svga_state_tgsi_transform.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:24 -06:00
Brian Paul
26d8bae889 svga: add new svga_state_sampler.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
a633948e7e svga: add new svga_state_gs.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
ff85bcdba2 svga: add new svga_pipe_streamout.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
7ce20cf59a svga: add new svga_pipe_gs.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
9cb2d9ddfa svga: add new svga_link.[ch] files
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
53d07910c3 svga: add new svga_cmd_vgpu10.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
35bb29d499 svga: add new svga_tgsi_vgpu10.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
1c5468e9c0 svga: remove unused SVGA3D_* command functions
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
133a47107c gallium/st: add pipe_context::get_timestamp()
The VMware svga driver doesn't directly support pipe_screen::get_timestamp()
but we can do a work-around.  However, we need a gallium context to do so.
This patch adds a new pipe_context::get_timestamp() function that will only
be called if the pipe_screen::get_timestamp() function is NULL.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
e2a1d21cb6 svga/winsys: Add support for VGPU10
This involves a few driver modifications to keep things building.
The driver may not actually run properly at this point.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
c191b507cb svga: update the svga3d device header files
Remove some obsolete svga_dump.c code for items which no longer exist.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
3a92526704 svga: add new version 10 device header files
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
75f92e28b4 winsys/svga: add new vmw_query.c[h] files
Functions for creating, destroying, getting queries, etc.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Chris Wilson
f30cf3258e meta: Compute correct buffer size with SkipRows/SkipPixels
If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:

==28136== Invalid read of size 8
==28136==    at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136==    at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==

Fixes regression from commit 7f396189f0
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 5 18:17:04 2015 -0800

    meta: Add a BlitFramebuffers-based implementation of TexSubImage

v2: However, the teximage we create does need to be width x full_height x 1

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by Neil Roberts <neil@linux.intel.com>
2015-09-02 10:08:39 +01:00
Alejandro Piñeiro
4de86e1371 i965/vec4: fill src_reg type using the constructor type parameter
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.

This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs:     2918 -> 2833 (-2.91%)
helped:                                79
HURT:                                  0
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-02 09:59:47 +02:00
Glenn Kennard
d2cab815b4 r600g: Add doubles support for CYPRESS
This doesn't enable the support, just adds some of
the code, so we don't have to keep rebasing.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:34:39 +10:00
Dave Airlie
3be5ee1574 r600g: add doubles support for CAYMAN
Only a subset of AMD GPUs supported by r600g support doubles,
CAYMAN and CYPRESS are probably all we'll try and support, however
I don't have a CYPRESS so ignore that for now.

This disables SB support for doubles, as we think we need to
make the scheduler smarter to introduce delay slots.

[airlied: pushing this to avoid pain of rebasing, it mostly
works on cayman only so far, Glenn has some ideas about
delay slot issues we need to look into. turned off by
default for now]

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:06:18 +10:00
Dave Airlie
ee67fd70c2 tgsi/scan: add uses_doubles to tgsi scanner
This allows drivers to work out if a shader contains any
double opcodes easily.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:06:13 +10:00
Glenn Kennard
3bfa345c1e r600g: add multiple stream support for geom shaders
This patch is taken from work by Glenn and myself,
and I've spent some time making it all work here.

This adds support for the multiple streams part of
ARB_gpu_shader5 to r600g.

It doesn't enable ARB_gpu_shader5 yet.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
3d497e0d91 r600g/sb: add support for multiple streams to SB backend
This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
d503bbbf30 r600g: add support for streams to the assembler.
This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
90ac5fb6bb r600g/sb: dump sampler/resource index modes for textures.
This just aids debugging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
32769ac016 mesa/readpixels: check strides are equal before skipping conversion
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.

This fixes all the core mesa problems with CTS packed_pixels
tests.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:34:21 +10:00
Dave Airlie
b4a70401f5 texcompress_s3tc/fxt1: fix stride checks (v1.1)
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.

align the rowlength to the pack alignment before comparing.

This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.

v1.1: add fxt1 fix (Iago)

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:32:26 +10:00
Dave Airlie
6a3e1fb958 st/readpixels: fix accel path for skipimages.
We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.

This fixes some memory overruns under CTS
 packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:30:48 +10:00
Dave Airlie
c3c242070e mesa/formats: 8-bit channel integer formats addition
Add enough 8-bit channel formats to handle all the
different things CTS throws at us.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:26:34 +10:00
Dave Airlie
8185a02316 mesa/formats: add some formats from GL3.3
GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.

While playing with ogl conform one of the tests must
attempted all valid formats for GL3.3 and hits the
unreachable here.

This adds the first chunk of formats that hit the
assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:26:13 +10:00
Dave Airlie
5b6c7da460 mesa: handle SwapBytes in compressed texture get code.
This case just wasn't handled, so add support for it.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:17:29 +10:00
Dave Airlie
0ad3a475ef mesa: fix SwapBytes handling in numerous places
In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.

This adds a common routine to swap bytes a 2D image and uses this
code in:

texture storage
texture get
readpixels
swrast drawpixels.

[airlied: updated with Brian's nitpicks].

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:16:43 +10:00
José Fonseca
60aea30115 auxiliary/os: Don't implement os_get_option() on embedded builds.
Let it be defined externally instead, allowing setting mechanisms other
than environment variables.

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
84e71ef2ee util: add a couple primitive restart helper functions
The first function translates prim restart indexes to be 0xffff or
0xffffffff.

The second splits indexed primitives with restart indexes into sub-
primitives without restart indexes.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
14f35194d8 tgsi: add tgsi utility to transform a fragment shader to support aa point
This adds a tgsi utility tgsi_add_aa_point to transform a fragment shader
to support anti-aliased wide point by computing the fragment distance from
the point center. This utility assumes the geometry shader is emitting
an extra generic output with point coord data. The semantic index of
this generic output is passed to the tgsi_add_aa_point utility.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
bca238d4f5 tgsi: adds tgsi utility to transform a shader to support point sprite
This adds a tgsi utility tgsi_add_point_sprite to transform a geometry
shader to emulate wide points by drawing quads. This utility adds an
extra output for the original point position if the point position is
to be written to a stream output buffer. It also assumes the driver will
add a constant for inverse viewport scale after the user defined constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
a65bdf5f47 tgsi: add new tgsi_two_side.c utility code
This could be used by any driver where the device doesn't directly
support two-sided lighting.  This code modifies a fragment shader
to accecpt back-face colors and choose between the front/back colors
depending on the triangle's front-face sign.
2015-09-01 16:29:17 -06:00
Brian Paul
da33c2434b util: add util_strcasecmp() wrapper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
0c4b621590 gallium/util: add a utility to create geometry passthrough shader
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Roland Scheidegger
1754208617 gallium/util: fix returning empty box for rectangle intersection
These functions deal with inclusive coordinates, hence a 0/0/0/0 rect
returned when there's no intersection doesn't actually represent an empty
rectangle. Hence return 0/-1/0/-1 instead.
This fixes some problems in llvmpipe with empty scissor rects (which up
to now didn't really matter because while the intersect test returned the
wrong result all pixels were scissored away later anyway).
2015-09-01 16:29:17 -06:00
Roland Scheidegger
fec4f5de67 gallium/util: return FALSE for intersection if there's empty rectangles
It isn't really obvious if intersection test should take into account empty
rectangles or if the caller should do it. But it looks like most callers
actually verified one of the rects but not the other, but since correctly
returning an empty rect that other rect could actually be empty leading to
more bugs. Hence just verify both rects for emptyness in the intersection
test itself which makes the code easier in the caller (though it will be
slower if the caller knows the rectangles are non-empty).

Reviewed-by: Zack Rusin <zackr@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
1775687637 tgsi: add some more helper functions
This patch adds some more helper functions such as
   . tgsi_transform_temps_decl
   . tgsi_transform_output_decl
   . tgsi_transform_dst_reg
   . tgsi_transform_src_reg

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
f8da1e1459 tgsi: added tgsi_is_shadow_target() helper 2015-09-01 16:29:17 -06:00
Brian Paul
bd883c9070 tgsi: add negate parameter to tgsi_transform_kill_inst()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
56852e925e util: added ffsll() function
v2: fix errant _GNU_SOURCE test, per Matt Turner.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-01 16:29:17 -06:00
Brian Paul
84dad65088 util: added util_set_index_buffer()
Like util_set_vertex_buffers_count(), this basically just copies a
pipe_index_buffer object, taking care of refcounting.
2015-09-01 16:29:17 -06:00
Jason Ekstrand
47b4efc710 mesa: Move gl_vert_attrib from mtypes.h to shader_enums.h
It is a shader enum after all...

Acked-by: Brian Paul <brianp@vmware.com>
2015-09-01 14:45:37 -07:00
Matt Turner
e34834f059 glapi: Inline x86_64_current_tls().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-01 13:23:13 -07:00
Edward O'Callaghan
d351bab9c5 r600g: Simplify out a couple of unnecessary branches
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-09-01 21:55:23 +02:00
Marek Olšák
2d8f7d3c15 radeonsi: use an indirect buffer for init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
df12ddb55d radeonsi: add IB2 indirect buffer support for pm4 states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
8a9ab86ca6 winsys/radeon: add a flag telling how gfx IBs should be padded
This is always false on amdgpu (set by calloc).

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
ba79ff7fa8 winsys/amdgpu: remove IB padding for SI
SI is unsupported by amdgpu

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
0f4688fbe7 radeonsi: remove unused macro si_pm4_set_state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
b89fa63d45 radeonsi: remove si_pm4_cleanup
All remaining pm4 state are created and destroyed by state trackers.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
a9971e85d9 radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
5e2619ef30 radeonsi: use all built-in border colors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
fbbebeae10 radeonsi: inline si_cmd_context_control
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
77f80a20be radeonsi: remove unused si_pm4_state code
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
228e80123a radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
28b34b474e radeonsi: don't send IB dword usage to si_need_cs_space
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
aad43f0768 radeonsi: don't set number of IB dwords for states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
ec9d5e181e radeonsi: don't count IB space for states, just use an upper bound
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
fc95058add radeonsi: convert SPI state to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
7ff2991e34 gallium/radeon: rename r600_context_bo_reloc -> radeon_add_to_buffer_list
this name should be easy to understand without other knowledge

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
d2e63ac042 gallium/radeon: rename write_*_reg functions
e.g. radeon_set_context_reg is nicer and looks consistent next to
radeon_emit().

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0da159ecac radeonsi: rename and precalculate polygon offset states
one less calloc and state construction while drawing

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
45e549fcbc radeonsi: convert CB_TARGET_MASK setup to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
8a67e78bb8 radeonsi: don't set VGT_VTX_CNT_EN twice in init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
e21418f221 radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c44de30979 radeonsi: convert blend color state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
74aa64876b radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
12b205341a radeonsi: convert clip state into an atom
Reducing calloc overhead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0c2eed0ede radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c2a42d1f9f radeonsi: don't rebind GSVS ring buffers every draw call using GS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c9a3196b14 radeonsi: don't clear the tessellation factor ring buffer
Leftover from the bring-up.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
a2c6ae07b4 radeonsi: remove the tf_ring state, add the registers to init_config
One less state to worry about.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0d46c3bc9d radeonsi: remove the gs_rings state, add the registers to init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
87c1e9e19c radeonsi: use a bitmask for tracking dirty atoms
This mainly removes the cache misses when checking the dirty flags.
Not much else though.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
2fe040ee61 radeonsi: initialize atom IDs for external atoms
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
5bb0ad7ccc radeonsi: call si_init_atom for remaining radeonsi atoms
I need to initialize more atom IDs.

This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type of the emit functions though)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
e191c58324 radeonsi: initialize atom IDs
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
ba7a6cf626 radeonsi: define the state atom array separately
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
8a97528b3a radeonsi: optimize viewport states
same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
f6a10f60b7 radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
02c8e06497 radeonsi: add SI_MAX_ATTRIBS
PIPE_MAX_ATTRIBS is 32, but we currently only support 16.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
05af645a95 radeonsi: fix memory usage checking for big IBs
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
08775a2196 radeonsi: set all 16 viewport Z bounds for GL 4.1
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
9b510a9652 radeonsi: fix a Unigine Heaven hang when drirc is missing
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
b1e5451211 winsys/amdgpu: use small IBs for better performance on VI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
fc292b5821 gallium/util: add u_bit_scan_consecutive_range
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Chris Wilson
d38a560106 i965: Prevent coordinate overflow in intel_emit_linear_blit
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Apr 15 03:04:33 2015 -0700

    i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.

which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.

Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-01 16:41:07 +01:00
Connor Abbott
1484d8c9aa i965/nir: enable the dead control flow optimization
total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
instructions in affected programs:     3054 -> 2884 (-5.57%)
helped:                                29

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 01:48:04 -07:00
Connor Abbott
aec6744501 nir/dead_cf: add support for removing useless loops
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
    checking for values live after the loop to catch const_load
    instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-01 00:58:17 -07:00
Connor Abbott
019eea1c4f nir: add a helper for iterating over blocks in a cf node
We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
89dc0626bd nir: add nir_block_get_following_loop() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
f649afc9dd nir/dead_cf: delete code that's unreachable due to jumps
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
1e6ad4b027 nir: add an optimization for removing dead control flow
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Dave Airlie
0de53ccc8c r600g: fix calculation for gpr allocation
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.

This is totally my fault as well, maths fail 101.

This makes geom shaders on r600 not fail heavily.

Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-01 16:43:22 +10:00
Marta Lofstedt
f8a938814e mesa: Limit Framebuffer Parameter OpenGL ES 3.1 usage
According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:

"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."

Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
2015-09-01 08:24:37 +03:00
Marta Lofstedt
d770e2746c mesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1
V2: Conform to new standard for exposing enums for OpenGL ES 3.1.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-01 08:19:11 +03:00
Jason Ekstrand
e16531fbe3 nir/builder: Use nir_after_instr to advance the cursor
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node.  Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node.  By just resetting to after_instr, we
avoid all these problems.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-31 18:17:07 -07:00
Nanley Chery
f3a483069a i965: advertise ASTC support for Skylake
v2: remove OES ASTC extension reference.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 17:29:36 -07:00
Nanley Chery
be7f640257 mesa/glformats: recognize ASTC formats as color formats
ASTC formats contain RGBA components.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 17:23:10 -07:00
Nanley Chery
76f17266ec mesa/texformat: use format conversion function in _mesa_choose_tex_format
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:21 -07:00
Nanley Chery
01024ded1e mesa/texcompress: correct mapping of S3TC formats in conversion function
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:08 -07:00
Dave Airlie
3063913f77 r600/sb: update last_cf for finalize if.
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.

I think this fixes one of the earlier shader going off end
of memory problems we've stopped.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-01 07:39:24 +10:00
Matt Turner
a4ba41638d i965/fs: Use greater-equal cmod to implement maximum.
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.

See commit 3b7f683f.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-08-31 11:51:59 -07:00
Ben Widawsky
d2e3638ef9 i965/chv|skl: Apply sampler bypass w/a
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).

Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-31 10:08:43 -07:00
Dave Airlie
78027c965a st/mesa: move to renumbering registers in a group
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:33 +01:00
Dave Airlie
aee73f2942 st/mesa: reduce time spent in calculating temp read/writes
The glsl->tgsi convertor does some temporary register reduction
however in profiling shader-db this shows up quite highly,

so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:18 +01:00
Dave Airlie
46968c1140 st/mesa: cache tgsi opcode info in the instruction
Instead of looking this up lots, lets just cache it in the instruction
translation up front. I just noticed this function what high in a profile
of shader-db on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:26:23 +01:00
Dave Airlie
03b7ec8778 r600: move prim convert from geom shader to function.
This should avoid C++ fail including this header.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 19:45:13 +10:00
Timothy Arceri
c8bc8d7235 glsl: remove specical case subroutine type counting
Unlike samplers we can get the correct value for subroutines from
component_slots()

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-31 13:10:44 +10:00
Edward O'Callaghan
0d19dc302f r600g: Use TGSI parse results instead of manually exfiltrating
This makes better use of the work that the TGSI API has done for
us.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:14 +02:00
Edward O'Callaghan
3eed81a97b r600g: Set geometry properties in r600_create_shader_state()
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said propertices in
the selector as read-only within r600_shader_from_tgsi().

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:00 +02:00
Edward O'Callaghan
b4dee1b636 r600g: Move geometry properties state from shader to selector
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:44 +02:00
Edward O'Callaghan
7b6369eb69 r600g: Remove dead assigment to 'gs_input_prim' in shader state
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:26 +02:00
Marek Olšák
7dc8a3497f radeonsi: don't use the emit qt keyword in si_init_atom
It confuses my editor.
2015-08-29 23:18:23 +02:00
Marek Olšák
379e3382e8 radeonsi: remove no-op 32-bit masking
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:21 +02:00
Marek Olšák
437cb1e3f4 gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:08 +02:00
Marek Olšák
e321596e9f winsys/radeon: handle non-zero finite timeout when waiting for buffers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:06 +02:00
Ilia Mirkin
a5a96118ed freedreno/a3xx: implement half-z clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-29 16:18:04 -04:00
Ilia Mirkin
58e24b4761 freedreno/a3xx: add basic clip plane support
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-29 16:18:04 -04:00
Samuel Pitoiset
c8a61ea4fb nvc0: change prefix of MP performance counters to HW_SM
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 11:04:00 +02:00
Samuel Pitoiset
21bdb4d8f3 nvc0: sort performance counter queries by name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:50 +02:00
Samuel Pitoiset
ebca85423c nvc0: make names of performance counter queries consistent
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:44 +02:00
Samuel Pitoiset
981f46aa95 nvc0: use enumerations for driver queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:40 +02:00
Samuel Pitoiset
0eac599001 nvc0: remove commented out code related to PCOUNTER queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:35 +02:00
Dave Airlie
6941883175 r600: port si_conv_prim_to_gs_out from radeonsi
This code was broken by the tess merge, and I totally missed it
until now. I'm not sure this fixes anything but it stops the assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
c149d84d45 r600g: use PRIi64 for some compute debug printfs
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
8d6d0cc17d gallium/util: fix debug_get_flags_option on 32-bit
On 32-bit we need to use PRIu64 flags for printfs,
otherwise this segfaults in R600_DEBUG=help otherwise.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Ilia Mirkin
275c5810ca glsl: provide the option of using BFE for unpack builting lowering
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
Ilia Mirkin
889a946a45 glsl: use bitfield_insert instead of and + shift + or for packing
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
Matt Turner
c676c432f3 i965/fs: Remove fs_visitor::try_replace_with_sel().
No shader-db changes on g4x, snb, hsw, or bdw.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
64e312d7fa i965/fs: Replace awful variable names.
start_to      -> dst_start
   end_to        -> dst_end
   start_from    -> src_start
   end_from      -> src_end
   var_to        -> dst_var
   var_from      -> src_var
   reg_to        -> dst_reg
   reg_to_offset -> dst_reg_offset
   reg_from      -> src_reg

Not sure how these made sense to me before.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
a2ff1e95a4 i965/fs: Skip blocks in register coalescing interference check.
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f2f8c43af9 i965/fs: Improve register coalescing interference check.
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.

The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.

With that fixed, we can calculate interference more accurately.

total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs:     346282 -> 342880 (-0.98%)
helped:                                1552

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f3d0a894af i965/fs: Use overwrites_reg() instead of dst.equals().
equals() returns false for registers with different types, using it
isn't appropriate to determine whether an is overwriting a register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
8765f1d7dd i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.
Noticed when debugging things that lead to the next patch.

On G45 (and presumably ILK) this helps register coalescing:

total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs:     43751 -> 43718 (-0.08%)
helped:                                52
HURT:                                  2

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Marta Lofstedt
2581fe931a i965/fs: Do not set the size for zero-size uniforms
Zero sized uniforms can exist in the list, but they don't get get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them.  This was previously fixed in:

commit: 781dc7c0e1.

However,

commit: 259f7291de

removed the fix.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-28 09:52:59 -07:00
Daniel Scharrer
0516159613 mesa: return old name for deleted samplers for SAMPLER_BINDING queries
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should returned. None of the other binding point
queries check for deletion in another context.

Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:39 +02:00
Daniel Scharrer
5aaaaebf22 mesa: add missing queries for ARB_direct_state_access
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:26 +02:00
Neil Roberts
2dbc6a0ad9 docs: Fix a typo in GL3.txt concerning GL_KHR_context_flush_control 2015-08-28 14:29:22 +01:00
Ilia Mirkin
b319fd7c14 mesa: fix dispatch sanity with GL_OES_texture_storage_multisample_2d_array
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91785
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-08-28 03:12:05 -04:00
Vinson Lee
2ef5a4f830 ABI-check: Use more portable bash invocation.
Fixes 'make check' on FreeBSD.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-27 23:48:43 -07:00
Boyan Ding
86c57ebe0e i965/nir: Make use of nir_opt_undef
Shader-db result on Ivy Bridge:
total instructions in shared programs: 145484 -> 145445 (-0.03%)
instructions in affected programs:     225 -> 186 (-17.33%)
helped:                                5
HURT:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-08-27 23:33:49 -07:00
Matt Turner
559b8842fa glapi: Remove _x86_64_get_get_dispatch symbol from x86-64 assembly.
Never used.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-08-27 22:28:49 -07:00
Ilia Mirkin
4a6a47ed05 glsl: clean up textureSize prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-27 23:49:13 -04:00
Glenn Kennard
608c7b4a63 r600g/sb: Don't crash on empty if jump target
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:36 +10:00
Glenn Kennard
a830225adb r600g/sb: Don't read junk after EOP
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.

Add a couple of asserts, and print EOP bit if set in old asm printer.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:32 +10:00
Glenn Kennard
36f1999a87 r600g/sb: Handle undef in read port tracker
e8e443 missed adding check for undef values also in
unreserve function, leading to an assert triggering.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:14 +10:00
Brian Paul
52f7487923 mesa: rename rowStride to imageStride in texturesubimage()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 15:22:01 -06:00
Ilia Mirkin
2259b11100 mesa: only copy the requested teximage faces
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.

zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-27 17:18:43 -04:00
Kenneth Graunke
0a913a9d85 nir: Convert the builder to use the new NIR cursor API.
The NIR cursor API is exactly what we want for the builder's insertion
point.  This simplifies the API, the implementation, and is actually
more flexible as well.

This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.

v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
2015-08-27 13:36:57 -07:00
Kenneth Graunke
3e3cb77901 nir: Convert the NIR instruction insertion API to use cursors.
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point.  It then reworks the existing API to
simply be a wrapper around that for compatibility.

This largely involves moving the existing code into a new function.

Suggested by Connor Abbott.

v2: Make the legacy functions static inline in nir.h (requested by
    Connor Abbott).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
f90c6b1ce0 nir: Move nir_cursor to nir.h.
We want to use this for normal instruction insertion too, not just
control flow.  Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
c44d507752 nir: Strengthen "no jumps" assertions in instruction insertion API.
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.

Previously, we only checked this when the new instruction being inserted
was a jump.  This is a red herring - inserting *any* kind of instruction
after a jump is illegal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Brian Paul
bcae4640c8 st/mesa: use PROGRAM_ARRAY for storing structs containing arrays
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices.  But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.

Before, we failed an assertion:
  state_tracker/st_glsl_to_tgsi.cpp:4900:
  Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-08-27 13:11:26 -06:00
Brian Paul
42c7be5877 glsl: fix comment typo: s/filed/field/ 2015-08-27 13:11:26 -06:00
Brian Paul
3c256f572b gallium/util: fix code formatting in u_blitter.h
Trivial.
2015-08-27 13:11:26 -06:00
Jason Ekstrand
fee0c5af11 i965/fs: Split VGRFs after lowering pull constants
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.

This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:36 -07:00
Jason Ekstrand
f2e667172a i964/fs: Refactor assign_constant_locations
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things.  In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:24 -07:00
Kenneth Graunke
885a9b058c i965: Rename INTEL_DEBUG=vec4vs to INTEL_DEBUG=vec4.
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().

This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.

We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2015-08-27 11:38:50 -07:00
Tapani Pälli
16ad1d2a8d mesa: enable enums for OES_texture_storage_multisample_2d_array
v2: use _mesa_is_gles31(ctx) for verifying we are on ES 3.1,
    remove _es31 usage from get_hash_params.py

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:58:10 +03:00
Tapani Pälli
c2c64fd269 glsl: add support for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample enable bit

Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:54:41 +03:00
Tapani Pälli
b9101b1443 mesa: Add extension enable for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample bit to enable extension

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:53:15 +03:00
Tapani Pälli
f4280b740d glapi: add GL_OES_texture_storage_multisample_2d_array extension
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:52:46 +03:00
Nanley Chery
9a759a6ee0 swrast: add a new macro, FETCH_COMPRESSED
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
42ee16176d mesa: return bool instead of GLboolean in compressedteximage_only_format()
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.

Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
43d5b4db96 i965: refactor miptree alignment calculation code
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.

v2: reintroduce brw.
    don't include functional changes.
    don't adjust function parameters or create a new function.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
a687734135 i965: change the meaning of cpp for compressed textures
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.

Because the original definition was bytes per block divided by block width, all
references to the mipmap width must be divided the block width. This keeps the
address calculation formulas consistent. For example, the units for miptree_level
x_offset and miptree total_width has changed from pixels to blocks.

v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into seperate commit.
    simplify cpp assignment in copy_image_with_blitter().
    update miptree width and offset variables in: intel_miptree_copy_slice(),
        intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
1a9ceed4ba i965: correct mt->align_h for 2D textures on Skylake
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.

v2: add newlines to separate declarations, statments, and comments.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
10ff64fd3d i965: use ALIGN_NPOT for setting ASTC mipmap layouts
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.

v2: handle texture arrays and LDR-only systems.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
54d2aa4258 mesa/macros: move ALIGN_NPOT to macros.h
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.

v2: add greater than 0 assertion (Anuj).
    convert the macro to a static inline function.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
97f4efd573 mesa/macros: add power-of-two assertions for alignment macros
ALIGN and ROUND_DOWN_TO both require that the alignment value passed
into the macro be a power of two in the comments. Using software assertions
verifies this to be the case.

v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indendation (Brian).
v4: add greater than zero requirement (Anuj).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
8b1f008e9a i965/surface_formats: add support for 2D ASTC surface formats
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.

ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.

v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
    update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
cd49b97a8a mesa/teximage: return the base internal format of the ASTC formats
This is necesary to initialize the gl_texture_image struct.

From the KHR_texture_compression_astc_ldr spec:
  "Added to Section 3.8.6, Compressed Texture Images

   Add the tokens specified above to Table 3.16, Compressed Internal Formats.
   In all cases, the base internal format will be RGBA. The encoding allows
   images to be encoded with fewer channels, but this is always presented as
   RGBA to the sampler."

v2. use _mesa_is_astc_format().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
12b519b457 mesa/teximage: accept ASTC formats for 3D texture specification
The ASTC spec was revised as follows:

   Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
   commands accepting ASTC format tokens in the New Tokens section [...].

Support only exists in the HDR submode:

   Add a second new column "3D Tex." which is empty for all non-ASTC
   formats. If only the LDR profile is supported by the implementation,
   this column is also empty for all ASTC formats. If both the LDR and HDR
   profiles are supported only, this column is checked for all ASTC
   formats.

LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.

v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
23c9cd5a96 mesa/texcompress: enable translation between MESA and GL ASTC formats
v3. conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
692578ed13 mesa/glformats: recognize ASTC formats as compressed
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
4143511b15 mesa: add ASTC extensions to the extensions table
v2: alphabetize the extensions.
    remove OES ASTC extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
582ce1ea97 mesa: don't enable online compression for ASTC formats
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.

v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
e9fd8e154f glapi: add support for KHR_texture_compression_astc_ldr
v2: correct the spelling of the sRGB variants.
    remove spaces around "=" when setting the enum value.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
8ae37365f3 mesa/formats: define the 2D ASTC formats
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().

v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
    only used for generating mipmaps - something ASTC formats do not support
    due to lack of online compression.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Ilia Mirkin
c4cbaca327 nouveau: avoid build failures since 0fc21ecf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 14:04:41 -04:00
Marek Olšák
6924ecac77 gallium/radeon: read_registers should return bool meaning success or failure
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:20 +02:00
Marek Olšák
16e5d8ad38 radeonsi: add IB parser support for CP DMA packets
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
2c14a6d3b1 radeonsi: add IB tracing support for debug contexts
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
189953ee13 radeonsi: remove old CS tracing code
Some of it is left there and it will be re-used in the next commit.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
df6a5666b6 radeonsi: parse and dump status registers on GPU hang
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]

This may print too much information that we might not understand yet,
but some of the bits are very useful.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
61df4f0cd3 radeonsi: add an IB parser
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
be6dc87776 radeonsi: save the contents of indirect buffers for debug contexts
This will be used by the IB parser.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
a6a6c68955 radeonsi: generate register and packet tables for an IB parser from sid.h
This makes writing a good IB parser a lot easier.

It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
d15b71b4bd radeonsi: remove duplicated register definitions and instruction definitions
Instruction encoding isn't needed in Mesa.

The border color address registers were duplicated.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
c59ad265df r600g,radeonsi: remove unused ill-formed register field definitions
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
110873ed11 radeonsi: add an initial dump_debug_state implementation dumping shaders
This is usually called after a draw call.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
93d97db349 radeonsi: allow si_dump_key to write to a file
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
525921ed51 gallium/ddebug: new pipe for hang detection and driver state dumping (v2)
v2: lots of improvements

This is like identity or trace, but simpler. It doesn't wrap most states.

Run with:
  GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.

If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
  /home/$username/dd_dumps/$processname_$pid_$index.

Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.

You can also do:
  GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.

Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.

v2: drop pipe-loader changes, drop SConscript
    rename dd.h -> dd_pipe.h

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
0fc21ecfc0 gallium: add flags parameter to pipe_screen::context_create
This allows creating compute-only and debug contexts.

Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
7b5c92391f gallium: add an interface for dumping debug driver state
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Ilia Mirkin
a3b617a258 mesa: remove pointless es31 checks, fix indirect to only be in es31
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 12:37:38 -04:00
Ilia Mirkin
332fb341dd mesa: uncomment checks in es31 computation, add texture_ms
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
2015-08-26 12:37:17 -04:00
Marek Olšák
f432ae899f mesa: create multisample fallback textures like normal textures
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-26 15:42:26 +02:00
Grazvydas Ignotas
f8b01ae47c radeonsi: mark unreachable paths to avoid warnings
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 15:42:26 +02:00
Tapani Pälli
e0c2ea0337 mesa: GetTexLevelParameter{if}v changes for OpenGL ES 3.1
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.

v2: remove extra is_gles31() checks (suggested by Ilia)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 08:38:25 +03:00
Marta Lofstedt
ae8d0e7abe mesa/es3.1: Allow GL_COMPUTE_WORK_GROUP_SIZE for OpenGL ES 3.1
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE, is supported by the
glGetProgramiv function.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Marta Lofstedt
c2a766880d mesa/es3.1: Enable getting MAX_COMPUTE_WORK_GROUP_ values for OpenGL ES 3.1
According to the OpenGL ES 3.1 specification chapter 17, the
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
is available for glGetIntegeri_v.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Dave Airlie
73e5adc4b2 mesa/formats: pass correct parameter to _mesa_is_format_compressed
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date:   Fri Jul 31 10:26:36 2015 -0700

    mesa/formats: remove compressed formats from matching function

caused a regression in my CTS testing, this looks like a clear
thinko.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
sSigned-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 14:13:27 +10:00
Roland Scheidegger
48e6404c04 gallium/auxiliary: optimize rgb9e5 helper some more
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:38 +02:00
Roland Scheidegger
941346a803 gallium/auxiliary: optimize rgb9e5 helper a bit
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
  getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that keeping the double math for now (float x + 0.5), as the results may
otherwise differ.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:37 +02:00
Dave Airlie
c1452983b4 mesa/texgetimage: fix missing stencil check
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.

This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 10:22:09 +10:00
Nanley Chery
1d2a844e7d mesa/teximage: Add GL error parameter to _mesa_target_can_be_compressed
Enables _mesa_target_can_be_compressed to return the appropriate GL error
depending on it's inputs. Use the parameter to return the appropriate GL error
for ETC2 formats on GLES3.

Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:53:46 -07:00
Nanley Chery
26c549e69d mesa/formats: remove compressed formats from matching function
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.

v2. Since the exhaustive switch is removed, add a gtest to ensure
    all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
    fix formatting and misc improvements (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:17 -07:00
Nanley Chery
8e581747d2 mesa/formats: make format testing a gtest
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhuastive switch for
_mesa_uncompressed_format_to_type_and_comps().

v2. indentation and conditional inclusion fixes (Chad).
    allow tests to continue running if any format fails
    and display the failing format name.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:13 -07:00
Kenneth Graunke
1bec29d04d gallium/ttn: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
78856194c1 prog_to_nir: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
5f14c417c8 nir: Use nir_shader::stage rather than passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
d4d5b430a5 nir: Store gl_shader_stage in nir_shader.
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.

Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Jason Ekstrand
dfacae3a56 i965/fs: Combine assign_constant_locations and move_uniform_array_access_to_pull_constants
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assiging locations of pull constants.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c999a58f50 nir/lower_io: Remove assign_var_locations_direct_first
This is no longer used so we might as well get rid of it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
259f7291de i965/fs: Rework uniform handling
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect.  This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform.  This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate.  This should allow us
to more easily make decisions about what to push.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
cfa056c6a5 i965/vec4_nir: Get rid of the uniform_driver_location tracking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
ce5e9139aa nir/lower_io: Separate driver_location and base offset for uniforms
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
0db8e87b4a nir/intrinsics: Add a second const index to load_uniform
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path.  In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load.  Previously, it was completely flattened by
nir_lower_io which made things difficult.  This adds more information to
the intrinsic to make this easier for us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
6c33d6bbf9 nir: Pass a type_size() function pointer into nir_lower_io().
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play.  The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer.  This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
 - One side-effect of passing in a function pointer is that nir_lower_io is
   now aware of and properly allocates space for image uniforms, allowing
   us to drop hacks in the backend

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
a23f82053d prog_to_nir: Don't allocate nir_variable with type vec4[0] for uniforms.
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.

Should avoid i965 backend assertions in future patches Jason and I are
working on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
640c472fd0 i965: Move type_size() methods out of visitor classes.
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c56899f41a i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
8d8b8f5854 i965: Rename setup_vector_uniform_values to setup_vec4_uniform_value
The new name more accurately represents what it does: Set up a single vec4
uniform value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Rob Clark
0ab29751b6 freedreno/ir3: fix compile break after splitting out nir_control_flow.h
The commit:

  commit b49371b8ed
  Author:     Connor Abbott <cwabbott0@gmail.com>
  AuthorDate: Tue Jul 21 19:54:18 2015 -0700

      nir: move control flow modification to its own file

split out some control flow related APIs into a separate header, but did
not update drivers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:17:30 -04:00
Rob Clark
8b2d0bb844 freedreno/ir3: fix compile break after fxn->start_block removal
The commit:

  commit 8e0d4ef341
  Author:     Kenneth Graunke <kenneth@whitecape.org>
  AuthorDate: Thu Aug 6 18:18:40 2015 -0700

      nir: Delete the nir_function_impl::start_block field.

removed the start_block field without fixing up drivers..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:13:04 -04:00
Dave Airlie
529acab22a mesa: enable texture stencil8 for multisample
This fixes GL45-CTS.gtf44.GL31Tests.texture_stencil8.texture_stencil8_gl44
from the ogl conform suite.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-25 11:06:58 +10:00
Brian Paul
e089ca26e1 mesa: make _mesa_bind_texture_unit() static
It's only called from the file it's defined in.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-24 18:23:19 -06:00
Nanley Chery
8f378d1083 mesa/formats: store whether or not a format is sRGB in gl_format_info
v2: remove extra newline.
v3: use bool instead of GLboolean.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 16:08:01 -07:00
Kenneth Graunke
4f2cdd8497 nir: Use !block_ends_in_jump() in a few places rather than open-coding.
Connor introduced this helper recently; we should use it here too.

I had to move the function earlier in the file for it to be available.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-24 15:10:55 -07:00
Connor Abbott
d7971b41ce nir/cf: reimplement nir_cf_node_remove() using the new API
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
fc7f2d2364 nir/cf: add new control modification API's
These will help us do a number of things, including:

- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:

if (foo) {
    ...
}
if (!foo) {
    ...
}

with:

if (foo) {
    ...
} else {
    ...
}

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
476eb5e4a1 nir/cf: use a cursor for inserting control flow
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
d356f84d4c nir/cf: add split_block_cursor()
This is a helper that will be shared between the new control flow
insertion and modification code.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
58a360c6b8 nir/cf: add split_block_before_instr()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6e47a34b29 nir/cf: add a cursor structure
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().

In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6f5c81f86f nir/cf: fix link_blocks() when there are no successors
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6d028749ac nir/cf: clean up jumps when cleaning up CF nodes
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
211c79515d nir/cf: remove uses of SSA definitions that are being deleted
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
633cbbc068 nir/cf: handle jumps better in stitch_blocks()
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
940873bf22 nir/cf: handle jumps in split_block_end()
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
f596e4021c nir/cf: add block_ends_in_jump()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
788d45cb47 nir/cf: handle phi nodes better in split_block_beginning()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
747ddc3cdd nir/cf: split up and improve nir_handle_remove_jumps()
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
13482111d0 nir/cf: add remove_phi_src() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
f41e108d8b nir: add nir_foreach_phi_src_safe()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
762ae436ea nir/cf: add insert_phi_undef() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
b49371b8ed nir: move control flow modification to its own file
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
1c53f89696 nir: make cleanup_cf_node() not use remove_defs_uses()
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
9d5944053c nir: inline block_add_pred() a few places
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
c7df141c71 nir/validate: check successors/predecessors more carefully
We should be checking almost everything now.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Kenneth Graunke
8e0d4ef341 nir: Delete the nir_function_impl::start_block field.
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly.  Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.

Thanks to Connor Abbott for suggesting this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-24 13:31:41 -07:00
Nanley Chery
9f00af672b mesa/formats: only do type and component lookup for uncompressed formats
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.

v2. include compressed format cases to avoid gcc warnings (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 11:27:46 -07:00
Rob Clark
000e225360 freedreno/a4xx: formats update
Fixes glamor, which wants to use R8 integer textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:16:27 -04:00
Rob Clark
afb6c24a20 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:15:57 -04:00
Chris Wilson
4e5752e2b7 i965: Always re-emit the pipeline select during invariant state emission
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of

commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Wed Apr 22 11:43:50 2015 -0700

    i965/state: Emit pipeline select when changing pipelines

However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.

Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-24 08:57:55 +01:00
Marek Olšák
a83c36b5c0 Revert "radeon/winsys: increase the IB size for VM"
This reverts commit 567394112d.

It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-08-23 19:01:15 +02:00
Ilia Mirkin
e18c29b031 nv50: fix 2d engine blits for 64- and 128-bit formats
This fixes bin/ext_framebuffer_multisample-formats all_samples

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 03:12:07 -04:00
Ilia Mirkin
a6ad49cbbd nv50: account for the int RT0 rule for alpha-to-one/cov
Same as commit 1af0641db but for nvc0. If an integer texture is
bound to RT0, don't do alpha-to-one or alpha-to-coverage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 02:58:58 -04:00
Dave Airlie
45971fd0df mesa/arb_gpu_shader_fp64: add support for glGetUniformdv
This was missed when I did fp64, I've sent a piglit test to cover
the case as well.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-23 15:56:35 +10:00
Ilia Mirkin
abbf05cfc2 nv50,nvc0: disable depth bounds test on blit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 01:39:29 -04:00
Neil Roberts
3a1ab23480 i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.

v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
    when gl_VertexID is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:25:39 -07:00
Neil Roberts
fb02b4ec48 i965: Swap the order of the vertex ID and edge flag attributes
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.

v2: Also implement for BDW+

v3 [by Ben]: Remove 10.5 tag. Too late.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:20:33 -07:00
Glenn Kennard
50932268aa r600g: Fix assert in tgsi_cmp
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=91726

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-08-23 09:31:12 +10:00
Alexander von Gluck IV
5abbd1cacc egl: scons: fix the haiku build, do not build the dri2 backend
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 10:13:31 -05:00
Emil Velikov
a8c5c62359 docs: add 11.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 13:28:16 +01:00
1344 changed files with 104959 additions and 43982 deletions

View File

@@ -42,6 +42,7 @@ LOCAL_CFLAGS += \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
LOCAL_CFLAGS += \
-D__STDC_LIMIT_MACROS \
-DHAVE___BUILTIN_EXPECT \
-DHAVE___BUILTIN_FFS \
-DHAVE___BUILTIN_FFSLL \
@@ -70,7 +71,7 @@ endif
ifeq ($(MESA_ENABLE_LLVM),true)
LOCAL_CFLAGS += \
-DHAVE_LLVM=0x0305 -DLLVM_VERSION_PATCH=2 \
-DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2 \
-D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS \
-D__STDC_LIMIT_MACROS

View File

@@ -32,6 +32,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-vdpau \
--enable-xa \
--enable-xvmc \
--disable-llvm-shared-libs \
--with-egl-platforms=x11,wayland,drm \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast

View File

@@ -1 +1 @@
11.0.0-devel
11.1.0-devel

View File

@@ -74,14 +74,14 @@ LIBDRM_AMDGPU_REQUIRED=2.4.63
LIBDRM_INTEL_REQUIRED=2.4.61
LIBDRM_NVVIEUX_REQUIRED=2.4.33
LIBDRM_NOUVEAU_REQUIRED=2.4.62
LIBDRM_FREEDRENO_REQUIRED=2.4.64
LIBDRM_FREEDRENO_REQUIRED=2.4.65
DRI2PROTO_REQUIRED=2.6
DRI3PROTO_REQUIRED=1.0
PRESENTPROTO_REQUIRED=1.0
LIBUDEV_REQUIRED=151
GLPROTO_REQUIRED=1.4.14
LIBOMXIL_BELLAGIO_REQUIRED=0.0
LIBVA_REQUIRED=0.35.0
LIBVA_REQUIRED=0.38.0
VDPAU_REQUIRED=1.1
WAYLAND_REQUIRED=1.2.0
XCB_REQUIRED=1.9.3
@@ -107,6 +107,8 @@ AC_SYS_LARGEFILE
LT_PREREQ([2.2])
LT_INIT([disable-static])
AC_CHECK_PROG(RM, rm, [rm -f])
AX_PROG_BISON([],
AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.c"],
[AC_MSG_ERROR([bison not found - unable to compile glcpp-parse.y])]))
@@ -533,15 +535,32 @@ AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
dnl
dnl library names
dnl
dnl Unfortunately we need to do a few things that libtool can't help us with,
dnl so we need some knowledge of shared library filenames:
dnl
dnl LIB_EXT is the extension used when creating symlinks for alternate
dnl filenames for a shared library which will be dynamically loaded
dnl
dnl IMP_LIB_EXT is the extension used when checking for the presence of a
dnl the file for a shared library we wish to link with
dnl
case "$host_os" in
darwin* )
LIB_EXT='dylib' ;;
LIB_EXT='dylib'
IMP_LIB_EXT=$LIB_EXT
;;
cygwin* )
LIB_EXT='dll' ;;
LIB_EXT='dll'
IMP_LIB_EXT='dll.a'
;;
aix* )
LIB_EXT='a' ;;
LIB_EXT='a'
IMP_LIB_EXT=$LIB_EXT
;;
* )
LIB_EXT='so' ;;
LIB_EXT='so'
IMP_LIB_EXT=$LIB_EXT
;;
esac
AC_SUBST([LIB_EXT])
@@ -847,7 +866,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
AC_ARG_WITH([gallium-drivers],
[AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
[comma delimited Gallium drivers list, e.g.
"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4"
"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl"
@<:@default=r300,r600,svga,swrast@:>@])],
[with_gallium_drivers="$withval"],
[with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
@@ -937,8 +956,13 @@ gnu*|cygwin*)
dri_platform='drm' ;;
esac
if test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes; then
have_drisw_kms='yes'
fi
AM_CONDITIONAL(HAVE_DRICOMMON, test "x$enable_dri" = xyes )
AM_CONDITIONAL(HAVE_DRISW, test "x$enable_dri" = xyes )
AM_CONDITIONAL(HAVE_DRISW_KMS, test "x$have_drisw_kms" = xyes )
AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xapple )
@@ -973,10 +997,6 @@ if test -n "$with_gallium_drivers" -a "x$enable_glx$enable_xlib_glx" = xyesyes;
NEED_WINSYS_XLIB="yes"
fi
if test "x$enable_dri" = xyes; then
enable_gallium_loader="$enable_shared_pipe_drivers"
fi
if test "x$enable_gallium_osmesa" = xyes; then
if ! echo "$with_gallium_drivers" | grep -q 'swrast'; then
AC_MSG_ERROR([gallium_osmesa requires the gallium swrast driver])
@@ -1110,6 +1130,11 @@ AC_MSG_RESULT([$with_sha1])
AC_SUBST(SHA1_LIBS)
AC_SUBST(SHA1_CFLAGS)
# Enable a define for SHA1
if test "x$with_sha1" != "x"; then
DEFINES="$DEFINES -DHAVE_SHA1"
fi
# Allow user to configure out the shader-cache feature
AC_ARG_ENABLE([shader-cache],
AS_HELP_STRING([--disable-shader-cache], [Disable binary shader cache]),
@@ -1202,7 +1227,8 @@ xyesno)
if test x"$enable_dri3" = xyes; then
PKG_CHECK_EXISTS([xcb >= $XCB_REQUIRED], [], AC_MSG_ERROR([DRI3 requires xcb >= $XCB_REQUIRED]))
dri_modules="$dri_modules xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
dri3_modules="xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
PKG_CHECK_MODULES([XCB_DRI3], [$dri3_modules])
fi
fi
if test x"$dri_platform" = xapple ; then
@@ -1289,6 +1315,16 @@ AC_SUBST(GLX_TLS, ${GLX_USE_TLS})
AS_IF([test "x$GLX_USE_TLS" = xyes -a "x$ax_pthread_ok" = xyes],
[DEFINES="${DEFINES} -DGLX_USE_TLS"])
dnl Read-only text section on x86 hardened platforms
AC_ARG_ENABLE([glx-read-only-text],
[AS_HELP_STRING([--enable-glx-read-only-text],
[Disable writable .text section on x86 (decreases performance) @<:@default=disabled@:>@])],
[enable_glx_read_only_text="$enableval"],
[enable_glx_read_only_text=no])
if test "x$enable_glx_read_only_text" = xyes; then
DEFINES="$DEFINES -DGLX_X86_READONLY_TEXT"
fi
dnl
dnl More DRI setup
dnl
@@ -1533,6 +1569,12 @@ if test "x$enable_egl" = xyes; then
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([egl_dri2 requires --enable-shared-glapi])
fi
if test "x$enable_dri3" = xyes; then
HAVE_EGL_DRIVER_DRI3=1
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([egl_dri3 requires --enable-shared-glapi])
fi
fi
else
# Avoid building an "empty" libEGL. Drop/update this
# when other backends (haiku?) come along.
@@ -1544,6 +1586,8 @@ fi
AM_CONDITIONAL(HAVE_EGL, test "x$enable_egl" = xyes)
AC_SUBST([EGL_LIB_DEPS])
gallium_st="mesa"
dnl
dnl XA configuration
dnl
@@ -1556,7 +1600,7 @@ if test "x$enable_xa" = xyes; then
enabling XA.
Example: ./configure --enable-xa --with-gallium-drivers=svga...])
fi
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st xa"
fi
AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes)
@@ -1601,25 +1645,25 @@ AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
if test "x$enable_xvmc" = xyes; then
PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st xvmc"
fi
AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes)
if test "x$enable_vdpau" = xyes; then
PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st vdpau"
fi
AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes)
if test "x$enable_omx" = xyes; then
PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st omx"
fi
AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes)
if test "x$enable_va" = xyes; then
PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st va"
fi
AM_CONDITIONAL(HAVE_ST_VA, test "x$enable_va" = xyes)
@@ -1641,7 +1685,7 @@ if test "x$enable_nine" = xyes; then
AC_MSG_WARN([using nine together with wine requires DRI3 enabled system])
fi
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st nine"
fi
AM_CONDITIONAL(HAVE_ST_NINE, test "x$enable_nine" = xyes)
@@ -1679,8 +1723,7 @@ if test "x$enable_opencl" = xyes; then
AC_SUBST([LIBCLC_LIBEXECDIR])
fi
# XXX: Use $enable_shared_pipe_drivers once converted to use static/shared pipe-drivers
enable_gallium_loader=yes
gallium_st="$gallium_st clover"
if test "x$enable_opencl_icd" = xyes; then
OPENCL_LIBNAME="MesaOpenCL"
@@ -1960,10 +2003,6 @@ AC_SUBST([XVMC_LIB_INSTALL_DIR])
dnl
dnl Gallium Tests
dnl
if test "x$enable_gallium_tests" = xyes; then
# XXX: Use $enable_shared_pipe_drivers once converted to use static/shared pipe-drivers
enable_gallium_loader=yes
fi
AM_CONDITIONAL(HAVE_GALLIUM_TESTS, test "x$enable_gallium_tests" = xyes)
dnl Directory for VDPAU libs
@@ -2018,14 +2057,8 @@ gallium_require_llvm() {
}
gallium_require_drm_loader() {
if test "x$enable_gallium_loader" = xyes; then
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([Gallium drm loader requires libudev >= $LIBUDEV_REQUIRED or sysfs])
fi
enable_gallium_drm_loader=yes
fi
if test "x$enable_va" = xyes && test "x$7" != x; then
GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS $7"
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([Gallium drm loader requires libudev >= $LIBUDEV_REQUIRED or sysfs])
fi
}
@@ -2051,7 +2084,7 @@ radeon_llvm_check() {
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
llvm_check_version_for "3" "4" "2" $1
llvm_check_version_for "3" "5" "0" $1
if test true && $LLVM_CONFIG --targets-built | grep -iqvw $amdgpu_llvm_target_name ; then
AC_MSG_ERROR([LLVM $amdgpu_llvm_target_name not enabled in your LLVM build.])
fi
@@ -2139,11 +2172,14 @@ if test -n "$with_gallium_drivers"; then
gallium_require_drm "vc4"
gallium_require_drm_loader
case "$host_cpu" in
i?86 | x86_64 | amd64)
USE_VC4_SIMULATOR=yes
;;
esac
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes], [USE_VC4_SIMULATOR=no])
;;
xvirgl)
HAVE_GALLIUM_VIRGL=yes
gallium_require_drm "virgl"
gallium_require_drm_loader
require_egl_drm "virgl"
;;
*)
AC_MSG_ERROR([Unknown Gallium driver: $driver])
@@ -2163,10 +2199,14 @@ if test "x$MESA_LLVM" != x0; then
LLVM_LIBS="`$LLVM_CONFIG --libs ${LLVM_COMPONENTS}`"
dnl llvm-config may not give the right answer when llvm is a built as a
dnl single shared library, so we must work the library name out for
dnl ourselves.
dnl (See https://llvm.org/bugs/show_bug.cgi?id=6823)
if test "x$enable_llvm_shared_libs" = xyes; then
dnl We can't use $LLVM_VERSION because it has 'svn' stripped out,
LLVM_SO_NAME=LLVM-`$LLVM_CONFIG --version`
AS_IF([test -f "$LLVM_LIBDIR/lib$LLVM_SO_NAME.so"], [llvm_have_one_so=yes])
AS_IF([test -f "$LLVM_LIBDIR/lib$LLVM_SO_NAME.$IMP_LIB_EXT"], [llvm_have_one_so=yes])
if test "x$llvm_have_one_so" = xyes; then
dnl LLVM was built using auto*, so there is only one shared object.
@@ -2174,7 +2214,7 @@ if test "x$MESA_LLVM" != x0; then
else
dnl If LLVM was built with CMake, there will be one shared object per
dnl component.
AS_IF([test ! -f "$LLVM_LIBDIR/libLLVMTarget.so"],
AS_IF([test ! -f "$LLVM_LIBDIR/libLLVMTarget.$IMP_LIB_EXT"],
[AC_MSG_ERROR([Could not find llvm shared libraries:
Please make sure you have built llvm with the --enable-shared option
and that your llvm libraries are installed in $LLVM_LIBDIR
@@ -2212,26 +2252,19 @@ AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test "x$HAVE_GALLIUM_FREEDRENO" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VIRGL, test "x$HAVE_GALLIUM_VIRGL" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test "x$enable_shared_pipe_drivers" = xno)
# NOTE: anything using xcb or other client side libs ends up in separate
# _CLIENT variables. The pipe loader is built in two variants,
# one that is standalone and does not link any x client libs (for
# use by XA tracker in particular, but could be used in any case
# where communication with xserver is not desired).
if test "x$enable_gallium_loader" = xyes; then
if test "x$enable_dri" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRI"
fi
if test "x$enable_gallium_drm_loader" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRM"
fi
AC_SUBST([GALLIUM_PIPE_LOADER_DEFINES])
if test "x$enable_dri" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRI"
fi
if test "x$have_drisw_kms" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_KMS"
fi
AC_SUBST([GALLIUM_PIPE_LOADER_DEFINES])
AM_CONDITIONAL(HAVE_I915_DRI, test x$HAVE_I915_DRI = xyes)
AM_CONDITIONAL(HAVE_I965_DRI, test x$HAVE_I965_DRI = xyes)
AM_CONDITIONAL(HAVE_NOUVEAU_DRI, test x$HAVE_NOUVEAU_DRI = xyes)
@@ -2245,8 +2278,6 @@ AM_CONDITIONAL(NEED_RADEON_DRM_WINSYS, test "x$HAVE_GALLIUM_R300" = xyes -o \
AM_CONDITIONAL(NEED_WINSYS_XLIB, test "x$NEED_WINSYS_XLIB" = xyes)
AM_CONDITIONAL(NEED_RADEON_LLVM, test x$NEED_RADEON_LLVM = xyes)
AM_CONDITIONAL(USE_R600_LLVM_COMPILER, test x$USE_R600_LLVM_COMPILER = xyes)
AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
@@ -2317,6 +2348,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
src/gallium/drivers/freedreno/Makefile
src/gallium/drivers/ddebug/Makefile
src/gallium/drivers/i915/Makefile
src/gallium/drivers/ilo/Makefile
src/gallium/drivers/llvmpipe/Makefile
@@ -2331,6 +2363,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/trace/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/virgl/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
src/gallium/state_trackers/glx/xlib/Makefile
@@ -2371,6 +2404,8 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
src/gallium/winsys/vc4/drm/Makefile
src/gallium/winsys/virgl/drm/Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glsl/Makefile
@@ -2464,6 +2499,9 @@ if test "$enable_egl" = yes; then
if test "x$HAVE_EGL_DRIVER_DRI2" != "x"; then
egl_drivers="$egl_drivers builtin:egl_dri2"
fi
if test "x$HAVE_EGL_DRIVER_DRI3" != "x"; then
egl_drivers="$egl_drivers builtin:egl_dri3"
fi
echo " EGL drivers: $egl_drivers"
fi
@@ -2479,7 +2517,8 @@ fi
echo ""
if test -n "$with_gallium_drivers"; then
echo " Gallium: yes"
echo " Gallium drivers: $gallium_drivers"
echo " Gallium st: $gallium_st"
else
echo " Gallium: no"
fi

View File

@@ -96,27 +96,27 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
GL_ARB_draw_buffers_blend DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_draw_indirect DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_gpu_shader5 DONE (i965)
GL_ARB_gpu_shader5 DONE (i965, r600)
- 'precise' qualifier DONE
- Dynamically uniform sampler array indices DONE (r600, softpipe)
- Dynamically uniform UBO array indices DONE (r600)
- Dynamically uniform sampler array indices DONE (softpipe)
- Dynamically uniform UBO array indices DONE ()
- Implicit signed -> unsigned conversions DONE
- Fused multiply-add DONE ()
- Packing/bitfield/conversion functions DONE (r600, softpipe)
- Enhanced textureGather DONE (r600, softpipe)
- Geometry shader instancing DONE (r600, llvmpipe, softpipe)
- Packing/bitfield/conversion functions DONE (softpipe)
- Enhanced textureGather DONE (softpipe)
- Geometry shader instancing DONE (llvmpipe, softpipe)
- Geometry shader multiple streams DONE ()
- Enhanced per-sample shading DONE (r600)
- Interpolation functions DONE (r600)
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (llvmpipe, softpipe)
GL_ARB_gpu_shader_fp64 DONE (r600, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50, r600)
GL_ARB_shader_subroutine DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_tessellation_shader DONE ()
GL_ARB_texture_buffer_object_rgb32 DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_texture_cube_map_array DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_query_lod DONE (i965, nv50, r600)
GL_ARB_texture_query_lod DONE (i965, nv50, r600, softpipe)
GL_ARB_transform_feedback2 DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_transform_feedback3 DONE (i965, nv50, r600, llvmpipe, softpipe)
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (llvmpipe, softpipe)
GL_ARB_vertex_attrib_64bit DONE (r600, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, r600, llvmpipe)
@@ -149,14 +149,14 @@ GL 4.2, GLSL 4.20:
GL 4.3, GLSL 4.30:
GL_ARB_arrays_of_arrays started (Timothy)
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_copy_image DONE (i965) (gallium - in progress, VMware)
GL_ARB_copy_image DONE (i965, nv50, nvc0, radeonsi)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965)
GL_ARB_internalformat_query2 not started
GL_ARB_invalidate_subdata DONE (all drivers)
@@ -164,12 +164,12 @@ GL 4.3, GLSL 4.30:
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior not started
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object in progress (Iago Toral, Samuel Iglesias)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, llvmpipe, softpipe)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_vertex_attrib_binding DONE (all drivers)
@@ -177,8 +177,14 @@ GL 4.4, GLSL 4.40:
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965) (gallium - in progress, VMware)
GL_ARB_enhanced_layouts not started
GL_ARB_clear_texture DONE (i965, nv50, nvc0)
GL_ARB_enhanced_layouts in progress (Timothy)
- compile-time constant expressions DONE
- explicit byte offsets for blocks in progress
- forced alignment within blocks in progress
- specified vec4-slot component numbers in progress
- specified transform/feedback layout in progress
- input/output block locations in progress
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object not started
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
@@ -194,16 +200,16 @@ GL 4.5, GLSL 4.50:
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples not started
GL_ARB_texture_barrier DONE (nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EXT extension to be useful)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_texture_barrier DONE (i965, nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robust_buffer_access_behavior not started
GL_KHR_robustness 90% done (the ARB variant)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
These are the extensions cherry-picked to make GLES 3.1
GLES3.1, GLSL ES 3.1
GL_ARB_arrays_of_arrays started (Timothy)
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
@@ -212,7 +218,7 @@ GLES3.1, GLSL ES 3.1
GL_ARB_shader_atomic_counters DONE (i965)
GL_ARB_shader_image_load_store DONE (i965)
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object in progress (Iago Toral, Samuel Iglesias)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
@@ -223,10 +229,35 @@ GLES3.1, GLSL ES 3.1
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functions not covered above:
glMemoryBarrierByRegion
glGetTexLevelParameter[fi]v - needs updates to restrict to GLES enums
glGetBooleani_v - needs updates to restrict to GLES enums
Additional functionality not covered above:
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glGetBooleani_v - restrict to GLES enums
gl_HelperInvocation support
GLES3.2, GLSL ES 3.2
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced not started
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness 90% done (the ARB variant)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image not started (based on GL_ARB_copy_image, which is done for some drivers)
GL_OES_draw_buffers_indexed not started
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
http://dri.freedesktop.org/wiki/MissingFunctionality

View File

@@ -2,8 +2,8 @@ The software may implement third party technologies (e.g. third party
libraries) that are not licensed to you by AMD and for which you may need
to obtain licenses from other parties. Unless explicitly stated otherwise,
these third party technologies are not licensed hereunder. Such third
party technologies include, but are not limited, to H.264, MPEG-2, MPEG-4,
AVC, and VC-1.
party technologies include, but are not limited, to H.264, H.265, HEVC, MPEG-2,
MPEG-4, AVC, and VC-1.
For MPEG-2 Encoding Products ANY USE OF THIS PRODUCT IN ANY MANNER OTHER
THAN PERSONAL USE THAT COMPLIES WITH THE MPEG-2 STANDARD FOR ENCODING VIDEO

View File

@@ -87,6 +87,13 @@ created in a <code>lib64</code> directory at the top of the Mesa source
tree.</p>
</dd>
<dt><code>--sysconfdir=DIR</code></dt>
<dd><p>This option specifies the directory where the configuration
files will be installed. The default is <code>${prefix}/etc</code>.
Currently there's only one config file provided when dri drivers are
enabled - it's <code>drirc</code>.</p>
</dd>
<dt><code>--enable-static, --disable-shared</code></dt>
<dd><p>By default, Mesa
will build shared libraries. Either of these options will force static
@@ -217,7 +224,7 @@ GLX.
<dt><code>--with-expat=DIR</code>
<dd><p><strong>DEPRECATED</strong>, use <code>PKG_CONFIG_PATH</code> instead.</p>
<p>The DRI-enabled libGL uses expat to
parse the DRI configuration files in <code>/etc/drirc</code> and
parse the DRI configuration files in <code>${sysconfdir}/drirc</code> and
<code>~/.drirc</code>. This option allows a specific expat installation
to be used. For example, <code>--with-expat=/usr/local</code> will
search for expat headers and libraries in <code>/usr/local/include</code>

View File

@@ -153,6 +153,7 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
</ul>
</ul>
@@ -178,6 +179,14 @@ Mesa EGL supports different sets of environment variables. See the
<li>GALLIUM_HUD - draws various information on the screen, like framerate,
cpu load, driver statistics, performance counters, etc.
Set GALLIUM_HUD=help and run e.g. glxgears for more info.
<li>GALLIUM_HUD_PERIOD - sets the hud update rate in seconds (float). Use zero
to update every frame. The default period is 1/2 second.
<li>GALLIUM_HUD_VISIBLE - control default visibility, defaults to true.
<li>GALLIUM_HUD_TOGGLE_SIGNAL - toggle visibility via user specified signal.
Especially useful to toggle hud at specific points of application and
disable for unencumbered viewing the rest of the time. For example, set
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_SIGNAL_TOGGLE to 10 (SIGUSR1).
Use kill -10 <pid> to toggle the hud as desired.
<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.
rather than stderr.
<li>GALLIUM_PRINT_OPTIONS - if non-zero, print all the Gallium environment

View File

@@ -16,25 +16,96 @@
<h1>News</h1>
<h2>August 22 2015</h2>
<h2>November 21, 2015</h2>
<p>
<a href="relnotes/11.0.6.html">Mesa 11.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>November 11, 2015</h2>
<p>
<a href="relnotes/11.0.5.html">Mesa 11.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>October 24, 2015</h2>
<p>
<a href="relnotes/11.0.4.html">Mesa 11.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>October 10, 2015</h2>
<p>
<a href="relnotes/11.0.3.html">Mesa 11.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>October 3, 2015</h2>
<p>
<a href="relnotes/10.6.9.html">Mesa 10.6.9</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 10.6.9 will be the final release in the 10.6
series. Users of 10.6 are encouraged to migrate to the 11.0 series in order
to obtain future fixes.
</p>
<h2>September 28, 2015</h2>
<p>
<a href="relnotes/11.0.2.html">Mesa 11.0.2</a> is released.
This is a bug-fix release.
</p>
<h2>September 26, 2015</h2>
<p>
<a href="relnotes/11.0.1.html">Mesa 11.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>September 20, 2015</h2>
<p>
<a href="relnotes/10.6.8.html">Mesa 10.6.8</a> is released.
This is a bug-fix release.
</p>
<h2>September 12, 2015</h2>
<p>
<a href="relnotes/11.0.0.html">Mesa 11.0.0</a> is released. This is a new
development release. See the release notes for more information about
the release.
</p>
<h2>September 10, 2015</h2>
<p>
<a href="relnotes/10.6.7.html">Mesa 10.6.7</a> is released.
This is a bug-fix release.
</p>
<h2>September 4, 2015</h2>
<p>
<a href="relnotes/10.6.6.html">Mesa 10.6.6</a> is released.
This is a bug-fix release.
</p>
<h2>August 22, 2015</h2>
<p>
<a href="relnotes/10.6.5.html">Mesa 10.6.5</a> is released.
This is a bug-fix release.
</p>
<h2>August 11 2015</h2>
<h2>August 11, 2015</h2>
<p>
<a href="relnotes/10.6.4.html">Mesa 10.6.4</a> is released.
This is a bug-fix release.
</p>
<h2>July 26 2015</h2>
<h2>July 26, 2015</h2>
<p>
<a href="relnotes/10.6.3.html">Mesa 10.6.3</a> is released.
This is a bug-fix release.
</p>
<h2>July 11 2015</h2>
<h2>July 11, 2015</h2>
<p>
<a href="relnotes/10.6.2.html">Mesa 10.6.2</a> is released.
This is a bug-fix release.

View File

@@ -21,6 +21,17 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/11.0.6.html">11.0.6 release notes</a>
<li><a href="relnotes/11.0.5.html">11.0.5 release notes</a>
<li><a href="relnotes/11.0.4.html">11.0.4 release notes</a>
<li><a href="relnotes/11.0.3.html">11.0.3 release notes</a>
<li><a href="relnotes/10.6.9.html">10.6.9 release notes</a>
<li><a href="relnotes/11.0.2.html">11.0.2 release notes</a>
<li><a href="relnotes/11.0.1.html">11.0.1 release notes</a>
<li><a href="relnotes/10.6.8.html">10.6.8 release notes</a>
<li><a href="relnotes/11.0.0.html">11.0.0 release notes</a>
<li><a href="relnotes/10.6.7.html">10.6.7 release notes</a>
<li><a href="relnotes/10.6.6.html">10.6.6 release notes</a>
<li><a href="relnotes/10.6.5.html">10.6.5 release notes</a>
<li><a href="relnotes/10.6.4.html">10.6.4 release notes</a>
<li><a href="relnotes/10.6.3.html">10.6.3 release notes</a>

164
docs/relnotes/10.6.6.html Normal file
View File

@@ -0,0 +1,164 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.6 Release Notes / September 04, 2015</h1>
<p>
Mesa 10.6.6 is a bug fix release which fixes bugs found since the 10.6.5 release.
</p>
<p>
Mesa 10.6.6 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
416517aa9df4791f97d34451a9e4da33c966afcd18c115c5769b92b15b018ef5 mesa-10.6.6.tar.gz
570f2154b7340ff5db61ff103bc6e85165b8958798b78a50fa2df488e98e5778 mesa-10.6.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90748">Bug 90748</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.depth.rg_half_float_oes fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90902">Bug 90902</a> - [bsw][regression] dEQP: &quot;Found invalid pixel values&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - &quot;high fidelity&quot;: Segfault in _mesa_program_resource_find_name</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
</ul>
<h2>Changes</h2>
<p>Chris Wilson (2):</p>
<ul>
<li>i965: Prevent coordinate overflow in intel_emit_linear_blit</li>
<li>i965: Always re-emit the pipeline select during invariant state emission</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: add missing queries for ARB_direct_state_access</li>
</ul>
<p>Dave Airlie (8):</p>
<ul>
<li>mesa/arb_gpu_shader_fp64: add support for glGetUniformdv</li>
<li>mesa/texgetimage: fix missing stencil check</li>
<li>st/readpixels: fix accel path for skipimages.</li>
<li>texcompress_s3tc/fxt1: fix stride checks (v1.1)</li>
<li>mesa/readpixels: check strides are equal before skipping conversion</li>
<li>mesa: enable texture stencil8 for multisample</li>
<li>r600/sb: update last_cf for finalize if.</li>
<li>r600g: fix calculation for gpr allocation</li>
</ul>
<p>David Heidelberg (1):</p>
<ul>
<li>st/nine: Require gcc &gt;= 4.6</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.5</li>
<li>get-pick-list.sh: Require explicit "10.6" for nominating stable patches</li>
</ul>
<p>Glenn Kennard (4):</p>
<ul>
<li>r600g: Fix assert in tgsi_cmp</li>
<li>r600g/sb: Handle undef in read port tracker</li>
<li>r600g/sb: Don't read junk after EOP</li>
<li>r600g/sb: Don't crash on empty if jump target</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: fix assignments with 4-operand arguments (i.e. BFI)</li>
<li>st/mesa: pass through 4th opcode argument in bitmap/pixel visitors</li>
<li>nv50,nvc0: disable depth bounds test on blit</li>
<li>nv50: fix 2d engine blits for 64- and 128-bit formats</li>
<li>mesa: only copy the requested teximage faces</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Split VGRFs after lowering pull constants</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>i965: Fix copy propagation type changes.</li>
<li>Revert "i965: Advertise a line width of 40.0 on Cherryview and Skylake."</li>
<li>i965: Momentarily pretend to support ARB_texture_stencil8 for blits.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets</li>
<li>mesa: create multisample fallback textures like normal textures</li>
<li>radeonsi: fix a Unigine Heaven hang when drirc is missing</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>i965/fs: Handle MRF destinations in lower_integer_multiplication().</li>
</ul>
<p>Neil Roberts (2):</p>
<ul>
<li>i965: Swap the order of the vertex ID and edge flag attributes</li>
<li>i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used</li>
</ul>
<p>Tapani Pälli (5):</p>
<ul>
<li>mesa: update fbo state in glTexStorage</li>
<li>glsl: build stageref mask using IR, not symbol table</li>
<li>glsl: expose build_program_resource_list function</li>
<li>glsl: create program resource list after LinkShader</li>
<li>mesa: add GL_RED, GL_RG support for floating point textures</li>
</ul>
</div>
</body>
</html>

75
docs/relnotes/10.6.7.html Normal file
View File

@@ -0,0 +1,75 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.7 Release Notes / September 10, 2015</h1>
<p>
Mesa 10.6.7 is a bug fix release which fixes bugs found since the 10.6.6 release.
</p>
<p>
Mesa 10.6.7 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
4ba10c59abee30d72476543a57afd2f33803dabf4620dc333b335d47966ff842 mesa-10.6.7.tar.gz
feb1f640b915dada88a7c793dfaff0ae23580f8903f87a6b76469253de0d28d8 mesa-10.6.7.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90751">Bug 90751</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.stencil.stencil_index8 fails</li>
</ul>
<h2>Changes</h2>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/teximage: use correct extension for accept stencil texture.</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.6</li>
<li>Revert "i965: Momentarily pretend to support ARB_texture_stencil8 for blits."</li>
<li>Update version to 10.6.7</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>glsl: Handle attribute aliasing in attribute storage limit check.</li>
</ul>
</div>
</body>
</html>

136
docs/relnotes/10.6.8.html Normal file
View File

@@ -0,0 +1,136 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.8 Release Notes / September 20, 2015</h1>
<p>
Mesa 10.6.8 is a bug fix release which fixes bugs found since the 10.6.7 release.
</p>
<p>
Mesa 10.6.8 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
1f34dba2a8059782e3e4e0f18b9628004e253b2c69085f735b846d2e63c9e250 mesa-10.6.8.tar.gz
e36ee5ceeadb3966fb5ce5b4cf18322dbb76a4f075558ae49c3bba94f57d58fd mesa-10.6.8.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Antia Puentes (1):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.7</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
</ul>
<p>Hans de Goede (4):</p>
<ul>
<li>nv30: Fix creation of scanout buffers</li>
<li>nv30: Implement color resolve for msaa</li>
<li>nv30: Fix max width / height checks in nv30 sifm code</li>
<li>nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type</li>
<li>mesa: Don't allow wrong type setters for matrix uniforms</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: don't fall back to 16F when 32F is requested</li>
<li>nvc0: always emit a full shader colormask</li>
<li>nvc0: remove BGRA4 format support</li>
<li>st/mesa: avoid integer overflows with buffers &gt;= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/vec4: Don't reswizzle hardware registers</li>
</ul>
<p>Jose Fonseca (1):</p>
<ul>
<li>gallivm: Workaround LLVM PR23628.</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Momentarily pretend to support ARB_texture_stencil8 for blits.</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: convert double to long long instead of unsigned long long</li>
</ul>
<p>Ray Strode (1):</p>
<ul>
<li>gbm: convert gbm bo format to fourcc format on dma-buf import</li>
</ul>
<p>Ulrich Weigand (1):</p>
<ul>
<li>mesa: Fix texture compression on big-endian systems</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>gallivm: Do not use NoFramePointerElim with LLVM 3.7.</li>
</ul>
</div>
</body>
</html>

130
docs/relnotes/10.6.9.html Normal file
View File

@@ -0,0 +1,130 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.9 Release Notes / Octover 03, 2015</h1>
<p>
Mesa 10.6.9 is a bug fix release which fixes bugs found since the 10.6.8 release.
</p>
<p>
Mesa 10.6.9 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
3406876aac67546d0c3e2cb97da330b62644c313e7992b95618662e13c54296a mesa-10.6.9.tar.gz
b04c4de6280b863babc2929573da17218d92e9e4ba6272d548d135415723e8c3 mesa-10.6.9.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats</li>
</ul>
<p>Chris Wilson (1):</p>
<ul>
<li>i965: Remove early release of DRI2 miptree</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.8</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
<li>Update version to 10.6.9</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.</li>
</ul>
<p>Ian Romanick (5):</p>
<ul>
<li>t_dd_dmatmp: Make "count" actually be the count</li>
<li>t_dd_dmatmp: Clean up improper code formatting from previous patch</li>
<li>t_dd_dmatmp: Use '&amp; 3' instead of '% 4' everywhere</li>
<li>t_dd_dmatmp: Pull out common 'count -= count &amp; 3' code</li>
<li>t_dd_dmatmp: Use addition instead of subtraction in loop bounds</li>
</ul>
<p>Jeremy Huddleston (1):</p>
<ul>
<li>configure.ac: Add support to enable read-only text segment on x86.</li>
</ul>
<p>Kristian Høgsberg Kristensen (1):</p>
<ul>
<li>i965: Respect stride and subreg_offset for ATTR registers</li>
</ul>
<p>Kyle Brenneman (3):</p>
<ul>
<li>glx: Fix build errors with --enable-mangling (v2)</li>
<li>mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.</li>
<li>glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: fix vui time_scale zero error</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: fix front buffer regression after dropping st_validate_state in Blit</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>mesa: fix mipmap generation for immutable, compressed textures</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.0 Release Notes / TBD</h1>
<h1>Mesa 11.0.0 Release Notes / September 12, 2015</h1>
<p>
Mesa 11.0.0 is a new development release.
@@ -33,7 +33,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD.
7d7e4ddffa3b162506efa01e2cc41e329caa4995336b92e5cc21f2e1fb36c1b3 mesa-11.0.0.tar.gz
e095a3eb2eca9dfde7efca8946527c8ae20a0cc938a8c78debc7f158ad44af32 mesa-11.0.0.tar.xz
</pre>
@@ -83,13 +84,175 @@ Note: some of the new features are only available with certain drivers.
<li>EGL 1.5 on r600, radeonsi, nv50, nvc0</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=51658">Bug 51658</a> - r200 (&amp; possibly radeon) DRI fixes for gnome shell on Mesa 8.0.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65525">Bug 65525</a> - [llvmpipe] lp_scene.h:210:lp_scene_alloc: Assertion `size &lt;= (64 * 1024)' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66346">Bug 66346</a> - shader_query.cpp:49: error: invalid conversion from 'void*' to 'GLuint'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=73512">Bug 73512</a> - [clover] mesa.icd. should contain full path</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=74329">Bug 74329</a> - Please expose OES_texture_float and OES_texture_half_float on the ES3 context</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84225">Bug 84225</a> - Allow constant-index-expression sampler array indexing with GLSL-ES &lt; 300</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90000">Bug 90000</a> - [i965 Bisected NIR] Piglit/gglean_fragprog1-z-write_test fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90073">Bug 90073</a> - Leaks in xcb_dri3_open_reply_fds() and get_render_node_from_id_path_tag</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90249">Bug 90249</a> - Fails to build egl_dri2 on osx</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error ndefined reference to `nir_metadata_preserve'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90537">Bug 90537</a> - radeonsi bo/va conflict on RADEON_GEM_VA (rscreen-&gt;ws-&gt;buffer_from_handle returns NULL)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90547">Bug 90547</a> - [BDW/BSW/SKL Bisected]Piglit/glean&#64;vertprog1-rsq_test_2_(reciprocal_square_root_of_negative_value) fais</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90580">Bug 90580</a> - [HSW bisected] integer multiplication bug</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90600">Bug 90600</a> - IOError: [Errno 2] No such file or directory: 'gl_API.xml'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90629">Bug 90629</a> - [i965] SIMD16 dual_source_blend assertion `src[i].file != GRF || src[i].width == dst.width' failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90691">Bug 90691</a> - [BSW]Piglit/spec/nv_conditional_render/dlist fails intermittently</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90728">Bug 90728</a> - dvd playback with vlc and vdpau causes segmentation fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90748">Bug 90748</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.depth.rg_half_float_oes fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90749">Bug 90749</a> - [BDW Bisected]dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90751">Bug 90751</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.stencil.stencil_index8 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90797">Bug 90797</a> - [ALL bisected] Mesa change cause performance case manhattan fail.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90817">Bug 90817</a> - swrast fails to load with certain remote X servers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90895">Bug 90895</a> - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex &amp; ALU and many SynMark cases performance reduced by 10-23%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90902">Bug 90902</a> - [bsw][regression] dEQP: &quot;Found invalid pixel values&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90903">Bug 90903</a> - egl_dri2.c:dri2_load fails to load libglapi on osx</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90904">Bug 90904</a> - OSX: EXC_BAD_ACCESS when using translate_sse + gallium + softpipe/llvmpipe</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90905">Bug 90905</a> - mesa: Finish subdir-objects transition</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - &quot;high fidelity&quot;: Segfault in _mesa_program_resource_find_name</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91022">Bug 91022</a> - [g45 g965 bisected] assertions generated from textureGrad cube samplers fix</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91077">Bug 91077</a> - dri2_glx.c:1186: undefined reference to `loader_open_device'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91099">Bug 91099</a> - [llvmpipe] piglit glsl-max-varyings &gt;max_varying_components regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91101">Bug 91101</a> - [softpipe] piglit glsl-1.50&#64;execution&#64;geometry&#64;max-input-components regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91173">Bug 91173</a> - Oddworld: Stranger's Wrath HD: disfigured models in wrong colors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91193">Bug 91193</a> - [290x] Dota2 reborn ingame rendering breaks with git-af4b9c7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91222">Bug 91222</a> - lp_test_format regression on CentOS 7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91226">Bug 91226</a> - Crash in glLinkProgram (NEW)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91231">Bug 91231</a> - [NV92] Psychonauts (native) segfaults on start when DRI3 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91290">Bug 91290</a> - SIGSEGV glcpp/glcpp-parse.y:1077</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91337">Bug 91337</a> - OSMesaGetProcAdress(&quot;OSMesaPixelStore&quot;) returns nil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91418">Bug 91418</a> - Visual Studio 2015 vsnprintf build error</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91425">Bug 91425</a> - [regression, bisected] Piglit spec/ext_packed_float/ getteximage-invalid-format-for-packed-type fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91441">Bug 91441</a> - make check DispatchSanity_test.GL30 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91444">Bug 91444</a> - regression bisected radeonsi: don't change pipe_resource in resource_copy_region</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91461">Bug 91461</a> - gl_TessLevel* writes have no effect for all but the last TCS invocation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91513">Bug 91513</a> - [IVB/HSW/BDW/SKL Bisected] Lightsmark performance reduced by 7%-10%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91544">Bug 91544</a> - [i965, regression, bisected] regression of several tests in 93977d3a151675946c03e</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91591">Bug 91591</a> - rounding.h:102:2: error: #error &quot;Unsupported or undefined LONG_BIT&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image &amp; DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
</ul>
<h2>Changes</h2>
TBD.
<li>Removed the EGL loader from the Linux SCons build.</li>
</div>
</body>

134
docs/relnotes/11.0.1.html Normal file
View File

@@ -0,0 +1,134 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.1 Release Notes / September 26, 2015</h1>
<p>
Mesa 11.0.1 is a bug fix release which fixes bugs found since the 11.0.0 release.
</p>
<p>
Mesa 11.0.1 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
6dab262877e12c0546a0e2970c6835a0f217e6d4026ccecb3cd5dd733d1ce867 mesa-11.0.1.tar.gz
43d0dfcd1f1e36f07f8228cd76d90175d3fc74c1ed25d7071794a100a98ef2a6 mesa-11.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
</ul>
<h2>Changes</h2>
<p>Antia Puentes (2):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
<li>i965/vec4_nir: Load constants as integers</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>meta: Abort meta pbo path if TexSubImage need signed unsigned conversion</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.0</li>
<li>Update version to 11.0.1</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.</li>
</ul>
<p>Ian Romanick (5):</p>
<ul>
<li>t_dd_dmatmp: Make "count" actually be the count</li>
<li>t_dd_dmatmp: Clean up improper code formatting from previous patch</li>
<li>t_dd_dmatmp: Use '&amp; 3' instead of '% 4' everywhere</li>
<li>t_dd_dmatmp: Pull out common 'count -= count &amp; 3' code</li>
<li>t_dd_dmatmp: Use addition instead of subtraction in loop bounds</li>
</ul>
<p>Ilia Mirkin (6):</p>
<ul>
<li>st/mesa: avoid integer overflows with buffers &gt;= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
<li>freedreno/a3xx: fix blending of L8 format</li>
<li>nv50,nvc0: detect underlying resource changes and update tic</li>
<li>nv50,nvc0: flush texture cache in presence of coherent bufs</li>
<li>radeonsi: load fmask ptr relative to the resources array</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir: Fix a bunch of ralloc parenting errors</li>
<li>i965/vec4: Don't reswizzle hardware registers</li>
</ul>
<p>Jeremy Huddleston (1):</p>
<ul>
<li>configure.ac: Add support to enable read-only text segment on x86.</li>
</ul>
<p>Ray Strode (1):</p>
<ul>
<li>gbm: convert gbm bo format to fourcc format on dma-buf import</li>
</ul>
<p>Tapani Pälli (2):</p>
<ul>
<li>mesa: fix errors when reading depth with glReadPixels</li>
<li>i965: fix textureGrad for cubemaps</li>
</ul>
<p>Ulrich Weigand (1):</p>
<ul>
<li>mesa: Fix texture compression on big-endian systems</li>
</ul>
</div>
</body>
</html>

85
docs/relnotes/11.0.2.html Normal file
View File

@@ -0,0 +1,85 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.2 Release Notes / September 28, 2015</h1>
<p>
Mesa 11.0.2 is a bug fix release which fixes bugs found since the 11.0.1 release.
</p>
<p>
Mesa 11.0.2 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
45170773500d6ae2f9eb93fc85efee69f7c97084411ada4eddf92f78bca56d20 mesa-11.0.2.tar.gz
fce11fb27eb87adf1e620a76455d635c6136dfa49ae58c53b34ef8d0c7b7eae4 mesa-11.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91582">Bug 91582</a> - [bisected] Regression in DEQP gles2.functional.negative_api.texture.texsubimage2d_neg_offset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
</ul>
<h2>Changes</h2>
<p>Eduardo Lima Mitev (3):</p>
<ul>
<li>mesa: Fix order of format+type and internal format checks for glTexImageXD ops</li>
<li>mesa: Move _mesa_base_tex_format() from teximage to glformats files</li>
<li>mesa: Use the effective internal format instead for validation</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.1</li>
<li>Update version to 11.0.2</li>
</ul>
<p>Kristian Høgsberg Kristensen (1):</p>
<ul>
<li>i965: Respect stride and subreg_offset for ATTR registers</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>glsl: Expose gl_MaxTess{Control,Evaluation}AtomicCounters.</li>
</ul>
</div>
</body>
</html>

185
docs/relnotes/11.0.3.html Normal file
View File

@@ -0,0 +1,185 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.3 Release Notes / October 10, 2015</h1>
<p>
Mesa 11.0.3 is a bug fix release which fixes bugs found since the 11.0.2 release.
</p>
<p>
Mesa 11.0.3 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c2210e3daecc10ed9fdcea500327652ed6effc2f47c4b9cee63fb08f560d7117 mesa-11.0.3.tar.gz
ab2992eece21adc23c398720ef8c6933cb69ea42e1b2611dc09d031e17e033d6 mesa-11.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: Add abs input modifier to base for POW in ffvertex_prog</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.2</li>
<li>Revert "nouveau: make sure there's always room to emit a fence"</li>
<li>Update version to 11.0.3</li>
</ul>
<p>Francisco Jerez (1):</p>
<ul>
<li>i965/fs: Fix hang on IVB and VLV with image format mismatch.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>meta: Handle array textures in scaled MSAA blits</li>
</ul>
<p>Ilia Mirkin (6):</p>
<ul>
<li>nouveau: be more careful about freeing temporary transfer buffers</li>
<li>nouveau: delay deleting buffer with unflushed fence</li>
<li>nouveau: wait to unref the transfer's bo until it's no longer used</li>
<li>nv30: pretend to have packed texture/surface formats</li>
<li>nv30: always go through translate module on big-endian</li>
<li>nouveau: make sure there's always room to emit a fence</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks</li>
</ul>
<p>Kyle Brenneman (3):</p>
<ul>
<li>glx: Fix build errors with --enable-mangling (v2)</li>
<li>mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.</li>
<li>glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: fix vui time_scale zero error</li>
</ul>
<p>Marek Olšák (21):</p>
<ul>
<li>st/mesa: fix front buffer regression after dropping st_validate_state in Blit</li>
<li>radeonsi: handle index buffer alloc failures</li>
<li>radeonsi: handle constant buffer alloc failures</li>
<li>gallium/radeon: handle buffer_map staging buffer failures better</li>
<li>gallium/radeon: handle buffer alloc failures in r600_draw_rectangle</li>
<li>gallium/radeon: add a fail path for depth MSAA texture readback</li>
<li>radeonsi: report alloc failure from si_shader_binary_read</li>
<li>radeonsi: add malloc fail paths to si_create_shader_state</li>
<li>radeonsi: skip drawing if the tess factor ring allocation fails</li>
<li>radeonsi: skip drawing if GS ring allocations fail</li>
<li>radeonsi: handle shader precompile failures</li>
<li>radeonsi: handle fixed-func TCS shader create failure</li>
<li>radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload</li>
<li>radeonsi: skip drawing if PS fails to compile or upload</li>
<li>radeonsi: skip drawing if updating the scratch buffer fails</li>
<li>radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders</li>
<li>radeonsi: handle dummy constant buffer allocation failure</li>
<li>gallium/u_blitter: handle allocation failures</li>
<li>radeonsi: add scratch buffer to the buffer list when it's re-allocated</li>
<li>st/dri: don't use _ctx in client_wait_sync</li>
<li>egl/dri2: don't require a context for ClientWaitSync (v2)</li>
</ul>
<p>Matthew Waters (1):</p>
<ul>
<li>egl: rework handling EGL_CONTEXT_FLAGS</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>st/dri: Use packed RGB formats</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>mesa: fix mipmap generation for immutable, compressed textures</li>
</ul>
<p>Tom Stellard (3):</p>
<ul>
<li>gallium/radeon: Use call_once() when initailizing LLVM targets</li>
<li>gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2</li>
<li>radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2</li>
</ul>
<p>Varad Gautam (1):</p>
<ul>
<li>egl: restore surface type before linking config to its display</li>
</ul>
<p>Ville Syrjälä (3):</p>
<ul>
<li>i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)</li>
<li>i915: Fix texcoord vs. varying collision in fragment programs</li>
<li>i915: Remember to call intel_prepare_render() before blitting</li>
</ul>
</div>
</body>
</html>

168
docs/relnotes/11.0.4.html Normal file
View File

@@ -0,0 +1,168 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.4 Release Notes / October 24, 2015</h1>
<p>
Mesa 11.0.4 is a bug fix release which fixes bugs found since the 11.0.3 release.
</p>
<p>
Mesa 11.0.4 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
ed412ca6a46d1bd055120e5c12806c15419ae8c4dd6d3f6ea20a83091d5c78bf mesa-11.0.4.tar.gz
40201bf7fc6fa12a6d9edfe870b41eb4dd6669154e3c42c48a96f70805f5483d mesa-11.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (2):</p>
<ul>
<li>i965/vec4: check writemask when bailing out at register coalesce</li>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>vbo: fix incorrect switch statement in init_mat_currval()</li>
<li>mesa: fix incorrect opcode in save_BlendFunci()</li>
</ul>
<p>Chih-Wei Huang (3):</p>
<ul>
<li>mesa: android: Fix the incorrect path of sse_minmax.c</li>
<li>nv50/ir: use C++11 standard std::unordered_map if possible</li>
<li>nv30: include the header of ffs prototype</li>
</ul>
<p>Chris Wilson (1):</p>
<ul>
<li>i965: Remove early release of DRI2 miptree</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/uniforms: fix get_uniform for doubles (v2)</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.3</li>
</ul>
<p>Francisco Jerez (5):</p>
<ul>
<li>i965: Don't tell the hardware about our UAV access.</li>
<li>mesa: Expose function to calculate whether a shader image unit is valid.</li>
<li>mesa: Skip redundant texture completeness checking during image validation.</li>
<li>i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.</li>
<li>mesa: Get rid of texture-dependent image unit derived state.</li>
</ul>
<p>Ian Romanick (8):</p>
<ul>
<li>glsl: Allow built-in functions as constant expressions in OpenGL ES 1.00</li>
<li>ff_fragment_shader: Use binding to set the sampler unit</li>
<li>glsl/linker: Use constant_initializer instead of constant_value to initialize uniforms</li>
<li>glsl: Use constant_initializer instead of constant_value to determine whether to keep an unused uniform</li>
<li>glsl: Only set ir_variable::constant_value for const-decorated variables</li>
<li>glsl: Restrict initializers for global variables to constant expression in ES</li>
<li>glsl: Add method to determine whether an expression contains the sequence operator</li>
<li>glsl: In later GLSL versions, sequence operator is cannot be a constant expression</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nouveau: make sure there's always room to emit a fence</li>
</ul>
<p>Indrajit Das (1):</p>
<ul>
<li>st/va: Used correct parameter to derive the value of the "h" variable in vlVaCreateImage</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>configure.ac: ensure RM is set</li>
</ul>
<p>Krzysztof Sobiecki (1):</p>
<ul>
<li>st/fbo: use pipe_surface_release instead of pipe_surface_reference</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>st/omx/dec/h264: fix field picture type 0 poc disorder</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: fix clip state dependencies</li>
<li>radeonsi: fix a GS copy shader leak</li>
<li>gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>u_vbuf: fix vb slot assignment for translated buffers</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno/a3xx: cache-flush is needed after MEM_WRITE</li>
</ul>
<p>Tapani Pälli (3):</p>
<ul>
<li>mesa: add GL_UNSIGNED_INT_24_8 to _mesa_pack_depth_span</li>
<li>mesa: Set api prefix to version string when overriding version</li>
<li>mesa: fix ARRAY_SIZE query for GetProgramResourceiv</li>
</ul>
</div>
</body>
</html>

174
docs/relnotes/11.0.5.html Normal file
View File

@@ -0,0 +1,174 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.5 Release Notes / November 11, 2015</h1>
<p>
Mesa 11.0.5 is a bug fix release which fixes bugs found since the 11.0.4 release.
</p>
<p>
Mesa 11.0.5 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
8495ef5c06f7f726452462b7d408a5b40048373ff908f2283a3b4d1f49b45ee6 mesa-11.0.5.tar.gz
9c255a2a6695fcc6ef4a279e1df0aeaf417dc142f39ee59dfb533d80494bb67a mesa-11.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeon/uvd: don't expose HEVC on old UVD hw (v3)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl: Add GT4 PCI IDs</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.4</li>
<li>cherry-ignore: ignore a possible wrong nomination</li>
<li>Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move"</li>
<li>Update version to 11.0.5</li>
</ul>
<p>Emmanuel Gil Peyrot (1):</p>
<ul>
<li>gbm.h: Add a missing stddef.h include for size_t.</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: When the create ioctl fails, free our cache and try again.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>i965: Fix is-renderable check in intel_image_target_renderbuffer_storage</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: respect edgeflag attribute width</li>
<li>nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments</li>
<li>nouveau: relax fence emit space assert</li>
</ul>
<p>Ivan Kalvachev (1):</p>
<ul>
<li>r600g: Fix special negative immediate constants when using ABS modifier.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir/lower_vec_to_movs: Pass the shader around directly</li>
<li>nir: Report progress from lower_vec_to_movs().</li>
</ul>
<p>Jose Fonseca (2):</p>
<ul>
<li>gallivm: Translate all util_cpu_caps bits to LLVM attributes.</li>
<li>gallivm: Explicitly disable unsupported CPU features.</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>st/va: pass picture desc to begin and decode</li>
<li>nvc0: fix crash when nv50_miptree_from_handle fails</li>
<li>st/va: do not destroy old buffer when new one failed</li>
<li>st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer</li>
</ul>
<p>Kenneth Graunke (6):</p>
<ul>
<li>i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.</li>
<li>nir: Report progress from nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_opt_copy_prop().</li>
<li>nir: Properly invalidate metadata in nir_lower_vec_to_movs().</li>
<li>nir: Properly invalidate metadata in nir_opt_remove_phis().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: add register definitions for Stoney</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/glformats: Undo code changes from _mesa_base_tex_format() move</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>st/mesa: fix mipmap generation for immutable textures with incomplete pyramids</li>
</ul>
<p>Nigel Stewart (1):</p>
<ul>
<li>osmesa: Expose GL entry points for Windows build via DEF file.</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>gallivm: disable f16c when not using AVX</li>
</ul>
<p>Samuel Li (2):</p>
<ul>
<li>radeonsi: add support for Stoney asics (v3)</li>
<li>radeonsi: add Stoney pci ids</li>
</ul>
</div>
</body>
</html>

145
docs/relnotes/11.0.6.html Normal file
View File

@@ -0,0 +1,145 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.6 Release Notes / November 21, 2015</h1>
<p>
Mesa 11.0.6 is a bug fix release which fixes bugs found since the 11.0.5 release.
</p>
<p>
Mesa 11.0.6 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
4bdf054af66ebabf3eca0616f9f5e44c2f234695661b570261c391bc2f4f7482 mesa-11.0.6.tar.gz
8340e64cdc91999840404c211496f3de38e7b4cb38db34e2f72f1642c5134760 mesa-11.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: enable optimal raster config setting for fiji (v2)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl/gt4: Fix URB programming restriction.</li>
</ul>
<p>Boyuan Zhang (2):</p>
<ul>
<li>st/vaapi: fix vaapi VC-1 simple/main corruption v2</li>
<li>radeon/uvd: fix VC-1 simple/main profile decode v2</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>r600: initialised PGM_RESOURCES_2 for ES/GS</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.5</li>
<li>cherry-ignore: add the swrast front buffer support</li>
<li>automake: use static llvm for make distcheck</li>
<li>Update version to 11.0.6</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.</li>
<li>vc4: Return NULL when we can't make our shadow for a sampler view.</li>
<li>vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>meta/generate_mipmap: Don't leak the sampler object</li>
<li>meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>mesa/copyimage: allow width/height to not be multiples of block</li>
<li>nouveau: don't expose HEVC decoding support</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>glsl: Allow implicit int -&gt; uint conversions for the % operator.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: use simple coeffs calc for 128bit vectors</li>
</ul>
<p>Roland Scheidegger (2):</p>
<ul>
<li>radeon: fix bgrx8/xrgb8 blits</li>
<li>r200: fix bgrx8/xrgb8 blits</li>
</ul>
</div>
</body>
</html>

89
docs/relnotes/11.1.0.html Normal file
View File

@@ -0,0 +1,89 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.0 Release Notes / TBD</h1>
<p>
Mesa 11.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.1.1.
</p>
<p>
Mesa 11.1.0 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 3.1 support on freedreno (a3xx, a4xx)</li>
<li>OpenGL 3.3 support for VMware guest VM driver (supported by Workstation 12
and Fusion 8).
<li>GL_AMD_performance_monitor on nv50</li>
<li>GL_ARB_arrays_of_arrays on i965</li>
<li>GL_ARB_blend_func_extended on freedreno (a3xx)</li>
<li>GL_ARB_clear_texture on nv50, nvc0</li>
<li>GL_ARB_copy_image on nv50, nvc0, radeonsi</li>
<li>GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips</li>
<li>GL_ARB_gpu_shader5 on r600 for Evergreen and later chips</li>
<li>GL_ARB_shader_clock on i965 (gen7+)</li>
<li>GL_ARB_shader_stencil_export on i965 (gen9+)</li>
<li>GL_ARB_shader_storage_buffer_object on i965</li>
<li>GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, radeonsi</li>
<li>GL_ARB_texture_barrier / GL_NV_texture_barrier on i965</li>
<li>GL_ARB_texture_query_lod on softpipe</li>
<li>GL_ARB_texture_view on radeonsi and r600 (for evergeen and newer)</li>
<li>GL_ARB_vertex_type_2_10_10_10_rev on freedreno (a3xx, a4xx)</li>
<li>GL_EXT_blend_func_extended on all drivers that support the ARB version</li>
<li>GL_EXT_buffer_storage implemented for when ES 3.1 support is gained</li>
<li>GL_EXT_draw_elements_base_vertex on all drivers</li>
<li>GL_EXT_texture_compression_rgtc / latc on freedreno (a3xx & a4xx)</li>
<li>GL_KHR_debug (GLES)</li>
<li>GL_NV_conditional_render on freedreno</li>
<li>GL_OES_draw_elements_base_vertex on all drivers</li>
<li>EGL_KHR_create_context on softpipe, llvmpipe</li>
<li>EGL_KHR_gl_colorspace on softpipe, llvmpipe</li>
<li>new virgl gallium driver for qemu virtio-gpu</li>
<li>16x multisampling on i965 (gen9+)</li>
<li>GL_EXT_shader_samples_identical on i965.</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

View File

@@ -63,6 +63,20 @@ execution. These are generally used for debugging.
Example: export MESA_GLSL=dump,nopt
</p>
<p>
Shaders can be dumped and replaced on runtime for debugging purposes. Mesa
needs to be configured with '--with-sha1' to enable this functionality. This
feature is not currently supported by SCons build.
This is controlled via following environment variables:
<ul>
<li><b>MESA_SHADER_DUMP_PATH</b> - path where shader sources are dumped
<li><b>MESA_SHADER_READ_PATH</b> - path where replacement shaders are read
</ul>
Note, path set must exist before running for dumping or replacing to work.
When both are set, these paths should be different so the dumped shaders do
not clobber the replacement shaders.
</p>
<h2 id="support">GLSL Version</h2>

View File

@@ -0,0 +1,176 @@
Name
EXT_shader_samples_identical
Name Strings
GL_EXT_shader_samples_identical
Contact
Ian Romanick, Intel (ian.d.romanick 'at' intel.com)
Contributors
Chris Forbes, Mesa
Magnus Wendt, Intel
Neil S. Roberts, Intel
Graham Sellers, AMD
Status
XXX - Not complete yet.
Version
Last Modified Date: November 19, 2015
Revision: 6
Number
TBD
Dependencies
OpenGL 3.2, or OpenGL ES 3.1, or ARB_texture_multisample is required.
This extension is written against the OpenGL 4.5 (Core Profile)
Specification
Overview
Multisampled antialiasing has become a common method for improving the
quality of rendered images. Multisampling differs from supersampling in
that the color of a primitive that covers all or part of a pixel is
resolved once, regardless of the number of samples covered. If a large
polygon is rendered, the colors of all samples in each interior pixel will
be the same. This suggests a simple compression scheme that can reduce
the necessary memory bandwidth requirements. In one such scheme, each
sample is stored in a separate slice of the multisample surface. An
additional multisample control surface (MCS) contains a mapping from pixel
samples to slices.
If all the values stored in the MCS for a particular pixel are the same,
then all the samples have the same value. Applications can take advantage
of this information to reduce the bandwidth of reading multisample
textures. A custom multisample resolve filter could optimize resolving
pixels where every sample is identical by reading the color once.
color = texelFetch(sampler, coordinate, 0);
if (!textureSamplesIdenticalEXT(sampler, coordinate)) {
for (int i = 1; i < MAX_SAMPLES; i++) {
vec4 c = texelFetch(sampler, coordinate, i);
//... accumulate c into color
}
}
New Procedures and Functions
None.
New Tokens
None.
Additions to the OpenGL 4.5 (Core Profile) Specification
None.
Modifications to The OpenGL Shading Language Specification, Version 4.50.5
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_EXT_shader_samples_identical
A new preprocessor #define is added to the OpenGL Shading Language:
#define GL_EXT_shader_samples_identical
Add to the table in section 8.7 "Texture Lookup Functions"
Syntax:
bool textureSamplesIdenticalEXT(gsampler2DMS sampler, ivec2 coord)
bool textureSamplesIdenticalEXT(gsampler2DMSArray sampler,
ivec3 coord)
Description:
Returns true if it can be determined that all samples within the texel
of the multisample texture bound to <sampler> at <coord> contain the
same values or false if this cannot be determined."
Additions to the AGL/EGL/GLX/WGL Specifications
None
Errors
None
New State
None
New Implementation Dependent State
None
Issues
1) What should the new functions be called?
RESOLVED: textureSamplesIdenticalEXT. Initially
textureAllSamplesIdenticalEXT was considered, but
textureSamplesIdenticalEXT is more similar to the existing textureSamples
function.
2) It seems like applications could implement additional optimization if
they were provided with raw MCS data. Should this extension also
provide that data?
There are a number of challenges in providing raw MCS data. The biggest
problem being that the amount of MCS data depends on the number of
samples, and that is not known at compile time. Additionally, without new
texelFetch functions, applications would have difficulty utilizing the
information.
Another option is to have a function that returns an array of tuples of
sample number and count. This also has difficulties with the maximum
array size not being known at compile time.
RESOLVED: Do not expose raw MCS data in this extension.
3) Should this extension also extend SPIR-V?
RESOLVED: Yes, but this has not yet been written.
4) Is it possible for textureSamplesIdenticalEXT to report false negatives?
RESOLVED: Yes. It is possible that the underlying hardware may not detect
that separate writes of the same color to different samples of a pixel are
the same. The shader function is at the whim of the underlying hardware
implementation. It is also possible that a compressed multisample surface
is not used. In that case the function will likely always return false.
Revision History
Rev Date Author Changes
--- ---------- -------- ---------------------------------------------
1 2014/08/20 cforbes Initial version
2 2015/10/23 idr Change from MESA to EXT. Rebase on OpenGL 4.5,
and add dependency on OpenGL ES 3.1. Initial
draft of overview section and issues 1 through
3.
3 2015/10/27 idr Typo fixes.
4 2015/11/10 idr Rename extension from EXT_shader_multisample_compression
to EXT_shader_samples_identical.
Add issue #4.
5 2015/11/18 idr Fix some typos spotted by gsellers. Change the
name of the name of the function to
textureSamplesIdenticalEXT.
6 2015/11/19 idr Fix more typos spotted by Nicolai Hähnle.

View File

@@ -30,6 +30,10 @@
<dt><a href="http://www.valgrind.org">Valgrind</a></dt>
<dd>is a very useful tool for tracking down
memory-related problems in your code.</dd>
<dt><a href="http:scan.coverity.com/projects/mesa">Coverity</a><dt>
<dd>provides static code analysis of Mesa. If you create an account
you can see the results and try to fix outstanding issues.</dd>
</dl>
</div>

View File

@@ -26,6 +26,31 @@ VMware Workstation running on Linux or Windows and VMware Fusion running on
MacOS are all supported.
</p>
<p>
With the August 2015 Workstation 12 / Fusion 8 releases, OpenGL 3.3
is supported in the guest.
This requires:
<ul>
<li>The VM is configured for virtual hardware version 12.
<li>The host OS, GPU and graphics driver supports DX11 (Windows) or
OpenGL 4.0 (Linux, Mac)
<li>On Linux, the vmwgfx kernel module must be version 2.9.0 or later.
<li>A recent version of Mesa with the updated svga gallium driver.
</ul>
</p>
<p>
Otherwise, OpenGL 2.1 is supported.
</p>
<p>
OpenGL 3.3 support can be disabled by setting the environment variable
SVGA_VGPU10=0.
You will then have OpenGL 2.1 support.
This may be useful to work around application bugs (such as incorrect use
of the OpenGL 3.x core profile).
</p>
<p>
Most modern Linux distros include the SVGA3D driver so end users shouldn't
be concerned with this information.
@@ -123,10 +148,33 @@ To get the latest code from git:
<h2>Building the Code</h2>
<ul>
<li>Build libdrm: If you're on a 32-bit system, you should skip the --libdir configure option. Note also the comment about toolchain libdrm above.
<li>
Determine where the GL-related libraries reside on your system and set
the LIBDIR environment variable accordingly.
<br><br>
For 32-bit Ubuntu systems:
<pre>
export LIBDIR=/usr/lib/i386-linux-gnu
</pre>
For 64-bit Ubuntu systems:
<pre>
export LIBDIR=/usr/lib/x86_64-linux-gnu
</pre>
For 32-bit Fedora systems:
<pre>
export LIBDIR=/usr/lib
</pre>
For 64-bit Fedora systems:
<pre>
export LIBDIR=/usr/lib64
</pre>
</li>
<li>Build libdrm:
<pre>
cd $TOP/drm
./autogen.sh --prefix=/usr --libdir=/usr/lib64
./autogen.sh --prefix=/usr --libdir=${LIBDIR}
make
sudo make install
</pre>
@@ -137,12 +185,9 @@ The libxatracker library is used exclusively by the X server to do render,
copy and video acceleration:
<br>
The following configure options doesn't build the EGL system.
<br>
As before, if you're on a 32-bit system, you should skip the --libdir
configure option.
<pre>
cd $TOP/mesa
./autogen.sh --prefix=/usr --libdir=/usr/lib64 --with-gallium-drivers=svga --with-dri-drivers= --enable-xa --disable-dri3
./autogen.sh --prefix=/usr --libdir=${LIBDIR} --with-gallium-drivers=svga --with-dri-drivers=swrast --enable-xa --disable-dri3 --enable-glx-tls
make
sudo make install
</pre>
@@ -152,25 +197,39 @@ if they're not installed in your system. You should be told what's missing.
<br>
<br>
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with building and replacing the current Xorg driver. First check if your system is 32- or 64-bit. If you're building for a 32-bit system, you will not be needing the --libdir=/usr/lib64 option to autogen.
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with
building and replacing the current Xorg driver.
First check if your system is 32- or 64-bit.
<pre>
cd $TOP/xf86-video-vmware
./autogen.sh --prefix=/usr --libdir=/usr/lib64
./autogen.sh --prefix=/usr --libdir=${LIBDIR}
make
sudo make install
</pre>
<li>vmwgfx kernel module. First make sure that any old version of this kernel module is removed from the system by issuing
<pre>
<pre>
sudo rm /lib/modules/`uname -r`/kernel/drivers/gpu/drm/vmwgfx.ko*
</pre>
Then
<pre>
</pre>
Build and install:
<pre>
cd $TOP/vmwgfx
make
sudo make install
sudo cp 00-vmwgfx.rules /etc/udev/rules.d
sudo depmod -ae
</pre>
sudo depmod -a
</pre>
If you're using a Ubuntu OS:
<pre>
sudo update-initramfs -u
</pre>
If you're using a Fedora OS:
<pre>
sudo dracut --force
</pre>
Add 'vmwgfx' to the /etc/modules file:
<pre>
echo vmwgfx | sudo tee -a /etc/modules
</pre>
Note: some distros put DRM kernel drivers in different directories.
For example, sometimes vmwgfx.ko might be found in
@@ -227,6 +286,16 @@ If you don't see this, try setting this environment variable:
then rerun glxinfo and examine the output for error messages.
</p>
<p>
If OpenGL 3.3 is not working (you only get OpenGL 2.1):
</p>
<ul>
<li>Make sure the VM uses hardware version 12.
<li>Make sure the vmwgfx kernel module is version 2.9.0 or later.
<li>Check the vmware.log file for errors.
<li>Run 'dmesg | grep vmwgfx' and look for "DX: yes".
</div>
</body>
</html>

View File

@@ -495,7 +495,7 @@ struct __DRIdamageExtensionRec {
* SWRast Loader extension.
*/
#define __DRI_SWRAST_LOADER "DRI_SWRastLoader"
#define __DRI_SWRAST_LOADER_VERSION 2
#define __DRI_SWRAST_LOADER_VERSION 3
struct __DRIswrastLoaderExtensionRec {
__DRIextension base;
@@ -528,6 +528,15 @@ struct __DRIswrastLoaderExtensionRec {
void (*putImage2)(__DRIdrawable *drawable, int op,
int x, int y, int width, int height, int stride,
char *data, void *loaderPrivate);
/**
* Put image to drawable
*
* \since 3
*/
void (*getImage2)(__DRIdrawable *readable,
int x, int y, int width, int height, int stride,
char *data, void *loaderPrivate);
};
/**

View File

@@ -102,9 +102,8 @@ call_once(once_flag *flag, void (*func)(void))
static inline int
cnd_broadcast(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_broadcast(cond);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_broadcast(cond) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.2
@@ -119,18 +118,16 @@ cnd_destroy(cnd_t *cond)
static inline int
cnd_init(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_init(cond, NULL);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_init(cond, NULL) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.4
static inline int
cnd_signal(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_signal(cond);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_signal(cond) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.5
@@ -139,7 +136,14 @@ cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt)
{
struct timespec abs_time;
int rt;
if (!cond || !mtx || !xt) return thrd_error;
assert(mtx != NULL);
assert(cond != NULL);
assert(xt != NULL);
abs_time.tv_sec = xt->sec;
abs_time.tv_nsec = xt->nsec;
rt = pthread_cond_timedwait(cond, mtx, &abs_time);
if (rt == ETIMEDOUT)
return thrd_busy;
@@ -150,9 +154,9 @@ cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt)
static inline int
cnd_wait(cnd_t *cond, mtx_t *mtx)
{
if (!cond || !mtx) return thrd_error;
pthread_cond_wait(cond, mtx);
return thrd_success;
assert(mtx != NULL);
assert(cond != NULL);
return (pthread_cond_wait(cond, mtx) == 0) ? thrd_success : thrd_error;
}
@@ -161,7 +165,7 @@ cnd_wait(cnd_t *cond, mtx_t *mtx)
static inline void
mtx_destroy(mtx_t *mtx)
{
assert(mtx);
assert(mtx != NULL);
pthread_mutex_destroy(mtx);
}
@@ -170,7 +174,7 @@ static inline int
mtx_init(mtx_t *mtx, int type)
{
pthread_mutexattr_t attr;
if (!mtx) return thrd_error;
assert(mtx != NULL);
if (type != mtx_plain && type != mtx_timed && type != mtx_try
&& type != (mtx_plain|mtx_recursive)
&& type != (mtx_timed|mtx_recursive)
@@ -188,9 +192,8 @@ mtx_init(mtx_t *mtx, int type)
static inline int
mtx_lock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
pthread_mutex_lock(mtx);
return thrd_success;
assert(mtx != NULL);
return (pthread_mutex_lock(mtx) == 0) ? thrd_success : thrd_error;
}
static inline int
@@ -203,7 +206,9 @@ thrd_yield(void);
static inline int
mtx_timedlock(mtx_t *mtx, const xtime *xt)
{
if (!mtx || !xt) return thrd_error;
assert(mtx != NULL);
assert(xt != NULL);
{
#ifdef EMULATED_THREADS_USE_NATIVE_TIMEDLOCK
struct timespec ts;
@@ -233,7 +238,7 @@ mtx_timedlock(mtx_t *mtx, const xtime *xt)
static inline int
mtx_trylock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
assert(mtx != NULL);
return (pthread_mutex_trylock(mtx) == 0) ? thrd_success : thrd_busy;
}
@@ -241,9 +246,8 @@ mtx_trylock(mtx_t *mtx)
static inline int
mtx_unlock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
pthread_mutex_unlock(mtx);
return thrd_success;
assert(mtx != NULL);
return (pthread_mutex_unlock(mtx) == 0) ? thrd_success : thrd_error;
}
@@ -253,7 +257,7 @@ static inline int
thrd_create(thrd_t *thr, thrd_start_t func, void *arg)
{
struct impl_thrd_param *pack;
if (!thr) return thrd_error;
assert(thr != NULL);
pack = (struct impl_thrd_param *)malloc(sizeof(struct impl_thrd_param));
if (!pack) return thrd_nomem;
pack->func = func;
@@ -329,7 +333,7 @@ thrd_yield(void)
static inline int
tss_create(tss_t *key, tss_dtor_t dtor)
{
if (!key) return thrd_error;
assert(key != NULL);
return (pthread_key_create(key, dtor) == 0) ? thrd_success : thrd_error;
}

View File

@@ -109,21 +109,29 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 (Broadwell GT3e)")
CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x1902, skl_gt1, "Intel(R) Skylake DT GT1")
CHIPSET(0x1906, skl_gt1, "Intel(R) Skylake ULT GT1")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake SRV GT1")
CHIPSET(0x190B, skl_gt1, "Intel(R) Skylake Halo GT1")
CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake ULX GT1")
CHIPSET(0x1912, skl_gt2, "Intel(R) Skylake DT GT2")
CHIPSET(0x1916, skl_gt2, "Intel(R) Skylake ULT GT2")
CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake SRV GT2")
CHIPSET(0x191B, skl_gt2, "Intel(R) Skylake Halo GT2")
CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x1912, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x1913, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x1915, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x1916, skl_gt2, "Intel(R) HD Graphics 520 (Skylake GT2)")
CHIPSET(0x1917, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x191B, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x191D, skl_gt2, "Intel(R) HD Graphics P530 (Skylake GT2)")
CHIPSET(0x191E, skl_gt2, "Intel(R) HD Graphics 515 (Skylake GT2)")
CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x1923, skl_gt3, "Intel(R) Iris Graphics 540 (Skylake GT3e)")
CHIPSET(0x1926, skl_gt3, "Intel(R) HD Graphics 535 (Skylake GT3)")
CHIPSET(0x1927, skl_gt3, "Intel(R) Iris Graphics 550 (Skylake GT3e)")
CHIPSET(0x192A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x192B, skl_gt3, "Intel(R) Iris Graphics (Skylake GT3fe)")
CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")

View File

@@ -181,3 +181,5 @@ CHIPSET(0x9876, CARRIZO_, CARRIZO)
CHIPSET(0x9877, CARRIZO_, CARRIZO)
CHIPSET(0x7300, FIJI_, FIJI)
CHIPSET(0x98E4, STONEY_, STONEY)

View File

@@ -47,12 +47,21 @@ libEGL_la_LDFLAGS = \
$(LD_NO_UNDEFINED)
dri2_backend_FILES =
dri3_backend_FILES =
if HAVE_EGL_PLATFORM_X11
AM_CFLAGS += -DHAVE_X11_PLATFORM
AM_CFLAGS += $(XCB_DRI2_CFLAGS)
libEGL_la_LIBADD += $(XCB_DRI2_LIBS)
dri2_backend_FILES += drivers/dri2/platform_x11.c
if HAVE_DRI3
dri3_backend_FILES += \
drivers/dri2/platform_x11_dri3.c \
drivers/dri2/platform_x11_dri3.h
libEGL_la_LIBADD += $(top_builddir)/src/loader/libloader_dri3_helper.la
endif
endif
if HAVE_EGL_PLATFORM_WAYLAND
@@ -88,7 +97,8 @@ AM_CFLAGS += \
libEGL_la_SOURCES += \
$(dri2_backend_core_FILES) \
$(dri2_backend_FILES)
$(dri2_backend_FILES) \
$(dri3_backend_FILES)
libEGL_la_LIBADD += $(top_builddir)/src/loader/libloader.la
libEGL_la_LIBADD += $(DLOPEN_LIBS) $(LIBDRM_LIBS)
@@ -111,7 +121,10 @@ egl_HEADERS = \
$(top_srcdir)/include/EGL/eglmesaext.h \
$(top_srcdir)/include/EGL/eglplatform.h
TESTS = egl-symbols-check
EXTRA_DIST = \
egl-symbols-check \
SConscript \
drivers/haiku \
docs \

View File

@@ -8,6 +8,7 @@ env = env.Clone()
env.Append(CPPPATH = [
'#/include',
'#/include/HaikuGL',
'#/src/egl/main',
'#/src',
])
@@ -15,7 +16,6 @@ env.Append(CPPPATH = [
# parse Makefile.sources
egl_sources = env.ParseSourceList('Makefile.sources', 'LIBEGL_C_FILES')
egl_sources.append(env.ParseSourceList('Makefile.sources', 'dri2_backend_core_FILES'))
env.Append(CPPDEFINES = [
'_EGL_NATIVE_PLATFORM=_EGL_PLATFORM_HAIKU',

View File

@@ -27,6 +27,7 @@
#define WL_HIDE_DEPRECATED
#include <stdbool.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
@@ -130,12 +131,10 @@ const __DRIconfig *
dri2_get_dri_config(struct dri2_egl_config *conf, EGLint surface_type,
EGLenum colorspace)
{
if (colorspace == EGL_GL_COLORSPACE_SRGB_KHR)
return surface_type == EGL_WINDOW_BIT ? conf->dri_srgb_double_config :
conf->dri_srgb_single_config;
else
return surface_type == EGL_WINDOW_BIT ? conf->dri_double_config :
conf->dri_single_config;
const bool srgb = colorspace == EGL_GL_COLORSPACE_SRGB_KHR;
return surface_type == EGL_WINDOW_BIT ? conf->dri_double_config[srgb] :
conf->dri_single_config[srgb];
}
static EGLBoolean
@@ -283,14 +282,10 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id,
if (num_configs == 1) {
conf = (struct dri2_egl_config *) matching_config;
if (double_buffer && srgb && !conf->dri_srgb_double_config)
conf->dri_srgb_double_config = dri_config;
else if (double_buffer && !srgb && !conf->dri_double_config)
conf->dri_double_config = dri_config;
else if (!double_buffer && srgb && !conf->dri_srgb_single_config)
conf->dri_srgb_single_config = dri_config;
else if (!double_buffer && !srgb && !conf->dri_single_config)
conf->dri_single_config = dri_config;
if (double_buffer && !conf->dri_double_config[srgb])
conf->dri_double_config[srgb] = dri_config;
else if (!double_buffer && !conf->dri_single_config[srgb])
conf->dri_single_config[srgb] = dri_config;
else
/* a similar config type is already added (unlikely) => discard */
return NULL;
@@ -300,18 +295,13 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id,
if (conf == NULL)
return NULL;
if (double_buffer)
conf->dri_double_config[srgb] = dri_config;
else
conf->dri_single_config[srgb] = dri_config;
memcpy(&conf->base, &base, sizeof base);
if (double_buffer) {
if (srgb)
conf->dri_srgb_double_config = dri_config;
else
conf->dri_double_config = dri_config;
} else {
if (srgb)
conf->dri_srgb_single_config = dri_config;
else
conf->dri_single_config = dri_config;
}
conf->base.SurfaceType = 0;
conf->base.ConfigID = config_id;
_eglLinkConfig(&conf->base);
@@ -362,6 +352,12 @@ struct dri2_extension_match {
int offset;
};
static struct dri2_extension_match dri3_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_IMAGE_DRIVER, 1, offsetof(struct dri2_egl_display, image_driver) },
{ NULL, 0, 0 }
};
static struct dri2_extension_match dri2_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_DRI2, 2, offsetof(struct dri2_egl_display, dri2) },
@@ -395,13 +391,13 @@ dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
void *field;
for (i = 0; extensions[i]; i++) {
_eglLog(_EGL_DEBUG, "DRI2: found extension `%s'", extensions[i]->name);
_eglLog(_EGL_DEBUG, "found extension `%s'", extensions[i]->name);
for (j = 0; matches[j].name; j++) {
if (strcmp(extensions[i]->name, matches[j].name) == 0 &&
extensions[i]->version >= matches[j].version) {
field = ((char *) dri2_dpy + matches[j].offset);
*(const __DRIextension **) field = extensions[i];
_eglLog(_EGL_INFO, "DRI2: found extension %s version %d",
_eglLog(_EGL_INFO, "found extension %s version %d",
extensions[i]->name, extensions[i]->version);
}
}
@@ -410,7 +406,7 @@ dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
for (j = 0; matches[j].name; j++) {
field = ((char *) dri2_dpy + matches[j].offset);
if (*(const __DRIextension **) field == NULL) {
_eglLog(_EGL_WARNING, "DRI2: did not find extension %s version %d",
_eglLog(_EGL_WARNING, "did not find extension %s version %d",
matches[j].name, matches[j].version);
ret = EGL_FALSE;
}
@@ -503,6 +499,25 @@ dri2_open_driver(_EGLDisplay *disp)
return extensions;
}
EGLBoolean
dri2_load_driver_dri3(_EGLDisplay *disp)
{
struct dri2_egl_display *dri2_dpy = disp->DriverData;
const __DRIextension **extensions;
extensions = dri2_open_driver(disp);
if (!extensions)
return EGL_FALSE;
if (!dri2_bind_extensions(dri2_dpy, dri3_driver_extensions, extensions)) {
dlclose(dri2_dpy->driver);
return EGL_FALSE;
}
dri2_dpy->driver_extensions = extensions;
return EGL_TRUE;
}
EGLBoolean
dri2_load_driver(_EGLDisplay *disp)
{
@@ -560,7 +575,9 @@ dri2_setup_screen(_EGLDisplay *disp)
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
unsigned int api_mask;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
api_mask = dri2_dpy->image_driver->getAPIMask(dri2_dpy->dri_screen);
} else if (dri2_dpy->dri2) {
api_mask = dri2_dpy->dri2->getAPIMask(dri2_dpy->dri_screen);
} else {
assert(dri2_dpy->swrast);
@@ -580,7 +597,7 @@ dri2_setup_screen(_EGLDisplay *disp)
if (api_mask & (1 << __DRI_API_GLES3))
disp->ClientAPIs |= EGL_OPENGL_ES3_BIT_KHR;
assert(dri2_dpy->dri2 || dri2_dpy->swrast);
assert(dri2_dpy->image_driver || dri2_dpy->dri2 || dri2_dpy->swrast);
disp->Extensions.KHR_surfaceless_context = EGL_TRUE;
disp->Extensions.MESA_configless_context = EGL_TRUE;
@@ -588,7 +605,9 @@ dri2_setup_screen(_EGLDisplay *disp)
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB))
disp->Extensions.KHR_gl_colorspace = EGL_TRUE;
if (dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) {
if (dri2_dpy->image_driver ||
(dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) ||
(dri2_dpy->swrast && dri2_dpy->swrast->base.version >= 3)) {
disp->Extensions.KHR_create_context = EGL_TRUE;
if (dri2_dpy->robustness)
@@ -650,7 +669,14 @@ dri2_create_screen(_EGLDisplay *disp)
dri2_dpy = disp->DriverData;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
dri2_dpy->dri_screen =
dri2_dpy->image_driver->createNewScreen2(0, dri2_dpy->fd,
dri2_dpy->extensions,
dri2_dpy->driver_extensions,
&dri2_dpy->driver_configs,
disp);
} else if (dri2_dpy->dri2) {
if (dri2_dpy->dri2->base.version >= 4) {
dri2_dpy->dri_screen =
dri2_dpy->dri2->createNewScreen2(0, dri2_dpy->fd,
@@ -686,7 +712,7 @@ dri2_create_screen(_EGLDisplay *disp)
extensions = dri2_dpy->core->getExtensions(dri2_dpy->dri_screen);
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver || dri2_dpy->dri2) {
if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions))
goto cleanup_dri_screen;
} else {
@@ -784,7 +810,7 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp)
if (dri2_dpy->own_dri_screen)
dri2_dpy->core->destroyScreen(dri2_dpy->dri_screen);
if (dri2_dpy->fd)
if (dri2_dpy->fd >= 0)
close(dri2_dpy->fd);
if (dri2_dpy->driver)
dlclose(dri2_dpy->driver);
@@ -902,6 +928,55 @@ dri2_create_context_attribs_error(int dri_error)
_eglError(egl_error, "dri2_create_context");
}
static bool
dri2_fill_context_attribs(struct dri2_egl_context *dri2_ctx,
struct dri2_egl_display *dri2_dpy,
uint32_t *ctx_attribs,
unsigned *num_attribs)
{
int pos = 0;
assert(*num_attribs >= 8);
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MAJOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMajorVersion;
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMinorVersion;
if (dri2_ctx->base.Flags != 0) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode. Instead, generate the required EGL error here.
*/
if ((dri2_ctx->base.Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) != 0
&& !dri2_dpy->robustness) {
_eglError(EGL_BAD_MATCH, "eglCreateContext");
return false;
}
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_FLAGS;
ctx_attribs[pos++] = dri2_ctx->base.Flags;
}
if (dri2_ctx->base.ResetNotificationStrategy != EGL_NO_RESET_NOTIFICATION_KHR) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it a reset strategy. It may
* explode. Instead, generate the required EGL error here.
*/
if (!dri2_dpy->robustness) {
_eglError(EGL_BAD_CONFIG, "eglCreateContext");
return false;
}
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_RESET_STRATEGY;
ctx_attribs[pos++] = __DRI_CTX_RESET_LOSE_CONTEXT;
}
*num_attribs = pos;
return true;
}
/**
* Called via eglCreateContext(), drv->API.CreateContext().
*/
@@ -970,10 +1045,10 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
* doubleBufferMode check in
* src/mesa/main/context.c:check_compatible()
*/
if (dri2_config->dri_double_config)
dri_config = dri2_config->dri_double_config;
if (dri2_config->dri_double_config[0])
dri_config = dri2_config->dri_double_config[0];
else
dri_config = dri2_config->dri_single_config;
dri_config = dri2_config->dri_single_config[0];
/* EGL_WINDOW_BIT is set only when there is a dri_double_config. This
* makes sure the back buffer will always be used.
@@ -984,47 +1059,34 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
else
dri_config = NULL;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
unsigned error;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->image_driver->createContextAttribs(dri2_dpy->dri_screen,
api,
dri_config,
shared,
num_attribs / 2,
ctx_attribs,
& error,
dri2_ctx);
dri2_create_context_attribs_error(error);
} else if (dri2_dpy->dri2) {
if (dri2_dpy->dri2->base.version >= 3) {
unsigned error;
unsigned num_attribs = 0;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_MAJOR_VERSION;
ctx_attribs[num_attribs++] = dri2_ctx->base.ClientMajorVersion;
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[num_attribs++] = dri2_ctx->base.ClientMinorVersion;
if (dri2_ctx->base.Flags != 0) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode. Instead, generate the required EGL error here.
*/
if ((dri2_ctx->base.Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) != 0
&& !dri2_dpy->robustness) {
_eglError(EGL_BAD_MATCH, "eglCreateContext");
goto cleanup;
}
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_FLAGS;
ctx_attribs[num_attribs++] = dri2_ctx->base.Flags;
}
if (dri2_ctx->base.ResetNotificationStrategy != EGL_NO_RESET_NOTIFICATION_KHR) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it a reset strategy. It may
* explode. Instead, generate the required EGL error here.
*/
if (!dri2_dpy->robustness) {
_eglError(EGL_BAD_CONFIG, "eglCreateContext");
goto cleanup;
}
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_RESET_STRATEGY;
ctx_attribs[num_attribs++] = __DRI_CTX_RESET_LOSE_CONTEXT;
}
assert(num_attribs <= ARRAY_SIZE(ctx_attribs));
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->dri2->createContextAttribs(dri2_dpy->dri_screen,
@@ -1046,12 +1108,33 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
}
} else {
assert(dri2_dpy->swrast);
dri2_ctx->dri_context =
dri2_dpy->swrast->createNewContextForAPI(dri2_dpy->dri_screen,
api,
dri_config,
shared,
dri2_ctx);
if (dri2_dpy->swrast->base.version >= 3) {
unsigned error;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->swrast->createContextAttribs(dri2_dpy->dri_screen,
api,
dri_config,
shared,
num_attribs / 2,
ctx_attribs,
& error,
dri2_ctx);
dri2_create_context_attribs_error(error);
} else {
dri2_ctx->dri_context =
dri2_dpy->swrast->createNewContextForAPI(dri2_dpy->dri_screen,
api,
dri_config,
shared,
dri2_ctx);
}
}
if (!dri2_ctx->dri_context)
@@ -1090,11 +1173,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
{
struct dri2_egl_driver *dri2_drv = dri2_egl_driver(drv);
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_dsurf = dri2_egl_surface(dsurf);
struct dri2_egl_surface *dri2_rsurf = dri2_egl_surface(rsurf);
struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
_EGLContext *old_ctx;
_EGLSurface *old_dsurf, *old_rsurf;
_EGLSurface *tmp_dsurf, *tmp_rsurf;
__DRIdrawable *ddraw, *rdraw;
__DRIcontext *cctx;
@@ -1106,8 +1188,8 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
if (old_ctx && dri2_drv->glFlush)
dri2_drv->glFlush();
ddraw = (dri2_dsurf) ? dri2_dsurf->dri_drawable : NULL;
rdraw = (dri2_rsurf) ? dri2_rsurf->dri_drawable : NULL;
ddraw = (dsurf) ? dri2_dpy->vtbl->get_dri_drawable(dsurf) : NULL;
rdraw = (rsurf) ? dri2_dpy->vtbl->get_dri_drawable(rsurf) : NULL;
cctx = (dri2_ctx) ? dri2_ctx->dri_context : NULL;
if (old_ctx) {
@@ -1127,10 +1209,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
return EGL_TRUE;
} else {
/* undo the previous _eglBindContext */
_eglBindContext(old_ctx, old_dsurf, old_rsurf, &ctx, &dsurf, &rsurf);
_eglBindContext(old_ctx, old_dsurf, old_rsurf, &ctx, &tmp_dsurf, &tmp_rsurf);
assert(&dri2_ctx->base == ctx &&
&dri2_dsurf->base == dsurf &&
&dri2_rsurf->base == rsurf);
tmp_dsurf == dsurf &&
tmp_rsurf == rsurf);
_eglPutSurface(dsurf);
_eglPutSurface(rsurf);
@@ -1144,6 +1226,14 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
}
}
__DRIdrawable *
dri2_surface_get_dri_drawable(_EGLSurface *surf)
{
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
return dri2_surf->dri_drawable;
}
/*
* Called from eglGetProcAddress() via drv->API.GetProcAddress().
*/
@@ -1206,7 +1296,7 @@ void
dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(draw);
if (dri2_dpy->flush) {
if (dri2_dpy->flush->base.version >= 4) {
@@ -1224,12 +1314,12 @@ dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw)
* after calling eglSwapBuffers."
*/
dri2_dpy->flush->flush_with_flags(dri2_ctx->dri_context,
dri2_surf->dri_drawable,
dri_drawable,
__DRI2_FLUSH_DRAWABLE |
__DRI2_FLUSH_INVALIDATE_ANCILLARY,
__DRI2_THROTTLE_SWAPBUFFER);
} else {
dri2_dpy->flush->flush(dri2_surf->dri_drawable);
dri2_dpy->flush->flush(dri_drawable);
}
}
}
@@ -1286,7 +1376,8 @@ static EGLBoolean
dri2_wait_client(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(ctx->DrawSurface);
_EGLSurface *surf = ctx->DrawSurface;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
(void) drv;
@@ -1294,7 +1385,7 @@ dri2_wait_client(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
* we need to copy fake to real here.*/
if (dri2_dpy->flush != NULL)
dri2_dpy->flush->flush(dri2_surf->dri_drawable);
dri2_dpy->flush->flush(dri_drawable);
return EGL_TRUE;
}
@@ -1317,10 +1408,10 @@ dri2_bind_tex_image(_EGLDriver *drv,
_EGLDisplay *disp, _EGLSurface *surf, EGLint buffer)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
struct dri2_egl_context *dri2_ctx;
_EGLContext *ctx;
GLint format, target;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
ctx = _eglGetCurrentContext();
dri2_ctx = dri2_egl_context(ctx);
@@ -1328,7 +1419,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
if (!_eglBindTexImage(drv, disp, surf, buffer))
return EGL_FALSE;
switch (dri2_surf->base.TextureFormat) {
switch (surf->TextureFormat) {
case EGL_TEXTURE_RGB:
format = __DRI_TEXTURE_FORMAT_RGB;
break;
@@ -1340,7 +1431,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
format = __DRI_TEXTURE_FORMAT_RGBA;
}
switch (dri2_surf->base.TextureTarget) {
switch (surf->TextureTarget) {
case EGL_TEXTURE_2D:
target = GL_TEXTURE_2D;
break;
@@ -1351,7 +1442,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
(*dri2_dpy->tex_buffer->setTexBuffer2)(dri2_ctx->dri_context,
target, format,
dri2_surf->dri_drawable);
dri_drawable);
return EGL_TRUE;
}
@@ -1361,10 +1452,10 @@ dri2_release_tex_image(_EGLDriver *drv,
_EGLDisplay *disp, _EGLSurface *surf, EGLint buffer)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
struct dri2_egl_context *dri2_ctx;
_EGLContext *ctx;
GLint target;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
ctx = _eglGetCurrentContext();
dri2_ctx = dri2_egl_context(ctx);
@@ -1372,7 +1463,7 @@ dri2_release_tex_image(_EGLDriver *drv,
if (!_eglReleaseTexImage(drv, disp, surf, buffer))
return EGL_FALSE;
switch (dri2_surf->base.TextureTarget) {
switch (surf->TextureTarget) {
case EGL_TEXTURE_2D:
target = GL_TEXTURE_2D;
break;
@@ -1384,7 +1475,7 @@ dri2_release_tex_image(_EGLDriver *drv,
dri2_dpy->tex_buffer->releaseTexBuffer != NULL) {
(*dri2_dpy->tex_buffer->releaseTexBuffer)(dri2_ctx->dri_context,
target,
dri2_surf->dri_drawable);
dri_drawable);
}
return EGL_TRUE;
@@ -2384,13 +2475,18 @@ dri2_client_wait_sync(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSync *sync,
unsigned wait_flags = 0;
EGLint ret = EGL_CONDITION_SATISFIED_KHR;
if (flags & EGL_SYNC_FLUSH_COMMANDS_BIT_KHR)
/* The EGL_KHR_fence_sync spec states:
*
* "If no context is current for the bound API,
* the EGL_SYNC_FLUSH_COMMANDS_BIT_KHR bit is ignored.
*/
if (dri2_ctx && flags & EGL_SYNC_FLUSH_COMMANDS_BIT_KHR)
wait_flags |= __DRI2_FENCE_FLAG_FLUSH_COMMANDS;
/* the sync object should take a reference while waiting */
dri2_egl_ref_sync(dri2_sync);
if (dri2_dpy->fence->client_wait_sync(dri2_ctx->dri_context,
if (dri2_dpy->fence->client_wait_sync(dri2_ctx ? dri2_ctx->dri_context : NULL,
dri2_sync->fence, wait_flags,
timeout))
dri2_sync->base.SyncStatus = EGL_SIGNALED_KHR;

View File

@@ -35,6 +35,10 @@
#include <xcb/dri2.h>
#include <xcb/xfixes.h>
#include <X11/Xlib-xcb.h>
#ifdef HAVE_DRI3
#include "loader_dri3_helper.h"
#endif
#endif
#ifdef HAVE_WAYLAND_PLATFORM
@@ -145,6 +149,8 @@ struct dri2_egl_display_vtbl {
EGLBoolean (*get_sync_values)(_EGLDisplay *display, _EGLSurface *surface,
EGLuint64KHR *ust, EGLuint64KHR *msc,
EGLuint64KHR *sbc);
__DRIdrawable *(*get_dri_drawable)(_EGLSurface *surf);
};
struct dri2_egl_display
@@ -158,6 +164,7 @@ struct dri2_egl_display
const __DRIconfig **driver_configs;
void *driver;
const __DRIcoreExtension *core;
const __DRIimageDriverExtension *image_driver;
const __DRIdri2Extension *dri2;
const __DRIswrastExtension *swrast;
const __DRI2flushExtension *flush;
@@ -190,6 +197,9 @@ struct dri2_egl_display
#ifdef HAVE_X11_PLATFORM
xcb_connection_t *conn;
int screen;
#ifdef HAVE_DRI3
struct loader_dri3_extensions loader_dri3_ext;
#endif
#endif
#ifdef HAVE_WAYLAND_PLATFORM
@@ -203,8 +213,9 @@ struct dri2_egl_display
int formats;
uint32_t capabilities;
int is_render_node;
int is_different_gpu;
#endif
int is_different_gpu;
};
struct dri2_egl_context
@@ -284,10 +295,8 @@ struct dri2_egl_surface
struct dri2_egl_config
{
_EGLConfig base;
const __DRIconfig *dri_single_config;
const __DRIconfig *dri_double_config;
const __DRIconfig *dri_srgb_single_config;
const __DRIconfig *dri_srgb_double_config;
const __DRIconfig *dri_single_config[2];
const __DRIconfig *dri_double_config[2];
};
struct dri2_egl_image
@@ -326,9 +335,15 @@ dri2_setup_screen(_EGLDisplay *disp);
EGLBoolean
dri2_load_driver_swrast(_EGLDisplay *disp);
EGLBoolean
dri2_load_driver_dri3(_EGLDisplay *disp);
EGLBoolean
dri2_create_screen(_EGLDisplay *disp);
__DRIdrawable *
dri2_surface_get_dri_drawable(_EGLSurface *surf);
__DRIimage *
dri2_lookup_egl_image(__DRIscreen *screen, void *image, void *data);

View File

@@ -650,6 +650,7 @@ static struct dri2_egl_display_vtbl droid_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
EGLBoolean

View File

@@ -101,6 +101,7 @@ dri2_drm_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
struct dri2_egl_surface *dri2_surf;
struct gbm_surface *window = native_window;
struct gbm_dri_surface *surf;
const __DRIconfig *config;
(void) drv;
@@ -130,21 +131,20 @@ dri2_drm_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
goto cleanup_surf;
}
if (dri2_dpy->dri2) {
const __DRIconfig *config =
dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
config = dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
if (dri2_dpy->dri2) {
dri2_surf->dri_drawable =
(*dri2_dpy->dri2->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf->gbm_surf);
} else {
assert(dri2_dpy->swrast != NULL);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf->gbm_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf->gbm_surf);
}
if (dri2_surf->dri_drawable == NULL) {
@@ -594,6 +594,7 @@ static struct dri2_egl_display_vtbl dri2_drm_display_vtbl = {
.query_buffer_age = dri2_drm_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
EGLBoolean
@@ -623,27 +624,19 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
dri2_dpy->own_device = 1;
gbm = gbm_create_device(fd);
if (gbm == NULL)
return EGL_FALSE;
goto cleanup;
} else {
fd = fcntl(gbm_device_get_fd(gbm), F_DUPFD_CLOEXEC, 3);
if (fd < 0)
goto cleanup;
}
if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0) {
free(dri2_dpy);
return EGL_FALSE;
}
if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0)
goto cleanup;
dri2_dpy->gbm_dri = gbm_dri_device(gbm);
if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI) {
free(dri2_dpy);
return EGL_FALSE;
}
if (fd < 0) {
fd = fcntl(gbm_device_get_fd(gbm), F_DUPFD_CLOEXEC, 3);
if (fd < 0) {
free(dri2_dpy);
return EGL_FALSE;
}
}
if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI)
goto cleanup;
dri2_dpy->fd = fd;
dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd);
@@ -727,4 +720,11 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
dri2_dpy->vtbl = &dri2_drm_display_vtbl;
return EGL_TRUE;
cleanup:
if (fd >= 0)
close(fd);
free(dri2_dpy);
return EGL_FALSE;
}

View File

@@ -703,18 +703,10 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv,
dri2_surf->dx = 0;
dri2_surf->dy = 0;
if (n_rects == 0) {
wl_surface_damage(dri2_surf->wl_win->surface,
0, 0, INT32_MAX, INT32_MAX);
} else {
for (i = 0; i < n_rects; i++) {
const int *rect = &rects[i * 4];
wl_surface_damage(dri2_surf->wl_win->surface,
rect[0],
dri2_surf->base.Height - rect[1] - rect[3],
rect[2], rect[3]);
}
}
/* We deliberately ignore the damage region and post maximum damage, due to
* https://bugs.freedesktop.org/78190 */
wl_surface_damage(dri2_surf->wl_win->surface,
0, 0, INT32_MAX, INT32_MAX);
if (dri2_dpy->is_different_gpu) {
_EGLContext *ctx = _eglGetCurrentContext();
@@ -1033,6 +1025,7 @@ static struct dri2_egl_display_vtbl dri2_wl_display_vtbl = {
.query_buffer_age = dri2_wl_query_buffer_age,
.create_wayland_buffer_from_image = dri2_wl_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean
@@ -1645,6 +1638,7 @@ dri2_wl_swrast_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
struct dri2_egl_config *dri2_conf = dri2_egl_config(conf);
struct wl_egl_window *window = native_window;
struct dri2_egl_surface *dri2_surf;
const __DRIconfig *config;
(void) drv;
@@ -1669,10 +1663,12 @@ dri2_wl_swrast_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
dri2_surf->base.Width = -1;
dri2_surf->base.Height = -1;
config = dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen,
config, dri2_surf);
if (dri2_surf->dri_drawable == NULL) {
_eglError(EGL_BAD_ALLOC, "swrast->createNewDrawable");
goto cleanup_dri_drawable;
@@ -1757,6 +1753,7 @@ static struct dri2_egl_display_vtbl dri2_wl_swrast_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean
@@ -1804,6 +1801,7 @@ dri2_initialize_wayland_swrast(_EGLDriver *drv, _EGLDisplay *disp)
if (roundtrip(dri2_dpy) < 0 || dri2_dpy->formats == 0)
goto cleanup_shm;
dri2_dpy->fd = -1;
dri2_dpy->driver_name = strdup("swrast");
if (!dri2_load_driver_swrast(disp))
goto cleanup_shm;

View File

@@ -45,6 +45,10 @@
#include "egl_dri2_fallbacks.h"
#include "loader.h"
#ifdef HAVE_DRI3
#include "platform_x11_dri3.h"
#endif
static EGLBoolean
dri2_x11_swap_interval(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
EGLint interval);
@@ -206,6 +210,7 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
xcb_generic_error_t *error;
xcb_drawable_t drawable;
xcb_screen_t *screen;
const __DRIconfig *config;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_surface));
drawable = (uintptr_t) native_surface;
@@ -245,19 +250,18 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
dri2_surf->drawable = drawable;
}
if (dri2_dpy->dri2) {
const __DRIconfig *config =
dri2_get_dri_config(dri2_conf, type, dri2_surf->base.GLColorspace);
config = dri2_get_dri_config(dri2_conf, type,
dri2_surf->base.GLColorspace);
if (dri2_dpy->dri2) {
dri2_surf->dri_drawable =
(*dri2_dpy->dri2->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf);
} else {
assert(dri2_dpy->swrast);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf);
}
if (dri2_surf->dri_drawable == NULL) {
@@ -703,7 +707,7 @@ dri2_x11_local_authenticate(_EGLDisplay *disp)
static EGLBoolean
dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
_EGLDisplay *disp)
_EGLDisplay *disp, bool supports_preserved)
{
xcb_screen_iterator_t s;
xcb_depth_iterator_t d;
@@ -724,8 +728,10 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
surface_type =
EGL_WINDOW_BIT |
EGL_PIXMAP_BIT |
EGL_PBUFFER_BIT |
EGL_SWAP_BEHAVIOR_PRESERVED_BIT;
EGL_PBUFFER_BIT;
if (supports_preserved)
surface_type |= EGL_SWAP_BEHAVIOR_PRESERVED_BIT;
while (d.rem > 0) {
EGLBoolean class_added[6] = { 0, };
@@ -1112,6 +1118,7 @@ static struct dri2_egl_display_vtbl dri2_x11_swrast_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = {
@@ -1130,6 +1137,7 @@ static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_x11_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean
@@ -1161,6 +1169,7 @@ dri2_initialize_x11_swrast(_EGLDriver *drv, _EGLDisplay *disp)
* Every hardware driver_name is set using strdup. Doing the same in
* here will allow is to simply free the memory at dri2_terminate().
*/
dri2_dpy->fd = -1;
dri2_dpy->driver_name = strdup("swrast");
if (!dri2_load_driver_swrast(disp))
goto cleanup_conn;
@@ -1178,7 +1187,7 @@ dri2_initialize_x11_swrast(_EGLDriver *drv, _EGLDisplay *disp)
if (!dri2_create_screen(disp))
goto cleanup_driver;
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp))
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, true))
goto cleanup_configs;
/* Fill vtbl last to prevent accidentally calling virtual function during
@@ -1249,6 +1258,100 @@ dri2_x11_setup_swap_interval(struct dri2_egl_display *dri2_dpy)
}
}
#ifdef HAVE_DRI3
static EGLBoolean
dri2_initialize_x11_dri3(_EGLDriver *drv, _EGLDisplay *disp)
{
struct dri2_egl_display *dri2_dpy;
dri2_dpy = calloc(1, sizeof *dri2_dpy);
if (!dri2_dpy)
return _eglError(EGL_BAD_ALLOC, "eglInitialize");
disp->DriverData = (void *) dri2_dpy;
if (disp->PlatformDisplay == NULL) {
dri2_dpy->conn = xcb_connect(0, &dri2_dpy->screen);
dri2_dpy->own_device = true;
} else {
Display *dpy = disp->PlatformDisplay;
dri2_dpy->conn = XGetXCBConnection(dpy);
dri2_dpy->screen = DefaultScreen(dpy);
}
if (xcb_connection_has_error(dri2_dpy->conn)) {
_eglLog(_EGL_WARNING, "DRI3: xcb_connect failed");
goto cleanup_dpy;
}
if (dri2_dpy->conn) {
if (!dri3_x11_connect(dri2_dpy))
goto cleanup_conn;
}
if (!dri2_load_driver_dri3(disp))
goto cleanup_conn;
dri2_dpy->extensions[0] = &dri3_image_loader_extension.base;
dri2_dpy->extensions[1] = &use_invalidate.base;
dri2_dpy->extensions[2] = &image_lookup_extension.base;
dri2_dpy->extensions[3] = NULL;
dri2_dpy->swap_available = true;
dri2_dpy->invalidate_available = true;
if (!dri2_create_screen(disp))
goto cleanup_fd;
dri2_x11_setup_swap_interval(dri2_dpy);
if (!dri2_dpy->is_different_gpu)
disp->Extensions.KHR_image_pixmap = EGL_TRUE;
disp->Extensions.NOK_texture_from_pixmap = EGL_TRUE;
disp->Extensions.CHROMIUM_sync_control = EGL_TRUE;
disp->Extensions.EXT_buffer_age = EGL_TRUE;
#ifdef HAVE_WAYLAND_PLATFORM
disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
#endif
if (dri2_dpy->conn) {
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, false))
goto cleanup_configs;
}
dri2_dpy->loader_dri3_ext.core = dri2_dpy->core;
dri2_dpy->loader_dri3_ext.image_driver = dri2_dpy->image_driver;
dri2_dpy->loader_dri3_ext.flush = dri2_dpy->flush;
dri2_dpy->loader_dri3_ext.tex_buffer = dri2_dpy->tex_buffer;
dri2_dpy->loader_dri3_ext.image = dri2_dpy->image;
dri2_dpy->loader_dri3_ext.config = dri2_dpy->config;
/* Fill vtbl last to prevent accidentally calling virtual function during
* initialization.
*/
dri2_dpy->vtbl = &dri3_x11_display_vtbl;
_eglLog(_EGL_INFO, "Using DRI3");
return EGL_TRUE;
cleanup_configs:
_eglCleanupDisplay(disp);
dri2_dpy->core->destroyScreen(dri2_dpy->dri_screen);
dlclose(dri2_dpy->driver);
cleanup_fd:
close(dri2_dpy->fd);
cleanup_conn:
if (disp->PlatformDisplay == NULL)
xcb_disconnect(dri2_dpy->conn);
cleanup_dpy:
free(dri2_dpy);
return EGL_FALSE;
}
#endif
static EGLBoolean
dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
{
@@ -1320,7 +1423,7 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
#endif
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp))
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, true))
goto cleanup_configs;
/* Fill vtbl last to prevent accidentally calling virtual function during
@@ -1328,6 +1431,8 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
*/
dri2_dpy->vtbl = &dri2_x11_display_vtbl;
_eglLog(_EGL_INFO, "Using DRI2");
return EGL_TRUE;
cleanup_configs:
@@ -1354,9 +1459,16 @@ dri2_initialize_x11(_EGLDriver *drv, _EGLDisplay *disp)
int x11_dri2_accel = (getenv("LIBGL_ALWAYS_SOFTWARE") == NULL);
if (x11_dri2_accel) {
if (!dri2_initialize_x11_dri2(drv, disp)) {
initialized = dri2_initialize_x11_swrast(drv, disp);
#ifdef HAVE_DRI3
if (getenv("LIBGL_DRI3_DISABLE") != NULL ||
!dri2_initialize_x11_dri3(drv, disp)) {
#endif
if (!dri2_initialize_x11_dri2(drv, disp)) {
initialized = dri2_initialize_x11_swrast(drv, disp);
}
#ifdef HAVE_DRI3
}
#endif
} else {
initialized = dri2_initialize_x11_swrast(drv, disp);
}

View File

@@ -0,0 +1,547 @@
/*
* Copyright © 2015 Boyan Ding
*
* Permission to use, copy, modify, distribute, and sell this software and its
* documentation for any purpose is hereby granted without fee, provided that
* the above copyright notice appear in all copies and that both that copyright
* notice and this permission notice appear in supporting documentation, and
* that the name of the copyright holders not be used in advertising or
* publicity pertaining to distribution of the software without specific,
* written prior permission. The copyright holders make no representations
* about the suitability of this software for any purpose. It is provided "as
* is" without express or implied warranty.
*
* THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
* EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
* DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
* TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
* OF THIS SOFTWARE.
*/
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <xcb/xcb.h>
#include <xcb/dri3.h>
#include <xcb/present.h>
#include <xf86drm.h>
#include "egl_dri2.h"
#include "egl_dri2_fallbacks.h"
#include "platform_x11_dri3.h"
#include "loader.h"
#include "loader_dri3_helper.h"
static struct dri3_egl_surface *
loader_drawable_to_egl_surface(struct loader_dri3_drawable *draw) {
size_t offset = offsetof(struct dri3_egl_surface, loader_drawable);
return (struct dri3_egl_surface *)(((void*) draw) - offset);
}
static int
egl_dri3_get_swap_interval(struct loader_dri3_drawable *draw)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
return dri3_surf->base.SwapInterval;
}
static int
egl_dri3_clamp_swap_interval(struct loader_dri3_drawable *draw, int interval)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
if (interval > dri3_surf->base.Config->MaxSwapInterval)
interval = dri3_surf->base.Config->MaxSwapInterval;
else if (interval < dri3_surf->base.Config->MinSwapInterval)
interval = dri3_surf->base.Config->MinSwapInterval;
return interval;
}
static void
egl_dri3_set_swap_interval(struct loader_dri3_drawable *draw, int interval)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
dri3_surf->base.SwapInterval = interval;
}
static void
egl_dri3_set_drawable_size(struct loader_dri3_drawable *draw,
int width, int height)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
dri3_surf->base.Width = width;
dri3_surf->base.Height = height;
}
static bool
egl_dri3_in_current_context(struct loader_dri3_drawable *draw)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
_EGLContext *ctx = _eglGetCurrentContext();
return ctx->Resource.Display == dri3_surf->base.Resource.Display;
}
static __DRIcontext *
egl_dri3_get_dri_context(struct loader_dri3_drawable *draw)
{
_EGLContext *ctx = _eglGetCurrentContext();
struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
return dri2_ctx->dri_context;
}
static void
egl_dri3_flush_drawable(struct loader_dri3_drawable *draw, unsigned flags)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
_EGLDisplay *disp = dri3_surf->base.Resource.Display;
dri2_flush_drawable_for_swapbuffers(disp, &dri3_surf->base);
}
static struct loader_dri3_vtable egl_dri3_vtable = {
.get_swap_interval = egl_dri3_get_swap_interval,
.clamp_swap_interval = egl_dri3_clamp_swap_interval,
.set_swap_interval = egl_dri3_set_swap_interval,
.set_drawable_size = egl_dri3_set_drawable_size,
.in_current_context = egl_dri3_in_current_context,
.get_dri_context = egl_dri3_get_dri_context,
.flush_drawable = egl_dri3_flush_drawable,
.show_fps = NULL,
};
static EGLBoolean
dri3_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
(void) drv;
if (!_eglPutSurface(surf))
return EGL_TRUE;
loader_dri3_drawable_fini(&dri3_surf->loader_drawable);
free(surf);
return EGL_TRUE;
}
static EGLBoolean
dri3_set_swap_interval(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
EGLint interval)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
loader_dri3_set_swap_interval(&dri3_surf->loader_drawable, interval);
return EGL_TRUE;
}
static xcb_screen_t *
get_xcb_screen(xcb_screen_iterator_t iter, int screen)
{
for (; iter.rem; --screen, xcb_screen_next(&iter))
if (screen == 0)
return iter.data;
return NULL;
}
static _EGLSurface *
dri3_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
_EGLConfig *conf, void *native_surface,
const EGLint *attrib_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_config *dri2_conf = dri2_egl_config(conf);
struct dri3_egl_surface *dri3_surf;
const __DRIconfig *dri_config;
xcb_drawable_t drawable;
xcb_screen_iterator_t s;
xcb_screen_t *screen;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_surface));
drawable = (uintptr_t) native_surface;
(void) drv;
dri3_surf = calloc(1, sizeof *dri3_surf);
if (!dri3_surf) {
_eglError(EGL_BAD_ALLOC, "dri3_create_surface");
return NULL;
}
if (!_eglInitSurface(&dri3_surf->base, disp, type, conf, attrib_list))
goto cleanup_surf;
if (type == EGL_PBUFFER_BIT) {
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
_eglError(EGL_BAD_NATIVE_WINDOW, "dri3_create_surface");
goto cleanup_surf;
}
drawable = xcb_generate_id(dri2_dpy->conn);
xcb_create_pixmap(dri2_dpy->conn, conf->BufferSize,
drawable, screen->root,
dri3_surf->base.Width, dri3_surf->base.Height);
}
dri_config = dri2_get_dri_config(dri2_conf, type,
dri3_surf->base.GLColorspace);
if (loader_dri3_drawable_init(dri2_dpy->conn, drawable,
dri2_dpy->dri_screen,
dri2_dpy->is_different_gpu, dri_config,
&dri2_dpy->loader_dri3_ext,
&egl_dri3_vtable,
&dri3_surf->loader_drawable)) {
_eglError(EGL_BAD_ALLOC, "dri3_surface_create");
goto cleanup_pixmap;
}
return &dri3_surf->base;
cleanup_pixmap:
if (type == EGL_PBUFFER_BIT)
xcb_free_pixmap(dri2_dpy->conn, drawable);
cleanup_surf:
free(dri3_surf);
return NULL;
}
/**
* Called via eglCreateWindowSurface(), drv->API.CreateWindowSurface().
*/
static _EGLSurface *
dri3_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, void *native_window,
const EGLint *attrib_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
_EGLSurface *surf;
surf = dri3_create_surface(drv, disp, EGL_WINDOW_BIT, conf,
native_window, attrib_list);
if (surf != NULL)
dri3_set_swap_interval(drv, disp, surf, dri2_dpy->default_swap_interval);
return surf;
}
static _EGLSurface *
dri3_create_pixmap_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, void *native_pixmap,
const EGLint *attrib_list)
{
return dri3_create_surface(drv, disp, EGL_PIXMAP_BIT, conf,
native_pixmap, attrib_list);
}
static _EGLSurface *
dri3_create_pbuffer_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, const EGLint *attrib_list)
{
return dri3_create_surface(drv, disp, EGL_PBUFFER_BIT, conf,
XCB_WINDOW_NONE, attrib_list);
}
static EGLBoolean
dri3_get_sync_values(_EGLDisplay *display, _EGLSurface *surface,
EGLuint64KHR *ust, EGLuint64KHR *msc,
EGLuint64KHR *sbc)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surface);
return loader_dri3_wait_for_msc(&dri3_surf->loader_drawable, 0, 0, 0,
(int64_t *) ust, (int64_t *) msc,
(int64_t *) sbc) ? EGL_TRUE : EGL_FALSE;
}
static _EGLImage *
dri3_create_image_khr_pixmap(_EGLDisplay *disp, _EGLContext *ctx,
EGLClientBuffer buffer, const EGLint *attr_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_image *dri2_img;
xcb_drawable_t drawable;
xcb_dri3_buffer_from_pixmap_cookie_t bp_cookie;
xcb_dri3_buffer_from_pixmap_reply_t *bp_reply;
unsigned int format;
drawable = (xcb_drawable_t) (uintptr_t) buffer;
bp_cookie = xcb_dri3_buffer_from_pixmap(dri2_dpy->conn, drawable);
bp_reply = xcb_dri3_buffer_from_pixmap_reply(dri2_dpy->conn,
bp_cookie, NULL);
if (!bp_reply) {
_eglError(EGL_BAD_ALLOC, "xcb_dri3_buffer_from_pixmap");
return NULL;
}
switch (bp_reply->depth) {
case 16:
format = __DRI_IMAGE_FORMAT_RGB565;
break;
case 24:
format = __DRI_IMAGE_FORMAT_XRGB8888;
break;
case 32:
format = __DRI_IMAGE_FORMAT_ARGB8888;
break;
default:
_eglError(EGL_BAD_PARAMETER,
"dri3_create_image_khr: unsupported pixmap depth");
free(bp_reply);
return EGL_NO_IMAGE_KHR;
}
dri2_img = malloc(sizeof *dri2_img);
if (!dri2_img) {
_eglError(EGL_BAD_ALLOC, "dri3_create_image_khr");
return EGL_NO_IMAGE_KHR;
}
if (!_eglInitImage(&dri2_img->base, disp)) {
free(dri2_img);
return EGL_NO_IMAGE_KHR;
}
dri2_img->dri_image = loader_dri3_create_image(dri2_dpy->conn,
bp_reply,
format,
dri2_dpy->dri_screen,
dri2_dpy->image,
dri2_img);
free(bp_reply);
return &dri2_img->base;
}
static _EGLImage *
dri3_create_image_khr(_EGLDriver *drv, _EGLDisplay *disp,
_EGLContext *ctx, EGLenum target,
EGLClientBuffer buffer, const EGLint *attr_list)
{
(void) drv;
switch (target) {
case EGL_NATIVE_PIXMAP_KHR:
return dri3_create_image_khr_pixmap(disp, ctx, buffer, attr_list);
default:
return dri2_create_image_khr(drv, disp, ctx, target, buffer, attr_list);
}
}
/**
* Called by the driver when it needs to update the real front buffer with the
* contents of its fake front buffer.
*/
static void
dri3_flush_front_buffer(__DRIdrawable *driDrawable, void *loaderPrivate)
{
/* There does not seem to be any kind of consensus on whether we should
* support front-buffer rendering or not:
* http://lists.freedesktop.org/archives/mesa-dev/2013-June/040129.html
*/
_eglLog(_EGL_WARNING, "FIXME: egl/x11 doesn't support front buffer rendering.");
(void) driDrawable;
(void) loaderPrivate;
}
const __DRIimageLoaderExtension dri3_image_loader_extension = {
.base = { __DRI_IMAGE_LOADER, 1 },
.getBuffers = loader_dri3_get_buffers,
.flushFrontBuffer = dri3_flush_front_buffer,
};
static EGLBoolean
dri3_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(draw);
/* No-op for a pixmap or pbuffer surface */
if (draw->Type == EGL_PIXMAP_BIT || draw->Type == EGL_PBUFFER_BIT)
return 0;
return loader_dri3_swap_buffers_msc(&dri3_surf->loader_drawable,
0, 0, 0, 0,
draw->SwapBehavior == EGL_BUFFER_PRESERVED) != -1;
}
static EGLBoolean
dri3_copy_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
void *native_pixmap_target)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
xcb_pixmap_t target;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_pixmap_target));
target = (uintptr_t) native_pixmap_target;
loader_dri3_copy_drawable(&dri3_surf->loader_drawable, target,
dri3_surf->loader_drawable.drawable);
return EGL_TRUE;
}
static int
dri3_query_buffer_age(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
return loader_dri3_query_buffer_age(&dri3_surf->loader_drawable);
}
static __DRIdrawable *
dri3_get_dri_drawable(_EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
return dri3_surf->loader_drawable.dri_drawable;
}
struct dri2_egl_display_vtbl dri3_x11_display_vtbl = {
.authenticate = NULL,
.create_window_surface = dri3_create_window_surface,
.create_pixmap_surface = dri3_create_pixmap_surface,
.create_pbuffer_surface = dri3_create_pbuffer_surface,
.destroy_surface = dri3_destroy_surface,
.create_image = dri3_create_image_khr,
.swap_interval = dri3_set_swap_interval,
.swap_buffers = dri3_swap_buffers,
.swap_buffers_with_damage = dri2_fallback_swap_buffers_with_damage,
.swap_buffers_region = dri2_fallback_swap_buffers_region,
.post_sub_buffer = dri2_fallback_post_sub_buffer,
.copy_buffers = dri3_copy_buffers,
.query_buffer_age = dri3_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri3_get_sync_values,
.get_dri_drawable = dri3_get_dri_drawable,
};
static char *
dri3_get_device_name(int fd)
{
char *ret = NULL;
ret = drmGetRenderDeviceNameFromFd(fd);
if (ret)
return ret;
/* For dri3, render node support is required for WL_bind_wayland_display.
* In order not to regress on older systems without kernel or libdrm
* support, fall back to dri2. User can override it with environment
* variable if they don't need to use that extension.
*/
if (getenv("EGL_FORCE_DRI3") == NULL) {
_eglLog(_EGL_WARNING, "Render node support not available, falling back to dri2");
_eglLog(_EGL_WARNING, "If you want to force dri3, set EGL_FORCE_DRI3 environment variable");
} else
ret = loader_get_device_name_for_fd(fd);
return ret;
}
EGLBoolean
dri3_x11_connect(struct dri2_egl_display *dri2_dpy)
{
xcb_dri3_query_version_reply_t *dri3_query;
xcb_dri3_query_version_cookie_t dri3_query_cookie;
xcb_present_query_version_reply_t *present_query;
xcb_present_query_version_cookie_t present_query_cookie;
xcb_generic_error_t *error;
xcb_screen_iterator_t s;
xcb_screen_t *screen;
const xcb_query_extension_reply_t *extension;
xcb_prefetch_extension_data (dri2_dpy->conn, &xcb_dri3_id);
xcb_prefetch_extension_data (dri2_dpy->conn, &xcb_present_id);
extension = xcb_get_extension_data(dri2_dpy->conn, &xcb_dri3_id);
if (!(extension && extension->present))
return EGL_FALSE;
extension = xcb_get_extension_data(dri2_dpy->conn, &xcb_present_id);
if (!(extension && extension->present))
return EGL_FALSE;
dri3_query_cookie = xcb_dri3_query_version(dri2_dpy->conn,
XCB_DRI3_MAJOR_VERSION,
XCB_DRI3_MINOR_VERSION);
present_query_cookie = xcb_present_query_version(dri2_dpy->conn,
XCB_PRESENT_MAJOR_VERSION,
XCB_PRESENT_MINOR_VERSION);
dri3_query =
xcb_dri3_query_version_reply(dri2_dpy->conn, dri3_query_cookie, &error);
if (dri3_query == NULL || error != NULL) {
_eglLog(_EGL_WARNING, "DRI3: failed to query the version");
free(dri3_query);
free(error);
return EGL_FALSE;
}
free(dri3_query);
present_query =
xcb_present_query_version_reply(dri2_dpy->conn,
present_query_cookie, &error);
if (present_query == NULL || error != NULL) {
_eglLog(_EGL_WARNING, "DRI3: failed to query Present version");
free(present_query);
free(error);
return EGL_FALSE;
}
free(present_query);
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
_eglError(EGL_BAD_NATIVE_WINDOW, "dri3_x11_connect");
return EGL_FALSE;
}
dri2_dpy->fd = loader_dri3_open(dri2_dpy->conn, screen->root, 0);
if (dri2_dpy->fd < 0) {
int conn_error = xcb_connection_has_error(dri2_dpy->conn);
_eglLog(_EGL_WARNING, "DRI3: Screen seems not DRI3 capable");
if (conn_error)
_eglLog(_EGL_WARNING, "DRI3: Failed to initialize");
return EGL_FALSE;
}
dri2_dpy->fd = loader_get_user_preferred_fd(dri2_dpy->fd, &dri2_dpy->is_different_gpu);
dri2_dpy->driver_name = loader_get_driver_for_fd(dri2_dpy->fd, 0);
if (!dri2_dpy->driver_name) {
_eglLog(_EGL_WARNING, "DRI3: No driver found");
close(dri2_dpy->fd);
return EGL_FALSE;
}
dri2_dpy->device_name = dri3_get_device_name(dri2_dpy->fd);
if (!dri2_dpy->device_name) {
close(dri2_dpy->fd);
return EGL_FALSE;
}
return EGL_TRUE;
}

View File

@@ -0,0 +1,41 @@
/*
* Copyright © 2015 Boyan Ding
*
* Permission to use, copy, modify, distribute, and sell this software and its
* documentation for any purpose is hereby granted without fee, provided that
* the above copyright notice appear in all copies and that both that copyright
* notice and this permission notice appear in supporting documentation, and
* that the name of the copyright holders not be used in advertising or
* publicity pertaining to distribution of the software without specific,
* written prior permission. The copyright holders make no representations
* about the suitability of this software for any purpose. It is provided "as
* is" without express or implied warranty.
*
* THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
* EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
* DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
* TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
* OF THIS SOFTWARE.
*/
#ifndef EGL_X11_DRI3_INCLUDED
#define EGL_X11_DRI3_INCLUDED
#include "egl_dri2.h"
_EGL_DRIVER_TYPECAST(dri3_egl_surface, _EGLSurface, obj)
struct dri3_egl_surface {
_EGLSurface base;
struct loader_dri3_drawable loader_drawable;
};
extern const __DRIimageLoaderExtension dri3_image_loader_extension;
extern struct dri2_egl_display_vtbl dri3_x11_display_vtbl;
EGLBoolean
dri3_x11_connect(struct dri2_egl_display *dri2_dpy);
#endif

55
src/egl/egl-symbols-check Executable file
View File

@@ -0,0 +1,55 @@
#!/bin/bash
FUNCS=$(nm -D --defined-only ${1-.libs/libEGL.so} | grep -o "T .*" | cut -c 3- | while read func; do
( grep -q "^$func$" || echo $func ) <<EOF
eglBindAPI
eglBindTexImage
eglChooseConfig
eglClientWaitSync
eglCopyBuffers
eglCreateContext
eglCreateImage
eglCreatePbufferFromClientBuffer
eglCreatePbufferSurface
eglCreatePixmapSurface
eglCreatePlatformPixmapSurface
eglCreatePlatformWindowSurface
eglCreateSync
eglCreateWindowSurface
eglDestroyContext
eglDestroyImage
eglDestroySurface
eglDestroySync
eglGetConfigAttrib
eglGetConfigs
eglGetCurrentContext
eglGetCurrentDisplay
eglGetCurrentSurface
eglGetDisplay
eglGetError
eglGetPlatformDisplay
eglGetProcAddress
eglGetSyncAttrib
eglInitialize
eglMakeCurrent
eglQueryAPI
eglQueryContext
eglQueryString
eglQuerySurface
eglReleaseTexImage
eglReleaseThread
eglSurfaceAttrib
eglSwapBuffers
eglSwapInterval
eglTerminate
eglWaitClient
eglWaitGL
eglWaitNative
eglWaitSync
_fini
_init
EOF
done)
test ! -n "$FUNCS" || echo $FUNCS
test ! -n "$FUNCS"

View File

@@ -152,12 +152,51 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay *dpy,
/* The EGL_KHR_create_context spec says:
*
* "Flags are only defined for OpenGL context creation, and
* specifying a flags value other than zero for other types of
* contexts, including OpenGL ES contexts, will generate an
* error."
* "If the EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR flag bit is set in
* EGL_CONTEXT_FLAGS_KHR, then a <debug context> will be created.
* [...]
* In some cases a debug context may be identical to a non-debug
* context. This bit is supported for OpenGL and OpenGL ES
* contexts."
*/
if (api != EGL_OPENGL_API && val != 0) {
if ((val & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR) &&
(api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API)) {
err = EGL_BAD_ATTRIBUTE;
break;
}
/* The EGL_KHR_create_context spec says:
*
* "If the EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR flag bit
* is set in EGL_CONTEXT_FLAGS_KHR, then a <forward-compatible>
* context will be created. Forward-compatible contexts are
* defined only for OpenGL versions 3.0 and later. They must not
* support functionality marked as <deprecated> by that version of
* the API, while a non-forward-compatible context must support
* all functionality in that version, deprecated or not. This bit
* is supported for OpenGL contexts, and requesting a
* forward-compatible context for OpenGL versions less than 3.0
* will generate an error."
*/
if ((val & EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR) &&
(api != EGL_OPENGL_API || ctx->ClientMajorVersion < 3)) {
err = EGL_BAD_ATTRIBUTE;
break;
}
/* The EGL_KHR_create_context_spec says:
*
* "If the EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR bit is set in
* EGL_CONTEXT_FLAGS_KHR, then a context supporting <robust buffer
* access> will be created. Robust buffer access is defined in the
* GL_ARB_robustness extension specification, and the resulting
* context must also support either the GL_ARB_robustness
* extension, or a version of OpenGL incorporating equivalent
* functionality. This bit is supported for OpenGL contexts.
*/
if ((val & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) &&
(api != EGL_OPENGL_API ||
!dpy->Extensions.EXT_create_context_robustness)) {
err = EGL_BAD_ATTRIBUTE;
break;
}

View File

@@ -197,7 +197,7 @@ drm_authenticate(struct wl_client *client,
wl_resource_post_event(resource, WL_DRM_AUTHENTICATED);
}
const static struct wl_drm_interface drm_interface = {
static const struct wl_drm_interface drm_interface = {
drm_authenticate,
drm_create_buffer,
drm_create_planar_buffer,

View File

@@ -1,3 +1,32 @@
/*
* Copyright © 2011 Kristian Høgsberg
* Copyright © 2011 Benjamin Franzke
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*
* Authors:
* Kristian Høgsberg <krh@bitplanet.net>
* Benjamin Franzke <benjaminfranzke@googlemail.com>
*/
#include <stdlib.h>
#include <wayland-client.h>

View File

@@ -27,6 +27,7 @@ GALLIUM_TOP := $(call my-dir)
GALLIUM_COMMON_MK := $(GALLIUM_TOP)/Android.common.mk
SUBDIRS := auxiliary
SUBDIRS += auxiliary/pipe-loader
#
# Gallium drivers and their respective winsys

View File

@@ -67,3 +67,9 @@ if HAVE_DRISW
GALLIUM_PIPE_LOADER_WINSYS_LIBS += \
$(top_builddir)/src/gallium/winsys/sw/dri/libswdri.la
endif
if HAVE_DRISW_KMS
GALLIUM_PIPE_LOADER_WINSYS_LIBS += \
$(top_builddir)/src/gallium/winsys/sw/kms-dri/libswkmsdri.la \
$(LIBDRM_LIBS)
endif

View File

@@ -5,12 +5,14 @@ SUBDIRS =
##
SUBDIRS += auxiliary
SUBDIRS += auxiliary/pipe-loader
##
## Gallium pipe drivers and their respective winsys'
##
SUBDIRS += \
drivers/ddebug \
drivers/noop \
drivers/trace \
drivers/rbug
@@ -81,6 +83,11 @@ if HAVE_GALLIUM_VC4
SUBDIRS += drivers/vc4 winsys/vc4/drm
endif
## virgl
if HAVE_GALLIUM_VIRGL
SUBDIRS += drivers/virgl winsys/virgl/drm winsys/virgl/vtest
endif
## the sw winsys'
SUBDIRS += winsys/sw/null
@@ -92,7 +99,7 @@ if HAVE_DRISW
SUBDIRS += winsys/sw/dri
endif
if HAVE_DRI2
if HAVE_DRISW_KMS
SUBDIRS += winsys/sw/kms-dri
endif
@@ -114,7 +121,8 @@ EXTRA_DIST = \
## Gallium state trackers and their users (targets)
##
if HAVE_LOADER_GALLIUM
## XXX: Rename the conditional once we have a config switch for static/dynamic pipe-drivers
if HAVE_CLOVER
SUBDIRS += targets/pipe-loader
endif

View File

@@ -5,6 +5,7 @@ Import('env')
#
SConscript('auxiliary/SConscript')
SConscript('auxiliary/pipe-loader/SConscript')
#
# Drivers

View File

@@ -1,7 +1,3 @@
if HAVE_LOADER_GALLIUM
SUBDIRS := pipe-loader
endif
include Makefile.sources
include $(top_srcdir)/src/gallium/Automake.inc
@@ -38,18 +34,23 @@ libgallium_la_SOURCES += \
endif
indices/u_indices_gen.c: $(srcdir)/indices/u_indices_gen.py
$(AM_V_at)$(MKDIR_P) indices
$(AM_V_GEN) $(PYTHON2) $< > $@
MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
indices/u_unfilled_gen.c: $(srcdir)/indices/u_unfilled_gen.py
$(AM_V_at)$(MKDIR_P) indices
$(AM_V_GEN) $(PYTHON2) $< > $@
indices/u_indices_gen.c: indices/u_indices_gen.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/indices/u_indices_gen.py > $@
util/u_format_table.c: $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format_pack.py $(srcdir)/util/u_format_parse.py $(srcdir)/util/u_format.csv
$(AM_V_at)$(MKDIR_P) util
$(AM_V_GEN) $(PYTHON2) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@
indices/u_unfilled_gen.c: indices/u_unfilled_gen.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/indices/u_unfilled_gen.py > $@
util/u_format_table.c: util/u_format_table.py \
util/u_format_pack.py \
util/u_format_parse.py \
util/u_format.csv
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@
noinst_LTLIBRARIES += libgalliumvl_stub.la
libgalliumvl_stub_la_SOURCES = \
@@ -61,15 +62,7 @@ COMMON_VL_CFLAGS = \
$(AM_CFLAGS) \
$(VL_CFLAGS) \
$(DRI2PROTO_CFLAGS) \
$(LIBDRM_CFLAGS) \
$(GALLIUM_PIPE_LOADER_DEFINES) \
-DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
if HAVE_GALLIUM_STATIC_TARGETS
COMMON_VL_CFLAGS += \
-DGALLIUM_STATIC_TARGETS=1
endif # HAVE_GALLIUM_STATIC_TARGETS
$(LIBDRM_CFLAGS)
noinst_LTLIBRARIES += libgalliumvl.la

View File

@@ -129,12 +129,16 @@ C_SOURCES := \
rtasm/rtasm_execmem.h \
rtasm/rtasm_x86sse.c \
rtasm/rtasm_x86sse.h \
tgsi/tgsi_aa_point.c \
tgsi/tgsi_aa_point.h \
tgsi/tgsi_build.c \
tgsi/tgsi_build.h \
tgsi/tgsi_dump.c \
tgsi/tgsi_dump.h \
tgsi/tgsi_exec.c \
tgsi/tgsi_exec.h \
tgsi/tgsi_emulate.c \
tgsi/tgsi_emulate.h \
tgsi/tgsi_info.c \
tgsi/tgsi_info.h \
tgsi/tgsi_iterate.c \
@@ -144,6 +148,8 @@ C_SOURCES := \
tgsi/tgsi_opcode_tmp.h \
tgsi/tgsi_parse.c \
tgsi/tgsi_parse.h \
tgsi/tgsi_point_sprite.c \
tgsi/tgsi_point_sprite.h \
tgsi/tgsi_sanity.c \
tgsi/tgsi_sanity.h \
tgsi/tgsi_scan.c \
@@ -154,6 +160,8 @@ C_SOURCES := \
tgsi/tgsi_text.h \
tgsi/tgsi_transform.c \
tgsi/tgsi_transform.h \
tgsi/tgsi_two_side.c \
tgsi/tgsi_two_side.h \
tgsi/tgsi_ureg.c \
tgsi/tgsi_ureg.h \
tgsi/tgsi_util.c \
@@ -260,6 +268,8 @@ C_SOURCES := \
util/u_pack_color.h \
util/u_pointer.h \
util/u_prim.h \
util/u_prim_restart.c \
util/u_prim_restart.h \
util/u_pstipple.c \
util/u_pstipple.h \
util/u_range.h \
@@ -339,7 +349,8 @@ VL_SOURCES := \
# XXX: Nuke this as our dri targets no longer depend on VL.
VL_WINSYS_SOURCES := \
vl/vl_winsys_dri.c
vl/vl_winsys_dri.c \
vl/vl_winsys_drm.c
VL_STUB_SOURCES := \
vl/vl_stubs.c
@@ -368,7 +379,9 @@ GALLIVM_SOURCES := \
gallivm/lp_bld_flow.h \
gallivm/lp_bld_format_aos_array.c \
gallivm/lp_bld_format_aos.c \
gallivm/lp_bld_format_cached.c \
gallivm/lp_bld_format_float.c \
gallivm/lp_bld_format.c \
gallivm/lp_bld_format.h \
gallivm/lp_bld_format_soa.c \
gallivm/lp_bld_format_srgb.c \

View File

@@ -625,6 +625,7 @@ generate_vs(struct draw_llvm_variant *variant,
inputs,
outputs,
context_ptr,
NULL,
draw_sampler,
&llvm->draw->vs.vertex_shader->info,
NULL);
@@ -749,7 +750,8 @@ generate_fetch(struct gallivm_state *gallivm,
lp_float32_vec4_type(),
FALSE,
map_ptr,
zero, zero, zero);
zero, zero, zero,
NULL);
LLVMBuildStore(builder, val, temp_ptr);
}
lp_build_endif(&if_ctx);
@@ -2193,6 +2195,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
NULL,
outputs,
context_ptr,
NULL,
sampler,
&llvm->draw->gs.geometry_shader->info,
(const struct lp_build_tgsi_gs_iface *)&gs_iface);

View File

@@ -240,7 +240,8 @@ aa_transform_prolog(struct tgsi_transform_context *ctx)
TGSI_FILE_INPUT, texInput, TGSI_SWIZZLE_W);
/* KILL_IF -tmp0.yyyy; # if -tmp0.y < 0, KILL */
tgsi_transform_kill_inst(ctx, TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_Y);
tgsi_transform_kill_inst(ctx, TGSI_FILE_TEMPORARY, tmp0,
TGSI_SWIZZLE_Y, TRUE);
/* compute coverage factor = (1-d)/(1-k) */

View File

@@ -280,7 +280,8 @@ pstip_transform_prolog(struct tgsi_transform_context *ctx)
/* KILL_IF -texTemp.wwww; # if -texTemp < 0, KILL fragment */
tgsi_transform_kill_inst(ctx,
TGSI_FILE_TEMPORARY, pctx->texTemp, TGSI_SWIZZLE_W);
TGSI_FILE_TEMPORARY, pctx->texTemp,
TGSI_SWIZZLE_W, TRUE);
}

View File

@@ -355,8 +355,9 @@ struct draw_vertex_info {
};
/* these flags are set if the primitive is a segment of a larger one */
#define DRAW_SPLIT_BEFORE 0x1
#define DRAW_SPLIT_AFTER 0x2
#define DRAW_SPLIT_BEFORE 0x1
#define DRAW_SPLIT_AFTER 0x2
#define DRAW_LINE_LOOP_AS_STRIP 0x4
struct draw_prim_info {
boolean linear;

View File

@@ -359,6 +359,16 @@ fetch_pipeline_generic(struct draw_pt_middle_end *middle,
}
static inline unsigned
prim_type(unsigned prim, unsigned flags)
{
if (flags & DRAW_LINE_LOOP_AS_STRIP)
return PIPE_PRIM_LINE_STRIP;
else
return prim;
}
static void
fetch_pipeline_run(struct draw_pt_middle_end *middle,
const unsigned *fetch_elts,
@@ -380,7 +390,7 @@ fetch_pipeline_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;
@@ -408,7 +418,7 @@ fetch_pipeline_linear_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = count;
prim_info.elts = NULL;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &count;
@@ -439,7 +449,7 @@ fetch_pipeline_linear_run_elts(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;

View File

@@ -473,6 +473,16 @@ llvm_pipeline_generic(struct draw_pt_middle_end *middle,
}
static inline unsigned
prim_type(unsigned prim, unsigned flags)
{
if (flags & DRAW_LINE_LOOP_AS_STRIP)
return PIPE_PRIM_LINE_STRIP;
else
return prim;
}
static void
llvm_middle_end_run(struct draw_pt_middle_end *middle,
const unsigned *fetch_elts,
@@ -494,7 +504,7 @@ llvm_middle_end_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;
@@ -522,7 +532,7 @@ llvm_middle_end_linear_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = count;
prim_info.elts = NULL;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &count;
@@ -552,7 +562,7 @@ llvm_middle_end_linear_run_elts(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;

View File

@@ -249,6 +249,9 @@ vsplit_segment_loop_linear(struct vsplit_frontend *vsplit, unsigned flags,
assert(icount + !!close_loop <= vsplit->segment_size);
/* need to draw the sections of the line loop as line strips */
flags |= DRAW_LINE_LOOP_AS_STRIP;
if (close_loop) {
for (nr = 0; nr < icount; nr++)
vsplit->fetch_elts[nr] = istart + nr;

View File

@@ -311,7 +311,7 @@ lp_build_const_elem(struct gallivm_state *gallivm,
else {
double dscale = lp_const_scale(type);
elem = LLVMConstInt(elem_type, round(val*dscale), 0);
elem = LLVMConstInt(elem_type, (long long) round(val*dscale), 0);
}
return elem;

View File

@@ -0,0 +1,56 @@
/**************************************************************************
*
* Copyright 2010 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
**************************************************************************/
#include "lp_bld_format.h"
LLVMTypeRef
lp_build_format_cache_type(struct gallivm_state *gallivm)
{
LLVMTypeRef elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_COUNT];
LLVMTypeRef s;
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_DATA] =
LLVMArrayType(LLVMInt32TypeInContext(gallivm->context),
LP_BUILD_FORMAT_CACHE_SIZE * 16);
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_TAGS] =
LLVMArrayType(LLVMInt64TypeInContext(gallivm->context),
LP_BUILD_FORMAT_CACHE_SIZE);
#if LP_BUILD_FORMAT_CACHE_DEBUG
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL] =
LLVMInt64TypeInContext(gallivm->context);
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS] =
LLVMInt64TypeInContext(gallivm->context);
#endif
s = LLVMStructTypeInContext(gallivm->context, elem_types,
LP_BUILD_FORMAT_CACHE_MEMBER_COUNT, 0);
return s;
}

View File

@@ -44,6 +44,45 @@ struct lp_type;
struct lp_build_context;
#define LP_BUILD_FORMAT_CACHE_DEBUG 0
/*
* Block cache
*
* Optional block cache to be used when unpacking big pixel blocks.
* Must be a power of 2
*/
#define LP_BUILD_FORMAT_CACHE_SIZE 128
/*
* Note: cache_data needs 16 byte alignment.
*/
struct lp_build_format_cache
{
PIPE_ALIGN_VAR(16) uint32_t cache_data[LP_BUILD_FORMAT_CACHE_SIZE][4][4];
uint64_t cache_tags[LP_BUILD_FORMAT_CACHE_SIZE];
#if LP_BUILD_FORMAT_CACHE_DEBUG
uint64_t cache_access_total;
uint64_t cache_access_miss;
#endif
};
enum {
LP_BUILD_FORMAT_CACHE_MEMBER_DATA = 0,
LP_BUILD_FORMAT_CACHE_MEMBER_TAGS,
#if LP_BUILD_FORMAT_CACHE_DEBUG
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS,
#endif
LP_BUILD_FORMAT_CACHE_MEMBER_COUNT
};
LLVMTypeRef
lp_build_format_cache_type(struct gallivm_state *gallivm);
/*
* AoS
*/
@@ -66,7 +105,8 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j);
LLVMValueRef j,
LLVMValueRef cache);
LLVMValueRef
lp_build_fetch_rgba_aos_array(struct gallivm_state *gallivm,
@@ -107,13 +147,13 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
LLVMValueRef offsets,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache,
LLVMValueRef rgba_out[4]);
/*
* YUV
*/
LLVMValueRef
lp_build_fetch_subsampled_rgba_aos(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
@@ -123,6 +163,18 @@ lp_build_fetch_subsampled_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef i,
LLVMValueRef j);
LLVMValueRef
lp_build_fetch_cached_texels(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
unsigned n,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache);
/*
* special float formats
*/

View File

@@ -370,7 +370,8 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j)
LLVMValueRef j,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
unsigned num_pixels = type.length / 4;
@@ -502,6 +503,34 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
return tmp;
}
/*
* s3tc rgb formats
*/
if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && cache) {
struct lp_type tmp_type;
LLVMValueRef tmp;
memset(&tmp_type, 0, sizeof tmp_type);
tmp_type.width = 8;
tmp_type.length = num_pixels * 4;
tmp_type.norm = TRUE;
tmp = lp_build_fetch_cached_texels(gallivm,
format_desc,
num_pixels,
base_ptr,
offset,
i, j,
cache);
lp_build_conv(gallivm,
tmp_type, type,
&tmp, 1, &tmp, 1);
return tmp;
}
/*
* Fallback to util_format_description::fetch_rgba_8unorm().
*/

View File

@@ -0,0 +1,374 @@
/**************************************************************************
*
* Copyright 2015 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "lp_bld_format.h"
#include "lp_bld_type.h"
#include "lp_bld_struct.h"
#include "lp_bld_const.h"
#include "lp_bld_flow.h"
#include "lp_bld_swizzle.h"
#include "util/u_math.h"
/**
* @file
* Complex block-compression based formats are handled here by using a cache,
* so re-decoding of every pixel is not required.
* Especially for bilinear filtering, texel reuse is very high hence even
* a small cache helps.
* The elements in the cache are the decoded blocks - currently things
* are restricted to formats which are 4x4 block based, and the decoded
* texels must fit into 4x8 bits.
* The cache is direct mapped so hitrates aren't all that great and cache
* thrashing could happen.
*
* @author Roland Scheidegger <sroland@vmware.com>
*/
#if LP_BUILD_FORMAT_CACHE_DEBUG
static void
update_cache_access(struct gallivm_state *gallivm,
LLVMValueRef ptr,
unsigned count,
unsigned index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, cache_access;
assert(index == LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL ||
index == LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
member_ptr = lp_build_struct_get_ptr(gallivm, ptr, index, "");
cache_access = LLVMBuildLoad(builder, member_ptr, "cache_access");
cache_access = LLVMBuildAdd(builder, cache_access,
LLVMConstInt(LLVMInt64TypeInContext(gallivm->context),
count, 0), "");
LLVMBuildStore(builder, cache_access, member_ptr);
}
#endif
static void
store_cached_block(struct gallivm_state *gallivm,
LLVMValueRef *col,
LLVMValueRef tag_value,
LLVMValueRef hash_index,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef ptr, indices[3];
LLVMTypeRef type_ptr4x32;
unsigned count;
type_ptr4x32 = LLVMPointerType(LLVMVectorType(LLVMInt32TypeInContext(gallivm->context), 4), 0);
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_TAGS);
indices[2] = hash_index;
ptr = LLVMBuildGEP(builder, cache, indices, Elements(indices), "");
LLVMBuildStore(builder, tag_value, ptr);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_DATA);
hash_index = LLVMBuildMul(builder, hash_index,
lp_build_const_int32(gallivm, 16), "");
for (count = 0; count < 4; count++) {
indices[2] = hash_index;
ptr = LLVMBuildGEP(builder, cache, indices, Elements(indices), "");
ptr = LLVMBuildBitCast(builder, ptr, type_ptr4x32, "");
LLVMBuildStore(builder, col[count], ptr);
hash_index = LLVMBuildAdd(builder, hash_index,
lp_build_const_int32(gallivm, 4), "");
}
}
static LLVMValueRef
lookup_cached_pixel(struct gallivm_state *gallivm,
LLVMValueRef ptr,
LLVMValueRef index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, indices[3];
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_DATA);
indices[2] = index;
member_ptr = LLVMBuildGEP(builder, ptr, indices, Elements(indices), "");
return LLVMBuildLoad(builder, member_ptr, "cache_data");
}
static LLVMValueRef
lookup_tag_data(struct gallivm_state *gallivm,
LLVMValueRef ptr,
LLVMValueRef index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, indices[3];
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_TAGS);
indices[2] = index;
member_ptr = LLVMBuildGEP(builder, ptr, indices, Elements(indices), "");
return LLVMBuildLoad(builder, member_ptr, "tag_data");
}
static void
update_cached_block(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
LLVMValueRef ptr_addr,
LLVMValueRef hash_index,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMTypeRef i8t = LLVMInt8TypeInContext(gallivm->context);
LLVMTypeRef pi8t = LLVMPointerType(i8t, 0);
LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef i32x4 = LLVMVectorType(LLVMInt32TypeInContext(gallivm->context), 4);
LLVMValueRef function;
LLVMValueRef tag_value, tmp_ptr;
LLVMValueRef col[4];
unsigned i, j;
/*
* Use format_desc->fetch_rgba_8unorm() for each pixel in the block.
* This doesn't actually make any sense whatsoever, someone would need
* to write a function doing this for all pixels in a block (either as
* an external c function or with generated code). Don't ask.
*/
{
/*
* Function to call looks like:
* fetch(uint8_t *dst, const uint8_t *src, unsigned i, unsigned j)
*/
LLVMTypeRef ret_type;
LLVMTypeRef arg_types[4];
LLVMTypeRef function_type;
assert(format_desc->fetch_rgba_8unorm);
ret_type = LLVMVoidTypeInContext(gallivm->context);
arg_types[0] = pi8t;
arg_types[1] = pi8t;
arg_types[2] = i32t;
arg_types[3] = i32t;
function_type = LLVMFunctionType(ret_type, arg_types,
Elements(arg_types), 0);
/* make const pointer for the C fetch_rgba_8unorm function */
function = lp_build_const_int_pointer(gallivm,
func_to_pointer((func_pointer) format_desc->fetch_rgba_8unorm));
/* cast the callee pointer to the function's type */
function = LLVMBuildBitCast(builder, function,
LLVMPointerType(function_type, 0),
"cast callee");
}
tmp_ptr = lp_build_array_alloca(gallivm, i32x4,
lp_build_const_int32(gallivm, 16),
"tmp_decode_store");
tmp_ptr = LLVMBuildBitCast(builder, tmp_ptr, pi8t, "");
/*
* Invoke format_desc->fetch_rgba_8unorm() for each pixel.
* This is going to be really really slow.
* Note: the block store format is actually
* x0y0x0y1x0y2x0y3 x1y0x1y1x1y2x1y3 ...
*/
for (i = 0; i < 4; ++i) {
for (j = 0; j < 4; ++j) {
LLVMValueRef args[4];
LLVMValueRef dst_offset = lp_build_const_int32(gallivm, (i * 4 + j) * 4);
/*
* Note we actually supply a pointer to the start of the block,
* not the start of the texture.
*/
args[0] = LLVMBuildGEP(gallivm->builder, tmp_ptr, &dst_offset, 1, "");
args[1] = ptr_addr;
args[2] = LLVMConstInt(i32t, i, 0);
args[3] = LLVMConstInt(i32t, j, 0);
LLVMBuildCall(builder, function, args, Elements(args), "");
}
}
/* Finally store the block - pointless mem copy + update tag. */
tmp_ptr = LLVMBuildBitCast(builder, tmp_ptr, LLVMPointerType(i32x4, 0), "");
for (i = 0; i < 4; ++i) {
LLVMValueRef tmp_offset = lp_build_const_int32(gallivm, i);
LLVMValueRef ptr = LLVMBuildGEP(gallivm->builder, tmp_ptr, &tmp_offset, 1, "");
col[i] = LLVMBuildLoad(builder, ptr, "");
}
tag_value = LLVMBuildPtrToInt(gallivm->builder, ptr_addr,
LLVMInt64TypeInContext(gallivm->context), "");
store_cached_block(gallivm, col, tag_value, hash_index, cache);
}
/*
* Do a cached lookup.
*
* Returns (vectors of) 4x8 rgba aos value
*/
LLVMValueRef
lp_build_fetch_cached_texels(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
unsigned n,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
unsigned count, low_bit, log2size;
LLVMValueRef color, offset_stored, addr, ptr_addrtrunc, tmp;
LLVMValueRef ij_index, hash_index, hash_mask, block_index;
LLVMTypeRef i8t = LLVMInt8TypeInContext(gallivm->context);
LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef i64t = LLVMInt64TypeInContext(gallivm->context);
struct lp_type type;
struct lp_build_context bld32;
memset(&type, 0, sizeof type);
type.width = 32;
type.length = n;
assert(format_desc->block.width == 4);
assert(format_desc->block.height == 4);
lp_build_context_init(&bld32, gallivm, type);
/*
* compute hash - we use direct mapped cache, the hash function could
* be better but it needs to be simple
* per-element:
* compare offset with offset stored at tag (hash)
* if not equal decode/store block, update tag
* extract color from cache
* assemble result vector
*/
/* TODO: not ideal with 32bit pointers... */
low_bit = util_logbase2(format_desc->block.bits / 8);
log2size = util_logbase2(LP_BUILD_FORMAT_CACHE_SIZE);
addr = LLVMBuildPtrToInt(builder, base_ptr, i64t, "");
ptr_addrtrunc = LLVMBuildPtrToInt(builder, base_ptr, i32t, "");
ptr_addrtrunc = lp_build_broadcast_scalar(&bld32, ptr_addrtrunc);
/* For the hash function, first mask off the unused lowest bits. Then just
do some xor with address bits - only use lower 32bits */
ptr_addrtrunc = LLVMBuildAdd(builder, offset, ptr_addrtrunc, "");
ptr_addrtrunc = LLVMBuildLShr(builder, ptr_addrtrunc,
lp_build_const_int_vec(gallivm, type, low_bit), "");
/* This only really makes sense for size 64,128,256 */
hash_index = ptr_addrtrunc;
ptr_addrtrunc = LLVMBuildLShr(builder, ptr_addrtrunc,
lp_build_const_int_vec(gallivm, type, 2*log2size), "");
hash_index = LLVMBuildXor(builder, ptr_addrtrunc, hash_index, "");
tmp = LLVMBuildLShr(builder, hash_index,
lp_build_const_int_vec(gallivm, type, log2size), "");
hash_index = LLVMBuildXor(builder, hash_index, tmp, "");
hash_mask = lp_build_const_int_vec(gallivm, type, LP_BUILD_FORMAT_CACHE_SIZE - 1);
hash_index = LLVMBuildAnd(builder, hash_index, hash_mask, "");
ij_index = LLVMBuildShl(builder, i, lp_build_const_int_vec(gallivm, type, 2), "");
ij_index = LLVMBuildAdd(builder, ij_index, j, "");
block_index = LLVMBuildShl(builder, hash_index,
lp_build_const_int_vec(gallivm, type, 4), "");
block_index = LLVMBuildAdd(builder, ij_index, block_index, "");
if (n > 1) {
color = LLVMGetUndef(LLVMVectorType(i32t, n));
for (count = 0; count < n; count++) {
LLVMValueRef index, cond, colorx;
LLVMValueRef block_indexx, hash_indexx, addrx, offsetx, ptr_addrx;
struct lp_build_if_state if_ctx;
index = lp_build_const_int32(gallivm, count);
offsetx = LLVMBuildExtractElement(builder, offset, index, "");
addrx = LLVMBuildZExt(builder, offsetx, i64t, "");
addrx = LLVMBuildAdd(builder, addrx, addr, "");
block_indexx = LLVMBuildExtractElement(builder, block_index, index, "");
hash_indexx = LLVMBuildLShr(builder, block_indexx,
lp_build_const_int32(gallivm, 4), "");
offset_stored = lookup_tag_data(gallivm, cache, hash_indexx);
cond = LLVMBuildICmp(builder, LLVMIntNE, offset_stored, addrx, "");
lp_build_if(&if_ctx, gallivm, cond);
{
ptr_addrx = LLVMBuildIntToPtr(builder, addrx,
LLVMPointerType(i8t, 0), "");
update_cached_block(gallivm, format_desc, ptr_addrx, hash_indexx, cache);
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, 1,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
#endif
}
lp_build_endif(&if_ctx);
colorx = lookup_cached_pixel(gallivm, cache, block_indexx);
color = LLVMBuildInsertElement(builder, color, colorx,
lp_build_const_int32(gallivm, count), "");
}
}
else {
LLVMValueRef cond;
struct lp_build_if_state if_ctx;
tmp = LLVMBuildZExt(builder, offset, i64t, "");
addr = LLVMBuildAdd(builder, tmp, addr, "");
offset_stored = lookup_tag_data(gallivm, cache, hash_index);
cond = LLVMBuildICmp(builder, LLVMIntNE, offset_stored, addr, "");
lp_build_if(&if_ctx, gallivm, cond);
{
tmp = LLVMBuildIntToPtr(builder, addr, LLVMPointerType(i8t, 0), "");
update_cached_block(gallivm, format_desc, tmp, hash_index, cache);
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, 1,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
#endif
}
lp_build_endif(&if_ctx);
color = lookup_cached_pixel(gallivm, cache, block_index);
}
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, n,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL);
#endif
return LLVMBuildBitCast(builder, color, LLVMVectorType(i8t, n * 4), "");
}

View File

@@ -346,6 +346,7 @@ lp_build_rgba8_to_fi32_soa(struct gallivm_state *gallivm,
* \param i, j the sub-block pixel coordinates. For non-compressed formats
* these will always be (0,0). For compressed formats, i will
* be in [0, block_width-1] and j will be in [0, block_height-1].
* \param cache optional value pointing to a lp_build_format_cache structure
*/
void
lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
@@ -355,6 +356,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache,
LLVMValueRef rgba_out[4])
{
LLVMBuilderRef builder = gallivm->builder;
@@ -473,7 +475,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
tmp_type.norm = TRUE;
tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
TRUE, base_ptr, offset, i, j);
TRUE, base_ptr, offset, i, j, cache);
lp_build_rgba8_to_fi32_soa(gallivm,
type,
@@ -483,6 +485,39 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
return;
}
if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC &&
/* non-srgb case is already handled above */
format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB &&
type.floating && type.width == 32 &&
(type.length == 1 || (type.length % 4 == 0)) &&
cache) {
const struct util_format_description *format_decompressed;
const struct util_format_description *flinear_desc;
LLVMValueRef packed;
flinear_desc = util_format_description(util_format_linear(format_desc->format));
packed = lp_build_fetch_cached_texels(gallivm,
flinear_desc,
type.length,
base_ptr,
offset,
i, j,
cache);
packed = LLVMBuildBitCast(builder, packed,
lp_build_int_vec_type(gallivm, type), "");
/*
* The values are now packed so they match ordinary srgb RGBA8 format,
* hence need to use matching format for unpack.
*/
format_decompressed = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB);
lp_build_unpack_rgba_soa(gallivm,
format_decompressed,
type,
packed, rgba_out);
return;
}
/*
* Fallback to calling lp_build_fetch_rgba_aos for each pixel.
*
@@ -524,7 +559,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
/* Get a single float[4]={R,G,B,A} pixel */
tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
TRUE, base_ptr, offset_elem,
i_elem, j_elem);
i_elem, j_elem, cache);
/*
* Insert the AoS tmp value channels into the SoA result vectors at

View File

@@ -427,6 +427,7 @@ lp_build_init(void)
*/
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
}
#ifdef PIPE_ARCH_PPC_64
@@ -458,7 +459,9 @@ lp_build_init(void)
util_cpu_caps.has_sse3 = 0;
util_cpu_caps.has_ssse3 = 0;
util_cpu_caps.has_sse4_1 = 0;
util_cpu_caps.has_sse4_2 = 0;
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
#endif

View File

@@ -137,6 +137,8 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
return 0;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
return 32;
}
/* if we get here, we missed a shader cap above (and should have seen
* a compiler warning.)

View File

@@ -81,6 +81,8 @@
# pragma pop_macro("DEBUG")
#endif
#include "c11/threads.h"
#include "os/os_thread.h"
#include "pipe/p_config.h"
#include "util/u_debug.h"
#include "util/u_cpu_detect.h"
@@ -103,6 +105,33 @@ static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
}
static once_flag init_native_targets_once_flag;
static void init_native_targets()
{
// If we have a native target, initialize it to ensure it is linked in and
// usable by the JIT.
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetDisassembler();
}
/**
* The llvm target registry is not thread-safe, so drivers and state-trackers
* that want to initialize targets should use the gallivm_init_llvm_targets()
* function to safely initialize targets.
*
* LLVM targets should be initialized before the driver or state-tracker tries
* to access the registry.
*/
extern "C" void
gallivm_init_llvm_targets(void)
{
call_once(&init_native_targets_once_flag, init_native_targets);
}
extern "C" void
lp_set_target_options(void)
{
@@ -115,13 +144,7 @@ lp_set_target_options(void)
llvm::DisablePrettyStackTrace = true;
#endif
// If we have a native target, initialize it to ensure it is linked in and
// usable by the JIT.
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetDisassembler();
gallivm_init_llvm_targets();
}
@@ -474,20 +497,57 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
#endif
}
llvm::SmallVector<std::string, 1> MAttrs;
if (util_cpu_caps.has_avx) {
/*
* AVX feature is not automatically detected from CPUID by the X86 target
* yet, because the old (yet default) JIT engine is not capable of
* emitting the opcodes. On newer llvm versions it is and at least some
* versions (tested with 3.3) will emit avx opcodes without this anyway.
*/
MAttrs.push_back("+avx");
if (util_cpu_caps.has_f16c) {
MAttrs.push_back("+f16c");
}
builder.setMAttrs(MAttrs);
llvm::SmallVector<std::string, 16> MAttrs;
#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
/*
* We need to unset attributes because sometimes LLVM mistakenly assumes
* certain features are present given the processor name.
*
* https://bugs.freedesktop.org/show_bug.cgi?id=92214
* http://llvm.org/PR25021
* http://llvm.org/PR19429
* http://llvm.org/PR16721
*/
MAttrs.push_back(util_cpu_caps.has_sse ? "+sse" : "-sse" );
MAttrs.push_back(util_cpu_caps.has_sse2 ? "+sse2" : "-sse2" );
MAttrs.push_back(util_cpu_caps.has_sse3 ? "+sse3" : "-sse3" );
MAttrs.push_back(util_cpu_caps.has_ssse3 ? "+ssse3" : "-ssse3" );
#if HAVE_LLVM >= 0x0304
MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse4.1" : "-sse4.1");
#else
MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41" : "-sse41" );
#endif
#if HAVE_LLVM >= 0x0304
MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse4.2" : "-sse4.2");
#else
MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse42" : "-sse42" );
#endif
/*
* AVX feature is not automatically detected from CPUID by the X86 target
* yet, because the old (yet default) JIT engine is not capable of
* emitting the opcodes. On newer llvm versions it is and at least some
* versions (tested with 3.3) will emit avx opcodes without this anyway.
*/
MAttrs.push_back(util_cpu_caps.has_avx ? "+avx" : "-avx");
MAttrs.push_back(util_cpu_caps.has_f16c ? "+f16c" : "-f16c");
MAttrs.push_back(util_cpu_caps.has_avx2 ? "+avx2" : "-avx2");
#endif
#if defined(PIPE_ARCH_PPC)
MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
#if HAVE_LLVM >= 0x0304
/*
* Make sure VSX instructions are disabled
* See LLVM bug https://llvm.org/bugs/show_bug.cgi?id=25503#c7
*/
if (util_cpu_caps.has_altivec) {
MAttrs.push_back("-vsx");
}
#endif
#endif
builder.setMAttrs(MAttrs);
#if HAVE_LLVM >= 0x0305
StringRef MCPU = llvm::sys::getHostCPUName();

View File

@@ -41,6 +41,8 @@ extern "C" {
struct lp_generated_code;
extern void
gallivm_init_llvm_targets(void);
extern void
lp_set_target_options(void);

View File

@@ -99,6 +99,7 @@ struct lp_sampler_params
unsigned sampler_index;
unsigned sample_key;
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr;
const LLVMValueRef *coords;
const LLVMValueRef *offsets;
LLVMValueRef lod;
@@ -267,6 +268,17 @@ struct lp_sampler_dynamic_state
struct gallivm_state *gallivm,
LLVMValueRef context_ptr,
unsigned sampler_unit);
/**
* Obtain texture cache (returns ptr to lp_build_format_cache).
*
* It's optional: no caching will be done if it's NULL.
*/
LLVMValueRef
(*cache_ptr)(const struct lp_sampler_dynamic_state *state,
struct gallivm_state *gallivm,
LLVMValueRef thread_data_ptr,
unsigned unit);
};
@@ -356,6 +368,7 @@ struct lp_build_sample_context
LLVMValueRef img_stride_array;
LLVMValueRef base_ptr;
LLVMValueRef mip_offsets;
LLVMValueRef cache;
/** Integer vector with texture width, height, depth */
LLVMValueRef int_size;

View File

@@ -593,7 +593,8 @@ lp_build_sample_fetch_image_nearest(struct lp_build_sample_context *bld,
TRUE,
data_ptr, offset,
x_subcoord,
y_subcoord);
y_subcoord,
bld->cache);
}
*colors = rgba8;
@@ -933,7 +934,8 @@ lp_build_sample_fetch_image_linear(struct lp_build_sample_context *bld,
TRUE,
data_ptr, offset[k][j][i],
x_subcoord[i],
y_subcoord[j]);
y_subcoord[j],
bld->cache);
}
neighbors[k][j][i] = rgba8;

View File

@@ -161,6 +161,7 @@ lp_build_sample_texel_soa(struct lp_build_sample_context *bld,
bld->texel_type,
data_ptr, offset,
i, j,
bld->cache,
texel_out);
/*
@@ -405,16 +406,17 @@ lp_build_sample_wrap_linear(struct lp_build_sample_context *bld,
break;
case PIPE_TEX_WRAP_MIRROR_REPEAT:
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
offset = lp_build_div(coord_bld, offset, length_f);
coord = lp_build_add(coord_bld, coord, offset);
}
/* compute mirror function */
coord = lp_build_coord_mirror(bld, coord);
/* scale coord to length */
coord = lp_build_mul(coord_bld, coord, length_f);
coord = lp_build_sub(coord_bld, coord, half);
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
coord = lp_build_add(coord_bld, coord, offset);
}
/* convert to int, compute lerp weight */
lp_build_ifloor_fract(coord_bld, coord, &coord0, &weight);
@@ -567,12 +569,13 @@ lp_build_sample_wrap_nearest(struct lp_build_sample_context *bld,
coord = lp_build_mul(coord_bld, coord, length_f);
}
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
coord = lp_build_add(coord_bld, coord, offset);
}
/* floor */
/* use itrunc instead since we clamp to 0 anyway */
icoord = lp_build_itrunc(coord_bld, coord);
if (offset) {
icoord = lp_build_add(int_coord_bld, icoord, offset);
}
/* clamp to [0, length - 1]. */
icoord = lp_build_clamp(int_coord_bld, icoord, int_coord_bld->zero,
@@ -2387,6 +2390,7 @@ lp_build_fetch_texel(struct lp_build_sample_context *bld,
bld->texel_type,
bld->base_ptr, offset,
i, j,
bld->cache,
colors_out);
if (out_of_bound_ret_zero) {
@@ -2440,6 +2444,7 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
unsigned texture_index,
unsigned sampler_index,
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
const LLVMValueRef *coords,
const LLVMValueRef *offsets,
const struct lp_derivatives *derivs, /* optional */
@@ -2586,6 +2591,10 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
derived_sampler_state.wrap_s = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
derived_sampler_state.wrap_t = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
}
/*
* We could force CLAMP to CLAMP_TO_EDGE here if min/mag filter is nearest,
* so AoS path could be used. Not sure it's worth the trouble...
*/
min_img_filter = derived_sampler_state.min_img_filter;
mag_img_filter = derived_sampler_state.mag_img_filter;
@@ -2701,6 +2710,11 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
context_ptr, texture_index);
/* Note that mip_offsets is an array[level] of offsets to texture images */
if (dynamic_state->cache_ptr && thread_data_ptr) {
bld.cache = dynamic_state->cache_ptr(dynamic_state, gallivm,
thread_data_ptr, texture_index);
}
/* width, height, depth as single int vector */
if (dims <= 1) {
bld.int_size = tex_width;
@@ -2877,6 +2891,7 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
bld4.base_ptr = bld.base_ptr;
bld4.mip_offsets = bld.mip_offsets;
bld4.int_size = bld.int_size;
bld4.cache = bld.cache;
bld4.vector_width = lp_type_width(type4);
@@ -3075,12 +3090,14 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
LLVMValueRef offsets[3] = { NULL };
LLVMValueRef lod = NULL;
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr = NULL;
LLVMValueRef texel_out[4];
struct lp_derivatives derivs;
struct lp_derivatives *deriv_ptr = NULL;
unsigned num_param = 0;
unsigned i, num_coords, num_derivs, num_offsets, layer;
enum lp_sampler_lod_control lod_control;
boolean need_cache = FALSE;
lod_control = (sample_key & LP_SAMPLER_LOD_CONTROL_MASK) >>
LP_SAMPLER_LOD_CONTROL_SHIFT;
@@ -3088,8 +3105,19 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
get_target_info(static_texture_state->target,
&num_coords, &num_derivs, &num_offsets, &layer);
if (dynamic_state->cache_ptr) {
const struct util_format_description *format_desc;
format_desc = util_format_description(static_texture_state->format);
if (format_desc && format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) {
need_cache = TRUE;
}
}
/* "unpack" arguments */
context_ptr = LLVMGetParam(function, num_param++);
if (need_cache) {
thread_data_ptr = LLVMGetParam(function, num_param++);
}
for (i = 0; i < num_coords; i++) {
coords[i] = LLVMGetParam(function, num_param++);
}
@@ -3140,6 +3168,7 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
texture_index,
sampler_index,
context_ptr,
thread_data_ptr,
coords,
offsets,
deriv_ptr,
@@ -3183,6 +3212,7 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
const LLVMValueRef *offsets = params->offsets;
const struct lp_derivatives *derivs = params->derivs;
enum lp_sampler_lod_control lod_control;
boolean need_cache = FALSE;
lod_control = (sample_key & LP_SAMPLER_LOD_CONTROL_MASK) >>
LP_SAMPLER_LOD_CONTROL_SHIFT;
@@ -3190,6 +3220,17 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
get_target_info(static_texture_state->target,
&num_coords, &num_derivs, &num_offsets, &layer);
if (dynamic_state->cache_ptr) {
const struct util_format_description *format_desc;
format_desc = util_format_description(static_texture_state->format);
if (format_desc && format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) {
/*
* This is not 100% correct, if we have cache but the
* util_format_s3tc_prefer is true the cache won't get used
* regardless (could hook up the block decode there...) */
need_cache = TRUE;
}
}
/*
* texture function matches are found by name.
* Thus the name has to include both the texture and sampler unit
@@ -3215,6 +3256,9 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
*/
arg_types[num_param++] = LLVMTypeOf(params->context_ptr);
if (need_cache) {
arg_types[num_param++] = LLVMTypeOf(params->thread_data_ptr);
}
for (i = 0; i < num_coords; i++) {
arg_types[num_param++] = LLVMTypeOf(coords[0]);
assert(LLVMTypeOf(coords[0]) == LLVMTypeOf(coords[i]));
@@ -3274,6 +3318,9 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
num_args = 0;
args[num_args++] = params->context_ptr;
if (need_cache) {
args[num_args++] = params->thread_data_ptr;
}
for (i = 0; i < num_coords; i++) {
args[num_args++] = coords[i];
}
@@ -3378,6 +3425,7 @@ lp_build_sample_soa(const struct lp_static_texture_state *static_texture_state,
params->texture_index,
params->sampler_index,
params->context_ptr,
params->thread_data_ptr,
params->coords,
params->offsets,
params->derivs,

View File

@@ -129,7 +129,8 @@ lp_build_emit_llvm_unary(
unsigned tgsi_opcode,
LLVMValueRef arg0)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 1;
emit_data.args[0] = arg0;
return lp_build_emit_llvm(bld_base, tgsi_opcode, &emit_data);
@@ -142,7 +143,8 @@ lp_build_emit_llvm_binary(
LLVMValueRef arg0,
LLVMValueRef arg1)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 2;
emit_data.args[0] = arg0;
emit_data.args[1] = arg1;
@@ -157,7 +159,8 @@ lp_build_emit_llvm_ternary(
LLVMValueRef arg1,
LLVMValueRef arg2)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 3;
emit_data.args[0] = arg0;
emit_data.args[1] = arg1;

View File

@@ -230,6 +230,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
const LLVMValueRef (*inputs)[4],
LLVMValueRef (*outputs)[4],
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
struct lp_build_sampler_soa *sampler,
const struct tgsi_shader_info *info,
const struct lp_build_tgsi_gs_iface *gs_iface);
@@ -447,6 +448,7 @@ struct lp_build_tgsi_soa_context
const LLVMValueRef (*inputs)[TGSI_NUM_CHANNELS];
LLVMValueRef (*outputs)[TGSI_NUM_CHANNELS];
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr;
const struct lp_build_sampler_soa *sampler;

View File

@@ -538,12 +538,19 @@ lrp_emit(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
{
LLVMValueRef tmp;
tmp = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_SUB,
emit_data->args[1],
emit_data->args[2]);
emit_data->output[emit_data->chan] = lp_build_emit_llvm_ternary(bld_base,
TGSI_OPCODE_MAD, emit_data->args[0], tmp, emit_data->args[2]);
struct lp_build_context *bld = &bld_base->base;
LLVMValueRef inv, a, b;
/* This uses the correct version: (1 - t)*a + t*b
*
* An alternative version is "a + t*(b-a)". The problem is this version
* doesn't return "b" for t = 1, because "a + (b-a)" isn't equal to "b"
* because of the floating-point rounding.
*/
inv = lp_build_sub(bld, bld_base->base.one, emit_data->args[0]);
a = lp_build_mul(bld, emit_data->args[1], emit_data->args[0]);
b = lp_build_mul(bld, emit_data->args[2], inv);
emit_data->output[emit_data->chan] = lp_build_add(bld, a, b);
}
/* TGSI_OPCODE_MAD */

View File

@@ -2321,6 +2321,7 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
params.texture_index = unit;
params.sampler_index = unit;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.lod = lod;
@@ -2488,6 +2489,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
params.texture_index = texture_unit;
params.sampler_index = sampler_unit;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.lod = lod;
@@ -2606,8 +2608,14 @@ emit_fetch_texels( struct lp_build_tgsi_soa_context *bld,
params.type = bld->bld_base.base.type;
params.sample_key = sample_key;
params.texture_index = unit;
params.sampler_index = unit;
/*
* sampler not actually used, set to 0 so it won't exceed PIPE_MAX_SAMPLERS
* and trigger some assertions with d3d10 where the sampler view number
* can exceed this.
*/
params.sampler_index = 0;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.derivs = NULL;
@@ -3858,6 +3866,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
const LLVMValueRef (*inputs)[TGSI_NUM_CHANNELS],
LLVMValueRef (*outputs)[TGSI_NUM_CHANNELS],
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
struct lp_build_sampler_soa *sampler,
const struct tgsi_shader_info *info,
const struct lp_build_tgsi_gs_iface *gs_iface)
@@ -3893,6 +3902,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
bld.bld_base.info = info;
bld.indirect_files = info->indirect_files;
bld.context_ptr = context_ptr;
bld.thread_data_ptr = thread_data_ptr;
/*
* If the number of temporaries is rather large then we just

View File

@@ -33,6 +33,7 @@
* Set GALLIUM_HUD=help for more info.
*/
#include <signal.h>
#include <stdio.h>
#include "hud/hud_context.h"
@@ -51,12 +52,15 @@
#include "tgsi/tgsi_text.h"
#include "tgsi/tgsi_dump.h"
/* Control the visibility of all HUD contexts */
static boolean huds_visible = TRUE;
struct hud_context {
struct pipe_context *pipe;
struct cso_context *cso;
struct u_upload_mgr *uploader;
struct hud_batch_query_context *batch_query;
struct list_head pane_list;
/* states */
@@ -95,6 +99,13 @@ struct hud_context {
} text, bg, whitelines;
};
#ifdef PIPE_OS_UNIX
static void
signal_visible_handler(int sig, siginfo_t *siginfo, void *context)
{
huds_visible = !huds_visible;
}
#endif
static void
hud_draw_colored_prims(struct hud_context *hud, unsigned prim,
@@ -441,6 +452,9 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
struct hud_pane *pane;
struct hud_graph *gr;
if (!huds_visible)
return;
hud->fb_width = tex->width0;
hud->fb_height = tex->height0;
hud->constants.two_div_fb_width = 2.0f / hud->fb_width;
@@ -510,6 +524,8 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
hud_alloc_vertices(hud, &hud->text, 4 * 512, 4 * sizeof(float));
/* prepare all graphs */
hud_batch_query_update(hud->batch_query);
LIST_FOR_EACH_ENTRY(pane, &hud->pane_list, head) {
LIST_FOR_EACH_ENTRY(gr, &pane->graph_list, head) {
gr->query_new_value(gr);
@@ -903,17 +919,21 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
}
else if (strcmp(name, "samples-passed") == 0 &&
has_occlusion_query(hud->pipe->screen)) {
hud_pipe_query_install(pane, hud->pipe, "samples-passed",
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe,
"samples-passed",
PIPE_QUERY_OCCLUSION_COUNTER, 0, 0,
PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
}
else if (strcmp(name, "primitives-generated") == 0 &&
has_streamout(hud->pipe->screen)) {
hud_pipe_query_install(pane, hud->pipe, "primitives-generated",
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe,
"primitives-generated",
PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0,
PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
}
else {
boolean processed = FALSE;
@@ -938,17 +958,19 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
if (strcmp(name, pipeline_statistics_names[i]) == 0)
break;
if (i < Elements(pipeline_statistics_names)) {
hud_pipe_query_install(pane, hud->pipe, name,
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe, name,
PIPE_QUERY_PIPELINE_STATISTICS, i,
0, PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
processed = TRUE;
}
}
/* driver queries */
if (!processed) {
if (!hud_driver_query_install(pane, hud->pipe, name)){
if (!hud_driver_query_install(&hud->batch_query, pane, hud->pipe,
name)) {
fprintf(stderr, "gallium_hud: unknown driver query '%s'\n", name);
}
}
@@ -987,6 +1009,9 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
case ',':
env++;
if (!pane)
break;
y += height + hud->font.glyph_height * (pane->num_graphs + 2);
height = 100;
@@ -1122,6 +1147,12 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso)
struct pipe_sampler_view view_templ;
unsigned i;
const char *env = debug_get_option("GALLIUM_HUD", NULL);
unsigned signo = debug_get_num_option("GALLIUM_HUD_TOGGLE_SIGNAL", 0);
#ifdef PIPE_OS_UNIX
static boolean sig_handled = FALSE;
struct sigaction action = {};
#endif
huds_visible = debug_get_bool_option("GALLIUM_HUD_VISIBLE", TRUE);
if (!env || !*env)
return NULL;
@@ -1264,6 +1295,22 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso)
LIST_INITHEAD(&hud->pane_list);
/* setup sig handler once for all hud contexts */
#ifdef PIPE_OS_UNIX
if (!sig_handled && signo != 0) {
action.sa_sigaction = &signal_visible_handler;
action.sa_flags = SA_SIGINFO;
if (signo >= NSIG)
fprintf(stderr, "gallium_hud: invalid signal %u\n", signo);
else if (sigaction(signo, &action, NULL) < 0)
fprintf(stderr, "gallium_hud: unable to set handler for signal %u\n", signo);
fflush(stderr);
sig_handled = TRUE;
}
#endif
hud_parse_env_var(hud, env);
return hud;
}
@@ -1284,6 +1331,7 @@ hud_destroy(struct hud_context *hud)
FREE(pane);
}
hud_batch_query_cleanup(&hud->batch_query);
pipe->delete_fs_state(pipe, hud->fs_color);
pipe->delete_fs_state(pipe, hud->fs_text);
pipe->delete_vs_state(pipe, hud->vs);

View File

@@ -33,6 +33,58 @@
#include "util/u_memory.h"
#include <stdio.h>
#include <inttypes.h>
#ifdef PIPE_OS_WINDOWS
#include <windows.h>
#endif
#ifdef PIPE_OS_WINDOWS
static inline uint64_t
filetime_to_scalar(FILETIME ft)
{
ULARGE_INTEGER uli;
uli.LowPart = ft.dwLowDateTime;
uli.HighPart = ft.dwHighDateTime;
return uli.QuadPart;
}
static boolean
get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
{
SYSTEM_INFO sysInfo;
FILETIME ftNow, ftCreation, ftExit, ftKernel, ftUser;
GetSystemInfo(&sysInfo);
assert(sysInfo.dwNumberOfProcessors >= 1);
if (cpu_index != ALL_CPUS && cpu_index >= sysInfo.dwNumberOfProcessors) {
/* Tell hud_get_num_cpus there are only this many CPUs. */
return FALSE;
}
/* Get accumulated user and sys time for all threads */
if (!GetProcessTimes(GetCurrentProcess(), &ftCreation, &ftExit,
&ftKernel, &ftUser))
return FALSE;
GetSystemTimeAsFileTime(&ftNow);
*busy_time = filetime_to_scalar(ftUser) + filetime_to_scalar(ftKernel);
*total_time = filetime_to_scalar(ftNow) - filetime_to_scalar(ftCreation);
/* busy_time already has the time accross all cpus.
* XXX: if we want 100% to mean one CPU, 200% two cpus, eliminate the
* following line.
*/
*total_time *= sysInfo.dwNumberOfProcessors;
/* XXX: we ignore cpu_index, i.e, we assume that the individual CPU usage
* and the system usage are one and the same.
*/
return TRUE;
}
#else
static boolean
get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
@@ -81,6 +133,8 @@ get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
fclose(f);
return FALSE;
}
#endif
struct cpu_info {
unsigned cpu_index;

View File

@@ -34,13 +34,164 @@
#include "hud/hud_private.h"
#include "pipe/p_screen.h"
#include "os/os_time.h"
#include "util/u_math.h"
#include "util/u_memory.h"
#include <stdio.h>
// Must be a power of two
#define NUM_QUERIES 8
struct hud_batch_query_context {
struct pipe_context *pipe;
unsigned num_query_types;
unsigned allocated_query_types;
unsigned *query_types;
boolean failed;
struct pipe_query *query[NUM_QUERIES];
union pipe_query_result *result[NUM_QUERIES];
unsigned head, pending, results;
};
void
hud_batch_query_update(struct hud_batch_query_context *bq)
{
struct pipe_context *pipe;
if (!bq || bq->failed)
return;
pipe = bq->pipe;
if (bq->query[bq->head])
pipe->end_query(pipe, bq->query[bq->head]);
bq->results = 0;
while (bq->pending) {
unsigned idx = (bq->head - bq->pending + 1) % NUM_QUERIES;
struct pipe_query *query = bq->query[idx];
if (!bq->result[idx])
bq->result[idx] = MALLOC(sizeof(bq->result[idx]->batch[0]) *
bq->num_query_types);
if (!bq->result[idx]) {
fprintf(stderr, "gallium_hud: out of memory.\n");
bq->failed = TRUE;
return;
}
if (!pipe->get_query_result(pipe, query, FALSE, bq->result[idx]))
break;
++bq->results;
--bq->pending;
}
bq->head = (bq->head + 1) % NUM_QUERIES;
if (bq->pending == NUM_QUERIES) {
fprintf(stderr,
"gallium_hud: all queries busy after %i frames, dropping data.\n",
NUM_QUERIES);
assert(bq->query[bq->head]);
pipe->destroy_query(bq->pipe, bq->query[bq->head]);
bq->query[bq->head] = NULL;
}
++bq->pending;
if (!bq->query[bq->head]) {
bq->query[bq->head] = pipe->create_batch_query(pipe,
bq->num_query_types,
bq->query_types);
if (!bq->query[bq->head]) {
fprintf(stderr,
"gallium_hud: create_batch_query failed. You may have "
"selected too many or incompatible queries.\n");
bq->failed = TRUE;
return;
}
}
if (!pipe->begin_query(pipe, bq->query[bq->head])) {
fprintf(stderr,
"gallium_hud: could not begin batch query. You may have "
"selected too many or incompatible queries.\n");
bq->failed = TRUE;
}
}
static boolean
batch_query_add(struct hud_batch_query_context **pbq,
struct pipe_context *pipe, unsigned query_type,
unsigned *result_index)
{
struct hud_batch_query_context *bq = *pbq;
unsigned i;
if (!bq) {
bq = CALLOC_STRUCT(hud_batch_query_context);
if (!bq)
return false;
bq->pipe = pipe;
*pbq = bq;
}
for (i = 0; i < bq->num_query_types; ++i) {
if (bq->query_types[i] == query_type) {
*result_index = i;
return true;
}
}
if (bq->num_query_types == bq->allocated_query_types) {
unsigned new_alloc = MAX2(16, bq->allocated_query_types * 2);
unsigned *new_query_types
= REALLOC(bq->query_types,
bq->allocated_query_types * sizeof(unsigned),
new_alloc * sizeof(unsigned));
if (!new_query_types)
return false;
bq->query_types = new_query_types;
bq->allocated_query_types = new_alloc;
}
bq->query_types[bq->num_query_types] = query_type;
*result_index = bq->num_query_types++;
return true;
}
void
hud_batch_query_cleanup(struct hud_batch_query_context **pbq)
{
struct hud_batch_query_context *bq = *pbq;
unsigned idx;
if (!bq)
return;
*pbq = NULL;
if (bq->query[bq->head] && !bq->failed)
bq->pipe->end_query(bq->pipe, bq->query[bq->head]);
for (idx = 0; idx < NUM_QUERIES; ++idx) {
if (bq->query[idx])
bq->pipe->destroy_query(bq->pipe, bq->query[idx]);
FREE(bq->result[idx]);
}
FREE(bq->query_types);
FREE(bq);
}
struct query_info {
struct pipe_context *pipe;
struct hud_batch_query_context *batch;
unsigned query_type;
unsigned result_index; /* unit depends on query_type */
enum pipe_driver_query_result_type result_type;
@@ -48,7 +199,6 @@ struct query_info {
/* Ring of queries. If a query is busy, we use another slot. */
struct pipe_query *query[NUM_QUERIES];
unsigned head, tail;
unsigned num_queries;
uint64_t last_time;
uint64_t results_cumulative;
@@ -56,11 +206,26 @@ struct query_info {
};
static void
query_new_value(struct hud_graph *gr)
query_new_value_batch(struct query_info *info)
{
struct hud_batch_query_context *bq = info->batch;
unsigned result_index = info->result_index;
unsigned idx = (bq->head - bq->pending) % NUM_QUERIES;
unsigned results = bq->results;
while (results) {
info->results_cumulative += bq->result[idx]->batch[result_index].u64;
++info->num_results;
--results;
idx = (idx - 1) % NUM_QUERIES;
}
}
static void
query_new_value_normal(struct query_info *info)
{
struct query_info *info = gr->query_data;
struct pipe_context *pipe = info->pipe;
uint64_t now = os_time_get();
if (info->last_time) {
if (info->query[info->head])
@@ -107,30 +272,9 @@ query_new_value(struct hud_graph *gr)
break;
}
}
if (info->num_results && info->last_time + gr->pane->period <= now) {
uint64_t value;
switch (info->result_type) {
default:
case PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE:
value = info->results_cumulative / info->num_results;
break;
case PIPE_DRIVER_QUERY_RESULT_TYPE_CUMULATIVE:
value = info->results_cumulative;
break;
}
hud_graph_add_value(gr, value);
info->last_time = now;
info->results_cumulative = 0;
info->num_results = 0;
}
}
else {
/* initialize */
info->last_time = now;
info->query[info->head] = pipe->create_query(pipe, info->query_type, 0);
}
@@ -138,12 +282,50 @@ query_new_value(struct hud_graph *gr)
pipe->begin_query(pipe, info->query[info->head]);
}
static void
query_new_value(struct hud_graph *gr)
{
struct query_info *info = gr->query_data;
uint64_t now = os_time_get();
if (info->batch) {
query_new_value_batch(info);
} else {
query_new_value_normal(info);
}
if (!info->last_time) {
info->last_time = now;
return;
}
if (info->num_results && info->last_time + gr->pane->period <= now) {
uint64_t value;
switch (info->result_type) {
default:
case PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE:
value = info->results_cumulative / info->num_results;
break;
case PIPE_DRIVER_QUERY_RESULT_TYPE_CUMULATIVE:
value = info->results_cumulative;
break;
}
hud_graph_add_value(gr, value);
info->last_time = now;
info->results_cumulative = 0;
info->num_results = 0;
}
}
static void
free_query_info(void *ptr)
{
struct query_info *info = ptr;
if (info->last_time) {
if (!info->batch && info->last_time) {
struct pipe_context *pipe = info->pipe;
int i;
@@ -159,11 +341,13 @@ free_query_info(void *ptr)
}
void
hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
hud_pipe_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name, unsigned query_type,
unsigned result_index,
uint64_t max_value, enum pipe_driver_query_type type,
enum pipe_driver_query_result_type result_type)
enum pipe_driver_query_result_type result_type,
unsigned flags)
{
struct hud_graph *gr;
struct query_info *info;
@@ -175,28 +359,40 @@ hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
strncpy(gr->name, name, sizeof(gr->name));
gr->name[sizeof(gr->name) - 1] = '\0';
gr->query_data = CALLOC_STRUCT(query_info);
if (!gr->query_data) {
FREE(gr);
return;
}
if (!gr->query_data)
goto fail_gr;
gr->query_new_value = query_new_value;
gr->free_query_data = free_query_info;
info = gr->query_data;
info->pipe = pipe;
info->query_type = query_type;
info->result_index = result_index;
info->result_type = result_type;
if (flags & PIPE_DRIVER_QUERY_FLAG_BATCH) {
if (!batch_query_add(pbq, pipe, query_type, &info->result_index))
goto fail_info;
info->batch = *pbq;
} else {
info->query_type = query_type;
info->result_index = result_index;
}
hud_pane_add_graph(pane, gr);
if (pane->max_value < max_value)
hud_pane_set_max_value(pane, max_value);
pane->type = type;
return;
fail_info:
FREE(info);
fail_gr:
FREE(gr);
}
boolean
hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
hud_driver_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name)
{
struct pipe_screen *screen = pipe->screen;
@@ -220,8 +416,9 @@ hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
if (!found)
return FALSE;
hud_pipe_query_install(pane, pipe, query.name, query.query_type, 0,
query.max_value.u64, query.type, query.result_type);
hud_pipe_query_install(pbq, pane, pipe, query.name, query.query_type, 0,
query.max_value.u64, query.type, query.result_type,
query.flags);
return TRUE;
}

View File

@@ -80,19 +80,26 @@ void hud_pane_set_max_value(struct hud_pane *pane, uint64_t value);
void hud_graph_add_value(struct hud_graph *gr, uint64_t value);
/* graphs/queries */
struct hud_batch_query_context;
#define ALL_CPUS ~0 /* optionally set as cpu_index */
int hud_get_num_cpus(void);
void hud_fps_graph_install(struct hud_pane *pane);
void hud_cpu_graph_install(struct hud_pane *pane, unsigned cpu_index);
void hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
void hud_pipe_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name, unsigned query_type,
unsigned result_index,
uint64_t max_value,
enum pipe_driver_query_type type,
enum pipe_driver_query_result_type result_type);
boolean hud_driver_query_install(struct hud_pane *pane,
enum pipe_driver_query_result_type result_type,
unsigned flags);
boolean hud_driver_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane,
struct pipe_context *pipe, const char *name);
void hud_batch_query_update(struct hud_batch_query_context *bq);
void hud_batch_query_cleanup(struct hud_batch_query_context **pbq);
#endif

View File

@@ -68,17 +68,18 @@ static void translate_memcpy_uint( const void *in,
* \param out_nr returns number of new vertices
* \param out_translate returns the translation function to use by the caller
*/
int u_index_translator( unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate )
enum indices_mode
u_index_translator(unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate)
{
unsigned in_idx;
unsigned out_idx;
@@ -204,17 +205,17 @@ int u_index_translator( unsigned hw_mask,
* \param out_nr returns new number of vertices to draw
* \param out_generate returns pointer to the generator function
*/
int u_index_generator( unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate )
enum indices_mode
u_index_generator(unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate)
{
unsigned out_idx;

View File

@@ -67,66 +67,68 @@ typedef void (*u_generate_func)( unsigned start,
/* Return codes describe the translate/generate operation. Caller may
* be able to reuse translated indices under some circumstances.
*/
#define U_TRANSLATE_ERROR -1
#define U_TRANSLATE_NORMAL 1
#define U_TRANSLATE_MEMCPY 2
#define U_GENERATE_LINEAR 3
#define U_GENERATE_REUSABLE 4
#define U_GENERATE_ONE_OFF 5
enum indices_mode {
U_TRANSLATE_ERROR = -1,
U_TRANSLATE_NORMAL = 1,
U_TRANSLATE_MEMCPY = 2,
U_GENERATE_LINEAR = 3,
U_GENERATE_REUSABLE= 4,
U_GENERATE_ONE_OFF = 5,
};
void u_index_init( void );
int u_index_translator( unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate );
enum indices_mode
u_index_translator(unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate);
/* Note that even when generating it is necessary to know what the
* API's PV is, as the indices generated will depend on whether it is
* the same as hardware or not, and in the case of triangle strips,
* whether it is first or last.
*/
int u_index_generator( unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate );
enum indices_mode
u_index_generator(unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate);
void u_unfilled_init( void );
int u_unfilled_translator( unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate );
int u_unfilled_generator( unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate );
enum indices_mode
u_unfilled_translator(unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate);
enum indices_mode
u_unfilled_generator(unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate);
#endif

View File

@@ -111,14 +111,15 @@ static unsigned nr_lines( unsigned prim,
int u_unfilled_translator( unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate )
enum indices_mode
u_unfilled_translator(unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate)
{
unsigned in_idx;
unsigned out_idx;
@@ -170,14 +171,15 @@ int u_unfilled_translator( unsigned prim,
* different front/back fill modes, that can be handled with the
* 'draw' module.
*/
int u_unfilled_generator( unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate )
enum indices_mode
u_unfilled_generator(unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate)
{
unsigned out_idx;

View File

@@ -24,9 +24,10 @@
#include "util/ralloc.h"
#include "glsl/nir/nir.h"
#include "glsl/nir/nir_control_flow.h"
#include "glsl/nir/nir_builder.h"
#include "glsl/list.h"
#include "glsl/shader_enums.h"
#include "glsl/nir/shader_enums.h"
#include "nir/tgsi_to_nir.h"
#include "tgsi/tgsi_parse.h"
@@ -64,24 +65,24 @@ struct ttn_compile {
nir_register *addr_reg;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* For each IF/ELSE/ENDIF block, if_stack[if_stack_pos] has where the else
* instructions should be placed, and if_stack[if_stack_pos - 1] has where
* the next instructions outside of the if/then/else block go.
*/
struct exec_list **if_stack;
nir_cursor *if_stack;
unsigned if_stack_pos;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* loop_stack[loop_stack_pos - 1] contains the cf_node_list for the outside
* of the loop.
*/
struct exec_list **loop_stack;
nir_cursor *loop_stack;
unsigned loop_stack_pos;
/* How many TGSI_FILE_IMMEDIATE vec4s have been parsed so far. */
@@ -93,6 +94,128 @@ struct ttn_compile {
#define ttn_channel(b, src, swiz) \
nir_swizzle(b, src, SWIZ(swiz, swiz, swiz, swiz), 1, false)
static gl_varying_slot
tgsi_varying_semantic_to_slot(unsigned semantic, unsigned index)
{
switch (semantic) {
case TGSI_SEMANTIC_POSITION:
return VARYING_SLOT_POS;
case TGSI_SEMANTIC_COLOR:
if (index == 0)
return VARYING_SLOT_COL0;
else
return VARYING_SLOT_COL1;
case TGSI_SEMANTIC_BCOLOR:
if (index == 0)
return VARYING_SLOT_BFC0;
else
return VARYING_SLOT_BFC1;
case TGSI_SEMANTIC_FOG:
return VARYING_SLOT_FOGC;
case TGSI_SEMANTIC_PSIZE:
return VARYING_SLOT_PSIZ;
case TGSI_SEMANTIC_GENERIC:
return VARYING_SLOT_VAR0 + index;
case TGSI_SEMANTIC_FACE:
return VARYING_SLOT_FACE;
case TGSI_SEMANTIC_EDGEFLAG:
return VARYING_SLOT_EDGE;
case TGSI_SEMANTIC_PRIMID:
return VARYING_SLOT_PRIMITIVE_ID;
case TGSI_SEMANTIC_CLIPDIST:
if (index == 0)
return VARYING_SLOT_CLIP_DIST0;
else
return VARYING_SLOT_CLIP_DIST1;
case TGSI_SEMANTIC_CLIPVERTEX:
return VARYING_SLOT_CLIP_VERTEX;
case TGSI_SEMANTIC_TEXCOORD:
return VARYING_SLOT_TEX0 + index;
case TGSI_SEMANTIC_PCOORD:
return VARYING_SLOT_PNTC;
case TGSI_SEMANTIC_VIEWPORT_INDEX:
return VARYING_SLOT_VIEWPORT;
case TGSI_SEMANTIC_LAYER:
return VARYING_SLOT_LAYER;
default:
fprintf(stderr, "Bad TGSI semantic: %d/%d\n", semantic, index);
abort();
}
}
/* Temporary helper to remap back to TGSI style semantic name/index
* values, for use in drivers that haven't been converted to using
* VARYING_SLOT_
*/
void
varying_slot_to_tgsi_semantic(gl_varying_slot slot,
unsigned *semantic_name, unsigned *semantic_index)
{
static const unsigned map[][2] = {
[VARYING_SLOT_POS] = { TGSI_SEMANTIC_POSITION, 0 },
[VARYING_SLOT_COL0] = { TGSI_SEMANTIC_COLOR, 0 },
[VARYING_SLOT_COL1] = { TGSI_SEMANTIC_COLOR, 1 },
[VARYING_SLOT_BFC0] = { TGSI_SEMANTIC_BCOLOR, 0 },
[VARYING_SLOT_BFC1] = { TGSI_SEMANTIC_BCOLOR, 1 },
[VARYING_SLOT_FOGC] = { TGSI_SEMANTIC_FOG, 0 },
[VARYING_SLOT_PSIZ] = { TGSI_SEMANTIC_PSIZE, 0 },
[VARYING_SLOT_FACE] = { TGSI_SEMANTIC_FACE, 0 },
[VARYING_SLOT_EDGE] = { TGSI_SEMANTIC_EDGEFLAG, 0 },
[VARYING_SLOT_PRIMITIVE_ID] = { TGSI_SEMANTIC_PRIMID, 0 },
[VARYING_SLOT_CLIP_DIST0] = { TGSI_SEMANTIC_CLIPDIST, 0 },
[VARYING_SLOT_CLIP_DIST1] = { TGSI_SEMANTIC_CLIPDIST, 1 },
[VARYING_SLOT_CLIP_VERTEX] = { TGSI_SEMANTIC_CLIPVERTEX, 0 },
[VARYING_SLOT_PNTC] = { TGSI_SEMANTIC_PCOORD, 0 },
[VARYING_SLOT_VIEWPORT] = { TGSI_SEMANTIC_VIEWPORT_INDEX, 0 },
[VARYING_SLOT_LAYER] = { TGSI_SEMANTIC_LAYER, 0 },
};
if (slot >= VARYING_SLOT_VAR0) {
*semantic_name = TGSI_SEMANTIC_GENERIC;
*semantic_index = slot - VARYING_SLOT_VAR0;
return;
}
if (slot >= VARYING_SLOT_TEX0 && slot <= VARYING_SLOT_TEX7) {
*semantic_name = TGSI_SEMANTIC_TEXCOORD;
*semantic_index = slot - VARYING_SLOT_TEX0;
return;
}
if (slot >= ARRAY_SIZE(map)) {
fprintf(stderr, "Unknown varying slot %d\n", slot);
abort();
}
*semantic_name = map[slot][0];
*semantic_index = map[slot][1];
}
/* Temporary helper to remap back to TGSI style semantic name/index
* values, for use in drivers that haven't been converted to using
* FRAG_RESULT_
*/
void
frag_result_to_tgsi_semantic(gl_frag_result slot,
unsigned *semantic_name, unsigned *semantic_index)
{
static const unsigned map[][2] = {
[FRAG_RESULT_DEPTH] = { TGSI_SEMANTIC_POSITION, 0 },
[FRAG_RESULT_COLOR] = { TGSI_SEMANTIC_COLOR, -1 },
[FRAG_RESULT_DATA0 + 0] = { TGSI_SEMANTIC_COLOR, 0 },
[FRAG_RESULT_DATA0 + 1] = { TGSI_SEMANTIC_COLOR, 1 },
[FRAG_RESULT_DATA0 + 2] = { TGSI_SEMANTIC_COLOR, 2 },
[FRAG_RESULT_DATA0 + 3] = { TGSI_SEMANTIC_COLOR, 3 },
[FRAG_RESULT_DATA0 + 4] = { TGSI_SEMANTIC_COLOR, 4 },
[FRAG_RESULT_DATA0 + 5] = { TGSI_SEMANTIC_COLOR, 5 },
[FRAG_RESULT_DATA0 + 6] = { TGSI_SEMANTIC_COLOR, 6 },
[FRAG_RESULT_DATA0 + 7] = { TGSI_SEMANTIC_COLOR, 7 },
};
*semantic_name = map[slot][0];
*semantic_index = map[slot][1];
}
static nir_ssa_def *
ttn_src_for_dest(nir_builder *b, nir_alu_dest *dest)
{
@@ -215,12 +338,15 @@ ttn_emit_declaration(struct ttn_compile *c)
var->data.mode = nir_var_shader_in;
var->name = ralloc_asprintf(var, "in_%d", idx);
/* We should probably translate to a VERT_ATTRIB_* or VARYING_SLOT_*
* instead, but nothing in NIR core is looking at the value
* currently, and this is less change to drivers.
*/
var->data.location = decl->Semantic.Name;
var->data.index = decl->Semantic.Index;
if (c->scan->processor == TGSI_PROCESSOR_FRAGMENT) {
var->data.location =
tgsi_varying_semantic_to_slot(decl->Semantic.Name,
decl->Semantic.Index);
} else {
assert(!decl->Declaration.Semantic);
var->data.location = VERT_ATTRIB_GENERIC0 + idx;
}
var->data.index = 0;
/* We definitely need to translate the interpolation field, because
* nir_print will decode it.
@@ -240,6 +366,8 @@ ttn_emit_declaration(struct ttn_compile *c)
exec_list_push_tail(&b->shader->inputs, &var->node);
break;
case TGSI_FILE_OUTPUT: {
int semantic_name = decl->Semantic.Name;
int semantic_index = decl->Semantic.Index;
/* Since we can't load from outputs in the IR, we make temporaries
* for the outputs and emit stores to the real outputs at the end of
* the shader.
@@ -251,14 +379,40 @@ ttn_emit_declaration(struct ttn_compile *c)
var->data.mode = nir_var_shader_out;
var->name = ralloc_asprintf(var, "out_%d", idx);
var->data.index = 0;
var->data.location = decl->Semantic.Name;
if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
decl->Semantic.Index == 0 &&
c->scan->properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS])
var->data.index = -1;
else
var->data.index = decl->Semantic.Index;
if (c->scan->processor == TGSI_PROCESSOR_FRAGMENT) {
switch (semantic_name) {
case TGSI_SEMANTIC_COLOR: {
/* TODO tgsi loses some information, so we cannot
* actually differentiate here between DSB and MRT
* at this point. But so far no drivers using tgsi-
* to-nir support dual source blend:
*/
bool dual_src_blend = false;
if (dual_src_blend && (semantic_index == 1)) {
var->data.location = FRAG_RESULT_DATA0;
var->data.index = 1;
} else {
if (c->scan->properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS])
var->data.location = FRAG_RESULT_COLOR;
else
var->data.location = FRAG_RESULT_DATA0 + semantic_index;
}
break;
}
case TGSI_SEMANTIC_POSITION:
var->data.location = FRAG_RESULT_DEPTH;
break;
default:
fprintf(stderr, "Bad TGSI semantic: %d/%d\n",
decl->Semantic.Name, decl->Semantic.Index);
abort();
}
} else {
var->data.location =
tgsi_varying_semantic_to_slot(semantic_name, semantic_index);
}
if (is_array) {
unsigned j;
@@ -307,7 +461,7 @@ ttn_emit_immediate(struct ttn_compile *c)
for (i = 0; i < 4; i++)
load_const->value.u[i] = tgsi_imm->u[i].Uint;
nir_instr_insert_after_cf_list(b->cf_node_list, &load_const->instr);
nir_builder_instr_insert(b, &load_const->instr);
}
static nir_src
@@ -363,7 +517,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->variables[0] = ttn_array_deref(c, load, var, offset, indirect);
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
@@ -414,7 +568,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->num_components = ncomp;
nir_ssa_dest_init(&load->instr, &load->dest, ncomp, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -476,7 +630,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
srcn++;
}
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -552,7 +706,7 @@ ttn_get_dest(struct ttn_compile *c, struct tgsi_full_dst_register *tgsi_fdst)
load->dest = nir_dest_for_reg(reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
} else {
assert(!tgsi_dst->Indirect);
dest.dest.reg.reg = c->temp_regs[index].reg;
@@ -667,7 +821,7 @@ ttn_alu(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
instr->src[i].src = nir_src_for_ssa(src[i]);
instr->dest = dest;
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -683,7 +837,7 @@ ttn_move_dest_masked(nir_builder *b, nir_alu_dest dest,
mov->src[0].src = nir_src_for_ssa(def);
for (unsigned i = def->num_components; i < 4; i++)
mov->src[0].swizzle[i] = def->num_components - 1;
nir_instr_insert_after_cf_list(b->cf_node_list, &mov->instr);
nir_builder_instr_insert(b, &mov->instr);
}
static void
@@ -902,7 +1056,7 @@ ttn_kill(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
{
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -912,7 +1066,7 @@ ttn_kill_if(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
discard->src[0] = nir_src_for_ssa(cmp);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -920,10 +1074,6 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
{
nir_builder *b = &c->build;
/* Save the outside-of-the-if-statement node list. */
c->if_stack[c->if_stack_pos] = b->cf_node_list;
c->if_stack_pos++;
src = ttn_channel(b, src, X);
nir_if *if_stmt = nir_if_create(b->shader);
@@ -932,11 +1082,14 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
} else {
if_stmt->condition = nir_src_for_ssa(nir_fne(b, src, nir_imm_int(b, 0)));
}
nir_cf_node_insert_end(b->cf_node_list, &if_stmt->cf_node);
nir_builder_cf_insert(b, &if_stmt->cf_node);
nir_builder_insert_after_cf_list(b, &if_stmt->then_list);
c->if_stack[c->if_stack_pos] = nir_after_cf_node(&if_stmt->cf_node);
c->if_stack_pos++;
c->if_stack[c->if_stack_pos] = &if_stmt->else_list;
b->cursor = nir_after_cf_list(&if_stmt->then_list);
c->if_stack[c->if_stack_pos] = nir_after_cf_list(&if_stmt->else_list);
c->if_stack_pos++;
}
@@ -945,7 +1098,7 @@ ttn_else(struct ttn_compile *c)
{
nir_builder *b = &c->build;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos - 1]);
b->cursor = c->if_stack[c->if_stack_pos - 1];
}
static void
@@ -954,7 +1107,7 @@ ttn_endif(struct ttn_compile *c)
nir_builder *b = &c->build;
c->if_stack_pos -= 2;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos]);
b->cursor = c->if_stack[c->if_stack_pos];
}
static void
@@ -962,28 +1115,27 @@ ttn_bgnloop(struct ttn_compile *c)
{
nir_builder *b = &c->build;
/* Save the outside-of-the-loop node list. */
c->loop_stack[c->loop_stack_pos] = b->cf_node_list;
nir_loop *loop = nir_loop_create(b->shader);
nir_builder_cf_insert(b, &loop->cf_node);
c->loop_stack[c->loop_stack_pos] = nir_after_cf_node(&loop->cf_node);
c->loop_stack_pos++;
nir_loop *loop = nir_loop_create(b->shader);
nir_cf_node_insert_end(b->cf_node_list, &loop->cf_node);
nir_builder_insert_after_cf_list(b, &loop->body);
b->cursor = nir_after_cf_list(&loop->body);
}
static void
ttn_cont(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_continue);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
ttn_brk(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_break);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -992,7 +1144,7 @@ ttn_endloop(struct ttn_compile *c)
nir_builder *b = &c->build;
c->loop_stack_pos--;
nir_builder_insert_after_cf_list(b, c->loop_stack[c->loop_stack_pos]);
b->cursor = c->loop_stack[c->loop_stack_pos];
}
static void
@@ -1087,6 +1239,11 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
op = nir_texop_tex;
num_srcs = 1;
break;
case TGSI_OPCODE_TEX2:
op = nir_texop_tex;
num_srcs = 1;
samp = 2;
break;
case TGSI_OPCODE_TXP:
op = nir_texop_tex;
num_srcs = 2;
@@ -1242,10 +1399,12 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
}
if (instr->is_shadow) {
if (instr->coord_components < 3)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], Z));
else
if (instr->coord_components == 4)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[1], X));
else if (instr->coord_components == 3)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], W));
else
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], Z));
instr->src[src_number].src_type = nir_tex_src_comparitor;
src_number++;
@@ -1279,7 +1438,7 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
assert(src_number == num_srcs);
nir_ssa_dest_init(&instr->instr, &instr->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
/* Resolve the writemask on the texture op. */
ttn_move_dest(b, dest, &instr->dest.ssa);
@@ -1318,10 +1477,10 @@ ttn_txq(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
txs->src[0].src_type = nir_tex_src_lod;
nir_ssa_dest_init(&txs->instr, &txs->dest, 3, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &txs->instr);
nir_builder_instr_insert(b, &txs->instr);
nir_ssa_dest_init(&qlv->instr, &qlv->dest, 1, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &qlv->instr);
nir_builder_instr_insert(b, &qlv->instr);
ttn_move_dest_masked(b, dest, &txs->dest.ssa, TGSI_WRITEMASK_XYZ);
ttn_move_dest_masked(b, dest, &qlv->dest.ssa, TGSI_WRITEMASK_W);
@@ -1651,6 +1810,7 @@ ttn_emit_instruction(struct ttn_compile *c)
case TGSI_OPCODE_TXL:
case TGSI_OPCODE_TXB:
case TGSI_OPCODE_TXD:
case TGSI_OPCODE_TEX2:
case TGSI_OPCODE_TXL2:
case TGSI_OPCODE_TXB2:
case TGSI_OPCODE_TXQ_LZ:
@@ -1730,7 +1890,7 @@ ttn_emit_instruction(struct ttn_compile *c)
store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
@@ -1759,11 +1919,26 @@ ttn_add_output_stores(struct ttn_compile *c)
store->const_index[0] = loc;
store->src[0].reg.reg = c->output_regs[loc].reg;
store->src[0].reg.base_offset = c->output_regs[loc].offset;
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
}
static gl_shader_stage
tgsi_processor_to_shader_stage(unsigned processor)
{
switch (processor) {
case TGSI_PROCESSOR_FRAGMENT: return MESA_SHADER_FRAGMENT;
case TGSI_PROCESSOR_VERTEX: return MESA_SHADER_VERTEX;
case TGSI_PROCESSOR_GEOMETRY: return MESA_SHADER_GEOMETRY;
case TGSI_PROCESSOR_TESS_CTRL: return MESA_SHADER_TESS_CTRL;
case TGSI_PROCESSOR_TESS_EVAL: return MESA_SHADER_TESS_EVAL;
case TGSI_PROCESSOR_COMPUTE: return MESA_SHADER_COMPUTE;
default:
unreachable("invalid TGSI processor");
};
}
struct nir_shader *
tgsi_to_nir(const void *tgsi_tokens,
const nir_shader_compiler_options *options)
@@ -1775,17 +1950,19 @@ tgsi_to_nir(const void *tgsi_tokens,
int ret;
c = rzalloc(NULL, struct ttn_compile);
s = nir_shader_create(NULL, options);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
s = nir_shader_create(NULL, tgsi_processor_to_shader_stage(scan.processor),
options);
nir_function *func = nir_function_create(s, "main");
nir_function_overload *overload = nir_function_overload_create(func);
nir_function_impl *impl = nir_function_impl_create(overload);
nir_builder_init(&c->build, impl);
nir_builder_insert_after_cf_list(&c->build, &impl->body);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
c->build.cursor = nir_after_cf_list(&impl->body);
s->num_inputs = scan.file_max[TGSI_FILE_INPUT] + 1;
s->num_uniforms = scan.const_file_max[0] + 1;
@@ -1801,10 +1978,10 @@ tgsi_to_nir(const void *tgsi_tokens,
c->num_samp_types = scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1;
c->samp_types = rzalloc_array(c, nir_alu_type, c->num_samp_types);
c->if_stack = rzalloc_array(c, struct exec_list *,
c->if_stack = rzalloc_array(c, nir_cursor,
(scan.opcode_count[TGSI_OPCODE_IF] +
scan.opcode_count[TGSI_OPCODE_UIF]) * 2);
c->loop_stack = rzalloc_array(c, struct exec_list *,
c->loop_stack = rzalloc_array(c, nir_cursor,
scan.opcode_count[TGSI_OPCODE_BGNLOOP]);
ret = tgsi_parse_init(&parser, tgsi_tokens);

View File

@@ -28,3 +28,9 @@ struct nir_shader_compiler_options *options;
struct nir_shader *
tgsi_to_nir(const void *tgsi_tokens,
const struct nir_shader_compiler_options *options);
void
varying_slot_to_tgsi_semantic(gl_varying_slot slot,
unsigned *semantic_name, unsigned *semantic_index);
void
frag_result_to_tgsi_semantic(gl_frag_result slot,
unsigned *semantic_name, unsigned *semantic_index);

View File

@@ -96,11 +96,13 @@ os_log_message(const char *message)
}
#if !defined(PIPE_SUBSYSTEM_EMBEDDED)
const char *
os_get_option(const char *name)
{
return getenv(name);
}
#endif /* !PIPE_SUBSYSTEM_EMBEDDED */
/**

View File

@@ -54,37 +54,48 @@ boolean
os_get_process_name(char *procname, size_t size)
{
const char *name;
/* First, check if the GALLIUM_PROCESS_NAME env var is set to
* override the normal process name query.
*/
name = os_get_option("GALLIUM_PROCESS_NAME");
if (!name) {
/* do normal query */
#if defined(PIPE_SUBSYSTEM_WINDOWS_USER)
char szProcessPath[MAX_PATH];
char *lpProcessName;
char *lpProcessExt;
char szProcessPath[MAX_PATH];
char *lpProcessName;
char *lpProcessExt;
GetModuleFileNameA(NULL, szProcessPath, Elements(szProcessPath));
GetModuleFileNameA(NULL, szProcessPath, Elements(szProcessPath));
lpProcessName = strrchr(szProcessPath, '\\');
lpProcessName = lpProcessName ? lpProcessName + 1 : szProcessPath;
lpProcessName = strrchr(szProcessPath, '\\');
lpProcessName = lpProcessName ? lpProcessName + 1 : szProcessPath;
lpProcessExt = strrchr(lpProcessName, '.');
if (lpProcessExt) {
*lpProcessExt = '\0';
}
lpProcessExt = strrchr(lpProcessName, '.');
if (lpProcessExt) {
*lpProcessExt = '\0';
}
name = lpProcessName;
name = lpProcessName;
#elif defined(__GLIBC__) || defined(__CYGWIN__)
name = program_invocation_short_name;
name = program_invocation_short_name;
#elif defined(PIPE_OS_BSD) || defined(PIPE_OS_APPLE)
/* *BSD and OS X */
name = getprogname();
/* *BSD and OS X */
name = getprogname();
#elif defined(PIPE_OS_HAIKU)
image_info info;
get_image_info(B_CURRENT_TEAM, &info);
name = info.name;
image_info info;
get_image_info(B_CURRENT_TEAM, &info);
name = info.name;
#else
#warning unexpected platform in os_process.c
return FALSE;
return FALSE;
#endif
}
assert(size > 0);
assert(procname);

View File

@@ -0,0 +1,49 @@
# Mesa 3-D graphics library
#
# Copyright (C) 2015 Emil Velikov <emil.l.velikov@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# NOTE: Currently we build only a 'static' pipe-loader
LOCAL_PATH := $(call my-dir)
# get COMMON_SOURCES and DRM_SOURCES
include $(LOCAL_PATH)/Makefile.sources
include $(CLEAR_VARS)
LOCAL_CFLAGS := \
-DHAVE_PIPE_LOADER_DRI \
-DDROP_PIPE_LOADER_MISC \
-DGALLIUM_STATIC_TARGETS
LOCAL_SRC_FILES := $(COMMON_SOURCES)
LOCAL_MODULE := libmesa_pipe_loader
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SRC_FILES += $(DRM_SOURCES)
LOCAL_SHARED_LIBRARIES := libdrm
LOCAL_STATIC_LIBRARIES := libmesa_loader
endif
include $(GALLIUM_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

View File

@@ -9,20 +9,40 @@ AM_CFLAGS = \
$(GALLIUM_CFLAGS) \
$(VISIBILITY_CFLAGS)
noinst_LTLIBRARIES = libpipe_loader.la
noinst_LTLIBRARIES = \
libpipe_loader_static.la \
libpipe_loader_dynamic.la
libpipe_loader_la_SOURCES = \
libpipe_loader_static_la_CFLAGS = \
$(AM_CFLAGS) \
-DGALLIUM_STATIC_TARGETS=1
libpipe_loader_dynamic_la_CFLAGS = \
$(AM_CFLAGS) \
-DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
libpipe_loader_static_la_SOURCES = \
$(COMMON_SOURCES)
if HAVE_DRM_LOADER_GALLIUM
libpipe_loader_dynamic_la_SOURCES = \
$(COMMON_SOURCES)
if HAVE_LIBDRM
AM_CFLAGS += \
$(LIBDRM_CFLAGS)
libpipe_loader_la_SOURCES += \
libpipe_loader_static_la_SOURCES += \
$(DRM_SOURCES)
libpipe_loader_la_LIBADD = \
libpipe_loader_dynamic_la_SOURCES += \
$(DRM_SOURCES)
libpipe_loader_static_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la
libpipe_loader_dynamic_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la
endif
EXTRA_DIST = SConscript

View File

@@ -0,0 +1,34 @@
Import('*')
env = env.Clone()
env.MSVC2008Compat()
env.Append(CPPPATH = [
'#/src/loader',
'#/src/gallium/winsys',
])
env.Append(CPPDEFINES = [
('HAVE_PIPE_LOADER_DRI', '1'),
('DROP_PIPE_LOADER_MISC', '1'),
('GALLIUM_STATIC_TARGETS', '1'),
])
source = env.ParseSourceList('Makefile.sources', 'COMMON_SOURCES')
#if HAVE_LIBDRM
source += env.ParseSourceList('Makefile.sources', 'DRM_SOURCES')
env.PkgUseModules('DRM')
env.Append(LIBS = [libloader])
#endif
pipe_loader = env.ConvenienceLibrary(
target = 'pipe_loader',
source = source,
)
env.Alias('pipe_loader', pipe_loader)
Export('pipe_loader')

View File

@@ -35,7 +35,7 @@
#define MODULE_PREFIX "pipe_"
static int (*backends[])(struct pipe_loader_device **, int) = {
#ifdef HAVE_PIPE_LOADER_DRM
#ifdef HAVE_LIBDRM
&pipe_loader_drm_probe,
#endif
&pipe_loader_sw_probe
@@ -69,10 +69,9 @@ pipe_loader_configuration(struct pipe_loader_device *dev,
}
struct pipe_screen *
pipe_loader_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_create_screen(struct pipe_loader_device *dev)
{
return dev->ops->create_screen(dev, library_paths);
return dev->ops->create_screen(dev);
}
struct util_dl_library *

View File

@@ -82,13 +82,9 @@ pipe_loader_probe(struct pipe_loader_device **devs, int ndev);
* Create a pipe_screen for the specified device.
*
* \param dev Device the screen will be created for.
* \param library_paths Colon-separated list of filesystem paths that
* will be used to look for the pipe driver
* module that handles this device.
*/
struct pipe_screen *
pipe_loader_create_screen(struct pipe_loader_device *dev,
const char *library_paths);
pipe_loader_create_screen(struct pipe_loader_device *dev);
/**
* Query the configuration parameters for the specified device.
@@ -112,8 +108,6 @@ pipe_loader_configuration(struct pipe_loader_device *dev,
void
pipe_loader_release(struct pipe_loader_device **devs, int ndev);
#ifdef HAVE_PIPE_LOADER_DRI
/**
* Initialize sw dri device give the drisw_loader_funcs.
*
@@ -125,7 +119,15 @@ bool
pipe_loader_sw_probe_dri(struct pipe_loader_device **devs,
struct drisw_loader_funcs *drisw_lf);
#endif
/**
* Initialize a kms backed sw device given an fd.
*
* This function is platform-specific.
*
* \sa pipe_loader_probe
*/
bool
pipe_loader_sw_probe_kms(struct pipe_loader_device **devs, int fd);
/**
* Initialize a null sw device.
@@ -158,8 +160,6 @@ boolean
pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
struct pipe_screen *screen);
#ifdef HAVE_PIPE_LOADER_DRM
/**
* Get a list of known DRM devices.
*
@@ -180,8 +180,6 @@ pipe_loader_drm_probe(struct pipe_loader_device **devs, int ndev);
bool
pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd);
#endif
#ifdef __cplusplus
}
#endif

View File

@@ -36,6 +36,7 @@
#include <unistd.h>
#include "loader.h"
#include "target-helpers/drm_helper_public.h"
#include "state_tracker/drm_driver.h"
#include "pipe_loader_priv.h"
@@ -50,13 +51,119 @@
struct pipe_loader_drm_device {
struct pipe_loader_device base;
const struct drm_driver_descriptor *dd;
#ifndef GALLIUM_STATIC_TARGETS
struct util_dl_library *lib;
#endif
int fd;
};
#define pipe_loader_drm_device(dev) ((struct pipe_loader_drm_device *)dev)
static struct pipe_loader_ops pipe_loader_drm_ops;
static const struct pipe_loader_ops pipe_loader_drm_ops;
#ifdef GALLIUM_STATIC_TARGETS
static const struct drm_conf_ret throttle_ret = {
DRM_CONF_INT,
{2},
};
static const struct drm_conf_ret share_fd_ret = {
DRM_CONF_BOOL,
{true},
};
static inline const struct drm_conf_ret *
configuration_query(enum drm_conf conf)
{
switch (conf) {
case DRM_CONF_THROTTLE:
return &throttle_ret;
case DRM_CONF_SHARE_FD:
return &share_fd_ret;
default:
break;
}
return NULL;
}
static const struct drm_driver_descriptor driver_descriptors[] = {
{
.name = "i915",
.driver_name = "i915",
.create_screen = pipe_i915_create_screen,
.configuration = configuration_query,
},
{
.name = "i965",
.driver_name = "i915",
.create_screen = pipe_ilo_create_screen,
.configuration = configuration_query,
},
{
.name = "nouveau",
.driver_name = "nouveau",
.create_screen = pipe_nouveau_create_screen,
.configuration = configuration_query,
},
{
.name = "r300",
.driver_name = "radeon",
.create_screen = pipe_r300_create_screen,
.configuration = configuration_query,
},
{
.name = "r600",
.driver_name = "radeon",
.create_screen = pipe_r600_create_screen,
.configuration = configuration_query,
},
{
.name = "radeonsi",
.driver_name = "radeon",
.create_screen = pipe_radeonsi_create_screen,
.configuration = configuration_query,
},
{
.name = "vmwgfx",
.driver_name = "vmwgfx",
.create_screen = pipe_vmwgfx_create_screen,
.configuration = configuration_query,
},
{
.name = "kgsl",
.driver_name = "freedreno",
.create_screen = pipe_freedreno_create_screen,
.configuration = configuration_query,
},
{
.name = "msm",
.driver_name = "freedreno",
.create_screen = pipe_freedreno_create_screen,
.configuration = configuration_query,
},
{
.name = "virtio_gpu",
.driver_name = "virtio-gpu",
.create_screen = pipe_virgl_create_screen,
.configuration = configuration_query,
},
{
.name = "vc4",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
#ifdef USE_VC4_SIMULATOR
{
.name = "i965",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
#endif
};
#endif
bool
pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd)
@@ -81,10 +188,36 @@ pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd)
if (!ddev->base.driver_name)
goto fail;
#ifdef GALLIUM_STATIC_TARGETS
for (int i = 0; i < ARRAY_SIZE(driver_descriptors); i++) {
if (strcmp(driver_descriptors[i].name, ddev->base.driver_name) == 0) {
ddev->dd = &driver_descriptors[i];
break;
}
}
if (!ddev->dd)
goto fail;
#else
ddev->lib = pipe_loader_find_module(&ddev->base, PIPE_SEARCH_DIR);
if (!ddev->lib)
goto fail;
ddev->dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!ddev->dd || strcmp(ddev->dd->name, ddev->base.driver_name) != 0)
goto fail;
#endif
*dev = &ddev->base;
return true;
fail:
#ifndef GALLIUM_STATIC_TARGETS
if (ddev->lib)
util_dl_close(ddev->lib);
#endif
FREE(ddev);
return false;
}
@@ -105,8 +238,9 @@ pipe_loader_drm_probe(struct pipe_loader_device **devs, int ndev)
for (i = DRM_RENDER_NODE_MIN_MINOR, j = 0;
i <= DRM_RENDER_NODE_MAX_MINOR; i++) {
fd = open_drm_render_node_minor(i);
struct pipe_loader_device *dev;
fd = open_drm_render_node_minor(i);
if (fd < 0)
continue;
@@ -132,8 +266,10 @@ pipe_loader_drm_release(struct pipe_loader_device **dev)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(*dev);
#ifndef GALLIUM_STATIC_TARGETS
if (ddev->lib)
util_dl_close(ddev->lib);
#endif
close(ddev->fd);
FREE(ddev->base.driver_name);
@@ -146,47 +282,22 @@ pipe_loader_drm_configuration(struct pipe_loader_device *dev,
enum drm_conf conf)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(dev);
const struct drm_driver_descriptor *dd;
if (!ddev->lib)
if (!ddev->dd->configuration)
return NULL;
dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!dd || strcmp(dd->name, ddev->base.driver_name) != 0)
return NULL;
if (!dd->configuration)
return NULL;
return dd->configuration(conf);
return ddev->dd->configuration(conf);
}
static struct pipe_screen *
pipe_loader_drm_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_drm_create_screen(struct pipe_loader_device *dev)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(dev);
const struct drm_driver_descriptor *dd;
if (!ddev->lib)
ddev->lib = pipe_loader_find_module(dev, library_paths);
if (!ddev->lib)
return NULL;
dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!dd || strcmp(dd->name, ddev->base.driver_name) != 0)
return NULL;
return dd->create_screen(ddev->fd);
return ddev->dd->create_screen(ddev->fd);
}
static struct pipe_loader_ops pipe_loader_drm_ops = {
static const struct pipe_loader_ops pipe_loader_drm_ops = {
.create_screen = pipe_loader_drm_create_screen,
.configuration = pipe_loader_drm_configuration,
.release = pipe_loader_drm_release

View File

@@ -31,8 +31,7 @@
#include "pipe_loader.h"
struct pipe_loader_ops {
struct pipe_screen *(*create_screen)(struct pipe_loader_device *dev,
const char *library_paths);
struct pipe_screen *(*create_screen)(struct pipe_loader_device *dev);
const struct drm_conf_ret *(*configuration)(struct pipe_loader_device *dev,
enum drm_conf conf);

View File

@@ -30,45 +30,160 @@
#include "util/u_memory.h"
#include "util/u_dl.h"
#include "sw/dri/dri_sw_winsys.h"
#include "sw/kms-dri/kms_dri_sw_winsys.h"
#include "sw/null/null_sw_winsys.h"
#include "sw/wrapper/wrapper_sw_winsys.h"
#include "target-helpers/inline_sw_helper.h"
#include "state_tracker/drisw_api.h"
#include "state_tracker/sw_driver.h"
struct pipe_loader_sw_device {
struct pipe_loader_device base;
const struct sw_driver_descriptor *dd;
#ifndef GALLIUM_STATIC_TARGETS
struct util_dl_library *lib;
#endif
struct sw_winsys *ws;
};
#define pipe_loader_sw_device(dev) ((struct pipe_loader_sw_device *)dev)
static struct pipe_loader_ops pipe_loader_sw_ops;
static const struct pipe_loader_ops pipe_loader_sw_ops;
static struct sw_winsys *(*backends[])() = {
null_sw_create
#ifdef GALLIUM_STATIC_TARGETS
static const struct sw_driver_descriptor driver_descriptors = {
.create_screen = sw_screen_create,
.winsys = {
#ifdef HAVE_PIPE_LOADER_DRI
{
.name = "dri",
.create_winsys = dri_create_sw_winsys,
},
#endif
#ifdef HAVE_PIPE_LOADER_KMS
{
.name = "kms_dri",
.create_winsys = kms_dri_create_winsys,
},
#endif
/**
* XXX: Do not include these two for non autotools builds.
* They don't have neither opencl nor nine, where these are used.
*/
#ifndef DROP_PIPE_LOADER_MISC
{
.name = "null",
.create_winsys = null_sw_create,
},
{
.name = "wrapped",
.create_winsys = wrapper_sw_winsys_wrap_pipe_screen,
},
#endif
{ 0 },
}
};
#endif
static bool
pipe_loader_sw_probe_init_common(struct pipe_loader_sw_device *sdev)
{
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
#ifdef GALLIUM_STATIC_TARGETS
sdev->dd = &driver_descriptors;
if (!sdev->dd)
return false;
#else
sdev->lib = pipe_loader_find_module(&sdev->base, PIPE_SEARCH_DIR);
if (!sdev->lib)
return false;
sdev->dd = (const struct sw_driver_descriptor *)
util_dl_get_proc_address(sdev->lib, "swrast_driver_descriptor");
if (!sdev->dd){
util_dl_close(sdev->lib);
sdev->lib = NULL;
return false;
}
#endif
return true;
}
static void
pipe_loader_sw_probe_teardown_common(struct pipe_loader_sw_device *sdev)
{
#ifndef GALLIUM_STATIC_TARGETS
if (sdev->lib)
util_dl_close(sdev->lib);
#endif
}
#ifdef HAVE_PIPE_LOADER_DRI
bool
pipe_loader_sw_probe_dri(struct pipe_loader_device **devs, struct drisw_loader_funcs *drisw_lf)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = dri_create_sw_winsys(drisw_lf);
if (!sdev->ws) {
FREE(sdev);
return false;
}
*devs = &sdev->base;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys; i++) {
if (strcmp(sdev->dd->winsys[i].name, "dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(drisw_lf);
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
#endif
#ifdef HAVE_PIPE_LOADER_KMS
bool
pipe_loader_sw_probe_kms(struct pipe_loader_device **devs, int fd)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys; i++) {
if (strcmp(sdev->dd->winsys[i].name, "kms_dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(fd);
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
#endif
@@ -76,38 +191,40 @@ bool
pipe_loader_sw_probe_null(struct pipe_loader_device **devs)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = null_sw_create();
if (!sdev->ws) {
FREE(sdev);
return false;
}
*devs = &sdev->base;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys; i++) {
if (strcmp(sdev->dd->winsys[i].name, "null") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys();
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
int
pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev)
{
int i;
int i = 1;
for (i = 0; i < Elements(backends); i++) {
if (i < ndev) {
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
/* TODO: handle CALLOC_STRUCT failure */
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = backends[i]();
devs[i] = &sdev->base;
if (i < ndev) {
if (!pipe_loader_sw_probe_null(devs)) {
i--;
}
}
@@ -119,21 +236,30 @@ pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
struct pipe_screen *screen)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = wrapper_sw_winsys_wrap_pipe_screen(screen);
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
if (!sdev->ws) {
FREE(sdev);
return false;
for (i = 0; sdev->dd->winsys; i++) {
if (strcmp(sdev->dd->winsys[i].name, "wrapped") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(screen);
break;
}
}
if (!sdev->ws)
goto fail;
*dev = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
static void
@@ -141,8 +267,10 @@ pipe_loader_sw_release(struct pipe_loader_device **dev)
{
struct pipe_loader_sw_device *sdev = pipe_loader_sw_device(*dev);
#ifndef GALLIUM_STATIC_TARGETS
if (sdev->lib)
util_dl_close(sdev->lib);
#endif
FREE(sdev);
*dev = NULL;
@@ -156,28 +284,19 @@ pipe_loader_sw_configuration(struct pipe_loader_device *dev,
}
static struct pipe_screen *
pipe_loader_sw_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_sw_create_screen(struct pipe_loader_device *dev)
{
struct pipe_loader_sw_device *sdev = pipe_loader_sw_device(dev);
struct pipe_screen *(*init)(struct sw_winsys *);
struct pipe_screen *screen;
if (!sdev->lib)
sdev->lib = pipe_loader_find_module(dev, library_paths);
if (!sdev->lib)
return NULL;
screen = sdev->dd->create_screen(sdev->ws);
if (!screen)
sdev->ws->destroy(sdev->ws);
init = (void *)util_dl_get_proc_address(sdev->lib, "swrast_create_screen");
if (!init){
util_dl_close(sdev->lib);
sdev->lib = NULL;
return NULL;
}
return init(sdev->ws);
return screen;
}
static struct pipe_loader_ops pipe_loader_sw_ops = {
static const struct pipe_loader_ops pipe_loader_sw_ops = {
.create_screen = pipe_loader_sw_create_screen,
.configuration = pipe_loader_sw_configuration,
.release = pipe_loader_sw_release

View File

@@ -166,6 +166,11 @@ pb_cache_manager_create(struct pb_manager *provider,
unsigned bypass_usage,
uint64_t maximum_cache_size);
/**
* Remove a buffer from the cache, but keep it alive.
*/
void
pb_cache_manager_remove_buffer(struct pb_buffer *buf);
struct pb_fence_ops;

View File

@@ -104,18 +104,42 @@ pb_cache_manager(struct pb_manager *mgr)
}
static void
_pb_cache_manager_remove_buffer_locked(struct pb_cache_buffer *buf)
{
struct pb_cache_manager *mgr = buf->mgr;
if (buf->head.next) {
LIST_DEL(&buf->head);
assert(mgr->numDelayed);
--mgr->numDelayed;
mgr->cache_size -= buf->base.size;
}
buf->mgr = NULL;
}
void
pb_cache_manager_remove_buffer(struct pb_buffer *pb_buf)
{
struct pb_cache_buffer *buf = (struct pb_cache_buffer*)pb_buf;
struct pb_cache_manager *mgr = buf->mgr;
if (!mgr)
return;
pipe_mutex_lock(mgr->mutex);
_pb_cache_manager_remove_buffer_locked(buf);
pipe_mutex_unlock(mgr->mutex);
}
/**
* Actually destroy the buffer.
*/
static inline void
_pb_cache_buffer_destroy(struct pb_cache_buffer *buf)
{
struct pb_cache_manager *mgr = buf->mgr;
LIST_DEL(&buf->head);
assert(mgr->numDelayed);
--mgr->numDelayed;
mgr->cache_size -= buf->base.size;
if (buf->mgr)
_pb_cache_manager_remove_buffer_locked(buf);
assert(!pipe_is_referenced(&buf->base.reference));
pb_reference(&buf->buffer, NULL);
FREE(buf);
@@ -156,6 +180,12 @@ pb_cache_buffer_destroy(struct pb_buffer *_buf)
struct pb_cache_buffer *buf = pb_cache_buffer(_buf);
struct pb_cache_manager *mgr = buf->mgr;
if (!mgr) {
pb_reference(&buf->buffer, NULL);
FREE(buf);
return;
}
pipe_mutex_lock(mgr->mutex);
assert(!pipe_is_referenced(&buf->base.reference));

View File

@@ -0,0 +1,275 @@
#ifndef DRM_HELPER_H
#define DRM_HELPER_H
#include <stdio.h>
#include "target-helpers/inline_debug_helper.h"
#include "target-helpers/drm_helper_public.h"
#ifdef GALLIUM_I915
#include "i915/drm/i915_drm_public.h"
#include "i915/i915_public.h"
struct pipe_screen *
pipe_i915_create_screen(int fd)
{
struct i915_winsys *iws;
struct pipe_screen *screen;
iws = i915_drm_winsys_create(fd);
if (!iws)
return NULL;
screen = i915_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_i915_create_screen(int fd)
{
fprintf(stderr, "i915g: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_ILO
#include "intel/drm/intel_drm_public.h"
#include "ilo/ilo_public.h"
struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
struct intel_winsys *iws;
struct pipe_screen *screen;
iws = intel_winsys_create_for_fd(fd);
if (!iws)
return NULL;
screen = ilo_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
fprintf(stderr, "ilo: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_NOUVEAU
#include "nouveau/drm/nouveau_drm_public.h"
struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
struct pipe_screen *screen;
screen = nouveau_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
fprintf(stderr, "nouveau: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_R300
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r300/r300_public.h"
struct pipe_screen *
pipe_r300_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r300_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_r300_create_screen(int fd)
{
fprintf(stderr, "r300: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_R600
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r600/r600_public.h"
struct pipe_screen *
pipe_r600_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r600_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_r600_create_screen(int fd)
{
fprintf(stderr, "r600: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_RADEONSI
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "amdgpu/drm/amdgpu_public.h"
#include "radeonsi/si_public.h"
struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
struct radeon_winsys *rw;
/* First, try amdgpu. */
rw = amdgpu_winsys_create(fd, radeonsi_screen_create);
if (!rw)
rw = radeon_drm_winsys_create(fd, radeonsi_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
fprintf(stderr, "radeonsi: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VMWGFX
#include "svga/drm/svga_drm_public.h"
#include "svga/svga_public.h"
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
struct svga_winsys_screen *sws;
struct pipe_screen *screen;
sws = svga_drm_winsys_screen_create(fd);
if (!sws)
return NULL;
screen = svga_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
fprintf(stderr, "svga: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_FREEDRENO
#include "freedreno/drm/freedreno_drm_public.h"
struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
struct pipe_screen *screen;
screen = fd_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
fprintf(stderr, "freedreno: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/drm/virgl_drm_public.h"
#include "virgl/virgl_public.h"
static struct pipe_screen *
pipe_virgl_create_screen(int fd)
{
struct virgl_winsys *vws;
struct pipe_screen *screen;
vws = virgl_drm_winsys_create(fd);
if (!vws)
return NULL;
screen = virgl_create_screen(vws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_virgl_create_screen(int fd)
{
fprintf(stderr, "virgl: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VC4
#include "vc4/drm/vc4_drm_public.h"
struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
struct pipe_screen *screen;
screen = vc4_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
fprintf(stderr, "vc4: driver missing\n");
return NULL;
}
#endif
#endif /* DRM_HELPER_H */

View File

@@ -0,0 +1,37 @@
#ifndef _DRM_HELPER_PUBLIC_H
#define _DRM_HELPER_PUBLIC_H
struct pipe_screen;
struct pipe_screen *
pipe_i915_create_screen(int fd);
struct pipe_screen *
pipe_ilo_create_screen(int fd);
struct pipe_screen *
pipe_nouveau_create_screen(int fd);
struct pipe_screen *
pipe_r300_create_screen(int fd);
struct pipe_screen *
pipe_r600_create_screen(int fd);
struct pipe_screen *
pipe_radeonsi_create_screen(int fd);
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd);
struct pipe_screen *
pipe_freedreno_create_screen(int fd);
struct pipe_screen *
pipe_virgl_create_screen(int fd);
struct pipe_screen *
pipe_vc4_create_screen(int fd);
#endif /* _DRM_HELPER_PUBLIC_H */

View File

@@ -11,6 +11,10 @@
* one or more debug driver: rbug, trace.
*/
#ifdef GALLIUM_DDEBUG
#include "ddebug/dd_public.h"
#endif
#ifdef GALLIUM_TRACE
#include "trace/tr_public.h"
#endif
@@ -30,6 +34,10 @@
static inline struct pipe_screen *
debug_screen_wrap(struct pipe_screen *screen)
{
#if defined(GALLIUM_DDEBUG)
screen = ddebug_screen_create(screen);
#endif
#if defined(GALLIUM_RBUG)
screen = rbug_screen_create(screen);
#endif

View File

@@ -1,489 +0,0 @@
#ifndef INLINE_DRM_HELPER_H
#define INLINE_DRM_HELPER_H
#include "state_tracker/drm_driver.h"
#include "target-helpers/inline_debug_helper.h"
#include "loader.h"
#if defined(DRI_TARGET)
#include "dri_screen.h"
#endif
#if GALLIUM_SOFTPIPE
#include "target-helpers/inline_sw_helper.h"
#include "sw/kms-dri/kms_dri_sw_winsys.h"
#endif
#if GALLIUM_I915
#include "i915/drm/i915_drm_public.h"
#include "i915/i915_public.h"
#endif
#if GALLIUM_ILO
#include "intel/drm/intel_drm_public.h"
#include "ilo/ilo_public.h"
#endif
#if GALLIUM_NOUVEAU
#include "nouveau/drm/nouveau_drm_public.h"
#endif
#if GALLIUM_R300
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r300/r300_public.h"
#endif
#if GALLIUM_R600
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r600/r600_public.h"
#endif
#if GALLIUM_RADEONSI
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "amdgpu/drm/amdgpu_public.h"
#include "radeonsi/si_public.h"
#endif
#if GALLIUM_VMWGFX
#include "svga/drm/svga_drm_public.h"
#include "svga/svga_public.h"
#endif
#if GALLIUM_FREEDRENO
#include "freedreno/drm/freedreno_drm_public.h"
#endif
#if GALLIUM_VC4
#include "vc4/drm/vc4_drm_public.h"
#endif
static char* driver_name = NULL;
/* XXX: We need to teardown the winsys if *screen_create() fails. */
#if defined(GALLIUM_SOFTPIPE)
#if defined(DRI_TARGET)
#if defined(HAVE_LIBDRM)
const __DRIextension **__driDriverGetExtensions_kms_swrast(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_kms_swrast(void)
{
globalDriverAPI = &dri_kms_driver_api;
return galliumdrm_driver_extensions;
}
struct pipe_screen *
kms_swrast_create_screen(int fd)
{
struct sw_winsys *sws;
struct pipe_screen *screen;
sws = kms_dri_create_winsys(fd);
if (!sws)
return NULL;
screen = sw_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#endif
#endif
#if defined(GALLIUM_I915)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_i915(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_i915(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_i915_create_screen(int fd)
{
struct i915_winsys *iws;
struct pipe_screen *screen;
iws = i915_drm_winsys_create(fd);
if (!iws)
return NULL;
screen = i915_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_ILO)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_i965(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
struct intel_winsys *iws;
struct pipe_screen *screen;
iws = intel_winsys_create_for_fd(fd);
if (!iws)
return NULL;
screen = ilo_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_NOUVEAU)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_nouveau(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_nouveau(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
struct pipe_screen *screen;
screen = nouveau_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_R300)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_r300(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_r300(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_r300_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r300_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_R600)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_r600(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_r600(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_r600_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r600_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_RADEONSI)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_radeonsi(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_radeonsi(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
struct radeon_winsys *rw;
/* First, try amdgpu. */
rw = amdgpu_winsys_create(fd, radeonsi_screen_create);
if (!rw)
rw = radeon_drm_winsys_create(fd, radeonsi_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_VMWGFX)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_vmwgfx(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_vmwgfx(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
struct svga_winsys_screen *sws;
struct pipe_screen *screen;
sws = svga_drm_winsys_screen_create(fd);
if (!sws)
return NULL;
screen = svga_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_FREEDRENO)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_msm(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_msm(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
const __DRIextension **__driDriverGetExtensions_kgsl(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_kgsl(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
struct pipe_screen *screen;
screen = fd_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_VC4)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_vc4(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#if defined(USE_VC4_SIMULATOR)
const __DRIextension **__driDriverGetExtensions_i965(void);
/**
* When building using the simulator (on x86), we advertise ourselves as the
* i965 driver so that you can just make a directory with a link from
* i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
* on your i965-using host to run the driver under simulation.
*
* This is, of course, incompatible with building with the ilo driver, but you
* shouldn't be building that anyway.
*/
PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
#endif
static struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
struct pipe_screen *screen;
screen = vc4_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
inline struct pipe_screen *
dd_create_screen(int fd)
{
driver_name = loader_get_driver_for_fd(fd, _LOADER_GALLIUM);
if (!driver_name)
return NULL;
#if defined(GALLIUM_I915)
if (strcmp(driver_name, "i915") == 0)
return pipe_i915_create_screen(fd);
else
#endif
#if defined(GALLIUM_ILO)
if (strcmp(driver_name, "i965") == 0)
return pipe_ilo_create_screen(fd);
else
#endif
#if defined(GALLIUM_NOUVEAU)
if (strcmp(driver_name, "nouveau") == 0)
return pipe_nouveau_create_screen(fd);
else
#endif
#if defined(GALLIUM_R300)
if (strcmp(driver_name, "r300") == 0)
return pipe_r300_create_screen(fd);
else
#endif
#if defined(GALLIUM_R600)
if (strcmp(driver_name, "r600") == 0)
return pipe_r600_create_screen(fd);
else
#endif
#if defined(GALLIUM_RADEONSI)
if (strcmp(driver_name, "radeonsi") == 0)
return pipe_radeonsi_create_screen(fd);
else
#endif
#if defined(GALLIUM_VMWGFX)
if (strcmp(driver_name, "vmwgfx") == 0)
return pipe_vmwgfx_create_screen(fd);
else
#endif
#if defined(GALLIUM_FREEDRENO)
if ((strcmp(driver_name, "kgsl") == 0) || (strcmp(driver_name, "msm") == 0))
return pipe_freedreno_create_screen(fd);
else
#endif
#if defined(GALLIUM_VC4)
if (strcmp(driver_name, "vc4") == 0)
return pipe_vc4_create_screen(fd);
else
#if defined(USE_VC4_SIMULATOR)
if (strcmp(driver_name, "i965") == 0)
return pipe_vc4_create_screen(fd);
else
#endif
#endif
return NULL;
}
inline const char *
dd_driver_name(void)
{
return driver_name;
}
static const struct drm_conf_ret throttle_ret = {
DRM_CONF_INT,
{2},
};
static const struct drm_conf_ret share_fd_ret = {
DRM_CONF_BOOL,
{true},
};
static inline const struct drm_conf_ret *
configuration_query(enum drm_conf conf)
{
switch (conf) {
case DRM_CONF_THROTTLE:
return &throttle_ret;
case DRM_CONF_SHARE_FD:
return &share_fd_ret;
default:
break;
}
return NULL;
}
inline const struct drm_conf_ret *
dd_configuration(enum drm_conf conf)
{
if (!driver_name)
return NULL;
#if defined(GALLIUM_I915)
if (strcmp(driver_name, "i915") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_ILO)
if (strcmp(driver_name, "i965") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_NOUVEAU)
if (strcmp(driver_name, "nouveau") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_R300)
if (strcmp(driver_name, "r300") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_R600)
if (strcmp(driver_name, "r600") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_RADEONSI)
if (strcmp(driver_name, "radeonsi") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_VMWGFX)
if (strcmp(driver_name, "vmwgfx") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_FREEDRENO)
if ((strcmp(driver_name, "kgsl") == 0) || (strcmp(driver_name, "msm") == 0))
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_VC4)
if (strcmp(driver_name, "vc4") == 0)
return configuration_query(conf);
else
#if defined(USE_VC4_SIMULATOR)
if (strcmp(driver_name, "i965") == 0)
return configuration_query(conf);
else
#endif
#endif
return NULL;
}
#endif /* INLINE_DRM_HELPER_H */

View File

@@ -19,6 +19,10 @@
#include "llvmpipe/lp_public.h"
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/virgl_public.h"
#include "virgl/vtest/virgl_vtest_public.h"
#endif
static inline struct pipe_screen *
sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
@@ -30,6 +34,14 @@ sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
screen = llvmpipe_create_screen(winsys);
#endif
#if defined(GALLIUM_VIRGL)
if (screen == NULL && strcmp(driver, "virpipe") == 0) {
struct virgl_winsys *vws;
vws = virgl_vtest_winsys_wrap(winsys);
screen = virgl_create_screen(vws);
}
#endif
#if defined(GALLIUM_SOFTPIPE)
if (screen == NULL)
screen = softpipe_create_screen(winsys);
@@ -57,69 +69,4 @@ sw_screen_create(struct sw_winsys *winsys)
return sw_screen_create_named(winsys, driver);
}
#if defined(GALLIUM_SOFTPIPE)
#if defined(DRI_TARGET)
#include "target-helpers/inline_debug_helper.h"
#include "sw/dri/dri_sw_winsys.h"
#include "dri_screen.h"
const __DRIextension **__driDriverGetExtensions_swrast(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_swrast(void)
{
globalDriverAPI = &galliumsw_driver_api;
return galliumsw_driver_extensions;
}
inline struct pipe_screen *
drisw_create_screen(struct drisw_loader_funcs *lf)
{
struct sw_winsys *winsys = NULL;
struct pipe_screen *screen = NULL;
winsys = dri_create_sw_winsys(lf);
if (winsys == NULL)
return NULL;
screen = sw_screen_create(winsys);
if (screen == NULL) {
winsys->destroy(winsys);
return NULL;
}
screen = debug_screen_wrap(screen);
return screen;
}
#endif // DRI_TARGET
#if defined(NINE_TARGET)
#include "sw/wrapper/wrapper_sw_winsys.h"
#include "target-helpers/inline_debug_helper.h"
extern struct pipe_screen *ninesw_create_screen(struct pipe_screen *screen);
inline struct pipe_screen *
ninesw_create_screen(struct pipe_screen *pscreen)
{
struct sw_winsys *winsys = NULL;
struct pipe_screen *screen = NULL;
winsys = wrapper_sw_winsys_wrap_pipe_screen(pscreen);
if (winsys == NULL)
return NULL;
screen = sw_screen_create(winsys);
if (screen == NULL) {
winsys->destroy(winsys);
return NULL;
}
screen = debug_screen_wrap(screen);
return screen;
}
#endif // NINE_TARGET
#endif // GALLIUM_SOFTPIPE
#endif

Some files were not shown because too many files have changed in this diff Show More