Compare commits

..

7544 Commits

Author SHA1 Message Date
Rhys Kidd
f25fdf21e7 vc4: Fix doxygen warnings
Now that vc4 automated code documentation can be generated with
doxygen, fix the warnings issued by Doxygen 1.8.11.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:45 +01:00
Rhys Kidd
db975fa86c doxygen: Plumb through gallium/ to automated documentation
Add Gallium and the Gallium-based drivers to doxygen's automated
code documentation infrastructure.

Can be individually created with:

  cd $MESA_TOP_LEVEL/
  make -C doxygen/ gallium.tag

Benefits from the existing doxygen Makefile runners to clean up
afterwards with 'make clean'.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:45 +01:00
Emil Velikov
26f4638684 Revert "osmesa: don't try to bundle osmesa.def SConscript"
This reverts commit c07df0f201.

Now that the SCons build is back we need to include the files in the
tarball.
2016-05-30 17:53:45 +01:00
Andreas Fänger
9601815b4b scons: build osmesa swrast and gallium
This patch makes it possible to build classic osmesa/swrast on windows
again. It was removed in commit 69db422218.
Although there is a gallium version of osmesa now, the swrast version
still has more features lacking in llvmpipe, e.g. anisotropic filtering.

Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
[Emil Velikov: remove trailing whitespace]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:45 +01:00
Emil Velikov
3689ef32af automake: rework the git_sha1.h rule, include in tarball
As we'll need the file in the release tarball, rework the rule so that
the file is regenerated _only_ if we're in a git repository.

With this in place we can build vulkan (anv) from a release tarball.

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:45 +01:00
Emil Velikov
4cd9cd6abc automake: move the git_sha1.h rule a level up
This way we can reuse the header from other places like -
src/intel/vulkan and src/gallium. Only the former is hooked up atm.

Make sure .gitignore is updated, as well as all the users (the mesa
code does not need any changes).

Also ensure that the file is always created by adding it to the
BUILT_SOURCES target.

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:45 +01:00
Emil Velikov
13faddb6b8 mesa_glinterop: remove mesa_glinterop typedefs
As is there are two places that do the typedefs - dri_interface.h and
this header. As we cannot include the former in here, just drop the
typedefs and use the struct directly (as needed).

This is required because typedef redefinition is C11 feature which is
not supported on all the versions of GCC used to build mesa.

v2: Kill the typedef alltogether, as per Marek.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96236
Cc: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 17:53:44 +01:00
Emil Velikov
d43c894471 glx/glvnd: automake: include all the sources in libglx_la_SOURCES
Otherwise the headers will be missing from the release tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:44 +01:00
Emil Velikov
f9db61d095 glx/glvnd: remove the final if defined($extension) guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:44 +01:00
Emil Velikov
3bf00b6c6a glx/glvnd: rework dispatch functions/indices tables lookup
Rather than checking if the function name maps to a valid entry in the
respective table, just create a dummy entry at the end of each table.

This allows us to remove some unnessesary "index >= 0" checks, which get
executed quite often.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:44 +01:00
Emil Velikov
eab7e54981 glx/glvnd: Use strcmp() based binary search in FindGLXFunction()
It will allows us to find the function within 6 attempts, out of the ~80
entry long table.

v2: calculate middle on each iteration, correctly set the lower limit.

Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:44 +01:00
Chuck Atkins
f9a35bf012 configure.ac: correct the xlib/xlib-gallium GLX detection for GLVND
Things have changed since commit a92910a ("glx: Refactor the configure
options for glx implementation choice (v3)") where only a single
configure option is used to control the GLX provider.

[Emil Velikov: Ensure that the check is moved after the detection code.]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 17:53:34 +01:00
Kyle Brenneman
22a9e00aab glx: Implement the libglvnd interface.
With reference to the libglvnd branch:

https://cgit.freedesktop.org/mesa/mesa/log/?h=libglvnd

This is a squashed commit containing all of Kyle's commits, all but two
of Emil's commits (to follow), and a small fixup from myself to mark the
rest of the glX* functions as _GLX_PUBLIC so they are not exported when
building for libglvnd. I (ajax) squashed them together both for ease of
review, and because most of the changes are un-useful intermediate
states representing the evolution of glvnd's internal API.

Co-author: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2016-05-30 16:29:49 +01:00
Frederic Devernay
cee459d84d gallivm: initialize init_native_targets_once_flag correctly
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 16:13:52 +02:00
Ilia Mirkin
8cc80e396e nvc0/ir: fix emission of predicate spill to register
The lane mask only applies to real mov's, while here we're using PSET.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-30 10:07:01 -04:00
Ilia Mirkin
9444d71611 nvc0: fix some compute texture validation bits on kepler
(a) Make sure to update the TIC in case of an updated buffer address
(b) Mark newly-inactive textures dirty so that we update the handle in
set_tex_handles.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-30 10:07:01 -04:00
Dave Airlie
bac39dddcf mesa/xfb: report calculated size for XFB buffer objects.
This fixes:
GL45-CTS.direct_state_access.xfb_buffers

This test looks correct to me, we should work out the
size value and report it rather than using only the size
from the Range interface.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-30 21:18:54 +10:00
Emil Velikov
e7bd5b4b77 swr: automake: silence the python invocation
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:31:08 +01:00
Emil Velikov
04987ef229 swr: automake: attempt to fix the out-of-tree build
Make sure that the output folder is created otherwise the python scripts
yells at us.

Cc: 0xe2.0x9a.0x9b@gmail.com
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96238
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:31:07 +01:00
Emil Velikov
3a59a624d0 swr: remove LLVM dependency from source generation rules.
The dependencies should not mention any files external to the project.
If we want to do sanity checks for the LLVM installed on the system we
should do that in configure, yet again where is the merit which header
gets checked and which doesn't ?

Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:31:07 +01:00
Emil Velikov
b05b782b43 swr: add all the generators to the release tarball.
Namely the python scripts and the knobs.template.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:31:07 +01:00
Emil Velikov
38394b5d76 anv: automake: don't forget to cleanup dev_icd.json
Otherwise `make distcheck' will barf at us as the file is dangling.

Ideally this should be part of the clean-local hook, although we include
install-lib-links.mk which already has one.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:29:21 +01:00
Emil Velikov
220d8c99fa anv: automake: bring back VULKAN_ENTRYPOINT_CPPFLAGS
We should not have removed them in the first place. There's a subtle
difference between generating the complete sources and using them which
was not obvious as we nuked them.

Without this, the release tarball ends up without various hunks of the
generated sources, thus things fail at a later stage as we attempt to
build them.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:56 +01:00
Emil Velikov
82514f26d8 anv: automake: ship the json files in the release tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:53 +01:00
Emil Velikov
f80b10df8d softpipe: add sp_buffer.h to the sources list (release tarball)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:28:53 +01:00
Emil Velikov
2f43908395 freedreno: make sure we pick up ir3_nir_trig.py in the release tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:28:53 +01:00
Emil Velikov
36859022ea isl: add isl_priv.h to the sources list
Otherwise it will be missing from the release tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:50 +01:00
Mauro Rossi
41d252e418 isl: move the sources lists to Makefile.sources
[Emil Velikov: use the file in the autoconf build]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:48 +01:00
Emil Velikov
b4f6c70397 isl: automake: list builddir before srcdir in the includes list
As seen elsewhere - we want to include the freshly built sources as
opposed the the (likely) stale ones in the srcdir.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:46 +01:00
Emil Velikov
53a2167e68 isl: automake: flatten the tests rules
Fold the unneeded extra variable tests_ldadd, the explicit sources
section (single file with the default extension) and flip the
check_PROGRAMS <> TESTS order (TESTS includes scripts, while
check_PROGRAMS is binaries only).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:43 +01:00
Emil Velikov
1eecc09584 isl: automake: remove unneeded install-lib-links.mk include
One uses the makefile to create compatibility symlinks (to
$top_builddir/libs) for shared libraries/modules. As we don't create any
here, there's no need to include the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:40 +01:00
Emil Velikov
afc1db739a isl: automake: remove unneeded SUBDIRS
As we do not include any other subdirs but self, we don't need to set
it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:28:37 +01:00
Mauro Rossi
779653489e genxml: move the sources (headers) list to Makefile.sources
[Emil Velikov: use the file in the autoconf build]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-30 10:26:36 +01:00
Emil Velikov
ace5403453 anv: bail out if anv_wsi_init() fails
Otherwise we'll end up setting up a device with no winsys integration.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
---
Hard-coding the rendernode name in anv_physical_device_init() is a bad
idea really. We could/should be using drmGetDevices() to get info on all
the devices (master/render/etc. node names, pci location etc.) and apply
our heuristics on top of that.

That can come up as a follow up change.
2016-05-30 10:26:36 +01:00
Emil Velikov
93e65fdcac anv: resolve wayland-only build
Ensure that the final X11/XCB hunk is guarded by the correct macro.
Otherwise we'll require the symbol even when building without said
platform.

Cc: Cedric Sodhi <manday@openmail.cc>
Reported-by: Cedric Sodhi <manday@openmail.cc>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-30 10:26:35 +01:00
Robert Foss
5068d307f9 anv: Fix use of uninitialized variable.
The return variable was not set for failure paths.
It has now been changed to VK_ERROR_INITIALIZATION_FAILED
for failure paths.

Coverity: 1358944
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: rebase against master, s/vulkan/anv/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 10:26:35 +01:00
Stanimir Varbanov
e382bc649b gallium: push offset down to driver
Push offset down to drivers when importing dmabuf. This is needed
to more fully support EGL_EXT_image_dma_buf_import when a non-zero
offset is specified.

Tesing has been done for freedreno, and compile tested following
gallium drivers:
nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-30 10:26:35 +01:00
Stanimir Varbanov
30d28d7c31 st/dri: cleanup image_from_fd/dma_buf paths
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-30 10:26:35 +01:00
Stanimir Varbanov
9d852a1f75 st/dri: add handling of R8 and GR88 DRI fourcc formats
This helps to import dmabuf buffers from DRM_FORMAT_R8 and
DRM_FORMAT_GR88 used for example by GStreamer for YUV to RGB
conversion using shaders.

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-30 10:26:35 +01:00
Bas Nieuwenhuizen
e9d3246a7a radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 09:59:50 +02:00
Francisco Jerez
d8cf982f7d i965: Expose GL 4.3 on Gen8+.
ARB_compute_shader was the last feature missing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
4decc426c2 i965/fs: Skip gen4 pre/post-send dependency workaronds for the first/last block.
We know that there cannot be any destination dependency race if we
reach the beginning or end of the program without having found any
other instruction the send could possibly race with.  This avoids
emitting a pile of useless moves at the beginning or end of the
program in the most common case in which the program has a single
basic block only.

On the original i965 I get the following shader-db results:

 total instructions in shared programs: 3354165 -> 3215637 (-4.13%)
 instructions in affected programs: 3183065 -> 3044537 (-4.35%)
 helped: 13498
 HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
daf4a71883 i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
6956015aa5 i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.
Just to make sure we keep the SIMD lowering pass tidy when we
introduce additional logic to try to optimize out the copy
instructions used to zip and unzip the destination and source regions
into multiple packed regions of the lowered instruction width.
Shouldn't cause any functional changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
a9f00a9e53 i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.
This will be useful in several places.  The only externally visible
difference (other than non-VGRF files being supported now) is that the
region sizes are now passed in byte units instead of in GRF units
because the loss of precision would have become a problem in the SIMD
lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
4db93592de i965/fs: Refactor offset() into a separate function taking the width as argument.
This will be useful in the SIMD lowering pass to avoid having to
construct a builder object of the known region width just to pass it
as argument to offset(), which doesn't do anything with it other than
taking the builder dispatch_width as region width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
a5b4f63c15 i965/fs: Implement opt_sampler_eot() in terms of logical sends.
This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies
the code and will allow the optimization to succeed in more cases
independent of whether the LOAD_PAYLOAD instruction can be found or
not.

The following patch is squashed in:

SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().

The sampler EOT optimization pass naively assumes that the texturing
instruction provides all the data used by the FB write just because
they're standing next to each other.  The least we should be checking
is whether the source and destination regions of the FB write and
texturing instructions match.  Without this the previous seemingly
harmless patch would have caused opt_sampler_eot() to misoptimize a
shader from dota-2 causing DCE to eliminate all of its 78 instructions
except for the final sampler EOT message (!).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
a0d9aed268 i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
2a166c13d4 i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.
No shader-db regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
d5f2f32b11 i965/fs: Run SIMD and logical send lowering after the optimization loop.
There are two reasons why this is useful:

 - It avoids the introduction of an amount of partial writes emitted
   by the SIMD lowering pass to zip and unzip register regions early
   during optimization, which can make subsequent optimization less
   effective.

 - It substantially reduces the burden on the compiler when a large
   fraction of the instructions in the program need to be split (e.g.
   during SIMD32 builds).  Individual halves of split instructions
   will be optimized identically (if they can still be optimized at
   all), so doing it up front can duplicate the amount of instructions
   the optimizer has to deal with which causes the compilation time to
   explode in some cases due to the worse-than-linear runtime
   behaviour of the back-end.

It seems helpful to re-run a few optimization passes in cases where
any of the lowering passes was able to make progress.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
e9eb59ba68 i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:38 -07:00
Francisco Jerez
48d743c501 i965/fs: Allow constant propagation into logical send sources.
Logical sends are eventually lowered into a series of copies so they
can take almost anything as source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:37 -07:00
Francisco Jerez
f1a607cf68 i965/fs: Let CSE handle logical sampler sends as expressions.
This will prevent some shader-db regressions when we start plumbing
logical sends through the optimizer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:37 -07:00
Francisco Jerez
b0c8e5e0c8 i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.
This will let the optimizer know that the sample mask value is unused
so its definition can be DCE'ed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 23:41:37 -07:00
Timothy Arceri
aac90ba292 glsl: fix xfb_offset unsized array validation
This partially fixes CTS test:
GL44-CTS.enhanced_layouts.xfb_get_program_resource_api

The test now fails at a tes evaluation shader with unsized output arrays.

The ARB_enhanced_layouts spec says:

   "It is a compile-time error to apply xfb_offset to the declaration of an
   unsized array."

So this seems like a bug in the CTS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-30 15:11:47 +10:00
Timothy Arceri
87fb5aa3e7 glsl: dont crash when attempting to assign a value to a builtin define
For example GL_ARB_enhanced_layouts = 3;

Fixes:
GL44-CTS.enhanced_layouts.glsl_contant_immutablity

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-30 12:47:58 +10:00
Dave Airlie
d98d6e6269 egl/dri3: don't crash on no context.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94925

Pointed out by Karol Herbst on irc.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 11:30:04 +10:00
Dave Airlie
e2791b38b4 mesa/program_interface_query: fix transform feedback varyings.
The spec says gl_NextBuffer and gl_SkipComponents need to be
returned to userspace in the program interface queries.

We currently throw those away, this requires a complete piglit
run to make sure no drivers fallover due to the extra varyings.

This fixes:
GL45-CTS.program_interface_query.transform-feedback-built-in

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-30 11:26:50 +10:00
Dave Airlie
6effdce92e glsl/ast: subroutineTypes can't be returned from functions.
These types can't be returned.

This fixes:
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
for the return type case.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-30 11:25:30 +10:00
Timothy Arceri
db2a35193f glsl: use has_double() helper
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-05-30 11:01:40 +10:00
Timothy Arceri
8f4ac20b6f glsl: fix explicit uniform block alignment
This stops the offset being bumped again when and an explicit
alignment has already been applied.

Fixes alignment issues in:
GL44-CTS.enhanced_layouts.uniform_block_alignment

Note the test still fails due to unrelated issues with doubles.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-05-30 11:01:32 +10:00
Jordan Justen
7398a32c50 i965: Shrink stage_prog_data param array length
It appears we were over-allocating these arrays.

Previously we would use nir->num_uniforms directly for scalar
programs, and multiply it by 4 for vec4 programs.

Instead we should have been dividing by 4 in both cases to convert
from bytes to a gl_constant_value count. The size of gl_constant_value
is 4 bytes.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-29 09:59:55 -07:00
Ilia Mirkin
160063b110 nv50,nvc0: fix the max_vertices=0 case
This is apparently legal. Drop any emit/restarts, and pass a 1 to the
hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-29 09:34:03 -04:00
Ilia Mirkin
f2e7268a55 st/mesa: fix setting of point_size_per_vertex in ES contexts
GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop
GL. So we have to go and check the last pre-rasterizer stage to see
whether it outputs a point size or not.

This fixes a number of dEQP tests that use a geometry or tessellation
shader to emit points primitives.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-05-29 09:34:03 -04:00
Marek Olšák
04a78068ff mesa: skip level checking for FramebufferTexture*D if texture is zero
From the OpenGL 4.5 core spec:
  "An INVALID_VALUE error is generated if texture is not zero and level is
  not a supported texture level for textarget, as described above."

Other FramebufferTexture functions already do the right thing.

This fixes the main menu in F1 2015.

Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-29 14:24:23 +02:00
Ilia Mirkin
60341ddd5c st/mesa: expose OES_shader_io_blocks when we have enough for ES 3.1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-28 20:58:12 -04:00
Vinson Lee
884ac61722 swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.
Fix build error with icc.

  CXX      libswrAVX_la-swr_clear.lo
icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor'
In file included from ./rasterizer/jitter/jit_api.h(31),
                 from swr_context.h(30),
                 from swr_clear.cpp(24):
./rasterizer/common/os.h(135): error: expected an identifier
  void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a)
       ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-05-28 14:26:54 -07:00
Thomas Hindoe Paaboel Andersen
df210ff24d i965: add missing return in if statement
Re-add the "return false" that was removed in 0c02d7002d

It seems that something went wrong when merging the patch. The patch
sent to the mailing list does not directly match what was committed.
https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-28 11:26:33 -07:00
Ilia Mirkin
c7731a0740 gk110/ir: fix unspilling of predicates from registers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96258
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
2016-05-28 13:14:19 -04:00
Samuel Pitoiset
697237b71e nvc0: remove outdated surfaces validation code for GK104
This code was used for validating surfaces with compute but now we use
pipe_image_view instead. Anyway, surfaces support should be
re-introduced properly once OpenCL happens.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-28 15:50:07 +02:00
Samuel Pitoiset
f07ade6881 nvc0: do not always invalidate 3D CBs when using compute
Constant buffers are aliased between 3D and CP on Fermi, but we should
only invalidate them when a compute shader actually uses CBs and not
all the time after a lauching grid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-28 15:50:03 +02:00
Francisco Jerez
357495b94d i965: Update compute workgroup size limit calculation for SIMD32.
This should have the side effect of enabling the ARB_compute_shader
extension on Gen8+ hardware and all Gen7 platforms that didn't
previously expose it (VLV and IVB GT1) due to the number of hardware
threads per subslice being insufficient in SIMD16 mode.

v2: Bump workgroup size limit for GLES too (Jordan).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 23:29:06 -07:00
Francisco Jerez
46ce93ed22 i965: Add do32 debug option.
The do32 INTEL_DEBUG option causes the back-end to try to generate a
SIMD32 program when compiling a compute shader regardless of the
specified compute shader workgroup size, which will be useful for
testing SIMD32 code generation in the most common case in which the
workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen
wouldn't be automatically enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
864737ce6c i965/fs: Build 32-wide compute shader when needed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
37fd13ee2d i965/fs: Extend back-end interface for limiting the shader dispatch width.
This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
2d288cb9ea i965/fs: Implement SIMD32 register allocation support.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
7f10d3983b i965/fs: Remove pre-Gen7 register allocation class micro-optimization.
This was trying to save some one-time init on pre-Gen7 hardware under
the assumption that one would only ever need 1, 2, 4 and 8-wide
registers on those platforms.  However nothing guarantees that those
will be the only VGRF sizes used after lowering and optimization.  In
some cases we may end up with a temporary of different size being
allocated (e.g. by SIMD lowering to zip or unzip a multi-component
register region of a logical send instruction), and there is no
guarantee that they will be optimized away before register allocation
(especially since the compute_to_mrf coalescing pass is
rather... lacking...).  Instead just allocate classes for all possible
VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
encounter a variable of any other size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
1d5bf46ad1 i965/fs: Don't mutate multi-component arguments in sampler payload set-up.
The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction like 'coordinate =
offset(coordinate, bld, 1)', the problem is that while doing that it
may step one past the end of the coordinate vector causing an
assertion failure in offset() if it happens to be a (single component)
immediate.  Right now coordinates and derivatives are typically passed
as actual registers but that will no longer be the case when we start
propagating constants into logical messages.

Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.

v2: Fold a few post-increment operators into the last MOV
    statement. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
ad8f66ed33 i965/fs: Fix multiple ACP interference during copy propagation.
This is more fallout from cf375a3333.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
c88b52745c i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I could reproduce this easily with SIMD32 but I don't see any reason
why the problem would be SIMD32-specific, it was most likely working
by luck.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
8476233ae2 i965/fs: Estimate number of registers written correctly in opt_register_renaming.
The current estimate is incorrect for non-32b types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
437e65f9d9 i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
These are completely ignored by the conversion to brw_reg, so they
better be zero.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
51dd6a60f5 i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
Prevents an assertion failure in the following commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
b9eab911ba i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
The pass is disabled in SIMD16 dispatch mode for the same reason, it
cannot handle instructions that write multiple MRF registers at once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
796238d9e6 i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
29e4717251 i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
a55452530f i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
ae730049c6 i965/fs: Return 32 bit mask from fs_builder::sample_mask().
This doesn't actually handle the FS case, just add an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
8b6edee679 i965/fs: Emit fixed-width null register regardless of the dispatch width.
brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
298320280f i965/fs: Fix half() to handle more exotic register files.
horiz_offset() is able to deal with a superset of the register files
currently special-cased in half().  Just call horiz_offset() in all
cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
8c9601ef7b i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
We'll hit these in some cases during SIMD lowering in 32-wide
programs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
7d430fc05e i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
ecd7a7255a i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
0fec265373 i965/fs: Track flag register liveness with byte granularity.
This is required for correctness in presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister.  Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.

Additionally this makes sure that we handle 32-wide flag writes and
reads which may span multiple flag subregisters so the current
approach of just setting/testing a single bit from the live set
wouldn't have worked.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
df1aec763e i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
v2: Codestyle fixes (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
ece41df247 i965/fs: Expose arbitrary channel execution groups to the IR.
This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8.  At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
81bc6de8c0 i965/ir: Make BROADCAST emit an unmasked single-channel move.
Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.

The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
41562eb8f3 i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control.  On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals.  On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
a5a0810960 i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work.  Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
1e3c58ffaf i965/fs: Lower 32-wide scratch writes in the generator.
The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up.  We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly.  The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:02 -07:00
Francisco Jerez
a7d319c00b i965/fs: Implement scratch reads and writes of 4 GRFs at a time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
fe5cdde2f9 i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
fc7107de1d i965/eu: Set execution size explicitly for memory fence send message.
We don't want to emit a 32-wide send message in 32-wide programs.  The
memory fence message should have the same effect regardless of the
execution size (as long as it's valid) so just set it to one.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
5c887326c5 i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.
In SIMD32 programs the compiler is responsible for providing the
appropriate half of the sample mask in the message header, so the
first and third quarters both map to the first slot group of the
provided 16-bit half, while the second and fourth quarters map to the
second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
two.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
448340d31f i965/fs: Clean up remaining uses of dispatch_width in the generator.
Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
7f28ad8c4d i965/eu: Remove brw_codegen::compressed and ::compressed_stack.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
646213168e i965/eu: Use current exec size instead of p->compressed in surface message generation.
This was kind of an abuse of p->compressed, dataport send message
instructions are always uncompressed.  Use the current execution size
instead since p->compressed is on its way out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:46 -07:00
Francisco Jerez
492286e90b i965/fs: No need to reset predicate control after emitting some instructions.
Trivial clean-up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
8ef5637729 i965/fs: Pass current execution size to brw_IF() and brw_DO().
This gets IF and DO instructions working in SIMD32 programs.  brw_IF()
and brw_DO() should probably behave in the same way as other generator
functions that emit control flow instructions and just figure out the
right execution size by themselves from the current execution controls
specified through the brw_codegen argument.  Changing that will
require updating lots of Gen4-5 clipper code though, so for the moment
just pass the current value redundantly from the FS generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
fdae8b9f91 i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.
p->compressed won't work for SIMD32, we should just be using the
execution size value specified via p->current instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
0b4cd91071 i965/fs: Extend region width calculation to allow arbitrary execution sizes.
Instead of just halving the execution size when the instruction is
compressed hoping that it will give a legal source region width, we
can calculate the maximum legal width value in closed form from the
component size and stride.  This makes sure that brw_reg_from_fs_reg()
always returns a valid hardware region even for virtual 32-wide
instructions (e.g. send-like instructions) that would seem to exceed
the hardware region width limit after halving.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Kenneth Graunke
dabaf4fb96 i965/fs: Pass the compression mode to brw_reg_from_fs_reg().
Curro is planning to eliminate p->compressed, so let's avoid using it
here and just pass in the value directly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
3340a66fce i965/fs: Simplify per-instruction compression control setup in generator.
By using the new compression/group control interface.  This will allow
easier extension to support arbitrary channel enable groups at the IR
level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
c78edcea8b i965/fs: No need to set compression control at the top of generate_code().
The right value is dependent on the specific IR instruction being
generated so it has to be reset in every iteration of the loop anyway.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
c19c3d3a52 i965/eu: Fix a bunch of compression control bugs in the generator.
Most of these were resetting quarter control to zero incorrectly even
though everything they needed to do was disable instruction
compression -- The brw_SAMPLE() case was doing the right thing but it
can be simplified slightly by using the new compression control
interface.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
3dffd81583 i965/eu: Define alternative interface for setting compression and group controls.
This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.

It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen-4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs.  On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though everything we wanted to do was disable
instruction compression -- The latter seems especially unfortunate on
Gen6+ hardware which have no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.

Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
5db4d62395 i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.
It's just a byte MOV with strided source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
29ce110be6 i965/fs: Remove extract virtual opcodes.
These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:09 -07:00
Francisco Jerez
9dcb8ff6a1 i965: Define brw_int_type() helper.
Intended as a (partial) inverse of type_sz().  Will be useful in the
next commit and some other SIMD32 generator changes I have queued up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:09 -07:00
Francisco Jerez
bb89beb26b i965/fs: Remove manual splitting of DDY ops in the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:02 -07:00
Francisco Jerez
982c48dc34 i965/fs: Remove manual unrolling of BFI instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
95272f5c7e i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
f14b9ea6e6 i965/fs: Drop lowering code for a few three-source instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
117a9a0a64 i965/fs: Set default access mode to Align1 for all instructions in the generator.
Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
3a541d0c0b i965/fs: Remove handcrafted math SIMD lowering from the generator.
Most of this wouldn't have worked for SIMD32 and had various
dispatch_width and compression control bugs.  It's mostly dead now
with SIMD lowering of math instructions turned on in the compiler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
cf5443f984 i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.
Which is 16 or 8 in most cases.  This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
197833caa3 i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.
Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.

LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.

An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
9eea3df29f i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
...on hardware lacking compressed Align16 support.  Will allow
simplifying the generator code and fixing it for SIMD32 codegen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
12ae87abb1 i965/fs: Apply usual FPU-like execution size restrictions to MULH.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
dea9c1df89 i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
122e031548 i965/fs: Assert that IF instruction with embedded compare has legal exec_size.
We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
98c8bef01c i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
88d9cc1563 i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
a6bf5f88c7 i965/fs: Enforce common regioning restrictions by SIMD splitting.
This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction.  Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
2b5adb942b i965/fs: Enforce extended math exec size limits during SIMD lowering.
This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
a8e7b4f1d9 i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.
Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end.  At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
99b5476d33 i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.
The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
e531b7907a i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant opcodes.
This was causing the scheduler to be rather optimistic about the
latency of pull constant opcodes on Gen7+.  This might seem to
increase the cycle count estimate calculated by the scheduler itself
for some shaders, even though the actual cycle count should actually
be decreased.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
ed4d0e41ac i965/fs: Rename Gen4 physical varying pull constant load opcode.
For consistency with the Gen7 variant.  I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
64a6cb87f1 i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.
Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, we can deal with pull constant loads in the same way it's done
for texturing during SIMD lowering.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
d8a3294ac2 i965/fs: Hide varying pull constant load message setup behind logical opcode.
This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
0bc5ad8d19 i965/fs: Avoid constant propagation when the type sizes don't match.
The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
52cc80d859 i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.
If the LOAD_PAYLOAD instruction only has header sources it's possible
for the number of registers written to be less than or equal to the
SIMD component size, in which case it would take the single-MOV path
at the bottom which would cause the channel enable masks to be applied
incorrectly to the header contents and/or cause it to write past the
end of the allocated temporary.  If the instruction is either
LOAD_PAYLOAD or doesn't write exactly one component the MOV path is
going to mess up the program so just don't use it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
c5f224145a i965/fs: Handle instruction predication in SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
1760c24b4b i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.
If the source value is going to the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk.  As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source.  In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly though, this should be
good enough for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
168163f5f0 i965/fs: Generalize is_uniform() to is_periodic().
This will be useful in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
b736e78ddb i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
2db9dd5aeb i965/fs: Fix off-by-one region overlap comparison in copy propagation.
This was introduced in cf375a3333 but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error.  This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.

Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-05-27 23:19:20 -07:00
Ronie Salgado
8f538d9ae0 anv/cmd_buffer: Don't delete command buffers in ResetCommandPool()
v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 18:56:33 -07:00
Brian Paul
747754f027 gallium/util: another s/unsigned/enum pipe_prim_type/ for clang
Trivial.
2016-05-27 18:42:21 -06:00
Jason Ekstrand
b93b5935a7 anv: Try the first 8 render nodes instead of just renderD128
This way, if you have other cards installed, the Vulkan driver will still
work.  No guarantees about WSI working correctly but offscreen should at
least work.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
2016-05-27 17:18:33 -07:00
Jason Ekstrand
e023c104f7 anv: strdup the device path into the physical device
This way we don't have to assume that the string coming in is a piece of
constant data that exists forever.
2016-05-27 17:18:33 -07:00
Jason Ekstrand
9048dee328 anv/formats: Exit early for unsupported formats 2016-05-27 17:17:09 -07:00
Jason Ekstrand
10bc9f7024 anv/formats: Map VK_FORMAT_UNDEFINED to ISL_FORMAT_UNSUPPORTED
At one point in time, we may have used the mapping to ISL_FORMAT_RAW for
certain buffer surfaces but that time has long since passed.  This fixes a
bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
2016-05-27 17:17:09 -07:00
Jason Ekstrand
b16326c740 anv/clear: Remove an unused variable 2016-05-27 17:17:09 -07:00
Brian Paul
8beb6f3c9c gallium/util: another unsigned -> enum pipe_prim_type change
gcc didn't warn about the unsigned / enum pipe_prim_type mismatch
between the .c and .h file.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-27 17:55:05 -06:00
Jordan Justen
47e2a57fe9 i965/compute: Fix uniform init issue when SIMD8 is skipped
In d8347f12ea, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.

This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-27 16:44:00 -07:00
Bas Nieuwenhuizen
65d4ba6f20 docs: Mention GL4.3 and ES3.1 support for nvc0 and radeonsi
v2: also update the introductory text.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-28 01:04:03 +02:00
Jason Ekstrand
fb2a5ceb32 anv: Emit DRAWING_RECTANGLE once at driver initialization
Also, we don't actually need it for clipping because meta always colors
inside the lines and, for all other operations, the user is required to set
a scissor.  Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as
little as possible.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:11 -07:00
Jason Ekstrand
3a83c176ea anv/cmd_buffer: Only emit PIPE_CONTROL on-demand
This is in contrast to emitting it directly in vkCmdPipelineBarrier.  This
has a couple of advantages.  First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs.  Second, it allow us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:09 -07:00
Jason Ekstrand
7120c75ec3 genxml: Make PIPE_CONTROL::CommandStreamerStallEnable a boolean
This has been declared as a uint since SNB but it's only one bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:07 -07:00
Jason Ekstrand
b26bd6790d anv/clear: Only clear the render area when doing subpass clears
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:04 -07:00
Jason Ekstrand
5432487792 anv: Move push constant allocation to the command buffer
Instead of blasting it out as part of the pipeline, we put it in the
command buffer and only blast it out when it's really needed.  Since the
PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a
stall which we would like to avoid.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:17:43 -07:00
Bas Nieuwenhuizen
2cee0d0f9c radeonsi: enable OpenGL 4.3
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-27 22:28:11 +02:00
Dave Airlie
0438bc76e2 nouveau: enable GL 4.3 on kepler/fermi
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:52:13 +10:00
Marek Olšák
43550f25ed radeonsi: always reserve output space for tess factors
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dave Airlie <airlied@redhat.com>
2016-05-27 21:40:43 +02:00
Dave Airlie
c44513a1f3 glsl/linker: call link_uniform blocks on linked shader.
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.

This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e
glsl: handle implicit sized arrays in ssbo

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reported-by: Mark James
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:35:53 +10:00
Dave Airlie
f0254fdd07 mesa/get: drop unused extension checks.
These all show up as unused warnings here, so drop them for now.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:29:23 +10:00
Bas Nieuwenhuizen
4717d5a2d3 gallium/ddebug: Add passthrough for query_memory_info.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-27 20:00:07 +02:00
Jason Ekstrand
0482efdc93 nir/inline: Also rewrite param derefs for texture instructions
Without this, samplers get left hanging as derefs to variables that don't
actually exist.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-27 10:28:27 -07:00
Jason Ekstrand
2522180845 nir/inline: Break the guts of rewrite_param-derefs into a helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-27 10:28:27 -07:00
Jason Ekstrand
d19c406395 nir/inline: Make the rewrite_param_derefs helper work on instructions
Now that we have the better nir_foreach_block macro, there's no reason to
use the archaic block version for everything.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-27 10:28:27 -07:00
Jason Ekstrand
2fcba404f8 nir/inline: Don't use foreach_instr_safe unless we need to
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-27 10:28:27 -07:00
Roland Scheidegger
9247570d42 gallivm: eliminate a unnecessary AND with unorm lerps
Instead of doing a add and then mask out the upper bits, we can
simply do a add with a half wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-27 19:11:28 +02:00
Roland Scheidegger
17d685c426 gallium/util: use enum pipe_prim_type instead of unsigned some more
There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-27 19:11:28 +02:00
Brian Paul
2318d2015a svga: remove unneeded casts in get_query_result_vgpu9() calls
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-27 10:01:12 -06:00
Brian Paul
9be122e9b0 svga: use MAYBE_UNUSED to silence release-build warnings
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-27 10:00:56 -06:00
Ben Widawsky
8314dd7ff2 isl: Fix some tautological-compare warnings
Fixes:
isl.c:62:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
    assert(ISL_DEV_GEN(dev) == dev->info->gen);
                      ^~
isl.c:63:33: warning: self-comparison always evaluates to true [-Wtautological-compare]
    assert(ISL_DEV_USE_SEPARATE_STENCIL(dev) == dev->use_separate_stencil);

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-26 21:59:17 -07:00
Ilia Mirkin
4ccf8c952a mesa: add support for GLSL ES 3.20 version string
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-26 21:25:53 -04:00
Ilia Mirkin
faae9ab2ee mapi: expose new functions in GL ES 3.2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-26 21:25:53 -04:00
Ilia Mirkin
df2881381a nvc0/ir: handle a load's reg result not being used for locked variants
For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-26 21:23:49 -04:00
Ilia Mirkin
04ecad97ff nvc0/ir: avoid generating illegal instructions for compute constbuf loads
For user-supplied constbufs, fileIndex is 0. In that case, when we
subtract 1, we'll end up loading from constbuf offset -16. This is
illegal, and there are asserts to avoid it. Normally we'd just DCE it,
but no point in generating the instructions if they're not going to be
used.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-26 21:23:49 -04:00
Rob Clark
4f98c94be7 gallium/util: fix build break
Missing #include caused build breaks after 21a3fb9cd.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-26 20:59:08 -04:00
Jason Ekstrand
9f9f229359 nir/spirv: Allow pointless variable decorations on inputs
SPIR-V specifies that a bunch of stuff gets applied to types.  This means
taht a local variable could get, for instance, an array stride.  Just
because it's pointless doesn't mean you'll never see it.
2016-05-26 17:10:50 -07:00
Brian Paul
1ec45a1948 gallium/util: use enum pipe_prim_type in u_prim.h functions
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
7a49b41436 util/indices: move duplicated assignments out of switch cases
Spotted by Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
46be65c681 gallium: change pipe_draw_info::mode to be pipe_prim_type
Makes debugging with gdb a little nicer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
a25ae485a6 util/indices,svga: s/unsigned/enum pipe_prim_type/
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
21a3fb9cd8 util: s/unsigned/enum pipe_resource_usage/ for buffer usage variables
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
45078e8890 svga: s/unsigned/enum pipe_resource_usage/ for buffer usage variables
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul
d21a309c6c svga: s/unsigned/enum pipe_prim_type/ for primitive type variables
Proper enum types were only added recently.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
90afd7b7ef svga: fix test for unfilled triangles fallback
VGPU10 actually supports line-mode triangles.  We failed to make use of
that before.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
2c07c40d2f svga: clean up and improve comments in svga_draw_private.h
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
0f983e1793 util/indices: implement unfilled (tri->line) conversion for adjacency prims
Tested with new piglit gl-3.2-adj-prims test.

v2: re-order trisadj and tristripadj code, per Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
d6c2c7d710 util/indices: implement provoking vertex conversion for adjacency primitives
Tested with new piglit gl-3.2-adj-prims test.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
479d364c39 util/indices: assert that the incoming primitive is a triangle type
The unfilled index translator/generator functions should only be
called when the primitive mode is one of the triangle types.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
26de558072 util/indices: formatting, whitespace fixes in u_unfilled_indices.c
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
24eadb4810 util/indices: improve comments in u_indices.h
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul
5393238765 svga: fix primitive mode (point/line/tri) test for unfilled primitives
The original mode test was valid before we had GS support.

Regression tested with full piglit run.  Though, I don't think we have
any piglit tests that exercise drawing unfilled adjacency primitives.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-26 17:44:17 -06:00
Ian Romanick
b7af108d3e i965: Enable GL_OES_shader_io_blocks
Only one dEQP io_blocks test fails.  This test fails for the same reason
as the match_different_member_struct_names test in a previous commit.

dEQP-GLES31.functional.separate_shader.validation.io_blocks.match_different_member_struct_names

v2: Add to release notes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:24:25 -07:00
Ian Romanick
660240da9e glsl: Allow shader interface blocks in GLSL ES
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:24:25 -07:00
Ian Romanick
7a3093efcc glsl: Add a has_shader_io_blocks helper
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:24:25 -07:00
Ian Romanick
f0902ee813 mesa: Add extension tracking for GL_OES_shader_io_blocks
v2: Also support GL_EXT_shader_io_blocks.  It's pretty much identical to
the OES extension.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:24:25 -07:00
Ian Romanick
326a269c77 mesa: Only validate SSO shader IO in OpenGL ES or debug context
v2: Move later in series to avoid issues with Gallium drivers and debug
contexts.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-26 16:23:53 -07:00
Ian Romanick
3722c76001 mesa: Remove old validate_io function
The new validate_io catches all of the cases (and many more) that the
old function caught.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:22:25 -07:00
Ian Romanick
bd3f15cffd mesa: Additional SSO validation using program_interface_query data
Fixes the following dEQP tests on SKL:

dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_smooth_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_implicit_explicit_location_1
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_element_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_none
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_order
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_centroid_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_length
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_precision
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_centroid
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_smooth
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_name

It regresses one test:

dEQP-GLES31.functional.separate_shader.validation.varying.match_different_struct_names

Hoever, this test is based on language in the OpenGL ES 3.1 spec that I
believe is incorrect.  I have already submitted a spec bug:

https://www.khronos.org/bugzilla/show_bug.cgi?id=1500

v2: Move spec quote about built-in variables to the first place where
it's relevant.  Suggested by Alejandro.

v3: Move patch earlier in series, fix rebase issues.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v2]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> [v2]
2016-05-26 16:21:01 -07:00
Ian Romanick
cfff746297 mesa: Track the additional data in gl_shader_variable
The interface type, interpolation mode, precision, the type of the
outermost structure, and whether or not the variable has an explicit
location will be used for SSO validation on OpenGL ES.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-26 16:19:16 -07:00
Jason Ekstrand
15e553daf0 nir: Make nir_const_value a union
There's no good reason for it to be a struct of an anonymous union.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-26 16:03:44 -07:00
Kenneth Graunke
e7776fa947 i965: Use the buffer object size for VERTEX_BUFFER_STATE's size field.
commit 7c8dfa78b9 (i965/draw: Use the real size for vertex
buffers) changed how we programmed the VERTEX_BUFFER_STATE size field.

Previously, we programmed it to the size of the actual underlying BO,
which is page-aligned, and potentially much larger than the GL buffer
object.  This violated the ARB_robust_buffer_access spec.

With that change, we started programming it based on the range of data
we expect the draw call to actually access - which is based on the
min_index and max_index information provided to glDrawRangeElements().

Unfortunately, applications often provide inaccurate range information
to glDrawRangeElements().  For example, all the Unreal demos appear to
draw using a range of [0, 3] when the index buffer's actual index range
is [0, 5].  Such results are undefined, and we are absolutely allowed
to restrict access to the range they specified.  However, the failure
mode is usually that nothing draws, or misrendering with wild geometry,
which is kind of bad for a common mistake.  And people tend to assume
the range information isn't that important when data is in VBOs.

There's no real advantage, either.  ARB_robust_buffer_access only
requires us to restrict access to the GL buffer object size, not
the range of data we think they should access.  Doing that allows
buggy applications to still function.  (Note that we still use this
information for busy-tracking, so if they try to overwrite the data
with glBufferSubData, they'll still hit a bug.)  This seems to be
safer.

We may want to provide the more strict range as a debug option,
or scan the VBO and warn against bogus glDrawRangeElements in
debug contexts.  That can be done as a later patch, though.

Makes Unreal demos draw again.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-26 15:56:41 -07:00
Samuel Pitoiset
e01a482182 nvc0: invalidate textures/samplers between 3D and CP on Fermi
Like constant buffers, samplers and textures are aliased on Fermi and
we need to invalidate the state when switching from 3D to CP and vice
versa.

This fixes rendering issues in the UE4 demos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-26 23:51:22 +02:00
Jason Ekstrand
9f0bc0f2b3 anv: Stop linking against libmesa.la and libdri_test_stubs.la
This brings the final size of an optimized non-debug build of the Vulkan
driver down to 2.9 MB as opposed to 8.7 MB for the dri driver.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
057259655e i965: Don't link libmesa or libdri_test_stubs into tests
Now that the compiler has been completely separated from libmesa, we no
longer need these.  We can make the tests much smaller by not linking them
in.  This also ensures that anyone who runs make check won't accidentally
put in any dependencies from the compiler to the rest of mesa core.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
870ff6cd38 i965: Move compiler debug functions to intel_screen.c
They reference the compiler so they shouldn't go in libi965_compiler.la.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
327161a48d i965/test: Remove the fragment/vertex_program field from test visitors
None of them are actually using it.  It's a relic of an older compiler
interface that required a gl_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
e0ae10c49a i965: Move brw_new_shader to brw_link.cpp
That's where brw_link_shader lives and they seem to go together.  Also,
this gets it out of libi965_compiler.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
5136b67915 i965: Move brw_nir_lower_uniforms.cpp to i965_FILES
This gets it out of i965_compiler.la

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
5e43ba7e9e i965: Move brw_create_nir to brw_program.c
This way it's no longer part of libi965_compiler.la since it depends on
GLSL and ARB program stuff.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
86a2447eec i965/nir: Move the type_size_*_bytes functions to brw_nir.h
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
58d1e82d32 ptn: Include nir.h
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Jason Ekstrand
32210dea8e compiler: Move glsl_to_nir to libglsl.la
Right now libglsl.la depends on libnir.la so putting it in libnir.la
adds a dependency on libglsl.la that goes the wrong direction.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:13:38 -07:00
Ben Widawsky
ddcfc35f62 i965/sklgt4: Implement depth/timestamp write w/a
The stated bug describes a scenario in which a post sync write operation for
depth or timestamp can be ignored. There are two workarounds suggested, the
first and easier is to simply do a cs stall when we do these type of writes.
The second option is to do a PIPE_CONTROL flush after the post sync but before
the data is required.

Generally, I believe the data written out is consumed by the application on the
CPU side and so doing the easier of the two is ideal. Furthermore, these queries
aren't tremendously common in the perf sensitive apps I have looked at. However,
there could be cases where a shader stage might directly consume the data, and
as a result option 2 may be desirable.

This patch goes with the easier solution for now.

gen9lp bug_de_id=2137196

By itself, this does *not* fix any of the GT4 hangs we're currently
experiencing.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-26 14:08:17 -07:00
Ben Widawsky
f1fa8b4a1c i965/bxt: Add 2x6 variant
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-26 14:06:43 -07:00
Bas Nieuwenhuizen
43d7305a40 radeonsi: Allow TES distribution between shader engines.
The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.

v2: Add distribution mode field enums.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
f91c85b29b radeonsi: Process multiple patches per threadgroup.
Using more than 1 wave per threadgroup does increase performance
generally.  Not using too many patches per threadgroup also
increases performance. Both catalyst and amdgpu-pro seem to
use 40 patches as their maximum, but I haven't really seen
any performance increase from limiting the number of patches
to 40 instead of 64.

Note that the trick where we overlap the input and output LDS
does not work anymore as the insertion of the tess factors
changes the patch stride.

v2: - Add comment about LDS assumptions.
    - Add constant for buffer size.
    - Fix code style.

v3: - Correct limits for not splitting patches between waves.
    - Set max num_patches to 40 as in the proprietary driver.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
fd0a7a382f radeonsi: Add barrier before writing the tess factors.
The factors may be stored to LDs by another invocation than
the invocation for vertex 0.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
fee3160af9 radeonsi: Enable dynamic HS.
This allows running the TES on different CU's than the
TCS which results in performance improvements.

v2: Only write the control word from one invocation.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
26f436132b radeonsi: Remove LDS layout user SGPR's from TES.
They are unused.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
a4e2146a9d radeonsi: Use buffer loads and stores for passing data from TCS to TES.
We always try to use 4-component loads, as LLVM does not combine loads
and they bypass the L1 cache.

We can't use a similar strategy for stores and this is especially
notable with the tess factors, as they are often set with separate
MOV's per component in the TGSI.

We keep storing to LDS and the LDS space, so we can load the outputs
later, either due to the shader, of for wrting the tess factors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
6217716e8f radeonsi: Store inputs to memory when not using a TCS.
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed function TCS over to monolithic, as
otherwisze many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.

v2: Use u_bit_scan64.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
7846fa8768 radeonsi: Add offchip buffer address calculation.
Instead of creating a memory area per patch and per vertex, we put
the same attribute of every vertex & patch together. Most loads
and stores access the same attribute across all lanes, only for
different patches and vertices.

For the TCS this results in tightly packed data for 4-component
stores.

For the TES this is not the case as within a patch the loads
often also access the same vertex. However if there are < 4
vertices/patch, this still results in a reduction of the number
of cache lines. In the LDS situation we only do better than worst
case if the data per patch < 64 bytes, which due to the
tessellation factors is pretty much never.

We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both
VGPR offset and an index, so we have a problem storing
indirectly indexed outputs. This can be solved by temporarily
storing arrays in LDS and then copying them, but I don't think
that is worth the effort. The difference in VALU cycles
hardware swizzling gives is about 0.2% of total busy cycles.
That is without handling the array case.

I chose for attributes instead of components as they are often
accessed together, and the software swizzling takes VALU cycles
for calculating offsets.

v2: - Rename functions to get_tcs_tes_buffer_address.
    - multiply by 16 as late as possible.
    - Use  tgsi_full_src_register_from_dst.
    - Remove some bad comments.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
c49e68dc4b radeonsi: Add user SGPR for the layout of the offchip buffer.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
d9a0c54f6f radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
This happens to be in the right position, but that changes
when TCS/TES get new parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
3e7a7a9a65 radeonsi: Add buffer load functions.
v2: - Use llvm.admgcn.buffer.load instrinsics for new LLVM.
    - Code style fixes.

v3: - Code style fix.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
9fdb778702 radeonsi: Define build_tbuffer_store_dwords earlier to support new users.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
5c34562d7c radeonsi: Add offchip tessellation parameters.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen
d27ff7d683 radeonsi: Add buffer for offchip storage between TCS and TES.
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.

v2: - Use the correct register for SI.
    - Add define for block size.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-26 22:07:04 +02:00
Rob Clark
6e51fe75a4 tgsi: fix coverity out-of-bounds warning
CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 15:17:49 -04:00
Rob Clark
3d66ba971e tgsi: fix out of bounds access
Not sure why coverity calls this an out-of-bounds read vs out-of-bounds
write.

CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN)9. overrun-local:
Overrunning array r of 3 16-byte elements at element index 3 (byte
offset 48) using index chan (which evaluates to 3).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 15:17:49 -04:00
Anuj Phogat
0c02d7002d i965: Don't use fast copy blit in case of logical operations other than GL_COPY
XY_FAST_COPY_BLT command doesn't have a field for raster operation. So, fall
back to using XY_SRC_COPY_BLT to handle those cases.

Fixes piglit test gl-1.1-xor-copypixels when fast copy blit is enabled
for all tiling formats.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-26 10:57:09 -07:00
Anuj Phogat
97f0f91cc1 i965/gen9: Remove the halign/valign field setup code in fast copy blit
Experimentation with different values of src/dst horizontal/vertical
alignment showed that these fileds are not used on gen9 hardware.

A recent update in graphics specs has removed these fields from
XY_FAST_COPY_BLT command.

Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-26 10:57:09 -07:00
Samuel Pitoiset
c52e92ec3a nvc0: allow to monitor MP perf counters with compute shaders
To read out MP perf counters we use a compute shader and need to upload
input data like a 64-bits addr used to store the values and a sequence
ID for synchronization. Currently, this input data is uploaded as user
uniforms which means that it's sticked to c0[], but if a compute shader
from a real application is used, monitoring those performance counters
will just overwrite some data and miserably crash.

Instead, sticking the 64-bits addr and the sequence into the driver
constant buffer seems like much better and will allow to monitor
counters with GL 4.3 apps.

Tested on GF119 and GK110, but should not hurt anything on GK104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-26 19:34:57 +02:00
Kristian Høgsberg Kristensen
329d115ac6 mesa: Move robustness code to main/robustness.c
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 09:37:17 -07:00
Kristian Høgsberg Kristensen
d7d729b965 docs: Mark GL_KHR_robustness done for GLES3.2 as well
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 09:36:36 -07:00
Plamena Manolova
a0674ce5c4 egl: Additional attribute validation for eglCreatePbufferSurface
eglCreatePbufferSurface should generate an EGL_BAD_MATCH error if:
1: The EGL_TEXTURE_FORMAT attribute is EGL_NO_TEXTURE and EGL_TEXTURE_TARGET
is something other than EGL_NO_TEXTURE
2: EGL_TEXTURE_FORMAT is something other than EGL_NO_TEXTURE and
EGL_TEXTURE_TARGET is EGL_NO_TEXTURE.

This fixes the dEQP-EGL.functional.negative_api.create_pbuffer_surface test.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-26 08:02:48 -07:00
Marek Olšák
8539c9bf31 gallium/radeon: add the kernel version into the renderer string
Example:
Gallium 0.4 on AMD TONGA (DRM 3.2.0 / 4.5.0, LLVM 3.9.0)

My kernel version is pretty long already (4.5.0-amd-01025-g32791c1)
and adding "kernel" into the string would make too it long for glxinfo
to display.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-26 16:53:46 +02:00
Marek Olšák
53f33619a4 winsys/amdgpu: add back multithreaded command submission
Ported from the initial amdgpu winsys from the private AMD branch.

The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.

3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.

v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
    add another amdgpu_cs_sync_flush call into amdgpu_bo_map

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-26 16:43:45 +02:00
Lars Hamre
c626a86586 gallium/tgsi: use _mesa_roundevenf in micro_rnd
Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-roundeven-float
fs-roundeven-vec2
fs-roundeven-vec3
fs-roundeven-vec4
vs-roundeven-float
vs-roundeven-vec2
vs-roundeven-vec3
vs-roundeven-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-roundeven-float
gs-roundeven-vec2
gs-roundeven-vec3
gs-roundeven-vec4

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 07:59:15 -06:00
Emil Velikov
d519f59a9f .mailmap: use Jakob Bornecrantz's personal email
The VMware one is bouncing.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-26 13:57:32 +01:00
Ilia Mirkin
f998e5dc6b nvc0: add note about where the viewport mask would go
Not piping this all the way through yet, but no better place to note
this down. This will can be used with NV_viewport_array2.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-26 08:46:29 -04:00
Ilia Mirkin
b634936d3b nvc0: enable 32 textures on kepler+
For fermi, this likely will require use of linked tsc mode. However on
bindless architectures, we can have as many as we want. As it stands,
the AUX_TEX_INFO has 32 teture handles reserved.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-26 08:46:13 -04:00
Alejandro Piñeiro
2ed9563e79 glsl: add unit tests data vertex/expected outcome for uninitialized warning
v2: fix 025 test. Add three more tests (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-26 09:19:36 +02:00
Alejandro Piñeiro
eee00274fa glsl: add warning-test
It executes compiler-glsl on all the available shaders, and it checks
that the outcome is the expected.

Bash code based on the already existing optimization-test

v2: rebasing: use --version option

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-26 09:19:17 +02:00
Alejandro Piñeiro
68c23d2d04 glsl: add just-log option for the standalone compiler.
Add an option in order to ask to just print the InfoLog, without any
header or separator. Useful if we want to use the standalone compiler
to track only the warning/error messages.

v2: all printfs goes on its own line (Ian Romanick)
v3: rebasing: move just_log to standalone.h/cpp

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-26 08:46:05 +02:00
Alejandro Piñeiro
66ff04322e glsl: do not raise uninitialized warning with out function parameters
It silence by default warnings with function parameters, as the
parameters need to be processed in order to have the actual and the
formal parameter, and the function signature. Then it raises the
warning if needed at verify_parameter_modes where other in/out/inout modes
checks are done.

v2: fix comment style, multi-line condition style, simplify check,
    remove extra blank (Ian Romanick)
v3: inout function parameters can raise the warning too (Ian
    Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-26 08:39:17 +02:00
Alejandro Piñeiro
b9f90ef652 glsl: add a empty set_is_lhs on ast_node
Just to allow to call set_is_lhs on any ast_node without a casting. Useful
when processing a ast_node list that we know it contain ast_expression.

v2: comment out new_value to avoid unused parameter warning (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-26 08:39:07 +02:00
Dave Airlie
5b2675093e glsl: handle implicit sized arrays in ssbo
The current code disallows unsized arrays except at the end of
an SSBO but it is a bit overzealous in doing so.

struct a {
	int b[];
	int f[4];
};

is valid as long as b is implicitly sized within the shader,
i.e. it is accessed only by integer indices.

I've submitted some piglit tests to test for this.

This also has no regressions on piglit on my Haswell.
This fixes:
GL45-CTS.shader_storage_buffer_object.basic-syntax
GL45-CTS.shader_storage_buffer_object.basic-syntaxSSO

This patch moves a chunk of the linker code down, so
that we don't link the uniform blocks until after we've
merged all the variables. The logic went something like:

Removing the checks for last ssbo member unsized from
the compiler and into the linker, meant doing the check
in the link_uniform_blocks code. However to do that the
array sizing had to happen first, so we knew that the
only unsized arrays were in the last block. But array
sizing required the variable to be merged, otherwise
you'd get two different array sizes in different
version of two variables, and one would get lost
when merged. So the solution was to move array sizing
up, after variable merging, but before uniform block
visiting.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-26 12:42:10 +10:00
Dave Airlie
4d70fd1bc7 glsl: fix error message on uniform block mismatch
This looks like a cut-paste from above.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-26 12:40:41 +10:00
Dave Airlie
c952c0e713 glsl/ast: assign explicit_xfb_buffer from correct place
This fixes:
GL44-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through

As the OUT_TC interface structures weren't matching because
one of them had explicit_xfb_buffer set when it shouldn't.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-26 12:17:03 +10:00
Bruce Cherniak
c8835a5924 swr: [rasterizer] Correctly select optimized primitive assembly.
Indexed primitives were always using cut-aware primitive assembly,
whether primitive_restart was enabled or not.  Correctly pass down
primitive_restart and select optimized PA when possible.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-05-25 18:47:16 -05:00
Kenneth Graunke
978ab88858 docs: Mention i965/gen8+ supports GL 4.2 in release notes. 2016-05-25 14:22:56 -07:00
Kenneth Graunke
72ba9c3160 docs: Update GL_OES_copy_image status. 2016-05-25 14:22:30 -07:00
Kenneth Graunke
0f0f357b77 i965: Enable OES_copy_image (and EXT) on Gen8+ and Baytrail.
For now, only enable it on platforms that actually support ETC2.

At this point, Broadwell is only failing 5 (out of 8358) dEQP tests:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.
   srgb8_alpha8_r11f_g11f_b10f.renderbuffer_to_texture3d
   srgb8_alpha8_rgb10_a2ui.renderbuffer_to_cubemap
   srgb8_alpha8_rgb10_a2ui.renderbuffer_to_renderbuffer
   srgb8_alpha8_rgb10_a2.renderbuffer_to_texture2d
   srgb8_alpha8_rgb9_e5.renderbuffer_to_texture3d

These fail with all methods (meta, blorp, blitter, memcpy).

All are blacklisted from the Android mustpass list, which makes me
wonder whether there's an issue with the tests.  The formats in
question work with other targets, and the targets in question work
with other formats...

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
88a630121d i965: Implement a BLORP path for CopyImage and prefer it over Meta.
We're dropping Meta in favor of BLORP everywhere we can.

This also fixes bugs when copying cubemaps to 2D, which is currently
broken in the meta pass.  BLORP just works.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94198
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
2822c8a078 i965: Make the CopyImage BLT path bail for stencil images.
The BLT can't handle S8 because it's W-tiled (at least without
additional funny business, and I'm not sure we care).  Disallow
it so it falls back to the CPU path, which works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
c51702bdc8 i965: Also copy stencil miptree data.
The Meta path handles this, but the CPU/BLT fallbacks did not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
45d6818021 i965: Make a helper function for CopyImage of a miptree.
Currently, it only contains the BLT/CPU fallbacks, so the name is a bit
too generic.  But eventually this will use BLORP as well, at which point
the name will make more sense.

The next patch will introduce a second call.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
2dc98d9a15 i965: Combine src/dest tex vs. rb checks in intel_copy_image_sub_data.
This simplifies things a little - now we only have one (tex or rb?)
if-ladder for src, and a second for dst, rather than four.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Kenneth Graunke
1b39c5efca i965: Account for MinLayer in CopyImageSubData's blitter/CPU paths.
Fixes Piglit's arb_copy_image-texview test with the Meta path disabled
(so we hit the blitter/CPU fallback paths).

v2: Add MinLayer even for cube maps (suggested by Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-25 14:17:29 -07:00
Rob Clark
231dcb19f9 freedreno/ir3: cmdline compiler for glsl
Use glsl/libstandalone.la to add support for taking glsl src files (in
addition to .tgsi) as input.  Then glsl->nir and feed the result into
the ir3 backend as normal.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-25 16:31:15 -04:00
Rob Clark
0f982bb67d glsl: split out libstandalone
Split standalone glsl_compiler into a libstandalone.la and a thin
main.cpp.  This way drivers can re-use the glsl standalone frontend in
their own standalone compilers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-25 16:31:15 -04:00
Rob Clark
ec434d940d android: drop build of standalone glsl_compiler
It's only a tool for debugging the glsl compiler, and should not be
installed.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-25 16:31:15 -04:00
Matt Turner
61847d7708 i965: Mark fallthrough in switch statement.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Matt Turner
83c6749ddb i965: Assert that a depth_mt exists when using HiZ.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Matt Turner
4a5e92ac70 nir: Strengthen assertion that 'out' is nonnull.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Matt Turner
44809f2371 spirv: Mark default cases unreachable().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Matt Turner
469a1c56a6 isl: Mark default cases unreachable.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Matt Turner
47dca31606 isl: Remove useless qualifier from return type.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-05-25 12:44:34 -07:00
Samuel Pitoiset
71c30bd87c nvc0: add descriptions for hardware perf counters/metrics
The GALLIUM_HUD does not yet expose a description for each events, but
this might be useful for developers who want to have a long description
of hw perf counters directly in the source code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-25 21:06:49 +02:00
Brian Paul
89e4de20fa mesa: 80-column wrapping for _context_lost_GetSynciv()
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2016-05-25 12:23:12 -06:00
Brian Paul
ae7c4a6f98 mesa: add GLAPIENTRY to new _context_lost_X functions
To fix MSVC build.  Any function which goes into the dispatch table
needs to have the GLAPIENTRY (__stdcall) tag.

Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2016-05-25 12:23:12 -06:00
Giuseppe Bilotta
1b62b47f6f scons: support 2.5.0
The get_implicit_deps changed in SCons 2.5, expecting a callable rather
than a path as third argument. Detect the SCons versions and set the
argument appropriately to support both 2.5 and earlier versions.

This closes #95211.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95211
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-25 12:23:12 -06:00
Giuseppe Bilotta
8c00fe3970 scons: whitespace cleanup
This text transformation was done automatically via the following shell
command:

$ find -name SCons\* -exec sed -i s/\\s\\+$// '{}' \;

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-25 12:23:12 -06:00
Alejandro Piñeiro
8c29bba242 i965/fs: take into account doubles when emitting system values
Fixes the following cts test:
GL42-CTS.vertex_attrib_64bit.limits_test

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-25 20:14:22 +02:00
Kristian Høgsberg Kristensen
89bb4be91e i965: Fix shadowing of 'height' parameter
The nested declaration of 'height' shadows a parameter and uses
uninitialized memory. Fix by renaming to 'plane_height' which also makes
the code clearer.

This would typically break the bo size computation, but we don't use
that except when mmaping, and we don't mmap YUV buffers much.

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-25 09:42:55 -07:00
Kristian Høgsberg Kristensen
595224f714 mesa: Add .gitignore entries for make check binaries
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-05-25 09:41:44 -07:00
Kristian Høgsberg Kristensen
85008db1d5 i965: Enable GL_KHR_robustness
GL_KHR_robustness adds the GL_CONTEXT_LOST error and five new entry
points that we already implement.  This patch adds a new dispatch table
that returns GL_CONTEXT_LOST from all entry points and implements the
GL_LOSE_CONTEXT_ON_RESET strategy by setting that table when we learn
that we've lost the context.

With the GL_CONTEXT_LOST reporting in place and dispatch for the new
entry points we can turn on GL_KHR_robustness.

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-25 09:41:44 -07:00
Emil Velikov
f036eea2cf .mailmap: Use Chia-I Wu personal e-mail.
The LunarG one is bouncing.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-25 17:38:06 +01:00
Emil Velikov
4b79f82836 .mailmap: Use my (Emil Velikov) personal e-mail.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-25 17:35:48 +01:00
Ilia Mirkin
21c1754306 docs: add missing GL_OES/EXT_gpu_shader5 enablement note
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-25 09:50:22 -04:00
Ilia Mirkin
601a5195eb glsl: add GL_EXT_clip_cull_distance define, add helpers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2016-05-25 09:50:07 -04:00
Brian Paul
9690ab0cdf tgsi: print TGSI_PROPERTY_NEXT_SHADER value as string, not an integer
Print "GEOM" instead of "2", for example.

v2: also update the text parsing code, per Ilia.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-25 07:21:23 -06:00
Brian Paul
2b773fcf00 tgsi: s/6/PIPE_SHADER_TYPES/ for tgsi_processor_type_names array size
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-25 07:21:23 -06:00
Jason Ekstrand
998829f404 nir/spirv: Handle location decorations on structure members 2016-05-24 21:12:56 -07:00
Jason Ekstrand
961369d597 nir/spirv: Add explicit handling for all decorations
From time to time we have had cases where glslang has added a decoration we
don't handle and it has caused problems.  This audit ensures that, for
every decoration, we either handle it or hit an unreachable() with an
accurate description of why we don't have to.
2016-05-24 21:12:56 -07:00
Jason Ekstrand
6f89e51c84 i965/draw: Use the correct buffer index for interleaved VBO sizes
The buffer_range_* arrays are indexed by buffer index not element index.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-24 20:50:35 -07:00
Jordan Justen
e58fabc93a i965/gen7: Fix gl_HelperInvocation
It appears that UV immediates aren't working on Ivy Bridge. In this
case, a signed version will work, and this fixes the piglit
tests/spec/glsl-4.50/execution/helper-invocation.shader_test test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-24 15:44:06 -07:00
Emil Velikov
e384d75b12 mesa_glinterop: make GL interop version field bidirectional
This allows clear and easy communication between the two.

Caller: Requesting information (struct vN)
Callee: I know how to deal with older version (vN-1) only. Here is your
data and the version I support.
Caller: Older version ? Sure I'll cap all access to the fields provided
by the older version (vN-1)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
0e983276b9 mesa_glinterop: drop mesa_glinterop_device_info::interop_version
One cannot use a single version to control both export_in and export_out
versions. Using this forces us to always extend/bump both structs at the
same time.

An alternative scheme is coming with next patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
f8a114aa5c st/dri: add note about GL interop version checks
... and make them more explicit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
923bdbf48c mesa_glinterop: rename MESA_GLINTEROP_INVALID_{VALUE,VERSION}
Be more explicit what it actually does.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
c196de23ae mesa_glinterop: s/struct_version/version/
OCD polish for consistency with other mesa interfaces.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
cb0708c843 mesa_glinterop: fix GL interop *_VERSION comments
Using the macro to set the version is wrong and ill-advised. Please don't
do it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
a3eb8702fb mesa_glinterop: remove inclusion of EGL header
Analogous to previous commit, but for EGL.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
8472045b16 mesa_glinterop: remove inclusion of GLX header
Since we only need partial information about the GLX symbols we can
forward declare them and drop the include. Obviously each user of the
said API will needs more than what's provides, so they'll include the
GLX header.

If they don't, the compiler will give us a nice warning ;-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
b5f9820d90 mesa_glinterop: remove unneeded GLAPI/GLAPIENTRY/APIENTRYP symbols
These come from windows.h, gl.h, glcorearb.h and/or glext.h.

The interop interface is aimed at non-Windows platforms while the macros
are used/derived due to Windows specifics. Thus we can safely remove
them.

Strictly speaking there should be GLXAPIENTRY/EGLAPIENTRY and alike
macros, although a) there is no GLX ones and b) this brings us even
further from decoupling the file from the GLX/EGL header dependency.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:03:00 +01:00
Emil Velikov
bcf9e47653 mesa_glinterop: replace GL types with their native counterpart.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:02:56 +01:00
Emil Velikov
2e726144f9 mesa_glinterop: use generic variable types for the GL interop
Thus we can preserve the ABI, while avoiding the inclusion of some/all
of the following:

  EGL/egl.h
  GL/gl.h
  GL/glcorearb.h
  GLES/gl.h
  GLES2/gl2.h
  GLES3/gl3.h
  GLES3/gl31.h

This will allow us to build/use it alongside any combination of APIs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:02:08 +01:00
Emil Velikov
cbf29d90ba mesa_glinterop: use consistent naming scheme for GL interop
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:02:08 +01:00
Emil Velikov
0d31bfd71a Revert "mesa: Build EGL without X11 headers after interop patchset"
This reverts commit 4e2c9a0435.

The solution was incomplete and fragile. An alternative one is coming
shortly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-24 23:02:05 +01:00
Ian Romanick
c8d9ed5ea1 docs: Note that GL_OES_geometry_shader and GL_OES_tessellation_shader are started
The GL_OES_geometry_shader work is on the oes_shader_io_blocks branch
of idr's fd.o repository.

The GL_OES_tessellation_shader work is on the tess-gles branch
of kwg's fd.o repository.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-24 12:45:46 -07:00
Emil Velikov
7e196cd170 c11/threads: resolve link issues with -O0
Add weak symbol notation for the pthread_mutexattr* symbols, thus making
the linker happy. When building with -O1 or greater the optimiser will
kick in and remove the said functions as they are dead/unreachable code.

Ideally we'll enable the optimisations locally, yet that does not seem
to work atm.

v2: Add the AX_GCC_FUNC_ATTRIBUTE([weak]) hunk in configure.

Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-05-24 20:21:31 +01:00
Tim Rowley
0ceed1701d swr: [rasterizer] remove containers.hpp
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-24 13:29:37 -05:00
Tim Rowley
1e3e22efb5 swr: [rasterizer core] remove utility dead code
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-24 13:29:29 -05:00
Tim Rowley
dc34479b8c swr: [rasterizer core] buckets fixes
1. Don't clear bucket descriptions to fix issues with sim level
   buckets getting out of sync.

2. Close out threadviz file descriptors in ClearThreads().

3. Skip buckets for jitter based buckets when multithreaded. We need
   thread local storage through llvm jit functions to be fixed before
   we can enable this.

4. Fix buckets StopCapture to correctly detect capture complete.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-24 13:29:21 -05:00
Tim Rowley
3074a2b4fa swr: [rasterizer core] move centroid setup out of CalcCentroidBarycentrics
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-24 13:29:14 -05:00
Tim Rowley
9a2a4ecb39 swr: [rasterizer jitter] implement InstanceID/VertexID in fetch jit
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-24 13:28:47 -05:00
Ian Romanick
7fc4a82007 mesa: Silence unused parameter warnings
Neither shProg nor name was used.  Remove them both.

main/shader_query.cpp:779:53: warning: unused parameter ‘shProg’ [-Wunused-parameter]
 program_resource_location(struct gl_shader_program *shProg,
                                                     ^
main/shader_query.cpp:780:72: warning: unused parameter ‘name’ [-Wunused-parameter]
                           struct gl_program_resource *res, const char *name,
                                                                        ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-24 11:04:08 -07:00
Ian Romanick
78399cf170 glsl/linker: Silence unused parameter warning
The parameter is required for the interface.

glsl/link_uniforms.cpp:689:61: warning: unused parameter ‘record_type’ [-Wunused-parameter]
                             bool row_major, const glsl_type *record_type,
                                                             ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-24 11:04:05 -07:00
Kristian Høgsberg Kristensen
2bb935be2e dri: Add YVU formats
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
1be1114e6b i965: Allow creating planar YUV __DRIimages
Lift the resctriction we had before and allow creation of images with
multiple planes. We still require all the planes to be within the same
bo.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
654e950cba i965: Invoke lowering pass for YUV textures
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
44997fc0c1 i965: Support textures with multiple planes
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
3352f2d746 i965: Create multiple miptrees for planar YUV images
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
6eede87631 i965: Refactor intel_set_texture_image_bo() to create_mt_for_dri_image()
This function now only creates the mt and we then call
intel_set_texture_image_mt() in intel_image_target_texture_2d() to set
it for the texture image.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-24 10:14:57 -07:00
Kristian Høgsberg Kristensen
8ceb7c7d9b i965: Use intel_set_texture_image_mt() in intelSetTexBuffer2()
Create the mt for the drawable bo directly and call our new
intel_miptree_create_for_bo() helper instead.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen
40e9be4a5c i965: Add new intel_set_texture_image_mt() helper
This factors out the work of setting up a miptree as the backing for a
texture image into a new helper.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen
a41b57679f nir: Add a lowering pass for YUV textures
This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen
50c24c3ff3 nir: Handle NULL in nir_copy_deref()
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen
29921ee987 nir: Add new 'plane' texture source type
This will be used to select the plane to sample from for planar
textures.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-24 10:14:56 -07:00
Brian Paul
39b7b8b906 mesa: log buffer ID numbers in decimal, not hexadecimal
All the other error messages use decimal.  Let's be consistent.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-24 10:26:26 -06:00
Brian Paul
ce1cc70e27 mesa: use enum name in bind_buffer_object() error message
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-24 10:26:26 -06:00
Brian Paul
55c19527a6 mesa: raise error for glEnable(GL_VERTEX_ARRAY), etc. in core profile
Otherwise, if the call executes normally we'll hit an assertion later
in the VBO code when we draw something.  Note that these cases were
already handled correctly for the glIsEnabled() function (and the API
checks were copied from there).

Tested with new piglit gl-3.1-enable-vertex-array test.

v2: fix compat/es mix-up, per Ilia.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-24 10:26:26 -06:00
Nicolas Boichat
a9b2b5e241 docs/egl: Android platform can also be build using autotools
We added support for Android build using autotools (configure),
update the documentation to reflect that.

Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-24 16:24:54 +01:00
Juan A. Suarez Romero
e79aa19d88 i965: fix double-precision vertex inputs measurement
For double-precision vertex inputs we need to measure them in dvec4
terms, and for single-precision vertex inputs we need to measure them in
vec4 terms.

For the later case, we use type_size_vec4() function. For the former
case, we had a wrong implementation based on type_size_vec4().

This commit introduces a proper type_size_dvec4() function, that we use
to measure vertex inputs.

Measuring double-precision vertex inputs as dvec4 is required because
ARB_vertex_attrib_64bit states that these uses the same number of
locations than the single-precision version. That is, two consecutives
dvec4 would be located in location "x" and location "x+1", not "x+2".

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-24 10:06:29 +02:00
Ilia Mirkin
ccd58015a2 docs: true up nvc0 status - images, etc
Images aren't supported on maxwell, but neither is tessellation. Don't
overly confuse matters by trying to expose those subtleties in the
GL3.txt file/relnotes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-05-23 23:47:11 -04:00
Ilia Mirkin
856587909c st/mesa: enable ARB_ES3_1_compatibility when ES 3.1 would be exposed
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-23 23:47:11 -04:00
Ilia Mirkin
5878254545 mesa: remove separate enable for KHR_robust_buffer_access_behavior
This extension appears to be a strict subset of the ARB version. Also
remove it from GL3.txt since it doesn't seem relevant.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 23:47:11 -04:00
Timothy Arceri
72449c477e glsl: add support for explicit components to frag outputs
V2: fix error checking for arrays and components. V1 was
only taking into account all the array elements and all the
components of one of the varyings during the comparision
and treating the other as a single slot/component.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-24 12:46:48 +10:00
Ilia Mirkin
37266dfb7c mesa: add view classes for 3d astc formats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-23 22:34:37 -04:00
Ilia Mirkin
979bcb9f42 glsl: add EXT_clip_cull_distance support based on ARB_cull_distance
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-23 22:22:06 -04:00
Ilia Mirkin
f236f1f506 nvc0: expose robust buffer access
We apparently pass all the relevant CTS tests. There are probably some
shortcomings, but they can be addressed down the line.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-23 22:22:05 -04:00
Jason Ekstrand
9f5ccaf4dc i965: Use ISL for surface format introspection
With this, we can delete the surface format table in brw_surface_formats.c
because all of the information we need is now in ISL.
2016-05-23 19:12:34 -07:00
Jason Ekstrand
d68acde1cb anv/formats: Use isl_format_supports* for format introspection 2016-05-23 19:12:34 -07:00
Jason Ekstrand
7374d006b6 isl: Add per-gen format introspection
This is just a copy-and-paste from brw_surface_formats.c.  For the
supports_vertex_fetch function, we do a bit more work so that it properly
handles Bay Trail.
2016-05-23 19:12:34 -07:00
Jason Ekstrand
03a82dc5d1 isl: Add the ISL_FORMAT_R32G32_FLOAT_LD format 2016-05-23 19:12:34 -07:00
Jason Ekstrand
35a514e6ff isl: Add support for quering the string name of a format 2016-05-23 19:12:34 -07:00
Jason Ekstrand
75d10dff0b i965: Enable ARB/KHR_robust_buffer_access_behavior on BYT and HSW+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
1a092fcf3b main: Add extension enable bits for KHR_robust_buffer_access_behavior
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
66e137ecf1 nir/lower_samplers: Protect against sampler index overflow
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
27b9481d03 glsl: Add an option to clamp block indices when lowering UBO/SSBOs
This prevents array overflow when the block is actually an array of UBOs or
SSBOs.  On some hardware such as i965, such overflows can cause GPU hangs.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
ac242aac3d glsl/linker: Add a helper variable for compiler options
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
aec10a1d5b i965/draw: Use the real size for index buffers
Previously, we were using the size of the whole BO which may be
substantially larger than the actual index buffer size.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
7c8dfa78b9 i965/draw: Use the real size for vertex buffers
Previously, we were using the size of the BO which may be substantially
larger than the actual vertex buffer size.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
a643bc6246 i965/draw: Use 3-channel formats for vertex fetch when possible.
For a long time, several of the 3-channel vertex formats didn't exist so we
faked them with 4-channel versions.  Starting with Sandy Bridge, we can use
R16G16B16_FLOAT and 8 and 16-bit integer formats become available on
Haswell and Bay Trail.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
ab3d8d5ea4 i965/surface_formats: Update the VB column for new formats added on BYT
Bay Trail and Haswell added a bunch of new vertex formats.  There was also
the addition of 64-bit passthrough formats for BDW+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
d5b4ab2c5f i965/draw: Properly handle rounding when dividing by InstanceDivisor
The old code always divided rounded down and then subtracted 1.  What we
wanted was to divide rounded up and then subtract 1 which is equivalent to
subtracting 1 and then dividing rounded down.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
ad42ab473c i965/draw: Account for BaseInstance in VBO bounds
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
ad3deec8ca i965/draw: Use worst-case VBO bounds if brw->num_instances == 0
Previously, we only handled the "I don't know what's going on" case for
things with InstanceDivisor == 0.  However, in the DrawIndirect case we can
get num_instances == 0 and we don't know what's going on with the instanced
ones either.  This commit makes the worst-case bound the default and then
conservatively tightens the bound.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
8892519751 i965/draw: Delay when we get the bo for vertex buffers
The previous code got the BO the first time we encountered it.  However,
this can potentially lead to problems if the BO is used for multiple arrays
with the same buffer object because the range we declare as busy may not be
quite right.  By delaying the call to intel_bufferobj_buffer, we can ensure
that we have the full range for the given buffer.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
a01a1eb9e4 i965/draw: Stop relying on min_index == -1 for invalid index bounds
The vbo layer passes an index_bounds_valid flag that we should be using
instead.  This also fixes a bug when min_index == -1 and basevertex != 0
where we were actually comparing min_index + basevertex == -1 which was
false and we were getting the wrong buffer-sizing path.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
a7011922f1 vbo: Declare the index range invalid for DrawTransformFeedback
Right now, we're setting the range to [0, 0] which is obviously bogus.
Instead, we should set it to be invalid like we do for DrawIndirect.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Jason Ekstrand
df6ec2aba5 vbo: Declare the index range invalid for DrawIndirect
Right now, we're just setting the range to [0, MAX_UINT32] which, while
correct isn't helpful.  With DrawIndirect, you can't really know what the
actual range is so we may as well flag it as being an invalid range.  This
is what we do for draws with index buffer which is similar (the indices
aren't statically known) if a bit simpler.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-23 19:12:34 -07:00
Ilia Mirkin
21f3df0820 mesa/teximage: fix GL_FLOAT in comment
Noticed by Brian. Trivial.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-23 21:44:41 -04:00
Timothy Arceri
2d9308012c glsl: fix explicit location validation for doubles
Previously we would fail to find a match for the second half of a
dvec4 as 'i' would get incremented to 1 before we added the var to
the array at component 0.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-24 11:30:51 +10:00
Dave Airlie
33397bf7fd docs: update ARB_cull_distance status.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 11:27:58 +10:00
Dave Airlie
5c10d47bae st/mesa: reenable culling
Now the lowering pass is fixed, reenable ARB_cull_distance.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 11:27:54 +10:00
Dave Airlie
a88c5d7e55 i965: reenable ARB_cull_distance.
Now the lowering pass is fixed we can reenable culling.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 11:27:29 +10:00
Dave Airlie
a08c4ebbe8 glsl: rewrite clip/cull distance lowering pass
The last version of this broke clipping, and I had to spend
sometime getting this working properly.

I had to introduce a third pass to count the clip/cull totals,
all due to one messy corner case. We have a piglit test
tes-input-gl_ClipDistance.shader_test
that doesn't actually output the clip distances, it just passes
them like a varying from TCS->TES, the older lowering pass worked
but to lower clip/cull we need to know the total number of clip+culls
used to defined the new variable correctly, and to offset culls
properly.

This adds an extra pass that works out the sizes for clip/cull,
then lowers gl_ClipDistance then gl_CullDistance into the new
gl_ClipDistanceMESA.

The pass checks using the fixed array sizes code if they array
has been referenced, or is actually never used, and ignores
it in the latter case.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 11:27:29 +10:00
Dave Airlie
8c628ab13e glsl: make max array trackers ints and use -1 as base. (v2)
This fixes a bug that breaks cull distances. The problem
is the max array accessors can't tell the difference between
an never accessed unsized array and an accessed at location 0
unsized array. This leads to converting an undeclared unused
gl_ClipDistance inside or outside gl_PerVertex to a size 1
array. However we need to the number of active clip distances
to work out the starting point for the cull distances, and
this offset by one when it's not being used isn't possible
to distinguish from the case were only the first element is
accessed. I tried to use ->used for this, but that doesn't
work when gl_ClipDistance is part of an interface block.

So this changes things so that max_array_access is an int
and initialised to -1. This also allows unsized arrays to
proceed further than that could before, but we really shouldn't
mind as they will get eliminated if nothing uses them later.

For initialised uniforms we no longer change their array
size at runtime, if these are unused they will get eliminated
eventually.

v2: use ralloc_array (Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 11:27:29 +10:00
Nanley Chery
2ae493d686 anv/formats: Make alpha blending a property of render targets
In agreement with the SNB PRM, alpha blending is a property that render
targets may or may not support.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 17:26:17 -07:00
Nanley Chery
9721be6681 i965: Unset alpha blend for R10G10B10_SNORM_A2_UNORM
This format does not support alpha blending, according to the SNB PRM.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 17:26:17 -07:00
Dave Airlie
8b89c92ef6 i965: deindent blorp code.
gcc6 warns about this.

Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 10:14:31 +10:00
Dave Airlie
e257284481 glsl: reindent line in ast_function.cpp
This fixes a warning with gcc -Wmisleading-indentation.

Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-24 10:14:31 +10:00
Ilia Mirkin
82d756f3af mesa: allow GL_FRAMEBUFFER_DEFAULT_LAYERS to be queried with ES geometry
When we have the geometry extensions, enable querying of the new param.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-23 20:03:40 -04:00
Ilia Mirkin
2dabd49704 mesa: allow xfb to be active in GLES when geometry shader is enabled.
OES_geometry_shader has wording to allow xfb when using Draw*Indirect
and DrawElements.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-23 20:03:20 -04:00
Ilia Mirkin
2e8e1e8909 main: check driver float texture support before upgrading to 16F/32F
When passing in GL_RGBA or other base formats, we will try to upgrade
the format to whatever the passed in type was. However not all drivers
(notably nv30) support 32F textures, and so this would lead to crashes
down the line. Only upgrade when the relevant extensions are available.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-23 20:00:39 -04:00
Ilia Mirkin
1e99a46b44 st/mesa: update inst->info along with inst->op
Otherwise we still have TGSI_OPCODE_CMP's info, which causes a number of
later logic to go wrong. This fixes

dEQP-GLES2.functional.shaders.functions.control_flow.return_in_if_vertex

on nv30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-23 19:58:53 -04:00
Bas Nieuwenhuizen
533d1e9085 glsl: Use correct mode for split components.
The mode should stay the same as the original struct. In
particular, shared should not be changed to temporary.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-24 09:55:38 +10:00
Kenneth Graunke
1c1873b93b mesa: Implement glGet*(GL_PRIMITIVE_RESTART_FOR_PATCHES_SUPPORTED).
Technically, this was introduced with GL 4.4.  However, I believe it
was intended to be retroactive.  As far as I know, AMD has never
supported primitive restart with patches, while NVidia and Intel do.
This necessitated the need for a query which would allow applications
to figure out whether this was usable or not.

I decided to expose it everywhere ARB_tessellation_shader is exposed.
(It's also in both OES and EXT_tessellation_shader.)

Enable this for i965 and Gallium drivers which expose the capability.

v2: Fix a bug in the state_tracker code (caught by Ilia Mirkin).

Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10364
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-23 16:44:22 -07:00
Kenneth Graunke
70048eb1e3 gallium: Add a pipe cap for whether primitive restart works for patches.
Some hardware supports primitive restart on patch primitives, and other
hardware does not.  Modern GL and ES include a query for this feature;
adding a capability bit will allow us to answer it.

As far as I know, AMD hardware does not support this feature, while
NVIDIA and Intel hardware does.  However, most Gallium drivers do not
appear to support tessellation shaders yet.  So, I've enabled it for
nvc0 and disabled it everywhere else.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-23 16:44:11 -07:00
Francisco Jerez
015035027b i965/fs: Mark UBO uniform pull constant loads as force_writemask_all.
This lets the rest of the backend know that the uniform pull constant
load opcodes don't respect channel enables -- Without this the
register allocator has no way to know that the return payload of a
pull constant load is not per-channel and spills of the destination
will be broken under non-uniform control flow.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:07:23 -07:00
Francisco Jerez
7eb4966887 i965/fs: Allow spilling of non-contiguous registers.
This should be working fine now.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94997
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:21 -07:00
Francisco Jerez
6fc5dd5b6a i965/fs: Calculate the (un)spill block size correctly.
Currently the spilling code attempts to guess the scratch message
block size from the dispatch width of the shader, which is plain wrong
for SIMD-lowered instructions (frequently but not exclusively
encountered in SIMD32 shaders) or for instructions with register
region data types of size other than 32 bit.

Instead try to use the SIMD component size of the instruction which in
some cases will allow the dataport to apply the correct channel mask
to the scratch data read or written.  In the spill case the block size
needs to be clamped to the number of MRF registers reserved for
spilling.  In the unspill case I didn't even bother because we
currently have no 100% accurate way to determine whether a source
region is per-channel or whether it contains things like headers that
don't respect channel boundaries -- That's fine, because the unspill
is marked force_writemask_all we can just use the largest allowable
scratch message size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:21 -07:00
Francisco Jerez
11260cc54f i965/fs: Set exec_all on spills not matching the channel layout of the instruction.
This prevents the application of an incorrect channel mask by the
scratch write instruction for spilled variables that don't have an
exact one-to-one correspondence between channels of the variable and
32-bit components of the scratch write instruction.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:21 -07:00
Francisco Jerez
bb67c467a4 i965/fs: Set exec_all on unspills.
This makes sure that unspills restore the exact contents of the
variable in scratch space into the GRF without applying channel
masking, which is incorrect under control flow for things like message
headers or vectors of heterogeneous types that don't properly respect
channel boundaries.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Francisco Jerez
07e67cc266 i965/fs: Move scratch block size calculation into the caller of emit_(un)spill.
This makes emit_(un)spill even more stupid by removing the logic that
decides what execution size each scratch read or write send message
should have and instead relying on the caller to specify an
appropriate execution size via the builder argument.  This makes sense
because the caller will need to act differently based on the scratch
message width (e.g. emit an additional unspill before the instruction
if the execution width and channel layout of the spill doesn't match
the instruction's).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Francisco Jerez
284c8fbcef i965/fs: Make emit_spill/unspill static functions taking builder as argument.
This seems cleaner than exposing an implementation detail of
brw_fs_reg_allocate.cpp to the world, and will give the caller control
over the instruction execution flags (e.g. force_writemask_all) that
are applied to the scratch read and write instructions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Francisco Jerez
70023c40c6 i965/fs: Apply execution controls from the instruction to scratch messages.
Until now the execution controls (e.g. channel group,
force_writemask_all, exec_size) of the instruction had been completely
ignored by spilling, even though that can lead to a mismatch between
the channel mask applied to the contents of the (un)spilled memory and
the GRF source or destination of the instruction.  In some cases we'll
actually want the (un)spill messages to be marked force_writemask_all
regardless of whether the instruction has it set, but that will have
to be handled specially by the caller.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Francisco Jerez
e98cf03114 i965/fs: Fix signedness of local variables and arguments of emit_(un)spill.
To avoid some some spurious warnings about comparison signedness in
the following commits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Francisco Jerez
f471d3eede i965/fs: Factor out calculation of the block of MRFs reserved for spilling.
And as we're at it fix the calculation to allocate a larger block of
registers for 32-wide dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 14:05:20 -07:00
Plamena Manolova
21edd24c0d egl: Add OpenGL_ES to API string regardless of GLES version
According to the EGL specifications eglQueryString(EGL_CLIENT_APIS)
should return a string containing a combination of "OpenGL", "OpenGL_ES"
and "OpenVG", any other values would be considered invalid. Due to this
when the API string is constructed, the version of GLES should be
disregarded and "OpenGL_ES" should be attached once instead of
"OpenGL_ES2" and "OpenGL_ES3".

Fixes:
dEQP-EGL.functional.negative_api* and
dEQP-EGL.functional.query_context.simple.query_api

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-23 13:46:01 -07:00
Rob Clark
46ff17559b freedreno/ir3: disable cp for indirect src's
The variable-indexing tests always had a few random fails, which I
usually couldn't reproduce when running tests manually.  Somehow
recently this got a lot worse.  I ported a couple of the shaders to
GLES to see what blob does, and it also seems to be avoiding to cp
indirect srcs.  So I guess indirect w/ instructions other than cat1
(mov) are not totally reliable.  Let's just switch that off until
this is better understood.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-23 15:57:13 -04:00
Samuel Pitoiset
c3c4370299 nvc0: do not invalidate compute constbufs on Kepler
Constbufs are only aliased on Fermi and this will reduce the number of
flushes when we switch between 3d and compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-23 20:56:29 +02:00
Rob Clark
5245d845b6 nir/validate: fix null deref coverity warning
CID 1265536 (#1 of 2): Explicit null dereferenced (FORWARD_NULL)6.
var_deref_op: Dereferencing null pointer parent.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-23 10:14:50 -04:00
Nicolas Boichat
0cbc90c57c mesa: dri: Add shared glapi to LIBADD on Android
/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"

On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
  dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
  dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);

However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.

Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: s/explicitely/explicitly/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-23 13:25:51 +01:00
Nicolas Boichat
27d713a004 configure.ac: Add support for Android builds
Add support for EGL android platform.

Also, detect when --host finishes with -android. In that case, we
do not set _GNU_SOURCE, and define autoconf symbol HAVE_ANDROID, so
that Android-specific workarounds can be applied.

Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: Rebase on top of HAVE_EGL_PLATFORM_NULL removal]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-23 13:23:39 +01:00
Emil Velikov
960d854a98 anv: remove define _DEFAULT_SOURCE
The build systems already add this as applicable. There's no need to
have this in the source file.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:09:11 +01:00
Emil Velikov
1b64d1247d gbm: remove define _DEFAULT_SOURCE
The build systems already add this as applicable. There's no need to
have this in the source file.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:09:11 +01:00
Emil Velikov
efe4beb717 gbm: remove define _BSD_SOURCE
The build systems already add this as applicable. There's no need to
have this in the source file.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:09:11 +01:00
Jiri Slaby
a6ce91fe52 glxcmds: glXGetFBConfigs, fix screen bounds
Bounds of screen are 0 (inclusive) and ScreenCount(dpy) (exclusive).
The upper bound was too ScreenCount(dpy) (inclusive).

This causes a crash invoked by java3d which passes down an invalid
screen:
6  0x00007f0e5198ba70 in <signal handler called> () at /lib64/libc.so.6
7  0x00007f0e14531e14 in glXGetFBConfigs (dpy=<optimized out>, screen=1, nelements=nelements@entry=0x7f0dab3c522c) at glxcmds.c:1660
8  0x00007f0e14532f7f in glXChooseFBConfig (dpy=<optimized out>, screen=<optimized out>, attribList=0x7f0dab3c54e0, nitems=0x7f0dab3c535c) at glxcmds.c:1611
9  0x00007f0e1478d29b in find_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
10 0x00007f0e1478d3dc in find_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
11 0x00007f0e1478d567 in find_AA_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
12 0x00007f0e1478d728 in find_DB_AA_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
13 0x00007f0e1478d97c in Java_javax_media_j3d_X11NativeConfigTemplate3D_chooseOglVisual () at /usr/lib64/libj3dcore-ogl.so

While ScreenCount(dpy) is actually 1:
(gdb) p dpy->nscreens
$2 = 1
screen=1 is passed to glXGetFBConfigs.

Fix this typo in glXGetFBConfigs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95456
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-23 12:07:47 +01:00
Elie TOURNIER
0f738fa23e doxygen: Add missing modules to Windows runner
Acked-by: Rhys Kidd <rhyskidd@gmail.com>
2016-05-23 12:07:47 +01:00
Emil Velikov
793574afad egl: add missing link against $(CLOCK_LIB)
Some platforms require separate library in order to resolve the
clock_gettime() symbol. Add the link or the build will fail.

Fixes: 70299474f5 ("egl: add EGL_KHR_reusable_sync to egl_dri")

Cc: Dongwon Kim <dongwon.kim@intel.com>
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:07:47 +01:00
Emil Velikov
d67e757d11 egl: android: remove explicit glFlush call
The DRI flush extension should already do the same thing.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Herring <robh@kernel.org>
2016-05-23 12:07:47 +01:00
Emil Velikov
9b3c7481c6 egl: android: drop dri2_create_image_android_native_buffer argument
The drv is no longer used/needed as of last commit.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
2016-05-23 12:07:47 +01:00
Emil Velikov
38ef6f5f60 egl: android: directly use dri2_create_image_dma_buf()
Make the function non static so that we can use it directly from the
android platform code.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
2016-05-23 12:07:47 +01:00
Emil Velikov
2cd687ce97 configure.ac: error out when building from git without python3
Bail early, as opposed to later on during the build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-23 12:07:47 +01:00
Emil Velikov
a155cdaace vl/drm: don't call close(-1) in vl_drm_screen_create error path
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-05-23 12:07:47 +01:00
Emil Velikov
ed3f6ccce0 st/xa: don't call close(-1) in xa_tracker_create error path
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-05-23 12:07:46 +01:00
Emil Velikov
6e00a1e6cb st/dri: don't call close(-1) in dri{2, kms_}_init_screen error path
Add separate labels and jump to the correct one as needed.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-05-23 12:07:46 +01:00
Eric Engestrom
7362bb3e21 vk/intel: use negative VK_NO_PROTOTYPES scheme
3d0fac7aca changed all
VK_PROTOTYPES to VK_NO_PROTOTYPES
This brings the Intel header in line with the rest of the Vulkan code.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-05-23 12:07:46 +01:00
Rob Herring
8aeb6d768b gbm: Add map/unmap functions
This adds map and unmap functions to GBM utilizing the DRIimage extension
mapImage/unmapImage functions or existing internal mapping for dumb
buffers. Unlike prior attempts, this version provides a region to map and
usage flags for the mapping. The operation follows the same semantics as
the gallium transfer_map() function.

This was tested with GBM based gralloc on Android.

Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: drop no longer relevant hunk from commit message.]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
1f4869a208 configure.ac: add pthreadstubs support
Add pthreadstubs to avoid pulling in full pthreads library. GBM will be the
first user.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
0a4275b534 gbm: rename gbm_dri_bo_{map,unmap} to gbm_dri_bo_{map,unmap}_dumb
In preparation to add public map/unmap functions, rename the existing
gbm_dri_bo_{map,unmap} functions to indicate that they are only for dumb
buffers.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:07:46 +01:00
Rob Herring
e8431a630d st/dri: Add support for DRIimage extension mapImage/unmapImage
Implement support for mapImage/unmapImage functions in version 12 of the
DRIimage extension.

Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: align/indent the map/unmap vfuncs]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
a0f06f168f DRI: Add DRIimage map and unmap functions
Add mapImage and unmapImage functions to DRIimage extension for mapping
and unmapping DRIimages for CPU access. The caller provides the region of
the image to map and is returned a pointer to the beginning of the region
and the stride (which could be different from the original).

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
bdfa635f72 gbm: Add Android build support
In order to use libgbm for gralloc, add it to the Android build.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
64a005e3ee gbm: add Android gallium_dri.so library loading support
GBM needs the same special gallium_dri.so loading as EGL for Android, so
copy over the same hunk from the EGL code.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:46 +01:00
Rob Herring
7d79eec456 gbm: split out source file to Makefile.sources
In preparation to add Android build support, split out the source file
lists to Makefile.sources

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>

[Emil Velikov: Whitespace cleanup.]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-23 12:07:46 +01:00
Rob Herring
fc1806e041 Android: Move setting DEFAULT_DRIVER_DIR to shared location
Move the defining of DEFAULT_DRIVER_DIR path to a common location so both
EGL and GBM can use it.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-23 12:07:45 +01:00
Emil Velikov
6ce11e7e2c c11/threads: create mutexattrs only when needed
If the mutexattrs are the default one can just pass NULL to
pthread_mutex_init. As the compiler does not know this detail it
unnecessarily creates/destroys the attrs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-23 12:07:45 +01:00
Andres Gomez
4424bf5da4 configure: added xcb to dri3 modules to pkg-conf
This fixes a recent linking error in libvulkan_common

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-05-23 11:21:34 +02:00
Juan A. Suarez Romero
3c9096eea4 glsl/linker: dvec3/dvec4 consume twice input vertex attributes
From the GL 4.5 core spec, section 11.1.1 (Vertex Attributes):

"A program with more than the value of MAX_VERTEX_ATTRIBS
active attribute variables may fail to link, unless
device-dependent optimizations are able to make the program
fit within available hardware resources. For the purposes
of this test, attribute variables of the type dvec3, dvec4,
dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may
count as consuming twice as many attributes as equivalent
single-precision types. While these types use the same number
of generic attributes as their single-precision equivalents,
implementations are permitted to consume two single-precision
vectors of internal storage for each three- or four-component
double-precision vector."

This commits makes dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4,
dmat4x3 and dmat4 consume twice as many attributes as equivalent
single-precision types.

v3: count doubles as consuming two attributes (Dave Airlie)
v4: make reference to spec (Michael Schellenberger Costa)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2016-05-23 10:48:07 +02:00
Francisco Jerez
b46867cd37 i965/fs: do not depend on std140 alignment rules for UBO loads
The previous implementation relied on the std140 alignment rules to
avoid handling misalignment in the case where we are loading more than
2 double components from a vector, which requires to emit a second load
message.

This alternative implementation deals with misalignment and is more
flexible going forward.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-05-23 08:56:57 +02:00
Iago Toral Quiroga
38b719d624 nir: handle double-precision in fsign, fsat, fnot and frcp
I think these are not strictly necessary since the floats in them
should be automatically promoted to doubles when operated with
double sources, but it makes things more explicit at least.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-23 08:54:37 +02:00
Iago Toral Quiroga
3f73039ade nir: handle double-precision in fabs, frsq and fsqrt
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-23 08:54:28 +02:00
Dave Airlie
3466db3969 glsl/parser: handle multiple layout sections with AST nodes.
For geometry/compute inputs and tess control outputs, we create
an AST node to keep track of some things. However if we have
multiple layout sections, we don't ever link the node into the AST.

This is because we create the node on the rightmost layout declaration
and don't pass it back in so it gets linked at the end of the parsing
of the rightmost.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:20:01 +10:00
Dave Airlie
aaa69c79cd glsl: allow layout qualifier overrides with ARB_shading_language_420pack
GLSL 4.20 allows overriding the layout qualifiers.

This helps fix:
GL45-CTS.shading_language_420pack.qualifier_override_layout

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
6f2dc0d044 subroutines: handle explicit indexes properly
The code didn't deal with explicit function indexes properly.
It also handed out the indexes at link time, when we really
need them in the lowering pass to create the correct if ladder.

So this patch moves assigning the non-explicit indexes earlier,
fixes the lowering pass and the lookups to get the correct values.

This fixes a few of:
GL45-CTS.explicit_uniform_location.subroutine-index-*

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
5fe912831c mesa/subroutines: fix reset on bindpipeline
Fixes:
GL45-CTS.shader_subroutine.subroutine_uniform_reset

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
7fa0250f94 mesa/subroutines: count number subroutines properly.
The code was implementing the ACTIVE_SUBROUTINE_UNIFORMS
incorrectly, using the number of types not the number of
uniforms. This is different than the locations as the
locations may be sparsly allocated.

This fixes:
GL43-CTS.shader_subroutine.four_subroutines_with_two_uniforms

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
22db9b10eb mesa/subroutines: don't generate error in GetSubroutineIndex.
GLSL spec says this doesn't generate an error.

Fixes:
GL45-CTS.explicit_uniform_location.subroutine-loc

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
3b8b6be7bb glsl/ast: for geom shaders allow stream flags in input flags.
This fixes:
GL45-CTS.shader_subroutine.subroutines_with_separate_shader_objects

Since we set the stream flags earlier on all geom shaders, we
shouldn't fall over later if we find one.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
93b3b6af3c glsl/linker: skip inactive explicit locations.
This fixes a crash in:
GL45-CTS.explicit_uniform_location.subroutine-loc-negative-link-max-num-of-locations

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
c714731653 glsl: fix subroutine uniform .length().
This fixes .length() on subroutine uniform arrays, if
we don't find the identifier normally, we look up the corresponding
subroutine identifier instead.

Fixes:
GL45-CTS.shader_subroutine.arrays_of_arrays_of_uniforms
GL45-CTS.shader_subroutine.arrayed_subroutine_uniforms

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:57 +10:00
Dave Airlie
432ac19c1a glsl/linker: link error on too many subroutine functions.
This fixes:
GL45-CTS.explicit_uniform_location.subroutine-index-negative-link-max-num-of-indices

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:56 +10:00
Dave Airlie
18b0a13e80 glsl: produce a linker error for a subroutine uniform with no functions.
If a subroutine uniform is declared with no functions backing it,
that isn't legal, so we should fail to link.

Fixes:
GL43-CTS.shader_subroutine.subroutine_uniform_wo_matching_subroutines

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:56 +10:00
Dave Airlie
b572b599ef glsl: validate subroutine types match function signature.
This fixes:
GL43-CTS.shader_subroutine.subroutines_incompatible_with_subroutine_type

It just makes sure the signatures match as well as the return
types.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:19:56 +10:00
Dave Airlie
ba3414d832 arb_shader_subroutine: check active subroutine limit
_mesa_GetActiveSubroutineUniformiv needs to check
against the number of types here.

Noticed while playing with ogl conform.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 16:18:25 +10:00
Ilia Mirkin
74e71cbfcb nv30: don't assert when running out of registers
This happens with dEQP tests. The code doesn't at all protect against
this condition, so while unhandled, this is an expected situation.

Also avoid using more than the first 16 registers for nv3x vertex
programs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-22 22:57:18 -04:00
Ilia Mirkin
36ff09cdfe nouveau: allow allocating non-object-backed buffers
On nv30, for example, there is no hardware index buffer support. So all
of those will be created entirely in user memory.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-22 22:57:18 -04:00
Tobias Klausmann
96f390ff35 llvm/softpipe: Enable cull_distance as draw supports it.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:04:37 +10:00
Dave Airlie
e6d9389366 tgsi: remove culldist semantic.
This isn't used anymore in the tree, culldist's
are part of the clipdist semantic, we could in theory
rename it, but I'm not sure there is much point, and
I'd have to be careful with virgl.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:03:44 +10:00
Dave Airlie
d17062a40e draw: stop using CULLDIST semantic.
The way the HW works doesn't really fit with having
two semantics for this.

The GLSL compiler emits 2 vec4s and two properties,
this makes draw use those instead of CULLDIST semantics.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:03:40 +10:00
Emil Velikov
bddb3b5375 virgl: remove unused state_tracker/graw.h include
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:02:17 +10:00
Dave Airlie
62c728f7d8 mesa/queryobject: return INVALID_VALUE if offset < 0 (v2)
This fixes:
GL45-CTS.direct_state_access.queries_errors

The ARB_direct_state_access spec agrees.

v2: move check down further (Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 07:33:03 +10:00
Samuel Pitoiset
a7fad12931 nvc0/ir: fix indirect access for images
When the array doesn't start at 0 we need to account for su->tex.r.
While we are at it, make sure to avoid out of bounds access by masking
the index.

This fixes GL45-CTS.shading_language_420pack.binding_image_array.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-22 23:06:16 +02:00
Ilia Mirkin
cb9a51d1f6 nv30: reset the stencil mask when fast-clearing
Apparently the stencil mask applies to clears on nv30/nv40. Reset it to
0xff before doing a stencil clear. This fixes gl-1.0-readpixsanity and
a number of other piglit tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-22 14:48:56 -04:00
Ilia Mirkin
f57a8440d5 nv30,nv50: add PIPE_SHADER_CAP_PREFERRED_IR support
The mesa state tracker has recently started to query this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-22 14:05:36 -04:00
Ilia Mirkin
9f19ccff9c nvc0: fix setting of tess_mode in various situations
This fixes a lot of INVALID_VALUE errors reported by the card when
running dEQP tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-05-22 11:58:22 -04:00
Ilia Mirkin
d6edae7090 nv50/ir: fix prog info init
Left over from the pre-mainline tess support. Adapt to use the new
defines.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-05-22 11:58:22 -04:00
Ilia Mirkin
035b1097db nvc0/ir: return 0 for gl_TessCoord.z for non-triangles modes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-05-22 11:58:22 -04:00
Matt Turner
bdc9c20df0 mesa: Unlock mutex on error path.
Caught by Coverity (CID 1362021). Caused by commit 015f2207c.
2016-05-22 07:01:35 -07:00
Timothy Arceri
a83e9afbe4 i965: remove redundant NULL check
We would have segfaulted in the above code if prog could be NULL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-22 23:08:08 +10:00
Eduardo Lima Mitev
7dce4793b7 anv/nir_apply_pipeline_layout: Pass the nir_src from the nir_tex_src
nir_instr_rewrite_src() expects a nir_src and it is currently being fed a
nir_tex_src. This will crash something.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-21 19:57:31 +02:00
Samuel Pitoiset
30b93141aa nvc0: expose GLSL version 420 on GF100
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:33:06 +02:00
Samuel Pitoiset
d04050071d nvc0: enable ARB_shader_image_load_store on GF100
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:33:03 +02:00
Samuel Pitoiset
362e17a712 nvc0/ir: add a lowering pass for surfaces on Fermi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:32:58 +02:00
Samuel Pitoiset
b663db44ba nvc0/ir: add emission for SULDB and SUSTx
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:32:56 +02:00
Samuel Pitoiset
cd88d1a171 nvc0/ir: add emission for OP_SULEA
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:32:54 +02:00
Samuel Pitoiset
8aa1fd321d nv50/ir: fix tex constraints for surface coords on Fermi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:32:49 +02:00
Ilia Mirkin
be4caaf247 nv50/ir: use moveSources to condense sources
This makes sure that rIndirectSrc and other things stay updated.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-21 18:32:46 +02:00
Samuel Pitoiset
879bd2ea0c nvc0: bind images on fragment and compute shaders for Fermi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 18:32:41 +02:00
Samuel Pitoiset
e7d2ef42a5 nvc0/ir: don't check the format for surface stores on Kepler
Initially to make sure the format doesn't mismatch and won't produce
out-of-bounds access, we checked that both formats have exactly the same
number of bytes, but this should not be checked for type stores.

This fixes serious rendering issues in the UE4 demos (tested with
realistic and reflections).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 16:50:28 +02:00
Samuel Pitoiset
5e32cc9192 nv50/ir: fix a comment in canDualIssue()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 16:50:25 +02:00
Samuel Pitoiset
70834d05cd nv50/ir: fix SUSTx constraints on Kepler
To prevent out-of-bounds access and format mismatch we add a predicate
on sustp, but we have to account for it when the sources are condensed
because a predicate is a source. Using the range 3:6 will only condense
the input data and it's always the case. This also fixes constraints
when an indirect access is used.

This ensures that sources are correctly aligned.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-21 16:06:14 +02:00
Kenneth Graunke
9c0d16adc1 i965: Just read the existing tally on EndTransformFeedback if paused.
If the transform feedback object is paused when ending, then there are
no new snapshots to add to the tally.  In fact, we haven't written a
starting snapshot, so we'd best not try and compute (end - start).

Just load the existing tally so we can convert it to the number of
vertices written and store it to the final result location.

This is the Haswell+ equivalent of the previous commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-20 19:55:42 -07:00
Kenneth Graunke
915f7c25fa i965: Don't write a counter snapshot on EndTransformFeedback if paused.
If the transform feedback object is paused, then we've already written
an ending counter snapshot.  We don't want to write another one.

This fixes assertions in GL33-CTS.transform_feedback.api_errors_test,
which calls EndTransformfeedback after PauseTransformFeedback.  On the
next BeginTransformFeedback, we tried to tally up the results, and saw
an odd number of snapshots (due to the double-end), and tripped an
assertion.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-20 19:55:40 -07:00
Kenneth Graunke
47fbe178fa mesa: Call TransformFeedback driver hooks before setting flags.
This way, the driver's EndTransformFeedback() hook can tell whether the
transform feedback operation was paused.  It's also convenient to have
Paused remain false until the driver's PauseTransformFeedback hook
finishes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-20 19:55:26 -07:00
Kenneth Graunke
f7eb95a526 nir: Fix crash in nir_lower_wpos_center().
Otherwise we rewrote the fadd to use itself, causing crashes in
validation.  Instead, start after the last use like we should.

A brown paper bag fix.  Fixes crashes in several Vulkan tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-20 16:33:24 -07:00
Dave Airlie
0970c563d6 nir: remove dead glsl variables before lowering io.
For cull distance GLSL will let unsized unused arrays get
into the backend, we should nuke those straight away, to
save caring about them later.

This fixes:
arb_separate_shader_objects/linker/large-number-of-unused-varyings
as a side effect (even without culling changes).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-21 08:56:45 +10:00
Kenneth Graunke
de45da6a8c spirv: Handle the PixelCenterInteger execution mode.
This isn't allowed by Vulkan, but might be useful someday for
SPIR-V in OpenGL (if that ever becomes a thing).  It's easy enough
to hook up, and as precedent, we already do so for OriginLowerLeft.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 14:44:22 -07:00
Kenneth Graunke
9b8b3f7501 i965: Delete dead dFdy flipping code.
Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case
of what I expected, so we always take the negate_value case.  It doesn't
really matter.

v2: Write src0 before src1 in ADD instructions (requested by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 14:30:09 -07:00
Kenneth Graunke
08bc74e694 i965: Delete brw_wm_prog_key::render_to_fbo and drawable_height.
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.

This patch actually eliminates the associated recompiles.  The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 14:30:09 -07:00
Kenneth Graunke
dac10e8a13 i965, anv: Use NIR FragCoord re-center and y-transform passes.
This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.

This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height.  This led to a lot of
recompiles in many applications.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 14:30:08 -07:00
Kenneth Graunke
6e5d86c07a nir: Add a simple nir_lower_wpos_center() pass for Vulkan drivers.
nir_lower_wpos_ytransform() is great for OpenGL, which allows
applications to choose whether their coordinate system's origin is
upper left/lower left, and whether the pixel center should be on
integer/half-integer boundaries.

Vulkan, however, has much simpler requirements: the pixel center
is always half-integer, and the origin is always upper left.  No
coordinate transform is needed - we just need to add <0.5, 0.5>.
This means that we can avoid using (and setting up) a uniform.

I thought about adding more options to nir_lower_wpos_ytransform(),
but making a new pass that never even touched uniforms seemed simpler.

v2: Use normal iterator rather than _safe variant (noticed by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:30:00 -07:00
Kenneth Graunke
12ab7fc6ac nir: Don't use ffma in nir_lower_wpos_ytransform().
ffma is an explicitly fused multiply add with higher precision.
The optimizer will take care of promoting mul/add to fma when
it's beneficial to do so.

This fixes failures on Gen4-5 when using this pass, as those platforms
don't actually implement fma().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
b8b1b1c34c nir: Handle fddy_fine and fddy_coarse in nir_lower_wpos_ytransform.
These also need flipping!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
4b7577fad8 nir: Make lower_wpos_ytransform_block a void function.
The return value was used for the old nir_foreach_block callback system,
but at this point it no longer means anything.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
88ea960aa7 nir: Make nir_lower_wpos_ytransform() match FragCoord by location.
gl_FragCoord is a shader input with location == VARYING_SLOT_POS.
ARB_fragment_programs have an equivalent input at VARYING_SLOT_POS,
but it isn't called gl_FragCoord.  We do want to transform it.

Matching by location guarantees we catch both.

Fixes several fp tests on a branch which uses this pass on i965.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
c9192fcbd2 nir: Add interp_var_at_offset flipping.
The Y-offset needs flipping as well, similar to ddy.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
287f099db1 nir: Fix fddy swizzles in nir_lower_wpos_ytransform().
The original value might have been swizzled.  That's taken care of in
the fmul source - we don't want to reswizzle it again.

Fixes validation failures in glsl-derivs-varyings on a branch of mine
which uses this pass in i965.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:29:04 -07:00
Kenneth Graunke
7fe9a19302 nir: Fix wpos_ytransform lowering state_slot swizzle.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:28:30 -07:00
Kenneth Graunke
1539009bf0 i965: Fix brw_regs_equal() for NaN and positive/negative zero.
We'd like the comparisons to mean "the exact same bits".  Comparing
doubles won't do that for NaN values or positive vs. negative zero.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 14:28:06 -07:00
Dave Airlie
b19a0d506d virgl: handle cull distance cap.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-21 06:19:54 +10:00
Rob Herring
2235b80f2a virgl: Add missing texture transfer_inline_write
transfer_inline_write cannot be NULL and the virgl renderer doesn't support
inline writes for textures, so add the default version.

This fixes a crash in st_TexSubImage since commit fb9fe352ea ("st/mesa:
use transfer_inline_write for memcpy TexSubImage path").

Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-21 06:07:18 +10:00
Kristian Høgsberg Kristensen
12dc89d844 anv: Merge in my TODO list items
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2016-05-20 10:35:57 -07:00
Matt Turner
015f2207cf mesa: Replace uses of Shared->Mutex with hash-table mutexes
We were locking the Shared->Mutex and then using calling functions like
_mesa_HashInsert that do additional per-hash-table locking internally.

Instead just lock each hash-table's mutex and use functions like
_mesa_HashInsertLocked and the new _mesa_HashRemoveLocked.

In order to do this, we need to remove the locking from
_mesa_HashFindFreeKeyBlock since it will always be called with the
per-hash-table lock taken.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-20 10:05:09 -07:00
Matt Turner
aded1160e5 hash: Add _mesa_HashRemoveLocked() function.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-20 10:05:09 -07:00
Matt Turner
fb5dcb81cc i965: Pass nir_src/nir_dest by reference.
Cuts 6K of .text.

   text    data     bss     dec     hex filename
5772372  264648   29320 6066340  5c90a4 lib/i965_dri.so before
5766074  264648   29320 6060042  5c780a lib/i965_dri.so after

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 10:04:06 -07:00
Mark Janes
9ca5ec2a31 glsl: Guard against NULL dereference
This trivially corrects mesa 3ca1c221, which introduced a check that
crashes when a match is not found.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95005
Fixes: piglit.spec.glsl-1_50.compiler.interface-blocks-name-reused-globally-4.vert
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-20 09:52:49 -07:00
Nanley Chery
9b8c4000d0 anv: Enable textureCompressionASTC_LDR on Gen9+
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 09:27:11 -07:00
Nanley Chery
0d2847e177 anv/format: Reorder ASTC mappings to match ISL enum ordering
Keep the lists consistent for ease of use.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 09:27:11 -07:00
Nanley Chery
f3ed3a0a15 genxml: Expand SKL's SurfaceFormat field width for ASTC
In the expanded field, only ASTC format enums have the MSB set to 1.
Expanding the field width makes the process of handling these formats
identical to the way other formats are handled.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 09:27:11 -07:00
Nanley Chery
a141576887 isl: Handle npot ASTC block dimensions on Gen9+
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 09:27:11 -07:00
Nanley Chery
de86fb875d isl: Add 2D ASTC format layouts and enums
Also, make changes needed for successful compilation and registration
as a texture compression mode.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-20 09:27:11 -07:00
Youry Metlitsky
4e2c9a0435 mesa: Build EGL without X11 headers after interop patchset
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-20 08:44:18 -07:00
Rob Clark
df361fc58c nir/validate: assume() that hashtable entry exists
At this point, it would require a logic error in nir_validate to not
have already populated this hashtable entry, but coverity doesn't
realize that:

CID 1265547 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.

CID 1271039 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 11:13:50 -04:00
Rob Clark
fcd6b3f42b nir: coverity unitialized pointer read
Not sure how coverity arrives at the conclusion that we can read comp[j]
unitialized (around line 204), other than not being aware that ncomp is
greater than 1 so it won't underflow in the 'if (tex->is_array)' case.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 11:13:50 -04:00
Rob Clark
53c48feae0 nir: coverity sign-extension fix
Not 100% sure, but I think being an unsigned literal will help:

CID 1358505 (#1 of 1): Unintended sign extension
(SIGN_EXTENSION)sign_extension: Suspicious implicit sign extension:
load1->def.num_components with type unsigned char (8 bits, unsigned) is
promoted in load1->def.num_components * (load1->def.bit_size / 8) to
type int (32 bits, signed), then sign-extended to type unsigned long (64
bits, unsigned). If load1->def.num_components * (load1->def.bit_size /
8) is greater than 0x7FFFFFFF, the upper bits of the result will all be
1.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-20 11:13:50 -04:00
Rob Clark
bb993da795 nir/glsl_to_nir: quell some uninit_member coverity errors
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-05-20 11:13:50 -04:00
Rob Clark
3a1bbd6a0a freedreno/ir3: need to lower fmod too
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-20 11:13:50 -04:00
Mark Janes
a2d28ddc01 i965: Fix strerror error code sign
This trivial fix to error-handling corrects the sign of drm error
codes before passing them to strerror.

Identified by Coverity: CID1358581
2016-05-20 05:58:18 -07:00
Jason Ekstrand
eb384daae8 nir/spirv: Handle the NonReadable decoration on struct members 2016-05-19 21:18:59 -07:00
Jason Ekstrand
ea8c11fdc2 anv/pipeline: Bounds-check resource indices when robuts_buffer_access is enabled 2016-05-19 21:18:59 -07:00
Jason Ekstrand
902628bce6 anv/pipeline: Only do buffer bounds checks if robustBufferAccess is enabled 2016-05-19 21:18:59 -07:00
Jason Ekstrand
23090b51e0 anv/apply_dynamic_offsets: Use rewrite_src instead of a regular assignment
Originally we removed the instruction, changed the source, and then
re-inserted it.  This works, but nir_instr_rewrite_src is a bit more
obviously correct.
2016-05-19 21:18:59 -07:00
Jason Ekstrand
c29ffea6d1 anv/device: Add a boolean for robust buffer access 2016-05-19 21:18:59 -07:00
Jason Ekstrand
d5b4638d6a anv: Add a TODO file 2016-05-19 20:09:31 -07:00
Dave Airlie
3ca1c2216d glsl: handle same struct redeclaration (v2)
This works around a bug in older version of UE4, where a shader
defines the same structure twice. Although we aren't sure this is correct
GLSL (it most likely isn't) there are enough UE4 based things out there
we should deal with this.

This drops the error to a warning if the struct names and contents match.

v1.1: do better C++ on record_compare declaration (Rob)
v2: restrict this to desktop GL only (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95005
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-20 11:22:52 +10:00
Matt Turner
8a65b5135a i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.
Ken suggested instead of a big and complicated optimization pass, to
just recognize the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.

total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
instructions in affected programs: 814563 -> 795219 (-2.37%)
helped: 3336
HURT: 10

total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
cycles in affected programs: 10582686 -> 10263428 (-3.02%)
helped: 2438
HURT: 691

total spills in shared programs: 1811 -> 1789 (-1.21%)
spills in affected programs: 85 -> 63 (-25.88%)
helped: 4

total fills in shared programs: 3143 -> 3109 (-1.08%)
fills in affected programs: 167 -> 133 (-20.36%)
helped: 4

LOST:   2
GAINED: 36

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-19 17:27:49 -07:00
Matt Turner
75dccf5ac2 i965: Add infrastucture for sample lod-zero operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-19 17:27:49 -07:00
Matt Turner
07353599e0 i965/fs: Add and use get_nir_src_imm().
The next patch wants to inspect the LOD argument and do something
different if it's 0.0f. But at that point we've emitted a MOV for it and
we just have a register to look at.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-19 17:27:49 -07:00
Ilia Mirkin
8bf5493899 nvc0: account for shader-allocated local memory needs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-19 20:20:23 -04:00
Ilia Mirkin
5c6b8cc7d0 nv50/ir: treat addresses as local
Address registers are always loaded right before use. Don't treat them
as "global", which will cause them to be put into the function's
linkage, and will make the register allocator hold onto that
register until the end of the function.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-19 20:20:23 -04:00
Tim Rowley
65c2abf6fd swr: [rasterizer] utility functions for shared libs
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:27:18 -05:00
Tim Rowley
6deb9f7f2c swr: [rasterizer jitter] fix assert in AVX implementation of MASKLOADD
llvm changed the mask type to vector of ints with 3.8.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:27:12 -05:00
Tim Rowley
600528168b swr: [rasterizer core] apply KNOB_TOSS_DRAW to more functions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:27:06 -05:00
Tim Rowley
6d212cccf0 swr: [rasterizer jitter] add instancing to non-gather fetch path
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:27:01 -05:00
Tim Rowley
63d7ed835a swr: [rasterizer core] move MultisampleTrait static from header to cpp
Move a MultisampleTrait static from header to cpp as clang seemed to get
confused with some specializations in the header vs some in cpp.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:54 -05:00
Tim Rowley
c969ef2d42 swr: [rasterizer core] clang override for _mm_undefined*
Not supported in older xcode versions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:49 -05:00
Tim Rowley
da75160039 swr: [rasterizer common] add OSX to unix portability sections
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:44 -05:00
Tim Rowley
4997169779 swr: [rasterizer] rename _aligned_malloc to AlignedMalloc
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:38 -05:00
Tim Rowley
2e4ef23523 swr: [rasterizer jitter] rename MEMCPY function to MEMCOPY
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:30 -05:00
Tim Rowley
aebbd2f7dd swr: [rasterizer common] guard definition of __cdecl/__stdcall
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:24 -05:00
Tim Rowley
82e335ce67 swr: [rasterizer common] include cstddef for offsetof
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:19 -05:00
Tim Rowley
759d8cf3a3 swr: [rasterizer core] removed tabs that snuck in
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:14 -05:00
Tim Rowley
8e39d410f1 swr: [rasterizer core] code style cleanup
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:08 -05:00
Tim Rowley
b914217c25 swr: [rasterizer core] add dummy code for cygwin build
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:26:02 -05:00
Tim Rowley
a0747c4ce3 swr: [rasterizer core] move variable query outside loop
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:25:54 -05:00
Tim Rowley
f2a1f894ba swr: [rasterizer core] utility function for getenv
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:25:48 -05:00
Tim Rowley
4a58b21ef7 swr: [rasterizer common] portable threadviz buckets
Output with slashes instead of backslashes for unix/linux.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:25:30 -05:00
Tim Rowley
2031baffb5 swr: [rasterizer common] foreground win32 assert dialog
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:25:24 -05:00
Tim Rowley
33d4c2c798 swr: [rasterizer core] use parens to disambiguate operator precedence
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 16:25:06 -05:00
Tim Rowley
9475251145 swr: standardize linkage and check for unresolved symbols
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-19 13:27:33 -05:00
Tim Rowley
6423004d85 swr: fix swr linkage so that static llvm works
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-19 13:27:33 -05:00
Tim Rowley
8987460b9e swr: PIPE_CAP_CULL_DISTANCE cap request response
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 13:27:33 -05:00
Tim Rowley
78572c9b0b docs: add swr to GL3.txt
v2: not on gl3.3 list until gl3.2 is complete

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-19 13:27:17 -05:00
Leo Liu
2f90d11d86 st/va: use drm render node for wayland display type
With xwayland, vainfo use VA_DISPLAY_WAYLAND as default and it fails
and fails when specify display with  `vainfo --display wayland`.
In fact wayland support for libva uses drm path to connect device,
and should use drm pipe loader to create screen.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-19 09:40:33 -04:00
Marek Olšák
f6742859b7 gallium/radeon: small cleanups in r600_texture_transfer_map
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-19 12:35:50 +02:00
Marek Olšák
54737aabb9 gallium/radeon: don't set PB_USAGE in winsyses
There is no point.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-19 12:35:50 +02:00
Marek Olšák
f330b7a14f gallium/radeon: handle VRAM_GTT placements as having slow CPU reads
not sure if we should include GTT WC too

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-19 12:35:50 +02:00
Marek Olšák
5e14d0ac2c gallium/radeon: ignore PIPE_TRANSFER_MAP_DIRECTLY
Only st/xa is using this, which is irrelevant to us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-19 12:35:50 +02:00
Marek Olšák
51cf04cf0e radeonsi: add a workaround for a bug in LLVM <= 3.8
This is not directly applicable to stable and needs to be backported.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-19 12:35:50 +02:00
Eduardo Lima Mitev
7671687713 i965/fs: Silence warnings related to use of uninitialized values
brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
       prog_data->base.dispatch_grf_start_reg = simd16_grf_start;

brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
    uint8_t simd8_grf_start, simd16_grf_start;

brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
       prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);

brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
    unsigned simd8_grf_used, simd16_grf_used;

(and more)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-19 09:05:18 +02:00
Eric Anholt
a507dcc160 vc4: Size transfer temporary mappings appropriately for full maps of 3D.
We don't really support reading/writing of 3D textures since the hardware
doesn't do 3D, but we do need to make sure that a pipe_transfer for them
has enough space to store the image.  This was previously not a problem
because the state tracker only mapped a slice at a time until
fb9fe352ea.  Fixes glean glsl1 tests, which
all have setup of a 3D texture at the start.
2016-05-18 17:30:07 -07:00
Nanley Chery
7ac08adfb4 anv/device: Fix viewportBoundsRange
Align with the spec requirement that the range must be at least
[−2 × maxViewportDimensions, 2 × maxViewportDimensions − 1]. Our
hardware supports this.

Fixes dEQP-VK.api.info.device.properties

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-18 16:01:50 -07:00
Dave Airlie
61b6789252 glsl/linker: attempt to match anonymous structures at link
This is my attempt at fixing at least one of the UE4 bugs with GL4.3.

If we are doing intrastage matching and hit anonymous structs, then
we should do a record comparison instead of using the names.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95005
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-19 08:16:50 +10:00
Mark Janes
4dfa89e33c anv/batch_chain: free pointers for error cases
Trivial fix to improperly handled cleanup during
VK_ERROR_OUT_OF_HOST_MEMORY.

Identified by Coverity: CID 1358908 and 1358909
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-18 15:14:22 -07:00
Wang He
f21b7d1e5c st/nine: Minor change to support musl libc
A few changes to support musl libc as well.

In particular fpu_control.h is glibc specific.
fenv.h doesn't enable to do exactly what we want either,
so instead use assembly directly.

Signed-off-by: Wang He <xw897002528@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
de39231134 st/nine: Enable D3DPMISCCAPS_PERSTAGECONSTANT
Nine already supports the feature.
There are no failing WINE tests for per stage constants.
Enabling D3DPMISCCAPS_PERSTAGECONSTANT as it fixes
https://github.com/iXit/Mesa-3D/issues/205

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
839f417634 st/nine: Turn on thread_submit by default when on different device
The last remaining issues with thread_submit have been resolved,
thus turn it when on a different device (the case where is is
beneficial).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
9cae3cdc89 st/nine: Fix usage of rasterizer multisample bit.
pipe_rasterizer multisample bit should be enabled only when really
wanting to do multisampling, thus we should disable when not having
msaa render target.
This fixes some depth calculation precision issues on radeon.
Also disable it when depth and stencil tests are disabled, since in that
case multisampling is same as not multisampled.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
f297e7de0f st/nine: ATOC has effect only with ALPHATESTENABLE
ATOC extension does something only when alpha test is enabled.
Use a second bit to encode the difference with ATIATOC.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
edc5cdced5 st/nine: Add debug string for ATOC
We were missing a debug string for this format.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
4e89dcf0c4 st/nine: Add asserts for output/input packing
Nine doesn't support vs output/ps input packing.
We haven't found any application requiring that,
and implementing it properly is complex.

Add asserts for now.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
aeddda0c3a st/nine: Use correct PIPE_HANDLE_USAGE flag for frontbuffer copy
When taking screenshots we do a copy from the frontbuffer
to an allocated buffer (which we then copy to a ram buffer).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
ca7c78a88e st/nine: Fix output shift calculation
We were getting it wrong for negative values.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
b8d95d4087 st/nine: Fix CheckDeviceFormat advertising for surfaces
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
6ef231c80f st/nine: Improve buffer placement
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
7639033973 st/nine: Fix buffer bind flags
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
0f6e31823d st/nine: Fix buffer locking flags handling
Our behaviour was not entirely similar to what
the docs and our tests describe.

Drop d3dlock_buffer_to_pipe_transfer_usage.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
f45b9894e5 st/nine: Improve logging
Add missing DBG calls in dtors.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
f3fa7e3068 st/nine: Use WINE thread for threadpool
Use present interface 1.2 function ID3DPresent_CreateThread
to create the thread for threadpool.
Creating the thread with WINE prevents some rarely occuring crashes.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
72be473ad1 st/nine: Don't present if window is occluded
The problem is that if one d3d present call fails,
because of our occlusion check in present method,
the next presentation call will send the same pixmap to the Xserver again,
without waiting it is released, which is wrong.

Move the present call after occlusion check to return and prevent
Xpixmaps errors.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
c673c46ccf st/nine: Use new function to query for resolution mismatch
Any third party app might change the current screen resolution.
Poll for resolution mismatch to force a device reset.
Required for non ex devices only.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Patrick Rudolph
dae9a91727 st/nine: Implement IPresent version 1.2
Implement presentation interface version 1.2:
* ID3DPresent_ResolutionMismatch
    Poll for resolution mismatch.
    A third party app might have changed resolution,
    which requires a device reset.
* ID3DPresent_CreateThread
    Create a thread in WINE to allow nine to use Windows API
    functions. Required for multi-threaded presentation.
    In single-threaded presentation mode the calling thread is
    already known to WINE.
* ID3DPresent_WaitForThread
    Wait for a wine thread to terminate.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
2e149a2bf0 st/nine: Implement BumpEnvMap for ff
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
c4e85202cb st/nine: Format conversion for volumes in UpdateTexture
We were doing the conversion for surfaces, but not yet
volumes. Now that volumes can do conversion, use it.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
23e2a235dc st/nine: Remove one useless function output
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
10e548c0c9 st/nine: Add support for X8L8V8U8
X8L8V8U8 support should be common. Some more recent cards
do support this format, but not L6V5U5.

Add fallback for this format to have it alwaus supported.

L6V5U5 conversion rule apparently differs a bit from the normal
spec, and thus the gallium equivalent format leads to slightly
wrong colors. Since some recent cards do not support it, do not
support it either.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
258ca1823c st/nine: Add format fallback with conversion to volumes
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
755fbcdf24 st/nine: Add format fallback with conversion to surfaces
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
52cb8e33c3 gallium/util: Implement util_format_translate_3d
This is the equivalent of util_format_translate, but for volumes.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
89344a80fc st/nine: Fix Pointsize in programmable shader
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
ae0fdd8a40 st/nine: Fix ff pointscale computation
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
c4af309973 st/nine: Fix header of GetIndices
There is a mistake in the online documentation,
the function only has 2 arguments.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
3e9d01ff39 st/nine: Increase minor d3dadapter9drm ABI
Version 0.1 allows to assume that the second
element of the IDirect3D* structures will
be a pointer to the internal nine vtable.

This is useful if the gallium nine user wants
to wrap some interfaces.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
2d51c817cd st/nine: Fix leak after ctor failures
Previously ctor failures would not unreference
the device.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
7fc8391d23 st/nine: Add ColorFill test for compressed textures
ColorFill should contain alignment checks
for compressed textures.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
d11d913987 st/nine: PositionT and Tessfactor are forbidden as PS input
According to wine tests, they are forbidden as PS input,
which makes sense.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
44068af92e st/nine: Fix some shader failures not triggering error
Some failures during shader translation would not
raise errors before this patch.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
a77d8cd710 st/nine: Forbid POSITION0 for PS3.0
POSITION0 input is forbidden for PS3.0 apparently.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
217d969746 st/nine: Rework UpdateTexture Checks
Our code did match the user documentation of the function
quite well (except for format check).

However the DDI documentation and wine tests show that
documentation was not correct. Thus adapt our code to
fit the best possible to the -real- spec.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
4c77673de7 st/nine: Use bufs instead of Flags for Clear
bufs doesn't contain depthstencil if
there is z buffer mismatch. This is the behaviour
we want.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
f7c3d27d18 d3dadapter9: Add ddebug, rbug and trace support
Add support for ddebug, rbug and trace

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Axel Davy
0ae3c8ece7 radeon: Change AA sample locations for EG+
This sets the AA location to the d3d11
spec.
EG/NI 8X MSAA is left as is. Not sure
why it was set different to Cayman, so
lets it as is.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Axel Davy
11e4987135 radeonsi: Mixed colorbuffer formats are unsupported
Besides depth/stencil, the hardware doesn't support
mixed formats.

The GL state tracker doesn't make use of them.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Axel Davy
fc3533c088 radeonsi: Change default behaviour for undefined COLOR0
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Axel Davy
a221f40dbb r600g: Change default behaviour for undefined COLOR0
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Axel Davy
7e05e4c388 r600: Change default behaviour for undefined COLOR0
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 23:37:14 +02:00
Christian Schmidbauer
f5d6ed5702 st/nine: Clean up WINAPI definition
As Emil pointed out, only gcc, clang and MSVC compatibility is required.
Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are
only defined for gcc/clang, not for MSVC.

Remove the #undef which has been there for historic reasons, when wine
dlls for nine have been built inside mesa. Instead use #ifndef in order
to avoid redefining WINAPI from MSVC's headers.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Brian Paul
243fd02858 svga: add another debug_printf() in svga_screen_create()
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-18 14:58:35 -06:00
Brian Paul
96909ef128 spirv: add switch case for nir_texop_txf_ms_mcs in vtn_handle_texture()
Mark it as unreachable.  Silences a compiler warning:

spirv/spirv_to_nir.c:1397:4: warning: enumeration value
'nir_texop_txf_ms_mcs' not handled in switch [-Wswitch]
    switch (instr->op) {
    ^

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-05-18 14:57:45 -06:00
Matt Turner
9c290b1e54 Revert "i965/urb: fixes division by zero"
This reverts commit 2a8aa1e3de.
2016-05-18 12:48:50 -07:00
Ardinartsev Nikita
2a8aa1e3de i965/urb: fixes division by zero
Fixes regression introduced by af5ca43f26

Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
2016-05-18 11:09:37 -07:00
Matt Turner
caab3cd536 mesa: fclose() filename on error.
Pretty useless, as it's in debugging code. Found by Coverity (CID
1257016).
2016-05-18 11:09:37 -07:00
Matt Turner
cbb0e3a7e8 i965/fs: Assert that nir_op_extract_*'s src1 is a constant. 2016-05-18 11:09:37 -07:00
Matt Turner
6a4ff51f7a glsl: Check that layout is non-null before dereferencing.
layout should only be null for structs, but it's checked everywhere else
and confuses Coverity (CID 1358495).

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 11:09:37 -07:00
Matt Turner
53f64a8404 egl/dri2: Don't check return result of mtx_unlock().
Coverity (CID 1358496) warns that the cleanup code doesn't unlock the
mutex (which is arguably kind of stupid, since the only case that can
happen is when mtx_unlock() failed!). But, mtx_unlock() isn't going to
fail -- the mutex was locked by this thread just a few lines above it.
2016-05-18 11:09:37 -07:00
Matt Turner
b1e6d069da spirv: Properly size the src[] array.
Operations like nir_op_bitfield_insert have four arguments, and Coverity
isn't privy to the fact that 4-argument operations aren't possible here,
so it thinks this can lead to memory corruption. Just increase the size
of the array to quell any fears.
2016-05-18 11:09:37 -07:00
Matt Turner
0a548eb56f isl: Mark default cases in switch unreachable.
To silence -Wmaybe-uninitialized warnings.
2016-05-18 11:09:37 -07:00
Ian Romanick
7619aed41d glsl/linker: Ensure the first stage of an SSO pipeline has input locs assigned
Previously an SSO pipeline containing only a tessellation control shader
and a tessellation evaluation shader would not get locations assigned
for the TCS inputs.  This would lead to assertion failures in some
piglit tests, such as arb_program_interface_query-resource-query.

That piglit test still fails on some tessellation related subtests.
Specifically, these subtests fail:

'GL_PROGRAM_INPUT(tcs) active resources' expected 2 but got 3
'GL_PROGRAM_INPUT(tcs) max length name' expected 12 but got 16
'GL_PROGRAM_INPUT(tcs,tes) active resources' expected 2 but got 3
'GL_PROGRAM_INPUT(tcs,tes) max length name' expected 12 but got 16
'GL_PROGRAM_OUTPUT(tcs) active resources' expected 15 but got 3
'GL_PROGRAM_OUTPUT(tcs) max length name' expected 23 but got 12

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-05-18 10:53:50 -07:00
Ian Romanick
79bbff9def glsl/linker: Don't include interface name for built-in blocks
Commit 11096ec introduced a regression in some piglit tests (e.g.,
arb_program_interface_query-resource-query).  I did not notice this
regression because other (unrelated) problems caused failed assertions
in those same tests on my system... so they crashed before getting to
the new failure.

v2: Use is_gl_identifier.  Suggested by Tim.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-05-18 10:53:34 -07:00
Ian Romanick
2ef4b5bc93 glsl: Assert that inputs have a location assigned
This catches a problem previously undetected until deep in the backend.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Ian Romanick
cf9220b11f glsl/linker: Fix trivial typos in comments
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Ian Romanick
d2579728c9 glsl/linker: Fix some formatting to match current coding conventions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Ian Romanick
02e4753777 glsl/linker: Silence unused parameter warning
The use of the parameter was removed in d6b92028.

glsl/link_varyings.cpp:1390:39: warning: unused parameter ‘separate_shader’ [-Wunused-parameter]
                                   bool separate_shader)
                                       ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Ian Romanick
75c9aa6670 glsl/linker: Silence unused parameter warning
The parameter appears to have been unused since the function was added
in commit 12ba6cfb.  Remove it.

glsl/linker.cpp:2886:60: warning: unused parameter ‘prog’ [-Wunused-parameter]
 match_explicit_outputs_to_inputs(struct gl_shader_program *prog,
                                                            ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Ian Romanick
f687b8e178 i965: Silence unused parameter warnings
The only place that actually used the type parameter was the GS visitor,
and it was always passed glsl_type::int.  Just remove the parameter.

brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
                                            const glsl_type *type)
                                                             ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-18 10:53:34 -07:00
Daniel Scharrer
1d628ea09d mesa: Don't advertise GLES 3.1 without compute support
The MaxComputeWorkGroupInvocations constant is used in
compute_version_es2() instead of extensions->ARB_compute_shader
as ES has lower requirements than desktop GL.

Both i965 and gallium set this constant before enabling compute support.

Signed-off-by: Daniel Scharrer <daniel@constexpr.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-18 18:21:21 +02:00
Rob Clark
5827a1dc4b mesa/st: don't leak name
Pointed out by coverity.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-18 09:20:22 -04:00
Brian Paul
877a8026c7 svga: null out all sampler views if start=num=0
Because the CSO module handles sampler views for fragment shaders
differently than vertex/geom shaders, VS/GS shader sampler views
aren't explicitly unbound like for FS sampler vers.  This code
checks for the case of start=num=0 and nulls out the sampler views.
Fixes a assert regression in piglit's arb_texture_multisample-
sample-position test.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-17 19:20:36 -06:00
Brian Paul
fe430b0310 st/mesa: remove unused st_context::default_texture
The code which used this was removed quite a while ago.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-17 19:20:36 -06:00
Brian Paul
5888c47cc9 cso: remove / add some comments
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-17 19:20:36 -06:00
Eric Anholt
18260d0582 vc4: Add support for vertex color clamping in the rasterizer.
This gets us precompile of vertex shaders at the state tracker level as
well.
2016-05-17 18:09:58 -07:00
Eric Anholt
474e2bbcc1 vc4: Move tgsi_to_nir to precompile time.
Now we have an immutable nir shader in our shader's CSO that we can clone
and lower/optimize.
2016-05-17 18:07:39 -07:00
Eric Anholt
734fe41092 vc4: Mark the driver as supporting fragment color clamping in rast.
We always clamp fragment colors, since they're always 8-bit unorm, so
there's no need to have us compile separate shaders based on
GL_ARB_color_buffer_float.  This gives us precompilation of fragment
programs to the vc4_shader_state_create() level.
2016-05-17 18:07:39 -07:00
Eric Anholt
8835eb689b vc4: Enable sharing shaders across contexts.
This allows the same pipe_shader_state to be referenced from multiple
contexts.  Since our pipe_shader_state is treated as immutable (other than
the variant number) within the driver, this is no problem.
2016-05-17 18:07:39 -07:00
Eric Anholt
62087cb9b8 vc4: Switch to using nir_load_front_face.
This will be generated by glsl_to_nir, and it turns out that this is a
more code-efficient path than the floating point math, anyway.

No change on shader-db, but drops an instruction in piglit's
glsl-fs-frontfacing.
2016-05-17 18:07:39 -07:00
Eric Anholt
0700e4c0c7 vc4: Drop the dead export_linkage array.
This came from deriving from freedreno.
2016-05-17 18:07:39 -07:00
Eric Anholt
24e7e3d3fc vc4: Fix a -Wformat-security warning.
This is apparently enabled as an error in Android builds, and the compiler
can't tell that the return value is safe.
2016-05-17 18:07:39 -07:00
Alex Deucher
86f51d7958 radeonsi: add new polaris11 pci ids
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-17 17:49:50 -04:00
Alex Deucher
768320b497 radeonsi: add new polaris10 pci ids
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-17 17:49:50 -04:00
Kenneth Graunke
dc657a8201 i965: Make brw_reg_from_fs_reg() halve exec_size when compressed.
In a5d7e144ea, Connor generalized the
exec_size halving code to handle more cases.  As part of this, he made
it not halve anything if the region accessed falls completely in a
single register.

Unfortunately, it started producing some invalid regions:

-add(16)  g6<1>F  g10<8,8,1>UW    -g1<0,1,0>F    { align1 compr };
-add(16)  g8<1>F  g12<8,8,1>UW    -g1.1<0,1,0>F  { align1 compr };
+add(16)  g6<1>F  g10<16,16,1>UW  -g1<0,1,0>F    { align1 compr };
+add(16)  g8<1>F  g12<16,16,1>UW  -g1.1<0,1,0>F  { align1 compr };

Here, the UW source region completely fits within a register.  However,
we have to use instruction compression because the destination region
spans two registers.  <16,16,1> is invalid because it's compressed.

To handle this, skip the "everything fits in one register" case and
fall through to the exec_size halving case when compressed.

Fixes hundreds of Piglit regressions on GM965.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-17 14:40:37 -07:00
Kenneth Graunke
062ad81669 i965: Move compression decisions before brw_reg_from_fs_reg().
brw_reg_from_fs_reg() needs to know whether the instruction will be
compressed or not.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-17 14:40:31 -07:00
Kenneth Graunke
9a1936d965 i965: Enable ES 3.2 sample shading extensions.
This enables:
- GL_OES_sample_shading
- GL_OES_sample_variables
- GL_OES_shader_multisample_interpolation

On Gen8, we pass all the CTS tests, and all but 4 of the dEQP-GLES31
tests (dealing with 1x/2x MSAA at half rate sampling).  We believe
those 4 dEQP-GLES31 tests are incorrect.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-17 14:27:29 -07:00
Jordan Justen
1ff212bfd3 anv: Fix warning: unused variable ‘cs_prog_data’
This was introduced in 8a80af2820.

Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-17 14:09:56 -07:00
Mauro Rossi
0e81336550 android: fix building error in libmesa_st_mesa
Fixes the following building error due to libmesa_nir dependency:

In file included from external/mesa/src/mesa/state_tracker/st_glsl_to_nir.cpp:44:0:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
 #include "nir_opcodes.h"
                         ^
compilation terminated.
build/core/binary.mk:706: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-17 17:07:28 -04:00
Nicolai Hähnle
941756f092 radeonsi: force level zero on image instructions in non-fragment shaders (v2)
Section 8.9 (Texture Functions) of the OpenGL Shading Language 4.5
specification:

   However, automatic level of detail is computed only for fragment shaders.
   Other shaders operate as though the base level of detail were computed as
   zero.

and Section 8.9.3 (Texture Gather Functions):

   When performing a texture gather operation, the minification and
   magnification filters are ignored, and the rules for LINEAR filtering in
   the OpenGL Specification are applied to the base level of the texture
   image to identify the four texels i_0 j_1, i_1 j_1, i_1 j_0, and i_0 j_0.

Of course, explicit LOD or derivative variants work in all shader types.

This fixes several GL4x-CTS.texture_gather.* tests.

v2: TG4 is always level zero (thanks, Ilia)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:40 -05:00
Nicolai Hähnle
988fd6c922 radeonsi: emit TXQ in separate functions
TXQ is sufficiently different that having in it in the same code path as
texture sampling/fetching opcodes doesn't make much sense.

v2: guard against NULL pointer dereferences

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-05-17 15:28:40 -05:00
Nicolai Hähnle
d464bfd12a winsys/amdgpu: cleanup error handling in amdgpu_ctx_create
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:40 -05:00
Nicolai Hähnle
fef08af99c winsys/amdgpu: avoid ioctl call when fence_wait is called without timeout
When user fences are used, we don't need the kernel for polling.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:39 -05:00
Nicolai Hähnle
0558564200 gallium/radeon: add radeon_emitted to check for non-trivial IBs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:39 -05:00
Nicolai Hähnle
5e89b027b9 gallium/radeon: use radeon_emit_array
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:39 -05:00
Nicolai Hähnle
c23273532e gallium/radeon: use radeon_emit
Mostly generated using a sed-script, with manual fix-up for multi-line
statements.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:38 -05:00
Nicolai Hähnle
4ac555e9e5 st/mesa: fix reversed copyimage canonical format
The format_desc swizzle describes where in the array each color channel
comes from - but the existing code was written as if each entry in the
swizzle described the meaning of an array element.

Fixes piglit's arb_copy_image-format-swizzle.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 15:28:38 -05:00
Jordan Justen
6c9f35bb73 Revert "HACK: Don't re-configure L3$ in render stages pre-BDW"
This reverts commit 41af9b2e51.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94468
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-17 13:04:03 -07:00
Jordan Justen
8a80af2820 anv: Port L3 cache programming from i965
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-17 13:04:03 -07:00
Jordan Justen
aa41de080d anv/gen7: Add memory barrier to vkCmdWaitEvents call
We also have this barrier call for gen8 vkCmdWaitEvents.

We don't implement waiting on events for gen7 yet, but this barrier at
least helps to not regress CTS cases when data caching is enabled.
Without this, the tests would intermittently report a failure when the
data cache was enabled.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-17 13:04:03 -07:00
Jordan Justen
8ee31828c6 anv: Keep track of whether the data cache should be enabled in L3
If images or shader buffers are used, we will enable the data cache in
the the L3 config.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-17 13:04:03 -07:00
Jordan Justen
ff41738871 genxml/hsw: Add L3 cache control registers
These were added to the i965 driver in
5912da45a6.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-17 13:04:03 -07:00
Jan Vesely
47b390fe45 Treewide: Remove Elements() macro
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-17 15:28:04 -04:00
Jan Vesely
322cd2457c r600g,sb: Don't use standard macro name
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-05-17 15:28:03 -04:00
Jason Ekstrand
b6c4d46a58 anv/formats: Add support for VK_FORMAT_B4G4R4A4_UNORM pre-gen8 2016-05-17 12:17:22 -07:00
Jason Ekstrand
45c93384e5 anv: Add a devinfo argument to the get_format functions 2016-05-17 12:17:22 -07:00
Jason Ekstrand
100db3d31c anv/formats: Set the swizzle to RGB1 when using an RGBA format to fake RGB
This way we get correct sampling from RGB formats that are faked as RGBA.
This should also cause it to disable rendering and blending on those
formats.  We should be able to render to them and, on Broadwell and above,
we can blend on them with work-arounds.  However, we'll add support for
that more properly later when it's deemed useful.  For now, disabling
rendering and blending should be safe.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
ce375fba41 anv/formats: Refactor anv_get_format
The new code removes the switch statement and instead handles depth/stencil
as up-front special cases.  This allows for potentially more complicated
color format handling in the future.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
34198d798c anv: Use 16 bits for the isl_format in anv_format
This way the entire anv_format structure fits in 32 bits
2016-05-17 12:17:22 -07:00
Jason Ekstrand
7cae59012d anv/formats: Use the isl_channel_select enum for the swizzle 2016-05-17 12:17:22 -07:00
Jason Ekstrand
8ed429a4f0 anv/formats: Add an anv_get_format helper
This commit removes anv_format_for_vk_format and adds an anv_get_format
helper.  The anv_get_format helper returns the anv_format by-value.  Unlike
anv_format_for_vk_format the format returned by anv_get_format is 100%
accurate and includes any tweaks needed for tiled vs. linear.
anv_get_isl_format is now just a wrapper around anv_get_format that picks
off just the isl_format.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
13f5cee663 anv/format: Simplify anv_format
Now that we have VkFormat introspection and we've removed everything that
tried to use anv_format for introspection, we no longer need most of what
was in anv_format.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
c1c004e5b2 anv/formats: Delete validate_GetPhysicalDeviceFormatProperties
All it ever did was some extra logging that was useful when initially
bringing up Dota2.  We don't need it anymore.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
aad56f3ee7 anv/image: Use aspects for computing full usage 2016-05-17 12:17:22 -07:00
Jason Ekstrand
fbc23d93e0 anv: Remove the anv_format member from anv_image 2016-05-17 12:17:22 -07:00
Jason Ekstrand
be94a23b44 anv/wsi: Use vk_format_info for asserts rather than anv_format 2016-05-17 12:17:22 -07:00
Jason Ekstrand
63dbb2c60a anv/copy: Use the linear format from the image for the buffer block size
Because the buffer is exposed to the user, the block size is defined to
always exactly be the size of the actual vulkan format.  This is the same
size (it had better be) as the linaer image format.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
c87429c5f1 anv/image: Stop using anv_format for image create validation 2016-05-17 12:17:22 -07:00
Jason Ekstrand
990a7420b6 anv/image: Make heavier use of aspects 2016-05-17 12:17:22 -07:00
Jason Ekstrand
369b8bf402 anv/copy: Use the color_surf from the image to get the block size 2016-05-17 12:17:22 -07:00
Jason Ekstrand
9102e88364 anv: Change render_pass_attachment.format to a VkFormat 2016-05-17 12:17:22 -07:00
Jason Ekstrand
ffc502ce0c anv: Add helpers to provide simple VkFormat introspection
As much as I hate adding yet more format introspection, there are times
when the VkFormat is sufficient and we don't want to round-trip through
isl_format.  For these times, the new vk_format_info.c/h files provide some
simple driver-agnostic VkFormat introspection.  This intended to be
specific to Vulkan but not to any driver whatsoever.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
97ba402cc3 anv/image: Use get_isl_format when creating buffer views 2016-05-17 12:17:22 -07:00
Jason Ekstrand
234ecf26c6 anv/image: Add an aspects field
This makes several checks easier and allows us to avoid calling
anv_format_for_vk_format in a number of cases.
2016-05-17 12:17:22 -07:00
Jason Ekstrand
1bda8d06e5 anv: Make format_for_descriptor return an isl_format 2016-05-17 12:17:22 -07:00
Jason Ekstrand
263a8cb52d anv/wayland: Don't allow non-renderable formats 2016-05-17 12:17:22 -07:00
Jason Ekstrand
eb6baa3174 anv/wsi: Make WSI per-physical-device rather than per-instance
This better maps to the Vulkan object model and also allows WSI to at least
know the hardware generation which is useful for format checks.
2016-05-17 12:17:22 -07:00
Adam Jackson
2ad9d6237a glapi/gen: Copy some GL 1.0 enum details into ARB_viewport_array
Otherwise the instances in the extension XML override the core
definitions, and we stop knowing their sizes in indirect_size_get.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Adam Jackson
f4983b194d glapi: Define PURE for Sun Studio as well
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Adam Jackson
f1dd8dd6b6 glapi/glx: Mark byteswap functions as _X_UNUSED (v2)
Squashes the one remaining warning in the xserver build.

v2: Also clean up some non-standard whitespace (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Adam Jackson
ea08a5bcf6 glapi: Harden GLX request size processing (v2)
v2: Use == not is for equality testing (Dylan Baker)

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Adam Jackson
88cfc9ddaa glapi: Add the safe_{add,mul,pad} functions from xserver
We're about to update the generator scripts to use these, easier not to
vary between client and server.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Adam Jackson
7bc5c7f586 glapi: Fix whitespace droppings when printing the license header
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-05-17 15:04:56 -04:00
Rob Clark
1e93b0caa1 mesa/st: add support for NIR as possible driver IR
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
2016-05-17 14:22:46 -04:00
Rob Clark
2bbb140be3 mesa/st: move things around a bit in st_create_fp_variant()
Prep work for next patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-17 14:22:46 -04:00
Rob Clark
8f9a46dccb mesa/st: add nir pass for lowering builtin uniforms
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-17 14:22:46 -04:00
Emil Velikov
52addd90d1 scons: gallium: link against nir as needed
... otherwise we'll produce uncomplete binaries with introduction of NIR
as alternative IR with next commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-17 14:22:46 -04:00
Jason Ekstrand
265487aedf i965/fs: Add an allow_spilling flag to brw_compile_fs
This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling.  Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Francisco Jerez <currojerez@riseup.net>
2016-05-17 10:20:11 -07:00
Ilia Mirkin
dd4b44efc0 nvc0/ir: fix shared atomic lowering to preserve shared memory location
We were always doing atomics on shared memory location 0 instead of the
originally supplied location. Make sure to pass through the original
symbol and any indirection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org # note: expect minor conflict
2016-05-17 11:22:01 -04:00
Rob Clark
b65bd3dee5 freedreno/ir3: fix compiler warning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-17 10:05:20 -04:00
Rob Clark
e8beffb1b3 nir/validate: dump annotated shader with error msgs
Log all the errors, and at the end dump the shader w/ error annotations
to make it easier to see where the problems are.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-17 10:05:20 -04:00
Rob Clark
54ecfcc162 nir/validate: assert() -> validate_assert()
Prep work for next patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-17 10:05:20 -04:00
Rob Clark
a0ef26c1c2 nir/print: add support for print annotations
Caller can pass a hashtable mapping NIR object (currently instr or var,
but I guess others could be added as needed) to annotation msg to print
inline with the shader dump.  As the annotation msg is printed, it is
removed from the hashtable to give the caller a way to know about any
unassociated msgs.

This is used in the next patch, for nir_validate to try to associate
error msgs to nir_print dump.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-17 10:05:20 -04:00
Alejandro Piñeiro
e5e412cd27 i965: Expose OpenGL 4.2 for gen8+
ARB_vertex_attrib_64bit was the only feature missing.

v2: we can expose 4.2 instead of 4.1 (Ian Romanick)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Alejandro Piñeiro
f051eae25a docs: Mark ARB_vertex_attrib_64bit as done for i965/gen8+
v2: label as done for i965/gen8+ instead of i965 (Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Alejandro Piñeiro
59b5441fd9 i965: Enable ARB_vertex_attrib_64bit for gen8+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Juan A. Suarez Romero
d6281a9d95 i965: take care of doubles when lowering VS inputs
Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Juan A. Suarez Romero
7ea09511ca i965/fs: calculate first non-payload GRF using attrib slots
When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.

Instead, we need to use the number of slots used by the attributes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Juan A. Suarez Romero
b7423b485e i965/vec4: use attribute slots to calculate URB read length
Do not use total attributes because a dvec3/dvec4 attribute requires two
slots. So rather use total attribute slots.

v2: do not use loop to calculate required attribute slots (Kenneth
Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:55 +02:00
Juan A. Suarez Romero
b0fb08e179 i965: take care of doubles when remapping VS attributes
Double-precision types require 1 slot in VUE for double and dvec2, and 2 slots for
anything else.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:54 +02:00
Juan A. Suarez Romero
80535873bb nir: add double input bitmap
This bitmap tracks which input attributes are double-precision.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:54 +02:00
Juan A. Suarez Romero
ccfe25f758 i965/fs: shuffle 32bits into 64bits for doubles
VS Thread Payload handles attributes in URB as vec4, no matter if they
are actually single or double precision.

So with double-precision types, value ends up in the registers split in
32bits chunks, in different positions.

We need to shuffle the chunks to get the doubles correctly.

v2:
 * Extra blank line. Add { } on if body (Ian Romanick)
 * Use dest directly (Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:47 +02:00
Alejandro Piñeiro
96c276dda9 i965/fs: half exec_size when dealing with 64 bits attributes
The HW has a restriction that only vertical stride may cross register
boundaries. Until now this was only handled on VGRFs at
rw_reg_from_fs_reg, but it is also needed for attributes.

v2:
 * Remove reference to commit id on commit message (Juan Suarez)
 * Simplify code that compute final exec_size (Ian Romanick)
 * Use REG_SIZE on that same code (Kenneth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 07:34:40 +02:00
Alejandro Piñeiro
1ff32ae8b2 i965: passthru formats cannot be used width edge flag enabled
Add an assertion to detect this case.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 07:34:40 +02:00
Antia Puentes
8b0a334b5e i965: Configure how to store *64*PASSTHRU vertex components
From the Broadwell specification, structure VERTEX_ELEMENT_STATE
description:

   "When SourceElementFormat is set to one of the *64*_PASSTHRU
    formats,  64-bit components are stored in the URB without any
    conversion. In this case, vertex elements must be written as 128
    or 256 bits, with VFCOMP_STORE_0 being used to pad the output
    as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red component into
    the URB, Component 1 must be specified as VFCOMP_STORE_0 (with
    Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
    vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0
    in order to output a 256-bit vertex element. Likewise, use of
    R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0
    in order to output a 256-bit vertex element."

Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for
dvec3 and dvec4 vertex elements.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 07:34:40 +02:00
Alejandro Piñeiro
71150b73c8 i965: get the proper vertex surface type for doubles on gen8+
This commit adds support for PASSTHRU format when pushing
double-precision attributes.

Check glarray->Doubles in order to know if we should choose a format
that does a conversion to float, or just passthru the 64-bit double.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 07:34:40 +02:00
Ilia Mirkin
b1d74e9486 nvc0/ir: make sure out-of-bounds buffer loads/atomics get a 0 result
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-17 01:27:29 -04:00
Timothy Arceri
4fb4fd0b6b glsl: make reserved_varying_slot() static
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-17 15:06:39 +10:00
Timothy Arceri
1d752823af glsl: include per-patch varyings when generating reserved slot bitfield
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-17 15:06:27 +10:00
Timothy Arceri
00441829e7 glsl: don't incorrectly eliminate patches with explicit locations
These varying have a separate location domain from per-vertex varyings
and need to be handled separately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-17 15:06:21 +10:00
Timothy Arceri
3f477f0ea5 glsl: remove remainings tabs in link_varyings.cpp
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-17 15:06:16 +10:00
Timothy Arceri
6d5f7557fb glsl: fix location and component packing validation on patches
These varyings have a separate location domain from per-vertex varyings
and need to be handled separately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-17 15:06:12 +10:00
Kenneth Graunke
aae0865dc0 i965: Enable ARB_shader_precision on Gen8+.
I recently fixed a bug in the Piglit tests:
https://lists.freedesktop.org/archives/piglit/2016-May/019802.html

With that patch in place, we pass all the tests.  So, turn it on.

We could probably expose this earlier than Gen8, but the extension
says that OpenGL 4.0 is required, and all of our tests are written
against GLSL 4.00 (which is only supported on Gen8+).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 17:52:45 -07:00
Jose Fonseca
cf010de6ee vl/dri: Move the DRI3 check out of sources include into C.
Fixes SCons build.

Trivial.  Built locally with SCons and autotools.
2016-05-16 21:50:43 +01:00
Leo Liu
5e2072c711 st/vdpau: add dri3 support
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
c122c74dca vl/dri3: implement functions for get and set timestamp
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
9f50a79b8f vl/dri3: handle PresentCompleteNotify event
and get timestamp calculated based on the event's reply

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
e8282178ab st/va: add dri3 support
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
8d7ac0a4e4 vl/dri3: implement DRI3 BufferFromPixmap
We also need render to the front buffer of temporary X pixmap,
this is the case of when we using opengl as video out for vaapi.
the basic implementation is to pass pixmap ID to X server, and
then X will return dma-buf fd, we will get the buffer object
through this dma-buf fd.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
858b329c2c vl/dri3: add support for resizing
When drawable size changed, PresentConfigureNotify event will be
emitted, by handling the event to re-allocate resized buffer.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
96580ad593 vl/dri3: implement funciton for get dirty area
This will clear presentation area not covered by video content

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
b0bd908284 vl/dri3: implement function for flush frontbuffer
Request drawable content in pixmap by calling DRI3 PresentPixmap,
and handle PresentIdleNotify event.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
e1223282db vl/dri3: add back buffers support
This implements DRI3 PixmapFromBuffer. Create buffer objects, and
associate it to a dma-buf fd, and then pass this fd with a pixmap
ID to X server for creating pixmap object; also add a function
for wait events.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
69ba9be4d2 vl/dri3: implement flushing for queued events
also place holder for present events handling

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
758b1bbaa7 vl/dri3: register present events
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
672e8d5e7e vl/dri3: set drawable geometry
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu
12e5220e34 vl/dri3: add DRI3 support and implement create and destroy
Required functions into place for implementation, create screen
with device fd returned from X server, also bail out to DRI2
with certain conditions.

v2: -organize the error out path (Axel)
    -squash previous patch 1 and 2 into one (Emil)

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Dave Airlie
30e437bd76 mesa/version.c: enable cull distance in version check.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-17 06:08:31 +10:00
Ian Romanick
11096ecc39 glsl/linker: Include the interface name for input and output blocks
On my oes_shader_io_blocks branch, this fixes 71
dEQP-GLES31.functional.program_interface_query.* tests.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2016-05-16 11:18:03 -07:00
Ian Romanick
7c11589eb4 glsl/linker: Use canonical format for ARB_program_interface_query spec quotes
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 11:18:03 -07:00
Mark Janes
fd854c1add i965: check tcs for NULL dereference
Coverity issue 1361544 found an instance where the tcs variable is
checked for NULL, but unconditionally dereferenced later in the same
function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 11:11:11 -07:00
Matt Turner
bf91034d44 i965: Mark is_lossless_compressed_aux UNUSED to silence warning.
Used only in assert().
2016-05-16 11:08:55 -07:00
Matt Turner
1385018a72 genxml: Use llroundf() and store to appropriate type.
Both functions return uint64_t, so I expect the masking/shifting should
be done on 64-bit types.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-16 11:06:15 -07:00
Matt Turner
4191551262 nir: Mark nir_start_block()/nir_impl_last_block() with returns_nonnull.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 11:06:15 -07:00
Matt Turner
377ab2f2d7 util: Add ATTRIBUTE_RETURNS_NONNULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 11:06:15 -07:00
Jan Vesely
40c6d54e76 clover: grid_offset should be padded with 0 not 1
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 13:58:14 -04:00
Iago Toral Quiroga
71465179fc i965: Expose OpenGL 4.0 for gen8+
ARB_gpu_shader_fp64 was the only feature missing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:34 +02:00
Iago Toral Quiroga
b1d21e1159 docs: Mark ARB_gpu_shader_fp64 as done for i965/gen8+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
309d285c6b i965: Enable ARB_gpu_shader_fp64 for gen8+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
58f304defe i965/tes/scalar: Fix load input for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
61197b8d5d i965/tcs/scalar: fix store output for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
cda3435ea8 i965/tcs/scalar: fix load input for doubles
v2: do not write to the original indirect_offset since that is
    an expression that could be used somewhere else (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
66192b3c16 i965/fs: fix nir_intrinsic_store_output for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
3cce67aff0 i965/fs: fix number of output components for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
0297f1021a i965/vec4: handle doubles in type_size_vec4()
The scalar backend uses this to check URB input sizes.

v2: Removed redundant break after return (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
8c6d147373 i965/fs: support doubles with shared variable stores
This is pretty much the same we do with SSBOs.

v2: do not shuffle in-place, it is not safe since the original 64-bit data
    could be used after the write, instead use a temporary like we do
    for SSBO stores (Iago)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
943f9442bf i965/fs: support doubles with ssbo stores
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
b9aa66aa51 i965/fs: add shuffle_64bit_data_for_32bit_write helper
This does the inverse operation of shuffle_32bit_load_result_to_64bit_data
and we will use it when we need to write 64-bit data in the layout expected
by untyped write messages.

v2 (curro):
- Use subscript() instead of stride()
- Assert on the input types rather than silently retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Drop the temporary vgrf and force_writemask_all.
- Make component_i const.
- Move to brw_fs_nir.cpp

v3 (curro):
- Pass dst and src by reference.
- Simplify allocation of tmp register.
- Move to brw_fs_nir.cpp.
- Get rid of the temporary.

v3 (Iago):
- Check that the src and dst regions do not overlap, since that would
  typically be a bug in the caller.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
33f7ec18ac i965/fs: support doubles with SSBO loads
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
8aa01ac596 i965/fs: support doubles with shared variable loads
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
6eab06b866 i965/fs: Add do_untyped_vector_read helper
We are going to need the same logic for anything that reads
doubles via untyped messages (CS shared variables and SSBOs). Add a
helper function with that logic so that we can reuse it.

v2:
- Make this a static function instead of a method of fs_visitor (Iago)
- We only support types with a size of 4 or 8 (Curro)
- Avoid retypes by using a separate vgrf for the packed result (Curro)
- Put dst parameter before source parameters (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
b86d4780ed i965/fs: support doubles with UBO loads
UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
types this only provides components x and y. Thus, if we are reading
more than 2 components we need to issue a second load at offset+16 to
read the next 16-byte chunk with components w and z.

UBO loads with non-constant offset emit a load for each component
in the vector (and rely in CSE to fix redundant loads), so we only
need to consider the size of the data type when computing the offset
of each element in a vector.

v2 (Sam):
- Adapt the code to use component() (Curro).

v3 (Sam):
- Use type_sz(dest.type) in VARYING_PULL_CONSTANT_LOAD() call (Curro).
- Add asserts to ensure std140 vector alignment rules are followed
  (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
58f1804c4f i965/fs: fix pull constant load component selection for doubles
UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
constant offset that is 16-byte aligned. If we need to access an unaligned
offset we emit a load with an aligned offset and use the remaining constant
offset to select the component into the vec4 result that we are interested
in. This component must be computed in units of the type size, since that
is what fs_reg::set_smear expects.

This patch does this change in the two places where we use this message:
In demote_pull_constants when we lower uniform access with constant offset
into the pull constant buffer and in UBO loads with constant offset.

v2 (Sam):
- Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
source type instead and remove MAX2 (Curro).
- Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
(Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Francisco Jerez
71fd4942d1 i965/fs: Fix and document component().
This fixes a number of bugs of component() by reimplementing it in
terms of horiz_offset(): Handling of base registers starting at a
non-zero subreg_offset, handling of strided registers and overflow of
subreg_offset into reg_offset.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
e209134f71 i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles
v2 (Curro):
   - Assert on scale == 1 when shuffling 64-bit data.
   - Remove type_slots, use type_sz(vec4_result.type) instead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Iago Toral Quiroga
50b7676dc4 i965/fs: add shuffle_32bit_load_result_to_64bit_data helper
There will be a few places where we need to shuffle the result of a 32-bit
load into valid 64-bit data, so extract this logic into a separate helper
that we can reuse.

v2 (Curro):
- Use subscript() instead of stride()
- Assert on the input types rather than retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Don't use  force_writemask_all.
- Mark component_i as const.
- Make the function name lower case.

v3 (Curro):
- Pass src and dst by reference.
- Move to brw_fs_nir.cpp

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:33 +02:00
Francisco Jerez
4d9c461e53 i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.
Instead of using the LOAD_PAYLOAD instruction (emitted through the
emit_transpose() helper that is no longer useful and this commit
removes) which had to be marked force_writemask_all in some cases,
emit a series of moves to apply proper channel enable signals to the
destination.  Until now lower_simd_width() had mainly been used to
lower things that invariably had a basic block-local temporary as
destination so it didn't seem like a big deal, but I found it to be
the reason for several Piglit regressions in my SIMD32 branch and
Igalia discovered the same issue independently while working on FP64
support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
9149fd6817 i965/fs: fix copy/constant propagation regioning checks
We were not accounting for subreg_offset in the check for the start
of the region.

Also, fs_reg::regs_read() already takes the stride into account, so we
should not multiply its result by the stride again. This was making
copy-propagation fail to copy-propagate cases that would otherwise be
safe to copy-propagate. Again, this was observed in fp64 code, since
there we use stride > 1 often.

v2 (Sam):
- Rename function and add comment (Jason, Curro).
- Assert that register files and number are the same (Jason).
- Fix code to take into account the assumption that src.subreg_offset
is strictly less than the reg_offset unit (Curro).
- Don't pass the registers by value to the function, use
'const fs_reg &' instead (Curro).
- Remove obsolete comment in the commit log (Curro).

v3 (Sam):
- Remove the assert and put the condition in the return (Curro).
- Fix function name (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
789eecdb79 i965/fs: fix copy propagation from load payload
We were not considering the case where the load payload is writing to
a destination with a reg_offset > 0.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
cf375a3333 i965/fs: fix copy propagation of partially invalidated entries
We were not invalidating entries with a src that reads more than one register
when we find writes that overwrite any register read by entry->src after
the first. This leads to incorrect copy propagation because we re-use
entries from the ACP that have been partially invalidated. Same thing for
entries with a dst that writes to more than one register.

v2 (Sam):
- Improve code by defining regions_overlap() and using it instead of a
loop (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Francisco Jerez
ea1ef49a16 i965/fs: Reindent register offset calculation of try_copy_propagate().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Francisco Jerez
0fb19806c0 i965/fs: Simplify and fix register offset calculation of try_copy_propagate().
try_copy_propagate() was special-casing UNIFORM registers (the
BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the
top of the function) and then failing to take into account the
possibility of the instruction reading from a non-zero offset of the
destination of the copy.  The VGRF/ATTR handling takes it into account
correctly, and there is no reason we couldn't use the exact same logic
for the UNIFORM file aside from the fact that uniforms represent
reg_offset in different units.  We can work around that easily by
defining an additional constant with the right unit reg_offset is
expressed in.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
7aa53cd725 i965/fs: disallow type change in copy-propagation if types have different sizes
Because the semantics of source modifiers are type-dependent, the type of the
original source of the copy must be kept unmodified while propagating it into
some instruction, which implies that we need to have the guarantee that the
meaning of the instruction is going to remain the same after we have changed
the types. Whenthe size of the new type is different from the size of the old
type the new and old instructions cannot possibly be equivalent because the new
instruction will be reading more data than the old one was.

Prevents that we turn this:

load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, vgrf17:DF 1sthalf
load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:UD, vgrf21:UD 1sthalf

into:

load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf
load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf

where the semantics of the last instruccion have changed.

v2 (Curro):
  - Update commit log and add comment to explain the problem better.
  - Simplify the condition.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
ac9b966aac i965/fs: Fix copy propagation of load payload for double operands
Specifically, consider the size of the data type of the operand to compute
the number of registers written.

v2 (Sam):
- Fix line width (Jordan).
- Add an assert (Jordan).
- Use REG_SIZE in the calculation of regs_written (Curro)

v3 (Sam):
- Fix assert and calculation of regs_written (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 09:55:32 +02:00
Francisco Jerez
70dc19f9d6 i965/fs: Fix propagation of copies with strided source.
This has likely been broken since we started propagating copies not
matching the offset of the instruction exactly
(1728e74957).  The copy source stride
needs to be taken into account to find out the offset at the origin
that corresponds to the offset at the destination of the copy which is
being read by the instruction.  This has led to program miscompilation
on both my SIMD32 branch and Igalia's FP64 branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Iago Toral Quiroga
17decd940c i965/fs: fix subreg_offset overflow in byte_offset()
This can happen if the register already has a non-zero subreg_offset
when byte_offset() is called.

v2 (Sam):
- Refactor byte_offset() (Jordan).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 09:55:32 +02:00
Kenneth Graunke
2fd79ebe8f i965: Fix JIP to skip over sibling do...while loops.
We've apparently always been botching JIP for sequences such as:

do
    cmp.f0.0 ...
    (+f0.0) break
    ...
    do
        ...
    while
    ...
while

Because the "do" instruction doesn't actually exist, the inner "while"
is at the same depth as the "break".  brw_find_next_block_end() thus
mistook the inner "while" as the end of the loop containing the "break",
and set the "break" to point to the wrong place.

Only "while" instructions that jump before our instruction are relevant.
We need to ignore the rest, as they're sibling control flow nodes (or
children, but this was already handled by the depth == 0 check).

See also commit 1ac1581f38.

This prevents channel masks from being screwed up, and fixes GPU
hangs(*) in dEQP-GLES31.functional.shaders.multisample_interpolation.
interpolate_at_sample.centroid_qualified.multisample_texture_16.

The test ended up executing code with no channels enabled, and that
code contained FIND_LIVE_CHANNEL, which returned 8 (out of range for
a SIMD8 program), which then was used in indirect GRF addressing,
which randomly got a boolean value (0xFFFFFFFF), interpreted it as
a sample ID, OR'd it into an indirect send message descriptor,
which corrupted the message length, sending a pixel interpolator
message with mlen 15, which is illegal.  Whew :)

(*) Technically, the test doesn't GPU hang currently, but only
    because another bug prevents it from issuing pixel interpolator
    messages entirely...with that fixed, it hangs.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 00:20:07 -07:00
Kenneth Graunke
2f02fad6b3 i965: Make a "does this while jump before our instruction?" helper.
I need to use this in an additional place.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-16 00:19:53 -07:00
Kenneth Graunke
b6f250d7f2 i965: Send the minimal number of STATE_BASE_ADDRESS packets.
STATE_BASE_ADDRESS stalls the whole pipeline, and the documentation
cautions us to emit it as little as possible for better performance.

We recently put some hacks in BLORP to try and avoid emitting it
if it was already set correctly.  However, this wasn't quite minimal:
if BLORP is the first operation (i.e. glClear()), then it would emit
it, and subsequent draw calls would emit it again.

This caused a small drop in performance in GPUTest Triangle when
switching from Meta to BLORP.

Unlike most packets, STATE_BASE_ADDRESS isn't influenced by GL state:
it needs to be emitted once per batch, before most other commands, or
whenever we change the program cache BO.  It's also valid in both the
3D and compute pipelines, which makes it even more unique.

This patch removes it from the atom mechanism and instead directly
calls it as part of every draw, compute dispatch, or BLORP operation.
We introduce a new flag indicating that STATE_BASE_ADDRESS has already
been emitted this batch, and if so, skip doing it again.  When we make
a new program cache BO, we simply reset the flag, so the next operation
will emit it again.  When we flush/reset the batch, we reset the flag.

This guarantees that we'll emit STATE_BASE_ADDRESS only when we have to.
It's also less code than the old atom mechanism.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16 00:11:51 -07:00
Kenneth Graunke
97179c606c i965: Combine Gen4-7 and Gen8+ state base address emitters.
We're about to start calling it directly, and this means the callers
won't have to think about generations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16 00:11:50 -07:00
Kenneth Graunke
7b70a12e1c i965: Move Gen4-5 programs to brw_upload_programs() too.
This way all the programs are in one place again, and it also should
make some future STATE_BASE_ADDRESS related changes possible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16 00:11:49 -07:00
Kenneth Graunke
b23b099a0b i965: Mark brw const in brw_state_dirty and callers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-16 00:11:43 -07:00
Kenneth Graunke
8e71ac731b glsl: Don't do constant propagation in opt_constant_folding.
opt_constant_folding is supposed to fold trees of constants into a
single constant.  Surprisingly, it was also propagating constant values
from variables into expression trees - even when the result couldn't be
folded together.  This is opt_constant_propagation's job.

The ir_dereference_variable::constant_expression_value() method returns
a clone of var->constant_value.  So we would replace the dereference
with a constant, propagating it into the tree.

Skip over ir_dereference_variable to avoid this surprising behavior.
However, add code to explicitly continue doing it in the constant
propagation pass, as it's useful to do so.

shader-db statistics on Broadwell:

total instructions in shared programs: 8905349 -> 8905126 (-0.00%)
instructions in affected programs: 30100 -> 29877 (-0.74%)
helped: 93
HURT: 20

total cycles in shared programs: 71017030 -> 71015944 (-0.00%)
cycles in affected programs: 132456 -> 131370 (-0.82%)
helped: 54
HURT: 45

The only hurt programs are by a single instruction, while the helped
ones are helped by 1-4 instructions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-15 23:59:39 -07:00
Kenneth Graunke
db8fcbbaf9 glsl: Avoid excess tree walking when folding ir_dereference_arrays.
If an ir_dereference_array has non-constant components, there's no
point in trying to evaluate its value (which involves walking down
the tree and possibly allocating memory for portions of the subtree
which are constant).

This also removes convoluted tree walking in opt_constant_folding(),
which tries to fold constants while walking up the tree.  No need to
walk down, then up, then down again.

We did this for swizzles and expressions already, but I was lazy
back in the day and didn't do this for ir_dereference_array.

No change in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-15 23:59:33 -07:00
Kenneth Graunke
329fe93210 glsl: Consolidate duplicate copies of constant folding.
We could probably clean this up more (maybe make it a method), but at
least there's only one copy of this code now, and that's a start.

No change in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-15 23:59:20 -07:00
Kenneth Graunke
3bf27a9a00 glsl: Remove bonus tree walking in opt_constant_folding().
It looks like this was missed when converting opt_constant_folding()
from a hierarchical visitor to an rvalue visitor in 6606fde3.

ir_rvalue_visitor already processes values on the way back up the tree,
so we will have already visited every child node.  There's no point in
doing it again.

No change in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-15 23:59:10 -07:00
Kenneth Graunke
8e59670bcf glsl: Make opt_constant_variable() bail in useless cases.
The pass ultimately skips over any entries with assignment_count != 1,
so there's no need to do further work once we've determined that there
are multiple assignments.

The constant value could be a large array (i.e. uvec4[327]), at which
point skipping the constant_expression_value() call (and the clone()
call within) can save us piles of memory.

No change in shader-db.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-15 23:59:05 -07:00
Kenneth Graunke
c907ca6c8d i965: Flip interpolateAtOffset's y offset when necessary.
Fixes 4 dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_offset.no_qualifiers.default_framebuffer
- interpolate_at_offset.centroid_qualifier.default_framebuffer
- interpolate_at_offset.sample_qualifier.default_framebuffer
- interpolate_at_offset.array_element.default_framebuffer

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-15 23:50:52 -07:00
Kenneth Graunke
6d65b0c6dc nir: Add a nir->info.uses_interp_var_at_offset flag.
I've added this to nir_gather_info(), but also to glsl_to_nir() as a
temporary measure, since the i965 GL driver today doesn't use
nir_gather_info() yet.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-15 23:50:28 -07:00
Kenneth Graunke
d4d7e1516b glsl: Drop bad ASSERT_TRUE in gl_CullDistance link_varyings test.
I don't know what the intention was here, but this function returns
void.  We can't assert anything about its return value.

Fixes "make check" failures.

v2: Also fix prototype for the function (caught by Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-15 23:49:19 -07:00
Jan Vesely
9525f33164 clover: Handle PIPE_SHADER_IR_NIR in switch
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-15 20:05:10 -04:00
Rob Clark
277818ecfb freedreno/ir3: small standalone compiler cleanup
Don't hard-code the gpu-id anymore.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-15 17:25:48 -04:00
Rob Clark
f06343d6ea nir: forward-declare 'struct gl_shader_program'
Drop extra #include which is otherwise unneeded (and makes this header
difficult to include from outside of src/mesa).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-15 17:25:48 -04:00
Rob Clark
79d6409a14 nir: return progress from lower_idiv
With algebraic-opt support for lowering div to shift, the driver would
like to be able to run this pass *after* the main opt-loop, and then
conditionally re-run the opt-loop if this pass actually lowered some-
thing.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-15 17:25:48 -04:00
Rob Clark
f8840f471d freedreno/ir3: lower fdiv
Not sure how we didn't hit this already, but since we want fdiv
converted into mul + rcp, we should set this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-15 17:25:48 -04:00
Rob Clark
53cde5e295 freedreno/ir3: handle VARYING_SLOT_PNTC
In the glsl->tgsi path, this already gets translated to VAR8, which
matches up with rasterizer->sprite_coord_enable.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-15 17:25:48 -04:00
Rob Clark
2f1581059b freedreno/ir3: disable TGSI specific hacks in nir case
When we got NIR directly from state tracker (vs using tgsi_to_nir) we
need to realize this and skip some TGSI specific hacks.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-15 17:25:48 -04:00
Rob Clark
784086f3c1 freedreno/ir3: add support for NIR as preferred IR
For now under debug flag, since only suitable for debugging/testing.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-15 17:25:47 -04:00
Rob Clark
8b24f7b440 nir: fix comment typo about f2d/d2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-15 17:25:47 -04:00
Ilia Mirkin
be2b13e3bf nv50/ir: avoid asserts when the state tracker feeds us bogus inputs
INTERP is defined (by me) to have to have a INPUT source. However the
state tracker does not always obey this. This happens due to varying
packing logic introducing additional mov's which can't always be undone.
Instead of just giving up, we instead try harder to find the original
input. This won't always be possible, for example with indirect
accesses. There's not much we can (easily) do about that though.

This fixes the remaining interpolateAt* failures in dEQP:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at*

some of which were asserting due to INTERP_* being passed a non-input.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-15 14:12:56 -04:00
Ilia Mirkin
9323d084ac nvc0: don't try to go through the push path for indirect draws
This fixes

dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute

These tests were causing a const vbo to be set up, and were small enough
draws that the logic was trying to go via the push path (which emits
data directly into the cmd stream rather than uploading a user vbo).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-15 10:48:39 -04:00
Ilia Mirkin
2ef3cdb07e nvc0/ir: make sure to align the second arg of TXD to 4, as we do for TEX
This was handled in handleTEX(), however the way the logic works, those
extra arguments aren't added on by then, so it did nothing. Instead we
must duplicate that bit here. GK110 appears to complain about
MISALIGNED_GPR, however it's reasonable to believe that GK104 has the
same requirements.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95403
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-15 10:48:39 -04:00
Tobias Klausmann
8c02939794 nv50,nvc0: add support for cull distances
Cull distances are just a special case of clip distances as far as the
hardware is concerned. Make sure that the relevant "planes" are enabled,
and flip the clip mode to cull for those.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: add enables on nvc0, add nv50 support]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2016-05-15 10:48:39 -04:00
Ilia Mirkin
2ad970ecf4 st/mesa: disable cull distance for now
The pass that st/mesa relies on to combine clip and cull distances has
been reverted, so we can't expose ARB_cull_distance until that is
resolved.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-15 10:48:38 -04:00
Jason Ekstrand
09e041d61d i965: Use blorp for all clears
We used to use a meta path on gen8 but we haven't since c7cf17ae75.  We
might as well delete the meta path since blorp works on all gens.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
1cfb4bc890 i965: Use blorp for all stencil blits
We used to use a meta path because blorp didn't support 16x MSAA.  Now it
does, so we don't need the meta paths anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
64f2907030 i965: Use blorp for all updownsample blits
We used to use a meta path because blorp didn't support 16x MSAA.  Now it
does, so we don't need the meta paths anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
f5febc83a7 i965/blorp: Add support for 16x MSAA
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
a32315bd19 i965: move brw_meta_set_fast_clear_color to brw_meta_util.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
36529f670f i965; Move brw_meta_get_*_rect to brw_meta_util.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
21034f1b08 i965: Move brw_is_color_fast_clear_compatible to brw_meta_util
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
b05c68fc8a i965: Move brw_get_rb_for_slice to brw_meta_util
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 14:18:21 -07:00
Jason Ekstrand
672cffee0f i965/blorp: Get rid of the blorp_prog_data_int() helper
The helper was initially created to allow us to set reasonable defaults as
we mutated the brw_blorp_prog_data structure in preparation for NIR.  Now
that everything is going through brw_blorp_compile_nir_shader() which fully
fills out the brw_blorp_prog_data structure, we don't need the helper.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:54 -07:00
Jason Ekstrand
c228ea8345 i965/blorp: Delete the old blorp shader emit code
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:54 -07:00
Jason Ekstrand
c18da26abf i965/blorp: Stop doing f2i(i2f(sample_id))
NIR gets kind of awkward when you have a 3-component vector with two floats
and one int.  This led to us accidentally going through float for the
sample index.  It doesn't hurt anything but it also isn't needed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
e503da61c6 i965/blorp: Refactor coordinate munging
The original code-flow tried to map original blorp.  This puts things more
where they belong and simplifies some of the logic.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
8636937dd6 i965/blorp: Add bilinear blending support to the NIR path
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
6bd7bd6633 i965/blorp: Add support for averaging resolves to the NIR path
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
c7269c1551 i965/blorp: Add MSAA encode/decode support to the NIR path
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
df8c2936cd i965/blorp: Add support for W-[de]tiling to the NIR path
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
6adb8d6d3a i965/blorp: Add support for discard-based bounds checks to the NIR path
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
4bdace0791 i965/blorp: Add initial support for NIR-based blit shaders
Many of the more complex cases still fall back to the old shader builder.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
b0275ad0c9 i965/blorp: Refactor getting the blit kernel into a helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
6df3d75206 i965/blorp: Use NIR for clear shaders
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95373
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
bb45f42f55 i965/blorp: Create the program key in get_clear_kernel
There's no reason to be passing a whole struct around just for a single
boolean.  We can create it later when we actually need to use it as a key.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
c1fe8859d3 i965/blorp: Add a helper for compiling NIR shaders
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:53 -07:00
Jason Ekstrand
353eadb170 blorp: Add initial state setup support for SIMD8 dispatch
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:52 -07:00
Jason Ekstrand
cd5a2905cf i965/blorp: Add a param array to prog_data
This array allows the push constants to be re-arranged on upload.  The
actual arrangement will, eventually, come from the back-end compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:52 -07:00
Jason Ekstrand
c46cbe19f4 i965/blorp: Add a prog_data_init helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-14 13:34:52 -07:00
Jason Ekstrand
50e5e1f747 i965/fs: Implement the new NIR MCS texturing
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:49 -07:00
Jason Ekstrand
f47faa4316 nir: Add texture opcodes and source types for multisample compression
Intel hardware does a form of multisample compression that involves an
auxilary surface called the MCS.  When an MCS is in use, you have to first
sample from the MCS with a special opcode and then pass the result of that
operation into the next sample instrucion.  Normally, we just do this
ourselves in the back-end, but we want to expose that functionality to NIR
so that we can use MCS values directly in NIR-based blorp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:44 -07:00
Jason Ekstrand
87a41e862b nir/builder: Add a helper for grabbing multiple channels from an ssa def
This is similar to nir_channel except that it lets you grab more than one
channel by providing a mask.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:40 -07:00
Jason Ekstrand
fc58cb543f nir/builder: Generate the alu helpers directly in python
There's no reason for having a macro *and* a python generator.  We can
easily just do the whole thing in python.  This has the advantage that we
are no longer definining ALU# macros which conflict with the ones in
brw_fs_builder.h.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:38 -07:00
Jason Ekstrand
a0e6e5f21f i965/fs: Use MRF0 for the repclear message
This is what BLORP does.  Making them match cuts down on the noise when
looking at AUB diffs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:33 -07:00
Jason Ekstrand
5a68df87da i965/blorp: Simplify the sample layout calculation
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:30 -07:00
Jason Ekstrand
bee160b31b i965/fs: Organize prog_data by ksp number rather than SIMD width
The hardware packets organize kernel pointers and GRF start by slots that
don't map directly to dispatch width.  This means that all of the state
setup code has to re-arrange the data from prog_data into these slots.
This logic has been duplicated 4 times in the GL driver and one more time
in the Vulkan driver.  Let's just put it all in brw_fs.cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:25 -07:00
Jason Ekstrand
7be100ac9a i965/gen7_wm: Move where we set the fast clear op
This better matches gen8 state setup

Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:21 -07:00
Jason Ekstrand
1ec466d0ff i965/fs: Stop setting dispatch_grf_start_reg from the visitor
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:18 -07:00
Jason Ekstrand
082768af30 i965/fs: Clean up the logic in compile_fs a bit
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:13 -07:00
Jason Ekstrand
b0f8768905 i965/state: Clean up WM/PS state to pull more things out of prog_data
Now that we have a persample_shading bit in prog_data we can reduce the
amount the state setup code needs to be looking at the GL state.  In
particular, it no longer pulls anything directly out of the
gl_fragment_program and no longer depends on NEW_FRAGMENT_PROGRAM.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:10 -07:00
Jason Ekstrand
712a980add i965/fs: Rework the persample shading key/prog_data bits
This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data.  The previous approach had three different
key bits that had slightly different and hard-to-decern meanings while the
new bits are far more clear.  This commit changes it to two easily
understood bits that communicate everything we need:

 1) key->persample_interp: means that the user has requested persample
    interpolation through the API.  This is equivalent to having
    SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
    enough that you actually get multiple per-sample invocations.

 2) key->multisample_fbo: means that the shader will be running on an
    actual multi-sampled framebuffer.

This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode.  This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.

In theory, this shuffle could mean more recompiles.  However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:05 -07:00
Jason Ekstrand
a2f50d87b6 nir: Add an info bit for uses_sample_qualifier
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:33:52 -07:00
Kenneth Graunke
59156b2e96 i965: Fix undefined df bits in brw_reg comparisons.
Commit 5310bca024 added a new "double df"
field to the brw_reg struct, adding an extra 4 bytes of data that isn't
usually initialized (or may contain irrelevant garbage if the struct is
mutated).  This means that it's no longer safe to memcmp().

Instead, add a brw_regs_equal() function which ignores the extra df bits
unless they matter.  To keep the implementation cheap, we wrap the first
set of fields in a union/struct so that we can use a single DWord
comparison.

v2: Drop unnecessary casts (caught by Francisco Jerez).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-05-14 00:18:37 -07:00
Dave Airlie
9f8867d877 i965: disable cull distance temporarily.
I'll fix this up on Monday, so leave the docs changes in place.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-14 11:39:34 +10:00
Dave Airlie
7a6d55826e Revert "glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v4)"
This reverts commit ad355652c2.

This broke a bunch of clip tests.
2016-05-14 11:39:34 +10:00
Ian Romanick
a608e946b5 docs: Mark GL_OES_shader_io_blocks as started
Watch the oes_shader_io_blocks of my fd.o Mesa GIT repo for progress.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-13 17:48:46 -07:00
Kristian Høgsberg Kristensen
4e959cf9f9 docs: update ARB_cull_distance status.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-05-13 16:32:14 -07:00
Kristian Høgsberg Kristensen
c564348a2e i965: Add support for GL_ARB_cull_distance
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-13 16:28:25 -07:00
Ilia Mirkin
a1c2444792 st/mesa: flip y coordinate of interpolateAtOffset for winsys
This fixes a few dEQP tests like

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.no_qualifiers.default_framebuffer

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-13 19:17:41 -04:00
Ilia Mirkin
0d8e850195 glsl: make sure that textureProj(bias) variants are only exposed in fs
Many were already marked as fs_only, but not all. This fixes the
remaining ir_txb entries.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-13 19:17:26 -04:00
Ilia Mirkin
37c8f4c609 glsl: be more strict when validating shader inputs
interpolateAt* can only take input variables or an element of an input
variable array. No structs.

Further, GLSL 4.40 relaxes the requirement to allow swizzles, so enable
that as well.

This fixes the following dEQP tests:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_struct_member
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_struct_member
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_struct_member

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-13 19:17:26 -04:00
Ilia Mirkin
5239f1e0c9 glsl: make sure that interpolateAt arguments are variables
In the case of a constant, it might have been propagated through and
variable_referenced() returns NULL. Error out in that case.

Fixes 3 dEQP tests:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_constant
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_constant
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_constant

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-13 19:17:26 -04:00
Tobias Klausmann
8f45f4f3ca mesa/st: Add support for GL_ARB_cull_distance (v2)
v2: don't bother with cull dist varyings except to assert.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-14 08:28:23 +10:00
Tobias Klausmann
2be258ea18 gallium: Add a pipe cap for arb_cull_distance
This lets us safely enable or disable the extension as needed

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-14 08:28:17 +10:00
Tobias Klausmann
d656736bbf glsl: Add arb_cull_distance support (v3)
v2: make too large array a compile error
v3: squash mesa/prog patch to avoid static compiler errors in bisect

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-14 08:28:08 +10:00
Tobias Klausmann
ad355652c2 glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v4)
This will come in handy when we want to lower gl_CullDistance into
gl_CullDistanceMESA.

[airlied: drop separate APIs for clip/cull - just use single API
to call both passes.]

v3: reexamine my sanity, this was pretty broken, the new code
creates one copy of gl_ClipDistanceMESA, as the clip distance
varying and lowers everything into that in two passes, one for clips
one for culls.
v4: rework using the passes in clip/cull sizes, instead of the
array sizes.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-14 08:28:07 +10:00
Dave Airlie
dd3390e12f glsl: rename lower_clip_distance to lower_distance.
This just renames the file in anticipation of adding cull lowering,
and renames the internals.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-14 08:27:40 +10:00
Tobias Klausmann
eb18fea707 mesa/main: Add support for GL_ARB_cull_distance (v2)
airlied:
v2: rename LowerClipDistance to LowerCombinedClipCullDistnace.
I don't think we want any other behaviour with any current hw.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-14 08:27:29 +10:00
Tobias Klausmann
f2a2e08e01 glapi: Add GL_ARB_cull_distance
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-14 08:27:10 +10:00
Nanley Chery
6674d018f7 anv/copy: Fix copying Images from Buffers with larger dimensions
This function previously assumed that the Buffer and Image had matching
dimensions. However, it is possible to copy from a Buffer with larger
dimensions than the Image. Modify the copy function to enable this.

v2: Use ternary instead of MAX for setting bufferExtent (Jason Ekstrand)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95292
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Matthew Waters <matthew@centricular.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-13 14:08:57 -07:00
Maarten Lankhorst
ff5c312623 .mailmap: Fix my email addresses.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@ubuntu.com>
2016-05-13 12:28:05 +02:00
Nicolai Hähnle
a694c20ecf radeonsi/sid_tables: rename reg_table to sid_reg_table
This is purely cosmetic, making it easier to assign blame for space used
in the binary in case somebody else makes a similar cleanup effort in the
future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:39 -05:00
Nicolai Hähnle
c7f73a70f0 radeonsi/sid_tables: store offset into global fields table instead of pointer
This avoids relocations in the final binary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:39 -05:00
Nicolai Hähnle
54ab39caaf radeonsi/sid_tables: store strings by offset instead of by pointer
This saves some space and avoids the need for relocations.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:39 -05:00
Nicolai Hähnle
ca8f71f4cb r600: remove TABLE_SIZE macro
Use ARRAY_SIZE instead.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:38 -05:00
Nicolai Hähnle
43ac091e4c r600: move alu_op_table to .c file
So that it gets compiled and emitted only once, saving space is the final
binary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:38 -05:00
Nicolai Hähnle
390c740b99 r600: move cf_op_table to .c file
So that it gets compiled and emitted only once, saving space is the final
binary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:38 -05:00
Nicolai Hähnle
a180e1d22d r600: move fetch_op_table to .c file
So that it gets compiled and emitted only once, saving space is the final
binary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:37 -05:00
Nicolai Hähnle
6d350fb13f r600: protect r600_isa.h with extern "C"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-05-13 01:03:37 -05:00
Bas Nieuwenhuizen
ac77fb74a0 gallium/ddebug: Implement launch_grid.
Does not implement dumping info.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-13 07:43:46 +02:00
Bas Nieuwenhuizen
22b35122fa gallium/ddebug: Support compute states.
v2: Reuse the macro for bind & delete.

Note that may not be able to share the delete long-term as
pipe_compute_state contains members not in pipe_shader_state,
and we need to distinguish the pointer location if we add that
struct to the union.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-13 07:43:37 +02:00
Bas Nieuwenhuizen
5efe477b13 gallium/ddebug: Add passthrough for get_compute_param.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-13 07:39:12 +02:00
Ian Romanick
8f05a0a4c0 nir: Remove empty visit_call_src and visit_load_const_src functions
The guts were removed in dfb3abba.  It has been almost exactly a year,
so I dont think we're going to "decide we want [predication] back."

Silences several "unused parameter" warnings:

nir/nir.c: In function ‘visit_call_src’:
nir/nir.c:1052:32: warning: unused parameter ‘instr’ [-Wunused-parameter]
 visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state)
                                ^
nir/nir.c:1052:58: warning: unused parameter ‘cb’ [-Wunused-parameter]
 visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state)
                                                          ^
nir/nir.c:1052:68: warning: unused parameter ‘state’ [-Wunused-parameter]
 visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state)
                                                                    ^
nir/nir.c: In function ‘visit_load_const_src’:
nir/nir.c:1058:44: warning: unused parameter ‘instr’ [-Wunused-parameter]
 visit_load_const_src(nir_load_const_instr *instr, nir_foreach_src_cb cb,
                                            ^
nir/nir.c:1058:70: warning: unused parameter ‘cb’ [-Wunused-parameter]
 visit_load_const_src(nir_load_const_instr *instr, nir_foreach_src_cb cb,
                                                                      ^
nir/nir.c:1059:28: warning: unused parameter ‘state’ [-Wunused-parameter]
                      void *state)
                            ^

v2: Add some comments in nir_foreach_src suggested by Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Connor Abbott <cwabbott0@gmail.com>
2016-05-12 16:47:14 -07:00
Ian Romanick
098166e1bc nir: Silence unused parameter warnings
These cases had the parameter removed:

nir/nir_lower_vec_to_movs.c: In function ‘try_coalesce’:
nir/nir_lower_vec_to_movs.c:124:66: warning: unused parameter ‘shader’ [-Wunused-parameter]
 try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
                                                                  ^
nir/nir_lower_io.c: In function ‘load_op’:
nir/nir_lower_io.c:147:32: warning: unused parameter ‘state’ [-Wunused-parameter]
 load_op(struct lower_io_state *state,
                                ^

These cases had the parameter (void) silenced because the parameter was
necessary for an interface:

nir/glsl_to_nir.cpp:1900:32: warning: unused parameter 'ir' [-Wunused-parameter]
 nir_visitor::visit(ir_barrier *ir)
                                ^
nir/nir.c: In function ‘remove_use_cb’:
nir/nir.c:802:35: warning: unused parameter ‘state’ [-Wunused-parameter]
 remove_use_cb(nir_src *src, void *state)
                                   ^
nir/nir.c: In function ‘remove_def_cb’:
nir/nir.c:811:37: warning: unused parameter ‘state’ [-Wunused-parameter]
 remove_def_cb(nir_dest *dest, void *state)
                                     ^

Number of total warnings in my build reduced from 2543 to 2538
(reduction of 5).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-12 16:46:41 -07:00
Leo Liu
bd9ae72459 vl/dri: fix close fd error out
fd should be set to -1 only if it got closed by pipe_loader_release.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-05-12 18:26:48 -04:00
Samuel Pitoiset
988b09f9ac nvc0: fix indentation in nvc0_invalidate_resource_storage()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-12 21:37:08 +02:00
Samuel Pitoiset
abb3401095 nvc0: save some CPU cycles in nvc0_context_unreference_resources()
This reduces the number of loop iterations for invalidating buffers
and images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-12 21:37:08 +02:00
Samuel Pitoiset
b8f0b00a9a nvc0: invalidate texture buffers for compute
This is a pretty rare situation but this can happen though.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-12 21:37:08 +02:00
Tim Rowley
2785f2f2d7 swr: properly expose compressed format support
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-12 14:12:18 -05:00
Jason Ekstrand
5186545d66 anv: Don't advertise shaderImageGatherExtended
We don't actually support all of the extended gather functionality so we
shouldn't be advertising it.
2016-05-12 10:57:00 -07:00
Rob Clark
9d3cc80b75 nir: glsl_get_bit_size() should take glsl_type
It's what all the call-sites once, so gets rid of a bunch of inlined
glsl_get_base_type() at the call-sites.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-12 13:39:40 -04:00
Topi Pohjolainen
b19cff1639 i965/gen9: Enable lossless compression
I tried first creating the auxiliary buffer the same time with the
color buffer. That, however, led me into a situation where we would
later create the rest of the mip-levels and the compression would
need to be disabled (it is only supported for single level buffers).

Here we try to create it on demand just before the hardware starts
to render. This is similar what we do with fast clear buffers,
their creation is deferred until the first clear.
This setup also gives the opportunity to detect if the miptree
represents the temporaty texture used internally in the mesa core.
This texture is mostly written by cpu and therefore enabling
compression for it doesn't make much sense.

Note that a heuristic is included. Floating point formats are not
enabled yet as they are only seen to hurt performance.

Some highlights with window system driver kept fixed to default
and only the application driver changing:

Manhattan: 8.32152% +/- 0.355881%
Offscreen: 9.09713% +/- 0.340763%

Glb trex: 8.46231% +/- 0.460624%
Offscreen: 9.31872% +/- 0.463743%

v2 (Ben): Re-use msaa layout type for single sampled case.
v3: Moved the deferred allocation of mcs to brw_try_draw_prims() and
    brw_blorp_blit_miptrees() instead.
v4: (Ken): Drop MIPTREE_LAYOUT_ACCELERATED_UPLOAD when allocating mcs.
    Do not enable for scanout buffers

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
cd9e97a020 i965: Set render state for lossless compressed
v2: Add support for blorp and removed the support for meta
v3 (Ben): Add assertion on compressed non-fast clear - must
          be partial clear.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
cda8c2a911 i965/wm: Don't sample lossless compressed as multisampled
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
683dda0083 i965/gen9: Setup MCS for compressed texture surfaces
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
1a05aeeb1c i965/blorp: Do not resolve lossless compressed blit sources
Blorp blits use sampling engine which is capable of resolving
on the fly. Buffers are still resolved for blitter engine. Current
understanding is that blitter doesn't understand lossless compression.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
01ba26d0b0 i965/blorp: Prepare blits for lossless compression
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:37 +03:00
Topi Pohjolainen
84066ebd63 i965: Deferred allocation of mcs for lossless compressed
Until now mcs was associated to single sampled buffers only for
fast clear purposes and it was therefore the responsibility of the
clear logic to allocate the aux buffer when needed. Now that normal
3D render or blorp blit may render with mcs enabled also, they need
to prepare the mcs just as well.

v2: Do not enable for scanout buffers
v3 (Ben):
   - Fix typo in commit message.
   - Check for gen < 9 and return early in brw_predraw_set_aux_buffers()
   - Check for gen < 9 and return early in intel_miptree_prepare_mcs()
v4: Check for msaa_layput and number of samples to determine if
    lossless compression is to used. Otherwise one cannot distuingish
    between fast clear with and without compression.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:26 +03:00
Topi Pohjolainen
1ca02b6ebb i965: Add flag telling if miptree is for client consumption
Consider later on adding specific disable flags such as

MIPTREE_LAYOUT_DISABLE_AUX_MCS   = 1 << 3, /* CCS_D */
MIPTREE_LAYOUT_DISABLE_AUX_CCS_E = 1 << 4,
MIPTREE_LAYOUT_DISABLE_AUX       = MIPTREE_LAYOUT_DISABLE_AUX_MCS |
                                   MIPTREE_LAYOUT_DISABLE_AUX_CCS_E,

and equivalent boolean/enums into miptree.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
a6e0f1cc7f i965: Add helper for lossless compression support
v2: Check explicitly against base type of GL_FLOAT instead of
    using _mesa_is_format_integer_color(). Otherwise we miss
    GL_UNSIGNED_NORMALIZED.
v3 (Ben): Also call intel_miptree_supports_non_msrt_fast_clear()
          in order to really check everything.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
874c5f05db i965/gen9: Prepare surface state setup for lossless compression
v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).
v3 (Ben): Do not set fast claer state in surface state setup.
          Moved into brw_postdraw_set_buffers_need_resolve()
          using a separate patch.
v4: Support for blorp
v5 (Ben): Re-use gen8_get_aux_mode()

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
a8544267fd i965/gen8: Expose auxiliary mode resolver
Also use the opportunity to drop the unused surface type argument.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
94926492d8 i965: Relax assertion of halign == 16 for lossless compressed aux
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
ba9f954e60 i965/blorp: Set full resolve for lossless compressed
v2 (Ben): Introduce union for fast clear and resolve ops

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:49:22 +03:00
Topi Pohjolainen
58e7392e12 i965/blorp: Do not skip fast color clear with new color
This hasn't been visible before. It showed up with lossless
compression with:

dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.rgb8

Current fast clear logic kicks color resolves even for gpu sampling.
In the test case this results into trashing of the fast color clear
state between two subsequent clears, and therefore each clear is
performed correctly.
With lossless compression the resolves are unnecessary and therefore
the clear state indicates that the buffer is already cleared. Without
considering if the previous color value was the same as the new,
clears that need to be performed are skipped and the buffer ends up
holding old pixel values.

v2 (Ken): Fix the comparison for gen < 9

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-05-12 19:48:47 +03:00
Kenneth Graunke
12dcad1b42 i965: Enable scalar GS by default.
I'd originally left this off because Orbital Explorer was hanging the
GPU, but it seems to be working these days.  There have been a bunch
of changes since then, so we probably fixed something.

On my Broadwell laptop, both Synmark/GSCloth and Orbital Explorer seem
to run at approximately the same framerate in either mode.  This is
despite large reductions in instruction count for Synmark, and large
increases for Orbital Explorer.  It apparently just doesn't matter.

Switching to scalar mode will gain us fp64 support in the next release,
as vec4-mode support isn't yet ready.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-12 01:01:42 -07:00
Kenneth Graunke
607fb0f13d i965: Reduce the SIMD8 GS push constant threshold from 32 to 24.
Three Shadow of Mordor geometry shaders increase by a single
instruction, but the number of spills/fills in Orbital Explorer
is reduced from 194:1279 -> 82:454.  No other programs are affected.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-12 01:01:42 -07:00
Kenneth Graunke
3aa542c657 i965: Delete bogus assertion in emit_gs_input_load().
This looks like leftover cruft from an earlier attempt at writing
point size hacks.  Each vertex has its own copy of gl_PointSize,
so accessing any vertex other than 0 would cause this to fail.

The tests seem to work fine without it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-12 01:01:42 -07:00
Kenneth Graunke
1c41cb58de i965: Support instanced GS inputs in the scalar backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-12 01:01:36 -07:00
Kenneth Graunke
5fc3772650 i965: Use an early return for the push case in emit_gs_input_load().
Just trying to keep things from getting too ugly in the next commit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-12 00:59:08 -07:00
Kenneth Graunke
e9ca952581 i965: Drop BRW_NEW_BLORP from stipple and line parameter packets.
BLORP never touches these, and they're all non-pipelined.  Some
are fairly large packets as well.

I haven't tried to benchmark this; the effect is likely to be small.
However, we may as well stop the pointless papercuts; maybe they'll
add up someday.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-12 00:54:37 -07:00
Jakob Sinclair
18f7c88dd6 glsl: fixed uninitialized pointer
Class "ir_constant" had a bunch of constructors where the pointer member
"array_elements" had not been initialized. This could have lead to unsafe
code if something had tried to write anything to it. This patch fixes
this issue by initializing the pointer to NULL in all the constructors.
This issue was discovered by Coverity.

CID: 401603, 401604, 401605, 401610

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-05-12 09:46:36 +02:00
Ilia Mirkin
ba3f0b6d59 nvc0: fix gl_SampleMaskIn computation
The SAMPLEMASK semantic should only return the bits set covered by the
current invocation. However we were always retrieving the covmask, which
returns the covered samples of the whole pixel.

When not doing per-sample invocation, this is precisely what we want.
However when doing per-sample invocation, we have to select the
sampleid'th bit and only return that. Furthermore, this means that we
have to have a 1:1 correlation for invocations and samples.

This fixes most

dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*

tests. A few failures remain due to disagreements about nr_samples==1
logic as well as what happens with MSAA x2 RTs when the shading fraction
is 0.5.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-11 20:39:27 -04:00
Ilia Mirkin
f5fe903002 nv50/ir: generalize interp fixups to be able to fixup anything
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-11 20:39:26 -04:00
Jason Ekstrand
66a442687f .mailmap: Update the e-mail addresses for Kristian Høgsberg
This changes it to use his personal e-mail and adds his @intel.com address

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-11 12:32:14 -07:00
Jason Ekstrand
7e759fbd60 .mailmap: Use Connor Abbott's personal e-mail 2016-05-11 12:27:15 -07:00
Giuseppe Bilotta
9c3392cb3a Add .mailmap
This adds a first tentative .mailmap file, to canonicize contributor
name/emails in shortlogs and other statistical endeavours.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:21:46 -07:00
Jason Ekstrand
f1dcc7976a i965: Stop splitting fma() prior to optimization
According to the GLSL spec, if the user uses the fma() intrinsic to
generate a precise-consumed value, and you have it in your hardware, you
shouldn't split it.  For a while now, we've been splitting all ffma's
up-front and then planned to fuse them later which isn't valid.  Correctly
handling the GLSL behaviour fixes rendering corruptions in Tomb Raider.
The only reason why doing this possibly helped before was for ARB programs
which is handled by the previous commit.

Shader-db results on Haswell:

   total instructions in shared programs: 7560300 -> 7561510 (0.02%)
   instructions in affected programs: 56265 -> 57475 (2.15%)
   helped: 86
   HURT: 291

The only shaders in the database that are affected are from "Shadow of
Mordor" which is the first app in our database to use fma().  We could, at
some point in the future, split inexact ffma opcodes which would fix the
shader-db regressions since Shadow of Mordor doesn't ues precise.  However,
this fixes a bug now and and the shader-db impact is fairly small.

Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Jason Ekstrand
47f01e538a ptn: Emit mul+add for MAD
Unlike fma() in GLSL, MAD in ARB programs is 100% splittable.  Just emit
the split version and let the optimizer fuse them later.

Shader-db results on Haswell:

   total instructions in shared programs: 7560379 -> 7560300 (-0.00%)
   instructions in affected programs: 143928 -> 143849 (-0.05%)
   helped: 443
   HURT: 250

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Jason Ekstrand
1b72c31e1f nir/algebraic: Separate ffma lowering from fusing
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Rob Clark
5886d1bad1 anv: fix build break
Previous rename of lower-output-to-temps pass predated merging of anv,
and apparently vulkan wasn't enabled in my local builds so overlooked
this when rebasing.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-11 14:03:24 -04:00
Rob Clark
697382eb61 mesa/st: split the type_size calculation into it's own file
We'll want to re-use this for NIR.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-11 12:20:12 -04:00
Rob Clark
0e5a369879 glsl: export accessor for builtin-uniform descriptors
We'll need this for a nir pass to lower builtin-uniform access.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:12 -04:00
Rob Clark
dfbabc6bad nir/lower-io: add support for lowering inputs
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark
595f9d5476 nir/lower-io: split out some helper fxns
Prep work to reduce the noise in the next patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark
b085016f94 nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries
Since it will gain support to lower inputs, give it a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-11 12:20:11 -04:00
Rob Clark
47fcef9a20 nir: move callsite of lower_outputs_to_temporaries
Going to convert this pass to parameterized lower_io_to_temporaries, and
we want the user to be able to specify whether to lower outputs or
inputs or both.  The restriction of running this pass before validate
to avoid output reads no longer applies.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark
5261947260 nir: lower-io-types pass
A pass to lower complex (struct/array/mat) inputs/outputs to primitive
types.  This allows, for example, linking that removes unused components
of a larger type which is not indirectly accessed.

In the near term, it is needed for gallium (mesa/st) support for NIR,
since only used components of a type are assigned VBO slots, and we
otherwise have no way to represent that to the driver backend.  But it
should be useful for doing shader linking in NIR.

v2: use glsl_count_attribute_slots() rather than passing a type_size
    fxn pointer

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark
b10cc24519 nir: passthrough-edgeflags support
Handled by tgsi_emulate for glsl->tgsi case.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark
3a939d034e nir: add lowering pass for glBitmap
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark
12c18ce476 nir: add lowering pass for glDrawPixels
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark
b26645a00f nir: add lowering pass for y-transform
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark
e1d80f8603 gallium: add NIR as a possible IR
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-11 12:20:11 -04:00
Rob Clark
425dc4c4b3 gallium: refactor pipe_shader_state to support multiple IR's
The goal is to allow the pipe driver to request something other than
TGSI, but detect whether what is getting is TGSI vs what it requested.
The pipe drivers will always have to support TGSI (and convert that into
whatever it is that they prefer), but in some cases we should be able to
skip the TGSI intermediate step (such as glsl->nir vs glsl->tgsi->nir).

I think pipe_compute_state should get similar treatment.  Currently,
afaict, it has one user and one consumer, which has allowed it to be
sloppy wrt. supporting alternative IR's.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-11 12:20:11 -04:00
Rob Clark
4500d17245 freedreno: fix multi-layer transfer_map's
The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-11 12:03:21 -04:00
Juan A. Suarez Romero
9bea018994 glsl: use var with initializer on global var validation
Currently, when cross validating global variables, all global variables
seen in the shaders that are part of a program are saved in a table.

When checking a variable this already exist in the table, we check both
are initialized to the same value. If the already saved variable does
not have an initializer, we copy it from the new variable.

Unfortunately this is wrong, as we are modifying something it is
constant. Also, if this modified variable is used in
another program, it will keep the initializer, when it should have none.

Instead of copying the initializer, this commit replaces the old
variable with the new one. So if we see again the same variable with an
initializer, we can compare if both are the same or not.

v2: convert tabs in whitespaces (Kenenth Graunke)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 13:50:04 +02:00
Jordan Justen
2c1c060b03 util/ralloc: Remove double zero'ing of rzalloc buffers
Juha-Pekka found this back in May 2015:
<1430915727-28677-1-git-send-email-juhapekka.heikkila@gmail.com>

From the discussion, obviously it would be preferable to make
ralloc_size no longer return zeroed memory, but Juha-Pekka found that
it would break Mesa.

In <56AF1C57.2030904@gmail.com>, Juha-Pekka mentioned that patches
exist to fix i965 when ralloc_size is fixed to not zero memory, but
the patches have not made their way to mesa-dev yet.

For now, let's stop doing the double zeroing of rzalloc buffers.

v2:
 * Move ralloc_size code to rzalloc_size, and add a comment as
   suggested by Ken.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 22:54:46 -07:00
Jonathan Gray
e3d43dc5ea genxml: avoid using a GNU make pattern rule
% pattern rules are a GNU extension.  Convert the use of one to a
inference rule to allow this to build on OpenBSD.

v2: inference rules can't have additional prerequisites so add a target
rule to still depend on gen_pack_header.py

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-10 20:54:33 -07:00
Roland Scheidegger
430797843a gallivm: improve dumping of bitcode
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (albeit it might still overwrite previous modules,
particularly the modules from draw tend to always have the same name).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-11 04:43:35 +02:00
Vinson Lee
8d639138c7 swr: [rasterizer] Include cmath for std::isnan and std::isinf.
This patch fixes this build error.

  CXX      rasterizer/memory/libswrAVX_la-ClearTile.lo
In file included from rasterizer/memory/ClearTile.cpp:34:0:
./rasterizer/memory/Convert.h: In function ‘uint16_t Convert32To16Float(float)’:
./rasterizer/memory/Convert.h:170:9: error: ‘__builtin_isnan’ is not a member of ‘std’
     if (std::isnan(val))
         ^
./rasterizer/memory/Convert.h:170:9: note: suggested alternative:
<built-in>: note:   ‘__builtin_isnan’
./rasterizer/memory/Convert.h:176:14: error: ‘__builtin_isinf_sign’ is not a member of ‘std’
     else if (std::isinf(val))
              ^
./rasterizer/memory/Convert.h:176:14: note: suggested alternative:
<built-in>: note:   ‘__builtin_isinf_sign’

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95180
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-05-10 17:11:05 -07:00
Jason Ekstrand
a5660bf1f8 i965/blorp: Don't blend integer values during MSAA resolves
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 15:32:00 -07:00
Jason Ekstrand
4f4f393bf3 meta/blit: Don't blend integer values during MSAA resolves
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 15:31:50 -07:00
Jason Ekstrand
203c786a73 i965/fs: Default all constants to a location of -1
Otherwise constants which aren't live get an undefined constant location.
When we go to set up param and pull_param we end up assigning all unused
uniforms to slot 0.  This cases the Vulkan driver to segfault because it
doesn't have pull_param.

This fixes bugs in the Vulkan driver introduced in c3fab3d000.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2016-05-10 15:25:30 -07:00
Dave Airlie
d36d11ad90 st/glsl_to_tgsi: attach image to correct instruction for samples
This fixes a crash (but not the test):
GL45-CTS.shader_texture_image_samples_tests.functional_test

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-11 06:55:09 +10:00
Dave Airlie
07df3b81ff mesa: move MESA_MAP_NOWAIT_BIT up away from GL_MAP_PERSISTENT_BIT
This was colliding badly and making
GL45-CTS.buffer_storage.map_persistent_texture
fail on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-11 06:54:56 +10:00
Dave Airlie
b230d51a18 mesa/meta: check for signed/unsigned int conversion for pbo getteximage
When doing GetTexSubImage using a PBO, we should check if it involves
a signed/unsigned conversion and bail if it does, just like in the
other cases.

This fixes:
GL33-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo
on Haswell at least.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95324
Reviewed-by: Matt Turer <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-11 06:52:20 +10:00
Matt Turner
8bb156a261 i965: Handle BRW_OPCODE_DO on Gen6+ in brw_instruction_name().
This became a problem after the recent disassembler changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 12:12:46 -07:00
Bas Nieuwenhuizen
3d21720d31 radeonsi: Set declared tessellation LDS size to hardware size.
The calculated limit gave problems on SI as it was > 32 KiB
and the hardware LDS size on SI is only 32 KiB. It isn't
correct anyway when processing multiple patches in a threadgroup.

As we potentially have any number of patches such that the
used LDS is at most the hardware LDS size, and exact size
per patch is not known at compile time, this seems like
the only valid bound.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-10 20:14:55 +02:00
Rob Clark
8623e599fc freedreno/ir3: size input/output arrays properly
We index into these based on var->data.driver_location, which might have
gaps (ie. two inputs, one w/ drvloc 0 and other 2).  This shows up in
(for example) 'bin/copyteximage 1D', but was only noticed recently due
to additional asserts.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-10 13:17:27 -04:00
Ian Romanick
2483a9a08c ir_to_mesa: Emit smarter ir_binop_logic_or for vertex programs
Continue using ADD in the other case because a fragment shader backend
could fuse the ADD with a MUL to generate a MAD for ((x && y) || z).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 09:22:18 -07:00
Ian Romanick
f7328f9afd prog: Delete all remains of OPCODE_SNE, OPCODE_SEQ, OPCODE_SGT, and OPCODE_SLE
There is nothing left that can generate them.  These used to be
generated by ir_to_mesa or by the assembler for various NV extensions
that have been removed.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 09:22:18 -07:00
Ian Romanick
fd63e77998 ir_to_mesa: Do not emit OPCODE_SEQ or OPCODE_SNE
Nothing that consumes the output of this backend consumes them
navtively.  This is *not* the way i915 has implemented these
instructions, but, as far as I am able to tell, this is the way both the
Cg compiler and the HLSL compiler implement these operations.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 09:22:18 -07:00
Ian Romanick
15e6a1a3be ir_to_mesa: Do not emit OPCODE_SLE or OPCODE_SGT
Nothing that consumes the output of this backend consumes them
navtively.  This is the way i915 has implemented these instructions
since it began consuming GLSL.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-10 09:22:18 -07:00
Samuel Pitoiset
e46ac18ebe nvc0: enable compute support by default on GK110+
Compute support seems to be pretty stable now, and according to piglit
it doesn't seem to break 3D state.

As a side effect, this will expose ARB_compute_shader on GK110/GK208.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-10 17:47:01 +02:00
Marek Olšák
2b58bc4461 gallium/radeon: don't flush the GFX IB if DMA doesn't depend on it
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
fb89f06698 radeonsi: consolidate radeon_add_to_buffer_list calls for DMA
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
60946c0d60 gallium/radeon: add a heuristic for better (S)DMA performance
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
bb74152597 gallium/radeon: flush if DMA IB memory usage is too high
This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
70934de00e radeonsi: add new SDMA texture copy code
This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
  (unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.

v2: Extend the SDMA L2T VM fault workaround to T2L.
    - Same algorithm, just applied to T2L.
      (and using a 0-based address and surface.bo_size instead of buf->size)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
a512da36ae gallium/radeon: fix (S)DMA read-after-write hazards
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
f837c37f02 radeonsi: raise the max size for SDMA buffer copies
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
faa4f0191d radeonsi: remove SDMA texture copy code
Most of this has never worked according to the new test.

The new code will be radically different.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
498a40cae8 radeonsi: only expose *_init_*dma_functions from (S)DMA files
just normalizing the interfaces

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
3af28e558f gallium/radeon: implement randomized SDMA texture copy testing (v2)
v2: - adjustments for exercising all important SDMA code paths
    - decrease the probability of getting huge sizes (faster testing)
    - increase the probability of getting power-of-two dimensions
    - change the memory cap to 128MB (faster testing)
    - better detect which engine has been used

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
f475c9fb07 gallium/radeon: discard CMASK or DCC if overwriting a whole texture by DMA
v2: simplify the conditionals

Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
2f173b8e13 gallium/radeon: use a common function for DMA blit preparation
this is more robust and probably fixes some bugs already

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
2af4b637d8 gallium/radeon: split out code for discarding DCC
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
c85d0c17d9 gallium/radeon: rename r600_texture_disable_cmask -> discard_cmask
because it doesn't decompress

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
fb9fe352ea st/mesa: use transfer_inline_write for memcpy TexSubImage path
This allows drivers to use their own fast path for texture uploads.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
871d2aff24 gallium/radeon: fix partial layered transfers of cube (array) textures
a staging cube texture with array_size % 6 != 0 doesn't work very well

just use 2D_ARRAY or 2D for all staging textures

Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
c2377b394b gallium/radeon: align alignments for better buffer reuse
It's for the buffer cache.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
544967faf5 gallium/radeon: use gart_page_size instead of hardcoded 4096
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
bfa8a00920 winsys/radeon: use gart_page_size instead of private size_align
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Marek Olšák
9d8c283f28 winsys/amdgpu: move gart_page_size to struct radeon_winsys
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-10 17:20:09 +02:00
Roland Scheidegger
e4cf8717de gallivm: print declarations of intrinsics with GALLIVM_DEBUG=ir
Those aren't really interesting, however outputting them is helpful when
trying to feed the IR to llvm llc (or opt) for debugging.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-10 17:08:16 +02:00
Roland Scheidegger
5c200894c8 gallivm: use InternalLinkage instead of PrivateLinkage for texture functions
At least with MCJIT the disassembler will crash otherwise when trying to
disassemble such functions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-10 17:08:16 +02:00
Roland Scheidegger
8b66e2647d gallivm: disable avx512 features
We don't target this yet, and some llvm versions incorrectly enable it based
on cpu string, causing crashes.
(Albeit this is a losing battle, it is pretty much guaranteed when the next
new feature comes along llvm will mistakenly enable it on some future cpu,
thus we would have to proactively disable all new features as llvm adds them.)

This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested)

Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com

CC: <mesa-stable@lists.freedesktop.org>
2016-05-10 17:08:16 +02:00
Jose Fonseca
94e8653a3b Revert "nir: Try to warn when C99 extensions are used in nir headers."
This reverts commit 99474dc29b.

-Wpedantic is too verbose, even when applied to just a few includes.

We'll just have to deal with the issues as they come.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-10 03:29:24 -07:00
Samuel Iglesias Gonsálvez
4c9006f957 i965/fs: fix MOV_INDIRECT exec_size for doubles
In that case, the writes need two times the size of a 32-bit value.
We need to adjust the exec_size, so it is not breaking any hardware
rule.

v2:
  - Add an assert to verify type size is not less than 4 bytes (Jordan).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:09 +02:00
Samuel Iglesias Gonsálvez
75ada43a3a i965/fs: take into account doubles when calculating read_size for MOV_INDIRECT
v2:
- Fix assert's line width (Topi).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:09 +02:00
Samuel Iglesias Gonsálvez
03687ab77f i965/fs: demote_pull_constants() did not take into account double types
The constants could be double, and it was allocating size for float types
for the destination register of varying pull constant loads.

Then the fs_visitor::validate() will complain.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:09 +02:00
Samuel Iglesias Gonsálvez
c3fab3d000 i965/fs: push first double-based uniforms in push constant buffer
When there is a mix of definitions of uniforms with 32-bit or 64-bit
data type sizes, the driver ends up doing misaligned access to double
based variables in the push constant buffer.

To fix this, this patch pushes first all the 64-bit variables and
then the rest. Then, all the variables would be aligned to
its data type size.

v2:
- Fix typo and improve comment (Jordan).
- Use ralloc(NULL,...) instead of rzalloc(mem_ctx,...) (Jordan).
- Fix typo (Topi).
- Use pointers instead of references in set_push_pull_constant_loc() (Topi).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-05-10 11:25:09 +02:00
Iago Toral Quiroga
193cb67a84 i965/fs: recognize writes with a subreg_offset > 0 as partial
Usually, writes to a subreg_offset > 0 would also have a stride > 1
and we would recognize them as partial, however, there is one case
where this does not happen, that is when we generate code for 64-bit
imemdiates in gen7, where we produce something like this:

mov(8) vgrf10:UD, <low 32-bit>
mov(8) vgrf10+0.4:UD, <high 32-bit>

and then we use the result with a stride of 0, as in:

mov(8) vgrf13:DF, vgrf10<0>:DF

Although we could try to avoid this issue by producing different code
for this by using writes with a stride of 2, that runs into other
problems affecting gen7 and the fact is that any instruction that
writes to a subreg_offset > 0 is a partial write so we should really
recognize them as such.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:09 +02:00
Iago Toral Quiroga
34ed61b334 i965/fs/lower_simd_width: Fix registers written for split instructions
When the original instruction had a stride > 1, the combined registers
written by the split instructions won't amount to the same register space
written by the original instruction because the split instructions will
use a stride of 1. The current code assumed otherwise and computed the
number of registers written by split instructions as an equal share based
on the relation between the lowered width and the original execution size
of the instruction.

It is only after the split, when we interleave the components of the result
from the lowered instructions back into the original dst register, that the
original stride takes effect and we write all the registers specified by
the original instruction.

Just make the number of register written the same as the vgrf space we
allocate for the dst of the split instruction.

Fixes crashes in fp64 tests produced as a result of assigning incorrectly the
number of registers written by split instructions, which led to incorrect
validation of the size of the writes against the allocated vgrf space.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:09 +02:00
Iago Toral Quiroga
9741cff1ec i965/fs: rename our lower_d2f pass to lower_d2x
Since it no longer handles conversions from double to float but from
double to various other 32-bit types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:09 +02:00
Iago Toral Quiroga
efaf62a40a i965/fs: implement i2d and u2d
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
c63a6f2149 i965/fs: implement d2i and d2u
These need the same treatment as d2f, so generalize our d2f lowering to cover
these too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
e0c45182e3 i965/fs: implement d2b
v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
80f60a4302 i965/fs: implement fsign() for doubles
v2 (Sam):
  - Fix indentation (Kenneth)
  - Simplify code (Kenneth)

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
c9ecd651e6 i965/fs: add null_reg_df
Probably not needed since we fix the dst type of comparisons
automatically, but for consistency with the rest of null_reg_*
functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
e8a8fc9563 i965/fs: We only support 32-bit integer ALU operations for now
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Iago Toral Quiroga
9e5ce151a4 i965/fs: handle fp64 opcodes in brw_do_channel_expressions
In the case of the pack opcode we are already doing the
lowering in NIR, so no need to do it here. The unpack opcode
operates on scalars, so it should not be lowered.

In the case of frexp_sig and frexp_exp, they are lowered in
lower_instructions, so we don't have to care about them.

All the remaining opcodes involve conversions from and to doubles
and are business as usual.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Connor Abbott
a644b0939d i965/fs: add support for f2d and d2f
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Connor Abbott
9e1b3ea199 i965/fs: add a pass for legalizing d2f
We need to do this late, in order to avoid partial writes during the
optimization loop.

v2: Use subscript() instead of stride().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:08 +02:00
Connor Abbott
2286a74e3b i965/fs: fix dst width calculation in CSE
v2 (Sam):
- Fix line width (Topi).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:08 +02:00
Connor Abbott
fccd15524f i965/fs: fix regs_written in LOAD_PAYLOAD for doubles
v2: Account for the stride of the dst (Iago)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:07 +02:00
Connor Abbott
6b6d68ae07 i965/fs: fix is_copy_payload() for doubles
v2 (Sam):
- LOAD_PAYLOAD treats each header source as a 32B block
  regardless of the datatype. Drop the change (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:07 +02:00
Connor Abbott
e83f51d54e i965/fs: fix compares for doubles
The destination has to have the same source as the type, or else the
simulator will complain. As a result, we need to emit a CMP that
outputs a 64-bit wide result and then do a strided MOV to pick out the
low 32 bits of each channel.

v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:07 +02:00
Connor Abbott
a5d7e144ea i965/fs: extend exec_size halving in the generator
The HW has a restriction that only vertical stride may cross register
boundaries. Previously, this only mattered for SIMD16 instructions where
we needed to use the same regioning parameters as the equivalent SIMD8
instruction but double the exec size. But we need to do the same
splitting for 64-bit instructions as well as instructions with a stride
of 2 (which effectively consume 64 bits per element). Fix up the code to
do the right thing instead of special-casing SIMD16.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:07 +02:00
Connor Abbott
4f3888c1ca i965/fs: fix assign_constant_locations() for doubles
Uniform doubles will read two registers, in which case we need to mark
both as being live.

v2 (Sam):
  - Use a formula to get the number of registers read with proper
    units (Curro).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:07 +02:00
Connor Abbott
cc64c9e441 i965/fs: use byte_offset() in offset() for uniforms
This makes things more consistent, and also fixes the offset calculation
for double uniforms.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:07 +02:00
Connor Abbott
fe949949a9 i965/fs: handle uniforms in byte_offset()
v2: Do it only for uniforms (Iago)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Connor Abbott
1f51aada3f i965/fs: fix type_size() for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Iago Toral Quiroga
935e0e305d i965/fs: optimize unpack double
When we are actually unpacking from a double that we have previously
packed from its 32-bit components we can bypass the pack operation
and source from its arguments directly.

v2 (Sam):
- Fix line overflow (Topi)
- Bail if the parent instruction's source is not SSA (Connor)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Iago Toral Quiroga
ba1907f040 i965/fs: optimize pack double
When we are actually creating a double using values obtained from a
previous unpack operation we can bypass the unpack and source from
the original double value directly.

v2:
- Style changes (Topi)
- Bail is parent instruction's src is not SSA (Connor)

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Connor Abbott
7782f39e75 i965/fs/nir: translate double pack/unpack
v2 (Sam):
- Fix line overflow (Topi).

v3: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Connor Abbott
fd763177c1 i965/fs: add a pass for lowering PACK opcodes
v2: Use subscript() instead of stride() (Curro)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:06 +02:00
Connor Abbott
ba582e58cd i965/fs: add PACK opcode
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:05 +02:00
Francisco Jerez
cc3bae5cd7 i965/fs: Introduce helper to extract a field from each channel of a register.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-05-10 11:25:05 +02:00
Connor Abbott
d17cdacba3 i965/fs: always pass the bitsize to brw_type_for_nir_type()
v2 (Sam):
- Add bitsize to brw_type_for_nir_type() in optimize_extract_to_float()

v3 (Sam):
- Fix line width (Topi).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:05 +02:00
Connor Abbott
a308bae58f i965/fs: add support for printing double immediates
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:05 +02:00
Connor Abbott
0f2e227d5c i965/fs: don't propagate 64-bit immediates
They can only be used with 1-src instructions, which practically (since
we should've constant-propagated away all 1-src instructions with 64-bit
immediates in NIR) means that they must be kept in separate MOV's and
can't be propagated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:05 +02:00
Connor Abbott
0f1690fd95 i965/fs: use the NIR bit size when creating registers
v2 (Iago):
  - Squashed bits from 'support double precission constant operands for
    the implementation of 64-bit emit_load_const'.
  - Do not use BRW_REGISTER_TYPE_D for all 32-bit registers since that breaks
    asserts and functionality for some piglit tests. Just keep 32-bit types
    untouched and add 64-bit support.
  - Use DF instead of Q for 64-bit registers. Otherwise the code we generate
    will use Q sometimes and DF others and we hit unwanted DF/Q conversions,
    so always use DF.

v3 (Sam):
  - Mark 'reg_type' occurrences as const (Topi).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Tapani Palli <tapani.palli@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:04 +02:00
Connor Abbott
76de7af8e2 i965: fixup uniform setup for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:04 +02:00
Iago Toral Quiroga
3210870b34 i965: two-argument instructions can only use 32-bit immediates
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:04 +02:00
Iago Toral Quiroga
3d10adf603 i965: fix brw_abs_immediate() for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:04 +02:00
Iago Toral Quiroga
830d87840c i965: fix brw_saturate_immediate() for doubles
v2 (Sam):
  - Mark 'size' as const (Topi).
  - Add comment to explain that we do copies 64-bits regardless of the
    type (Topi)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-10 11:25:03 +02:00
Connor Abbott
7bcc4cccad i965: fix is_zero(), is_one() and is_negative_one() for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:03 +02:00
Connor Abbott
2ae409286c i965: fix brw_negate_immediate() for doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:03 +02:00
Connor Abbott
cbf7c7f099 i965/eu: add support for DF immediates
v2 (Sam):
  - Remove 'however' from the comment (Topi)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:03 +02:00
Connor Abbott
c0a1cd24a8 i965: add support for disassembling DF immediates
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:03 +02:00
Connor Abbott
bb175db16b i965: add support for getting/setting DF immediates
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:03 +02:00
Connor Abbott
5310bca024 i965: add brw_imm_df
v2 (Iago)
  - Fixup accessibility in backend_reg

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Topi Pohjolainen
9add73f641 i965/eu: Allow 3-src float ops with doubles
v2:
  - set 3src_src_type for BRW_REGISTER_TYPE_DF (Connor)

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Connor Abbott
367e762a71 i965/disasm: fix disasm of 3-src doubles
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Topi Pohjolainen
45066a6a59 i965: Tell backend register about double precision type
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Tapani P\344lli <tapani.palli@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Topi Pohjolainen
520b3b2fd1 i965: Determine size of double precision float register
This is used to determine how many registers an instruction reads and
writes as well as for offseting register region into a desired component.

v2 (Connor): rebase on master

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Tapani P\344lli <tapani.palli@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Topi Pohjolainen
e88cf0f2d2 i965: Lower DFRACEXP/DLDEXP
v2 (Connor): rebase on master which moved this to brw_link.cpp
v3 (Sam):
- Only enable DFREXP_DLDEXP_TO_ARITH in process_glsl_ir(). This is
used for doubles. Single floating point op is lowered by NIR.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:02 +02:00
Connor Abbott
30424fd25a i965: use pack/unpackDouble lowering
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:01 +02:00
Connor Abbott
bea2f8beb5 i965: use double lowering pass
v2: also lower trunc, ceil, floor, fract and roundEven (Iago)
v3: also lower mod for doubles (Sam)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:01 +02:00
Samuel Iglesias Gonsálvez
d00a239b28 freedreno/ir3: lower lrp when operating with double operands
Lower lrp when operating with double operands because float version of
lrp is also lowered.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-10 11:25:01 +02:00
Samuel Iglesias Gonsálvez
93e690830a i965: enable lrp lowering for doubles
Broadwell and previous generations does not support lrp instruction
operating with doubles.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-10 11:25:01 +02:00
Dave Airlie
008feb3687 st/glsl_to_tgsi: brown paper bag for the input offsets fix.
Oops, thanks compiler.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 14:41:21 +10:00
Dave Airlie
4d8a71f7f1 glsl: check geometry output vertices limits.
This fixes:
GL45-CTS.geometry_shader.limits.max_output_vertices

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 14:26:03 +10:00
Dave Airlie
13c68e1447 mesa/vbo: fix check for zero aliases with 2/10/10/10
This fixes:
GL33-CTS.gtf33.GL3Tests.vertex_type_2_10_10_10_rev.vertex_type_2_10_10_10_rev_attrib

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 14:24:49 +10:00
Eduardo Lima Mitev
60a5d02416 nir/print: Print memory qualifiers in a variable declaration
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-10 06:22:05 +02:00
Eduardo Lima Mitev
7f7f58f17f glsl: Apply memory qualifiers to vars inside named block interfaces
This is missing and memory qualifiers are currently being ignored for SSBOs.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-10 06:21:55 +02:00
Dave Airlie
f75a26d1ba st/glsl_to_tgsi: handle offsets from inputs
This fixes:
GL45-CTS.gpu_shader5.texture_gather_offset_color_repeat

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 13:14:29 +10:00
Rob Clark
aa730aca20 scripts: bump git_reviewer.pl --git-min-percent default
Bump up default percentage of commits required to be auto-picked for CC.
Seems from a bit of trial-and-error to come up with a more reasonable
list of CC's this way.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-09 19:30:28 -04:00
Kenneth Graunke
e034d80fe1 Revert "Revert "i965: Switch to scalar TCS by default.""
This reverts commit bd326c229c.

Now that we've fixed the GPU hangs, let's turn it back on.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-09 16:20:27 -07:00
Kenneth Graunke
5ce405ba0f i965: Actually assign binding table offsets for the TCS.
As far as I can tell, this was just entirely missing...honestly, I'm
not sure how anything worked at all.

Caught by noticing GPU hangs in image load store tests with scalar TCS,
but probably has broader implications.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-05-09 16:20:18 -07:00
Kenneth Graunke
e0e7280db0 i965: Clamp "Maximum VP Index" to 1 when gl_ViewportIndex isn't written.
fs_visitor::emit_urb_writes skips writing the VUE header for shaders
that don't write gl_PointSize, gl_Layer, or gl_ViewportIndex.  This
leaves their values uninitialized.  Kristian's nearby comment says:

"But often none of the special varyings that live there are written
 and in that case we can skip writing to the vue header, provided the
 corresponding state properly clamps the values further down the
 pipeline."

However, we were clamping gl_ViewportIndex to [0, 15], so we would end
up using a random viewport.  To fix this, detect when the shader doesn't
write gl_ViewportIndex, and clamp it to [0, 0].

The vec4 backend always writes zeros to the VUE header, so it doesn't
suffer from this problem.  With vec4-style HWord writes, we can write
the header and position together in a single message.  In the FS world,
we would need 4 extra MOVs of 0 and a longer message, or a separate
OWord write.  It's likely cheaper to just clamp the value.

Fixes DiRT Showdown and Bioshock Infinite, which only rendered half of
the screen - the lower left of two triangles.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93054
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-09 15:31:27 -07:00
Jordan Justen
e74812dbfe i965/hsw: Fix brw_store_data_imm*
For Gen6 through Haswell dword 1 is MBZ. In gen 8 it becomes part of
the 64-bit address.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-09 15:05:08 -07:00
Kenneth Graunke
96d43f2d08 i965: Reimplement ARB_transform_feedback2 on Haswell and later.
My old implementation accumulated <start, end> pairs in a buffer,
and eventually processed that data on the CPU.  This meant flushing
the batchbuffer and waiting for it to completely execute before we
could map it, resulting in really long stalls.  We could also run out
of space in the buffer, and have to do this early.

Instead, we can use Haswell's MI_MATH command to do the (end - start)
subtraction, as well as the multiplication by 2 or 3 to convert from
the number of primitives written to the number of vertices written.
We still need to CS stall to read the counters, but otherwise everything
is completely pipelined - there's no CPU<->GPU synchronization required.
It also uses only 80 bytes in the buffer, no matter what.

Improves performance in Manhattan on Skylake GT3e at 800x600 by
6.1086% +/- 0.954166% (n=9).  At 1920x1080, improves performance
by 2.82103% +/- 0.148596% (n=84).

v2: Fix number of primitives -> number of vertices calculation for
    GL_TRIANGLES (I was multiplying by 4 instead of 3.)  Caught by
    Jordan Justen.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-09 15:00:01 -07:00
Kenneth Graunke
fdb6c1887f i965: Add a brw_load_register_reg64 helper.
It appears that we can't do this in a single command (like we do for
MI_LOAD_REGISTER_IMM) - the Skylake simulator gets rather grumpy about
the command length if I try to combine them.  No matter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-09 15:00:01 -07:00
Kenneth Graunke
4c71c8a74a i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.
On Haswell, we need version 6 of the kernel command parser in order to
write the math registers.  Our implementation of ARB_query_buffer_object
heavily relies on MI_MATH, so we should only advertise it when MI_MATH
is available.  We also need MI_LOAD_REGISTER_REG, which requires version
7 of the command parser.

To make these checks easier, introduce a screen->has_mi_math_and_lrr
flag that will be set when both commands are supported.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-09 14:59:58 -07:00
Dave Airlie
2d41eb313f mesa/objectlabel: don't return info on genned but never bound textures.
This fixes some cases in the CTS KHR debug tests where it uses
glIsTexture to find an invalid ID and then call GetObjectLabel.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 06:06:09 +10:00
Dave Airlie
bbc6a27590 mesa: don't use genned but unnamed xfb objects.
If we try to draw or query an XFB object that hasn't been bound,
we shouldn't return any information.

This fixes a couple if cases in:
GL33-CTS.transform_feedback.api_errors_test

The ObjectLabel test is inspired by another test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-10 06:06:09 +10:00
Samuel Pitoiset
eafe3905d9 nv50/ir: silence unsupported TGSI_PROPERTY_CS_FIXED_BLOCK_*
We don't need them for compute shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-09 21:58:56 +02:00
Jordan Justen
2e2aa992ff mesa/compute: Fix indirect dispatch buffer size check on 32-bit systems
2655265fcb, but for compute.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-09 11:16:39 -07:00
Rob Clark
57763ee735 freedreno/ir3: fix fallout from new block iterators
Since this is potentially modifying the block structure of the shader,
it needs the _safe() version of the iterator.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-09 13:52:29 -04:00
Nicolai Hähnle
fe102f7677 radeonsi: workaround for tesselation on SI
We request more than 32KB of LDS here, which SI doesn't have. Since LLVM
recently started checking the size of declared LDS allocations, all shaders
involved in tesselation fail to compile on SI.

Note that the entire calculation here seems wrong, given how we calculate
indices for generic attributes, so the number ends up wrong on CI+ as well.
A proper solution is clearly needed, but this patch should serve as a band-aid
for SI in the meantime.

Also note that the real size of the LDS allocation in hardware is independent
from what we tell LLVM, so this is really more of a "cosmetic" change.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95198
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-09 11:52:46 -05:00
Nicolai Hähnle
d8f3e8e626 radeonsi: always allocate export memory for pixel shaders
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-09 11:52:46 -05:00
Nicolai Hähnle
ad1782cfb5 radeonsi: expose performance counters as 64 bit
This is useful for shader-related counters, since they tend to quickly
exceed 32 bits.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-09 11:52:46 -05:00
Rob Clark
f096096b77 nir/search: fix typo
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-09 12:46:24 -04:00
Tim Rowley
b65f7ec450 gallium: enable intel jitevents profiling
LLVM when configured with "intel jitevents" enabled can inform
VTune about dynamic code, so individual shaders are attributed
profiling data and the resulting assembly can be examined.

Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-05-09 11:25:02 -05:00
Bruce Cherniak
0062c5f09b swr: Add missing break in query switch statement.
Missed a switch break in query stat collection when refactoring queries.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-05-09 11:21:47 -05:00
Rob Clark
f33083a216 freedreno/ir3: allow for additional VS sysval inputs
There are a total of four possible currently, rather than 2.  So we need
to be prepared for the input array to grow by 16 components.  We could
get away with less if we could pack sysval inputs..  and the way this is
handled currently isn't really the nicest thing.  But it's a tactical
fix for an issue hit in:

GL31-CTS.gtf30.GL3Tests.transform_feedback.transform_feedback_vertex_id

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-09 11:51:59 -04:00
Emil Velikov
a0d9279e3b docs: add news item and link release notes for 11.1.4/11.2.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-09 14:28:20 +01:00
Emil Velikov
0c5752b672 docs: add sha256 checksums for 11.2.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-09 14:25:08 +01:00
Emil Velikov
f746aa348e docs: add release notes for 11.2.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-09 14:25:07 +01:00
Emil Velikov
596c881162 docs: add sha256 checksums for 11.1.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-09 14:25:04 +01:00
Emil Velikov
f93d8a885c docs: add release notes for 11.1.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-09 14:25:02 +01:00
Jose Fonseca
c521f2d737 scons: Improve Python module dependency discovery.
Several NIR scripts were using `from ... import ...` syntax, which wasn't
supported.

Using Python standard libary's modulefinder solves the problem with less
effort and hacks.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-09 14:19:24 +01:00
Marek Olšák
172bfdaa9e r300g: add support for PIPE_FORMAT_x8R8G8B8_*
And set endian swap for packed formats the way it should be done
in theory.

This allows big endian to work again, but it can still be buggy.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71789

Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-09 13:11:40 +02:00
Daniel Stone
e54b2e902a Revert "i965: Always use Y-tiled buffers on SKL+"
This commit broke Weston, Mutter, and xf86-video-modesetting, on KMS.

In order to use Y-tiled buffers, the kernel requires the tiling mode to
be explicitly named through the I915_FORMAT_MOD_Y_TILED AddFB2 modifier;
it disallows any attempt to infer the buffer's tiling mode.

As the GBM API does not have a way to extract modifiers for a buffer,
this commit broke all users of GBM on SKL+. Revert it for now, until we
get a way to extract modifier information from GBM, and also let GBM
users inform the implementation that it intends to use the modifiers.

This reverts commit 6a0d036483.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Hans de Goede <hdegoede@redhat.com>
2016-05-09 10:35:55 +01:00
Dave Airlie
920d78a32c mesa/shader_query: add missing subroutines cases
ARRAY_SIZE and LOCATION should accept the SUBROUTINE_UNIFORM types.

Fixes:
GL43-CTS.program_interface_query.subroutines-vertex
GL43-CTS.program_interface_query.subroutines-tess-control
GL43-CTS.program_interface_query.subroutines-tess-eval
GL43-CTS.program_interface_query.subroutines-geometry
GL43-CTS.program_interface_query.subroutines-fragment
GL43-CTS.program_interface_query.subroutines-compute

Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-09 06:30:52 +10:00
Kenneth Graunke
742bc53d04 spirv: Fix structure splitting with per-vertex interface arrays.
We want to use interface_type, not vtn_var->type.  They're normally
equivalent, but for geometry/tessellation per-vertex interface arrays,
we need to unwrap a level.

Otherwise, we tried to iterate a structure members but instead used
an array length.  If the array length was longer than the number of
fields in the structure, we'd crash.

Fixes the CreatePipelineGeometryInputBlockPositive layer validation
test.

v2: Just use glsl_without_array() on the vtn_var type
    (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-07 15:44:41 -07:00
Kenneth Graunke
1896682d27 compiler: Add a C wrapper for glsl_type::without_array().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-05-07 15:44:41 -07:00
Nicolai Hähnle
b9e6e8e7d4 radeonsi: fix undefined behavior (memcpy arguments must be non-NULL)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Nicolai Hähnle
146927ce7b radeonsi: fix some reported undefined left-shifts
One of these is an unsigned bitfield, which I suspect is a false positive, but
gcc 5.3.1 complains about it with -fsanitize=undefined.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Nicolai Hähnle
60d2fc233b gallium/radeon: clean left-shift undefined behavior
Shifting into the sign bit of a signed int is undefined behavior.
Unfortunately, there are potentially many places where this happens using
the register macros.

This commit is the result of running

sed -ie "s/(((\(\w\+\)) & 0x\(\w\+\)) << \(\w\+\))/(((unsigned)(\1) \& 0x\2) << \3)/g"

on all header files in gallium/{r600,radeon,radeonsi}.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Nicolai Hähnle
62b7958cd0 gallium: fix various undefined left shifts into sign bit
Funnily enough, some of these were turned into a compile-time error by gcc
with -fsanitize=undefined ("initializer is not a constant").

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:59 -05:00
Nicolai Hähnle
945c6887ab compiler/glsl: do not downcast list sentinel
This crashes gcc's undefined behaviour sanitizer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:46:58 -05:00
Nicolai Hähnle
bdad1393a0 mesa/main: fix another undefined left shift
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:45:04 -05:00
Nicolai Hähnle
3e1cf8bf3f mesa/main: define _NEW_xxx flags as unsigned shifts
Since 1 << 31 complains about undefined behaviour; the others are changed
only for consistency.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-07 16:44:33 -05:00
Bas Nieuwenhuizen
6291f19f71 radeonsi: Compute correct LDS size for fragment shaders.
No sure where the 36 came from, but we clearly need at least
48 bytes per attribute per primitive.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-06 21:40:17 +02:00
Eric Anholt
a1f698881e vc4: Add support for loading immediate values in QIR.
This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
2016-05-06 10:25:55 -07:00
Eric Anholt
890dc19eeb vc4: Make vc4_qpu_validate() produce more verbose failures.
Seeing the expansion of a QPU_GET_FIELD in an assert isn't very
informative, and it's hard find what's going wrong without getting a dump
of the instruction that failed.
2016-05-06 10:25:55 -07:00
Eric Anholt
8e2d0843c0 vc4: Add a small QIR validate pass.
This has caught a couple of bugs during loop development so far, and I
should probably have written it long ago.
2016-05-06 10:25:55 -07:00
Eric Anholt
daaa9d579d vc4: Fix the src count on exp2/log2.
Found by the upcoming QIR validate pass.
2016-05-06 10:25:55 -07:00
Eric Anholt
d36b28402f vc4: Reuse QPU disasm's cond flags in QIR.
In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
2016-05-06 10:25:55 -07:00
Eric Anholt
419fee92ee vc4: When emitting an instruction to an existing temp, mark it non-SSA.
Prevents a bug in the later control-flow support series.
2016-05-06 10:25:55 -07:00
Eric Anholt
1387e722cd vc4: Make sure that we don't overwrite the signal for PROG_END.
We should have already emitted a NOP due to the last instruction being a
TLB or VPM write.  However, if you disable dead code elimination then you
might get dead code at the end, and that dead code might have the signal
bits set to something non-default, at which point you die in assertion
failure.
2016-05-06 10:25:55 -07:00
Samuel Pitoiset
44de03b0f8 nvc0: unreference images when the context is destroyed
Like other resources, we need to unreference all images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-06 15:15:32 +02:00
Jose Fonseca
8ae78f7d28 nir: Remove spurious return from void function.
Left over from 450c061362.

Trivial.  Built locally with clang and gcc.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95296
2016-05-06 12:03:34 +01:00
Marek Olšák
901f57dff5 radeonsi: set DECOMPRESS_Z_ON_FLUSH if nr_samples >= 4
Vulkan always sets this. It only affects in-place Z decompression.
This is recommended for performance, but what app uses MSAA depth
texturing?

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-06 12:56:47 +02:00
Marek Olšák
4489d75a58 r600g: use the hw MSAA resolving if formats are compatible
This allows resolving RGBA into RGBX.
This should improve HL2 Lost Coast performance.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-06 12:56:47 +02:00
Kenneth Graunke
bd326c229c Revert "i965: Switch to scalar TCS by default."
This reverts commit b593737ed8.

Apparently it causes GPU hangs on some image load store tests.
Let's turn it back off until we figure out why.
2016-05-05 18:03:23 -07:00
Leo Liu
fef0e993a1 st/omx/enc: fix incorrect reference picture order for B frames
Stacking frames is for driver that's capable to do dual instances
encoding. Such feature is not enabled for B frames currently.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-05-05 19:26:43 -04:00
Jason Ekstrand
7bc987abe0 i965/fs: Move handling of samples_identical into the switch statement
This is where we handle texop_texture_samples so it makes things more
consistent.
2016-05-05 16:25:21 -07:00
Jason Ekstrand
3ba228f997 i965/fs: Simplify texture destination fixups
There are a few different fixups that we have to do for texture
destinations that re-arrange channels, fix hardware vs. API mismatches, or
just shrink the result to fit in the NIR destination.  These were all being
done in a somewhat haphazard manner.  This commit replaces all of the
shuffling with a single LOAD_PAYLOAD operation at the end and makes it much
easier to insert fixups between the texture instruction itself and the
LOAD_PAYLOAD.

Shader-db results on Haswell:

   total instructions in shared programs: 6227035 -> 6226669 (-0.01%)
   instructions in affected programs: 19119 -> 18753 (-1.91%)
   helped: 85
   HURT: 0

   total cycles in shared programs: 56491626 -> 56476126 (-0.03%)
   cycles in affected programs: 672420 -> 656920 (-2.31%)
   helped: 92
   HURT: 42
2016-05-05 16:25:21 -07:00
Jason Ekstrand
7de0ae634e i965/fs: stop inclinding glsl/ir.h in brw_fs.h
We are no longer using anything from GLSL IR in the FS backend.
2016-05-05 16:25:21 -07:00
Jason Ekstrand
a815499294 i965/fs: Merge nir_emit_texture and emit_texture
The fs_visitor::emit_texture helper originated when we still had both NIR
and IR visitors for the FS backend.  Since the old visitor was removed,
emit_texture serves no real purpose beyond arbitrarily splitting
heavily-linked code across two functions.
2016-05-05 16:25:21 -07:00
Connor Abbott
4fab8dd5ea nir: remove now-unused nir_foreach_block*_call()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:42 -07:00
Connor Abbott
7c36f9eb52 vc4: fixup for new nir_foreach_block()
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-05 16:19:41 -07:00
Connor Abbott
582815d9ea ir3: fixup for new nir_foreach_block() 2016-05-05 16:19:41 -07:00
Jason Ekstrand
31fc4a2528 nir/lower_double_ops: fixup for new nir_foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Jason Ekstrand
450c061362 nir/lower_double_pack: fixup for new nir_foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Jason Ekstrand
8c807cc2a6 nir/gather_info: fixup for new foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Connor Abbott
331b9f73a2 nir/lower_two_sided_color: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Connor Abbott
d40fbbc27e nir/lower_tex: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Connor Abbott
8a7fe634d2 nir/lower_outputs_to_temporaries: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:41 -07:00
Kenneth Graunke
b593737ed8 i965: Switch to scalar TCS by default.
Normally, we expect SIMD8 shaders to be more instructions than SIMD4x2
shaders, as it takes four instructions to operate on a vec4, rather than
a single instruction.  However, the benefit is that it can process 8
objects per shader thread instead of 2.

Surprisingly, the shader-db statistics show an improvement in both
instruction and cycle counts:

Synmark: -31.25% instructions, -29.27% cycles, 0 hurt.
Tessmark: -36.92% instructions, -37.81% cycles, 0 hurt.
Unigine Heaven: -3.42% instructions, -17.95% cycles, 0 hurt.
Shadow of Mordor:
   +13.24% instructions (26 with fewer instructions, 45 with more),
   -5.23% cycles (44 with fewer cycles, 27 with more cycles).

Presumably, this is because the SIMD8 URB messages are a much more
natural fit than the SIMD4x2 URB messages - there's a ton less header
setup.

I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e,
and the performance seems to be the same or increase ever so slightly
(< 1 FPS difference).  So I believe it's strictly superior.

There's also a lot more optimization potential we can do in scalar mode.

This will also help us finish fp64 support, as scalar support is going
to land much sooner than vec4-mode support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-05 14:24:00 -07:00
Kenneth Graunke
bc0062c54a nir: Optimize out stores of undefs.
There are a couple of cycle count changes in shader-db, but it's
basically a wash.

However, with the Broadwell scalar TCS backend enabled, many
Shadow of Mordor shaders benefit from this patch.  Because we don't
batch up output writes for TCS, vec4 outputs might not have all
components defined.  Many output writes have a value of undef,
which is useless.

With scalar TCS, stats for tessellation shaders on Broadwell:

total instructions in shared programs: 1283000 -> 1280444 (-0.20%)
instructions in affected programs: 34302 -> 31746 (-7.45%)
helped: 71
HURT: 0

total cycles in shared programs: 10798768 -> 10780682 (-0.17%)
cycles in affected programs: 158004 -> 139918 (-11.45%)
helped: 71
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-05 14:24:00 -07:00
Kenneth Graunke
c7a8b32700 nir: Replace vecN(undef, undef, ...) with a single undef.
shader-db statistics on Broadwell:

total instructions in shared programs: 8963409 -> 8962455 (-0.01%)
instructions in affected programs: 60858 -> 59904 (-1.57%)
helped: 318
HURT: 0

total cycles in shared programs: 71408022 -> 71406276 (-0.00%)
cycles in affected programs: 398416 -> 396670 (-0.44%)
helped: 199
HURT: 51

GAINED: 1

The only shaders affected were in Dota 2 Reborn.

It also sets up for the next optimization.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-05 14:24:00 -07:00
Kenneth Graunke
49ea7454a1 nir: Rename opt_undef_alu to opt_undef_csel; update comments.
This better reflects what it does.  I plan to add other ALU
optimizations as well, so the old name would be confusing.

In preparation for that, also move the file comments about csels
above the opt_undef_csel function, and delete the ones about there
not being other optimizations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-05 14:24:00 -07:00
Kenneth Graunke
a808ba5965 i965: Rework passthrough TCS checks.
According to Timothy, using program_string_id == 0 to identify the
passthrough TCS is going to be problematic for his shader cache work.

So, change it to strcmp() the name at visitor creation time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-05 14:24:00 -07:00
Tim Rowley
ff8c0c9a35 swr: [rasterizer core] Faster modulo operator in ProcessVerts
Avoid % operator, since we know that curVertex is always incrementing.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:50:11 -05:00
Tim Rowley
2be7c3e780 swr: [rasterizer] Small warning cleanup
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:50:03 -05:00
Tim Rowley
b39c530f88 swr: [rasterizer] Add SWR_ASSUME / SWR_ASSUME_ASSERT macros
Fix static code analysis errors found by coverity on Linux

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:56 -05:00
Tim Rowley
db084f48eb swr: [rasterizer] Miscellaneous backend changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:48 -05:00
Tim Rowley
3951a2109e swr: [rasterizer] Add support for X24_TYPELESS_G8_UINT format
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:42 -05:00
Tim Rowley
909aee07f8 swr: [rasterizer jitter] Fix printing bugs for tracing.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:29 -05:00
Tim Rowley
bc084e6b3d swr: [rasterizer memory] Add missing store tiles function
Storing color hot tile to 8bit w-major stencil format.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:23 -05:00
Tim Rowley
5332c9d931 swr: [rasterizer jitter] Add asserts for supported formats in fetch shader
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:18 -05:00
Tim Rowley
6e89227054 swr: [rasterizer core] Fix thread allocation
Fix windows in 32-bit mode when hyperthreading is disabled on Xeons.

Some support for asymmetric processor topologies.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:11 -05:00
Tim Rowley
c2f5d2daa8 swr: [rasterizer core] Fix threadviz support in buckets
Need to do lazy eval of the threadviz knob since order of globals
is undefined.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:49:04 -05:00
Tim Rowley
1eb211c4a4 swr: [rasterizer] Whitespace cleanup and misc changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-05-05 14:48:55 -05:00
Nicolai Hähnle
d97e333ea4 radeonsi: mark descriptor loads as using dynamically uniform indices
This tells LLVM to always use SMEM loads for descriptors. It fixes a
regression in piglit's arb_shader_storage_buffer_object/execution/indirect.shader_test
that was caused by LLVM r268259 (but the proper fix is really here in Mesa).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-05 12:21:40 -05:00
Matt Turner
f01d92f473 i965/fs: Don't follow pow with an instruction with two dest regs.
Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on
Gen >= 8 platforms.

Evidently that commit allowed the scheduler to make different choices
that somehow finally ran afoul of a hardware bug in which POW and FDIV
instructions may not be followed by an instruction with two destination
registers (including compressed instructions). I presume the conditions
are more complex than that, but the internal hardware bug report (BDWGFX
bug_de 1696294) does not contain much more information.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Tested-by:  Mark Janes <mark.a.janes@intel.com> [v1]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-05-05 10:18:28 -07:00
Bruce Cherniak
9d86a5eea7 swr: Remove stall waiting for core query counters.
When gathering query results, swr_gather_stats was
unnecessarily stalling the entire pipeline.  Results are now
collected asynchronously, with a fence marking completion.

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2016-05-05 10:50:09 -05:00
Dave Airlie
76a36ac3ea mesa/ubo: add missing compute cases for ubo/atomic buffers
This fixes: GL43-CTS.compute_shader.resource-ubo

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-05 20:29:02 +10:00
Dave Airlie
2dd3fc3cac mesa/compute: drop pointless casts.
We already are a GLintptr, casting won't help.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-05 20:28:41 +10:00
Thomas Hindoe Paaboel Andersen
76a423efe0 mesa: remove null check before free
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-05-05 09:50:38 +02:00
Thomas Hindoe Paaboel Andersen
3a6763f0a0 freedreno: remove null check before free
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-05-05 09:34:01 +02:00
Thomas Hindoe Paaboel Andersen
8698194313 nir: fix assert for wildcard pairs
The assert was null checking dest_arr_parent twice. The intention
seems to be to check both dest_ and src_.

Added in d3636da9

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-05-05 09:33:02 +02:00
Brian Paul
be5010c4b8 glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml
The function returns GLuint, not GLfloat values.

v2: also fix the OES function

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-04 14:49:39 -06:00
Brian Paul
54d203a319 mesa: include texture format in glGenerateMipmap error message
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-04 14:49:39 -06:00
Brian Paul
a62f031bc3 main: uses casts to silence some _mesa_debug() format warnings
Silences warnings with 32-bit Linux gcc builds and MinGW which doesn't
recognize the ‘t’ conversion character.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-05-04 14:49:39 -06:00
Jordan Justen
51300a0387 docs: Mark GL_ARB_query_buffer_object as done for i965/hsw+
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 11:23:17 -07:00
Jordan Justen
f00c399bae i965: Implement ARB_query_buffer_object for HSW+
v2:
 * Declare loop index variable at loop site (idr)
 * Make arrays of MI_MATH instructions 'static const' (idr)
 * Remove commented debug code (idr)
 * Updated comment in set_query_availability (Ken)
 * Replace switch with if/else in hsw_result_to_gpr0 (Ken)
 * Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on
   hsw and gen8 (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-04 11:23:17 -07:00
Jordan Justen
357ff91359 i965/gen6+: Add load register immediate helper functions
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:17 -07:00
Jordan Justen
959e1e9e66 i965/hsw+: Add support for copying a register
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:17 -07:00
Jordan Justen
aad14a22cb i965/gen6+: Add support for storing immediate data into a buffer
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:17 -07:00
Jordan Justen
ac0bbf9ef3 i965: Add MI_MATH reg defs for HSW+
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:17 -07:00
Jordan Justen
9f581f8f24 i965: Add brw_store_register_mem32
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:17 -07:00
Jordan Justen
c54e5c2fb2 i965: Use offset instead of index in brw_store_register_mem64
This matches the byte based offset of brw_load_register_mem*.

The function is also moved into intel_batchbuffer.c like
brw_load_register_mem*.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 11:23:10 -07:00
Jan Vesely
77959ce07b r600,compute: create vtx buffer for text + rodata
Reserve buffer id 2

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-05-04 13:09:18 -04:00
Rob Clark
2e117a7649 freedreno: allow ctx->draw_vbo to fail
Pretty much only happens if shader variant compile fails.  But in this
case, if we haven't emitted cmdstream, we don't want to set needs_flush.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
291ac872a4 freedreno: move shader-stage dirty bits to global dirty flag
This was always a bit overly complicated, and had some issues (like
ctx->prog.dirty not getting reset at the end of the batch).  It also
required some special hacks to avoid resetting dirty state on binning
pass.  So just move it all into ctx->dirty (leaving some free bits
for future shader stages), and make FD_DIRTY_PROG just be the union
of all FD_SHADER_DIRTY_*.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
a48cccacf3 freedreno/a4xx: fix bogus offset for f32x24s8 stencil restore
fixes: $piglit/bin/fbo-clear-formats GL_ARB_depth_buffer_float

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
e7c64041e9 freedreno: add some debug_asserts() to catch insane offsets
Ofc won't catch *all* faults, but at least helpful for catching offsets
which are completely bogus.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
1f2bc64f31 freedreno/a4xx: deal with VS which do not write position
Fixes $piglit/bin/glsl-1.40-tf-no-position

a3xx may need similar?

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
a6ad30202c freedreno/ir3: remove a couple redundant is_flow()s
Now that the opc's encode the instruction category (making them unique)
we no longer need to check the category in addition to the opc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
f0a1f3de27 freedreno/ir3: cp small negative integers too
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
1f04d4bf59 freedreno/ir3: fix # of registers
The instruction encoding allows for more registers, but at least on
a3xx/a4xx they don't actually exist.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
173871dfb9 freedreno/ir3: lower immeds to const
Helps reduce register pressure and instruction counts for immediates
that would otherwise require a mov into gpr.

total instructions in shared programs:          4455332 -> 4369297 (-1.93%)
total dwords in shared programs:                8807872 -> 8614432 (-2.20%)
total full registers used in shared programs:   263062 -> 250846 (-4.64%)
total half registers used in shader programs:   9845 -> 9845 (0.00%)
total const registers used in shared programs:  1029735 -> 1466993 (42.46%)

                 half       full      const      instr     dwords
    helped           0       10415           0       17861        5912
      hurt           0        1157       21458         947          33

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
b15c7fc268 freedreno/ir3: add ir3_cp_ctx
Needed in next commit.. just split out to reduce noise.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark
b9985e5bde add REVIEWERS and get_reviewer.pl script
Copied from linux kernel (where it is called MAINTAINERS and
get_maintainer.pl), with minimal changes to script (to recognize
mesa src tree rather than linux kernel src tree, and to avoid
accidentaly CC'ing Linus Torvalds on mesa patches), and slimmed
down MAINTAINER file syntax to recognize that we don't really have
subsystem "maintainers" in the same sense as the linux kernel (ie. no
different mailing lists and git trees per subsystem).

The main point is to automate slapping on the correct CC's for patches
via git's --cc-cmd feature, more than anything else.

I didn't attempt to fully populate the REVIEWERS file, by a long shot.
This is an opt-in system and anyone else can add their own entries.

To utilize:

  git send-email --cc-cmd ./scripts/get_reviewer.pl ...

or to configure it to be the default:

  git config sendemail.cccmd ./scripts/get_reviewer.pl

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:46 -04:00
Ilia Mirkin
38fcf7cbad nouveau/video: properly detect the decoder class for availability checks
The kernel is now more strict with the class ids it exposes, so we need
to check the G98 and MCP89 classes as well as the GT215 class. This
effectively caused us to decide there were no decoding capabilities on
newer kernel for VP3 chips.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95251
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-05-04 10:45:07 -04:00
Kenneth Graunke
0332963d19 i965: Delete stale perf_debug().
MOCS for 3DSTATE_SO_BUFFER has existed for ages.
2016-05-04 02:29:03 -07:00
Kenneth Graunke
3a886721ed i965: Silence unused variable warning
I added this when deleting some unnecessary code in a rebase.
2016-05-04 00:46:31 -07:00
Juan A. Suarez Romero
97989059b9 mesa/main: handle double uniform matrices properly
When computing the offset in the uniform storage table, take into account
the size multiplier so double precision matrices are handled correctly.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-04 08:08:12 +02:00
Samuel Iglesias Gonsálvez
2ab2d2e588 nir: Separate 32 and 64-bit fmod lowering
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Samuel Iglesias Gonsálvez
b902377a56 nir/lower_double_ops: lower mod()
There are rounding errors with the division in i965 that affect
the mod(x,y) result when x = N * y. Instead of returning '0' it
was returning 'y'.

This lowering pass fixes those cases.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Matt Turner
9f81434c5f i965: Define GEN_GE/GEN_LE macros in terms of GEN_LT.
GEN_LT has a straightforward implementation on which we can build the
GEN_GE and GEN_LE macros.

Suggested-by:  Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:34:01 -07:00
Matt Turner
affaae197f i965: Add disassembler support for remaining opcodes.
For opcodes that changed meaning on different generations, we store a
pointer to a secondary table and the table's size in a tagged union in
place of the mnemonic and number of sources.

Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:34:00 -07:00
Matt Turner
b89b0a03f2 i965: Make opcode_descs and gen_from_devinfo() static.
The previous commit replaced direct uses of opcode_descs with calls to
the wrapper function, which should be the only method of accessing
opcode_descs's data. As a result gen_from_devinfo() can also be made
static.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:34:00 -07:00
Matt Turner
0ff4912cf4 i965: Actually check whether the opcode is supported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:33:59 -07:00
Matt Turner
667408b889 i965: Merge inst_info and opcode_desc tables.
I merged opcode_desc into inst_info (instead of the other way around)
because inst_info was sorted by opcode number.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:33:42 -07:00
Matt Turner
d01596613b i965: Move inst_info from brw_eu_validate.c to brw_eu.c.
Drop the uses of 'enum gen' to a plain int, so that we don't have to
expose the bitfield definitions and GEN_GE/GEN_LE macros to other users
of brw_eu.h. As a result, s/.gen/.gens/ to avoid confusion with
devinfo->gen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 22:33:42 -07:00
Francisco Jerez
1530e27534 i965/disasm: Wrap opcode_desc look-up in a function.
The function takes a device info struct as argument in addition to the
opcode number in order to disambiguate between multiple opcode_desc
entries for different instructions with the same opcode number.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]

[v2] mattst88: Put brw_opcode_desc() in brw_eu.c instead of moving it
               there in a later patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
[v3] mattst88: Return NULL if opcode >= ARRAY_SIZE(opcode_descs)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-03 22:32:40 -07:00
Francisco Jerez
1cc7573162 i965: Pass devinfo pointer to is_3src() helpers.
This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future.  In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 18:06:21 -07:00
Francisco Jerez
c55dc77ab1 i965: Pass devinfo pointer to brw_instruction_name().
A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations.  In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-03 18:06:21 -07:00
Kenneth Graunke
7d9143ad88 i965: Write a scalar TCS backend that runs in SINGLE_PATCH mode.
Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask.  One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.

Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhemingly the common case) can be handled by a single
thread.  This has several intriguing properties:

- Accessing input arrays with gl_InvocationID as the index is a simple
  SIMD8 URB read with g1 as the header.  No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
  concurrency concerns are gone.  (We don't do this yet, though.)

v2: Drop first_non_payload_grf change, as it was always adding 0
    (caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-03 16:28:00 -07:00
Kenneth Graunke
75881bed9e i965: Rework the TCS passthrough shader to use NIR.
I'm about to implement a scalar TCS backend, and I'd rather not
duplicate all of this code there.

One change is that we now write the tessellation levels from all
TCS threads, rather than just the first.  This is pretty harmless,
and was easier.  The IF/ENDIF needed for that are gone; otherwise
the generated code is basically identical.

I chose to emit load/store intrinsics directly because it was easier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-03 16:27:52 -07:00
Brian Paul
ef5a31fc06 gallium/util: change assertion to conditional in util_bitmask_destroy()
If we fail to create a context in the VMware driver we call this function
unconditionally to free a bunch of bit vectors.  Instead of asserting on
a null pointer, just no-op.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-03 15:40:49 -06:00
Brian Paul
68116dcd5a cso: null-out previously bound sampler states
If, for example, we previously had 2 sampler states bound and now we
are binding one, we'd leave the second sampler state unchanged.
This change nulls-out the second sampler state in this situation.
We're already doing the same thing for sampler views.

This silences an occasional warning issued by the VMware driver when
the number of sampler views and sampler states disagreed.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-03 15:40:49 -06:00
Brian Paul
05abaa65c7 svga: try to flag surfaces for sampling, in addition to rendering
This silences some warnings when we try to sample from surfaces that were
created for drawing, such as when blitting from one of the framebuffer
surfaces.  We were already doing the opposite situation (adding a bind
flag for rendering to surfaces declared as texture sources).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
abc6432d54 svga: fix copying non-zero layers of 1D array textures
Like cube maps, we need to convert the z information to a layer index.
Also rename the *_face vars to *_face_layer to make things a little more
understandable.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
b94f73c150 svga: clean up svga_pipe_blit.c
Remove dead code.  Fix formatting.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
8842be1132 rbug: s/Elements/ARRAY_SIZE/
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
7f641916bf freedreno: s/Elements/ARRAY_SIZE/
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-05-03 15:40:48 -06:00
Brian Paul
b91975714d trace: s/Elements/ARRAY_SIZE/
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
e193c5dd59 ilo: s/Elements/ARRAY_SIZE/
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-03 15:40:48 -06:00
Brian Paul
951bf8b4a6 i915g: s/Elements/ARRAY_SIZE/
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-03 15:40:48 -06:00
Samuel Pitoiset
5658ddc7fe nvc0: compute a percentage for metric-achieved_occupancy
metric-issue_slot_utilization and metric-branch_efficiency are already
computed as percentages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-03 23:18:50 +02:00
Samuel Pitoiset
10ec27760a nvc0: display some performance metrics with a percentage
This makes more sense for them.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-03 23:18:50 +02:00
Samuel Pitoiset
64937615a0 nvc0: store the driver query type for performance metrics
This will allow to use percentages for some metrics because the Gallium
HUD doesn't allow to display floating point numbers and 0 is printed
instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-03 23:18:50 +02:00
Samuel Pitoiset
a9bc3211f5 nvc0: fix exposing of metric-issue_slots for SM21/SM30
This is most likely a copy-paste error when I reworked this area few
weeks ago. For SM20, metric-issue_slots is equal to inst_issued because
there is only one pipeline, so the metric is not exposed there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
2016-05-03 23:18:50 +02:00
Mark Janes
0af8a7d50c mesa/objectlabel: handle NULL src string
This prevents a crash when a NULL src is passed with a non-NULL length.

fixes: dEQP-GLES31.functional.debug.object_labels.query_length_only
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95252

Signed-off-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-05-03 14:07:31 -07:00
Dave Airlie
265fe9dce8 glsl: subroutine types cannot be used in constructors.
This fixes two of the cases in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-04 06:44:45 +10:00
Dave Airlie
3110a0aa23 glsl: resource is a reserved keyword in GLSL 4.20 as well
resource just appears in GLSL 4.20 without any fanfare.

Fixes GL43-CTX.CommonBugs.CommonBug_ReservedNames

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-04 06:44:45 +10:00
Jan Vesely
ebbe31d57c gallium,utils: Fix trivial sign compare warnings
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-03 12:00:09 -04:00
Knut Andre Tidemann
c68a9cdaac anv: fix hang during generation of dev_icd.json.
Fixes: b370ec7c76 ("anv: tweak the %.json rule")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-03 11:42:47 +01:00
Anuj Phogat
883f3662db swrast: Add texfetch_funcs entries for astc 3d formats
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
63432eb370 mesa: Enable translation between astc 3d gl formats and mesa formats
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
54cac7ad96 mesa: Handle astc 3d formats in _mesa_get_compressed_formats()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
dcfea1d7eb mesa: Handle astc 3d formats in _mesa_base_tex_format()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
cf85ef1618 mesa: Account for astc 3d formats in _mesa_is_astc_format()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
38cd8145a8 mesa: Add a helper function is_astc_3d_format()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
72dfe0242d mesa: Add the missing defines for GL_OES_texture_compression_astc
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
57451e0fc1 mesa: Align the values of #define's in glheader.h
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
0306110fa9 mesa: Add OES_texture_compression_astc to extension table and gl_extensions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
059f36c671 mesa: Add entries for astc 3d formats initializing struct gl_format_info
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
705216dbed mesa: Add mesa formats for astc 3d formats
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
24bb6ee8b6 glapi: Update dispatch XML files for OES_texture_compression_astc.xml
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
63a7a9d115 mesa: Account for block depth in _mesa_format_image_size()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
87bf66daa9 mesa: Handle 3d block sizes in _mesa_compute_compressed_pixelstore
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
84a44844f2 mesa: Handle 3d block sizes in teximage error checks
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
ec60b3da69 mesa: Handle 3d block sizes in getteximage error checks
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:18 -07:00
Anuj Phogat
5713461ae7 mesa: Add an assert for BlockDepth in _mesa_get_format_block_size()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:17 -07:00
Anuj Phogat
9163c37349 mesa: Add a helper function to query 3D block sizes
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:17 -07:00
Anuj Phogat
6abb1b4984 mesa: Add block depth field in struct gl_format_info
This will be later required for 3D ASTC formats.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-03 03:43:17 -07:00
Dave Airlie
c4a0cd4662 mesa/copyimage: make sure number of samples match.
This fixes
GL43-CTS.copy_image.samples_missmatch
which otherwise asserts in the radeonsi driver.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-03 20:13:29 +10:00
Dave Airlie
5989a2937f mesa/objectlabel: don't do memcpy if bufSize is 0 (v2)
This prevents GL43-CTS.khr_debug.labels_non_debug from
memcpying all over the stack and crashing.

v2: actually fix the test.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-03 20:12:59 +10:00
Dave Airlie
30823f997b mesa/textureview: move error checks up higher
GL43-CTS.texture_view.errors checks for GL_INVALID_VALUE
here but we catch these problems in the dimensionsOK check
and return the wrong error value.

This fixes:
GL43-CTS.texture_view.errors.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-03 20:12:52 +10:00
Marek Olšák
5541e11b9a gallium/radeon: remove stencil_tile_split from metadata
this is a leftover from the days when depth-stencil buffers were
allocated by the DDX

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
20a77397fa gallium/radeon: remove tile_mode_array_valid flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
c8aac4fc0d winsys/amdgpu: pass PIPE_CONFIG to addrlib on texture import
This hasn't been needed, but I think we should set it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
dc970c4f4e winsys/amdgpu: read NUM_BANKS from buffer metadata
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
02f90cef7d radeonsi: remove unused tile mode getters
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
b9e3e87069 radeonsi: just read tile mode arrays in SDMA setup
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
0c2cba1ec6 radeonsi: just read tile mode arrays in SI DMA setup
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
c3ca54aee9 radeonsi: just read tile mode arrays in DB setup
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
ef45825708 gallium/radeon: add radeon_surf::macro_tile_index
for indexing cik_macrotile_mode_array

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
ed4fd542de winsys/radeon: drop support for kernels lacking tile mode array queries
This will allow us to simplify a lot of code around tiling.

Kernel 3.10 is required for SI support.
Kernel 3.13 is required for CIK support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
3d956b4bc0 st/mesa: fix blit-based GetTexImage for non-finalized textures
This fixes getteximage-depth piglit failures on radeonsi.

Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
77af6bcc26 winsys/radeon: count buffer size only once
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
3e3c43418e winsys/amdgpu: count buffer size only once
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
f98ba4123c winsys/amdgpu: loosen up requirements for how much memory IBs can use
ported from winsys/radeon.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
9ec00c23c2 radeonsi: when parsing dmesg, skip empty lines
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Marek Olšák
9983efca76 radeonsi: use the hw MSAA resolving if formats are compatible
This allows resolving RGBA into RGBX.
This should improve HL2 Lost Coast performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-02 22:49:25 +02:00
Samuel Pitoiset
819836d240 nv50,nvc0: re-bind old compute state after reading MP perf counters
This might be useful to avoid breaking the current compute state when
monitoring MP perf counters because we use a compute kernel to read out
those counters. This has been initially suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-02 22:30:48 +02:00
Rob Clark
dcf8c4425a nir: make lower_clamp_color pass work after lower i/o
Kinda important to work with tgsi_to_nir, which generates nir which
already has i/o lowered.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-02 14:25:38 -04:00
Eric Anholt
226bd92945 vc4: Use NIR lowering for sRGB decode.
This should get us the same decode code generated, but with a lot less
custom code in the driver.
2016-05-02 11:06:29 -07:00
Eric Anholt
4b326341f3 vc4: Just use NIR lowering for texture projection.
This means doing Newton-Raphson on the RCP, but it's probably actually a
good thing to be accurate on.
2016-05-02 11:06:29 -07:00
Eric Anholt
2f98bc100d vc4: Scalarize phi nodes as well.
This makes fewer programs with loops assertion fail, replacing them with
the rendering failure warning.
2016-05-02 11:06:29 -07:00
Eric Anholt
4a2ad8500d vc4: Add whitespace after each program stage dump.
In particular it's been hard to find the point where we switch from
dumping pre-optimization QIR and post-optimization QIR.
2016-05-02 11:06:29 -07:00
Eric Anholt
84322b2f31 vc4: Remove the CSE pass.
It's not doing anything according to shader-db now that we're using NIR.
It would have had to be reworked significantly anyway, to handle control
flow.
2016-05-02 11:06:29 -07:00
Eric Anholt
b145b731ab vc4: Emit only one FRAG_Z or FRAG_W QIR opcode.
We were generating piles of FRAG_W for interpolation, only to CSE them
away immediately.  Since this is the only thing that CSE is doing for us
any more, just avoid making the CSE work necessary.
2016-05-02 11:06:29 -07:00
Eric Anholt
e138716d8d vc4: Use the NIR cubemap normalization instead of our own.
This is one of two uses of the current QIR CSE pass according to
shader-db.  The NIR pass means that we'll end up doing Newton-Raphson on
our RCP, which we weren't doing before, but that's probably actually a
good thing.
2016-05-02 11:06:29 -07:00
Eric Anholt
3bee7581e6 vc4: Drop the support for DCE of texture instructions.
Now that we're using NIR for our optimization, there's no need for this
tricky code.
2016-05-02 11:06:29 -07:00
Nicolai Hähnle
155ce49603 radeonsi: fix PIPE_FORMAT_R11G11B10_FLOAT handling
That format has first_non_void < 0. This fixes a regression in piglit
arb_shader_image_load_store-semantics that was introduced by commit 76b8c5cc60,
while hopefully still shutting Coverity up (and failing in a more obvious way
if a similar error should re-appear).

Reviewed-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-02 11:38:23 -05:00
Nicolai Hähnle
169ace5636 radeonsi: correct NULL-pointer check in si_upload_const_buffer
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-02 11:37:55 -05:00
Dave Airlie
cf6dadb00b softpipe: bump 3D texture limit to 2048
The GL4.1 spec bumps this to 2048, so we should do so.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-02 07:29:02 +10:00
Dave Airlie
277170eeea softpipe: allow r32 xchg on shader images.
This is part of OES_shader_image_atomic.txt.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-02 07:28:58 +10:00
Ilia Mirkin
3950aa47df softpipe: avoid leaking local_mem on machines alloc failure
Spotted by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-05-01 11:19:08 -04:00
Ilia Mirkin
ad545d179b vbo: avoid leaking prim on vbo bind failure
Spotted by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-05-01 11:19:08 -04:00
Edward O'Callaghan
23cf24e227 mapi/glapi: Fix dup word typo in glapi_getproc.c
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-01 16:07:29 +02:00
Emil Velikov
44f921091a isl: automake: don't explicitly EXTRA_DIST the tests folder
The file(s) within are already picked thanks to the build rule of the
respective test. No need to have the folder in EXTRA_DIST.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 14:17:30 +01:00
Timothy Arceri
f982e2434b mesa: add LOCATION_COMPONENT support to GetProgramResourceiv
From Section 7.3.1.1 (Naming Active Resources) of the OpenGL 4.5 spec:

   "For the property LOCATION_COMPONENT, a single integer indicating the first
   component of the location assigned to an active input or output variable is
   written to params. For input and output variables with a component specified
   by a layout qualifier, the specified component is written. For all other
   input and output variables, the value zero is written."

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-05-01 23:13:36 +10:00
Timothy Arceri
b1c872a81e glsl: add component to has_layout() helper
I don't think this will do much as it's a compiler error
to use component without location which is already in the
table but its good to be consistent.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-05-01 23:13:28 +10:00
Timothy Arceri
589053dac7 glsl: validate linking of intrastage component qualifiers
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-05-01 23:13:22 +10:00
Timothy Arceri
0317dfcd9b glsl: update explicit location matching to support component qualifier
This is needed so we don't optimise away the varying when more than
one shares the same location.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-01 23:13:15 +10:00
Timothy Arceri
0d88b15f07 glsl: cross validate varyings with a component qualifier
This change checks for component overlap, including handling overlap of
locations and components by doubles. Previously there was no validation
for assigning explicit locations to a location used by the second half
of a double.

V3: simplify handling of doubles and fix double component aliasing
detection

V2: fix component matching for matricies

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-01 23:13:10 +10:00
Timothy Arceri
94438578d2 glsl: validate and store component layout qualifier in GLSL IR
We make use of the existing IR field location_frac used for tracking
component locations.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-01 23:13:05 +10:00
Timothy Arceri
2d9936a686 glsl: allow component qualifier on varying inputs
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-05-01 23:13:00 +10:00
Timothy Arceri
daa8df590b glsl: parse component layout qualifier
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-01 23:12:52 +10:00
WuZhen
ea4c1afd05 android: enable dlopen() on all architectures
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:29 +01:00
Jose Fonseca
5649d6ab06 winsys/sw/xlib: use correct free function for xlib_dt->data
Analogous to previous commit.

Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:29 +01:00
WuZhen
4f21f3f2e8 winsys/sw/dri: use correct free function for dri_sw_dt->data
align_malloc() is used to allocate dri_sw_dt->data, thus we should not
be using FREE() but align_free().

Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
[Emil Velikov: tweak commit summary/shortlog]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-01 12:31:29 +01:00
WuZhen
798f7a8596 tgsi: initialize stack allocated struct
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:29 +01:00
Emil Velikov
fb653641ea egl: android: do not feed invalid fourcc/pitch into the dri module
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:29 +01:00
Rob Herring
34ddef39ce egl: android: add dma-buf fd support
Add support for creating images from Android native buffers with dma-buf
fd. As dma-buf support also requires DRI image loader extension, add
that as well.

This is based on several originally patches written by Varad Gautam.
I've collapsed them into logical changes and done a bit of reformatting.
Using dma-bufs vs. GEM handles is now a runtime decision similar to the
wayland EGL instead of being compile time selection. The dma-buf support
is also re-written to use common dri2_create_image_dma_buf function in
egl_dri2.c.

Cc: Varad Gautam <varadgautam@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:29 +01:00
Rob Herring
81a6fff4c5 egl: android: factor out back buffer handling code
In preparation to use the same code for dma-bufs, factor out the code to a
separate function.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:28 +01:00
Rob Herring
dfaccf25f5 egl: android: factor out format conversion code to a function
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:28 +01:00
Rob Herring
d45884ef05 egl: android: disable __DRI_DRI2_LOADER support on render nodes
Use of __DRI_DRI2_LOADER extension is only supported for card nodes. In
order to support dmabufs, Android will be moving to using render nodes and
we need to disable the DRI2 loader extension.

This is based on the Wayland EGL code.

Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:28 +01:00
Rob Herring
dbbf7a8e61 Android: fix build ordering of subdirectories
Different versions of make behave differently in whether $(wildcard) sorts
the results or not. The Android build now explicitly sorts
all-named-subdir-makefiles which breaks the build because src/gallium
must be included after src/mesa/drivers/dri.

The Android build system doesn't support doing "include $(call
all-named-subdir-makefiles,...)" twice, so rework things by generating
the included makefile list and including them in 2 steps.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-01 12:31:28 +01:00
Jamey Sharp
595d56cc86 glShaderSource must not change compile status.
OpenGL 4.5 Core Profile section 7.1, in the documentation for
CompileShader, says: "Changing the source code of a shader object with
ShaderSource does not change its compile status or the compiled shader
code."

According to Karol Herbst, the game "Divinity: Original Sin - Enhanced
Edition" depends on this odd quirk of the spec.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-05-01 18:46:24 +10:00
Emil Velikov
9fa2e57a73 gallium/radeon: nuke the final pre LLVM 3.6 codepath
Missed with commit 100796c15c "gallium/radeon: drop support for LLVM
3.5"

v2: s/LLVN/LLVM/ in shortlog (Nicolai)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-01 08:57:32 +01:00
Emil Velikov
7336df06ed anv: include the files in the tarball
Namely the python script, the ICD header and private headers. We could
get the system version of the ICD ones, although there is no .pc file to
easily locate and/or manage them.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:46 +01:00
Emil Velikov
9e09507516 i965: don't forget to ship brw_nir_trig_workarounds.py
Otherwise we won't be able to regenerate the source file(s).

Signed-off-by:    Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:46 +01:00
Emil Velikov
1f04caa09c isl: include all the files in the tarball
Add the missing header(s), generation scripts, README ...

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:34 +01:00
Emil Velikov
cee69ccb92 spirv: automake: add missing headers to the tarball.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:06 +01:00
Emil Velikov
dc38e6b169 automake: wire up the intel vulkan driver to make distcheck
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:06 +01:00
Emil Velikov
dfbf1289a4 anv: update .gitignore
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
fcdcb829d8 anv: automake: remove no longer needed include
Thanks to last commit we can nuke it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
3285461ceb anv: automake: tweak anv_entrypoint.[ch] rule
Rather than using cat + cpp feed the file(s) directly into the latter.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:05 +01:00
Emil Velikov
bc7802098e anv: tweak libvulkan_intel.so link libraries
i.e do not use -lfoo directly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
9f235adf99 anv: cosmetic makefile changes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:05 +01:00
Emil Velikov
446234033d anv: place the builddir includes before the srcdir ones
Otherwise we risk picking the possibly outdated file in the source dir
over the fresh one in the builddir.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:05 +01:00
Emil Velikov
6cb814727d automake: tweak SUBDIR reorder and comment it
It should ease people with all the interaction and platforms and how
they interact (at least from a build POV) with each other.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:05 +01:00
Emil Velikov
4fcf0ba113 configure.ac: remove unused HAVE_EGL_PLATFORM_NULL conditional
Afaict the last user was based on st/egl.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
9f3588eb37 automake: drop "EGL_" from HAVE_EGL_PLATFORM_WAYLAND
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
5459db91e3 automake: drop "EGL_" from HAVE_EGL_PLATFORM_X11
The variable covers more than just EGL, let's try to untangle the
confusion it brings.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:05 +01:00
Emil Velikov
a56009d089 anv: get rid of VULKAN_ENTRYPOINT_CPPFLAGS variable
Add the missing include to AM_CPPFLAGS and use it throughout the
makefile.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:05 +01:00
Emil Velikov
6dc169e18f anv: factor out the X11/XCB build
Similar to earlier commit - move all the common bits into a single
place, thus improving readability and allowing us to see what's missing.

Also don't forget to add the missing bits. This commit should allows us
to build wayland only vulkan ;-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
cbc4837b83 anv: kill of custom define HAVE_WAYLAND_PLATFORM
Vulkan API already has equivalent, so simplify things as just use it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
9bc99f5668 anv: refactor wayland build handling
Rather than having things split out in multiple places, consolidate it
and add all the missing bits. Also ensure that we use the already built
static library libwayland-drm.la.

v2 Add missing '\' in the CFLAGS.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2016-05-01 08:38:04 +01:00
Emil Velikov
3a2d09dd65 automake: include vulkan subdir after wayland-drm
We'll reuse the existing wayland-drm static library with next commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:04 +01:00
Emil Velikov
fe918556a2 anv: use a common variable to manage the library dependencies
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
82d0b59f02 anv: use the GENERATED_FILES variable
... rather than having duplicates files through the sources lists.

Splitting things as is, has the side effect of making things clearer and
easing a potential android build. The latter of which automatically adds
BUILT_SOURCES to the binary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
3ee7d8b0eb anv: fold the tests' makefile
Recent commit removed the winsys defines from anv_private.h thus
breaking the tests. To fix that and avoid it in the future, merge the
tests makefile in the libvulkan one.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
f3cb0dcae1 anv: build the core vulkan only once
Introduce a static library libvulkan_common.la that is used by
libvukan_intel.la and libvulkan_test.la.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
21800d77ff anv: kill off custom CFLAGS
AM_CFLAGS already does all that we need.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
623cb3a598 anv: add missing link against the math library
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
e98cf60446 anv: split sources lists to Makefile.sources
Will allow others to reuse the lists (scons/android anyone ?) and makes
the file a lot shorter and easier to read.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
0d3e7b17c9 anv: remove custom rule to install the intel_icd.json
Autoconf already does the exact same thing as the manually written rule.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94969
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:04 +01:00
Emil Velikov
30e6f68b3b anv: tweak the LDFLAGS
Copy/paste from the rest of mesa, but namely.
 - The module should be shared only.
 - We don't need the explicit ".so", as the vulkan loader will retrieve
the full filename from the json
 - No unresolved symbols in the final binary
 - Use the linker garbage collector to slim down the final binary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:03 +01:00
Emil Velikov
b370ec7c76 anv: tweak the %.json rule
It's used only by dev_icd.json so just call it that way. While we're
here, manually expand $< (as it might cause issue on some systems)
and drop the unneeded install_libdir substitution.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:03 +01:00
Emil Velikov
abd360ab75 anv: add a comment about dev_icd.json
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:38:03 +01:00
Emil Velikov
44978a91ff genxml: ship all the files needed in the tarball
v2: The xml files are not called "gen*_pack.xml" (Jason)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:38:03 +01:00
Emil Velikov
3f23a0f8c1 anv: remove description about GENX_FUNC macro
The macro has been gone since commit 1f1cf6fcb0 "anv: Get rid of
GENX_FUNC"

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-01 08:37:25 +01:00
Emil Velikov
0700cdd5aa gallium/target-helpers: remove inline_wrapper_sw_helper.h
Unused as of commit dddedbec0e "{st,targets}/nine: use static/dynamic
pipe-loader"

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:37:25 +01:00
Mark Kettenis
b8e59292e6 egl/x11: resolve "initialization from incompatible pointer type" warning
With earlier commit we've moved a few functions and changing the
argument type from _EGLDisplay * to struct dri2_egl_display *.

The latter is effectively a wrapper around the former, thus
functionality was preserved, although GCC rightfully warned us about the
misuse.

Add a simple wrapper that casts and propagates the correct type.

Fixes: 9bbf3737f9 ("egl/x11: authenticate before doing chipset id
ioctls")
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:37:25 +01:00
Chuck Atkins
a92910ae37 glx: Refactor the configure options for glx implementation choice (v3)
Instead of cascading support for various different implementations of
GLX, all three options are now specified through the --enable-glx
option:

  --enable-glx=dri          : Enable the DRI-based GLX
  --enable-glx=xlib         : Enable the classic Xlib-based GLX
  --enable-glx=gallium-xlib : Enable the gallium Xlib-based GLX
  --enable-glx[=yes]        : Defaults to dri if DRI is enabled, else
                              gallium-xlib if gallium is enabled, else
                              xlib

This removes the --enable-xlib-glx option and fixes a bug in which both
the classic xlib-glx and gallium xlib-glx implementations were getting
built causing different versioned and conflicting libGL libraries to be
installed.

v2: Changes from various review feedback from Emil:
  a) Fixed typos
  b) Corrected help docs for new option
  c) Added appropriate a-b and r-b tags in commit msg
  d) Fixed various GLX related dependency checks.
v3: Rebased to current master and added changelog in commit msg

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94086

Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-01 08:37:25 +01:00
Thomas Hindoe Paaboel Andersen
cbcd7b60f5 nir/lower_double_ops: fix indentation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-30 12:16:32 -07:00
Thomas Hindoe Paaboel Andersen
21424e019d nir/opt_dead_cf: fix indentation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-30 12:16:29 -07:00
Thomas Hindoe Paaboel Andersen
6935726197 nir/opt_dead_cf: correction of side effect check
Parenthesis are needed here as ! takes precedence over the &. The
check had the opposite effect than intended.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-30 12:16:22 -07:00
Rob Clark
663c0e5155 freedreno/ir3: use pipe_debug_callback for shader-db traces
For multi-threaded shader-db support.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:20 -04:00
Rob Clark
2578e3edcb freedreno/a4xx: add debug callback to emit
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:19 -04:00
Rob Clark
51f20dd279 freedreno/a3xx: add debug callback to emit
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:19 -04:00
Rob Clark
41d288c306 freedreno: wire up core pipe_debug_callback
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:19 -04:00
Rob Clark
e04db879f8 freedreno/ir3: handle color clamp variant ourselves
Now that there is a pass to do this in NIR, lets just use that and
manage the variants ourself, rather than letting state-tracker do it.
This way, mesa/st will precompile shaders without requiring
ST_DEBUG=precompile (which requires a debug build).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:19 -04:00
Rob Clark
64abf6d404 nir: clamp-color-output support
Handled by tgsi_emulate for glsl->tgsi case.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-04-30 14:56:19 -04:00
Rob Clark
482cdc4c92 freedreno: fix indentation
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-30 14:56:19 -04:00
Marek Olšák
53435514c1 radeonsi: fix synchronization of shader images
This fixes the winsys->cs_is_buffer_referenced query, which is used for
synchronization before buffers are mapped.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-30 19:36:16 +02:00
Samuel Pitoiset
8f2238ccba st/glsl_to_tgsi: fix potential crash when allocating temporaries
When index - t->temps_size is greater than 4096, allocating space for
temporaries on demand will miserably crash. This can happen when a game
uses a lot of temporaries like the recent released Tomb raider.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-30 17:41:32 +02:00
Kenneth Graunke
750c38fad1 glsl: Lower vector_extracts to swizzles after lower_vector_derefs.
lower_vector_derefs can produce new vector_extract operations.
Neither i965 nor st_glsl_to_tgsi can handle them, so we'd best
convert them to swizzles.

Together with the previous patch, this fixes assertion failures in
GLideN64, as well as a new Piglit test which reproduces the issue:
spec/glsl-1.10/compiler/vector-dereference-in-dereference.frag

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95164
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-29 16:03:36 -07:00
Kenneth Graunke
1cd600dbb9 glsl: Convert lower_vec_index_to_swizzle to a rvalue visitor.
The old visitor missed some cases.  For example, it wouldn't handle
an ir_dereference_array with a vector_extract as the index.

Rather than trying to add the missing cases, just rewrite it as an
ir_rvalue_visitor.  This makes it easy to replace any expression,
and is much less code.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95164
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-29 16:03:29 -07:00
Thomas Faller
d53cf1ea4c mesa: simplify _mesa_Lightfv
Signed-off-by: Thomas Faller <tfaller1@gmx.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-29 11:08:01 -06:00
Nicolai Hähnle
aa6f88f891 gallium/radeon: fix crash in r600_set_streamout_targets
Protect against dereferencing a gap in the targets array. This was triggered
by a test in the Khronos CTS.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-29 11:55:06 -05:00
Nicolai Hähnle
98c348d26b st/glsl_to_tgsi: reduce stack explosion in recursive expression visitor
In optimized builds, visit(ir_expression *) experiences inlining with gcc that
leads the function to have a roughly 32KB stack frame. This is a problem given
that the function is called recursively. In non-optimized builds, the stack
frame is much smaller, hence one gets crashes that happen only in optimized
builds.

Arguably there is a compiler bug or at least severe misfeature here. In any
case, the easy thing to do for now seems to be moving the bulk of the
non-recursive code into a separate function. This is sufficient to convince my
version of gcc not to blow up the stack frame of the recursive part. Just to be
sure, add the gcc-specific noinline attribute to prevent this bug from
reoccuring if inliner heuristics change.

v2: put ATTRIBUTE_NOINLINE into macros.h

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95133
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95026
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92850
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-29 11:52:59 -05:00
Nicolai Hähnle
59af21c3e9 tgsi/text: fix parsing of memory instructions
Properly handle Target and Format parameters when present.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:56 -05:00
Nicolai Hähnle
4055babc75 tgsi/text: add str_match_name_from_array
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:53 -05:00
Nicolai Hähnle
a56edbdd8f tgsi/text: add str_match_format helper function
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:51 -05:00
Nicolai Hähnle
acb65a23a3 tgsi/build: pass Memory.Texture and .Format through tgsi_build_full_instruction
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:32 -05:00
Nicolai Hähnle
318d305f6d tgsi/dump: signal nospace when the last print exceeded the size
Previously, there was a bug where nospace wasn't signalled if it just so
happened that the very last print exceeded the available space.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:28 -05:00
Nicolai Hähnle
e08eaa5b72 tgsi/dump: shared dump_ctx initialization
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-29 11:39:21 -05:00
Emil Velikov
4b1ea6910e st/omx: don't return early in vid_enc_EncodeFrame()
Earlier commit plugged a memory leak, although it missed a pair of
brackets. Thus we unconditionally returned even in the case of no error.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95203
Fixes: b87856d25d ("st/omx: Fix resource leak on OMX_ErrorNone")
Tested-by: Andy Furniss <adf.lists@gmail.com>
Acked-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
What an embarassing bug - missing brackets. Andy can you confirm that it
resolves the issue ?
2016-04-29 15:36:18 +01:00
Andres Gomez
c750029b37 glsl: Checks for interpolation into its own function.
This generalizes the validation also to be done for variables inside
interface blocks, which, for some cases, was missing.

For a discussion about the additional validation cases included see
https://lists.freedesktop.org/archives/mesa-dev/2016-March/109117.html
and Khronos bug #15671.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-04-29 08:03:00 +02:00
Jason Ekstrand
6d4a426745 nir/algebraic: Support lowering for both 64 and 32-bit ldexp
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Jason Ekstrand
f0af5b87ec nir/opcodes: Make ldexp take an explicitly 32-bit int
There is no sense in having the double version of ldexp take a 64-bit
integer.  Instead, let's just take a 32-bit int all the time.  This also
matches what GLSL does where both variants of ldexp take a regular integer
for the exponent argument.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Jason Ekstrand
bee40dd730 nir/opcodes: Simplify the expressions for [un]pack_double
The new expressions are more explicit in terms of where the bits go so it's
a little easier to tell what's going on.  This is the way GLSL specifies
things so it's a bit easier to verify too.  It also has the benifit that
the new expressions easily vectorize so we can constant-fold vector forms
of the _split versions correctly.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Kenneth Graunke
2655265fcb mesa: Fix indirect draw buffer size check on 32-bit systems.
Fixes dEQP-GLES31.functional subtests:
draw_indirect.negative.command_offset_not_in_buffer_signed32_wrap
draw_indirect.negative.command_offset_not_in_buffer_unsigned32_wrap

These tests use really large values that overflow GLsizeiptr, at
which point the buffer size isn't less than "end".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95138
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2016-04-28 16:31:45 -07:00
Jason Ekstrand
70f89dd75e nir: Switch the arguments to nir_foreach_def
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_def(\([^,]*\),\s*\([^,]*\))/nir_foreach_def(\2, \1)/

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
5015260a05 nir: Switch the arguments to nir_foreach_use and friends
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/

and similar expressions for nir_foreach_use_safe, etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
9464d8c498 nir: Switch the arguments to nir_foreach_function
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
e63766fb4b nir: Switch the arguments to nir_foreach_parallel_copy_entry
This matches the "foreach x in container" pattern found in many other
programming languages.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
8564916d01 nir: Switch the arguments to nir_foreach_phi_src
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/

and a similar expression for nir_foreach_phi_src_safe.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
707e72f13b nir: Switch the arguments to nir_foreach_instr
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/

and similar expressions for nir_foreach_instr_safe etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand
261d62de33 anv/lower_push_constants: fixup for nir_foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Jason Ekstrand
bb65764a4a anv/apply_pipeline_layout: fixup for nir_foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Jason Ekstrand
621cbc0c14 anv/apply_dynamic_offsets: fixup for nir_foreach_block()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
7efff10585 i965/nir: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
3a8688fb41 nir/algebraic: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
1f8c100614 nir/validate: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
a471c161b1 nir/nir_worklist: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
db35177772 nir/remove_dead_variables: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
b3aaae398e nir/split_var_copies: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
9d41a1ffeb nir/repair_ssa: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
480a182ccd nir/opt_peephole_select: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
e5f37701ab nir/phi_builder: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
1ba40d834b nir/opt_cp: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
8dd7d78925 nir/opt_remove_phis: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
1a8c17a59e nir/opt_undef: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
52affdd2e6 nir/opt_dead_cf: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
ddc6639f85 nir/opt_dce: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
3afb3be674 nir/opt_gcm: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
eecf96f530 nir/opt_constant_folding: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
26b4c9ee15 nir/lower_samplers: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
f4ebff89e4 nir/normalize_cubemap_coords: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
492b3554a7 nir/lower_var_copies: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
c1b37c08bf nir/move_vec_src_uses_to_dest: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
ceed12557d nir/lower_vars_to_ssa: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
1557344c81 nir/lower_vec_to_movs: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
b1eada04b2 nir/lower_idiv: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
2febb88e6d nir/lower_to_source_mods: fixup for new foreeach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
c81ca60b41 nir/lower_io: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
7e909972e3 nir/lower_system_values: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
76c74de456 nir/lower_phis_to_scalar: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
b89f0bb58c nir/lower_indirect_derefs: fixup for new foreach_block()
v2 (Jason Ekstrand): Use nir_foreach_block_safe

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
e3c5bda16a nir/nir_lower_global_vars: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
480d78f55b nir/lower_atomics: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
06cf73a7ba nir/lower_load_const: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
15264133d7 nir/lower_locals_to_regs: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
1c6307aab4 nir/lower_gs_intrinsics: fixup for new foreach_block()
v2 (Jason Ekstrand): Use nir_foreach_block_safe

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
3bf3100794 nir/nir: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
686f247b21 nir/lower_clip: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
e36fbcfc3f nir/lower_alu_to_scalar: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
4179a56f42 nir/liveness: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
34af78edb3 nir/inline_functions: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
b23e59e172 nir/from_ssa: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Connor Abbott
d6a6c729ca nir/dominance: fixup for new foreach_block()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 15:52:17 -07:00
Samuel Pitoiset
9f92a8f00a nvc0: stick compute kernel arguments into uniform_bo
Having one buffer object for input kernel arguments coming from clover
and an other one for OpenGL user uniforms is unnecessary. Using the
uniform_bo object for both GL/CL uniforms avoids to declare a new BO.

This only affects compute programs but it should not hurt anything
because the states are dirtied and data will get reuploaded.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-29 00:44:08 +02:00
Tim Rowley
124a5d4ca0 swr: remove duplicated constant update code
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-28 16:16:46 -05:00
Marek Olšák
1a8c2ccb24 gallium/radeon: add the size only once in r600_context_add_resource_size
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-28 21:06:31 +02:00
Bas Nieuwenhuizen
8e43bc0eb6 winsys/radeon: enlarge buffer_indices_hashlist
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.

Ported from winsys/amdgpu: 6373845d98

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-28 21:06:31 +02:00
Marek Olšák
92f6af2c4a gallium/radeon: drop support for LINEAR_GENERAL layout
Unused. All texture imports use LINEAR_ALIGNED regardless of what
the DDX does.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-28 20:16:56 +02:00
Marek Olšák
f564b61d33 radeonsi: rework clear_buffer flags
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 20:16:56 +02:00
Jason Ekstrand
d273ce5259 anv/dynamic_offsets: Fix the order of arguments to nir_build_imm 2016-04-28 11:05:56 -07:00
Jason Ekstrand
6028a67641 anv: Fix a build error caused by recent fp64 NIR changes 2016-04-28 10:13:42 -07:00
Jose Fonseca
99474dc29b nir: Try to warn when C99 extensions are used in nir headers.
Ideally we'd have nir.h being included with -Wpedantic too, but it fails
with:

src/compiler/nir/nir.h:754:20: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
    nir_alu_src src[];
                    ^
In file included from src/compiler/nir/glsl_to_nir.cpp:42:0:
src/compiler/nir/nir.h:919:16: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
    nir_src src[];

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-04-28 16:48:13 +01:00
Jose Fonseca
e7438009af nir: Remove spurious ; after nir_builder functions.
Makes -pedantic happy.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-04-28 16:48:12 +01:00
Jose Fonseca
caa5937ebb nir: Remove spurious ; after namespace.
Makes -pedantic happy.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-04-28 16:48:12 +01:00
Jose Fonseca
f7854d8227 nir: Avoid C99 field initializers.
As they are not standard C++ and are not supported by MSVC C++ compiler.

Just have nir_imm_double match nir_imm_float above.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-04-28 16:48:12 +01:00
Brian Paul
a609da60c0 gallium/util: s/Elements/ARRAY_SIZE/
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 09:04:24 -06:00
Brian Paul
f365488eaa mesa: improve comment on _mesa_check_disallowed_mapping(), return bool
The old comment was a bit terse.  Also, change the function return
type to bool.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 09:04:17 -06:00
Marek Olšák
7e7710a068 radeonsi: remove needless cache flushes at the end of CP DMA operations
not needed AFAIK

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 12:46:47 +02:00
Marek Olšák
7d49b459b6 radeonsi: remove flushes at the beginning and end of IBs done by the kernel
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-28 12:46:47 +02:00
Samuel Iglesias Gonsálvez
db07b46f2c nir: Add lrp lowering for doubles in opt_algebraic
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.

Add 'lower_flrp64' flag to lower this instruction in that case.

v2:
   - Rename lower_flrp_double to lower_flrp64 (Jason)
   - Fix typo (Jason)
   - Adapt the code to define bit_size information in the opcodes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Samuel Iglesias Gonsálvez
443600d51e nir: rename lower_flrp to lower_flrp32
A later patch will add lower_flrp64 option to NIR.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga
072613b3f3 nir/lower_double_ops: lower round_even()
At least i965 hardware does not have native support for round_even() on doubles.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga
bf91df7f7f nir/lower_double_ops: lower fract()
At least i965 hardware does not have native support for fract() on doubles.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga
126a1ac03f nir/lower_double_ops: lower ceil()
At least i965 hardware does not have native support for ceil on doubles.

v2 (Sam):
   - Improve the lowering pass to remove one bcsel (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:36 +02:00
Iago Toral Quiroga
29541ec531 nir/lower_double_ops: lower floor()
At least i965 hardware does not have native support for floor on doubles.

v2 (Sam):
  - Improve the lowering pass to remove one bcsel (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:35 +02:00
Iago Toral Quiroga
5fab3d178b nir/lower_double_ops: lower trunc()
At least i965 hardware does not have native support for truncating doubles.

v2:
  - Simplified the implementation significantly.
  - Fixed the else branch, that was not doing what we wanted.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Connor Abbott
2ea3649c63 nir: add a pass to lower some double operations
v2: Move to compiler/nir (Iago)
v3: Use nir_imm_int() to load the constants (Sam)
v4 (Sam):
  - Undo line-wrap (Jason).
  - Fix comment (Jason).
  - Improve generated code for get_signed_inf() function (Connor).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Connor Abbott
2cf3b28884 nir/builder: add nir_imm_double()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Samuel Iglesias Gonsálvez
3a150683ce nir/builder: Add bit_size info to nir_build_imm()
v2:
- Group num_components and bit_size together (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Jakob Sinclair
76b8c5cc60 radeonsi: check if value is negative
Fixes a Coverity defect by adding checks to see if a value is negative
before using it to index an array. By checking the value first it makes
the code a bit safer but overall should not have a big impact.

CID: 1355598

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-04-28 11:33:38 +02:00
Michel Dänzer
860210ccfc clover: Fix build against clang SVN >= r267772
(Re-pushing previous fix for clang SVN r265359, which was reverted in
the meantime)

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-28 12:57:03 +09:00
Lars Hamre
32cb7d61a9 glsl: fix lowering outputs for early/nested returns
Return statements in conditional blocks were not having their
output varyings lowered correctly.

This patch fixes the following piglit tests:
/spec/glsl-1.10/execution/vs-float-main-return
/spec/glsl-1.10/execution/vs-vec2-main-return
/spec/glsl-1.10/execution/vs-vec3-main-return

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-28 11:01:51 +10:00
Connor Abbott
122d27e998 nir: rewrite nir_foreach_block and friends
Previously, these were functions which took a callback. This meant that
the per-block code had to be in a separate function, and all the data
that you wanted to pass in had to be a single void *. They walked the
control flow tree recursively, doing a depth-first search, and called
the callback in a preorder, matching the order of the original source
code. But since each node in the control flow tree has a pointer to its
parent, we can implement a "get-next" and "get-previous" method that
does the same thing that the recursive function did with no state at
all. This lets us rewrite nir_foreach_block() as a simple for loop,
which lets us greatly simplify its users in some cases. This does
require us to rewrite every user, although the transformation from the
old nir_foreach_block() to the new nir_foreach_block() is mostly
trivial.

One subtlety, though, is that the new nir_foreach_block() won't handle
the case where the current block is deleted, which the old one could.
There's a new nir_foreach_block_safe() which implements the standard
trick for solving this. Most users don't modify control flow, though, so
they won't need it. Right now, only opt_select_peephole needs it.

The old functions are reimplemented in terms of the new macros, although
they'll go away after everything is converted.

v2: keep an implementation of the old functions around
v3 (Jason Ekstrand): A small cosmetic change and a bugfix in the loop
   handling of nir_cf_node_cf_tree_last().
v4 (Jason Ekstrand): Use the _safe macro in foreach_block_reverse_call

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-27 15:05:40 -07:00
Connor Abbott
958300137f nir/opt_cp: use nir_block_get_following_if()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-27 15:05:34 -07:00
Jordan Justen
aaaa22c775 vbo: Return INVALID_OPERATION during draw with a mapped buffer
Fixes the OpenGLES 3.1 CTS:
 * ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos

Because this is triggering the error message after the normal API
validation phase, we don't have the API function name available, and
therefore we generate an error message without the draw call name:

Mesa: User error: GL_INVALID_OPERATION in draw call (vertex buffers are mapped)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95142
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 14:30:06 -07:00
Nanley Chery
28d0bc72fb anv/formats: Return proper error code for unsupported formats
Fixes some failures in dEQP-VK.api.info.image_format_properties.* and
enables the test group to execute without assert failing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-27 11:28:30 -07:00
Nanley Chery
5f7e8eac42 anv/device: Set the compressed texture feature flags correctly
Sampling from an ETC2 texture is supported on Bay Trail and
from Gen8 onwards. While ASTC_LDR is supported on Gen9, the
logic to handle such formats has not yet been implemented in
the driver.

Fixes dEQP-VK.api.info.format_properties.compressed_formats.

v2: Enable ETC2 for Bay Trail (Kenneth Graunke)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-27 11:28:30 -07:00
Jason Ekstrand
e0806930ad nir/algebraic: Add a bit-size validator
This commit adds a validator that ensures that all expressions passed
through nir_algebraic are 100% non-ambiguous as far as bit-sizes are
concerned.  This way it's a compile-time error rather than a hard-to-trace
C exception some time later.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand
8a3e344180 nir/opt_algebraic: Fix some expressions with ambiguous bit sizes
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand
7e0ee3a38b nir/search: Respect the bit_size parameter on nir_search_value
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand
fcc1c8a437 nir/algebraic: Add a mechanism for specifying the bit size of a value
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand
cafb885e45 nir/algebraic: Use "uint" instead of "unsigned" for uint types
This is consistent with the rename done for the rest of NIR.  Currently,
"bool" is the only type specifier used in nir_opt_algebraic.py so this is
really a no-op.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand
736ee0bef7 nir/algebraic: Do better error reporting of bad expressions
Previously, if an exception was encountered anywhere, nir_algebraic would
just die in a fire with no indication whatsoever as to where the actual bug
is.  This commit makes it print out the particular search-and-replace
expression that is causing problems along with the exception.  Also, it
will now report all of the errors it finds and then exit at the end like a
standard C compiler would do.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Alejandro Piñeiro
b1dcedf393 isl: move -lm at the end of tests_ldadd
The test was failing to build with "undefined reference to `roundf'" errors,
so Make check on mesa was failing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-27 20:14:56 +02:00
Topi Pohjolainen
aef6a6c382 i965/blorp/gen8: Fix blitting of interleaved msaa surfaces
Fixes ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample.

Current logic divides given layer of one by number of samples (four)
trashing the layer to zero. Layer adjustment is only to be used with
non-interleaved msaa surfaces where samples for particular layer are
in multiple slices.

I copy-pasted a bit of documentation from
brw_blorp.c::brw_blorp_compute_tile_offsets().

Also took the opportunity to fix the comment regarding sampling
as 2D, cube textures are the only exception.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-27 19:57:40 +03:00
Brian Paul
1d242b6882 llvmpipe: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
23c55e5c23 tgsi: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
419e386571 os: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
d902504a67 hud: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
e522a76226 gallivm: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
489df4a71a draw: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Brian Paul
f93802c465 softpipe: s/Elements/ARRAY_SIZE/
Try to standardize on the later, which is defined in the common util/
directory.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Nicolai Hähnle
562c4a17b7 winsys/radeon: remove use_reusable_pool parameter from buffer_create
All callers set this parameter to true.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:41 -05:00
Nicolai Hähnle
13acf2b243 gallium/radeon: remove use_reusable_pool parameter from r600_init_resource
All callers set it to true.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:41 -05:00
Nicolai Hähnle
c868974396 radeon/video: always use the reusable buffer pool
A semantic error was introduced in a past refactoring that caused the bind
parameter to be passed into the use_reusable_pool parameter of buffer_create.
Since this clearly makes no sense, and there is no clear reason why the
cache _shouldn't_ be used, just use the cache always.

Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:41 -05:00
Nicolai Hähnle
8c43c06e04 radeonsi: work around an MSAA fast stencil clear problem
A piglit test (arb_texture_multisample-stencil-clear) has been sent.
This problem was discovered analyzing

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle
7a215a3e27 radeonsi: expclear must be disabled on first Z/S clear
The documentation and the HW team say so.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle
01a3bb5d8b radeonsi: move blend choice out of loop in si_blit_decompress_color
It does not depend on the level or layer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle
450ff0f0d5 radeonsi: use level mask for early out in si_blit_decompress_color
Mostly for consistency with the other decompress functions, but note that
in the non-DCC decompress case, the function can now early-out in slightly
more (albeit probably rare) cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle
0ff05b55c6 radeonsi: si_blit_decompress_depth is only used for staging
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:40 -05:00
Nicolai Hähnle
0b70fc2db4 radeonsi: only decompress the required ZS planes from si_blit
This happens to "fix" a rendering bug in KotOR2, because it avoids a still
not quite understood bug with MSAA fast stencil clear decompress. For the
stencil clear bug, I have sent a piglit test (arb_texture_multisample-stencil-clear).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle
def53a0b3d radeonsi: decompress Z & S planes in one pass
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle
dc6fc2f390 radeonsi: early out of si_blit_decompress_depth_in_place based on dirty mask
Avoid dirtying the db_render_state atom when possible.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle
d14d6c3f58 radeonsi: use MIN2 instead of expanded ?: operator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle
159f182a57 radeonsi: fix brace style
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Nicolai Hähnle
91fb4bb2e9 gallium/util: add u_bit_consecutive for generating a consecutive range of bits
There are some undefined behavior subtleties, so having a function to match
the u_bit_scan_consecutive_range makes sense.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-27 11:16:39 -05:00
Tim Rowley
504df3a1d7 swr: s/Elements/ARRAY_SIZE/
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 11:07:34 -05:00
Nicolai Hähnle
836cab51c8 radeonsi: emit s_waitcnt for shader memory barriers and volatile
Turns out that this is needed after all to satisfy some strengthened
coherency tests. Depends on support in LLVM, added in r267729.

v2: updated to reflect changes to the LLVM intrinsic

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-04-27 10:54:05 -05:00
Tim Rowley
e7201bd31b swr: [rasterizer] warning cleanup
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:41:54 -05:00
Tim Rowley
24f23817d2 swr: [rasterizer core] implement legacy depth bias enable
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:41:45 -05:00
Tim Rowley
fa36f8ec9c swr: [rasterizer jitter] support for dumping x86 asm
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:41:32 -05:00
Tim Rowley
a646ffdacf swr: [rasterizer core] more backend refactoring
BackendPixelRate should be easier to read/maintain now hopefully.

Small perf bump by moving some of the pfn's to inline functions
without template params.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:41:21 -05:00
Tim Rowley
8e815ff72c swr: [rasterizer jitter] add mSimdInt1Ty
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:41:12 -05:00
Tim Rowley
4e1e0b3a32 swr: [rasterizer core] backend refactor
Lump all template args into a bundle of traits, and add some
functionality to the MSAA traits.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-27 10:40:44 -05:00
Brian Paul
43f46caf76 svga: use the SVGA3D_DEVCAP_MAX_FRAGMENT_SHADER_INSTRUCTIONS query
Instead of a hard-coded 512.  The query typically returns 65536 now.
Fall back to 512 if the query fails as we do for vertex shaders (which
should never happen).

Note that we don't actually enforce this limit in our shaders but it gets
reported via the glGetProgramivARB(GL_MAX_PROGRAM_INSTRUCTIONS_ARB) query.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-04-27 08:43:33 -06:00
Hans de Goede
b5e7907f30 nouveau: codegen: LOAD: Take src swizzle into account
The llvm TGSI backend uses pointers in registers and does things
like:

LOAD TEMP[0].y, MEMORY[0], TEMP[0]

Expecting the data at address TEMP[0].x to get loaded to
TEMP[0].y. But this will cause the data at TEMP[0].x + 4 to be
loaded instead.

This commit adds support for a swizzle suffix for the 1st source
operand, which allows using:

LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]

And actually getting the desired behavior

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-27 16:11:48 +02:00
Hans de Goede
90f45357ab nouveau: codegen: LOAD: Do not call fetchSrc(1) if the address is immediate
"off" later gets set to NULL when the address is immediate, so move the
fetchSrc(1) call to the non-immediate branch of the if-else. This brings
handleLOAD's offset handling inline with how it is done in handleSTORE.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-27 16:11:48 +02:00
Hans de Goede
1958397a58 nouveau: codegen: LOAD: Always use component 0 when getting the address
LOAD loads upto 4 components from the specified resource starting at
the passed in x value of the 2nd source operand, the y, z and w
components of the address should not be used.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-27 16:11:48 +02:00
Stefan Dirsch
7d25ed7036 dri3: Check for dummyContext to see if the glx_context is valid
According to the comments in src/glx/glxcurrent.c __glXGetCurrentContext()
always returns a valid pointer. If no context is made current, it will
contain dummyContext. Thus a test for NULL will always fail.

https://lists.freedesktop.org/archives/mesa-dev/2016-April/113962.html

Signed-off-by: Stefan Dirsch <sndirsch@suse.de>
Reviewed-by: Egbert Eich <eich@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-27 13:03:34 +01:00
Egbert Eich
4d9b518ad2 dri2: Check for dummyContext to see if the glx_context is valid
According to the comments in src/glx/glxcurrent.c __glXGetCurrentContext()
always returns a valid pointer. If no context is made current, it will
contain dummyContext. Thus a test for NULL will always fail.

https://bugzilla.opensuse.org/show_bug.cgi?id=962609

Tested-by: Olaf Hering <ohering@suse.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-27 13:03:11 +01:00
Timothy Arceri
6d1a59d15b glsl: move uniform block validation to link_uniform_blocks.cpp
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-27 16:17:47 +10:00
Kenneth Graunke
73ada723f0 docs: Mention that {ARB,OES}_texture_stencil8 is supported on i965/gen8+
Thanks to Thomas Helland for reminding me to do this.
2016-04-26 21:32:35 -07:00
Kenneth Graunke
fd9a7d8f30 i965: Enable ARB_texture_stencil8 and OES_texture_stencil8 on Gen8+.
Stencil texturing is required by ES 3.1.  Apparently we never actually
turned it on.  Do that now.  Also turn on the desktop extension.

Fixes nine dEQP-GLES31.functional tests:

stencil_texturing.format.stencil_index8_2d
texture.border_clamp.formats.stencil_index8.nearest_size_pot
texture.border_clamp.formats.stencil_index8.nearest_size_npot
texture.border_clamp.formats.stencil_index8.gather_size_pot
texture.border_clamp.formats.stencil_index8.gather_size_npot
texture.border_clamp.unused_channels.stencil_index8
state_query.internal_format.renderbuffer.stencil_index8_samples
state_query.internal_format.texture_2d_multisample.stencil_index8_samples
state_query.internal_format.texture_2d_multisample_array.stencil_index8_samples

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-04-26 21:32:35 -07:00
Kenneth Graunke
12c43a355c mesa: Try to fix CopyTex[Sub]Image of stencil textures.
ES prohibits this, but GL appears to allow it.  We at least need this
much, or else we'll crash as there's no source to read from.

This fixed crashes in the ES tests before I realized I needed to
prohibit stencil instead.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-04-26 21:32:35 -07:00
Kenneth Graunke
027c6c1222 mesa: Disallow CopyTexSubImage on stencil formats in ES.
Fixes
- ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8
- ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-04-26 21:32:35 -07:00
Kenneth Graunke
1e44599a43 i965: Fix MapTextureImage for multi-slice/level stencil buffers.
We called intel_miptree_get_image_offset() to get the image offsets
for the current level/slice, but then proceeded to ignore the results
and clobber level/slice 0 every time.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94713
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-04-26 21:32:35 -07:00
Kenneth Graunke
361a24e140 i965: Move TCS output indirect_offset.file check out a level.
I want to add another condition.  Moving the indirect_offset.file
check out a level should make this a little easier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:59:56 -07:00
Kenneth Graunke
13195f7ef8 i965/fs: Reduce the response length of sampler messages on Skylake.
Often, we don't need a full 4 channels worth of data from the sampler.
For example, depth comparisons and red textures only return one value.
To handle this, the sampler message header contains a mask which can
be used to disable channels, and reduce the message length (in SIMD16
mode on all hardware, and SIMD8 mode on Broadwell and later).

We've never used it before, since it required setting up a message
header.  This meant trading a smaller response length for a larger
message length and additional MOVs to set it up.

However, Skylake introduces a terrific new feature: for headerless
messages, you can simply reduce the response length, and it makes
the implicit header contain an appropriate mask.  So to read only
RG, you would simply set the message length to 2 or 4 (SIMD8/16).

This means we can finally take advantage of this at no cost.

total instructions in shared programs: 9091831 -> 9073067 (-0.21%)
instructions in affected programs: 191370 -> 172606 (-9.81%)
helped: 2609
HURT: 0

total cycles in shared programs: 70868114 -> 68454752 (-3.41%)
cycles in affected programs: 35841154 -> 33427792 (-6.73%)
helped: 16357
HURT: 8188

total spills in shared programs: 3492 -> 1707 (-51.12%)
spills in affected programs: 2749 -> 964 (-64.93%)
helped: 74
HURT: 0

total fills in shared programs: 4266 -> 2647 (-37.95%)
fills in affected programs: 3029 -> 1410 (-53.45%)
helped: 74
HURT: 0

LOST:   1
GAINED: 143

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-26 19:55:04 -07:00
Jason Ekstrand
d800b7daa5 nir: Add a helper for figuring out what channels of an SSA def are read
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:55:04 -07:00
Jason Ekstrand
acc2f1fe36 i965/fs: Use inst->regs_written for rlen for texture instructions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:55:04 -07:00
Jason Ekstrand
c7a09c0571 i965/fs: Properly report regs_written from SAMPLEINFO
The previous behavior would only allocate one register and then write
four thus potentially stomping three innocent bystanders.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:55:04 -07:00
Jason Ekstrand
30b37e4e9b i965/blorp: Set regs_written on texturing instructions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:55:04 -07:00
Kenneth Graunke
0bd956b34b i965: Don't force a header for texture offsets of 0.
Calling textureOffset() with an offset of <0, 0, 0> is equivalent to
calliing texture().  We don't actually need to set up an offset,
which causes a message header to be created.

A fairly common pattern is to sample at a point with a bunch of
offsets, and average them.  It's natural to write all the lookups
as textureOffset, but use <0, 0> for the center sample.

shader-db results on Skylake:

total instructions in shared programs: 9092095 -> 9092087 (-0.00%)
instructions in affected programs: 2826 -> 2818 (-0.28%)
helped: 12
HURT: 2

total cycles in shared programs: 70870166 -> 70870144 (-0.00%)
cycles in affected programs: 15924 -> 15902 (-0.14%)
helped: 2
HURT: 0

This also helps prevent code quality regressions in a future patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Jason Ekstrand <jason@jlekstrand.net>
2016-04-26 19:55:04 -07:00
Patrick Rudolph
fb5d38e219 r600g: fix and optimize tgsi_cmp when using ABS and NEG modifier
Some apps set NEG and ABS on the source param to test for zero.
Use ALU_OP3_CNDE insted of ALU_OP3_CNDGE and unset both modifiers.

It also removes the need for a MOV instruction, as ABS isn't
supported on op3.

Tested on AMD CAYMAN and AMD RV770.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 12:48:50 +10:00
Dave Airlie
7aa3a93656 docs: update softpipe for ARB_compute_shader
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:01:12 +10:00
Dave Airlie
e749c30ceb softpipe: add support for compute shaders. (v2)
This enables ARB_compute_shader on softpipe. I've only
tested this with piglit so far, and I hopefully plan
on integrating it with my vulkan work. I'll get to
testing it with deqp more later.

The basic premise is to create up to 1024 restartable
TGSI machines, and execute workgroups of those machines.

v1.1: free machines.
v2: deqp fixes - add samplers support, finish
atomic operations, fix load/store writemasks.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:01:03 +10:00
Dave Airlie
f78bcb7638 tgsi/exec: initialise SysSemanticToIndex array to -1
We want to use the SysSemanticToIndex to tell if we've seen
the semantics at all.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:00:46 +10:00
Dave Airlie
fbea4e177f tgsi/exec: implement restartable machine.
This lets us restart the machine at a PC value, and exits
the machine when we hit a barrier.

Compute shaders will then execute all the threads up to the
barrier, then restart the machines after the barrier once
all are done.

v2: comment the code a bit, change return types.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:00:44 +10:00
Dave Airlie
8ffa3c58d4 tgsi/exec: make inputs/outputs optional for compute shaders.
compute shaders don't need input/outputs so don't bother
allocating memory for these.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:00:41 +10:00
Dave Airlie
16a9dc1e49 tgsi/exec: implement load/store/atomic on MEMORY.
This implements basic load/store/atomic ops on MEMORY types
for compute shaders.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:00:35 +10:00
Dave Airlie
354c5f2d0f tgsi/exec: split out setting up masks to separate function
This is just a cleanup that will make later changes easier
to make.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 08:56:22 +10:00
Dave Airlie
6cf36a7231 tgsi: accept a starting PC value for exec machine.
This will be used later to restart barriered execution
threads in compute, for now we just want to change the API.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 08:56:17 +10:00
Dave Airlie
912ed84f83 tgsi: move to using vector for system values.
For compute support some of the system values are .xyz types,
so move to using a vector instead of a single channel.

[airlied: squash swizzle fix from compute series].

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 08:26:53 +10:00
Dave Airlie
9013d9267c tgsi/exec: fix system value handling.
a) SysSemanticToIndex needs to be indexed with the semantic name
not the decl->Declaration.Semantic.

b) doing this in run is too late, as the mappings are all setup
prior to run in the execs.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 08:25:38 +10:00
Jason Ekstrand
4040fff81d i965/blorp: Convert state setup to C
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
71775afe6e i965/blorp: Make state setup C-safe
Previously they (very rarely) used C++isms that prevented them from being
compiled as C.  As of this commit, they can be compiled as either C or C++.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
bed74299c2 i965/blorp: Convert brw_blorp.cpp to a C file
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
0551f3dfa4 i965/blorp: Make all of brw_blorp.h accessible to C
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
b3f08b5424 i965/blorp: Turn brw_blorp_params into a C-style struct
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
33fa12c50f i965/blorp: Turn coord_transform into a C-style struct
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
b6dd8e42f0 i965/blorp: Turn blorp_surface_info into a C-style struct
This commit is mostly mechanical except that it changes where we set the
swizzle.  Previously, the blorp_surface_info constructor defaulted the
swizzle to SWIZZLE_XYZW.  Now, we memset to zero and fill out the swizzle
when we setup the rest of the struct.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
a543f741bf i965/blorp: Roll mip_info into surface_info
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
3839936497 i965/blorp: Get rid of the blorp_blit_params class
It was really just a wrapper around the function that constructed it.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
8096ed7e27 i965/blorp: Remove the hiz params class
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
e35d9407dc i965/blorp: Remove the clear params classes
They didn't really add anything other than a key and extra layers of
function calls.  This commit just inlines the extra functions and gets rid
of the extra classes.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
659400cba3 i965/blorp: Remove the arguments to brw_blorp_params()
No one was using anything other than the defaults.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Jason Ekstrand
2dda4ff014 i965/blorp: Refactor to get rid of the get_wm_prog virtual function
Instead of having a virtual member function for getting the WM/PS kernel,
we simply add fields for prog_data and the kernel to brw_blorp_parms and
always make sure those get set as part of the different constructors.

v2: Use use prog_data != NULL to check for a valid program instead of a
    magic kernel offset value

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-26 14:55:22 -07:00
Tim Rowley
18d1658633 swr: autogenerate swr_context_llvm.h
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-26 16:45:26 -05:00
Laurent Carlier
12cf08fcc3 anv: honor DESTDIR when installing icd file
https://bugs.freedesktop.org/show_bug.cgi?id=94969

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:57:54 -07:00
Juha-Pekka Heikkila
ec5f7fc7bd i965/meta: initialize values to avoid random behaviour on error path
if brw_meta_stencil_blit() errored at wrong place 'target' would
be uninitialized and cause random behaviour on leaving the funtion.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:54:29 -07:00
Juha-Pekka Heikkila
51632d6f27 meta: Avoid random memory access on error
Initialize drawFb to NULL in _mesa_meta_CopyImageSubData_uncompressed()
if getting readFb fails uninitialized drawFb will cause randomness
on cleanup.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:54:02 -07:00
Grazvydas Ignotas
cea3a7e615 mesa: add tags file to gitignore
For ctags users like me.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:49:27 -07:00
Jakob Sinclair
dda50af9c4 mesa: Remove every double semi-colon
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
e5d027ec7d glx: Remove every double semi-colon
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
ea327dc451 gallium: Remove every double semi-colon
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
de743a07ac egl: Remove every double semi-colon
Removes all accidental semi-colons in egl.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
e129e6eb89 gallium/r600: removing double semi-colons
Trivial change. Removing unnecessary semi-colons from the code.
I don't have push access so someone reviewing this can push it.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
12da8bb5f4 mesa/main: removing double semi-colons
Trivial change. Removing unnecessary semi-colons from the code.
I don't have push access so someone reviewing this can push it.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jakob Sinclair
09e4ac00ac glsl: removing double semi-colons
Trivial change. Removing unnecessary semi-colons from the code.
I don't have push access so someone reviewing this can push it.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-26 14:36:29 -07:00
Jose Fonseca
52c7443932 glx: Don't enclose includes inside extern "C" { }.
Ran `make check` inside src/glx to verify everything compiles and links
correctly.

https://bugs.freedesktop.org/show_bug.cgi?id=95158

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-26 21:28:34 +01:00
Marek Olšák
80e5fb60b4 radeonsi: add RW_BUFFERS only once in si_ce_needed_cs_space
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-26 21:37:07 +02:00
Marek Olšák
2b4b5ebfcf egl: fix make check broken by interop support 2016-04-26 21:37:07 +02:00
Samuel Pitoiset
e64ee4cf60 docs: mark ARB_compute_shader as done for nvc0
This has been merged few months ago but this should help
https://mesamatrix.net/ to update its list of supported extensions.

Please note that compute shaders are not really useful without
ARB_image_load_store and only GK104 and GK110 support it for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-26 21:10:10 +02:00
Samuel Pitoiset
5c429f88d9 nvc0: expose GLSL version 420 on GK110
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
a0e777f6a1 nvc0: enable ARB_shader_image_load_store on GK110
This exposes 8 images for all shader types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
2daaa5d657 gk110/ir: add emission for VSHL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
af5925209d gk110/ir: add emission for OP_SUEAU, OP_SUBFM and OP_SUCLAMP
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
1f8900a8e0 gk110/ir: add emission for OP_SULDB and OP_SUSTx
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
fddd8523d4 gk110/ir: add emission for OP_MADSP
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
c2ce22ca46 gk110/ir: add emission for OP_PERMT
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
222d1a1bff nvc0: expose GLSL version 420 on GK104
Other chipsets will be added later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Ilia Mirkin
9e367ed480 nvc0: enable ARB_shader_image_load_store on GK104
This exposes 8 images for all shader types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
0d64d39e81 nvc0: inform users that 3D images are not fully supported
3D images are a bit more complicated to implement and will probably
requires a bunch of headaches and we don't care for now because they
do not seem to be really used by apps.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
fdbb476829 nvc0: reduce GL_MAX_3D_TEXTURE_SIZE to 2048 on Kepler+
The blob sets it to 2048 and using 4096 reports an INVALID_DATA error
with RT_ARRAY_MODE when z is 4096. Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
6fc6d548ed nvc0/ir: check that the image format doesn't mismatch
This re-uses NVE4_SU_INFO_CALL which is not used anymore because we
don't use our lib for format conversions. While we are at it, add a
todo for image buffers because there are some robustness-related
issues to fix.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
fbeb69757c nvc0/ir: prevent out of bounds when no images are bound
Checking if the image address is not 0 should be enough to prevent
read faults. To improve robustness, make sure that the destination
value of atomic operations is correctly initialized in case the
instruction is not performed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
5ba5714483 nvc0/ir: add indirect support for images on Kepler
This fixes arb_shader_image_load_store-indexing and
arb_shader_image_load_store-max-images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
8b540db44c nvc0/ir: fix 1D arrays images for Kepler
For 1D arrays, the array index is stored in the Z component.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
e478156ed7 nvc0/ir: fix cube images for Kepler
Like 2d array images, the z-dimension needs to be clamped.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Ilia Mirkin
3ce80f924d nv50/ir: add support for SULDP -> SULDB conversion
This will allow to convert surface formats without adding an extra
call to our lib.

[hakzsam: make use of this for GK104]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
d64ea4e48e nv50/ir: make use of OP_SUQ for surfaces query
This implements RESQ for surfaces which comes from imageSize() GLSL
bultin. As the dimensions are sticked into the driver constant buffer,
this only has to be lowered with loads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v2)
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
7c47db359e nv50/ir: add OP_BUFQ for buffers query
TGSI RESQ allows both images and buffers but we have to make a
distinction between these two type of resources in our lowering pass.
Introducing OP_BUFQ which is a fake operand will allow to implement
OP_SUQ for surfaces.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
e09434047d nv50/ir: enable early fragment test with explicit user control
This feature can be enabled in two ways: as an optimization and by
explicit user control (with OpenGL 4.2 or ARB_shader_image_load_store).

This makes use of the recent TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL to
force early fragment tests when needed.

This fixes a bunch of
dEQP-GLES31.functional.image_load_store.early_fragment_tests.* tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
08f4faa542 nvc0/ir: fix constraints for OP_SUSTx on Kepler
Destination type is actually always 32-bits, so typeSizeof() returns 4
and no sources are condensed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
119d087758 nv50/ir: re-introduce TGSI lowering pass for images
This is loosely based on the previous lowering pass wrote by calim
four years ago. I did clean the code and fixed some issues.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
76ea143c38 nv50/ir: add support for TGSI image declarations
Old and dead resource code will be removed once images are completely
done. Based on original patch by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
1fb3cd2489 nvc0: add missing glMemoryBarrier bits
This fixes a bunch of subtests of
arb_shader_image_load_store-host-mem-barrier.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
9bc18a48f3 nvc0: enable RGB10_A2UI format on GK104
No clue why this was not enabled by default before, maybe because
the SULDP conversion was wrong. Anyway, this helps in fixing all
rgb10_a2ui piglit tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
da8171dc75 nvc0: shift address with blocksize for image buffers
This fixes a bunch of dEQP image buffers related tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
285f2edd14 nvc0: fix address offset when images have multiple levels
This fixes arb_shader_image_load_store-level.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
e28f247e24 nvc0: bind images on 3D shaders for Kepler
Similar to surfaces validation for compute shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
1eca4c51a2 nvc0: bind images on compute shaders for Kepler
Old surfaces validation code will be removed once images are completely
done for Fermi/Kepler, that explains why I only disable it for now.

This also introduces nvc0_get_surface_dims() which computes correct
dimensions regarding the given target.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
c6b3c346d1 nvc0: reserve an area for surfaces info in the driver constbuf
To process surfaces coordinates from the codegen part, and because
some information like the format is not always available (eg. when
writeonly is used), we have to stick some surfaces data in the
driver constbuf. This is especially true for OpenCL because we don't
know the format at shader compile time.

This bumps the size of each shader area from 1K to 2K.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
afa04785fa nvc0: add preliminary support for images
This implements set_shader_images() and resource invalidation for
images. As OpenGL requires at least 8 images, we are going to expose
this minimum value even if this might be raised for Kepler, but this
limit is mainly for Fermi because the hardware only accepts 8 images.

Based on original patch by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
c62b1b92f7 gk110/ir: add emission for (a OP b) OP c
This is pretty similar to NVC0 except that offsets have changed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-26 19:47:49 +02:00
Samuel Pitoiset
3da8528846 nvc0/ir: fix wrong emission of (a OP b) OP c
The third source must be emitted at offset 49 instead of 17 and the
not modifier is at 52 instead of 20. If you look a bit above in
emitLogicOp() you will see that the dest is emitted at 17 which
confirms that src(2) is obviously wrong.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-26 19:47:49 +02:00
Jose Fonseca
a2fe35bcdf scons: Support Clang on Windows.
- Introduce 'gcc_compat' env flag, for all compilers that define __GNUC__,
  (which includes Clang when it's not emulating MSVC.)

- Clang doesn't support whole program optimization

- Disable enumerator value warnings (not sure why Clang warns about them,
  as my understanding is that MSVC promotes enums to unsigned ints
  automatically.)

This is not enough to build with Clang + AddressSanitizer though.  More
follow up changes will be required for that.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-26 17:17:00 +01:00
Jose Fonseca
dcc3baf733 gallium: Include intrin.h instead of defining ourselves.
More portable, particularly when building with Clang, which implements
all MSVC intrisincs in its own intrin.h, but doesn't actually support
`#pragma instrinsic`.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-26 17:17:00 +01:00
Jose Fonseca
9a25c8af1b scons: Whenever possible decide what to do based on platform and not compiler.
Because compilers like GCC and Clang are effectively available everywhere
so their presence/absence is seldom conclusive.

Furthermore, all compilers we use now have stdint.h.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-26 17:17:00 +01:00
Jose Fonseca
c068610a7d scons: Move fallback HAVE_* definitions to headers.
These were being defined in SCons, but it's not practical:

- we actually need to include Gallium headers from external source trees, with
completely disjoint build infrastructure, and it's unsustainable to
replicate the HAVE_xxx checks or even hard-coded defines across
everywhere.

- checking compiler version via command line doesn't really work due to
  Clang essentially being like a cameleon which can fake either GCC or
  MSVC

There's no change for autoconf.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-26 17:17:00 +01:00
Juha-Pekka Heikkila
940da2ce0e nir: Add missing break into switch in construct_value()
There seemed to be missing one break in nested switchcases.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-26 17:45:56 +02:00
Bas Nieuwenhuizen
31631d8515 radeonsi: Fix memory leak in error path.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-26 15:41:19 +02:00
Oded Gabbay
514c5b5f4b radeonsi: fix build error because of missing param
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-26 13:48:43 +03:00
Oded Gabbay
965175aba3 r600g: use do_endian_swap in texture swapping function
For some texture formats we need to take "do_endian_swap" into account
when configuring their swizzling.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-26 11:00:16 +03:00
Oded Gabbay
c86c761343 r600g: use do_endian_swap in color swapping functions
For some formats we need to take "do_endian_swap" into account when
configuring swapping for color buffers.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-26 11:00:16 +03:00
Oded Gabbay
686ad477bd r600g: set endianess of 16/32-bit buffers according to do_endian_swap
This patch modifies r600_colorformat_endian_swap(), so for 16-bit and for
32-bit buffers, the endianess configuration will be determined not only
by the color/texture format, but also by the do_endian_swap parameter.

The only exception is for array formats, which are always set to not do
swapping, because for them gallium sets an alias based on the machine's
endianess.

v4:
V_0280A0_COLOR_16_16 and V_0280A0_COLOR_16_16_FLOAT should be set to
8IN16 because the bytes inside need to be swapped even for array formats.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-26 11:00:16 +03:00
Oded Gabbay
2242dbe11d r600g/radeonsi: send endian info to format translation functions
Because r600 GPUs can't do swap in their DB unit, we need to disable
endianess swapping for textures that are handled by DB.

There are four format translation functions in r600g driver:

- r600_translate_texformat
- r600_colorformat_endian_swap
- r600_translate_colorformat
- r600_translate_colorswap

This patch adds a new parameters to those functions, called
"do_endian_swap". When running in a big-endian machine, the calling
functions will check whether the texture/color is handled by DB -
"rtex->is_depth && !rtex->is_flushing_texture" - and if so, they will
send FALSE through this parameter. Otherwise, they will send TRUE.

The translation functions, in specific cases, will look at this parameter
and configure the swapping accordingly.

v4:
evergreen_init_color_surface_rat() is only used by compute and don't
handle DB surfaces, so just sent hard-coded FALSE to translation
functions when called by it.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-26 11:00:16 +03:00
Ilia Mirkin
4965c5bf72 glsl: add ability to use essl 3.20
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-25 23:40:54 -04:00
Ilia Mirkin
fa8c0ccfbc main: select ES3.2 version when all extensions are available
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-25 23:40:34 -04:00
Dave Airlie
e3e6859381 tgsi: pass a shader type to the machine create and clean up.
There was definitely bugs here mixing up the PIPE_ and TGSI_ defines,
hopefully they didn't cause any problems, since mostly it was special
cases for GEOMETRY.

This clarifies at shader machine create what type of shader this
machine will execute. This is needed also for compute shaders where
we don't want to allocate inputs/outputs.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-26 13:05:32 +10:00
Dave Airlie
a6aae0c24d gallium/tgsi: move tgsi_exec.h header out of draw_context.h
It gets annoying that changing the tgsi exec rebuilds the state
tracker unnecessarily. Putting this include into draw_gs.h which
uses it causes a lot less rebuilds.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-26 13:00:57 +10:00
Roland Scheidegger
bd07e20d20 gallivm: make sampling more robust against bogus coordinates
Some cases (especially these using fract for coord wrapping) did not handle
NaNs (or Infs) correctly - the following code assumed the fract result
could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract
result was NaN - which then could produce out-of-bound offsets.

(Note that the explicit NaN behavior changes for min/max on x86 sse don't
result in actual changes in the generated jit code, but may on other
architectures. Found by looking through all the wrap functions.)

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955

No piglit changes.

(v2: fix min/max typo in coord_mirror, add comment)

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>

Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-26 04:55:37 +02:00
Dave Airlie
d8edc3e97c radeonsi: fix missing include for Elements.
Since u_blitter.h no longer defines this.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-26 09:36:23 +10:00
Samuel Pitoiset
d12c3b02ff nvc0: bump the amount of shared memory per MP on Maxwell
According to the CUDA compute capability version, GM10x can expose
64KB of shared memory while GM20x can use 96KB.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-26 00:32:25 +02:00
Dave Airlie
5b6a1aee46 r600: fix missing include for Elements macro
This got removed from u_blitter.h and we were taking it from
there, this should just move to ARRAY_SIZE eventually.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-26 08:01:01 +10:00
Samuel Pitoiset
725431a5db gm107/ir: s/invalid load/invalid store/
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-25 23:55:52 +02:00
Rob Clark
d2fcd0ce38 freedreno/a3xx: remove unused fxn
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 17:10:14 -04:00
Rob Clark
8fe2076243 freedreno/ir3: convert over to ralloc
The home-grown heap scheme (which is ultra-simple but probably not good
to always allocate and memset such a chunk of memory up front) was a
remnant of fdre (where the ir originally came from).  But since we have
ralloc in mesa, lets just use that instead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 17:09:09 -04:00
Rob Clark
27cf3b0052 mesa/st: log some additional invalid-fbo cases
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 17:08:22 -04:00
Rob Clark
2c8674f5a9 freedreno: honor handle->offset
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:22 -04:00
Rob Clark
dfd23abdcc freedreno: disallow cat4 immed src
Normally this would never happen (constant-propagation in NIR would
eliminate the instruction), except it does happen for 'undef' which
we turn into immed 0.0 for bookkeeping purposes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:21 -04:00
Rob Clark
76c6cdd36a freedreno/a4xx: add render-target formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:21 -04:00
Rob Clark
7add166a5c freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:21 -04:00
Rob Clark
edcc6ce75d freedreno: reduce line width for deqp further
See a7eb12d0.. but that wasn't restrictive enough.  Fixes
dEQP-GLES3.functional.rasterization.primitives.line_strip_wide, and
similar

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:21 -04:00
Rob Clark
4610e5ef28 freedreno/ir3: fix sin/cos
We seem to need range reduction to get sane results.  Fixes glmark2
jellyfish bench, and a whole bunch of
dEQP-GLES3.functional.shaders.builtin_functions.precision.{sin,cos,tan}.*

v2: squashed in android build fixes from Rob Herring

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-25 16:16:21 -04:00
Kenneth Graunke
21b4bcdd05 i965: Unroll SIMD16 DDY_FINE on Sandybridge.
This fixes 10 dEQP-GLES3 subtests:
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.float_nicest.*.

Matt noticed that our Piglit tests for this use even numbered registers,
while the failing dEQP tests use odd numbered registers.  We believe
that it works for even numbered registers, but not otherwise.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-25 13:13:00 -07:00
Brian Paul
e915903c10 docs: update the instructions for getting a git account
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-25 14:10:40 -06:00
Brian Paul
ef3f00edd8 docs: update link to Intel's graphics website
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-25 14:10:40 -06:00
Jordan Justen
50b82ecd77 mesa/gles: Allow format GL_RED to be used with MESA_FORMAT_R_UNORM
If the bound framebuffer has a format of MESA_FORMAT_R_UNORM, then
IMPLEMENTATION_COLOR_READ_FORMAT will return GL_RED. This change
applies to OpenGLES contexts where additional restrictions are placed
on the formats that are allowed to be supported.

Fixes OpenGLES 3.1 CTS tests:
 * ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC16
 * ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC16Linear
 * ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC32F
 * ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC32FLinear

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-25 12:09:15 -07:00
Charmaine Lee
c4cb879f00 svga: eliminiate unnecessary constant buffer updates
Currently if the texture binding is changed, emit_fs_consts()
is triggered to update texture scaling factor for
rectangle texture or texture buffer size in the constant buffer.
But the update is only relevant if the texture binding includes
a rectangle texture or a texture buffer.

To eliminate the unnecessary constant buffer updates due to other texture
binding changes, a new flag SVGA_NEW_TEXTURE_CONSTS will be used
to trigger fragment shader constant buffer update when a rectangle texture
or a texture buffer is bound.

With this patch, the number of constant buffer updates in Lightsmark2008
reduces from hundreds per frame to about 28 per frame.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 12:59:29 -06:00
Charmaine Lee
686cd3c606 svga: mark the texture dirty for write transfer map only
Instead of unconditionally mark the texture subresource dirty at transfer map,
we'll set the dirty bit for write transfer only.

Tested with lightsmark2008 and glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 12:59:29 -06:00
Charmaine Lee
676931640f svga: fix assert with PIPE_QUERY_OCCLUSION_PREDICATE for non-vgpu10
With this patch, when running in hardware version 11, we'll use
SVGA3D_QUERYTYPE_OCCLUSION query type for PIPE_QUERY_OCCLUSION_PREDICATE
and return TRUE if samples-passed count is greater than 0.

Fixes glretrace/solidworks2012_viewport running in hardware version 11.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 12:59:29 -06:00
Charmaine Lee
d7a6c1a476 svga: minimize surface flush
Currently, we always do a surface flush when we try to establish
a synchronized write transfer map. But if the subresource has not
been modified, we can skip the surface flush. In other words,
we only need to do a surface flush if the to-be-mapped subresource
has been modified in this command buffer.

With this patch, lightsmark2008 shows about 15% performance improvement.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 12:59:29 -06:00
Frederic Devernay
23949cdf2c glapi: fix _glapi_get_proc_address() for mangled function names
In the dispatch table, all functions are stored without the "m" prefix.
Modify code so that OSMesaGetProcAddress works both with gl and mgl
prefixes. Similar to
https://lists.freedesktop.org/archives/mesa-dev/2015-September/095251.html

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94994
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-25 12:59:29 -06:00
Brian Paul
63df017fda util/blitter: use ARRAY_SIZE macro
And remove local definition of Elements() macro.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-25 12:59:29 -06:00
Brian Paul
e0184b3995 svga: s/Elements/ARRAY_SIZE/
Standardize on the later macro rather than a mix of both.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-04-25 12:59:29 -06:00
Brian Paul
77e4b41671 svga: whitespace and formatting fixes in svga_pipe_rasterizer.c 2016-04-25 12:59:29 -06:00
Brian Paul
25e0d3659f svga: whitespace and formatting fixes in svga_pipe_depthstencil.c 2016-04-25 12:59:29 -06:00
Brian Paul
595fbc8dee svga: whitespace and formatting fixes in svga_pipe_sampler.c 2016-04-25 12:59:29 -06:00
Brian Paul
1db8313168 gallium/util: initialize pipe_framebuffer_state to zeros
To silence a valgrind uninitialized memory warning.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94955
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-25 12:59:29 -06:00
Brian Paul
1e990978ee util/cache: add comments, fix formatting 2016-04-25 12:59:29 -06:00
Kenneth Graunke
4e2d22c5a7 i965: Mark URB reads as volatile.
They can be affected by URB writes.

In the upcoming scalar TCS backend, this prevents read-modify-write
cycles from being broken by CSE removing reads.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-25 11:45:15 -07:00
Kenneth Graunke
501bedffa6 i965: Make a few tessellation related functions non-static.
Also, move them to brw_shader.cpp so they're in a location for code
used by both the vec4 and fs worlds.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-25 11:44:48 -07:00
Brian Paul
464d6080c6 svga: separate HUD counters for state objects
Count depth/stencil, blend, sampler, etc. state objects separately
but just report the sum for the HUD.  This change lets us use gdb to
see the breakdown of state objects in more detail.

Also, count sampler views too.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-04-25 09:45:16 -06:00
Robert Foss
b87856d25d st/omx: Fix resource leak on OMX_ErrorNone
Avoid leaking buffer allocated for task if an error has occured.

Coverity id: 1213929
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 15:09:37 +01:00
Jonathan Gray
3c8f9ed9b7 isl: remove ffs function that conflicts with system headers
Remove a wrapper around __builtin_ffs that conflicts with system
headers on OpenBSD and perhaps elsewhere:

isl_priv.h:44: error: conflicting types for 'ffs'

v2: include strings.h to ensure prototype is found

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 15:06:46 +01:00
Grazvydas Ignotas
dc732a8ef2 gallium: use unreachable instead of asserts
Avoids warnings in release builds.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:34 +02:00
Grazvydas Ignotas
d14778656b anv: fix warnings in release build
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:31 +02:00
Grazvydas Ignotas
ff48375a16 isl: fix warnings in release build
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:28 +02:00
Grazvydas Ignotas
29d2c0e9e6 spirv: fix warning in release build
Mark variable MAYBE_UNUSED to avoid unused-but-set-variable warning in
release build.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:25 +02:00
Grazvydas Ignotas
cbb0d4ad75 gallium: fix warnings in release build
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:21 +02:00
Grazvydas Ignotas
bbeb9ab2f7 glsl: fix warning in release build
Mark variable MAYBE_UNUSED to avoid unused-but-set-variable warning in
release build.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:16 +02:00
Grazvydas Ignotas
e4fc06a2f8 util: add MAYBE_UNUSED for config dependent variables
This is mostly for variables that are only used in asserts and cause
unused-but-set-variable warnings in release builds. Could just use
UNUSED directly, but MAYBE_UNUSED should be less confusing and is
similar to what the Linux kernel has.

And yes __attribute__((unused)) can be used on variables on both GCC 4.2
(oldest supported by mesa) and clang 3.0 (just some random old version,
not sure what's the minimum for mesa).

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-25 12:23:10 +02:00
Hans de Goede
787a53988c nouveau: codegen: combineLd/St do not combine indirect loads
combineLd/St would combine, i.e. :

st  u32 # g[$r2+0x0] $r2
st  u32 # g[$r2+0x4] $r3

into:

st  u64 # g[$r2+0x0] $r2d

But this is only valid if r2 contains an 8 byte aligned address,
which is not guaranteed for compute shaders

This commit checks for src0 dim 0 not being indirect when combining
loads / stores as combining indirect loads / stores may break alignment
rules.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-25 11:45:07 +02:00
Rob Clark
0831eb94b9 freedreno/ir3: relax restriction in grouping
Currently we were two restrictive, and would insert an output move in
cases like: MOV OUT[0], IN[0].xyzw

Loosen the restriction to allow the current instruction to appear in the
neighbor list but only at it's current possition.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Rob Clark
36c9ea6e79 freedreno/ir3: fix small memory leak
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Rob Clark
610837fb98 freedreno/ir3: fix small RA bug
Normally the offset in the group would be the same, but not always.  For
example, in a sam(w) which only writes the 4th component.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Rob Clark
adf795432f freedreno/a4xx: better workaround for astc+srgb
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions.  But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures.  Work around this
setting up a second texture state and using that to sample alpha
separately.

This way, srgb->linear conversion happens in hw *prior* to
interpolation.

This fixes 546 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Rob Clark
a148300b13 Revert "freedreno/a4xx: lower srgb in shader for astc textures"
Better workaround in the following patch.

This reverts commit 899bd63ace.
2016-04-24 13:40:57 -04:00
Rob Clark
19118e6f47 freedreno/a4xx: blend state no longer depends on fb state
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Marek Olšák
c0c6ca40a2 Revert "st/dri: add 32-bit RGBX/RGBA formats"
This reverts commit ccdcf91104.

It breaks most KDE apps, because DRI doesn't support the RGBA component
ordering.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95071
2016-04-24 15:16:07 +02:00
Jonathan Gray
147a2d25ad genxml: use PYTHON3
Allows the build to work when the python3 binary is not "python3".

v2: remove x bit from the script at Emil's suggestion

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 16:45:05 -07:00
Nanley Chery
710b1d2e66 i965/tex_image: Flush certain subnormal ASTC channel values
When uploading a linear, void-extent, ASTC LDR block on Skylake, we are
required to flush to zero the UNORM16 channel values that would be
denormalized. This is specifically required for the values: 1, 2, and 3.

Fixes the 14 failing tests in:
   dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*

v2: Split out flushing function (Kristian Høgsberg)
v3: Map with READ instead of INVALIDATE (Kenneth Graunke)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 11:35:08 -07:00
Jonathan Gray
e29b3bfd6e configure.ac: search for and set PYTHON3
src/intel/genxml/gen_pack_header.py requires python3.

v2: check for python3.5 as well

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 01:06:20 -07:00
Topi Pohjolainen
f8dd07a2c3 i965/blorp: Enable for buffer resolves
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94181

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:29:15 +03:00
Topi Pohjolainen
c7cf17ae75 i965/blorp: Enable for normal color clears
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:29:15 +03:00
Topi Pohjolainen
c4ec0121a8 i965/blorp: Fix clear code for ignoring colormask for XRGB formats on Gen9+
This is equivalent of 73b01e2711
for blorp.

v2 (Ken): No need to call _mesa_format_has_color_component() now
          that the number of components is gotten from
          _mesa_base_format_component_count().

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:29:15 +03:00
Topi Pohjolainen
19948f1bf6 mesa/formats: Take luminance into account in component count
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-23 07:29:15 +03:00
Topi Pohjolainen
9e153c0692 i965/blorp: Do not trigger re-emission of base state address
In case blorp needs to configure it will be just as if render or
compute pipeline had configured it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:28:58 +03:00
Topi Pohjolainen
84db9ca3f7 i965/blorp: Reconfigure base state address only if needed
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:09:39 +03:00
Topi Pohjolainen
234b5f23f8 i965/blorp: Use BRW_NEW_BLORP instead of trashing all state bits
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:09:39 +03:00
Kenneth Graunke
6d5ce1b043 i965: Make all atoms to track BRW_NEW_BLORP by default
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
2016-04-23 07:09:39 +03:00
Topi Pohjolainen
65a5af6dd0 i965: Introduce state flag for blorp
In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger
re-emission of the entire 3D pipeline on the next draw.  However, there
are some packets BLORP simply leaves alone, so there's no need to
re-emit them.  Trying to reduce the set of dirty bits flagged after
BLORP runs is tricky.

Instead, we introduce a BRW_NEW_BLORP flag.  This should be set on any
atom which emits a packet that BLORP also emits.  When BLORP runs, it
will flag BRW_NEW_BLORP, causing those packets to get re-emitted.

This also makes it easy to avoid re-emitting specific atoms - we can
simply drop the BRW_NEW_BLORP flag on those.

To start, we assume that all packets need to be re-emitted.  This is the
safest approach and closest to the existing code's behavior.  Many of
these are obviously not required, and can be dropped in subsequent
patches.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:09:39 +03:00
Topi Pohjolainen
0e850452d1 i965/blorp/gen6: Use normal base state address setup
This is identical to the blorp version which only differs in case
fragment shader isn't used. In that case blorp would reset batch
buffer address to zero.
This is not really needed, and having blorp to use base state
address setup that is compatible with normal upload allows one to
skip resetting it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:09:39 +03:00
Topi Pohjolainen
ae73e86497 i965: Remove pointers to non-existing atoms
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-23 07:09:39 +03:00
Tom Stellard
9f110a9e10 radeonsi: Implement ddx/ddy on VI using ds_bpermute
The ds_bpermute instruction allows threads to transfer data directly
to or from the vgprs of other threads.  These instructions use the LDS
hardware to transfer data, but do not read or write LDS memory.

DDX BEFORE:                        |  DDX AFTER:
                                   |
v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 60, v2
v_and_b32_e32 v2, 60, v2           |  v_lshlrev_b32_e32 v2, 2, v2
v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
s_waitcnt lgkmcnt(0)               |
v_or_b32_e32 v0, 1, v2             |
v_lshlrev_b32_e32 v0, 2, v0        |
ds_read_b32 v1, v3                 |
ds_read_b32 v0, v0                 |
s_waitcnt lgkmcnt(0)               |
                                   |
LDS: 1 blocks                      |  LDS: 0 blocks

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-04-22 23:48:43 +00:00
Tom Stellard
128267d781 radeonsi: Use llvm.amdgcn.mbcnt.* intrinsics instead of llvm.SI.tid
We're trying to move to more of the new style intrinsics with include
the correct target name, and map directly to ISA instructions.

v2:
  - Only do this with LLVM 3.8 and newer.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-22 23:48:43 +00:00
Tom Stellard
d3427412a3 radeonsi: Set range metadata on calls to llvm.SI.tid
The range metadata tells LLVM the range of expected values for this intrinsic,
so it can do some additional optimizations on the result.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-22 23:48:41 +00:00
Tom Stellard
b31422d970 radeonsi: Create a helper function for computing the thread id
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-22 23:45:34 +00:00
Nanley Chery
86cd9a134f i965: Disable KHR_texture_compression_astc_hdr on Gen9
Although Gen9 samples from most HDR ASTC surfaces of correctly,
there currently are no software workarounds to fix the incorrect
sampling that occurs in others of certain color endpoint modes.

With this change, we are no longer failing the 14 tests from:
   dEQP-GLES3.functional.texture.compressed.astc.endpoint_value_hdr_cem_15.*

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-22 16:57:38 -07:00
Tim Rowley
ec089cd987 swr: [rasterizer memory] Constify load tiles
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:49:20 -05:00
Tim Rowley
6facf4b74a swr: [rasterizer core] CompleteDrawContext changes for gcc
Add explicit inline and non-inline versions of CompleteDrawContext
to make gcc happy.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:49:04 -05:00
Tim Rowley
0487377dce swr: [rasterizer] Small cleanups
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:56 -05:00
Tim Rowley
2c4c3c9c71 swr: [rasterizer scripts] Knob scripts tweaks
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:47 -05:00
Tim Rowley
ef293ee9c0 swr: [rasterizer] Interpolation utility functions
v2: use _mm_cmpunord_ps for vIsNaN

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:38 -05:00
Tim Rowley
27cc5924ea swr: [rasterizer core] TemplateArgUnroller
Switch boolean template arguments to typename template arguments of type
std::integral_constant<bool, VALUE>.

This allows the template argument unroller to easily be extended to enums.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:29 -05:00
Tim Rowley
46a448d161 swr: [rasterizer core] Arena: make most allocated blocks the same size
Reduces sorting cost

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:20 -05:00
Tim Rowley
794be41f91 swr: [rasterizer core] Fix global arena allocator bug
- Plus some minor code refactoring

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:48:11 -05:00
Tim Rowley
e42f00ee39 swr: [rasterizer core] Fix thread binding for 32-bit windows
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:47:59 -05:00
Tim Rowley
cd21f90ecf swr: [rasterizer fetch] Add support for fetching non-uniform component formats
For example, R10G10B10A2_UNORM.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:47:48 -05:00
Tim Rowley
244ae7af1b swr: [rasterizer core] Use CS spill/fill size in core
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:47:02 -05:00
Tim Rowley
ee9621e2f5 swr: fix memory leaks from vs/fs compilation
v2: varient -> variant

Reviewed by: George Kyriazis <George.Kyriazis@intel.com>
2016-04-22 18:05:02 -05:00
Tim Rowley
5815c8b3d3 swr: fix clang warnings
v2: use alternate logic version in swr_check_render_cond

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-22 18:03:41 -05:00
Rob Clark
e85bef8b12 freedreno/a4xx: fix encoding of blend color state
Fixes a whole bunch of dEQP-GLES3.functional.fragment_ops.random.* (now
they all pass)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-22 15:00:34 -04:00
Rob Clark
23abc41d2b freedreno: update generated headers
Pull in RB_BLEND_* fixes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-22 15:00:34 -04:00
Eric Anholt
79b36168e0 vc4: Make sure we recompile when sample_mask changes.
Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted
(there is also a bug with RCL tiled blits)

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-22 11:27:11 -07:00
Eric Anholt
876c647194 vc4: Fix validation of full res tile offset if used for non-MSAA.
There's no reason we couldn't do non-MSAA full resolution tile buffer
load/stores, but we would have claimed buffer overflow was being
attempted.  Nothing does this currently.
2016-04-22 11:27:11 -07:00
Eric Anholt
3fecaf0d0c vc4: Only do MSAA FB operations if the FB is MSAA.
I noticed this as a problem with ET:QW traces emitting coverage code when
the framebuffer was supposed to be single sampled.
2016-04-22 11:27:11 -07:00
Eric Anholt
1410403e1e vc4: Fix tests for format supported with nr_samples == 1.
This was a bug from the MSAA enabling.  Tests for surfaces with
nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly
fail out.

Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-22 11:27:11 -07:00
Eric Anholt
6eabdb8959 vc4: Don't try to blit from MSAA surfaces with mismatched width to dst.
I had made the previous blit fix non-MSAA only because I was thinking
about how the hardware infers stride from the RENDERING_CONFIG packet.
However, I'm also inferring the stride for both MSAA src and dst in
vc4_render_cl.c from the width argument in the ioctl.

Fixes 15 EXT_framebuffer_multisample piglit tests.
2016-04-22 11:27:11 -07:00
Kenneth Graunke
42dea145d9 i965: Disable channel expressions for scalar GS, TCS, TES.
On Broadwell, I get the following shader-db statistics:

Tessellation Control Shaders:

   total instructions in shared programs: 57327 -> 57012 (-0.55%)
   instructions in affected programs: 27334 -> 27019 (-1.15%)
   helped: 45
   HURT: 0

   total cycles in shared programs: 265692 -> 255188 (-3.95%)
   cycles in affected programs: 263122 -> 252618 (-3.99%)
   helped: 184
   HURT: 26

Tessellation Evaluation Shaders:

   total instructions in shared programs: 23236 -> 23157 (-0.34%)
   instructions in affected programs: 2791 -> 2712 (-2.83%)
   helped: 27
   HURT: 0

   total cycles in shared programs: 151858 -> 149704 (-1.42%)
   cycles in affected programs: 151858 -> 149704 (-1.42%)
   helped: 101
   HURT: 114

Geometry Shaders:

   Orbital Explorer goes from 6442 -> 6356 instructions.
   Two Shadow of Mordor shaders increase by a single instruction.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-22 10:26:30 -07:00
Topi Pohjolainen
1883613a24 i965/blorp: Add support for 2x msaa
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-22 17:02:29 +03:00
Topi Pohjolainen
125a7fdf32 i965/blorp: Add support for encoding/decoding interleaved 2x msaa
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-22 17:01:29 +03:00
Samuel Iglesias Gonsálvez
f70cacc4bd i965: don't lower mod() in glsl ir
NIR will lower it in nir_opt_algebraic.

No change in shader-db.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-22 13:44:28 +02:00
Timothy Arceri
72b5d00c9c glsl: fix cross validation for explicit locations on structs and arrays
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-22 20:59:57 +10:00
Nicolai Hähnle
39e9cf6cb1 radeonsi: implement TGSI_SEMANTIC_HELPER_INVOCATION
Depends on LLVM support introduced in r267102.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 23:14:04 -05:00
Ilia Mirkin
2bac561787 swr: ignore generated files in rasterizer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-04-22 00:07:25 -04:00
Ilia Mirkin
88ca4a43a2 nvc0: fix retrieving query results into buffer for timestamps
The timestamps are stored in a funny place, and even though they are a
64-bit result, are not stored with is64bit. Account for that when
retrieving the query result into a resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-04-22 00:06:49 -04:00
Jason Ekstrand
541e6c0500 i965/surface_state: Use libisl functions for image format lowering
This lets us delete some redundant code and keep all of the
image_load_store format lowering logic in one place: libisl.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
e53cabe730 i965/fs_surface_builder: Use isl instead of mesa for format info
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
1831fa104c i965/fs_surface_builder: Add a helper for converting GL to ISL formats
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
24bb75049b i965/fs_surface_builder: Explicitly handle FORMAT_NONE in num_image_coordinates
Previously, we were relying on has_matching_typed_format returning true for
MESA_FORMAT_NONE which, in turn, relied on _mesa_get_format_bytes returning
1 for MESA_FORMAT_NONE.  When we switch to ISL, this behaviour will no
longer be something we can rely on.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
f310c02b94 i965/fs_surface_builder: Take a GL format enum instead of mesa_format
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
2980507a19 isl/format: Add a get_num_channels helper
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
3415cf5f2f isl/format: Add more isl_format_has_type_channel functions
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
a4c04dd410 isl/format: Break the guts of has_[us]int_channel into a helper
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
ca8c5993bf anv/image: Use the has_matching_typed_storage_image_format helper from isl
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
65bd8317e2 isl: Add a helper for determining when a typed load/store can be used
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
90576ac963 isl: Take a devinfo in lower_storage_image_format instead of an isl_device
We want to call this function from the shader compiler and having a full
isl_device available at that point isn't practical.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
37f6f21b1f isl: Don't use designated initializers in the header
C++ doesn't support designated initializers and g++ in particular doesn't
handle them when the struct gets complicated, i.e. has a union.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
2785840586 isl: Include c99_compat.h
We need the restrict keyword in isl.h

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Jason Ekstrand
ef5dca2034 i965: Add a dependency on libisl
To avoid build issues, ensure that you're running `make' at the top level
and/or you've executed `make clean' beforehand.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-21 20:44:27 -07:00
Nicolai Hähnle
fe3b1e1448 radeon: handle query buffer allocation and mapping failures
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94984
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 22:33:12 -05:00
Nicolai Hähnle
b222580578 radeon: wire end_query return value to sw/hw_end
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 22:33:07 -05:00
Nicolai Hähnle
71f33a6f69 st/mesa: check return value of begin/end_query
They can only indicate out of memory conditions, since the other error
conditions are caught earlier.

v2: fix error message in EndQuery

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-21 22:33:03 -05:00
Nicolai Hähnle
32214e0c68 gallium: add bool return to pipe_context::end_query
Even when begin_query succeeds, there can still be failures in query handling.
For example for radeon, additional buffers may have to be allocated when
queries span multiple command buffers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 22:32:50 -05:00
Ben Widawsky
6a0d036483 i965: Always use Y-tiled buffers on SKL+
Starting with Skylake, the display engine is capable of scanning out from
Y-tiled buffers. As such, we can and should use Y-tiling for better efficiency.
This also has the added benefit of being able to fast clear the winsys buffer.

Note that the buffer allocation done for mipmaps will already never allocate an
X-tiled buffer for GEN9.

This has an almost universal positive impact on benchmarks, some improving by as
much as 20%.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 20:14:58 -07:00
Marek Olšák
c3b88cc2c1 softpipe: fix a warning due to an incorrect enum comparison
no change in behavior, because both are defined the same

Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák
c9e5a7df61 gallium: remove helpers converting to/from TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák
af249a7da9 gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák
fb523cb6ad gallium: merge PIPE_SWIZZLE_* and UTIL_FORMAT_SWIZZLE_*
Use PIPE_SWIZZLE_* everywhere.
Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE.
The new enum is called pipe_swizzle.

Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Marek Olšák
ed23335a31 gallium: use enums in p_shader_tokens.h (v2)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Acked-by: Jose Fonseca <jfonseca@vmware.com> (v1)

v2: name enums
2016-04-22 01:30:36 +02:00
Marek Olšák
0135bd44c2 gallium: use enums in p_defines.h (v2)
and remove number assignments which are consecutive

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Acked-by: Jose Fonseca <jfonseca@vmware.com> (v1)

v2: name enums
2016-04-22 01:30:34 +02:00
Marek Olšák
8cfc4cf76d radeonsi: remove the shader parameter from si_set_ring_buffer
not used anymore

this is a follow-up to the RW buffer cleanup.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-22 01:14:14 +02:00
Marek Olšák
3cbd8cfc7a radeonsi: decrease GS copy shader user SGPRs to 2
const buffers are no longer used since the clip plane const buffer was
moved to RW buffers

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák
3acaefb1bb radeonsi: shorten slot masks to 32 bits
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák
0954d5e982 radeonsi: clean up shader resource limit definitions
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák
3138a28ff2 radeonsi: move default tess level constant buffer to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:14 +02:00
Marek Olšák
302bec24bd radeonsi: move sample positions constant buffer to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák
860b658b97 radeonsi: move clip plane constant buffer to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák
698821bda3 radeonsi: rework polygon stippling to use constant buffer instead of texture
add it to the RW_BUFFERS descriptor array

now the slot masks don't have to have 64 bits

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák
bb1e647ada radeonsi: generalize si_set_constant_buffer
this will be used in the next commit

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák
36261c29cd radeonsi: make RW buffer descriptor array global, not per shader stage
v2: also simplify invalidation of RW buffer bindings (squashed)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Marek Olšák
1378487fb4 radeonsi: rename and rearrange RW buffer slots
- use an enum
- use a unique slot number regardless of the shader stage
  (the per-stage slots will go away for RW buffers)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-22 01:14:13 +02:00
Roland Scheidegger
4ff8cbb0d8 gallivm: fix bogus argument order to lp_build_sample_mipmap function
Screwed up since 0753b135f6.

(Only an issue with different min/mag filters, and then only in some cases,
which is probably why it went unnoticed for quite a while.
The effect should have simply been nearest mip filter instead of linear, iff
min was nearest, mag was linear, and all pixels hit the mignifying path.)

Fixes a bunch of dEQP failures.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-21 23:57:24 +02:00
Kenneth Graunke
73b01e2711 i965: Fix clear code for ignoring colormask for XRGB formats on Gen9+.
In commit cda886a485, Neil made us stop
advertising RGBX formats on Gen9+, as the hardware apparently no longer
has working fast clear support for those formats.  Instead, we just
fall back to RGBA formats, and use SCS to override alpha to 1.0.

This is fine, but had one unintended side effect: it made us fall back
to slow clears when the color mask disables alpha.  Normally, we ignore
the color mask for non-existent channels.  This includes alpha for XRGB
formats as writing garbage to the X channel is harmless.  But, now that
we use RGBA, we think there's a real alpha channel, and can't do the
optimization.

To hack around this, check if _BaseFormat is GL_RGB and ignore alpha.

Improves WebGL Aquarium performance on Skylake GT3e by about 50%
by letting it use repclears instead of slow clears.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-21 12:01:49 -07:00
Iago Toral Quiroga
bdaa0e12a2 i965/blorp: Improve precission of blitting coordinates when clipping
We do this in two steps: first we clip the dst rect and adjust the src
rect accordingly. Then we do it the other way around. In both passes
the adjustment part involves multiplying by a scale factor that can lead
to a small precision loss. This is breaking a few dEQP tests.

Specifically, the problem happens when we need to clip the same coordinate
twice. For example, if srcX0 and dstX0 need both to be clipped we want to
avoid the situation where we clip srcX0 first, then adjust dstX0 accordingly
but then we realize that the resulting dstX0 still needs to be clipped, so
we clip dstX0 and adjust srcX0 again. Each of these two passes can lead
to precission loss. What we want to do here is detect the rect that leads
to the largest clip (accounting for the scale factor involved), clip that
rect and adjust the other one. With this we ensure that the adjusted
coordinate does not need to be clipped again and we can skip a second pass,
improving precision.

Fixes the following 4 dEQP tests:
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_linear

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-04-21 10:43:39 -07:00
Bas Nieuwenhuizen
38f4cee3ff radeonsi: Add config parameter to si_shader_apply_scratch_relocs.
shader->config is not updated for compute kernels.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-21 19:36:19 +02:00
Matt Turner
1bc983cd64 glsl: Relax GLSL 1.10 float suffix error to a warning.
Float suffixes are allowed in all subsequent GLSL specifications, and
it's obvious what the user meant if they specify one. Accept it with a
warning to avoid breaking applications, like Planeshift (although it
looks like between 0.6.1 and 0.6.3 they might have removed the suffixes
from their shaders).

Reviewed-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:33:08 -07:00
Matt Turner
33565d6764 i965/fs: Readd opt_drop_redundant_mov_to_flags().
This reverts commit b449366587.

I removed the pass thinking that it was now not useful, but that was not
true. I believe I ran shader-db on HSW and saw no results, but HSW does
not use the unlit centroid workaround code and as a result does not emit
redundant MOV_DISPATCH_TO_FLAGS instructions.

On IVB, the shader-db results are:

total instructions in shared programs: 6650806 -> 6646303 (-0.07%)
instructions in affected programs: 106893 -> 102390 (-4.21%)
helped: 793

total cycles in shared programs: 56195538 -> 56103720 (-0.16%)
cycles in affected programs: 873048 -> 781230 (-10.52%)
helped: 553
HURT: 209

On SNB, the shader-db results are:

total instructions in shared programs: 7173074 -> 7168541 (-0.06%)
instructions in affected programs: 119757 -> 115224 (-3.79%)
helped: 799

total cycles in shared programs: 98128032 -> 98072938 (-0.06%)
cycles in affected programs: 1437104 -> 1382010 (-3.83%)
helped: 454
HURT: 237

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-21 10:32:40 -07:00
Topi Pohjolainen
0020ca3c92 i965/blorp: Do not emit pma stall on gen9+
This was left out from the original gen8 upload introduction.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 20:18:51 +03:00
Tim Rowley
81c1c481ed swr: add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT to get_param
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-21 11:32:09 -05:00
Emil Velikov
9dcb3dfb23 i965: automake: remove gratuitous "+" during variable assignment
There is not initial assignment, thus appending to it does not work.

Fixes: b27c85c4c0 "i965: add build rule for brw_nir_trig_workarounds.c"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-21 16:48:34 +01:00
Rob Herring
1ba203a085 gbm: add GBM_FORMAT_XBGR8888 format support
Add GBM_FORMAT_XBGR8888/__DRI_IMAGE_FORMAT_XBGR8888 format support which
is needed for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-21 14:45:56 +01:00
Rob Herring
ccdcf91104 st/dri: add 32-bit RGBX/RGBA formats
Add support for 32-bit RGBX/RGBA formats which are preferred for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-21 14:45:53 +01:00
Rob Herring
3b69076435 dri/common: add MESA_FORMAT_R8G8B8{A8, X8}_UNORM formats as supported configs
Add MESA_FORMAT_R8G8B8A8_UNORM and MESA_FORMAT_R8G8B8X8_UNORM formats as
these are the preferred formats for Android.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-21 14:45:21 +01:00
Rob Herring
b27c85c4c0 i965: add build rule for brw_nir_trig_workarounds.c on Android
Commit bfd17c76c1 ("i965: Port INTEL_PRECISE_TRIG=1 to NIR.") added a
generated file brw_nir_trig_workarounds.c which broke the Android build.
Add the necessary makefiles to the Android build.

Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-21 14:43:26 +01:00
Rob Herring
30239ba056 glsl: android: add back missing generated glcpp include path
Commit 4db8f15a25 ("glsl: move the android build scripts a level up")
dropped a generated include path for glcpp. Add it back adjusting for the
new location.

Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-21 14:43:21 +01:00
Jonathan Gray
28e3ae344b loader: add a libdrm case for loader_get_device_name_for_fd
Use dev_node_from_fd() with HAVE_LIBDRM to provide an implmentation
of loader_get_device_name_for_fd() for non-linux systems that
use libdrm but don't have udev or sysfs.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-21 14:41:41 +01:00
Jonathan Gray
5d09394fb1 i965/tiled_memcpy: don't unconditionally use __builtin_bswap32
Use the defines Mesa configure sets to indicate presence of the bswap32
builtins.  This lets i965 work on OpenBSD again after the changes that
were made in 0a5d8d9af4.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-21 14:41:41 +01:00
Jonathan Gray
9bbf3737f9 egl/x11: authenticate before doing chipset id ioctls
For systems without udev or sysfs that use drm ioctls in the loader
drm authentication must take place earlier or the loader will fail
"MESA-LOADER: failed to get param for i915".

Patch from Mark Kettenis.

Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
[Emil Velikov: remove gratuitous white-space]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-21 14:40:44 +01:00
Bas Nieuwenhuizen
4abe051a3f gallium/radeon: Silence possibly uninitialized variable warning.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 13:40:47 +02:00
Bas Nieuwenhuizen
51d1551241 winsys/amdgpu: Silence possibly uninitialized variable warning.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 13:40:42 +02:00
Bas Nieuwenhuizen
4d13c7c879 radeonsi: Enable loading into CE RAM.
We need to enable a bit in the CONTEXT_CONTROL packet for the
loads to work.

v2: Style issues.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-21 12:50:58 +02:00
Bas Nieuwenhuizen
f45f54e14a radeonsi: Use defines for CONTEXT_CONTROL instead of magic values.
v2: Use field names provided by Nicolai.
v3: Updated to use CONTEXT_CONTROL prefix.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 12:50:58 +02:00
Thomas Hindoe Paaboel Andersen
d4a21a0de0 winsys/amdgpu: fix preamble IB size
The missing break caused the IB size to be overwritten with
the size of IB_CONST.

This was introduced in: 7201230582

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-04-21 12:14:50 +02:00
Topi Pohjolainen
935ce14a44 i965/blorp: Reduce the urb size requirement for vertex buffer
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
26fdb7e51e i965/blorp: Reduce the size of vertex buffer
Previously the vertex buffer consisted of eight floats per vertex
of which six where constants. These can be as easily provided by
vertex fetcher as it is capable of filling vertex elements with
constant one and zero. This reduces the size of the vertex buffer
from 3 * 8 * 4 = 96 to 3 * 2 * 4 = 24 bytes.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
0ae360f098 i965/blorp: Do not tricker urb re-configuration unnecessarily
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
69dfb7b2b7 i965/blorp: Skip re-emitting urb config whenever possible
Otherwise clearing with blorp will regress performance in some
synthetic test cases.

v2: Used vsize >= 2 instead of vsize > 0, and updated the comment.
    Review by Ken in one of the earlier patches revealed this.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
7644e8ab68 i965/blorp: Prepare to switch from compute pipeline
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
aa322f8ae5 i965/blorp: Skip uploading state/options not needed for clears
In case there is no source it means the program does a simple
clear or a resolve. In such case there is no need to program
sampling state or enable pixel kill in fragment shader.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
87d333f2fe i965/blorp: Re-introduce clear programs
This partially reverts 2f28a0dc23

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
69c364f2dc i965/meta: Move check for srgb into is_color_fast_clear_compatible()
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
8a696e75d8 i965/meta: Expose check for fast clear compatibility
Also add the additional render format check to the same utility.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
a848ad6806 i965/meta: Expose fast clear value setup
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:03 +03:00
Topi Pohjolainen
fb14a2fc78 i965/meta: Expose non-fast clear rectangle calculation
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
9d79235e4e i965/meta: Expose resolve clear rectangle calculation
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
2757d723da i965/meta: Expose fast clear rectangle calculation
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
3ef957e783 i965: Declare input to mcs alignment calculation constant
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
c40b1efa70 i965/blorp: Switch the order of render and texture targets
On gen8 color resolving won't work anymore if the target isn't
the first entry in the binding table.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
0d062d79c3 i965/blorp: Reduce scope for generator and its inputs
Generator is only needed for getting the assembly.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
4c3de6b2d6 i965/blorp: Add support for disabling color blending
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
da5a477ce4 i965/blorp: Add support for setting fast clear operation
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
7de72f728b i965/blorp: Enable blits on gen8
v2 (Ken): Moved switch cases for gen8/9 in texel_fetch() to
          earlier patch adding gen8/9 sampling support.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
f7ab4e0cc4 i965/blorp: Prepare stencil sampling for gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:02 +03:00
Topi Pohjolainen
708453952b i965/blorp: Add check for supported sample numbers
v2 (Ken): Fix the condition on using meta for stencil blits:
          use_blorp -> !use_blorp

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:01 +03:00
Topi Pohjolainen
9e4d19372b i965/blorp: Add support for sampling 3D textures
This patch adds additional MOV instruction for all blorp programs
that use SHADER_OPCODE_TXF. Alternative is to augment blorp program
key to tell if z-coordinate is needed, add condition to the blorp
blit compiler and to produce a variant with and without the MOV.
This seems a little overkill.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:01 +03:00
Topi Pohjolainen
6b33d63d77 i965/blorp: Add support for source swizzle
In order to support cases where gen9 uses RGBA format to back client
requested RGB, one needs to have means to force alpha channel to one
when user requested RGB surface is used as blit source.

v2 (Ken): Use helper for constructing the swizzle (this should be
          changed to use brw_get_texture_swizzle() as a follow-up).
          Also calculate the swizzle for CopyTexSubImage.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:01 +03:00
Topi Pohjolainen
52e7008a5a i965/blorp: Pipeline upload support for gen8
v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state
          Add emission of pma stall.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:20:01 +03:00
Topi Pohjolainen
2fda441371 i965/gen8: Expose pma stall emission
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 10:19:30 +03:00
Topi Pohjolainen
8b2332e3d1 i965: Allow texture surface state setup to be used by blorp
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:42:10 +03:00
Topi Pohjolainen
0ad83d222b i965/blorp: Prepare sampling for gen9
v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These
          were wrongly introduced in blit-enabling patch.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:41:40 +03:00
Topi Pohjolainen
328ab6c268 i965/blorp: Prepare render target write for gen8
v2 (Ken): Use payload directly instead of retyping it into vec8.
          Drop the implied header, it isn't used for gen6+ anyway.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:40:33 +03:00
Topi Pohjolainen
135f00e666 i965/blorp/gen6: Prepare vertex buffer setup logic for gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:37:06 +03:00
Topi Pohjolainen
395abb9c3b i965/blorp/gen7: Expose state setup applicable to gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:36:53 +03:00
Topi Pohjolainen
ede09e672a i965/blorp: Use 8k chunk size for urb allocation
Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.

However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region.  This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.

v2 (Ken): Better description of the change, and do not change vs_size
          from 2 to 1.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:36:26 +03:00
Topi Pohjolainen
e04b3cdf33 i965/blorp/gen7: Prepare re-using for gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:36:14 +03:00
Topi Pohjolainen
f1ddfa8512 i965/blorp: Let compiler calculate the vertex buffer size
Currently the size is sizeof(float) times too large. One reserves
GEN6_BLORP_VBO_SIZE many floats whereas GEN6_BLORP_VBO_SIZE stands
for the size of vertex buffer in bytes.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:35:58 +03:00
Topi Pohjolainen
4c526370ca i965/gen8: Expose state base address setup
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:35:45 +03:00
Topi Pohjolainen
9949103756 i965/gen8: Expose surface state helpers
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:35:34 +03:00
Topi Pohjolainen
4f1d9f2879 i965/gen9: Use correct size for DS_STATE
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-21 08:32:12 +03:00
Roland Scheidegger
0295db2a8b glsl: add forgotten textureOffset function for sampler2DArrayShadow
This was part of EXT_gpu_shader4 - as such it should have been supported
by glsl 130.
It was however forgotten, and not added until glsl 430 - with the wrong
syntax no less (glsl 430 mentions it was overlooked).
glsl 440 (but revision 8 only) fixed this finally for good.
At least nvidia supports this with just version glsl version 1.30 as well
(the spec doesn't explicitly say it should be supported retroactively),
so just add this to the other glsl 130 textureOffset functions.

Passes a (hacked) piglit tex-miplevel-selection test (2DArrayShadow
textureOffset -auto) with llvmpipe.

v2: fix up comment (by Ian), add testing to commit message.

Reviewed-by: Dave Airlie <airlied@gmail.com>
2016-04-21 02:38:46 +02:00
Kenneth Graunke
d8c8f4203f i965: Fix interpolateAtSample() on single sampled buffers.
Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
447d3eec6a i965: Fix gl_SampleMaskIn[] in per-sample shading mode.
The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.

Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:

sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
66a725570c i965: Only enable oMask output when there's a multisample FBO.
The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."

The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:

"No changes to the fragment alpha or coverage values are made at this
 step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
 is not one."

oMask output alters coverage masks and can kill pixels.  We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.

Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188

Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0

Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7

Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
81407531e0 i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.
I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch.  This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.

The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value.  So we
can just eliminate the check and generalize the field.

v2: Also update the Vulkan driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
de0a46a040 i965: Delete now dead persample_2x FS program key flag.
This was only used by the old gl_SampleID calculations.  The new code
doesn't need to handle 2x specially.

v2: Delete it from the Vulkan driver, too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
57118a19da i965: Simplify gl_SampleID setup on Gen8+.
On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations.  We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case.  Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.

For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet.  However, it works fine on Gen8+.  So, land the
code and use it where it's working, and leave a TODO for later.

v2: Fix register types in the comment (caught by Matt Turner!).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Kenneth Graunke
528255b0b1 i965: Flip key->compute_sample_id check.
This just moves the simple case first.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 16:18:47 -07:00
Bas Nieuwenhuizen
43ed1f73f8 st/mesa: Use correct size for compute CAPs.
Some CAPs are stored as 64-bit value while Mesa stores
the related constant as 32-bit value.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-21 00:27:01 +02:00
Kenneth Graunke
60a17d0718 i965: Properly handle integer types in opt_vector_float().
Previously, opt_vector_float() always interpreted MOV sources as
floating point, and always created a MOV with a F-type destination.

This meant that we could mess up sequences of integer loads, such as:

   mov vgrf6.0.x:D, 0D
   mov vgrf6.0.y:D, 1D
   mov vgrf6.0.z:D, 2D
   mov vgrf6.0.w:D, 3D

Here, integer 0/1/2/3 become approximately 0.0f, so we generated:

   mov vgrf6.0:F, [0F, 0F, 0F, 0F]

which is clearly wrong.  We can properly handle this by converting
integer values to float (rather than bitcasting), and emitting a type
converting MOV:

   mov vgrf6.0:D, [0F, 1F, 2F, 3F]

To do this, see first see if the integer values (converted to float)
are representable.  If so, we use a D-type MOV.  If not, we then try
the floating point values and an F-type MOV.  We make zero not impose
type restrictions.  This is important because 0D would imply a D-type
MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D,
where we want to use an F-type MOV.

Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend.  This
recently became visible due to changes in opt_vector_float() which
made it optimize more cases, but it was a pre-existing bug.

Apparently it also manages to turn more integer loads into VFs,
producing the following shader-db statistics on Haswell:

total instructions in shared programs: 7084195 -> 7082191 (-0.03%)
instructions in affected programs: 246027 -> 244023 (-0.81%)
helped: 1937

total cycles in shared programs: 65669642 -> 65651968 (-0.03%)
cycles in affected programs: 531064 -> 513390 (-3.33%)
helped: 1177

v2: Handle the type of zero better.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 15:05:13 -07:00
Kenneth Graunke
1aa28f3509 i965: Make opt_vector_float() only handle non-type-conversion MOVs.
We don't handle this properly - we'd have to perform the type conversion
before trying to convert the value to a VF.

While we could do that, it doesn't seem particularly useful - most
vector loads should be consistently typed (all float or all integer).

As a special case, we do allow type-converting MOVs of integer 0, as
it's represented the same regardless of the type.  I believe this case
does actually come up.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 15:05:13 -07:00
Kenneth Graunke
2a25a5142b i965: Fold vectorize_mov() back into the one caller.
After the previous patch, this helper is only called in one place.
So, just fold it back in - there are a lot of parameters here and
not much code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 15:05:13 -07:00
Kenneth Graunke
9967561158 i965: Rework opt_vector_float() control flow.
This reworks opt_vector_float() so that there's only one place that
flushes out any accumulated state and emits a VF.

v2: Don't break the sequence for non-representable numbers - just skip
    recording their values.  Only break it for non-MOVs or register
    changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-20 15:05:13 -07:00
Jason Ekstrand
50018522d2 anv: s/anv_batch_emit_blk/anv_batch_emit/
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
0a45395902 anv: Remove the old emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
86c52bc757 anv/gen7_pipeline: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
744e133431 anv/gen7_cmd_buffer: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
cae2f14947 anv/device: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
932c353592 anv/state: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
9e9f3f4e71 anv/gen8_pipeline: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
dba3727bea anv/genX_pipeline: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
a48f8340d9 anv/gen8_cmd_buffer: Use the new emit macro
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
8a6ced83e9 anv/cmd_buffer: Use the new emit macro for quaries
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
db25e1eec5 anv/cmd_buffer: Use the new emit macro for DRAWING_RECTANGLE
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
deb13870d8 anv/cmd_buffer: Use the new emit macro for compute shader dispatch
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
06fc7fa684 anv/cmd_buffer: Use the new emit macro for 3DSTATE_CONSTANT
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
a71ded0e18 anv/cmd_buffer: Use the new emit macro for DEPTH/STENCIL_BUFFER
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
56453eeaff anv/cmd_buffer: Use the new emit macro for PIPE_CONTROL and STATE_BASE_ADDRESS
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
1d4d6852b4 anv/cmd_buffer: Use the new emit macro for 3DPRIMITIVE commands
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Jason Ekstrand
64ad2d3bcd anv: Add a new block-based batch emit macro
This new macro uses a for loop to create an actual code block in which to
place the macro setup code.  One advantage of this is that you syntatically
use braces instead of parentheses.  Another is that the code in the block
doesn't even get executed if anv_batch_emit_dwords fails.

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-20 14:54:09 -07:00
Samuel Pitoiset
d30768025a gk110/ir: make use of IMUL32I for all immediates
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-20 22:55:36 +02:00
Samuel Pitoiset
17a37c78fc gk110/ir: do not overwrite def value with zero for EXCH ops
This is only valid for other atomic operations (including CAS). This
fixes an invalid opcode error from dmesg. While we are it, make sure
to initialize global addr to 0 for other atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-20 22:55:33 +02:00
Marcin Ślusarz
3caf2e89aa anv: fix build without Wayland platform
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 11:12:10 -07:00
Laurent Carlier
6c952d8ac7 anv: fix building on i686 with -mcpu=generic
mcpu=generic doesn't enable sse2, and anvil definitly needs it

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 10:48:11 -07:00
Jason Ekstrand
2ef7aef322 spirv: Trivially handle the NonWriteable decoration
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 10:33:23 -07:00
Connor Abbott
b6dc940ec2 nir: rename nir_foreach_block*() to nir_foreach_block*_call()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 09:47:05 -07:00
Samuel Pitoiset
7143068296 nvc0: avoid tex read fault from compute shaders on GK110
After some investigation, it seems like that disabling the UNK02C4
command avoid a read fault with texelFetch() from a compute shader.

I have no clue on what this method actually does, but this avoid the
GPU to hang with basic-texelFetch.shader_test without introducing any
compute-related regressions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-20 18:28:47 +02:00
Jason Ekstrand
87a4fb516e i965/vec4: Always split uniforms in array_access_to_pull_constants
Normally, we split uniforms at the end but in Vulkan, we bail because we
don't want pull constants.  However, we still need them split because
pack_uniforms relies on it.

I really don't like this patch not because it doesn't work (it does) but
because now that we're using MOV_INDIRECT, uniform numbers and sizes don't
really matter anymore.  In the FS backend, uniform splitting and packing is
handled all at once (actual re-assignment of locations happens later) and
we really should do it that way in vec4 eventually as well.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand
b3f43822c7 i965/vec4: Use the correct offset for the swizzle shift in push constants
This was actually caught by Ken in review the first time around but somehow
didn't get fixed before the patches were pushed. :-(

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand
9f16e170fe i965/vec4: Use nir_intrinsic_base in the load_uniform implementation
We shouldn't be reading the const_index directly

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:15:01 -07:00
Jason Ekstrand
f63a95080f anv/apply_dynamic_offsets: Provide a range on the load_uniform
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
2016-04-20 09:14:58 -07:00
Jason Ekstrand
35b758c378 anv/lower_push_constants: Stop treating scalar specially
All of the code that did something special based on vec4 vs. scalar is
bogus.  In the backend, everything is now in units of bytes and the vec4
backend can handle full std140 packing so we don't need to do anything
special anymore.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
2016-04-20 09:14:47 -07:00
Tim Rowley
3bbe8a09ea swr: fix resource backed constant buffers
Code was using an incorrect address for the base pointer.

v2: use swr_resource_data() utility function.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-by: Markus Wick <markus@selfnet.de>
2016-04-20 09:57:55 -05:00
Hans de Goede
2ac2ecdd6c nouveau: codegen: Add support for OpenCL global memory buffers
Add support for OpenCL global memory buffers, note this has only
been tested with regular load and stores and likely needs more work
for e.g. atomic ops.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20 13:46:03 +02:00
Hans de Goede
61d52a5fb9 nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.

This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.

Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-20 13:46:03 +02:00
Jose Fonseca
f02f4d09ce scons: Build dri_common_interop.c. 2016-04-20 12:41:24 +01:00
Marek Olšák
4fa3d35cc5 st/dri: implement the GL interop DRI extension (v2.2)
v2: - set interop_version
    - simplify the offset_after macro
v2.1: - use version numbers, remove offset_after
      - set "out_driver_data_written"
v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too
      - add whandle.offset to buf_offset
      - disable the minmax cache for GL_TEXTURE_BUFFER
2016-04-20 12:18:47 +02:00
Marek Olšák
37d3a26bd6 glx: implement GLX part of interop interface (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák
b6eda70843 egl: implement EGL part of interop interface (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák
5e9ed261ed dri_interface: add interface for GL interop with other APIs (v2)
v2: - use const
2016-04-20 12:18:47 +02:00
Marek Olšák
6eeb729490 include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v4.2)
v2: - use "enum" to define stuff
v3: - more comments, define MESA_GLINTEROP_UNSUPPORTED
v4: - add mesa_glinterop_device_info::interop_version
    - more comments
    - remove #define MESA_GLINTEROP_VERSION
    - use const for "in"
v4.1: - use version numbers for structures
      - add "out_driver_data_written"
v4.2: - buf_offset & buf_size affect GL_ARRAY_BUFFER too, this is required
        for sharing suballocations within a larger buffer
2016-04-20 12:15:41 +02:00
Nicolas Dufresne
8093990ef4 st/dri: Fix RGB565 EGLImage creation
When creating egl images we do a bytes to pixel conversion by deviding
by 4 regardless of the pixel format. This does not work for RGB565. In
this patch, we avoid useless conversion and use proper API when the
conversion cannot be avoided.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-20 17:55:30 +09:00
Nicolas Dufresne
4463f38766 st/dri: Factor out DRI2 to PIPE_FORMAT conversion
This code is already duplicated twice and will be useful again. This
will also help when adding formats.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-20 17:34:03 +09:00
Rob Clark
899bd63ace freedreno/a4xx: lower srgb in shader for astc textures
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions.  But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures.  Work around this
by doing the srgb->linear conversion in the shader instead.

This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*

(The remaining fails seem to be a bug w/ ASTC + linear filtering, also
possibly a420.0 specific.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 17:14:04 -04:00
Rob Clark
eddfc97709 nir/lower-tex: add srgb->linear lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-19 17:13:50 -04:00
Rob Clark
eb00a0fc58 nir/builder: const'ify swiz param
No need for it not to be const, and lets caller declare it const if
desired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-19 17:13:36 -04:00
Rob Clark
52ccc6349f nir/lower-tex: make options a local var
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:12:49 -04:00
Rob Clark
d4ff42bd0a freedreno: cleanup fd_set_sampler_views
The separate FS/VS entrypoints are no longer used since a3ed98f.  So
just inline them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:11:47 -04:00
Russell King
fadfaa82c6 tgsi/lowering: improved lowering for LRP
Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King
67da7dd98a tgsi/lowering: improved lowering for XPD
Improve XPD lowering to consume less instructions by using the
MAD instruction to perform the multiply and subtraction together.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King
65460cf4c8 tgsi/lowering: add support for lowering TRUNC
Add support for lowering TRUNC using the following sequence:

	FRC tmpA, |src|
	SUB tmpA, |src|, tmpA
	CMP dst, -tmpA, tmpA

Note that this is incompatible with FRC lowering.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Russell King
23e870a888 tgsi/lowering: add support for lowering FLR and CEIL
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL.  Since
these uses FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.

We also need to deal with FLR instructions emitted by the lowering
code.  Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-19 16:04:44 -04:00
Bas Nieuwenhuizen
464cef5b06 radeonsi: enable TGSI support cap for compute shaders
v2: Use chip_class instead of family.

v3: Check kernel version for SI.

v4: Preemptively allow amdgpu winsys for SI.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen
1f32d5d59f radeonsi: Consider input SGPR count for compute shader SGPR count.
si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen
6c833ba1ab radeonsi: Add CE synchronization for compute dispatches.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:31:23 +02:00
Bas Nieuwenhuizen
e0b729c544 mesa/st: enable compute shaders if images are also supported
v2: Also depend on atomic counters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen
41d79bcbfa radeonsi: clean up compute flush
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:32 +02:00
Bas Nieuwenhuizen
7a92c08428 radeonsi: do not do two full flushes on every compute dispatch
v2: Add more CS_PARTIAL_FLUSH events.

Essentially every place with waits on finishing for pixel shaders
also has a write after read hazard with compute shaders.

Invalidating L2 waits implicitly on pixel and compute shaders,
so, we don't need a CS_PARTIAL_FLUSH for switching FBO.

v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.

According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
e764ee13ae radeonsi: split setting graphics and compute descriptors
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
061ce9399a radeonsi: split texture decompression for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
e56514f631 radeonsi: update predicate condition for compute dispatches
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
c3083d841e radeonsi: implement TGSI compute dispatch
v2: - Use radeon_set_sh_reg_seq.
    - Set predicate bit for conditional rendering.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
1349dd16ff radeonsi: only emit compute shader state when switching shaders
v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
ba1f66a73d radeonsi: rework compute scratch buffer
Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
107f4d3538 radeonsi: do per cs setup for compute shaders once per cs
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
52d3584dec radeonsi: don't pass scratch buffer to user SGPRs
As far as I can see we use relocations for clover too.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
422a19f76f radeonsi: split input upload off from si_launch_grid
Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.

v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
898298efc9 radeonsi: implement TGSI compute shader creation
v2: Moved scratch_enabled initialization after compile.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
85fd7817ee radeonsi: update shader count for compute shaders
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
da88c2a8e8 radeonsi: set maximum work group size based on block size
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
b082147b78 radeonsi: implement shared atomics
v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
8acf3e501b radeonsi: implement shared memory load/store
v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen
84a6761ae3 radeonsi: add shared memory
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
753a3e472b radeonsi: lower compute shader arguments
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
008d977d01 radeonsi: Use CE for all descriptors.
v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0b6c463dac gallium/util: Add u_bit_scan_consecutive_range64.
For use by radeonsi.

v2: Make sure that it works for all 64 bits set.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
058b54c624 radeonsi: Replace list_dirty with a mask.
We can then upload only the dirty ones with the constant engine.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
aabc7d61d6 radeonsi: Add CE uploader.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
0d7ddd6819 radeonsi: Allocate chunks of CE ram.
v2: Use 32 byte alignment.

v3: Don't allocate CE space for vertex buffer descriptors.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
86c71ff989 radeonsi: Add CE synchronization.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
fe1ef23b66 radeonsi: Add CE packet definitions.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
8fee75d606 radeonsi: Create CE IB.
Based on work by Marek Olšák.

v2: Add preamble IB.

Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.

The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.

v3: - Fix code style.
    - Remove needed space for vertex buffer descriptors.
    - Fail when the preamble cannot be created.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen
7201230582 winsys/amdgpu: Enlarge const IB size.
Necessary to prevent performance regressions due to extra flushing.

Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.

v2: Add preamble IB.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
7997b5f005 winsys/amdgpu: Add support for const IB.
v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00
Marek Olšák
e78170f388 winsys/amdgpu: split IB data into a new structure in preparation for CE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Marek Olšák
f4b77c764a gallium/radeon: move ring_type into winsyses
Not used by drivers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Jose Fonseca
1d2ac7a7ca llvmpipe: Call LLVMShutdown before exiting.
So that LLVM frees its globals.

Trivial.
2016-04-19 12:10:09 +01:00
Jose Fonseca
524042fa35 llvmpipe: Avoid LLVMGetGlobalContext in tests.
Trivial.
2016-04-19 12:10:02 +01:00
Jose Fonseca
bb9e8c5090 llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT.
64bits MSVCRT's exp2f(-inf) returns -inf instead of 0.  Tested with
MSVC 2013's CRT.  (I haven't tried 2015 yet.)

Also this does not happen with MinGW.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:53 +01:00
Jose Fonseca
ee9876be1d llvmpipe: Test more vector lengths.
All power of two of up native vector length.

There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest.  Fixing is left to the future, but the test is now
able to expect it to fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:44 +01:00
Jose Fonseca
932b71f17d gallivm: Avoid llvm::sys::getProcessTriple().
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway,

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:37 +01:00
Jose Fonseca
b5ca689cee gallivm: Remove lp_get_module_id.
Just keep a copy of the module_name in gallivm.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:26 +01:00
Jose Fonseca
969ba8bfa7 gallivm: Fix MCJIT with LLVM 3.3.
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.

This regressed in commits 065256df/75ad4fe7 when trying to make the
code to build with LLVM 3.6.

Tested MCJIT with LLVM 3.3 to 3.6.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:17 +01:00
Jose Fonseca
cf4105740f gallivm: Make MCJIT a runtime option.
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.

The new option is GALLIVM_MCJIT.

Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows.  I'm almost certain this used to
work, so there probably is a regression somewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:14 +01:00
Jose Fonseca
7d2151b6ea scons: Show the unit test full path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:11 +01:00
Jose Fonseca
2211f8d559 gallivm: Use LLVMSetTarget.
Instead of LLVM C++ interfaces.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:31:00 +01:00
Jose Fonseca
9aa23b11e4 gallivm: Use LLVMPrintValueToString where available.
And llvm::raw_string_ostream where not (LLVM 3.3).

Thereby eliminating yet another dependency on unstable LLVM interfaces.

As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)

Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:37 +01:00
Jose Fonseca
f6621cd3be gallium/tests: Update UTIL_FORMAT_MAX_* defines.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-19 11:28:16 +01:00
Jose Fonseca
121a0cedc8 Revert "nv50/ra: isinf() is in namespace std since C++11."
This reverts commit f525db6358.

It was superseeded by commit 649704f1f7.
2016-04-19 11:22:45 +01:00
Eric Anholt
802b9292aa vc4: Fix fbo-generatemipmap-formats for NPOT.
Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but
we only get one value to control the stride of the src and dst for single
sampled buffers.  A RCL tile blit from level != 1 to level == 0 would
therefore load from the wrong stride.
2016-04-18 16:55:36 -07:00
Eric Anholt
2402bb6095 vc4: Remove unused "immediates" field
This was for TGSI, which we no longer have to deal with.
2016-04-18 16:48:45 -07:00
Ben Widawsky
2408899cb2 i965: Define miptree map functions static (trivial)
They were already declared as such. It was changed here:
commit 31f0967fb5
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 2 14:43:18 2015 -0700

    i965: Make intel_miptree_map_raw static

Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-18 16:12:13 -07:00
Matt Turner
b1d9353cb5 glsl: Properly handle ldexp(0.0f, non-zero-exp). 2016-04-18 15:48:54 -07:00
Dave Airlie
3a26ef23e7 gallivm: convert size query to using a set of parameters.
This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.

[airlied: use some local variables (Roland)]

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-19 07:33:39 +10:00
Tim Rowley
3227c10270 swr: dereference cbuf/zbuf/views on context destroy
Fixes resource memory leaks.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-18 15:52:26 -05:00
Rob Clark
77a9107bf2 freedreno/ir3: fix grouping issue w/ reverse swizzles
When we have something like:

   MOV OUT[n], IN[m].wzyx

the existing grouping code was missing a potential conflict.  Due to
input needing to be sequential scalar regs, we have:

 IN:  x <-> y <-> z <-> w

which would be grouped to:

 OUT: w <-> z2 <-> y2 <-> x  (where the 2 denotes a copy/mov)

but that can't actually work.  We need to realize that x and w are
already in the same chain, not just that they aren't both already in
new chain being built.

With this fixed, we probably no longer need the hack from f68f6c0.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-18 15:41:32 -04:00
Marek Olšák
ed66c75784 radeonsi: use enums in si_shader.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
0c52caf7b7 gallium/radeon: use enums in r600_query.h
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
dd9ca77cb9 radeonsi: always use PFP_SYNC_ME when doing flushes and waits
This is typically used by the closed driver before SURFACE_SYNC.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
1db5678688 radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
58494b42b5 radeonsi: add safety assertions for meta cache flushes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
78f58a4e6f radeonsi: don't use ACQUIRE_MEM on the graphics ring
It's only required on the compute ring. This matches the closed driver.

The compute flag is removed to prevent confusion and Bas's compute shader
patches remove it in the whole function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
3faecdd4e1 radeonsi: remove TODO and correct a comment in si_emit_cache_flush
Yes, that flag is really needed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:25 +02:00
Marek Olšák
28c2573b4f radeonsi: don't flush CB/DB caches for performance counters
I'm not sure about this. This will make the engines go idle, but the caches
will be unflushed. This should match app behavior without performance
counters, which can be a good thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
97c328b2a3 gallium/radeon: don't flush CB/DB caches for timestamp queries
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
6dc21b1962 gallium/util: fix undefined shift to the last bit in u_bit_scan
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-18 19:51:24 +02:00
Marek Olšák
9434aa8103 gallium/util: fix u_bit_scan_consecutive_range for mask == 0xffffffff
The second ffs returns 0, yielding count == -1.

v2: change 1 to 1u

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-18 19:51:24 +02:00
Marek Olšák
e50e1f86b0 gallium/radeon: fix Nine with its slightly shifted viewports
just need to do the calculation in floating-point and then round things
properly

Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-04-18 19:51:24 +02:00
Erik Faye-Lund
ee5b35142a docs: correct name for GL_OES_primitive_bounding_box
When this extension was added, an underscore were mistakenly replaced
by a space. Let's correct this, so it's a tad easier to grep for this
extension.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
2016-04-18 10:48:57 -07:00
Kenneth Graunke
c092f9b96a meta: Don't botch color masks when changing drawbuffers.
Color clears should respect each drawbuffer's color mask state.

Previously, we tried to leave the color mask untouched.  However,
_mesa_meta_drawbuffers_from_bitfield() ended up rebinding all the
color drawbuffers in a different order, so we ended up pairing
drawbuffers with the wrong color mask state.

The new _mesa_meta_drawbuffers_and_colormask() function does the
same job as the old _mesa_meta_drawbuffers_from_bitfield(), but
also rearranges the color mask state to match the new drawbuffer
configuration.

This code was largely ripped off from Gallium's st_Clear code.

This fixes ES31-CTS.draw_buffers_indexed.color_masks, which binds
up to 8 drawbuffers, sets color masks for each, and then calls
glClearBufferfv to clear each buffer individually.  ClearBuffer
causes us to rebind only one drawbuffer, at which point we used
ctx->Color.ColorMask[0] (draw buffer 0's state) for everything.

We could probably delete _mesa_meta_drawbuffers_from_bitfield(),
but I'd rather not think about the i965 fast clear code.  Topi is
rewriting a bunch of that soon anyway, so let's delete it then.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94847
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-18 10:39:31 -07:00
Kenneth Graunke
a33f94ba8c meta: Don't smash ColorMask when using MESA_META_COLOR_MASK save bit.
This allows meta operations to inspect the existing color mask, and
then do their own smashing.

BlitFramebuffer and Clear already override the color mask, so this
was also redundant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-18 10:39:26 -07:00
Eric Anholt
48fe53bbb9 vc4: Add support for rendering to cube map surfaces.
We need to fix up the offset to point at the face of the cube.  Fixes
piglit fbo-cubemap, copyteximage CUBE, and glean's fbo test.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-18 10:10:44 -07:00
Eric Anholt
21a9ed6207 vc4: Don't flush on read-only access of buffers read by the CL.
Fixes piglit mixed-immediate-and-vbo, and may significantly improve
performance of applications that store a 4-byte IB in the same VBO as
vertex data.
2016-04-18 10:10:44 -07:00
Eric Anholt
9e8a8b0c8b vc4: Sanity check that flushes don't happen between state emit and draw.
Catches the cause of failure in
arb_vertex_buffer_object-mixed-immediate-and-vbo, I've had this class of
failure before, and it probably won't be the last time.
2016-04-18 10:10:44 -07:00
Eric Anholt
56b14adf85 vc4: Sanity check strides for imported BOs.
If we're going to sample from or render to them at some particular size,
we'd better make sure that they actually are that size.  Causes some tests
under simulation to generate appropriate error messages instead of
failures.
2016-04-18 10:10:44 -07:00
Pierre Moreau
649704f1f7 math: Import isinf and others to global namespace
Starting from C++11, several math functions, like isinf, moved into the std
namespace. Since cmath undefines those functions before redefining them inside
the namespace, and glibc 2.23 defines the C variants as macros, the C variants
in global namespace are not accessible any longer.

v2: Move the fix outside of Nouveau, as suggested by Jose Fonseca, since anyone
    might need it when GCC switches to C++14 by default with GCC 6.0.

v3:
*   Put the code directly inside c99_math.h rather than creating a new header
    file, as asked by Jose Fonseca;
*   Guard the code behind glibc version checks, as only glibc > =2.23 defines
    isinf & co. as functions, as suggested by Jose Fonseca.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-18 11:10:25 +01:00
Oded Gabbay
d3c98c73dc r600g: Move R600_BIG_ENDIAN to r600_pipe_common.h
I need to do this so I could use R600_BIG_ENDIAN in files which include
r600_pipe_common.h but not r600_pipe.h

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-18 09:50:08 +03:00
Oded Gabbay
72d0d2ba59 r600g: fix code indentation
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-18 09:50:08 +03:00
Emil Velikov
a998e49259 docs: add news item and link release notes for 11.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-17 23:32:41 +01:00
Emil Velikov
50eeb5fb16 docs: add sha256 checksums for 11.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 596c6504b3)
2016-04-17 23:32:41 +01:00
Emil Velikov
c1bf47ada2 docs: add release notes for 11.1.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ca2fbf6f8f)
2016-04-17 23:32:41 +01:00
Roland Scheidegger
d11111a551 gallivm: don't use vector selects with llvm 3.7
llvm 3.7 sometimes simply miscompiles vector selects.
See https://bugs.freedesktop.org/show_bug.cgi?id=94972

This was fixed in llvm r249669
(https://llvm.org/bugs/show_bug.cgi?id=24532).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-18 00:23:34 +02:00
Dave Airlie
b3616f1326 nir: only dereference undef after NULL check. (v2)
Pointed out by coverity.

v2: nuke line, Jason pointed out the constructor does it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-18 07:37:48 +10:00
Emil Velikov
96b4cfe834 docs: update the sha256 checksums for 11.2.1
Turns out the previous tarballs got corrupted during upload which I
carelessly forgot to check prior to deleting the local ones.
Lesson learned - double check before removing the local ones.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 79b0e13913)
2016-04-17 19:32:20 +01:00
Emil Velikov
2197581816 docs: add news item and link release notes for 11.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-17 18:36:59 +01:00
Emil Velikov
03a234c1d1 docs: add sha256 checksums for 11.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c65835d812)
2016-04-17 18:36:59 +01:00
Emil Velikov
c15f457958 docs: add release notes for 11.2.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 21e6440e82)
2016-04-17 18:36:59 +01:00
Jason Ekstrand
f30f6e2625 i965/fs: Don't allow OOB array access of images
We have had a guard against OOB array access of images on IVB for a long
time, but it can actually cause hangs on any GPU generation.  This can
happen due to getting an untyped SURFACE_STATE for a typed message.  We
didn't used to hit this with the piglit test on anything other than IVB
because the OOB in the test would cause us to go past the top of the pull
constant UBO and we would get a surface index of 0 which is was always a
valid surface.  Now that we're pushing small arrays, we can end up grabbing
garbage from the GRF and going to some random index which causes a hang.
The solution is to just do the bounds check on all hardware.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94944
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-04-15 22:47:33 -07:00
Jason Ekstrand
93db828e42 anv/device: Images are only enabled in scalar stages
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-15 16:40:56 -07:00
Marek Olšák
c1a2fe7fd1 gallium/radeon: handle vertex shaders that disable clipping & viewport
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-16 00:21:15 +02:00
Nanley Chery
696d8ff5a1 mesa/texstore: Use Driver.CompressedTexSubImage in the default CompressedTexImage
Enable drivers to use their own implementation of this method instead of
the mesa default. Since the drivers that currently overwrite
dd_function_table::CompressedTexSubImage also overwrite
::CompressedTexImage, there should be no behavioral change.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-15 15:06:27 -07:00
Jason Ekstrand
5ec4ecce44 anv: Advertise vertexPipelineStoresAndAtomics based on scalar stages
Previously, we just looked at the hardware generation but this meant that
if you did INTEL_DEBUG=vec4 on BDW or SKL, you would have advertised but
non-working features.
2016-04-15 14:53:16 -07:00
Jason Ekstrand
0166ad6ced i965/vec4: Support full std140 layout for push constants
Up until now, we have been able to assume that all push constants are
vec4-aligned because this is what the GL driver gives us.  In Vulkan, we
need to be able to support full std140 because we get the layout from the
client.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 14:04:38 -07:00
Jason Ekstrand
a112391d52 i965/vec4: Handle MOV_INDIRECT in pack_uniform_registers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 14:04:38 -07:00
Jason Ekstrand
aaac8a1890 i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 14:04:38 -07:00
Jason Ekstrand
61ee5e62a2 i965/vec4: Use can_do_writemask in can_reswizzle
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 14:04:38 -07:00
Jason Ekstrand
75b68f9114 i965/vec4: Move can_do_writemask to vec4_instruction
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 14:04:37 -07:00
Chad Versace
4a80890177 util: Fix warning of invalid return value
_mesa_libgcrypt_init() returns NULL, but its return type is void.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2016-04-15 15:00:58 -07:00
Jason Ekstrand
cab30cc5f9 Merge branch 'vulkan' 2016-04-15 13:52:34 -07:00
Roland Scheidegger
64d3ae09b7 llvmpipe: (trivial) initialize src1_alpha var to NULL
The blend code would do a conditional assignment based on it, causing valgrind
to complain. Since that variable was actually unused in this case, this
doesn't fix anything but the warning.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94955
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-15 22:51:28 +02:00
Jason Ekstrand
d8b85c96d1 Merge remote-tracking branch 'public/master' into vulkan 2016-04-15 13:35:16 -07:00
Jason Ekstrand
1a100d4f28 configure: Add support for the Intel Vulkan driver
This adds a --with-vulkan-drivers option with one driver, "intel".  In the
future, we may add more drivers to this list.

v2: Don't enable any drivers by default.  This should prevent this patch
    from breaking anyone's build.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 13:29:29 -07:00
Jason Ekstrand
ce7e82fb6f i965/surface_formats: Update some formats for more recent gens
The surface format table hasn't entirely been kept up-to-date.  This commit
marks a couple more compressed formats as sampleable on gen8+ and adds the
A4B4G4R4 format as renderable on gen9.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-15 13:29:29 -07:00
Jason Ekstrand
7dac4a2889 util/list: Add list splicing functions
This adds functions for splicing one list into another.  These have
more-or-less the same API as the kernel list splicing functions.  The
implementation, however, was stolen from the Wayland list implementation.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2016-04-15 13:29:09 -07:00
Jason Ekstrand
17a181bfa6 Remove the Intel Vulkan readme 2016-04-15 13:17:08 -07:00
Tim Rowley
082f6d75ae gallium/swr: confine c++11 flag to swr driver
On the philosophy that a driver shouldn't change the compile flags
for the entire tree, take the clove approach of moving the c++11 flag
to the swr driver directory.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-15 14:43:01 -05:00
Tim Rowley
ee72fec9cf gallium/swr: allow swr use as a swrast dri driver
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-15 14:21:50 -05:00
Eric Anholt
f6d21bcd6b vc4: Fix subimage accesses to LT textures.
This code started out like the T case, iterating over utile offsets, but I
had partially switched it to iterating over pixel offsets.  I hadn't
caught this before because it's unusual to do piecemeal uploads to small
textures.

Fixes bad text rendering in QT5 apps, which use a 256x16 glyph cache.
Also fixes 6 piglit tests related to glTexSubImage() and
glGetTexSubImage().

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-15 11:57:17 -07:00
Mark Janes
ade3108bb5 util: Fix race condition on libgcrypt initialization
Fixes intermittent Vulkan CTS failures within the test groups:
dEQP-VK.api.object_management.multithreaded_per_thread_device
dEQP-VK.api.object_management.multithreaded_per_thread_resources
dEQP-VK.api.object_management.multithreaded_shared_resources

Signed-off-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94904

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-15 10:24:40 -07:00
Jason Ekstrand
8403e6de9f i965: Default to scalar GS 2016-04-15 09:54:42 -07:00
Jason Ekstrand
17d9a2b011 i965/surface_formats: Mark A4B4G4R4_UNORM as SKL+ only
This is what is indicated by the bspec.
2016-04-15 09:53:55 -07:00
Jason Ekstrand
d7189bdeee Revert "i965/fs: Properly write-mask spills"
This reverts commit 9c0109a1f6.
2016-04-15 09:53:55 -07:00
Jason Ekstrand
c3362453f9 Revert "i965/fs: Feel free to spill partial reads/writes"
This reverts commit 2434ceabf4.
2016-04-15 09:53:55 -07:00
Jason Ekstrand
2d5bd66e4f configure: Add support for detecting valgrind headers
We have several places where the Vulkan driver explicitly hooks into
valgrind when it's available.  We need to be able to detect it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-15 09:41:15 -07:00
Eduardo Lima Mitev
7e4628da48 nir/print: Fix printing variable mode
nir_variable_mode is currently a bitflag enum, while
nir_print::print_var_decl() assumes is still a numbered list.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-15 16:41:41 +02:00
John Sheu
f8752e0d95 xlib: remove MESA_GLX_VISUAL_HACK
This removes a hack introduced in 1999 in the first version of
fakeglx.c, with the comment:

  /* XXX revisit this after 3.0 is finished. */

Mesa 4.0 was released in 2001.  It is now 2016, and Mesa 11.0 was
released last year.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-04-15 07:46:00 +02:00
John Sheu
8a9c0f1025 xlib: fix leaks of returned values from XGetVisualInfo
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-04-15 07:45:46 +02:00
John Sheu
781232e0ac xlib: fix memory leak of and remove vishandle from XMesaVisualInfo
The vishandle member of XMesaVisualInfo is used to support the
comparison of XVisualInfo instances by pointer value, in
find_glx_visual().  The comparison however will always be false, as in
every case the comparison is made, the VisualInfo instance being
compared to is a new allocation passed in through a GLX API call.

In addition, the XVisualInfo instance pointed to by vishandle is itself
never freed, causing a memory leak.  Since vishandle is essentially
useless, we just remove it and thereby also fix the leak.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-04-15 07:45:28 +02:00
John Sheu
fe9d8cd79e xlib: do not cache return value of glXChooseVisual/glXGetVisualFromFBConfig
The returned XVisualInfo from glXChooseVisual/glXGetVisualFromFBConfig
is being cached in XMesaVisual.vishandle (and unconditionally
overwritten on subsequent calls).  However, these entry points are
specified to return XVisualInfo instances to be owned by the caller and
freed with XFree(), so the return values should not be retained.

With this change, XMesaVisual.vishandle is essentially unused and will
be removed in a subsequent change.

v2: update commit message

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-04-15 07:44:34 +02:00
Jason Ekstrand
76fa7b16f4 Merge remote-tracking branch 'public/master' into vulkan 2016-04-14 18:30:52 -07:00
Jason Ekstrand
547032c56a main/mtypes: Remove the "set" parameter from gl_uniform_block
This is a left-over from the early days of the Vulkan driver
2016-04-14 18:27:09 -07:00
Jason Ekstrand
f0bbb34e49 Revert "i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT"
This reverts commit 4115648a6b.  This commit
was half-baked and probably never should have been committed.  We'll add
this back in properly later when we need it.
2016-04-14 18:22:08 -07:00
Jason Ekstrand
eeff133158 i965: Expose the surface format table
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 18:07:48 -07:00
Jason Ekstrand
d7cddbd6d6 nir/lower_io: Add UBOs and SSBOs to get_io_offset_src
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 18:07:40 -07:00
Jason Ekstrand
c825e29a82 nir/intrinsics: Add a vulkan_resource_index intrinsic
This is used to facilitate the Vulkan binding model where each resource is
described by a (descriptor set, binding, array index) tuple.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-14 17:20:05 -07:00
Jason Ekstrand
1e0012e3e4 nir: Add a descriptor_set field to nir_variable
This is needed for supporting the Vulkan binding model

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-14 17:20:05 -07:00
Chad Versace
7a835b3fd9 dri: Fix robust context creation via EGL attribute
driCreateContextAttribs() emits an error if bit
__DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS is set for an ES context.  But,
EGL_EXT_create_context_robustness and EGL 1.5 both allow creation of
robust ES contexts. One requests a robust ES context by setting the
EGL_CONTEXT_OPENGL_ROBUST_ACCESS *attribute*, which Mesa's EGL layer
translates into the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS *bit*.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-14 17:38:41 -07:00
Jason Ekstrand
5567ae0547 Merge remote-tracking branch 'public/master' into vulkan 2016-04-14 17:14:28 -07:00
Leo Liu
8f4340c5e6 radeon/uvd: fix tonga feedback buffer size
This only applies to tonga

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-14 19:33:44 -04:00
Jason Ekstrand
f1d29099b4 i965: Push everything if pull_param == NULL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 16:00:18 -07:00
Jason Ekstrand
963513bb24 i965/fs: Push small uniform arrays
Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations.  The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing.  Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up.  To do this,
we use an algorithm similar to the on in split_virtual_grfs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
71f8039f72 i965/fs: Rename demote_pull_constants to lower_constant_loads
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
8e76f664be i965/vec4: Get rid of the uniform_size array
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
056849772f i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants
This commit moves us to an instruction based model rather than a
register-based model for indirects.  This is more accurate anyway as we
have to emit instructions to resolve the reladdr.  It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.

One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants.  This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
479e38ad63 i965/fs: Get rid of the param_size array
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
30874216cb i965/fs: Stop relying on param_size in assign_constant_locations
Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode.  With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore.  The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other.  In order to deal with
this, we make the first pass just flag what needs pulling and handle
assigning pull constant locations until later.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
275855f315 i965/fs: Get rid of reladdr
We aren't using it anymore.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
3c93cdfaf5 i965/fs: Use MOV_INDIRECT for all indirect uniform loads
Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load.  We also have to
rework some of the other bits of the backend to handle this new form of
uniform load.  The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
63101177f3 nir: Add another index to load_uniform to specify the range read
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
27bd8ac6f3 i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant.  This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
889e6054b7 i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
The subnr field is in bytes so we don't need to multiply by type_sz.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
7e08a13009 i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
It should work fine without it and the visitor can set it if it wants.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
40a8fe04dc i965/fs: Add support for doing MOV_INDIRECT on uniforms
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 15:59:33 -07:00
Jason Ekstrand
48cc8c284a anv: Install the installable ICD 2016-04-14 15:15:00 -07:00
Jason Ekstrand
e40b867145 anv/intel_icd: Don't provide an absolute path
The driver will be installed to $(libdir)/libvulkan_intel.so and just
providing a driver name is enough for the loader.  This also ensures that
multi-arch systems work ok.
2016-04-14 15:15:00 -07:00
Jason Ekstrand
ca16373a2b configure: Add initial support for enabling Vulkan drivers 2016-04-14 15:15:00 -07:00
Jason Ekstrand
e61c812f76 anv/pipeline: Use the right mask for lower_indirect_derefs 2016-04-14 15:13:29 -07:00
Ben Widawsky
a8975a91cc i965: Make intel_get_param return an int
This will fix the spurious error message: "Failed to query GPU properties."
that was unintentionally added in cc01b63d73.

This patch changes the function to return an int so that the caller is able to
do stuff based on the return value.

The equivalent of this patch was in the original series that fixed up the
warning, but I dropped it at the last moment. It is required to make the desired
behavior of not warning when trying to query GPU properties from the kernel
unless there is something the user can do about it.

v2: Use strerror (Jason)
Make EINVAL check similar in all places (Ian)

NOTE: Broadwell appears to actually have some issue where the kernel returns
ENODEV when it shouldn't be. I will investigate this separately.

Reported-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-04-14 15:13:22 -07:00
Brian Paul
aed975d5c5 st/mesa: fix sampler view leak in st_DrawAtlasBitmaps()
I neglected to free the sampler view which was created earlier in the
function.  So for each glCallLists() command that used the bitmap atlas
to draw text, we'd leak a sampler view object.

Also, check for st_create_texture_sampler_view() failure and record
GL_OUT_OF_MEMORY.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-04-14 15:32:18 -06:00
Nicolai Hähnle
a17911ceb1 gallium/radeon: handle failure when mapping staging buffer
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-14 16:29:23 -05:00
Nicolai Hähnle
8bd0f0df50 radeonsi: mark ssbo and images descriptor pointers dirty at beginning of CS
Without this, we were getting non-deterministic VM faults under high pressure.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-14 16:29:23 -05:00
Jason Ekstrand
cb372b39ea i965/vec4: Use UD rather than D for uniform indirects
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 14:25:01 -07:00
Jason Ekstrand
240d16ea94 i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
Reveiewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-14 14:24:57 -07:00
Samuel Pitoiset
bb4cdee9a4 nvc0: do not break the universe on GK110+
I removed that return 0 by mistake. Ooops.

Fixes: 6e23fd4 ("nvc0: allow to use compute support on GM200")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-14 21:57:21 +02:00
Samuel Pitoiset
6e23fd420d nvc0: allow to use compute support on GM200
This works like a charm but please not that NVF0_COMPUTE have to be set
because compute support is still not enabled by default on GK110+. This
will require more testing to make sure it won't break the 3D state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-14 21:01:51 +02:00
Jason Ekstrand
34b5db17d9 i965: remove pointless diff with the master branch 2016-04-14 10:39:54 -07:00
Jason Ekstrand
769b5614f8 nir/opt_algebraic: Remove the encoding line
This is an unneeded diff between the vulkan and master branches
2016-04-14 10:35:40 -07:00
Jason Ekstrand
c34be07230 spirv: Move to compiler/
While it does rely on NIR, it's not really part of the NIR core.  At the
moment, it still builds as part of libnir but that can be changed later if
desired.
2016-04-14 10:28:47 -07:00
Jason Ekstrand
bfa3a38280 nir: Remove some pointless delta between vulkan and master 2016-04-14 10:24:33 -07:00
Jose Fonseca
ffcc00ce30 scons: Build NIR.
Emil Velikov:
 - Attribute the src/{glsl,compiler}/nir move
 - Flesh out to separate SConscript

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-14 16:38:59 +01:00
Jose Fonseca
feb6732e80 nir: Use _snprintf on Windows.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-14 16:38:37 +01:00
Jose Fonseca
ba0c0e3940 nir: Avoid structure initalization expressions.
Not supported by MSVC, and completely unnecessary -- inline functions
work just as well.

NIR_SRC_INIT/NIR_DEST_INIT could and probably should be replaced by the
inline functions.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-14 16:38:37 +01:00
Jose Fonseca
8f96524f13 nir: Remove unistd.h include.
It doesn't seem needed, and is not available on MSVC.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-14 16:38:31 +01:00
Jose Fonseca
f8e2f1fba5 nir: Avoid empty {} struct initializer.
Not supported by MSVC and consistent through NIR.

[Emil Velikov: rebase]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-14 16:33:52 +01:00
Emil Velikov
bb949e262c gallium/swr: fold the almost identical Makefiles
Rather than having two almost identical Makefiles, with various VPATH
hacks just fold them, using COMMON_* variables and actually getting
things buildable/shipable.

v2: whitespace fixes, remove Makefile.sources-arch

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-04-14 16:30:57 +01:00
Tim Rowley
aee976703d install-gallium-links.mk: handle multiple libraries
Need to prevent bash from interpreting whitespace between libraries
as a command line.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-14 16:30:57 +01:00
Marek Olšák
112291964e radeonsi: don't overwrite the scratch offset in shader prologs
Prologs only look at num_input_sgprs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
ffe44d0283 radeonsi: fold num_user_sgprs where it is possible
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
51c4034f9b radeonsi: fix SGPRS calculation once more
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.

If this was fixed in LLVM, these workarounds wouldn't be needed.

LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
aaf5be4a29 radeonsi: disable hw ETC2 on Polaris
not supported by hw directly, but it's still fully supported by the driver

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-14 16:58:59 +02:00
Emil Velikov
4358cfc4ad doxygen: remove git rebase fallouts
Should never have been (git) added in the first place.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-14 09:49:09 +01:00
Jose Fonseca
8fcacb4f90 appveyor: Run unit tests.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jose Fonseca
50ddf03ada scons: Add a "check" target to run all unit tests.
Except:
- u_cache_test -- too long
- translate_test -- unreliable (it's probably testing corner cases that
  translate module doesn't care about.)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jose Fonseca
9ae0e8ee3c test/unit: Make translate_test invoke translate_create by default.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jose Fonseca
f8a51034bd test/unit: Make pipe_barrier_test actually check correct bahavior.
So it can run unattended.

Also make it silent by default.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jason Ekstrand
12f88ba32a Merge remote-tracking branch 'public/master' into vulkan 2016-04-13 20:25:39 -07:00
Michel Dänzer
171a570f38 clover: Fix build against LLVM SVN >= r266163
createInternalizePass now takes a callback instead of a StringSet.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-04-14 11:53:41 +09:00
Nanley Chery
79fbec30fc anv: Remove default scissor and viewport concepts
Users should never provide a scissor or viewport count of 0 because
they are required to set such state in a graphics pipeline. This
behavior was previously only used in Meta, which actually just
disables those hardware operations at pipeline creation time.

Kristian noticed that the current assignment of viewport count
reduces the number of viewport uploads, so it is not removed.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 18:02:38 -07:00
Nanley Chery
1949e502bc anv: Replace ::disable_scissor with ::use_rectlists
Meta currently uses screenspace RECTLIST primitives that lie within
the framebuffer rectangle. Since this behavior shouldn't change in the
future, disable the scissor operation whenever rectlists are used.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 18:00:41 -07:00
Nanley Chery
9f72466e9f anv: Delete anv_graphics_pipeline_create_info::disable_viewport
There are no users of this field.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 18:00:41 -07:00
Nanley Chery
cff0f6b027 gen{7,8}_pipeline: Always set ViewportXYClipTestEnable
For the following reasons, there is no behavioural change with this
commit: the ViewportXYClipTest function of the CLIP stage will continue
to be enabled outside of Meta (where disable_viewport is always false),
and the CLIP stage is turned off within Meta, so this function will
continue to be disabled in that case.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 18:00:41 -07:00
Nanley Chery
992bbed98d gen{7,8}_pipeline: Apply 3DPRIM_RECTLIST restrictions
According to 3D Primitives Overview in the Bspec, when the RECTLIST
primitive is in use, the CLIP stage should be disabled or set to have
a different Clip Mode, and Viewport Mapping must be disabled:

   Clipping: Must not require clipping or rely on the CLIP unit’s
   ClipTest logic to determine if clipping is required. Either the CLIP
   unit should be DISABLED, or the CLIP unit’s Clip Mode should be set
   to a value other than CLIPMODE_NORMAL.

   Viewport Mapping must be DISABLED (as is typical with the use of
   screen-space coordinates).

We swap out ::disable_viewport for ::use_rectlist, because we currently
always use the RECTLIST primitive when we disable viewport mapping, and
we'll likely continue to use this primitive.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 17:53:38 -07:00
Nanley Chery
88d1c19c9d anv_cmd_buffer: Don't make the initial state dirty
Avoid excessive state emission. Relevant state for an action command
will get set by the user:

From Chapter 5. Command Buffers,
 When a command buffer begins recording, all state in that command
 buffer is undefined.
 [...]
 Whenever the state of a command buffer is undefined, the application
 must set all relevant state on the command buffer before any state
 dependent commands such as draws and dispatches are recorded, otherwise
 the behavior of executing that command buffer is undefined.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 17:52:24 -07:00
Nanley Chery
9fae6ee026 anv/meta: Don't set the dynamic state for disabled operations
CmdSet* functions dirty the CommandBuffer's dynamic state. This causes
the new state to be emitted when CmdDraw is called. Since we don't need
the state that would be emitted, don't call the CmdSet* functions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 17:52:20 -07:00
Nanley Chery
76b0ba087c anv/clear: Disable the scissor operation
Since the scissor rectangle always matches that of the framebuffer,
this operation isn't needed.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-04-13 17:45:18 -07:00
Jason Ekstrand
b63a98b121 nir/dead_variables: Configurably work with any variable mode
The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-13 15:45:10 -07:00
Kenneth Graunke
505a8fbdf8 i965: Switch to NIR for ldexp lowering.
The old GLSL IR based lowering doesn't quite work right in all cases,
and fails several dEQP-GLES31 and Vulkan CTS tests.  Jason's new
approach in NIR passes all the tests.  There's not likely to be a ton
of advantage to lowering early in GLSL IR anyway, so...switch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:33 -07:00
Jason Ekstrand
4455bfa9a0 nir/algebraic: Add lowering for ldexp
The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today.  Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:19 -07:00
Jason Ekstrand
765dd65349 i965: Implement the new imod and irem opcodes
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:08 -07:00
Jason Ekstrand
745b3d295e nir: Add more modulus opcodes
These are all needed for SPIR-V

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:00 -07:00
Jason Ekstrand
d880c6f9f5 i965/vec4: Inline get_pull_constant_offset
It's not really doing enough anymore to justify a helper function.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reveiewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-04-13 15:39:20 -07:00
Jason Ekstrand
dd616cab01 nir/lower_io: Allow for a full bitmask of modes
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-13 12:44:10 -07:00
Jason Ekstrand
2caaf0ac5e nir/lower_indirect: nir_variable_mode is now a bitfield
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-13 12:44:07 -07:00
Jason Ekstrand
ffa0e12e15 nir: Convert nir_variable_mode to a bitfield
There are several passes where we need to specify some set of variable
modes that the pass needs top operate on.  This lets us easily do that.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-13 12:40:12 -07:00
George Kyriazis
f69a61b1aa gallium/swr: Make flat shading tris work.
- Incorporate flatshade flag into the shader generation
- Use provoking vertex (vc) in shader when flat shading.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-13 13:46:37 -05:00
Rob Clark
c53a12fedc Revert "freedreno/a4xx: better occlusion/sample counting"
This reverts commit 62fa868728.

dEQP-GLES3.functional.occlusion_query.* was unhappy about that change.
Still not really sure *what* the other slots in the sample results
buffer are.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:40 -04:00
Rob Clark
46e9bbc918 freedreno/a4xx: rasterizer_discard support
This one is slightly annoying, since trying to write RBRC from draw
would clobber values set in the tiling/gmem code.  We could do command-
stream patching for RBRC, as is done on a3xx.  Although since it seems
to be a rarely used feature, it is easier just to do RMW to set/clear
the bit.

Fixes dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_triangles
and related tests.

a3xx still needs the same feature, although there it probably makes more
sense to take advantage of the existing cmdstream patching which is
required for RBRC for other reasons.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:21 -04:00
Rob Clark
216225ce57 freedreno/ir3: fix array textures on a4xx
Seems like a4xx needs offset added to array index for all arrays,
whereas a3xx only for cubemap arrays.  Fixes a whole swath of dEQP fails
(roughly *sampler2darray*).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:14 -04:00
Rob Clark
7e93b26b5d freedreno: fix stream-out offset handling for lines/tris
We need to increment offset by # of vertices, not by # of prims.  Fixes
a bunch of dEQP fails involving prims other than points.  For example,
dEQP-GLES3.functional.transform_feedback.position.lines_separate

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:02 -04:00
Rob Clark
6ca6e80f61 freedreno: fix handling for stream-out offsets
If changed && append, we shouldn't be resetting the internal offset back
to zero.  This fixes issues w/ sequences like:

   glBeginTransformFeedback()
   glDraw()
   glPauseTransformFeedback()
   glDraw()
   glResumeTransformFeedback()
   glDraw()
   glEndTransformFeedback()

Fixes dEQP-GLES3.functional.transform_feedback.array.separate.points.lowp_vec3
and related tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:54 -04:00
Rob Clark
0a4b0fc315 freedreno: fix prims-emitted query
This should only count when TF is not paused.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:47 -04:00
Rob Clark
a7eb12d089 freedreno: fix max-line-width
dEQP noticed that we were advertising completely bogus values.  The
actual maximum is 127.0f.

*But* we have to use an artifically low maximum to work around a bug
in the dEQP test, which gets confused when the max line width is too
large and lines start going off-screen.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:31 -04:00
Rob Clark
6bf462a1ab freedreno: add flag to enable dEQP hacks
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:24 -04:00
Rob Clark
f68f6c0246 freedreno/ir3: hack to avoid getting stuck in a loop
There are still some edge cases which result in a neighbor-loop.  Which
needs to be fixed, but this hack at least makes deqp tests finish.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:13 -04:00
Rob Clark
dd70945e09 freedreno/ir3: use (ss) instead of (sy) for ldlv
Fixes a bunch of flat-varying fail on a4xx (where we need to use ldlv to
read the un-interpolated varying).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:05 -04:00
Rob Clark
b35ad6e701 freedreno/ir3: cleanup double cmps.s from frontend
Since we cannot mov into a predicate register, the frontend uses a
'cmps.s p0.x, cond, 0' as a stand-in for mov to p0.x.  It does this
since it has no way to know that the source cond instruction (ie.
for a kill, br, etc) will only be used to write the predicate reg.
Detect this, and re-write the instruction writing p0.x to skip the
original cmps.[sfu].  (It is done like this, rather than re-writing
the dest of the first cmps.[sfu] in case the first cmps.[sfu]
actually has other users.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:14:41 -04:00
Matt Turner
9bac27dbf9 glsl: Rename "vertex_input_slots" -> "is_vertex_input"
vertex_input_slots would be an appropriate name for an integer, but not
a bool.

Also remove a cond ? true : false from a count_attribute_slots() call
site, noticed during the rename.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-13 11:00:21 -07:00
Jose Fonseca
9586468c03 gallivm: Workaround LLVM PR 27332.
The credit for finding and isolating this bug goes to Vinson and Roland.

The buggy LLVM versions were found by doing

  opt -instcombine llvm-pr27332.ll > /dev/null

where llvm-pr27332.ll is the IR from
https://llvm.org/bugs/show_bug.cgi?id=27332#c3

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-13 16:42:55 +01:00
Marek Olšák
dd0a296895 gallium/radeon: move a comment to the correct place
trivial
2016-04-13 17:31:03 +02:00
Nicolai Hähnle
9e9a2bb44a radeonsi: gate PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT by LLVM version
Otherwise we incorrectly claim ARB_ssbo support even with older LLVM versions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94917
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-13 10:06:22 -05:00
Elie TOURNIER
f04565c876 doxygen: Generate Doxygen for NIR
Now, one can do the following to generate and read the nir Doxygen:
cd $MESA_TOP/doxygen
make
firefox nir/index.html

Update v2:
Correct TAGFILES in nir.doxy

Signed-off-by: Elie TOURNIER <tournier.elie@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>

[Emil Velikov] v3: Rebase.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:33 +01:00
Elie TOURNIER
3157df58d0 doxygen: update glsl link
Signed-off-by: Elie TOURNIER <tournier.elie@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:30 +01:00
Rhys Kidd
0e9fc1228a doxygen: Remove deprecated settings in common.doxy
These Doxygen features are deprecated, as reported by Doxygen 1.8.9.1

Warning: Tag `USE_WINDOWS_ENCODING' at line 66 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `DETAILS_AT_TOP' at line 157 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `HTML_ALIGN_MEMBERS' at line 616 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `XML_SCHEMA' at line 848 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `XML_DTD' at line 854 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `MAX_DOT_GRAPH_WIDTH' at line 1115 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `MAX_DOT_GRAPH_HEIGHT' at line 1123 of file `common.doxy' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:26 +01:00
Rhys Kidd
3d18ab72bf doxygen: Fix typo in doxygen/tnl.doxy
TAGFILE relative folder should match .tag file

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:23 +01:00
Rhys Kidd
4ba409a364 doxygen: Correct TAGFILE linkage of main
core.doxy was renamed to main.doxy, along with output folder in
the below 2004 commit.

Correct the other modules' TAGFILE linkage to find the main folder.

  commit 3ef972f538
  Author: Brian Paul <brian.paul@tungstengraphics.com>
  Date:   Sun May 16 22:07:02 2004 +0000

      Replaced 'core' with 'main'.
      Other minor updates.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:19 +01:00
Rhys Kidd
7703a3e3d0 doxygen: Update .gitignore
The last of these output directories was removed in 2007.

  commit c2e0570831
  Author: Jerome Glisse <glisse@freedesktop.org>
  Date:   Fri Feb 16 23:18:56 2007 +0100

      Update doxygen doc to reflet vbo changes.

      Update doxygen doc, array_cache no longuer exist,
      new shiny vbo modules is there. Tested on unix,
      but i think i didn't broke that bat :).

  commit 3ef972f538
  Author: Brian Paul <brian.paul@tungstengraphics.com>
  Date:   Sun May 16 22:07:02 2004 +0000

      Replaced 'core' with 'main'.
      Other minor updates.

  commit 69db632a9d
  Author: Jose Fonseca <j_r_fonseca@yahoo.co.uk>
  Date:   Thu May 1 23:32:54 2003 +0000

      Move the Doxygen configuration files into the usual places and integrate with the build system.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:15 +01:00
Rhys Kidd
ced18f4d60 doxygen: Remove references to miniglx
miniglx was removed in February 2010. Clean up remaining
unnecessary doxygen references.

  commit a9e3669683
  Author: Kristian Høgsberg <krh@bitplanet.net>
  Date:   Thu Feb 25 16:17:04 2010 -0500

      Remove remaining miniglx references

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:12 +01:00
Rhys Kidd
29b805b929 doxygen: Fix doxygen/gbm.doxy TAGFILES
There has never been a doxygen/gbm_setup output folder.

Appears to have been a copy-paste error from original commit
in 245341f406.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:08 +01:00
Rhys Kidd
684e7a4a14 doxygen: Correct TAGFILE relative paths
Per Doxygen documentation, to combine external documentation (stored in
a *.tag file) with a project the TAGFILES option should be set in the
configuration file.

  A tag file typically only contains a relative location of the
  documentation from the point where doxygen was run. So when
  you include a tag file in other project you have to specify
  where the external documentation is located in relation this
  project.

  You can do this in the configuration file by assigning the
  (relative) location to the tag files specified after the
  TAGFILES configuration option.

  If you use a relative path it should be relative with respect
  to the directory where the HTML output of your project is
  generated; so a relative path from the HTML output directory
  of a project to the HTML output of the other project that is
  linked to.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:44:04 +01:00
Rhys Kidd
f066fb529b doxygen: Fix doxygen/glapi.doxy
The src/mesa/glapi folder was relocated in the below commit.
Amend the doxygen/glapi.doxy INPUT setting accordingly.

Whilst here, in addition this change also avoids a bug in the
consolidated Doxygen output caused by doxygen/glapi.doxy inadvertently
overwriting doxygen/swrast.tag via its GENERATE_TAGFILE setting.

This bug depended upon the specific order each *.tag was built.

   commit 296adbd545
   Author: Chia-I Wu <olv@lunarg.com>
   Date:   Mon Apr 26 12:56:44 2010 +0800

       glapi: Move to src/mapi/.

       Move glapi to src/mapi/{glapi,es1api,es2api}.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:43:58 +01:00
Rhys Kidd
cf3bc91c06 doxygen: Remove src/mesa/shader/ references
Mesa has not had a src/mesa/shader/ folder since Mesa 7.9 removed it
in October 2010, as part of a revised GLSL compiler written by Intel.

Remove doxygen/shader.doxy and consequential changes made throughout.

In addition to removing an unnecessary Doxygen doxyfile, this change also
avoids a bug in the consolidated Doxygen output caused by
doxygen/shader.doxy inadvertently overwriting doxygen/swrast.tag via its
GENERATE_TAGFILE setting.

This bug depended upon the specific order each *.tag was built.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-13 13:43:54 +01:00
Marek Olšák
04f15e491f gallium/radeon: add an env variable to force a level of aniso filtering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-13 12:42:28 +02:00
Jose Fonseca
cc5d8b678e llvmpipe: Test rounding of x.5.
Leverage nearbyintif function, which should be available on all C99
implementations.

Trivial.
2016-04-13 11:13:05 +01:00
Roland Scheidegger
cb438d8b3e gallivm: use llvm.nearbyint instead of llvm.round.
We used to use sse roundps intrinsic directly, but switched to use the llvm
intrinsics for rounding with e4f01da15d.
However, llvm semantics follows standard math lib round function which is
specced to do roundNearestAwayFromZero but we really want roundNearestEven
(moreoever, using round generates atrocious code since the cpu can't do it
directly and it results in scalar calls to libm __roundf).
So, use llvm.nearbyint instead, which does exactly the right thing, and even
has the advantage of being available with llvm 3.3 too. (I've verified it
actually generates a roundps instruction with llvm 3.3.)

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-13 11:13:03 +01:00
Pierre Moreau
f525db6358 nv50/ra: isinf() is in namespace std since C++11.
This fixes a compile error while building Nouveau with C++11 enabled (and
glibc >= 2.23). This happens if SWR is enabled, as it forces C++11.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>

https://bugs.freedesktop.org/show_bug.cgi?id=94907
2016-04-13 07:41:13 +01:00
Jose Fonseca
fa46848e51 scons: Allow building with Address Sanitizer.
libasan is never linked to shared objects (which doesn't go well with
-z,defs).  It must either be linked to the main executable, or (more
practically for OpenGL drivers) be pre-loaded via LD_PRELOAD.

Otherwise works.

I didn't find anything with llvmpipe.  I suspect the fact that the
JIT compiled code isn't instrumented means there are lots of errors it
can't catch.

But for non-JIT drivers, the Address/Leak Sanitizers seem like a faster
alternative to Valgrind.

Usage (Ubuntu 15.10):

   scons asan=1 libgl-xlib
   export LD_LIBRARY_PATH=$PWD/build/linux-x86_64-debug/gallium/targets/libgl-xlib
   LD_PRELOAD=libasan.so.2 any-opengl-application

Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-04-13 06:54:32 +01:00
Kenneth Graunke
d1c89f6005 mesa: Change an error code in glSamplerParameterI[iu]v().
This is supposed to be INVALID_OPERATION in ES.  We already did this
for the fv/iv variants, but not Iiv/Iuv, which are new in ES 3.2 (or
extensions).

Fixes:
ES31-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-12 20:30:32 -07:00
Jose Fonseca
46bfcd61f5 softpipe: Free tgsi.image elements on context destruction.
Courtesy of address sanitizer.

[airlied: free buffers as well]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 13:21:37 +10:00
Edward O'Callaghan
5a3d928e2c softpipe: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 13:21:37 +10:00
Eric Anholt
3b63301d9f vc4: Work around hardware limits on the number of verts in a single draw.
Fixes rendering failures in glmark2's refract and
bump:render-mode=high-poly demos, and partially in its terrain demo.
2016-04-12 19:10:51 -07:00
Thomas Hindoe Paaboel Andersen
6d6525a377 softpipe: avoid buffer overflow
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 11:51:35 +10:00
Thomas Hindoe Paaboel Andersen
b89708f95f tgsi: fix buffer overflow
Increase r to four channels as rgba is written to it
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 11:51:34 +10:00
Tim Rowley
b9294bc345 swr: handle pci cap requests
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-04-12 20:18:00 -05:00
Tim Rowley
b19d214b23 swr: support samplers in vertex shaders
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-04-12 20:18:00 -05:00
Nicolai Hähnle
10cfd7a604 radeonsi: enable GLSL 4.20 and therefore OpenGL 4.2
This is the last necessary bit for OpenGL 4.2 support. All driver-specific
functionality has already been implemented as part of extensions.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 20:13:49 -05:00
Iurie Salomov
047e3264f6 va: check null context in vlVaDestroyContext
Signed-off-by: Iurie Salomov <iurcic@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
2016-04-13 00:52:53 +01:00
Jason Ekstrand
8f3b516f2e nir/clone: Copy bit size when cloning registers
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-12 16:41:58 -07:00
Marek Olšák
8e70a58af3 radeonsi: fix a critical SI hang since PIPELINESTAT_START/STOP was added
For some reason unknown to me, SI hangs if the event is written after
CONTEXT_CONTROL.
2016-04-13 01:05:15 +02:00
Kenneth Graunke
95d622e16d glsl: Don't copy propagate or tree graft precise values.
This is kind of a hack.  We currently track precise requirements
by decorating ir_variables.  Propagating or grafting the RHS of an
assignment to a precise value into some other expression tree can
lose those decorations.

In the long run, it might be better to replace these ir_variable
decorations with an "exact" decoration on ir_expression nodes,
similar to what NIR does.

In the short run, this is probably good enough.  It preserves
enough information for glsl_to_nir to generate "exact" decorations,
and NIR will then handle optimizing these expressions reasonably.

Fixes ES31-CTS.gpu_shader5.precise_qualifier.

v2: Drop invariant handling, as it shouldn't be necessary (caught
    by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-12 15:57:48 -07:00
Mark Janes
9e351e077b util: Fix race condition on libgcrypt initialization
Fixes intermittent Vulkan CTS failures within the test groups:
dEQP-VK.api.object_management.multithreaded_per_thread_device
dEQP-VK.api.object_management.multithreaded_per_thread_resources
dEQP-VK.api.object_management.multithreaded_shared_resources

Signed-off-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94904

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-12 15:38:43 -07:00
Kristian Høgsberg Kristensen
8ec971a997 i965/tiled_memcpy: Fix rgba8_copy_16_aligned_dst() typo
Copy and paste error in commit eafeb8db66:

    i965/tiled_memcpy: Unroll bytes==64 case.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-12 15:32:43 -07:00
Kristian Høgsberg Kristensen
1af0f0151c glsl/linker: Recurse on struct fields when adding shader variables
ARB_program_interface_query requires that we add struct fields
recursively down to basic types.

Fixes 52 struct test cases in dEQP-GLES31.functional.program_interface_query.*

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 14:38:26 -07:00
Kristian Høgsberg Kristensen
778fd46aa4 glsl/linker: Pass name and type through to create_shader_variable()
No functional change here, but this now lets us recurse throught structs
in add_shader_variable().

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 14:38:26 -07:00
Kristian Høgsberg Kristensen
09f0121593 glsl/linker: Pass absolute location to add_shader_variable()
This lets us pass in the absolution location of a variable instead of
computing it in add_shader_variable() based on variable location and
bias. This is in preparation for recursing into struct variables.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 14:38:26 -07:00
Kristian Høgsberg Kristensen
8ab6aae4dc glsl/linker: Add add_shader_variable() helper
This consolidates the combination of create_shader_variable() and
add_program_resource() into a new helper function. No functional
difference, but we'll expand add_shader_variable() in the next few
commits.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 14:38:26 -07:00
Matt Turner
eafeb8db66 i965/tiled_memcpy: Unroll bytes==64 case.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 14:37:05 -07:00
Roland Scheidegger
0e605d9b3a i965/tiled_memcpy: Provide SSE2 for RGBA8 <-> BGRA8 swizzle.
The existing code uses SSSE3, and because it isn't compiled in a
separate file compiled with that, it is usually not used (that, of
course, could be fixed...), whereas SSE2 is always present with 64-bit
builds.  This should be pretty much as fast as the pshufb version,
albeit those code paths aren't really used on chips without llc in any
case.

v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments
v4: [mattst88] Rebase

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-12 14:37:01 -07:00
Matt Turner
fc88b4babf i965/tiled_memcpy: Move SSSE3 code back into inline functions.
This will make adding SSE2 code a lot cleaner.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 14:36:59 -07:00
Matt Turner
0a5d8d9af4 i965/tiled_memcpy: Optimize RGBA -> BGRA swizzle.
Replaces four byte loads and four byte stores with a load, bswap,
rotate, store; or a movbe, rotate, store.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 14:36:56 -07:00
Nicolai Hähnle
a191e6b719 radeonsi: fix bounds check in si_create_vertex_elements
This was triggered by
dEQP-GLES3.functional.vertex_array_objects.all_attributes

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 16:32:46 -05:00
Nicolai Hähnle
4285a97cea docs: mark atomic counters and SSBOs as done for radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:51 -05:00
Nicolai Hähnle
bfd11c5996 radeonsi: enable shader buffer pipe caps
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:48 -05:00
Nicolai Hähnle
4e81843b13 radeonsi: add shader buffer support to TGSI_OPCODE_RESQ
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:45 -05:00
Nicolai Hähnle
01109282ce radeonsi: add shader buffer support to TGSI_OPCODE_STORE
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:43 -05:00
Nicolai Hähnle
745014c502 radeonsi: add shader buffer support to TGSI_OPCODE_LOAD
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:41 -05:00
Nicolai Hähnle
68bc25c931 radeonsi: add shader buffer support to TGSI_OPCODE_ATOM*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:38 -05:00
Nicolai Hähnle
c6f5d000db radeonsi: add offset parameter to buffer_append_args
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:35 -05:00
Nicolai Hähnle
c565466eea radeonsi: adjust buffer_append_args to take a 128 bit resource
Move the buffer resource extraction code out into its own function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:32 -05:00
Nicolai Hähnle
e88018ffe5 radeonsi: preload shader buffers in shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:29 -05:00
Nicolai Hähnle
c495c0ad37 radeonsi: implement set_shader_buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:26 -05:00
Nicolai Hähnle
73c8b85b64 radeonsi: move resetting of constant buffers into a separate function
This will be re-used for shader buffers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:04 -05:00
Haixia Shi
35ade36c88 dri/i965: fix incorrect rgbFormat in intelCreateBuffer().
It is incorrect to assume that pixel format is always in BGR byte order.
We need to check bitmask parameters (such as |redMask|) to determine whether
the RGB or BGR byte order is requested.

v2: reformat code to stay within 80 character per line limit.
v3: just fix the byte order problem first and investigate SRGB later.
v4: rebased on top of the GLES3 sRGB workaround fix.
v5: rebased on top of the GLES3 sRGB workaround fix v2.

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 14:06:45 -07:00
Kenneth Graunke
e303e88a9c glsl: Reject illegal qualifiers on atomic counter uniforms.
This fixes

dEQP-GLES31.functional.uniform_location.negative.atomic_fragment
dEQP-GLES31.functional.uniform_location.negative.atomic_vertex

Both of which have lines like

layout(location = 3, binding = 0, offset = 0) uniform atomic_uint uni0;

The ARB_explicit_uniform_location spec makes a very tangential mention
regarding atomic counters, but location isn't something that makes sense
with them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-12 14:06:42 -07:00
Kenneth Graunke
929e44099f glsl: Add a method to print error messages for illegal qualifiers.
Suggested by Timothy Arceri a while back on mesa-dev:
https://lists.freedesktop.org/archives/mesa-dev/2016-February/107735.html

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-04-12 14:06:42 -07:00
John Sheu
7f08547248 xlib: fix memory leak on Display close
The XMesaVisual instances freed in the visuals table on display close
are being freed with a free() call, instead of XMesaDestroyVisual(),
causing a memory leak.

Signed-off-by: John Sheu <sheu@google.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-12 13:56:41 -06:00
Jakob Sinclair
d04bb14d04 st/mesa: Replace GLvoid with void
GLvoid was used before in OpenGL but it has changed to just using void.
All GLvoids in mesa's state tracker has been changed to void in this patch.

Tested this with piglit and no problems were found. No compiler warnings.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-12 13:37:16 -06:00
Bas Nieuwenhuizen
126da23d70 radeonsi: Mark ARB_robust_buffer_access_behavior as supported.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 20:53:10 +02:00
Bas Nieuwenhuizen
70dcd841f7 gallium: Add capability for ARB_robust_buffer_access_behavior.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 20:53:06 +02:00
Bas Nieuwenhuizen
285dc05055 mesa: Expose the ARB_robust_buffer_access_behavior extension.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 20:40:26 +02:00
Miklós Máté
aad8707b28 main: rework the compatibility check of visuals in glXMakeCurrent
Now it follows the compatibility criteria listed in section 2.1 of
the GLX 1.4 specification.
This is needed for post-process effects in SW:KotOR.

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 19:48:01 +02:00
Tim Rowley
df37b06276 swr: [rasterizer core] warning cleanup
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
06c59dc417 swr: [rasterizer] Put in rudimentary garbage collection for the global arena allocator
- Check for unused blocks every few frames or every 64K draws
- Delete data unused since the last check if total unused data is > 20MB

Doesn't seem to cause a perf degridation

Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
b990483de2 swr: [rasterizer core] Put DRAW_CONTEXT on a diet
No need for 256 pointers per DC.

Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
a939a58881 swr: [rasterizer core] Add experimental support for hyper-threaded front-end
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
9a8146d0ff swr: [rasterizer] Avoid segv in thread creation on machines with non-consecutive NUMA topology.
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
2c71fd4bf8 swr: [rasterizer core] Replace all naked OSALIGN macro uses with OSALIGNSIMD / OSALIGNLINE
Future proofing

Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
32a8653ad2 swr: [rasterizer] Ensure correct alignment of stack variables used as vectors
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
e1871c4459 swr: [rasterizer core] Quantize depth to depth buffer precision prior to depth test/write.
Fixes z-fighting issues.

Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
2a19aca05f swr: [rasterizer common] win32 build fixups
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:05 -05:00
Tim Rowley
c25244f2f7 swr: [rasterizer core] Affinitize thread scratch space to numa node of worker
Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:04 -05:00
Tim Rowley
f89f6d562a swr: [rasterizer] Misc fixes identified by static code analysis
No perf loss detected

Acked-by: Brian Paul <brianp@vmware.com>
2016-04-12 11:52:04 -05:00
Brian Paul
6c01478213 st/mesa: fix memleak in glDrawPixels cache code
If the glDrawPixels size changed, we leaked the previously cached
texture, if there was one.  This patch fixes the reference counting,
adds a refcount assertion check, and better handles potential malloc()
failures.

Tested with a modified version of the drawpix Mesa demo which changed
the image size for each glDrawPixels call.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-04-12 10:44:45 -06:00
Jose Fonseca
b5105e67a8 gallium: Use STATIC_ASSERT whenever possible.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 16:56:15 +01:00
Jose Fonseca
b025c23cfe softpipe: Use STATIC_ASSERT whenever possible.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 16:56:15 +01:00
Jose Fonseca
2f13d7543f svga: Use STATIC_ASSERT whenever possible.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 16:56:15 +01:00
Jose Fonseca
7279098dc5 mesa: Use STATIC_ASSERT whenever possible.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-12 16:56:15 +01:00
Marek Olšák
686b018ab3 r600g: use common scissor and viewport code
It's the same as radeonsi. This adds guard band support to r600g.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:25 +02:00
Marek Olšák
87a5b07f90 gallium/radeon: add R600/Evergreen/Cayman support to common viewport code
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:25 +02:00
Marek Olšák
2ca5566ed7 radeonsi: move scissor and viewport states into gallium/radeon
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:13:24 +02:00
Marek Olšák
db00f6cc9c radeonsi: use guard band clipping
Guard band clipping speeds up rasterization for primitives that are
partially off-screen.  This change in particular results in small
framerate improvements in a wide range of games.

Started by Grigori Goronzy <greg@chown.ath.cx>.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 17:12:14 +02:00
Marek Olšák
cb21f8a97c radeonsi: compute scissor from viewport in set_viewport_states
and clamp it right before emitting. This is a prerequisite for computing
the guard band.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:49 +02:00
Marek Olšák
5b6a0b7fc0 gallium/radeon: set GTT WC on tiled textures
Just for consistency. This should have no effect, because OpenGL textures
always go to VRAM.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
5a4b74d1ba gallium/radeon: relax requirements on VRAM placements on APUs
This makes Tonga with vramlimit=128 2x faster in Heaven.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
a57309f807 winsys/amdgpu: remove hack for low VRAM configuration
A better solution will be used.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
b36f19bf98 r600g: disable aniso filtering for non-mipmap textures on EG
this is the default behavior of the closed driver when running on VI

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
3bc2d967c4 r600g: clean up aniso state translation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
b0d4469519 radeonsi: disable aniso filtering for non-mipmap textures on SI-CI
The closed driver does this, but it looks at base_level and last_level
and uses a conditional assignment, which LLVM can't generate on SGPRs.

That led me to invent this solution that abuses the image descriptor.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
ddd33431c5 radeonsi: clean up aniso state translation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
f7420ef5b4 radeonsi: enable some sampler fields to match the closed driver
copied from the Vulkan driver

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
1a98be001f gallium/radeon: fix maximum texture anisotropy setup
We were overdoing it for non-power-of-two values.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
2d7be5d37e gallium/radeon: never choose a linear tiling for DB surfaces
Just for consistency. This is actually not a problem, because both addrlib
and radeon check and fix this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
b7878146c4 gallium/radeon: removing dead code for sharing stencil buffers
This is a remnant of the times when the DDX was allocating depth-stencil
buffers for windows. Now, st/dri allocates them and doesn't share them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
73aeebd772 radeonsi: allow clearing buffers >= 4 GB
Only CMASK and DCC clears can use this, because only textures can be so
large.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
1dd8832e04 gallium/radeon: allow allocating textures >= 4 GB
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:48 +02:00
Marek Olšák
0689741e51 winsys/radeon: fix printing allocation failures
print as unsigned instead of signed

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
0ba0933f48 winsys/amdgpu: add support for 64-bit buffer sizes
v2: fail in radeon_winsys_bo_create if size > 32 bits

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
7e78b5ed38 pb_buffer: switch pb_buffer::size to 64 bits
being able to allocate more than 4 GB may be useful

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
e241a63512 gallium/radeon: remove R600_QUERY_HW_FLAG_TIMER
not used anymore

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
0222351fc1 gallium/radeon: merge timer and non-timer query lists
All of them are paused only between IBs.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
7347c068d8 r600g: don't manually stop queries for blitter
r600_set_active_query_state does it better.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
12fee5b93e r600g: add pausing pipeline & streamout queries into set_active_query_state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
e90fe60b72 r600g: implement set_active_query_state for pausing occlusion queries
Use ZPASS_INCREMENT_DISABLE everywhere.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
5248676f87 r600g: simplify r600_set_occlusion_query_state
The caller does the same checking.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
b82893f93a gallium/radeon: move pipeline stat context flags to common code
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
aa79a3269f r600g: fix typo in r600 register definitions
Acked-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
a4c288d8e1 gallium/radeon: unify checking streamout enable state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:47 +02:00
Marek Olšák
466aa57185 radeonsi: fix mask checking when emitting scissors and viewports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
2016-04-12 14:29:46 +02:00
Marek Olšák
f3eebb84eb radeonsi: implement and rely on set_active_query_state
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Marek Olšák
e599b8f384 gallium: pause queries for all meta ops
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Marek Olšák
26171bd67e gallium: add pipe_context::set_active_query_state for pausing queries
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-12 14:29:46 +02:00
Bas Nieuwenhuizen
fc67375379 radeonsi: Synchronize a streamout write after read hazard.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 13:55:38 +02:00
Hans de Goede
dccdb655a1 nv30: Add missing PIPE_SHADER_CAP_INTEGERS to get_shader_param()
Add missing PIPE_SHADER_CAP_INTEGERS for frag shaders to
nv30_screen_get_shader_param().

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-04-12 11:41:12 +02:00
Haixia Shi
b0e3ba61b5 dri/i965: extend GLES3 sRGB workaround to cover all formats
It is incorrect to assume BGRA byte order for the GLES3 sRGB workaround.

v2: use _mesa_get_srgb_format_linear to handle all formats

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 02:06:12 -07:00
Eduardo Lima Mitev
ea8a65f503 i965: Add autogenerated 'brw_nir_trig_workarounds.c' to gitignore
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 10:44:19 +02:00
Rhys Kidd
703c1e69d8 glsl: Update hash table comments in constant propagation
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-12 01:29:19 -07:00
Dave Airlie
afa8707ba9 softpipe: add SSBO/shader atomics support.
This adds support for the features requires for ARB_shader_storage_buffer_object
and ARB_shader_atomic_counters, ARB_shader_atomic_counter_ops.

[airlied: some cleanups applied]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:16:13 +10:00
Dave Airlie
c2aeeca455 draw: add support for passing buffers to vs/gs shaders.
Like the image code, but for shader buffers this time.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:15:36 +10:00
Dave Airlie
081a958bcd tgsi: add support for buffer/atomic operations to tgsi_exec.
This adds support for doing load/store/atomic operations on
buffer objects.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:15:33 +10:00
Dave Airlie
9c7a0d188a tgsi: set nonhelpermask for vertex shaders
For atomic operations we really need to avoid executing unnecessary shaders, so for some
tests that just draw a single point we only want one vertex to get processed not 4,

this fixes a number of the atomic counters tests.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-12 14:15:16 +10:00
Ian Romanick
193a5cee6a nir: Fix typo in comment
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-11 19:24:19 -07:00
Markus Wick
18c8b927e2 nir: Merge redudant integer clamping.
Dolphin uses them a lot. Range tracking would be better in the long term,
but this two lines works fine for now.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 18:48:50 -07:00
Kenneth Graunke
bfd17c76c1 i965: Port INTEL_PRECISE_TRIG=1 to NIR.
This makes the extra multiply visible to NIR's algebraic optimizations
(for constant reassociation) as well as constant folding.  This means
that when the result of sin/cos are multiplied by an constant, we can
eliminate the extra multiply altogether, reducing the cost of the
workaround.

It also means we only have to implement it one place, rather than in
both backends.

This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
which has a ton of sin() calls, but always multiplies them by an
immediate constant.  The extra multiply gets folded away.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:44:17 -07:00
Kenneth Graunke
b0dffdc616 i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.
I want to be able to read other fields.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:44:12 -07:00
Kenneth Graunke
808d26c771 nir: Silence unused "options" warning in algebraic passes.
Some passes may not refer to options->..., at which point the compiler
will warn about an unused variable.  Just cast to void unconditionally
to shut it up.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:44:08 -07:00
Kenneth Graunke
5886cd79a0 nir: Do basic constant reassociation.
Many shaders contain expression trees of the form:

    const_1 * (value * const_2)

Reorganizing these to

    (const_1 * const_2) * value

will allow constant folding to combine the constants.  Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether.  Other
times, it can create more immediate constants, which can actually hurt.

Finding a good balance here is tricky.  While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.

shader-db results on Broadwell:

total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245

total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:43:55 -07:00
Boyuan Zhang
1c7ba7f156 radeon/uvd: alignment fix for decode message buffer
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-04-11 19:30:47 -04:00
Brian Paul
704d203d5f st/mesa: replace _mesa_sysval_to_semantic table with function
Instead of using an array indexed by SYSTEM_VALUE_x, just use a
switch statement.  This fixes a regression caused by inserting new
SYSTEM_VALUE_ enums but not updating the mapping to TGSI semantics.

v2: fix a few switch statement mistakes for compute-related enums

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-11 17:04:13 -06:00
Jason Ekstrand
a9e6213edd nir/lower_system_values: Add support for several computed values
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-11 13:53:03 -07:00
Jason Ekstrand
39103145ff glsl/shader_enums: Add the other two compute builtins
These weren't added before because they are actually calculated values that
are computed from other inputs.  However, in order to handle them in
nir_lower_system_values, it's nice for them to have a cannonical locaiton.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-11 13:53:00 -07:00
Jason Ekstrand
22836dbefa glsl/shader_enums: Add an enum for Vulkan InstanceIndex
In Vulkan, you have InstanceIndex which begins at the base instance value
rather than the zero-based InstanceID of GL.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-11 13:52:51 -07:00
Emil Velikov
581c8016f8 mesa: add missing header to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-11 19:08:23 +01:00
Emil Velikov
5e010a72c9 drivers/softpipe: add missing header to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-11 19:08:23 +01:00
Emil Velikov
c69ab885d7 mesa: automake: update and reuse X86_SSE41_FILES list
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 19:08:23 +01:00
Emil Velikov
28da0d6922 compiler: android: flesh out nir into separate makefile
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 19:08:23 +01:00
Emil Velikov
8d51500b2d compiler: automake: flesh out NIR into separate makefile.
Analogous to previous commit - improved readability at the expense of
an extra file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 19:08:23 +01:00
Emil Velikov
9324afc0e9 compiler: automake: split out glsl into separate makefile
Preserve the functionality while keeping the files smaller and
more readable.

v2: Do not include Makefile.sources from the GLSL makefile (silences
automake warnings)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2016-04-11 19:08:23 +01:00
Emil Velikov
3d67780b80 compiler: remove {glsl,nir}/Makefile.sources
No longer used as of last commit.

v2: Rebase.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2016-04-11 19:08:23 +01:00
Emil Velikov
c481c8f7f1 configure.ac: update the path of the generated files
... in order to determine if we need bison/flex. Failing to locate the
files will lead to mandating bison/flex even when building from a
release tarball.

CC: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 19:08:23 +01:00
Emil Velikov
4db8f15a25 glsl: move the android build scripts a level up
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 19:08:23 +01:00
Emil Velikov
abf7088eb7 glsl: move the scons build script a level up
It will allow us to remove the duplicate glsl/Makefile.sources.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 19:08:23 +01:00
Emil Velikov
594e868555 Part revert "gallium/auxiliary: don't build NIR sources with MSVC2008 flags"
This reverts commit 41c7912d04 but leaves
out the pragma [that inspired the original commit].

Building mesa requires MSVC2013 or later, thus we no longer need this.

v2: Use correct include path (src/glsl/nir -> src/compiler/nir)

Conflicts:
	src/gallium/auxiliary/Makefile.am

Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2016-04-11 19:08:23 +01:00
Nicolai Hähnle
590a37dc05 GL3: ARB_shader_image_load_store/size is done for radeonsi also in GLES
Trivial.
2016-04-11 12:48:10 -05:00
Brian Paul
05aec42d3d docs: fix Coverity URL 2016-04-11 09:10:39 -06:00
Oded Gabbay
d97f5d60f5 tgsi/doc: fix spelling error
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-11 11:43:43 +03:00
Jason Ekstrand
3aa1a5ee88 nir/lower_system_values: Simplify the computation of LocalInvocationIndex 2016-04-10 23:43:38 -07:00
Connor Abbott
a89c474157 nir: add a pass for lowering (un)pack_double_2x32
v2: Undo unintended change to the signature of
    nir_normalize_cubemap_coords (Iago).

v3: Move to compiler/nir (Iago)

v4: Remove Authors from copyright header (Michael Schellenberger)

v5 (Sam):
- Use nir_channel() and nir_ssa_for_alu_src() helpers (Jason)
- Inline lower_double_pack_instr() code into lower_double_pack_block()
  (Jason).
- Initialize nir_builder at lower_double_pack_impl() (Jason).

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Connor Abbott
663e6421df nir: add split versions of (un)pack_double_2x32
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Connor Abbott
b093808d26 nir: don't try to scalarize unpack_double_2x32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Connor Abbott
9e31e0a21b nir: add support for (un)pack_double_2x32
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Iago Toral Quiroga
d5d6260329 nir: add i2d and u2d opcodes
v2:
- Assert supports_int and don't fallback to nir_fmov (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Iago Toral Quiroga
b16d06252e nir: add d2i, d2u, d2b opcodes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Connor Abbott
a4bce07dc6 nir: add support for d2f and f2d
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Iago Toral Quiroga
fab5d4cd95 nir/glsl_to_nir: set bit_size on ssbo_load result
v2 (Sam):
- Add missing bit_size assignment when ssbo_load destination is a boolean.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:29:27 +02:00
Samuel Iglesias Gonsálvez
a741378cb5 nir/glsl_to_nir: add bit-size info to add_instr()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:28:01 +02:00
Connor Abbott
4b37c64f3b nir/split_var_copies: handle doubles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Connor Abbott
106a1b5501 nir/instr_set: handle 64-bit bit-sizes
v2: Revert spurious change in nir_opt_cse.c (Iago)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Connor Abbott
f2ccb63be1 nir: handle doubles in nir_deref_get_const_initializer_load()
v2 (Sam):
- Use proper bitsize value when calling to nir_load_const_instr_create()
  (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Connor Abbott
41c2541fc7 nir/print: add support for printing doubles and bitsize
v2:
- Squash the printing doubles related patches into one patch (Sam).

v3:
- Print using PRIx64 format: long is 32-bit on some 32-bit platforms but long
long is basically always 64-bit (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Connor Abbott
f5551f8a8b nir/glsl_to_nir: support doubles
v2:
- Don't set sized types to the destination of texture related opcodes.
  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Iago Toral Quiroga
8e69782e3e nir/lower_load_const_to_scalar: support doubles and multiple bit sizes
v2 (Sam):
- Add assert to detect bitsizes differents than 32 and 64 (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:05 +02:00
Iago Toral Quiroga
12f628adcb nir/lower_to_source_mods: Handle different bit sizes
v2 (Sam):
- Use helper to get base type from nir_alu_type.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Samuel Iglesias Gonsálvez
3663a2397e nir: add bit_size info to nir_load_const_instr_create()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Connor Abbott
a5b17ae745 nir/lower_vec: adapt to different bit sizes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Samuel Iglesias Gonsálvez
e3edaec739 nir: add bit_size info to nir_ssa_undef_instr_create()
v2:
- Make the users to give the right bit_sizes as arguments (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Connor Abbott
41a39e3384 nir/locals_to_regs: adapt to different bit sizes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Connor Abbott
40d1b671a9 nir/from_ssa: adapt to different bit sizes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 08:27:04 +02:00
Timothy Arceri
4979cec820 i965: fix struct type in comment
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-11 14:03:09 +10:00
Jason Ekstrand
7d58cfa366 nir: Add a pass for gathering various bits of shader info
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-10 20:43:47 -07:00
Ilia Mirkin
875543e270 i965: enable OES_texture_buffer on gen7+
It will only end up getting exposed on gen8+ since it requires GL ES
3.1, but it should be ready to go on gen7 when support for GL ES 3.1 is
completed there.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-10 20:24:26 -07:00
Dave Airlie
6f5f818b6d docs: add some missing softpipe entries.
I just forgot these when I added this stuff.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-11 13:14:48 +10:00
Kenneth Graunke
26c56e24e7 glsl: Don't remove XFB-only varyings.
Consider the case of linking a program with both a vertex and fragment
shader.  The VS may compute output varyings that are intended for
transform feedback, and not read by the fragment shader.

In this case, var->data.is_unmatched_generic_inout will be true,
but we still cannot eliminate the varyings.  We need to also check
!var->data.is_xfb_only.

Fixes failures in ES31-CTS.gpu_shader5.fma_precision_*, which happen
to use transform feedback in a way we apparently hadn't seen before.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-10 19:03:06 -07:00
Kenneth Graunke
ce84a92df5 i965/disasm: Decode per-slot offsets.
We just never bothered to decode this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-04-09 21:10:32 -07:00
Kenneth Graunke
20c8f36508 i965/disasm: Decode "channel mask present" bit correctly.
Bit 15 means "interleave" for most messages, but for SIMD8 messages it
means "use channel masks".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-04-09 21:10:20 -07:00
Kenneth Graunke
b790232524 i965/disasm: Simplify the URB opcode printing with ?:.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-04-09 21:10:11 -07:00
Ilia Mirkin
9b5bd20eb2 glsl: allow usage of the keyword buffer before GLSL 430 / ESSL 310
The GLSL 4.20 and ESSL 3.00 specs don't list 'buffer' as a reserved
keyword. Make the parser ignore it unless GLSL 4.30 / ESSL 3.10 are
used, or ARB_shader_storage_buffer_objects is enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-04-09 20:41:54 -04:00
Jason Ekstrand
bff7a8c4f3 anv/pipeline: Set up flat enables correctly 2016-04-09 17:06:59 -07:00
Jason Ekstrand
1275c7c744 genxml: Fix the name of a 3DSTATE_SF/SBE field on gen6-7.5 2016-04-09 17:02:21 -07:00
Jason Ekstrand
aa6f9a4e1e genxml: Break output detail of 3DSTATE_SF on gen7 into a struct
This makes it work like 3DSTATE_SBE[_SWIZ] on gen7+
2016-04-09 17:00:22 -07:00
Jason Ekstrand
ddae342618 genxml: Fix up MOCS in RENDER_SURFACE_STATE on gen6 to match gen7 2016-04-09 16:59:04 -07:00
Ilia Mirkin
cdb6fa91fa nvc0: handle the case where there are no framebuffer attachments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:44 -04:00
Ilia Mirkin
59ca92137b nv50,nvc0: support sending string markers down into the command stream
This should hopefully make it a little easier to debug with GL
applications like glretrace and looking at command streams.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:43 -04:00
Ilia Mirkin
f9480d7918 nv50,nvc0: add invalidate_resource support for buffer resources
Provide a callback to reallocate the underlying storage of a resource so
that it is not bound to any existing fences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-09 14:55:43 -04:00
Eric Anholt
30b818d5eb vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes.
This gives us one less set of special instruction generation cases, and
instead just the case for returning the correct register to read.
2016-04-08 18:41:46 -07:00
Eric Anholt
f029932cac vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR.
This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.

No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
2016-04-08 18:41:46 -07:00
Eric Anholt
44d7b8ad12 vc4: Add a helper function for the construction of qregs.
The separate declaration of the struct is not helping clarity, and I was
going to be writing a whole lot more of these in the upcoming patches.
2016-04-08 18:41:45 -07:00
Eric Anholt
114c8b38d3 vc4: Add missing scheduling dependency for MS color writes. 2016-04-08 18:41:45 -07:00
Eric Anholt
483c172989 vc4: Drop the multi_instruction distinction for QIR instructions.
It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.

No change on shader-db.
2016-04-08 18:41:45 -07:00
Eric Anholt
a8b525f8c4 vc4: Handle SF on instructions that write r4.
Normal SFU writes couldn't have SF because they were marked as
multi_instruction, but tex_result and tlb_color_read weren't.  This ended
up not being a problem according to anything in shader-db, but it seems
possible.
2016-04-08 18:41:45 -07:00
Eric Anholt
e46b48963a vc4: Allow multi-instruction QIR nodes to get VPM optimization.
There used to be multi-instruction operations that would use src[] twice,
which is why we couldn't do some optimizations on them.  This is no longer
the case.

total instructions in shared programs: 77973 -> 77969 (-0.01%)
instructions in affected programs:     84 -> 80 (-4.76%)
total estimated cycles in shared programs: 234165 -> 234157 (-0.00%)
estimated cycles in affected programs:     92 -> 84 (-8.70%)
2016-04-08 18:41:45 -07:00
Eric Anholt
99a759a4a3 vc4: Switch to using NIR_PASS macros.
This gets us better validation of our NIR transformations.
2016-04-08 18:41:45 -07:00
Eric Anholt
7030eadbed vc4: Handle nir_intrinsic_load_user_clip_plane as a vec4.
I liked having all my NIR be scalar, but nir_validate() complains that the
intrinsic writes 4 components but the destination we set up was only 1
component.  I could generate a new scalar variant, but it's a lot easier
to just leave it as a vec4.  This doesn't hurt codegen since we GC unused
uniforms, and UCP dot products use all the components anyway.
2016-04-08 18:40:55 -07:00
Rhys Kidd
40e77741cf vc4: Emit a warning and proceed for handling loops in NIR.
We don't really suppor control flow yet, but it's a lot nicer to render
something and warn on stderr than to crash.

Fixes the following piglit tests:
- shaders/complex-loop-analysis-bug
- shaders/glsl-fs-discard-04

Converts the following piglit tests from crash to fail:
- shaders/glsl-fs-continue-inside-do-while
- shaders/glsl-fs-loop
- shaders/glsl-fs-loop-continue
- shaders/glsl-fs-loop-nested
- shaders/glsl-texcoord-array
- shaders/glsl-vs-continue-inside-do-while
- shaders/glsl-vs-loop
- shaders/glsl-vs-loop-continue
- shaders/glsl-vs-loop-nested

No piglit regressions.

v2 (Eric): Add stronger stderr warning.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
2450b219e5 vc4: Add a stub for NIR->QIR of control flow function nodes
We shouldn't have any NIR functions present since all GLSL functions get
inlined, but this would be a more informative error if it does happen.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
e5997778bc vc4: Add better debug of NIR->QIR control flow graph failure
Ensure NIR control flow graph nodes that are unhandled in QIR
are reported with sufficient verbosity to aid debugging.

This improves piglit outputs, amongst other tools.

There are no other remaining uses of assert(0) as a blunt tool
within vc4.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Rhys Kidd
e529dd179f vc4: Remove unused include from vc4_program.c
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-04-08 18:28:43 -07:00
Lars Hamre
e25c24c638 glsl: handle unsigned int wraparound in link_shaders()
v2: change check_explicit_uniform_locations() to return an
    unsigned 0 (Timothy Arceri)

We were storing the int result of check_explicit_uniform_locations()
in num_explicit_uniform_locs as an unsigned int which caused it to
be 4294967295 when a -1 was returned.

This in turn would cause the following error during linking:
error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > 98304)

Results from running piglit tests/all with this patch
and when ARB_explicit_uniform_location disabled:

changes:     178
fixes:       176
regressions: 2

The two regressions are for the following tests:
glean@glsl1-matrix column check (1)
glean@glsl1-matrix column check (2)
which regress from FAIL to CRASH.

The regressions are acceptable because the tests are currently failing due to
the aforementioned linker error.

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-09 11:06:04 +10:00
Jason Ekstrand
d4a28ae52a anv/meta: Make clflushes conditional on !devinfo->has_llc 2016-04-08 17:07:49 -07:00
Jason Ekstrand
c226e72a39 anv/formats: Advertise blit support for stencil
Thanks to advances in the blit code, we can do this now.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:59:29 -07:00
Jason Ekstrand
e3312644cb anv/blit2d: Add support for W-tiled destinations
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-08 15:59:26 -07:00
Jason Ekstrand
0a6842c1bd isl/surface_state: Set the correct pitch for W-tiled surfaces
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:52 -07:00
Jason Ekstrand
2e827816fa anv/blit2d: Add another passthrough varying to the VS
We need the VS to provide some setup data for other stages.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:49 -07:00
Jason Ekstrand
b377c1d08e anv/image: Remove the offset parameter from image_view_init
The only place we were using this was in meta_blit2d which always creates a
new image anyway so we can just use the image offset.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:45 -07:00
Jason Ekstrand
f9a2570a06 anv/blit2d: Add a bind_dst helper function
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:42 -07:00
Jason Ekstrand
15a9468d85 anv/blit2d: Simplify create_iview
Now it just creates the image and view.  The caller is responsible for
handling the offset calculations.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:40 -07:00
Jason Ekstrand
b8f3909b73 nir/gather_info: Handle discard_if
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:36 -07:00
Jason Ekstrand
819d0e1a7c anv/meta2d: Add support for blitting from W-tiled sources on gen7
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-08 15:58:03 -07:00
Jason Ekstrand
b0a5ca5cfc isl: Remove surf_get_intratile_offset_el
The intratile offset may not be a multiple of the element size so this
calculation is invalid.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:58:01 -07:00
Jason Ekstrand
b37502b983 isl: Rework the get_intratile_offset function
The old function tried to work in elements which isn't, strictly speaking,
a valid thing to do.  In the case of a non-power-of-two format, there is no
guarantee that the x offset into the tile is a multiple of the format
block size.  This commit refactors it to work entirely in terms of a tiling
(not a surface) and bytes/rows.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:58 -07:00
Jason Ekstrand
4caba94086 anv/image: Expose the guts of CreateBufferView for meta
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:55 -07:00
Jason Ekstrand
4ee80e8816 anv/blit2d: Refactor in preparation for different src/dst types
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:52 -07:00
Jason Ekstrand
85b9a007ac anv/blit2d: Add layouts for using a texel buffer source
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:49 -07:00
Jason Ekstrand
28eb02e345 anv/blit2d: Rename the descriptor set and pipeline layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:47 -07:00
Jason Ekstrand
00e70868ee anv/blit2d: Enhance teardown and clean up init error paths
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:45 -07:00
Jason Ekstrand
43fbdd7156 anv/blit2d: Factor binding the source image into a helper
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:43 -07:00
Jason Ekstrand
5187ab05b8 anv/blit2d: Inline meta_emit_blit2d
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:41 -07:00
Jason Ekstrand
b0a6cfb9b4 anv/blit2d: Pass the source pitch into the shader
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:39 -07:00
Jason Ekstrand
e466164c87 anv/blit2d: Break the texelfetch portion of shader building into a helper
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:37 -07:00
Jason Ekstrand
afada45590 anv/blit2d: Fix whitespace
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:35 -07:00
Jason Ekstrand
9553fd2c97 anv/blit2d: Fix a NIR writemask
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:32 -07:00
Jason Ekstrand
b38a0d64ba anv/meta2d: Don't declare an array sampler in the fragment shader
With the new blit framework we aren't using array textures and, from
talking with Nanley, we don't think it's going to be useful in the future
either.  Just get rid of it for now.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:57:28 -07:00
Jason Ekstrand
dd6f720046 anv/blit2d: Remove the tex_dim parameter from copy_fragment_shader
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-04-08 15:56:52 -07:00
Jason Ekstrand
6cc7aec5b0 i965/tiled_memcopy: Get rid of the direction parameter to get_memcpy
Now that we can use the much simpler rgba8_copy function, we don't need to
hand different functions out based on direction.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-08 12:09:20 -07:00
Jason Ekstrand
d2b32656e1 i965/tiled_memcpy: Rework the RGBA -> BGRA mem_copy functions
This splits the two copy functions into three: One for unaligned copies,
one for aligned sources, and one for aligned destinations.  Thanks to the
previous commit, we are now guaranteed that the aligned ones will *only*
operate on aligned memory so they should be safe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93962
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-08 12:09:15 -07:00
Jason Ekstrand
f6f54a29ca i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functions
Each of the [de]tiling functions has three mem_copy calls:

 1) Left edge to tile boundary
 2) Tile boundary to tile boundary in a loop
 3) Tile boundary to right edge

Copies 2 and 3 start at a tile edge so the pointer to tiled memory is
guaranteed to be at least 16-byte aligned.  Copy 1, on the other hand,
starts at some arbitrary place in the tile so it doesn't have any such
alignment guarantees.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-04-08 12:08:51 -07:00
Ben Widawsky
e5295b5fb4 i965: Check eu/subslices are > 0
Now that the check is restricted to gen8+, we should always get back a non-zero
positive value for the EU and subslice counts.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-08 11:52:29 -07:00
Ben Widawsky
cc01b63d73 i965: Fix eu/subslice warning
Older gen platforms do not actually return a value for sublice and eu total
(IMO, confusingly) they return -ENODEV. This patch defers the SSEU setup until
we have the actual GPU generation to avoid useless warnings when running on
older platforms with older kernels.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-08 11:52:29 -07:00
Ben Widawsky
4213b00e30 i965: Extract SSEU configuration info
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-08 11:51:01 -07:00
Brian Paul
4420f189b6 st/mesa: fix glReadBuffer() assertion failure
If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the
assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in
update_fp().

This is because we were calling st_validate_state() without first
updating Mesa state with _mesa_update_state().

The regression came from commit 83b589301f "st/mesa: fix
frontbuffer glReadPixels regressions".

The new piglit gl-1.0-simple-readbuffer test exercises this.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-08 09:49:05 -06:00
Thomas Hindoe Paaboel Andersen
b9855dcdf7 st/va: avoid dereference after free in vlVaDestroyImage
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-04-08 06:57:17 +01:00
Jason Ekstrand
e26a978773 Merge remote-tracking branch 'public/master' into vulkan 2016-04-07 16:56:34 -07:00
Jason Ekstrand
15895bf777 i965/fs: Use the scale helper in surface_builder
As requested by Curro
2016-04-07 16:49:09 -07:00
Marek Olšák
1cd19ebc4a radeonsi: do per-pixel clipping based on viewport states
In other words, vport scissors are derived from viewport states.
If the scissor test is enabled, the intersection of both is used.

The guard band will disable clipping, so we have to clip per-pixel.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-08 00:23:05 +02:00
Samuel Pitoiset
059308db84 nv50/ir: do not try to attach JOIN ops to ATOM
This might result in an INVALID_OPCODE dmesg error in case a join is
attached to an atomic operation.

Spotted with arb_shader_image_load_store-host-mem-barrier on GK104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-04-07 23:10:26 +02:00
Nicolai Hähnle
2abe4f8d7d radeonsi: raise number of samplers per shader to 32
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94835
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle
9d2693f58a radeonsi: expand the compressed color and depth texture masks to 64 bits
This is in preparation of raising the number of exposed sampler views to 32
bits, which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle
f270067ef9 radeonsi: replace magic 16 by SI_NUM_USER_SAMPLERS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:06 -05:00
Nicolai Hähnle
f09036f6c0 gallium: raise PIPE_MAX_SAMPLERS to 32
The previous value of 18 was motivated by having drivers that want to expose
16 samplers but also use some additional samplers for internal use. Raising
the value even higher isn't going to hurt that case.

On the other hand, some drivers actually use PIPE_MAX_SAMPLERS as the number
of samplers they expose externally, so raising this number above 32 is fragile
(because several places in the code use bitfields, and tracking down and
widening all of them is prone to miss some case).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 13:15:05 -05:00
Nicolai Hähnle
84c4d069ac st/glsl_to_tgsi: make samplers_used an uint32_t (v2)
It is used as a bitfield, so it seems cleaner to keep it unsigned.

The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.

v2: add an assert for bitfield size and use 1u << idx

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-04-07 13:15:05 -05:00
Nicolai Hähnle
4bfcc86bf9 tgsi/scan: add an assert for the size of the samplers_declared bitfield
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-07 13:15:05 -05:00
Nicolai Hähnle
cc39879989 draw/aaline: stronger guard against no free samplers (v2)
Line anti-aliasing will fail when there is no free sampler available. Make
the corresponding guard more robust in preparation of raising
PIPE_MAX_SAMPLERS to 32.

The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.

v2: add an assert for bitfield size and use 1u << idx

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-04-07 13:15:05 -05:00
Nicolai Hähnle
040f5cb09e util/pstipple: stronger guard against no free samplers (v2)
When hasFixedUnit is false, polygon stippling will fail when there is no free
sampler available. Make the corresponding guard more robust in preparation
of raising PIPE_MAX_SAMPLERS to 32.

The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.

v2: add an assert for bitfield size and use 1u << idx

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-04-07 13:15:02 -05:00
Brian Paul
b7e67b2337 svga: new SVGA_MSAA env var to disable/enable MSAA pixel formats
On by default.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-07 11:42:43 -06:00
Brian Paul
9f443af449 svga: add some trivial null pointer checks
These small mallocs will probably never fail, but static analysis tools
may complain about the missing checks.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-07 11:42:43 -06:00
Samuel Pitoiset
60cf2fa477 trace: add missing set_shader_images()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 18:52:27 +02:00
Marek Olšák
5fac4887d8 radeonsi: disable perfect ZPASS counts for PIPE_QUERY_OCCLUSION_PREDICATE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 13:58:01 +02:00
Marek Olšák
baa0b3f4cc radeonsi: don't use the real barrier instruction in tess ctrl shaders
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-07 13:58:01 +02:00
Michel Dänzer
715e97e342 Revert "clover: Fix build against clang SVN >= r265359"
This reverts commit 0daab9878d.

The corresponding clang change was reverted.

Trivial.
2016-04-07 17:03:09 +09:00
Jason Ekstrand
05db680248 nir/types: Add a wrapper for count_attribute_slots
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-07 09:44:11 +02:00
Kristian Høgsberg Kristensen
068935844c genxml: Add GEN6 genxml
Not used yet, but let's put it here for now.
2016-04-06 21:08:34 -07:00
Dave Airlie
828d84c8e2 r600: use radeon_emit in a few more places in evergreen_compute
This is just a cleanup of the code.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:26 +01:00
Dave Airlie
0c40b6f96c r600: make compute global buffer functions static.
This moves things around so that the global buffer handling
functions in evergreen_compute.c are static.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:22 +01:00
Dave Airlie
a5d247dda0 r600: make two compute functions static.
These aren't used outside evergreen_compute.c

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:17 +01:00
Dave Airlie
41558efa87 r600: using pipe_grid_info more in evergreen_compute.
No reason to pull the pieces apart here, also make
one of the functions static as it's unused outside this.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:13 +01:00
Dave Airlie
a6e17d7d69 r600: in evergreen_compute use ctx consistently instead of ctx_
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:09 +01:00
Dave Airlie
aeb2be3a2f r600: use rctx consistently in evergreen_compute.c
Another step towards cleaning this up.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:39:05 +01:00
Dave Airlie
0560c82ff6 r600: cleanup whitespace in evergreen_compute.c
This aligns the code with the style of the rest of the driver.

Makes editing it a lot less painful.

Acked-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 04:38:51 +01:00
Edward O'Callaghan
6fc3e7c988 GL3.txt: Mark ARB_framebuffer_no_attachments as done
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
ea310f2b38 r600g: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
483a686f80 radeonsi: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:59 +10:00
Edward O'Callaghan
1156cad405 radeonsi: Improve assert info out of si_set_framebuffer_state()
Lets give the developer a little hand if we are going to assert
on a zero literal at the end of a branch.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
bb1bd0ddd7 radeonsi: Allow 16 samples MSAA mode for PIPE_FORMAT_NONE
For ARB_framebuffer_no_attachment; A is_format_supported() query
with 'PIPE_FORMAT_NONE' passed implies a query of the number of
samples supported from the framebuffer with no attachment.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
63f2b2f2c0 softpipe: Set samples and layers in set_framebuffer_state() cb
Carries across the number of samples and layers state in the
'softpipe_set_framebuffer_state()' callback. This state is
part of 'ARB_framebuffer_no_attachments' support.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
c6a514d7df mesa/st: Update framebuffer state with no.of samples,layers
Handle the case of ARB_framebuffer_no_attachment.
Also, kill off a dead debug printf() call while we are here.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
7ff28d2af0 gallium/trace: Dump no.of samples and layers in fb state
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
0b7075fed7 gallium: Put no.of {samples,layers} into pipe_framebuffer_state
Here we store the number of samples and layers directly in the
pipe_framebuffer_state so that in the case of
ARB_framebuffer_no_attachment we may make use of them directly.

Further, we adjust various gallium/auxiliary helper functions
accordingly.

V2:
  Convert branches in util_framebuffer_get_num_layers() and
  util_framebuffer_get_num_samples() to their canonical form.

V3:
  'git stash pop' the typo fix of 'cbufs' which should be
  'nr_cbufs' that was missing in V2, woops! Thanks Marek for
  pointing this out yet again.

V4:
  Squash in the following patch:

  'gallium/util: Ensure util_framebuffer_get_num_samples() is valid'

   Upon context creation, internal driver structures are malloc()'ed
   and memset() to zero them. This results in a invalid number of
   samples 'by default'. Handle this in the simplest way to avoid
   elaborate and probably equally sub-optimial solutions.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-07 12:03:58 +10:00
Edward O'Callaghan
b512b5fd36 mesa/st: Set _NumSamples in update_framebuffer_state()
Using PIPE_FORMAT_NONE to indicate what MSAA modes are supported
with a framebuffer using no attachment.

V.2:
 Rewrite MSAA mode loop to be more general.
V.3:
 Move comment to right place after loop was rewritten.
V.4: [airlied]
 remove unneeded variable, and assert, and unneeded pipe assignment

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-07 12:02:06 +10:00
Edward O'Callaghan
2016e9ffda gallium: Obtain ARB_framebuffer_no_attachment constants
Set default values for the constants required in
ARB_framebuffer_no_attachments and obtained the number
of layers from ``PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS``.

We also obtain the MaxFramebufferSamples value using
a query back to the driver for PIPE_FORMAT_NONE.

V.1:
 Merge if branch predicates into one branch.
 Move const init into st_init_limits()

[airlied: whitespace fixup]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:56:44 +10:00
Edward O'Callaghan
4bc9130fba gallium: Add PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT
Add PIPE_CAP to determine if the GL extension
'GL_ARB_framebuffer_no_attachments' shall be
supported.

The driver is required to support 'PIPE_FORMAT_NONE'
via its 'is_format_supported()' callback in order
to determine the MSAA modes the hardware supports so
that values requested from the application using
'GL_ARB_framebuffer_no_attachments' may be quantized
to what the hardware expects.

V.2:
 Fix doc for a more detailed description of the PIPE_CAP
 and the corresponding GL constant.

V.3:
 Renamed and repurposed once again.

V.4:
 Remove CAP from cap_mapping array.

[airlied: fix damaged whitespace]

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:56:44 +10:00
Edward O'Callaghan
85f79f0c75 mesa/st: Use _mesa_geometric_ functions appropriately
Change references to gl_framebuffer::Width, Height, MaxNumLayers
and Visual::samples to use the _mesa_geometric_ convenience functions
for those places where the geometry of the gl_framebuffer is needed.
This is in contrast to the geometry of the intersection of the
attachments of the gl_framebuffer.

This patch paves the way to enable GL_ARB_framebuffer_no_attachements
for all gallium drivers.

V.2:
 Remove itermeditate variable state.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:56:35 +10:00
Edward O'Callaghan
b40375a21c mesa: Add comment to framebuffer_parameteri()
V.2:
 Change 'N.B.,' to 'NOTE:'.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-07 11:55:33 +10:00
Jason Ekstrand
c62db279b6 i965/sf_state: Pull flat_enables out of prog_data
Previously, we were walking over the shader source to figure out which
inputs should be marked flat.  Now, we can just pull it out of prog_data.
This is needed for properly setting up 3DSTATE_SF/SBE for Vulkan and it
also means that it will get properly cached.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
e61cc87c75 i965/fs: Add a flat_inputs field to prog_data
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
5c5a9b7bf6 brw/device_info: Add a helper for getting a device name
This is needed by the Vulkan driver

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
a241ab43b5 i965/fs_surface_builder: Mask signed integers after conversion
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
3921b64e63 i965/fs: Make the repclear shader support either a uniform or a flat input
In the Vulkan driver we use a single flat input instead of a uniform
because setting up push constants is more disruptive to the pipeline than
setting up another vertex input.  This uses the number of uniforms as a key
to keep it working for the GL driver.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:50 -07:00
Jason Ekstrand
061969f9dd i965: Move get_hw_prim_for_gl_prim to brw_util.c
It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in
a file that's linked into libi965_compiler.la.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:47 -07:00
Bas Nieuwenhuizen
3393358115 radeonsi: set shader calling conventions
Note that old mesa + new LLVM or new mesa + old LLVM breaks
with this change and the corresponding LLVM change (D18559).

For LLVM version <= 3.8 we use the old method, but we can't detect
people using a post 3.8 svn version that is still too old.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-06 21:54:35 +02:00
Marek Olšák
0293d72fa5 drirc: add a workaround for blackness in Warsow
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
2016-04-06 12:53:40 +02:00
Ilia Mirkin
2e123e1a25 glsl: use has_shader_storage_buffer_objects helper
Replaces open-coded logic with existing helper.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-05 20:27:32 -04:00
Timothy Arceri
5d39f03806 glsl: remove remaining tabs in link_uniform_blocks.cpp
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:33 +10:00
Timothy Arceri
7ef57aa685 mesa: remove unused IsShaderStorage field
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:28 +10:00
Timothy Arceri
f1293b2f9b glsl: fully split apart buffer block arrays
With this change we create the UBO and SSBO arrays separately from the
beginning rather than putting them into a combined array and splitting
it apart later.

A bug is with UBO and SSBO stage reference querying is also fixed as
we now use the block index to lookup the references in the separate arrays
not the combined buffer block array.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:24 +10:00
Rob Clark
506b561ba7 freedreno/ir3: insert extra move into phi
We had an implicit assumption that the phi src was assigned in it's
source (pred) block leading into the phi.  But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src.  Insert an extra mov in the source block.  If it is not
required the CP pass will take it back out again.

Fixes:

  ./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
  ./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test

and probably others.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:43 -04:00
Rob Clark
f9cdbf4405 freedreno/ir3: eliminate unnecessary absneg's
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0).  So we'd end up with things like:

   cmps.s.ge r1.x, ...
   absneg.s r1.x, (neg)r1.x
   absneg.s r1.x, (abs)r1.x
   sel.b32 r2.x, r0.x, r1.x, r0.y

The (neg) already gets collapsed due to the following (abs).  Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:25 -04:00
Michel Dänzer
0daab9878d clover: Fix build against clang SVN >= r265359
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-05 17:00:58 +00:00
Bas Nieuwenhuizen
799789ba99 radeonsi: use bounded indexing for samplers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:18 +02:00
Bas Nieuwenhuizen
713353db18 radeonsi: use bounded indexing for constant buffers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:07 +02:00
Marek Olšák
a64dbdf612 gallium/radeon: allow multiple exports of the same texture with different usage
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.

v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-05 15:32:40 +02:00
Marek Olšák
25f96d2b97 docs/relnotes: document EGL_KHR_reusable_sync 2016-04-05 15:32:40 +02:00
Dongwon Kim
70299474f5 egl: add EGL_KHR_reusable_sync to egl_dri
This patch enables an EGL extension, EGL_KHR_reusable_sync.
This new extension basically provides a way for multiple APIs or
threads to be excuted synchronously via a "reusable sync"
primitive shared by those threads/API calls.

This was implemented based on the specification at

https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_reusable_sync.txt

v2
- use thread functions defined in C11/threads.h instead of
  using direct pthread calls
- make the timeout set with reference to CLOCK_MONOTONIC
- cleaned up the way expiration time is calculated
- (bug fix) in dri2_client_wait_sync, case EGL_SYNC_CL_EVENT_KHR
  has been added.
- (bug fix) in dri2_destroy_sync, return from cond_broadcast
  call is now stored in 'err' intead of 'ret' to prevent 'ret'
  from being reset to 'EGL_FALSE' even in successful case
- corrected minor syntax problems

v3
- dri2_egl_unref_sync now became 'void' type. No more error check
  is needed for this function call as a result.
- (bug fix) resolved issue with duplicated unlocking of display in
  eglClientWaitSync when type of sync is "EGL_KHR_REUSABLE_SYNC"

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-04-05 15:24:57 +02:00
Rob Clark
3e13572826 freedreno/ir3: deal with duplicate phi sources
Otherwise we end up with funny things like:

  mov.f32f32 r0.x, r1.y
  mov.f32f32 r0.x, r1.y

(It doesn't happen as much after fixing the problem w/ CP into phi src,
but it can still happen since we aren't too clever about generating phi
sources in the first place.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
f8feb97ba5 freedreno/ir3: fix silly brain-fart in RA
We want to consider all the vars, not 1/32nd of them, when extending
live-ranges.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
8e451c2d06 freedreno/ir3: don't cp into phi's
The block defining a phi source might not have been executed.  If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
383b6e87f9 freedreno/ir3: we can't store immediate values
Fixes some transform-feedback piglits, like:

bin/ext_transform_feedback-nonflat-integral

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
d47fb856af freedreno/ir3: add dumping for use/def/live-in/live-out
Turned out to be useful to debug an issue in RA.  Let's keep it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
38ae05a340 freedreno/ir3: drop unused instr category arg
No longer used, so drop the extra arg to ir3_instr_create()

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
19739e4fb9 freedreno/ir3: remove ir3_instruction::category
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
70735643f4 freedreno/ir3: encode instruction category in opc_t
Been on my TODO list for a while.  If nothing else this will make gdb
properly grok the opc_t enum.

This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t).  Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Jason Ekstrand
5ea3647f89 i965/fs: Move the code for load/store_shared to emit_cs_intrinsic
They are compute-shader only and that's where the code for doing atomics on
shared variables lives so it seemes to make sense.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-04 15:56:50 -07:00
Jason Ekstrand
80c72a8ea7 i965/nir: Provide a default LOD for buffer textures
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures.  GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless.  This commit allows other NIR producers to be
more lazy and not provide one at all.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-04 15:56:39 -07:00
Jason Ekstrand
e5c833db5a i965/compiler: Remove a redundant declaration of brw_compiler_create 2016-04-04 14:51:35 -07:00
Kenneth Graunke
3babb7b0a4 nir: Use PRIi64 and PRIu64 instead of %ld and %lu.
%ld and %lu aren't the right format specifiers for int64_t and uint64_t
on 32-bit (x86) systems.  They're %zu on Linux and %Iu on Windows.

Use the standard C99 macros in hopes that they work everywhere.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-04 14:38:48 -07:00
Kenneth Graunke
da5d08707b i965: Fix invalid pointer read in dead_control_flow_eliminate().
There may not be a previous block.  In this case, there's no real work
to do, so just continue on to the next one.

v2: Update for bblock->prev() API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-04 14:34:40 -07:00
Kenneth Graunke
9486614938 i965: Make bblock_t::next and friends return NULL at sentinels.
The bblock_t::prev/prev_const/next/next_const API returns bblock_t
pointers, rather than exec_nodes.  So it's a bit surprising.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-04 14:34:16 -07:00
Kenneth Graunke
5509d43a11 glsl: Lower variable indexing of system value arrays unconditionally.
lower_variable_index_to_cond_assign() did not handle system values.
gl_SampleMaskIn[] is a system value, and also an array.  Accessing it
with a variable index would trigger an unreachable() assert.

Rather than adding a new EmitNoIndirectSystemValues flag, we simply
lower unconditionally.  There is exactly one case where this occurs,
and for all current drivers, lowering produces optimal code.  Even
for future drivers with 32x MSAA, it produces reasonable code.

Fixes Piglit's new samplemaskin-indirect test.  Also fixes many ES31-CTS
tests when OES_sample_variables is enabled.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 14:29:21 -07:00
Jason Ekstrand
db35a851ad i965/defines: Unconditionally define primitives 2016-04-04 14:25:36 -07:00
Jason Ekstrand
6a04968784 Merge remote-tracking branch 'public/master' into vulkan 2016-04-04 13:58:05 -07:00
Jason Ekstrand
88ef2476dc i965/peephole_ffma: Only match a mul+add if none of the ops are exact
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 13:48:10 -07:00
Jason Ekstrand
eb93d6dec8 nir/search: Don't match inexact expressions with exact subexpressions
In the first pass of implementing exact handling, I made a mistake with
search-and-replace.  In particular, we only reallly handled exact/inexact
on the root of the tree.  Instead, we need to check every node in the tree
for an exact/inexact match.  As an example of this, consider the following
GLSL code

precise float a = b + c;
if (a < 0) {
   do_stuff();
}

In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results.  The solution is to simply bail if any of the
values are exact when matching an inexact expression.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 13:48:10 -07:00
Jason Ekstrand
fe247bbe92 nir: Stop double-printing function arguments 2016-04-04 12:10:20 -07:00
Jason Ekstrand
cb317b8d07 glsl: Stop force-enabling compute shaders
This isn't needed since we no longer use the GLSL compiler in Vulkan.
2016-04-04 12:09:12 -07:00
Jason Ekstrand
4d040a4ad3 glsl/standalone: Get rid of the unneeded _mesa_error_no_memory stub
This hasn't been needed since we stopped using the GLSL compiler in the
Vulkan driver and it was tripping up scons.  Removing it fixes the scons
build.
2016-04-04 12:07:51 -07:00
Kenneth Graunke
65fbc43d54 i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888).  One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly.  Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.

When enabled, fixes 16 dEQP precision tests

   dEQP-GLES31.functional.shaders.builtin_functions.precision.
   {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.

at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware).  Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-04 11:35:16 -07:00
Jason Ekstrand
8c8157bf6f Remove more spirv2nir remnants 2016-04-04 11:24:48 -07:00
Kenneth Graunke
3aa51e02d6 i965: Allow 8x MSAA on >= 64bpp formats on Gen8+.
See commit 3b0279a69 - this restriction is documented in the "Surface
Format" field of RENDER_SURFACE_STATE.

Looking at newer documentation, this restriction appears to exist on
Haswell, but no longer applies on Gen8+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-04-04 10:41:29 -07:00
Brian Paul
1eeec7ec41 docs: remove stray 'TBD' in 11.2.0 relnotes file 2016-04-04 10:33:11 -06:00
Emil Velikov
35132c413c docs: add news item and link release notes for 11.2.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-04 12:57:56 +01:00
Emil Velikov
dc4923d41f docs: add sha256 checksums for 11.2.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e7fb889dcc)
2016-04-04 12:55:55 +01:00
Emil Velikov
7dc11ed0b2 docs: Update 11.2.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ff9ddb9eb1)
2016-04-04 12:55:54 +01:00
Dave Airlie
f9b8b48bed mesa/get: fix MAX_GEOMETRY_SHADER_STORAGE_BLOCKS
this was returning the fragment shader value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-04 10:52:25 +01:00
Ilia Mirkin
4bc3b1ca48 nvc0: add hardware ETC2 and ASTC support on GK20A and GM107+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-04 00:32:48 -04:00
Ilia Mirkin
dab40d8083 docs: add note about GL_EXT_base_instance, sort entries
Trivial.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-03 21:18:17 -04:00
Ilia Mirkin
d76e1cd2dd mesa: expose EXT_base_instance in ES3 contexts
This extension is identical to ARB_base_instance. Reuse the same
entrypoints.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 20:40:55 -04:00
Ilia Mirkin
807e2c27ac mesa: expose EXT_polygon_offset_clamp in ES contexts
The extension spec was extended to also support ES. This functionality
is provided all the way back to ES 1.0.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 20:40:55 -04:00
Kenneth Graunke
40628886ca glsl: Print "precise" on ir_variable nodes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-03 17:33:38 -07:00
Jose Fonseca
7ad49daca6 gallivm: Introduce lp_format_intrinsic.
For adding .v4f32 like suffixes to intrinsics, taking special care for
scalar case, which was being often neglected.

This fixes invalid IR when doing mipmap filtering on SSE2 (the only
case where we'd use intrinsics with scalars.)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-04 00:06:09 +01:00
Ilia Mirkin
7af12a8dc6 glsl: make *sampler2DMSArray available in ESSL 3.20
Also avoid double-adding the *sampler2DMS types when the array ext is
enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:06:52 -04:00
Ilia Mirkin
aebb0e0186 glsl: make ssbo predicate return true when in a GLSL 430 or ESSL 310 shader
I can't tell whether this actually matters, but we're creating function
signatures with this predicate, so it should probably match when SSBO's
are available.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:06:49 -04:00
Ilia Mirkin
87906cbc37 glsl: allow conservative depth qualifiers in GLSL 420
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:06:35 -04:00
Ilia Mirkin
d50ffb5e46 mesa: add always-false-for-now enables for GL 4.3, 4.4, 4.5.
As the relevant extensions get implemented, the lines should be
uncommented. I believe this is (almost) everything needed for those GL
versions though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:01:15 -04:00
Ilia Mirkin
9abbc49712 glsl: add ARB_ES3_1_compatibility support
Oddly a bunch of the features it adds are actually from ESSL 3.20. But
the spec is quite clear, oh well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:01:15 -04:00
Ilia Mirkin
1708e24f65 mesa: add ES3_1_compatibility extension enable
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-03 18:01:15 -04:00
Jose Fonseca
a293f57e13 gallivm: Use llvm.fabs.
Exactly the same code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:09:09 +01:00
Jose Fonseca
e4f01da15d gallivm: Prefer backend agnostic intrinsic for rounding.
We could unconditionally use these instrinsics, but performance with SSE2
would suck, as LLVM falls back to calling libm.

lp_test_arit.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:09:07 +01:00
Jose Fonseca
324451e73f gallivm: Add debug option to force SSE2.
For simulating less capable machines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 22:08:57 +01:00
Jose Fonseca
5fa31a4aba llvmpipe: Test abs.
Trivial.
2016-04-03 11:17:20 +01:00
Jose Fonseca
522ebe701d llvmpipe: Build lp_test_arit on MSVC too.
It builds fine now.  Probably due to C99 support.

Trivial.
2016-04-03 11:17:20 +01:00
Jose Fonseca
b284f1f7f9 gallivm: Fix performance regressions due to vector selects.
LLVM often can't determine the mask elements are all ones/zeros, and
there doesn't seem to be a good way to hint that.

Thanks to Roland Scheidegger for spotting and analyzing the issue.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Jose Fonseca
11c4e5b45c gallivm: Remove lp_build_load_volatile.
No longer needed.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Jose Fonseca
bcfb86b09d gallivm: Use standard LLVMSetAlignment from LLVM 3.4 onwards.
Only provide a fallback for LLVM 3.3.

One less dependency on LLVM C++ interface.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-03 09:51:27 +01:00
Timothy Arceri
6d54096fa6 mesa: remove unrequired else
The if always returns so no need for an else.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-03 09:55:19 +10:00
Ilia Mirkin
d64134ecae gm107/ir: add OP_SELP emission, used in DSQRT lowering
The current DSQRT lowering code emits an OP_SELP, so we have to handle
its emission. This will eventually go away, but no harm supporting this
op.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-02 19:27:51 -04:00
Ilia Mirkin
3610b1466d nv50/ir: we can't load local memory directly into an output
This fixes piglit tests like

tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test

and related ones.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-04-02 18:10:20 -04:00
Christian Schmidbauer
2a529a8ac8 st/nine: specify WINAPI only for i386 and amd64
Currently mesa fails building with the x32 abi as ms_abi is not defined
in such a case.

The patch uses ms_abi only for amd64 targets and stdcall only for i386
targets to be sure that those are defined.

This patch additionally checks for __GNUC__ to guarantee that
__attribute__ is available.

CC: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Schmidbauer <ch.schmidbauer@gmail.com>
Acked-by: Axel Davy <axel.davy@ens.fr>
2016-04-02 23:30:40 +02:00
Samuel Pitoiset
0852c5703b nv50/ir: fix envyas variants when building the code lib
nvc0 and nve4 have been respectively replaced by gf100 and gk104.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-02 20:00:57 +02:00
Brian Paul
36d8fed798 svga: remove unused svga_compile_key::texture_msaa field
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Brian Paul
b283c76342 svga: check TXF instruction's target to determine MSAA
Rather than the currently bound texture.  This goes along with the
earlier patch to get away from examining bound textures and sampler
views during shader translation.

Fixes VMware bug 1632739.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Brian Paul
ef10b5427a tgsi: add simple tgsi_is_msaa_target() helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-02 08:05:20 -06:00
Timothy Arceri
070e5a7405 glsl: rename var and simplify if
is_ubo_var is true for both UBOs and SSBOs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
0fbd073dc2 glsl: store ubo or ssbo index in block index
Previously we store the buffer block index i.e the index of a combined
ubo/ssbo list.

Fixes several dEQP-GLES31.functional tests:
- program_interface_query.uniform.block_index.block_array
- program_interface_query.uniform.block_index.named_block
- program_interface_query.uniform.block_index.unnamed_block
- program_interface_query.uniform.random.10
- program_interface_query.uniform.random.15
- program_interface_query.uniform.random.22
- program_interface_query.uniform.random.24
- program_interface_query.uniform.random.26
- program_interface_query.uniform.random.28
- program_interface_query.uniform.random.3
- program_interface_query.uniform.random.31
- program_interface_query.uniform.random.38
- program_interface_query.uniform.random.5

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94116
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
1265e1c4e1 glsl: store stage reference in gl_uniform_block
This allows us to simplify the code and drop InterfaceBlockStageIndex
which is a per stage array of integers the size of all blocks in the
program combined including duplicates across stages. Adding a stage
ref per block will use less memory.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
d8855d66f4 glsl: simplify buffer block resource limit checking
This changes the code to use the buffer counts stored for each stage
rather than counting from scratch. It also moves the checks outside
of the for loop which means we now just get a single link error
message if we go over the max rather than X error messages where X
is the number we have exceeded the max by.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
0082b33a78 glsl: simplify SSBO resources check
We already have a count of active SSBOs per stage so use it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
3e74bf5b9d glsl: split buffer block arrays earlier
This will allow us to use them when checking resources in a
following patch and clean up a bunch of code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Timothy Arceri
0163881528 glsl: only set buffer block binding once during initialisation
Since 8683d54d2b there is now a single instance of the buffer
block information that needs to be updated rather than one instance
for each stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-02 17:10:56 +11:00
Kenneth Graunke
94ed482c19 glsl: Fix prorgram interface query locations biasing for SSO.
With SSO, the GL_PROGRAM_INPUT and GL_PROGRAM_OUTPUT interfaces refer to
the first and last shader stage linked into a program.  This may not be
the vertex and fragment shader stages.

So, subtracting VERT_ATTRIB_GENERIC0 and FRAG_RESULT_DATA0 is bogus.
We need to subtract VERT_ATTRIB_GENERIC0 for VS inputs,
FRAG_RESULT_DATA0 for FS outputs, and VARYING_SLOT_VAR0 for other cases.

Note that built-in variables get a location of -1.

Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var_explicit_location
- program_input.location.separable_fragment.var_array_explicit_location
- program_output.location.separable_vertex.var_array_explicit_location
- program_output.location.separable_vertex.var_array_explicit_location

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 22:05:20 -07:00
Kenneth Graunke
c123294dfe glsl: Return -1 for program interface query locations in many cases.
We were recording locations for all variables, even ones without an
explicit location set.  Implement the rules from the spec, and record
-1 in the resource list accordngly.  Make program_resource_location
stop doing math on negative values.  Remove hacks that are no longer
necessary now that we've stopped doing that.

Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var
- program_input.location.separable_fragment.var_array
- program_output.location.separable_vertex.var_array
- program_output.location.separable_vertex.var_array

v2: Delete more code

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 22:05:20 -07:00
Kenneth Graunke
9fe211bec4 glsl: Consolidate gl_VertexIDMESA -> gl_VertexID query hacks.
A program will either have gl_VertexID or gl_VertexIDMESA (the lowered
zero-based version), not both.  Just spoof it in the resource list so
the hacks are done in a single place.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 22:05:20 -07:00
Kenneth Graunke
013f25c3b3 glsl: Clean up some leftover cruft.
stages is always 1 << stage now.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 22:05:20 -07:00
Kenneth Graunke
98c22c0403 glsl: Add all system variables to the input resource list.
System values are just built-in input variables that we've opted to
special-case out of convenience.  We need to consider all inputs,
regardless of how we've classified them.

Unfortunately, there's one exception: we shouldn't add gl_BaseVertex
unless ARB_shader_draw_parameters is enabled, because it doesn't
actually exist in the language, and shouldn't be counted in the
GL_ACTIVE_RESOURCES query.

Fixes dEQP-GLES31.functional.program_interface_query.program_input.
resource_list.compute.empty, which expects gl_NumWorkGroups to appear
in the resource list.

v2: Delete more code

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 22:05:18 -07:00
Kenneth Graunke
6e8b9d5bdd glsl: Delete hack for VS system values.
This makes no sense.  If the stage being considered is the vertex
shader, then we'll add inputs and system values appropriately.

If we're not considering the vertex shader, then we absolutely should
not do anything with it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 21:58:25 -07:00
Kenneth Graunke
47daf17da0 glsl: Make add_interface_variables only consider the appropriate stage.
add_interface_variables() is supposed to add variables for the inputs of
the first shader stage linked into a program, and the outputs of the
last shader stage linked into a program.

From the ARB_program_interface_query specification:

"* PROGRAM_INPUT corresponds to the set of active input variables used by
   the first shader stage of <program>.  If <program> includes multiple
   shader stages, input variables from any shader stage other than the
   first will not be enumerated.

 * PROGRAM_OUTPUT corresponds to the set of active output variables
   (section 2.14.11) used by the last shader stage of <program>.  If
   <program> includes multiple shader stages, output variables from any
   shader stage other than the last will not be enumerated."

Previously, we used build_stageref here, which walks over all linked
shaders in the program.  This meant that internal varyings would be
visible.  We don't actually need any of build_stageref's code: we
already explicitly skip packed varyings, handle modes, and the name
comparisons just do a fuzzy string comparison of name with itself.

Fixes two tests: dEQP-GLES31.functional.program_interface_query.
program_{input,output}.referenced_by.referenced_by_vertex_fragment.

These tests have a VS and FS linked together into a single program.
Both stages have an input called "shaderInput".  But the FS input
should not be visible because it isn't the first stage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 21:58:25 -07:00
Kenneth Graunke
998ef1ad71 glsl: Clarify "mask" variable in add_interface_variables().
This is a bitfield of which stages refer to a variable.  It is not used
to mask off bits.  In fact, it's used to contribute additional bits.

Rename it and tidy a bit of the logic.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 21:58:25 -07:00
Kenneth Graunke
356c99b4e7 glsl: Pass stage to add_interface_variables().
add_interface_variables is supposed to add variables from either the
first or last stage of a linked shader.  But it has no way of knowing
the stage it's being asked to process, which makes it impossible to
produce correct stagerefs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-04-01 21:58:25 -07:00
Kenneth Graunke
2c5afe1fa9 glsl: Make vertex ID lowering declare gl_BaseVertex as hidden.
If the GL_ARB_shader_draw_parameters extension is enabled, we'll already
have a gl_BaseVertex variable.  It will have var->how_declared set to
ir_var_declared_implicitly, and will appear in the program resource
list.

If not, we make one for internal use.  We don't want it to be listed
in the program resource list, as the application won't be expecting
it.  Marking it hidden will properly exclude it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 21:58:22 -07:00
Kenneth Graunke
33df1c2935 glsl: Exclude ir_var_hidden variables from the program resource list.
We occasionally generate variables internally that we want to exclude
from the program resource list, as applications won't be expecting them
to be present.

The next patch will make use of this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 21:56:43 -07:00
Kenneth Graunke
15cd3ebede mesa: Make _mesa_choose_tex_format() handle stencil textures.
This is necessary for ARB_texture_stencil8 support on classic drivers.
Presumably Gallium works because it implements its own ChooseTexFormat.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-01 19:04:28 -07:00
Jordan Justen
ef1b397b07 glsl: Don't require matching centroid qualifiers
Note: This patch appears to violate older OpenGL and OpenGLES specs.

The OpenGLES GLSL 3.1 and OpenGL GLSL 4.3 specifications both remove
the requirement for the output and input centroid qualifiers to match.

The deqp
dEQP-GLES3.functional.shaders.linkage.varying.rules.differing_interpolation_2
test wants the newer OpenGLES 3.1 specification behavior, even for
OpenGLES 3.0. This patch simply removes the checking in all cases.

The OpenGLES 3.0 conformance test suite doesn't appear to require the
older ("must match") spec behavior.

For reference, here are the relavent spec citations:

  The OpenGL 4.2 spec says: "the last active shader stage output
  variables and fragment shader input variables of the same name must
  match in type and qualification (other than out matching to in)"

  The OpenGL 4.3 spec says: "interpolation qualification (e.g., flat)
  and auxiliary qualification (e.g. centroid) may differ."

  The OpenGLES GLSL 3.00.4 specification says: "The output of the
  vertex shader and the input of the fragment shader form an
  interface. For this interface, vertex shader output variables and
  fragment shader input variables of the same name must match in type
  and qualification (other than precision and out matching to in)."

  The OpenGLES GLSL 3.10 Specification says: "interpolation
  qualification (e.g., flat) and auxiliary qualification (e.g.
  centroid) may differ"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92743
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7819
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-01 18:06:19 -07:00
Bas Nieuwenhuizen
1a5c8c24b5 gallium: distinguish between shader IR in get_compute_param
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IR's.

The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:13 +02:00
Bas Nieuwenhuizen
be5899dcf9 gallium: add global buffer memory barrier bit
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:51:06 +02:00
Bas Nieuwenhuizen
01f993a21f gallium: add threads per block TGSI property
The value 0 for unknown has been chosen to so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:50:59 +02:00
Bas Nieuwenhuizen
ea8f4a6b13 gallium: add compute shader IR type
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-02 01:49:57 +02:00
Timothy Arceri
5ea825f556 glsl: remove tabs and fix some other style issues in glcpp-parse.y
Note there are still tabs left in the parser rules.

Acked-by: Dave Airlie <airlied@redhat.com>
2016-04-02 10:32:01 +11:00
Jason Ekstrand
cc1320220f nir/gather_info: Add an assert for supported stages 2016-04-01 15:44:43 -07:00
Jason Ekstrand
ebb0bcc11d nir: Move variable_get_io_mask back into gather_info
It used to be in nir_gather_info.c until I moved it out to nir.h so it
could be re-used with some linking code that never got merged.  We'll move
it back out if and when we have real code to share it with.
2016-04-01 15:39:48 -07:00
Jason Ekstrand
95106f6bfb Merge remote-tracking branch 'public/master' into vulkan 2016-04-01 15:16:21 -07:00
Jason Ekstrand
14c46954c9 i965: Add an implemnetation of nir_op_fquantize2f16
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-01 13:52:56 -07:00
Jason Ekstrand
de60e250f5 nir: Add an opcode for stomping a 32-bit value to 16-bit precision
This correlates directly to the SPIR-V opcode OpQuantizeToF16

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-01 13:52:28 -07:00
Samuel Pitoiset
60e1c6a7fc nvc0: enable compute shaders on GK104 and GM107+
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevent using
compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
71f327aa21 nvc0: bump the maximum number of UBOs for compute on Kepler
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
839a469166 nvc0/ir: do not lower shared+atomics on GM107+
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
543fb95473 nvc0/ir: add atomics support on shared memory for Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
275019d7db nvc0/ir: fix wrong pred emission for ld lock on GK104
This fixes 84b9b8f (nvc0/ir: add missing emission of locked load
predicate).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
4f58b78c30 nvc0/ir: add support for compute UBOs on Kepler
Make sure to avoid out of bounds access in presence of indirect
array indexing by loading the size from the driver constant buffer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
3b246a71d7 nvc0: add indirect compute support on Kepler
The grid size is stored as three 32-bits integers in the indirect
buffer but the launch descriptor uses a 32-bits integer for both
griddim_y and griddim_z like this (z << 16) | y. To make it work,
the 16 high bits of griddim_y are overwritten by griddim_z.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
7797d5f7d9 nvc0: reduce likelihood of collision for real buffers on Kepler
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.

Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
e2e8085fac nvc0: store ubo info to the driver constbuf on Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
12aa047c98 nvc0: bind user uniforms for compute on Kepler
Uniform buffer objects will be sticked to the driver constant buffer
like buffers because the launch descriptor only allows 8 CBs.

Input kernel parameters for OpenCL are still uploaded to screen->parm
which is bound on c0, but this will be changed later with a new series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
1828d90a00 nvc0: bind shader buffers for compute on Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Samuel Pitoiset
debd910512 nvc0: bind driver cb for compute on c7[] for Kepler
Instead of using the screen->parm buffer object which will be removed,
upload auxiliary constants to uniform_bo to be consistent regarding
what we already do for Fermi.

This breaks surfaces support (for compute only) but this will be
properly re-introduced later for ARB_shader_image_load_store.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-01 22:26:24 +02:00
Jose Fonseca
f72de6f386 gallivm: Prevent disassembly debug output from being truncated.
By using os_log_message directly, as _debug_vprintf truncates messages
to 4K.

Also cleanup the disassemble interface.

Spotted by Roland.

Trivial.
2016-04-01 21:22:42 +01:00
Rob Clark
972054f5bf compiler: random comment fixup
Just noticed this in passing..  gl_shader_stage already has tess so this
comment no longer applies.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-01 12:34:40 -04:00
Brian Paul
58557b345c docs: minor updates to license.html file
Mesa demos are no longer part of the main Mesa tree/tarball.
Add Gallium and GLX code to list of major components.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-01 09:50:08 -06:00
Mauro Rossi
e09d04cd56 radeonsi: use util_strchrnul() to fix android build error
Android Bionic does not support strchrnul() string function,
gallium auxiliary util/u_string.h provides util_strchrnul()

This change avoids the following building error:

external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
undefined reference to 'strchrnul'
collect2: error: ld returned 1 exit status

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:56:57 +01:00
Rob Herring
952720ccee egl: android: enable EGL_FRAMEBUFFER_TARGET_ANDROID and EGL_RECORDABLE_ANDROID
Set EGL_FRAMEBUFFER_TARGET_ANDROID and EGL_RECORDABLE_ANDROID config
attributes to true for Android. These are required in Marshmallow.

The implementation of EGL_RECORDABLE_ANDROID support has 2 options in
the definition of the extension. Android implements the 2nd option
which is the encoder must support RGB input. The requested input format
is RGB888, so setting the attribute on all the native Android visual
formats should be sufficient.

Similarly, setting EGL_FRAMEBUFFER_TARGET_ANDROID for all configs with
a EGL_NATIVE_VISUAL_ID should be sufficient. Most likely, the HWC should
support the same set of formats the underlying DRM driver supports.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:45:13 +01:00
Rob Herring
e21e81aa18 egl: Add EGL_RECORDABLE_ANDROID attribute
This is used by Android to select an eglconfig compatible with screen
recording.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: add the _eglIsConfigAttribValid check]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:45:08 +01:00
Rob Herring
8975527f58 egl: Add EGL_FRAMEBUFFER_TARGET_ANDROID attribute
This is used by Android to select an eglconfig compatible with HWComposer.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: add the _eglIsConfigAttribValid check]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:44:25 +01:00
Rob Herring
2d9e0f24e1 Android: fix x86 gallium builds
Builds with gallium enabled fail on x86 with linker error:

external/mesa3d/src/mesa/vbo/vbo_exec_array.c:127: error: undefined reference to '_mesa_uint_array_min_max'

The problem is sse_minmax.c is not included in the libmesa_st_mesa
library. Since the SSE4.1 files are needed for both libmesa_st_mesa
and libmesa_dricore, move SSE4.1 files into a separate static library
that can be used by both.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-01 13:44:22 +01:00
Jose Fonseca
cdf7c6b83d gallivm: Use vector selects on LLVM 3.3+.
This is an old patch I had around.

Vector selects seem to work well from LLVM 3.3.  Using them should
improve code quality, as it might make constant propagation pass more
effective.

Tested lp_test_*

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-01 09:05:19 +01:00
Alejandro Piñeiro
cd7d631c71 glsl: do not raise unitialized variable warnings on builtins/reserved GL variables
Needed because not all the built-in variables are marked as system
values, so they still have the mode ir_var_auto. Right now it fixes
raising the warning when gl_GlobalInvocationID and
gl_LocalInvocationIndex are used.

v2: use is_gl_identifier instead of filtering for some names (Ilia
    Mirkin)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-01 09:54:09 +02:00
Ilia Mirkin
df03be196a nv50,nvc0: add PIPE_BIND_LINEAR support to is_format_supported
vdpau has recently come to rely on this, so make sure to check it
properly.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-31 21:53:11 -04:00
Ilia Mirkin
e0e1683087 mesa: add GL_OES/EXT_draw_buffers_indexed support
This is the same ext as ARB_draw_buffers_blend (plus some core
functionality that already exists). Add the alias entrypoints.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 21:12:49 -04:00
Kenneth Graunke
a57320a9ba i965: Use brw->urb.min_vs_urb_entries instead of 32 for BLORP.
Haswell GT2 and GT3 have a minimum of 64 entries.  Hardcoding 32
is not legal.

v2: Delete stale comment (caught by Alejandro).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-31 16:45:07 -07:00
Kenneth Graunke
58d4751fa0 i965: Fix textureSize() depth value for 1 layer surfaces on Gen4-6.
According to the Sandybridge PRM's description of the resinfo message,
the .z value returned will be Depth == 0 ? 0 : Depth + 1.  The earlier
PRMs have the same table.

This means we return 0 for array textures with a single slice, when
we ought to return 1.  Just override it to max(depth, 1).

Fixes 10 dEQP-GLES3.functional tests on Sandybridge:
shaders.texture_functions.texturesize.sampler2darray_fixed_vertex
shaders.texture_functions.texturesize.sampler2darray_fixed_fragment
shaders.texture_functions.texturesize.sampler2darray_float_vertex
shaders.texture_functions.texturesize.sampler2darray_float_fragment
shaders.texture_functions.texturesize.isampler2darray_vertex
shaders.texture_functions.texturesize.isampler2darray_fragment
shaders.texture_functions.texturesize.usampler2darray_vertex
shaders.texture_functions.texturesize.usampler2darray_fragment
shaders.texture_functions.texturesize.sampler2darrayshadow_vertex
shaders.texture_functions.texturesize.sampler2darrayshadow_fragment

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-31 15:23:49 -07:00
Ian Romanick
08ff5f4d1f nir: Simplify a bcsel to logical-or
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement.  This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.

shader-db results:

Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0

total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0

Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0

total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1

No change on G45 or Ironlake.

I also tried these expressions, but none of them affected any shaders in
shader-db:

   (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
   (('bcsel', a, 'b@bool', False),    ('iand', a, b)),
   (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Ian Romanick
cdea12bf03 ptn: Fix all users of ptn_swizzle
None of the callers actually wanted what it did.  In ptn_xpd, you only
ever want a vec3 swizzle.  In ptn_tex, you want a swizzle that matches
the number of required texture coordinates.

shader-db results:

G45:
total instructions in shared programs: 4011240 -> 4010911 (-0.01%)
instructions in affected programs: 59232 -> 58903 (-0.56%)
helped: 114
HURT: 0

total cycles in shared programs: 84314194 -> 84313220 (-0.00%)
cycles in affected programs: 779150 -> 778176 (-0.13%)
helped: 110
HURT: 13

Ironlake:
total instructions in shared programs: 6397262 -> 6396605 (-0.01%)
instructions in affected programs: 117402 -> 116745 (-0.56%)
helped: 227
HURT: 0

total cycles in shared programs: 128889798 -> 128888524 (-0.00%)
cycles in affected programs: 1214644 -> 1213370 (-0.10%)
helped: 179
HURT: 44

Sandy Bridge:
total instructions in shared programs: 8467391 -> 8467384 (-0.00%)
instructions in affected programs: 3107 -> 3100 (-0.23%)
helped: 10
HURT: 6

total cycles in shared programs: 117580120 -> 117573448 (-0.01%)
cycles in affected programs: 103158 -> 96486 (-6.47%)
helped: 84
HURT: 11

Ivy Bridge:
total instructions in shared programs: 7774255 -> 7774258 (0.00%)
instructions in affected programs: 1677 -> 1680 (0.18%)
helped: 8
HURT: 6

total cycles in shared programs: 65743828 -> 65739190 (-0.01%)
cycles in affected programs: 89312 -> 84674 (-5.19%)
helped: 78
HURT: 23

Haswell:
total instructions in shared programs: 7107172 -> 7107150 (-0.00%)
instructions in affected programs: 2048 -> 2026 (-1.07%)
helped: 16
HURT: 0

total cycles in shared programs: 64653636 -> 64647486 (-0.01%)
cycles in affected programs: 86836 -> 80686 (-7.08%)
helped: 85
HURT: 17

Broadwell and Skylake:
total instructions in shared programs: 8447529 -> 8447507 (-0.00%)
instructions in affected programs: 2038 -> 2016 (-1.08%)
helped: 16
HURT: 0

total cycles in shared programs: 66418670 -> 66413416 (-0.01%)
cycles in affected programs: 90110 -> 84856 (-5.83%)
helped: 83
HURT: 20

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Ian Romanick
8bb9c6ff7f ptn: Silence unused parameter warning
The KIL instruction doesn't have a destination, so ptn_kil never uses
dest.

program/prog_to_nir.c: In function ‘ptn_kil’:
program/prog_to_nir.c:547:38: warning: unused parameter ‘dest’ [-Wunused-parameter]
 ptn_kil(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src)
                                      ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Samuel Pitoiset
d22eca5f90 tgsi: silence compiler warning in fetch_sampler_unit()
The unit variable can be used uninitialized.

Fixes: 24e77cb09 ("tgsi: handle indirect sampler arrays. (v2)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-01 07:16:24 +10:00
Samuel Pitoiset
05902a6686 tgsi: fix out of bounds access in exec_atomop()
The number of channels must be 4 for all RGBA components.

Fixes: 22d129601 ("tgsi: add support for image operations to tgsi_exec. (v2.1)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-04-01 07:15:16 +10:00
Brian Paul
9076e04934 tgsi: split tgsi_util_get_texture_coord_dim() function into two
It was kind of overloaded, returning two different things.  Now get
the index of the shadow reference src register with a new
tgsi_util_get_shadow_ref_src_index() function.

To verify the new code, I added some temp/debug code which looped
over all TGSI_TEXTURE_x values, calling the old function and new and
checking that the returned indexes matched.

Also tested piglit "shadow" tests with softpipe/llvmpipe.
No testing of ilo and radeonsi changes.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:48:00 -06:00
Brian Paul
9d7cd43988 tgsi: skip texture query opcodes when examining texture targets
Should fix the assertion in piglit
spec@arb_gpu_shader5@texturegather@fs-r-none-shadow-2d when the
TXQ instruction specifies a 2D target but the sampler view was
declared as SHADOW2D.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-31 09:47:40 -06:00
Pierre Moreau
f96a403bc3 nv50/ir: Check for valid insn instead of def size
This fixes a null pointer dereference during the register allocation pass,
if a function had arguments.

Functions arguments get a definition from the function itself, a definition
which is therefore not linked to any instruction. If a value ends up having
a definition but no linked instruction, the register allocation pass doesn't
need to consider whether that value is generated by an instruction that
can only handle "short" registers (on nv50).

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-31 10:30:29 -04:00
Ilia Mirkin
a94d8d51d7 mesa: add GL_EXT_copy_image support
The extension is identical to GL_OES_copy_image. But dEQP has tests that
want the EXT variant.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
ebdb534548 mesa: add GL_OES_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
571f538a62 mesa: remove duplicate MAX_GEOMETRY_SHADER_INVOCATIONS entry
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
2c7f5fe296 st/mesa: add ES sample-shading support
We require the full ARB_gpu_shader5 for now, but in the future some
other CAP could get exposed to indicate that only the multisample-related
behavior of ARB_gpu_shader5 is available.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
3002296cb6 mesa: add GL_OES_shader_multisample_interpolation support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
411a88accc mesa: add GL_OES_sample_shading support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
5283e81015 glsl: add GL_OES_sample_variables support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
6a8ca859f9 mesa: add OES_sample_variables to extension table, add enable bit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-30 22:57:17 -04:00
Ilia Mirkin
903640c2ac glsl: add gl_MaxSamples, new in GL 4.5 / GL ES 3.2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-30 22:57:17 -04:00
Matt Turner
4fea98991c i965: Don't add barrier deps for FB write messages.
Ken did this earlier, and this is just me reimplementing his patch a
little differently.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
3495265158 i965: Add and use is_scheduling_barrier() function. 2016-03-30 19:54:30 -07:00
Matt Turner
b4e223cfbf i965: Remove NOP insertion kludge in scheduler.
Instead of removing every instruction in add_insts_from_block(), just
move the instruction to its scheduled location. This is a step towards
doing both bottom-up and top-down scheduling without conflicts.

Note that this patch changes cycle counts for programs because it begins
including control flow instructions in the estimates.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
a607f4aa57 i965: Assert that an instruction is not inserted around itself.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
7b208a7312 i965: Relax restriction on scheduling last instruction.
I think when this code was written, basic blocks were always ended by a
control flow instruction or an end-of-thread message. That's no longer
the case, and removing this restriction actually helps things:

   instructions in affected programs: 7267 -> 7244 (-0.32%)
   helped: 4

   total cycles in shared programs: 66559580 -> 66431900 (-0.19%)
   cycles in affected programs: 28310152 -> 28182472 (-0.45%)
   helped: 9577
   HURT: 879

   GAINED: 2

The addition of the is_control_flow() checks is not a functional change,
since the add_insts_from_block() does not put them in the list of
instructions to schedule. I plan to change this in a later patch.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
f60750968c i965/vec4/tcs: Set conditional mod on TCS_OPCODE_SRC0_010_IS_ZERO.
Missing this causes an assertion failure in the scheduler with the next
patch.

Additionally, this gives cmod propagation enough information to optimize
code better.

total instructions in shared programs: 7112991 -> 7112852 (-0.00%)
instructions in affected programs: 25704 -> 25565 (-0.54%)
helped: 139

total cycles in shared programs: 64812898 -> 64810674 (-0.00%)
cycles in affected programs: 127224 -> 125000 (-1.75%)
helped: 139

Acked-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
436bdd7403 Revert "i965: Don't add barrier deps for FB write messages."
This reverts commit d0e1d6b7e2.

The change in the vec4 code is a mistake -- there's never an
FS_OPCODE_FB_WRITE in vec4 code.

The change in the fs code had the (harmless) effect of not recognizing
an FB_WRITE as a scheduling barrier even if it was marked EOT --
harmless because the scheduler marked the last instruction of a block as
a barrier, something I'm changing in the following patches.

This will be reimplemented later in the series.
2016-03-30 19:54:30 -07:00
Matt Turner
0d253ce34a i965: Simplify full scheduling-barrier conditions.
All of these were simply code for "architecture register file" (and in
the case of destinations, "not the null register").

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:30 -07:00
Matt Turner
65bc94022b i965: Remove incorrect cycle estimates.
These printed the cycle count the last basic block (sched.time is set
per basic block!). We have accurate, full program, data printed
elsewhere.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-30 19:54:29 -07:00
Dave Airlie
10b189f985 st/mesa: fix fallout from xfb changes.
Failed to update state tracker with new buffer interface.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:36:55 +10:00
Matt Turner
05ee6627d6 nir: Fix typo from commit 6702f1acde. 2016-03-30 19:18:35 -07:00
Timothy Arceri
b273958c74 docs: mark xfb_* qualifiers as DONE
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:53:08 +11:00
Timothy Arceri
c5704bb350 mesa: add query support for GL_TRANSFORM_FEEDBACK_BUFFER interface
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:53:02 +11:00
Timothy Arceri
7234be0338 glsl: add transform feedback buffers to resource list
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:57 +11:00
Timothy Arceri
9e317271d7 mesa: add support to query GL_TRANSFORM_FEEDBACK_BUFFER_INDEX
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:47 +11:00
Timothy Arceri
51142e7705 mesa: add support to query GL_OFFSET for GL_TRANSFORM_FEEDBACK_VARYING
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:43 +11:00
Timothy Arceri
047139e8a0 mesa: rename tranform feeback varying macro XFB to XFV
A latter patch will use XFB for buffers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:39 +11:00
Timothy Arceri
b77c909878 glsl: always enable transform feedback mode when xfb_stride defined
This enables in shader defined transform feedback mode even if the
only place xfb_stride is defined is on the global out.

We don't worry about xfb_buffer since Issue 22 c) in the spec says:

   "If the shader has an "xfb_buffer" qualifier identifying a buffer,
    but doesn't declare "xfb_offset" on anything associated with it,
    what happens?

    ...

    variables not qualified with "xfb_offset" are not captured, which
    makes the associated "xfb_buffer" qualifier irrelevant."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:34 +11:00
Timothy Arceri
c95e92b14d glsl: handle varyings that are not written to but have an xfb_offset
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:29 +11:00
Timothy Arceri
d5c09d40b9 glsl: when lowering named interface set assigned flag
This will be used when checking if xfb should attempt to capture
a varying.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:22 +11:00
Timothy Arceri
a2fbc5ed44 glsl: reset current stream tracker
When we move to the next buffer we need to reset the stream
so that we don't generate an error message about streams not
matching.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:17 +11:00
Timothy Arceri
f2a3c87a00 glsl: generate link error when implicit stride is to large
This moves the check until after we have done the stride
calculation and applies it to the xfb_* qualifiers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:11 +11:00
Timothy Arceri
2fab85aaea glsl: add xfb_stride link time validation
From the ARB_enhanced_layous spec:

   "It is a compile-time or link-time error to have any *xfb_offset*
    that overflows *xfb_stride*, whether stated on declarations before
    or after the *xfb_stride*, or in different compilation units.

    ...

    When no *xfb_stride* is specified for a buffer, the stride of a
    buffer will be the smallest needed to hold the variable placed at
    the highest offset, including any required padding."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:05 +11:00
Timothy Arceri
8120e869b1 glsl: validate global out xfb_stride qualifiers and set stride on empty buffers
Here we use the built-in validation in
ast_layout_expression::process_qualifier_constant() to check for mismatching
global out strides on buffers in a single shader.

From the ARB_enhanced_layouts spec:

   "While *xfb_stride* can be declared multiple times for the same buffer,
   it is a compile-time or link-time error to have different values
   specified for the stride for the same buffer."

For intrastage validation a new helper link_xfb_stride_layout_qualifiers()
is created. We also take this opportunity to make sure stride is at least
a multiple of 4, we will validate doubles at a later stage.

From the ARB_enhanced_layouts spec:

   "If the buffer is capturing any double-typed outputs, the stride must
   be a multiple of 8, otherwise it must be a multiple of 4, or a
   compile-time or link-time error results."

Finally we update store_tfeedback_info() to apply the strides to
LinkedTransformFeedback and update the buffers bitmask to mark any global
buffers with a stride as active. For example a shader with:

layout (xfb_buffer = 0, xfb_offset = 0)  out vec4 gs_fs;
layout (xfb_buffer = 1, xfb_stride = 64) out;

Is expected to have a buffer bound to both 0 and 1.

From the ARB_enhanced_layouts spec:

   "A binding point requires a bound buffer object if and only if its
   associated stride in the program object used for transform feedback
   primitive capture is non-zero."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:52:00 +11:00
Timothy Arceri
cf039a309a mesa: split transform feedback buffer into its own struct
This will be used in a following patch to implement interface
query support for TRANSFORM_FEEDBACK_BUFFER.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:52 +11:00
Timothy Arceri
258299d87a glsl: use bitmask of active xfb buffer indices
This allows us to print the correct binding point when not all
buffers declared in the shader are bound.

For example if we use a single buffer:

layout(xfb_buffer=2, offset=0) out vec4 v;

We now print '2' when the buffer is not bound rather than '0'.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:47 +11:00
Timothy Arceri
99cb5151ed glsl: sort xfb varyings in offset/buffer order
The existing transform feedback code expects to receive the list
of varyings in increasing buffer order.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:38 +11:00
Timothy Arceri
0c66460fc6 glsl: basic linking support for xfb qualifiers
This adds the initial infrastructure for enabling transform feedback
mode via in shader qualifiers and adds initial buffer support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:33 +11:00
Timothy Arceri
4305a60173 glsl: add xfb helpers and fields to the tfeedback_decl class
We also apply any array/struct offsets.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:27 +11:00
Timothy Arceri
0822517936 glsl: add helper to process xfb qualifiers during linking
This function checks for any xfb_* qualifiers which will enable
transform feedback mode and cause any API defined xfb varyings
to be ignored.

It also counts the number of varyings that have a xfb_offset
qualifier and finally it calls the create_xfb_varying_names()
helper to generate the names of varyings to be caputured.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:21 +11:00
Timothy Arceri
707fd3972f glsl: add helper to generate xfb varying names
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:17 +11:00
Timothy Arceri
8b6f8fe503 glsl: add helper for counting varyings
This will be used to get a count of the number of varying name
strings we are required to generate for use with the query api.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:06 +11:00
Timothy Arceri
ba7a7d4c39 glsl: add xfb qualifier lowering support for named blocks
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:51:01 +11:00
Timothy Arceri
4a873ef049 glsl: add xfb qualifiers to has_layout helper
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:54 +11:00
Timothy Arceri
598790e856 glsl: apply xfb_stride to implicit offsets for ifc block members
When we have an interface block like:

layout (xfb_buffer = 0, xfb_offset = 0) out Block {
                             vec4 var1;
    layout (xfb_stride = 32) vec4 var2;
                             vec4 var3;
};

We take into account the stride of var2 when calculating the offset
for var3.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:49 +11:00
Timothy Arceri
04a72e6e57 glsl: add xfb_stride compile time rules
From the ARB_enhanced_layouts spec:

   "The *xfb_stride* qualifier specifies how many bytes are consumed
   by each captured vertex.  It applies to the transform feedback
   buffer for that declaration, whether it is inherited or explicitly
   declared. It can be applied to variables, blocks, block members,
   or just the qualifier out.  If the buffer is capturing any
   double-typed outputs, the stride must be a multiple of 8, otherwise
   it must be a multiple of 4, or a compile-time or link-time error
   results.

   ...

   The resulting stride (implicit or explicit) must be less than or
   equal to the implementation-dependent constant
   gl_MaxTransformFeedbackInterleavedComponents."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:44 +11:00
Timothy Arceri
edddad0eee glsl: add xfb_offset compile time rules
We also copy the qualifier values to the IR in this step.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:39 +11:00
Timothy Arceri
f6a8c7ef21 glsl: add xfb_buffer compile time rules
Also copies the qualifier values to GLSL IR.

From the ARB_enhanced_layouts spec:

    "The *xfb_buffer* qualifier can be applied to the qualifier out,
    to output variables, to output blocks, and to output block
    members.  Shaders in the transform  feedback capturing mode have
    an initial global default of

        layout(xfb_buffer = 0) out;

    This default can be changed by declaring a different buffer with
    xfb_buffer on the interface qualifier out.  This is the only way
    the global default can be changed.  When a variable or output block
    is declared without an  xfb_buffer qualifier, it inherits the global
    default buffer.  When a variable or output block is declared with an
    xfb_buffer qualifier, it has that declared buffer.  All members of a
    block inherit the block's buffer.  A  member is allowed to declare
    an xfb_buffer, but it must match the buffer inherited from its
    block, or a compile-time error results.

    The *xfb_buffer* qualifier follows the same conventions, behavior,
    defaults, and inheritance rules as the qualifier stream, and the
    examples for stream apply here as well.  This includes a block's
    inheritance of the current global default buffer, a block member's
    inheritance of  the block's buffer, and the requirement that any
    *xfb_buffer* declared on a block member must match the buffer
    inherited from the block.

    ...

    It is a compile-time error to specify an *xfb_buffer* that is
    greater than  the implementation-dependent constant
    gl_MaxTransformFeedbackBuffers."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:34 +11:00
Timothy Arceri
04d2f770c8 glsl: add field to track if xfb_buffer is an explicit or implicit value
Since any of the xfb_* qualifiers trigger the shader to be in
transform feedback mode we need an extra field to track if
the xfb_buffer on interface members was set explicitly since
xfb_buffer will always have a default value.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:29 +11:00
Timothy Arceri
733f1b2a55 glsl: add xfb_* qualifiers to glsl_struct_field
These will be used to hold qualifier values for interface and
struct members.

Support is added to the struct/interface constructors to copy these
fields upon creation.

We also update record_compare() to ensure we don't reuse a glsl_type
with the wrong xfb_* qualifier values.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:19 +11:00
Timothy Arceri
2dbcecb7a9 glsl: add IR fields for transform feedback layout qualifiers
Adds xfb_buffer/stride fields and adds comment to offset field
which is reused for xfb_offset.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:13 +11:00
Timothy Arceri
5c2516fc33 glsl: add validation for out layout qualifiers
This adds validation for all qualifiers as allowed by the
table in Section 4.4 (Layout Qualifiers) of the GLSL 4.5 spec.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:08 +11:00
Timothy Arceri
7b407fecec glsl: relax stage restrictions on layout defaults for outputs
The new xfb_buffer and xfb_stride global qualifiers are allowed in
geom, tess and vertex stages.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:04 +11:00
Timothy Arceri
c9afd94af6 glsl: parse new transform feedback layout qualifiers
We reuse the existing offset field for holding the xfb_offset
expression but create a new flag as to avoid hitting the rules
for the offset qualifier for UBOs.

xfb_buffer qualifiers require extra processing when merging as
they can be applied to global out defaults. We just apply the
same rules as we do for the stream qualifier as the spec says:

   "The *xfb_buffer* qualifier follows the same conventions,
    behavior, defaults, and inheritance rules as the qualifier
    stream, and the examples for stream apply here as well."

For xfb_stride we push everything into a global out field for
later processing as xfb_stride applies to the entire buffer.
We still need to have a separate field to store per variable
strides because they can still effect implicit offsets
e.g. when applied to block members with implicit offsets.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:50:00 +11:00
Timothy Arceri
13f6c788eb glsl: move process_qualifier_constant() to ast_type.cpp
We will make use of this function being here in the following patch.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:49:55 +11:00
Timothy Arceri
52caeee7e7 glsl: add transform feedback built-in constants
These are new built-ins added by ARB_enhanced_layouts.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:49:51 +11:00
Timothy Arceri
8765a9e0fe glsl: generate named interface block names correctly
Firstly this updates the named interface lowering pass to store the
interface without the arrays removed.

Note we need to remove the arrays in the interface/varying matching
code to not regress things but in future this should be fixed
futher as it would seem we currently successfully match interface
blocks with differnt array sizes.

Since we now know if the interface was an array we can reduce the
IR flags from_named_ifc_block_array and from_named_ifc_block_nonarray
to just from_named_ifc_block.

Next rather than having a different code path for named interface
blocks in program_resource_visitor we just make use of the one used
by UBOs this allows us to now handle arrays of arrays correctly.

Finally we add a new param to the recursion function
named_ifc_member this is because we only want to process a single
member at a time. Note that this is also the glsl_struct_field
from the original ifc type before lowering rather than the type
from the lowered variable. This fixes a bug in Mesa where we would
generate the names like WithInstArray[0].g[0][0] when it should be
WithInstArray[0].g[0] for the following interface.

   out WithInstArray {
      float g[3];
   } instArray[2];

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:49:47 +11:00
Timothy Arceri
7ebc3deaad glsl: Fix segfault when lhs is error_type in TCS
It seems expected that both lhs and rhs could be of type error_type
in this code however the TCS case wasn't expecting it.

Fixes segfault in an enhanced layouts GL CTS test.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-31 12:49:42 +11:00
Dave Airlie
c9367c13ca docs: update softpipe status for shader_image_load_store.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:30 +10:00
Dave Airlie
eb9ad9faa3 softpipe: add image support to softpipe (v3)
This adds support for ARB_shader_image_load_store to softpipe.

v2: add RESQ support (Ilia)
v3: constify, cleanup internals, add some comments (Brian).

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:16 +10:00
Dave Airlie
0d1f679ded draw: add support for passing images to vs/gs shaders.
This just adds support for passing through images to the
tgsi execution stage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:11 +10:00
Dave Airlie
22d1296013 tgsi: add support for image operations to tgsi_exec. (v2.1)
This adds support for load/store/atomic operations on images
along with image tracking support.

v2: add RESQ support. (Ilia)
v2.1: constify interface (Brian)
split get_image_coord_dim (Brian)

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:14:05 +10:00
Dave Airlie
493eab7679 softpipe: add support for explicit early depth testing
ARB_shader_image_load_store adds support for explicit early
depth testing. However we need to make sure we don't overwrite
values using the shader written values in this case.

This fixes early depth testing in softpipe to conform with
those requirements.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:54 +10:00
Dave Airlie
827393b76f tgsi: introduce NonHelperMask
This is a mask of which of the current 2x2 grid are non-helper
invocations. This allows us to mask off the helper invocations
later for the image operations.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:50 +10:00
Dave Airlie
ca180c09bb tgsi_exec: handle execmask when doing indirect lookups
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:46 +10:00
Dave Airlie
1ff4cc0535 tgsi_exec: add support for up to 3 address registers (v2)
v2: be consistent with other definitions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-31 09:13:08 +10:00
Matt Turner
6702f1acde nir: Propagate negates up multiplication chains.
total instructions in shared programs: 7112159 -> 7088092 (-0.34%)
instructions in affected programs: 1374915 -> 1350848 (-1.75%)
helped: 7392
HURT: 621

GAINED: 2
LOST:   2
2016-03-30 13:12:34 -07:00
Matt Turner
a74fc3fe8a i965: Don't inline intel_batchbuffer_require_space().
It's called by the inline intel_batchbuffer_begin() function which
itself is used in BEGIN_BATCH. So in sequence of code emitting multiple
packets, we have inlined this ~200 byte function multiple times. Making
it an out-of-line function presumably improved icache usage.

Improves performance of Gl32Batch7 by 3.39898% +/- 0.358674% (n=155) on
Ivybridge.

Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2016-03-30 13:12:34 -07:00
Christian König
1faca438bd r600: ignore PIPE_BIND_LINEAR in *_is_format_supported
Similar to radeonsi linear layout should work for all not compressed
or depth/stencil formats. Fixes issues with VDPAU on r600.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-03-30 20:00:27 +02:00
Thomas Hindoe Paaboel Andersen
9a73f5728e st/vdpau: correct null check
The null check of result was the wrong way around. Also, move memset
and dereference of result after the null check.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-03-30 20:00:27 +02:00
Brian Paul
4541a78502 docs: remove docs/COPYING which contains GPL license
There hasn't been GPL code in Mesa for a long time now.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-30 11:38:51 -06:00
Samuel Pitoiset
bb37886f75 glsl: add missing types for buffer images
Type of GLSL_SAMPLER_DIM_BUF can be sampler or image.

Spotted while trying to run dEQP tests related to
ARB_shader_image_load_store.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-30 19:01:33 +02:00
Lars Hamre
6773128bbf glsl: invalidate float suffixes for GLSL 1.10 and GLSL ES 1.00
Float suffixes are not allowed in GLSL 1.10 nor GLSL ES 1.00.

Fixes the following piglit tests:
tests/spec/glsl-1.10/compiler/literals/invalid-float-suffix-capital-f.vert
tests/spec/glsl-1.10/compiler/literals/invalid-float-suffix-f.vert`

v2: modify error message
v3: parse the float instead of returning an ERROR_TOK
v4: (by Ken) Change to is_version(120, 300) to avoid breaking ES3
    shaders; update commit message accordingly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81585
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-29 21:26:34 -07:00
Jason Ekstrand
cf2257069c nir/spirv: Set a default number of invocations for geometry shaders
The SPIR-V spec says geometry shaders are supposed to have one invocation
by default.  The execution mode is only required if there are multiple
invocations.
2016-03-29 20:30:27 -07:00
Roland Scheidegger
2d3b8aefda tgsi: (trivial) only verify target for is_tex instructions
d3d10 state tracker does not encode (valid) target (only offsets are
really used from the texture bits), since that information always comes
from the sview dcl, and not the instruction (note the meaning of target
is actually slightly different between gl and d3d10 in any case, because
d3d10 target does never include shadow bit).
Also move the msaa sampler identification as well - would need to set that
on the sview not sampler, so while this does not fix it make it at least
obvious it won't work with sample instructions.
2016-03-30 04:26:54 +02:00
Ilia Mirkin
553e37aa33 mesa: allow mutable buffer textures to back GL ES images
Since there is no way to create immutable texture buffers in GL ES,
mutable buffer textures are allowed to back images. See issue 7 of the
GL_OES_texture_buffer specification.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-29 21:41:03 -04:00
Brian Paul
513384d7e8 mesa: make _mesa_prepare_mipmap_level() static
No longer called from any other file.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-29 18:13:46 -06:00
Brian Paul
ed39de90f1 meta: use _mesa_prepare_mipmap_levels()
The prepare_mipmap_level() wrapper for _mesa_prepare_mipmap_level() is
not needed.  It only served to undo the GL_TEXTURE_1D_ARRAY height/depth
change was was made before the call to prepare_mipmap_level()

Said another way, regardless of how the meta code manipulates the height/
depth dims for GL_TEXTURE_1D_ARRAY, the gl_texture_image dimensions are
correctly set up by _mesa_prepare_mipmap_levels().

Tested by plugging _mesa_meta_GenerateMipmap() into the swrast driver
and testing with piglit.

v2 (idr): Early out of the mipmap generation loop with dstImage is NULL.
This can occur for immutable textures that have a limited range of
levels or in the presense of memory allocation failures.  Fixes
arb_texture_view-mipgen on Intel platforms.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
bab0752a80 docs: add HTTP link for Mesa downloads
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92628
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-29 18:13:46 -06:00
Brian Paul
5c85c3be26 tgsi: simplify tgsi_shader_info::is_msaa_sampler checking
We assert that fullinst->Instruction.Texture != 0 above so no need to
check it in the conditional.  We also have the fullinst->Texture.Texture
value in a local variable, so use it.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
86e1768c13 tgsi: collect texture sampler target info in tgsi_scan_shader()
Texture sample instructions specify a sampler unit and texture target
such as "1D", "2D", "CUBE", etc.  Sampler view declarations also specify
the sampler unit and texture target.

This patch checks that the texture instructions agree with the declarations
and collects the texture target type for each sampler unit.

v2: only compare instruction's texture target to the sampler view declaration
target if the instruction is a TEX instruction, not a SAMPLE instruction.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
6775268b61 gallium/docs: s/gven/given/ 2016-03-29 18:13:46 -06:00
Brian Paul
75b713455c xlib: add support for GLX_ARB_create_context
This adds the glXCreateContextAttribsARB() function for the xlib/swrast
driver.  This allows more piglit tests to run with this driver.

For example, without this patch we get:
$ bin/fbo-generatemipmap-1d -auto
piglit: error: waffle_config_choose failed due to WAFFLE_ERROR_UNSUPPORTED_
ON_PLATFORM: GLX_ARB_create_context is required in order to request an OpenGL
version not equal to the default value 1.0
piglit: error: Failed to create waffle_config for OpenGL 2.0 Compatibility Context
piglit: info: Failed to create any GL context
PIGLIT: {"result": "skip" }

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:45 -06:00
Brian Paul
d8d029f22b st/mesa: simplify st_generate_mipmap()
The whole st_generate_mipmap() function was overly complicated.  Now
we just call the new _mesa_prepare_mipmap_levels() function to prepare
the texture mipmap memory, then call the generate function which fills
in the texture images.

This fixes a failed assertion in llvmpipe/softpipe which is hit with the
new piglit generatemipmap-base-change test.  Also fixes some device errors
(format mismatches) with the VMware svga driver.

v2: fix a comment typo, per Sinclair

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:45 -06:00
Brian Paul
105fe52784 mesa: new _mesa_prepare_mipmap_levels() function for mipmap generation
Simplifies the loops in generate_mipmap_uncompressed() and
generate_mipmap_compressed().  Will be used in the state tracker too.
Could probably be used in the meta code.  If so, some additional
clean-ups can be done after that.

v2: use unsigned types instead of GLuint, per Ian

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-29 18:13:45 -06:00
Kenneth Graunke
d4a5a61d44 i965: Don't use CUBE wrap modes for integer formats on IVB/BYT.
There is no linear filtering for integer formats, so we should always
be using CLAMP_TO_EDGE mode.

Fixes 46 dEQP cases on Ivybridge (which were likely broken by commit
0faf26e6a0).

This workaround doesn't appear to be necessary on any other hardware;
I haven't found any documentation mentioning errata in this area.

v2: Only apply on Ivybridge/Baytrail to avoid regressing GLES3.1 tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
2016-03-29 15:43:18 -07:00
Kenneth Graunke
f8c69fbb54 Revert "i965: Set address rounding bits for GL_NEAREST filtering as well."
This reverts commit 60d6a8989a.

It's pretty sketchy, and apparently regressed a bunch of dEQP tests
on Sandybridge.
2016-03-29 15:35:07 -07:00
Rovanion Luckey
7087e0ab27 gallium: Format code in pb_buffer_fenced.c according to style guide.
This is a tiny housekeeping patch which does the following:

  * Replaced tabs with three spaces.
  * Formatted oneline and multiline code comments. Some doxygen
    comments weren't marked as such and some code comments were marked
    as doxygen comments.
  * Spaces between if- and while-statements and their parenthesis.

According to the mesa coding style guidelines.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:44:11 -06:00
Charmaine Lee
2d8df0306b svga: emit sampler declarations in the helper function for non vgpu10
With commit dc9ecf58c0,
we are now getting the sampler target from the sampler view
declaration. But since a sampler view declaration can be defined
after a sampler declaration, we need to emit the
sampler declarations in the pre-helpers function, otherwise,
the sampler target might not have defined yet for the sampler declaration.

Fixes viewperf maya-03 and various gl trace regressions in hwv11.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:09 -06:00
Brian Paul
96e0894106 svga: avoid freeing non-malloced memory
svga_shader_expand() will fall back to using non-malloced memory for
emit.buf if malloc fails. We should check if the memory is malloced
before freeing it in the error path of svga_tgsi_vgpu9_translate.

Original patch by Thomas Hindoe Paaboel Andersen <phomes@gmail.com>.
Remove trivial svga_destroy_shader_emitter() function, by BrianP.

Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:08 -06:00
Samuel Pitoiset
9d57c84994 nvc0/ir: move load/store lowering pass to handleLDST()
Having all this code in a big switch is not really a good pratice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 19:55:51 +02:00
Christian König
cc68dc2b5e st/mesa: implement new DMA-buf based VDPAU interop v2
Avoid using internal structures from another API.

v2: rebase and moved includes so they don't cause problem when VDPAU isn't installed.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:22 +02:00
Christian König
bdeb22b7b6 st/vdpau: implement the new DMA-buf based interop v2
That should allow us to get away from passing internal structures around.

v2: rebased

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:18 +02:00
Christian König
0042aa508e st/vdpau: move FormatRGBAToPipe into the interop
We are going to need that in the Mesa state tracker as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:14 +02:00
Christian König
faba96bc60 st/vdpau: add new interop interface
Use DMA-buf for the VDPAU interop interface instead of using
internal structures.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:10 +02:00
Christian König
d180de3532 st/vdpau: use linear layout for output surfaces
Works around a bug in radeonsi and tiling is actually
not very beneficial in this use case.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:28:43 +02:00
Christian König
7eb5e5b8b4 radeonsi: ignore PIPE_BIND_LINEAR in si_is_format_supported v2
Linear layout should work for all not compressed or depth/stencil formats.

v2: restrict it a bit more

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-29 17:28:35 +02:00
Ilia Mirkin
9286cbdd1e st/mesa: enable OES_texture_buffer when all components available
OES_texture_buffer combines bits from a number of desktop extensions.
When they're all available, turn it on.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-29 10:15:21 -04:00
Adam Jackson
5e1aec6db0 glapi/glx: Mark the indirect swapped dispatch functions _X_COLD
A modest size savings:

   text	   data	    bss	    dec	    hex	filename
 264143	  15608	    232	 279983	  445af libglx.so.before
 254303	  15608	    232	 270143	  41f3f libglx.so.after

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-29 10:10:57 -04:00
Adam Jackson
ea0f62e45e glapi/glx: Sync some additional error checking from xserver
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-29 10:10:57 -04:00
Jordan Justen
f56f538ce4 anv/gen7: Fix command parser version test with indirect dispatch
Caught-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 22:30:33 -07:00
Alejandro Piñeiro
dcd41ca87a glsl: raise warning when using uninitialized variables
v2:
 * Take into account out varyings too (Timothy Arceri)
 * Fix style (Timothy Arceri)
 * Use a new ast_expression variable, instead of an
   ast_expression::hir new parameter (Timothy Arceri)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94129

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-29 07:28:57 +02:00
Alejandro Piñeiro
8568d02498 glsl: add is_lhs bool on ast_expression
Useful to know if a expression is the recipient of an assignment
or not, that would be used to (for example) raise warnings of
"use of uninitialized variable" without getting a false positive
when assigning first a variable.

By default the value is false, and it is assigned to true on
the following cases:
 * The lhs assignments subexpression
 * At ast_array_index, on the array itself.
 * While handling the method on an array, to avoid the warning
   calling array.length
 * When computed the cached test expression at test_to_hir, to
   avoid a duplicate warning on the test expression of a switch.

set_is_lhs setter is added, because in some cases (like ast_field_selection)
the value need to be propagated on the expression tree. To avoid doing the
propatagion if not needed, it skips if no primary_expression.identifier is
available.

v2: use a new bool on ast_expression, instead of a new parameter
    on ast_expression::hir (Timothy Arceri)

v3: fix style and some typos on comments, initialize is_lhs default value
    on constructor, to avoid a c++11 feature (Ian Romanick)

v4: some tweaks on comments (Timothy Arceri)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94129

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-29 07:28:57 +02:00
Jason Ekstrand
35e2e96b30 nir: Add a helper for getting the current block from a cursor
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
be98c47528 nir/lower_out_to_temp: Add an "entrypoint" parameter
Previously, the pass assumed that the entrypoint would be whatever function
happened to have the name "main".  We really shouldn't trust in the
function names.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
31a5bec93f nir/lower_out_to_temp: Steal the output's constant initializer
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
38de85f9a5 nir: Add a helper for getting the unique function in a shader
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
49be812be6 nir/sweep: Sweep function parameters
They are no longer in the list of local variables so we need to explicitly
sweep them.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
1be4c61c95 nir/builder: Add a helper for creating undefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
6a2479d618 nir/builder: Add a helper for storing to variable derefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
77e2ac1da7 nir/builder: Add a helper for building fdot instructions
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
da422663a6 nir: Add a variable_foreach_safe helper
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
731870fbe3 nir/Makefile: Fix alphabetization
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Ilia Mirkin
b4c0c514b1 mesa: add OES_texture_buffer and EXT_texture_buffer support
Allow ES 3.1 contexts to access the texture buffer functionality.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:29:29 -04:00
Ilia Mirkin
720670a615 glsl: add OES_texture_buffer and EXT_texture_buffer support
Expose the samplerBuffer/imageBuffer types, and allow the various
functions to operate on them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:20:49 -04:00
Ilia Mirkin
74b76c08a3 mesa: add OES_texture_buffer and EXT_texture_buffer extension to table
We need to add a new bit since the GL ES exts require functionality from
a combination of texture buffer extensions as well as images (for
imageBuffer) support. Additionally, not all GPUs support all the texture
buffer functionality (e.g. rgb32 isn't supported by nv50).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:19:14 -04:00
Ilia Mirkin
659beca666 mesa: properly return GetTexLevelParameter queries for buffer textures
This fixes all failures with dEQP tests in this area. While
ARB_texture_buffer_object explicitly says that GetTexLevelParameter & co
should not be supported, GL 3.1 reverses this decision and allows all of
these queries there.

Conversely, there is no text that forbids the buffer-specific queries
from being used with non-buffer images.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-28 20:18:46 -04:00
Kenneth Graunke
4ed4a2af86 glsl: Delete initialized field from uniform storage test.
Timothy deleted this field.  Fixes "make check".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-28 17:02:00 -07:00
Jordan Justen
8dbfa265a4 anv/gen7: DispatchIndirect requires cmd parser 5
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
1a3adae84a anv/gen7: Save kernel command parser version
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
f60683b32a anv: Invalidate state cache before L3 partitioning set-up.
Port 10d84ba9f0 to anv.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
5879cb0251 anv: Fix cache pollution race during L3 partitioning set-up.
Port 0aa4f99f56 to anv.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Timothy Arceri
86d87d1047 mesa: remove initialized field from uniform storage
The only place this was used was in a gallium debug function that
had to be manually enabled.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 09:59:03 +11:00
Samuel Pitoiset
b8b3af2932 nvc0: use a different offset for buffers and surfaces
To not overwrite buffers and surfaces information, we need to use
a different offset in the driver constant buffer. Currently, OP_SUQ
is only supported for buffers but this will be slightly updated for
images support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 00:47:28 +02:00
Kenneth Graunke
60d6a8989a i965: Set address rounding bits for GL_NEAREST filtering as well.
Yuanhan Liu decided these were useful for linear filtering in
commit 76669381 (circa 2011).  Prior to that, we never set them;
it seems he tried to preserve that behavior for nearest filtering.

It turns out they're useful for nearest filtering, too: setting
these fixes the following dEQP-GLES3 tests:

functional.fbo.blit.rect.nearest_consistency_mag
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_min
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_y

Apparently, BLORP has always set these bits unconditionally.

However, setting them unconditionally appears to regress tests using
texture projection, 3D samplers, integer formats, and vertex shaders,
all in combination, such as:

functional.shaders.texture_functions.textureprojlod.isampler3d_vertex

Setting them on Gen4-5 appears to regress Piglit's
tests/spec/arb_sampler_objects/framebufferblit.

Honestly, it looks like the real problem here is a lack of precision.
I'm just hacking around problems here (as embarassing as it is).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 15:28:58 -07:00
Kenneth Graunke
0faf26e6a0 i965: Always use BRW_TEXCOORDMODE_CUBE when seamless filtering.
When using seamless cube map mode and NEAREST filtering, we explicitly
overrode the wrap modes to CLAMP_TO_EDGE.  This was to implement the
following spec text:

   "If NEAREST filtering is done within a miplevel, always apply apply
    wrap mode CLAMP_TO_EDGE."

However, textureGather() ignores the sampler's filtering mode, and
instead returns the four pixels that would be blended by LINEAR
filtering.  This implies that we should do proper seamless filtering,
and include pixels from adjacent cube faces.

It turns out that we can simply delete the NEAREST -> CLAMP_TO_EDGE
overrides.  Normal cube map sampling works by first selecting the
face, and then nearest filtering fetches the closest texel.  If the
nearest texel was on a different face, then that face would have been
chosen.  So it should always be within the face anyway, which
effectively performs CLAMP_TO_EDGE.

Fixes 86 dEQP-GLES31.texture.gather.basic.cube.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-28 15:25:04 -07:00
Kenneth Graunke
72473658c5 i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs.
Our driver uses the brw_render_cache mechanism to track buffers we've
rendered to and are about to sample from.

Previously, we did a single PIPE_CONTROL with the following bits set:
- Render Target Flush
- Depth Cache Flush
- Texture Cache Invalidate
- VF Cache Invalidate
- Instruction Cache Invalidate
- CS Stall

This combined both "top of pipe" invalidations and "bottom of pipe"
flushes, which isn't how the hardware is intended to be programmed.

The "top of pipe" invalidations may happen right away, without any
guarantees that rendering using those caches has completed.  That
rendering may continue altering the caches.  The "bottom of pipe"
flushes do wait for the rendering to complete.  The CS stall also
prevents further work from happening until data is flushed out.

What we wanted to do was wait for rendering complete, flush the new
data out of the render and depth caches, wait, then invalidate any
stale data in read-only caches.  We can accomplish this by doing the
"bottom of pipe" flushes with a CS stall, then the "top of pipe"
flushes as a second PIPE_CONTROL.  The flushes will wait until the
rendering is complete, and the CS stall will prevent the second
PIPE_CONTROL with the invalidations from executing until the first
is done.

Fixes dEQP-GLES3.functional.texture.specification.teximage2d_pbo
subtests on Braswell and Skylake.  These tests hit the meta PBO
texture upload path, which binds the PBO as a texture and samples
from it, while rendering to the destination texture.  The tests
then sample from the texture.

For now, we leave Gen4-5 alone.  It probably needs work too, but
apparently it hasn't even been setting the (G45+) TC invalidation
bit at all...

v2: Add Sandybridge post-sync non-zero workaround, for safety.

Cc: mesa-stable@lists.freedesktop.org
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-28 15:23:56 -07:00
Kenneth Graunke
de505f7d7b i965: Whack UAV bit when FS discards and there are no color writes.
dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no
framebuffer attachments, using a shader that discards based on
gl_FragCoord.  It uses occlusion queries to inspect whether pixels
are rendered or not.

Unfortunately, the hardware is not dispatching any pixel shaders,
so discards never happen, and the full quad of pixels increments
PS_DEPTH_COUNT, making the occlusion query results bogus.

To understand why, we have to delve into the WM_INT internal
signalling mechanism's formulas.

The "WM_INT::Pixel Shader Kill Pixel" signal is defined as:

    3DSTATE_WM::ForceKillPixel == ON ||
    (3DSTATE_WM::ForceKillPixel != Off &&
     !WM_INT::WM_HZ_OP &&
     3DSTATE_WM::EDSC_Mode != PREPS &&
     (WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) &&
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     (3DSTATE_PS_EXTRA::PixelShaderKillsPixels ||
      3DSTATE_PS_EXTRA:: oMask Present to RenderTarget ||
      3DSTATE_PS_BLEND::AlphaToCoverageEnable ||
      3DSTATE_PS_BLEND::AlphaTestEnable ||
      3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable))

Because there is no depth or stencil buffer, writes to those buffers
are disabled.  So the highlighted condition is false, making the whole
"Kill Pixel" condition false.  This then feeds into the following
"WM_INT::ThreadDispatchEnable" condition:

    3DSTATE_WM::ForceThreadDispatch != OFF &&
    !WM_INT::WM_HZ_OP &&
    3DSTATE_PS_EXTRA::PixelShaderValid &&
    (3DSTATE_PS_EXTRA::PixelShaderHasUAV ||
     WM_INT::Pixel Shader Kill Pixel ||
     WM_INT::RTIndependentRasterizationEnable ||
     (!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT &&
      3DSTATE_PS_BLEND::HasWriteableRT) ||
     (WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF &&
      (WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) ||
     (3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test Enable) ||
     (3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable ||
                                     WM_INT::Depth Write Enable ||
                                     WM_INT::Stencil Test Enable)))

Given that there's no depth/stencil testing, no writeable render target,
and the hardware thinks kill pixel doesn't happen, all of these
conditions are false.  We have to whack some bit to make PS invocations
happen.  There are many options.

Curro suggested using the UAV bit.  There's some precedence in doing
that - we set it for fragment shaders that do SSBO/image/atomic writes
when no color buffer writes are enabled.  We can simply include discard
here too.

Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests.

v2: Add a comment suggested and written by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-28 14:36:47 -07:00
Jason Ekstrand
433cf90650 nir/spirv: Remove the NoContraction hack
NIR now just handles this for us by not fusing if the multiply is marked as
exact.
2016-03-28 13:07:39 -07:00
Jason Ekstrand
5d9afb65a6 i965/peephole_ffma: Only match a mul+add if none of the ops are exact 2016-03-28 13:07:39 -07:00
Jason Ekstrand
035f66025b nir/search: Don't match inexact expressions with exact subexpressions
In the first pass of implementing exact handling, I made a mistake with
search-and-replace.  In particular, we only reallly handled exact/inexact
on the root of the tree.  Instead, we need to check every node in the tree
for an exact/inexact match.  As an example of this, consider the following
GLSL code

precise float a = b + c;
if (a < 0) {
   do_stuff();
}

In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results.  The solution is to simply bail if any of the
values are exact when matching an inexact expression.
2016-03-28 13:07:39 -07:00
Rhys Kidd
668b6ddfc5 vc4: Remove unused include from vc4_nir_lower_txf_ms.c
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-03-28 11:51:11 -07:00
Adam Jackson
2b8492d63e glapi/glx: Treat xserver generated targets as .PHONY
Meaning, always rebuild them when asked instead of bothering to look at
timestamps (and then wondering why nothing happened when you said make).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-28 14:37:12 -04:00
Adam Jackson
c2f0bc2537 glapi/glx: Thunk non-ABI calls through GetProcAddress
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-28 14:37:12 -04:00
Adam Jackson
ce3f0b23d1 glapi/glx: Emit direct GL calls instead of dispatch lookup
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-28 14:28:51 -04:00
Adam Jackson
c0a9cbea4d glx: Unbreak generating some of the xorg glx headers
Broken by:

    commit 9ace0b5422
    Author: Dylan Baker <baker.dylan.c@gmail.com>
    Date:   Wed May 20 15:49:11 2015 -0700

	glapi: glX_proto_size.py: use argparse instead of getopt

Which changed most, but not all, callers to use --header-tag instead of
-h.

Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-28 14:28:36 -04:00
Bas Nieuwenhuizen
dd5f0950e4 mesa/st: Fix NULL access if no fragment shader is bound
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-28 18:02:07 +02:00
Rob Clark
b4c72b792c freedreno/ir3: fix for load_front_face intrinsic
Seems like trying to widen in the same instruction as the add.s does a
non-sign-extending widen.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-28 10:19:53 -04:00
Rob Clark
3ca034cada freedreno/ir3: fix compiler warn
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-28 10:19:09 -04:00
Ilia Mirkin
b9f1affb2e nvc0: make sure to disable fetches from previously-set VBOs when blitting
We disable the vertex attributes, but also disable the VBO fetch details
as well, just in case. Not known to fix anything.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-28 08:36:34 -04:00
Ilia Mirkin
41100b6b44 nvc0: disable primitive restart and index bias during blits
Back in the dawn of time, we used to do immediate uploads for the vertex
data, and all was well. However Maxwell dropped support for immediate
vertex data, so we started feeding in a VBO (in all cases). But we
forgot to disable some things that apply in such cases, specifically
primitive restart and index bias. The latter was causing WoW and other
Blizzard games trouble as they use a pattern where they draw with a base
vertex (aka index bias), followed by texture uploads (aka blits,
internally).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91526
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <nouveau@karolherbst.de>
2016-03-28 08:35:38 -04:00
Ilia Mirkin
f667d15561 nvc0/ir: fix picking of coordinates from tex instruction for textureGrad
On Fermi, there's an argument in front of the coords that combines array
and indirect handle, while on Kepler the array and the indirect handle
are separate (and in front of the coords). We were previously only
accounting for the array bit of it, if there were an indirect access it
wouldn't be counted in the formula.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-28 08:35:38 -04:00
Ilia Mirkin
6711f159d9 nv50/ir: saturate depth writes
Apparently there's no post-FS clamping logic, so we have to do this by
hand. The depth will never be outside of the 0..1 range, even on
floating point zeta buffers, so this should be safe.

Fixes dEQP-GLES3.functional.fbo.depth.*clamp.* which tests writing
invalid values on various zeta buffer formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-28 08:35:38 -04:00
Marek Olšák
6262d6125a gallium/util: fix up inaccurate behavior of util_framebuffer_state_equal (v2)
v2: move the nr_cbufs check above the loop

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
2016-03-28 00:46:23 +02:00
Marek Olšák
21c479256a st/mesa: only minify height if target != 1D array in st_finalize_texture
The st_texture_object documentation says:
  "the number of 1D array layers will be in height0"

We can't minify that.

Spotted by luck. No app is known to hit this issue.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-28 00:44:45 +02:00
Miklós Máté
50d653c2bb mesa: optimize out the realloc from glCopyTexImagexD()
v2: comment about the purpose of the code
v3: also compare texFormat,
 add a perf debug message,
 formatting fixes

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Miklós Máté
baab345b19 st/mesa: fix handling the fallback texture
This fixes crash when post-processing is enabled in SW:KotOR.

v2: fix const-ness
v3: move assignment into the if() block

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Miklós Máté
920fbecf57 st/mesa: enable GL_ATI_fragment_shader
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Miklós Máté
dee274477f st/mesa: implement GL_ATI_fragment_shader
v2: fix arithmetic for special opcodes,
 fix fog state, cleanup
v3: simplify handling of special opcodes,
 fix rebinding with different textargets or fog equation,
 lots of formatting fixes
v4: adapt to the compile early, fix later architecture,
 formatting fixes

Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Miklós Máté
d71c1e9e54 program: add ATI_fragment_shader to shader stages list
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Miklós Máté
e2d5a6fac5 mesa: optionally associate a gl_program to ATI_fragment_shader
the state tracker will use it

Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 19:58:33 +02:00
Edward O'Callaghan
11bd53933e gallium/p_context.h: Make comment more readable
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 18:03:04 +02:00
Edward O'Callaghan
2df141087a mesa/st: Remove GLSLVersion clamping
While here, remove itermediate glsl_feature_level variable.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 18:00:36 +02:00
Edward O'Callaghan
ca22d2f1fd radeon/r600: Fix return type in failure branch
Commit `d4e847ea` introduced a warning about making an
integer from a pointer without a cast, fix it here.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 18:00:35 +02:00
Edward O'Callaghan
1fb05a9a0c radeon/r600_query.c: Minor style fix
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-27 18:00:35 +02:00
Dave Airlie
fc3b000fef virgl: drop next shader property for now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-26 17:50:32 +10:00
Jason Ekstrand
6d658e9bd5 i965: Allow mul+add fusing again 2016-03-25 21:35:41 -07:00
Jason Ekstrand
fbb9e1f008 spirv/alu: Add support for the NoContraction decoration 2016-03-25 21:35:41 -07:00
Jason Ekstrand
00fa795cd3 spirv/glsl: Add a helper for converting glsl opcodes into nir opcodes
This is similar to the way that regular ALU operations are handled.
2016-03-25 21:35:41 -07:00
Jason Ekstrand
98522c1853 nir/spirv: Get rid of the spirv2nir helper binary
This was useful once upon a time but now that we have a real Vulkan driver
to run our SPIR-V binaries through, there's really no point.
2016-03-25 21:35:41 -07:00
Nanley Chery
0e82896a11 anv/blit2d: Add a function to create an ImageView
This function differs from the open-coded implementation in that the
ImageView's width is determined by the caller and is not unconditionally
set to match the number of texels within the surface's pitch.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-25 17:33:50 -07:00
Nanley Chery
4eab37d6cd anv/image: Enable specifying a surface's minimum pitch
This is required to create multiple, horizontally adjacent, max-width
images from one blit2d surface. This is also required for more accurate
width specification of surfaces within a larger surface (which is seen
as the smaller surface's enclosing region).

Note that anv_image_create_info::stride has been unused since commit,
b369389640 .

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-25 17:33:40 -07:00
Timothy Arceri
8683d54d2b glsl: reduce buffer block duplication
This reduces some of the craziness required for handling buffer
blocks. The problem is each shader stage holds its own information
about a block in memory, we were copying that information to a
program wide list but the per stage information remained meaning
when a binding was updated we needed to update all versions of it.

This changes the per stage blocks to instead point to a single
version of the block information in the program list.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-26 09:26:30 +11:00
Jason Ekstrand
38250a9ca3 i965/vec4: Get rid of a stray predicate inverse in opquantizef16
This fixes 30 opquantize CTS tests on HSW
2016-03-25 14:37:37 -07:00
Jason Ekstrand
13bad493b4 nir/algebraic: Get rid of a redundant copy of fdiv lowering 2016-03-25 14:04:05 -07:00
Jason Ekstrand
08fe89864b nir/algebraic: Add better lowering of ldexp 2016-03-25 14:04:05 -07:00
Jason Ekstrand
b75d770963 nir/builder: Simplify nir_ssa_undef a bit 2016-03-25 14:04:05 -07:00
Jason Ekstrand
ab31951bef nir/spirv: Use the nir_ssa_undef helper from nir_builder 2016-03-25 14:04:05 -07:00
Jason Ekstrand
d2eee52a65 nir/builder: Add a bit size field to nir_ssa_undef 2016-03-25 14:04:05 -07:00
Jason Ekstrand
b50f7f0011 nir: Add a better comment for INTRINSIC_RANGE 2016-03-25 14:04:05 -07:00
Jason Ekstrand
add8c837b5 nir/glsl: Stop carying a pointer to the nir_shader in the visitor 2016-03-25 14:04:05 -07:00
Brian Paul
a8e5edaadf st/xa: emit sampler view declarations in shaders
Fixes recent regressions with the VMware gallium driver.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2016-03-25 14:53:59 -06:00
Tim Rowley
74a04840e5 swr: [rasterizer jitter] Fix MASKLOADD AVX prototype (float -> i32) 2016-03-25 14:45:40 -05:00
Tim Rowley
93c1a2dedf swr: [rasterizer core] NUMA optimizations...
- Affinitize hot-tile memory to specific NUMA nodes.
- Only do BE work for macrotiles assoicated with the numa node
2016-03-25 14:45:40 -05:00
Tim Rowley
090be2e434 swr: [rasterizer jitter] Fix logic bug for alpha-to-coverage. 2016-03-25 14:45:40 -05:00
Tim Rowley
0767e820fd swr: [rasterizer core] Fix Compute workitem retirement 2016-03-25 14:45:40 -05:00
Tim Rowley
813e89c0cc swr: [rasterizer core] Cleanup state ring arena after last draw that references it completes
Rather than waiting for the API thread to re-use it.
2016-03-25 14:45:40 -05:00
Tim Rowley
83822d7ed5 swr: [rasterizer jitter] add missing include for llvm jitevents 2016-03-25 14:45:40 -05:00
Tim Rowley
51549912d1 swr: [rasterizer core] Reduce Arena blocksize to 128KB (from 1MB).
With global allocator this doesn't seem to affect performance at all.
Overall memory consumption drops by up to 85%.
2016-03-25 14:45:40 -05:00
Tim Rowley
ed5b953919 swr: [rasterizer core] One last pass at Arena optimizations 2016-03-25 14:45:40 -05:00
Tim Rowley
ee6be9e92d swr: [rasterizer core] CachedArena optimizations
Reduce list traversal during Alloc and Free.

Add ability to have multiple lists based on alloc size (not used for now)
2016-03-25 14:45:39 -05:00
Tim Rowley
68314b6769 swr: [rasterizer jitter] support llvm-svn 2016-03-25 14:45:39 -05:00
Tim Rowley
ec9d4c4b37 swr: [rasterizer core] Globally cache allocated arena blocks for fast re-allocation. 2016-03-25 14:45:39 -05:00
Tim Rowley
12ce9d9aa1 swr: [rasterizer] more arena work 2016-03-25 14:45:39 -05:00
Tim Rowley
4893224e28 swr: [rasterizer core] Add clipping against user clip distances in the NullPS backend. 2016-03-25 14:45:39 -05:00
Tim Rowley
700a5b06e0 swr: [rasterizer core] Arena optimizations - preparing for global allocator. 2016-03-25 14:45:39 -05:00
Tim Rowley
5899076b6b swr: [rasterizer core] Reset DrawContext arena at end of draw rather than upon reclaim of DC
Keeps overall memory consumption lower.
Also, remove unused knobs.
2016-03-25 14:45:39 -05:00
Tim Rowley
7390418441 swr: [rasterizer core] Add clipping of user clip planes in clipper. 2016-03-25 14:45:39 -05:00
Tim Rowley
4b4547a721 swr: [rasterizer] Reduce max in-flight draws to 96 (by default) 2016-03-25 14:45:39 -05:00
Tim Rowley
9111d63228 swr: [rasterizer] Fix run-time check asserts
One innocuous (uninitialized variable), and one not so innocuous
(stack corruption).
2016-03-25 14:45:39 -05:00
Tim Rowley
257db3610a swr: [rasterizer jitter] signed immediate builder 2016-03-25 14:45:39 -05:00
Tim Rowley
b958aea78a swr: [rasterizer common] changes for cygwin 2016-03-25 14:45:39 -05:00
Tim Rowley
e1222ade00 swr: [rasterizer] code styling and update copyrights 2016-03-25 14:45:14 -05:00
Tim Rowley
c75314ec67 swr: [rasterizer core] Guard against enquing work to invalid hot tiles 2016-03-25 14:43:15 -05:00
Tim Rowley
fee56fda6f swr: [rasterizer] Stop setting viewport size to larger than hottile array
Guard against enquing work to invalid tiles
2016-03-25 14:43:14 -05:00
Tim Rowley
e374d2d24b swr: [rasterizer] Discard work + misc fixes 2016-03-25 14:43:14 -05:00
Tim Rowley
542d7dec7b swr: [rasterizer] remove use of BYTE type 2016-03-25 14:43:14 -05:00
Tim Rowley
be4c558d01 swr: [rasterizer core] Fix crash that can occur when switching contexts 2016-03-25 14:43:14 -05:00
Tim Rowley
51a11658d9 swr: [rasterizer] remove unused knob 2016-03-25 14:43:14 -05:00
Tim Rowley
61beaa2279 swr: [rasterizer core] subcontext rework 2016-03-25 14:43:14 -05:00
Tim Rowley
0c18900cfb swr: [rasterizer common] add _simd_s[rl]lv_epi32 2016-03-25 14:43:14 -05:00
Tim Rowley
bef222db22 swr: [rasterizer core] Alleviate potential stack overflow for 32bit builds
Move large stack allocations in the GS and clipper into thread local storage.
2016-03-25 14:43:14 -05:00
Tim Rowley
3132f731f8 swr: [rasterizer] remove use of UCHAR and UINT64 types 2016-03-25 14:43:14 -05:00
Tim Rowley
643857f596 swr: [rasterizer] remove use of FLOAT type 2016-03-25 14:43:14 -05:00
Tim Rowley
3252fe3705 swr: [rasterizer] Fix Coverity issues reported by Mesa developers. 2016-03-25 14:43:14 -05:00
Tim Rowley
45d52673c2 swr: [rasterizer] add debug/perf category to knobs 2016-03-25 14:43:13 -05:00
Tim Rowley
1da9c8a970 swr: [rasterizer core] don't assume linux is 64-bit 2016-03-25 14:43:13 -05:00
Tim Rowley
49678803f7 swr: [rasterizer common] remove old unused win32 types 2016-03-25 14:43:13 -05:00
Tim Rowley
aca5513184 swr: [rasterizer jitter] vpermps support 2016-03-25 14:43:13 -05:00
Tim Rowley
bfb954189e swr: [rasterizer] Add rdtsc buckets support for shaders
Pass pointer to core buckets mgr back to sim layer.

Add support for RDTSC_START/RDTSC_STOP macros in the builder.

Each unique shader now has a unique bucket associated with it,
enabling more detailed reporting at the shader level. Currently
due to some llvm issue with thread local storage, 64bit runs require
single threaded mode.
2016-03-25 14:43:13 -05:00
Tim Rowley
abd4aa68cc swr: [rasterizer core] backend reorganization 2016-03-25 14:43:13 -05:00
Tim Rowley
13303f3320 swr: [rasterizer core] store blend output in temporary instead of PS output.
Fixes additive blend problem with MSAA
2016-03-25 14:26:17 -05:00
Tim Rowley
3f4fba3772 swr: [rasterizer core] Move InitializeHotTiles and corresponding clear code out of threads.cpp. 2016-03-25 14:26:17 -05:00
Tim Rowley
bdd690dc36 swr: [rasterizer jitter] Cleanup use of types inside of Builder.
Also, cached the simd width since we don't have to keep querying
the JitManager for it.
2016-03-25 14:26:17 -05:00
Tim Rowley
7ead4959a5 swr: [rasterizer jitter] Fix type mismatch on select args for SCATTERPS 2016-03-25 14:26:17 -05:00
Tim Rowley
136988b42b swr: [rasterizer core] fix rasterizing multisampling with scissor enabled
We were not evaluating the scissor edge equations at sample positions.
2016-03-25 14:26:17 -05:00
Tim Rowley
45f0ce168c swr: [rasterizer core] RingBuffer class for DC/DS
Use head/tail ring buffer indices for thread synchronization.

1. SwrWaitForIdle loops until ring is empty. (head == tail)
2. GetDrawContext waits until ring is not full. (head - tail) == Ring Size
3. Draw enqueues by incrementing head.
4. Last worker thread to move past a DC dequeues by incrementing tail.

Todo: To reduce contention we can cache the tail in the API thread. For
example, if you know you have 64 free entries in the ring then you don't
need to keep checking the tail until you used those 64 entries.
2016-03-25 14:26:17 -05:00
Tim Rowley
dd0f9eed8c swr: [rasterizer] switch assert uses to SWR_ASSERT 2016-03-25 14:26:16 -05:00
Tim Rowley
45a4afa634 swr: [rasterizer core] Split all RECT_LIST draws into 1 RECT per draw
Needed until proper RECT_LIST PrimAssembly code is written.
2016-03-25 14:26:16 -05:00
Tim Rowley
3a25185990 swr: [rasterizer] Add string knob type 2016-03-25 14:26:16 -05:00
Jordan Justen
8f3c236674 anv: Use genxml register support for L3 Cache config
The programming of the L3 Cache registers should match the previous
manually packed LRI values.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-25 00:19:18 -07:00
Jordan Justen
7a03fb9ccb genxml: Add L3 Cache Control register definitions
Based on intel_reg.h (5912da45a6)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 23:49:53 -07:00
Jordan Justen
d353ba8f5f anv: Add genxml register support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 23:49:53 -07:00
Jordan Justen
b332013a56 genxml: Add register support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 23:46:59 -07:00
Sonny Jiang
f00c840578 radeonsi: add Polaris PCI IDs
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (Polaris10)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (Polaris11)
2016-03-24 23:08:12 -04:00
Sonny Jiang
f87ed903fb radeon/vce: disable two pipe mode for Polaris11
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-24 23:08:04 -04:00
Sonny Jiang
0c5477465f radeon/vce: add Polaris11 VCE firmware support
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
2016-03-24 23:07:53 -04:00
Sonny Jiang
42e442d888 radeonsi: add support for Polaris (v2)
v2: Polaris chips should be defined after Stoney

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> (v1)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Signed-off-by: Leo Liu <leo.liu@amd.com> (v2 diff)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v2 diff)
2016-03-24 23:07:32 -04:00
Sonny Jiang
f5e24b19e8 winsys/amdgpu: addrlib - add Polaris support (v2)
v2: fix indentation as noted by Michel

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-24 23:06:39 -04:00
Jason Ekstrand
2c3f95d6aa Merge remote-tracking branch 'public/master' into vulkan 2016-03-24 17:30:14 -07:00
Kenneth Graunke
511ce2925b mesa: Check glReadBuffer enums against the ES3 table.
From the ES 3.2 spec, section 16.1.1 (Selecting Buffers for Reading):

   "An INVALID_ENUM error is generated if src is not BACK or one of
    the values from table 15.5."

Table 15.5 contains NONE and COLOR_ATTACHMENTi.

Mesa properly returned INVALID_ENUM for unknown enums, but it decided
what was known by using read_buffer_enum_to_index, which handles all
enums in every API.  So enums that were valid in GL were making it
past the "valid enum" check.  Such targets would then be classified
as unsupported, and we'd raise INVALID_OPERATION, but that's technically
the wrong error code.

Fixes dEQP-GLES31's
functional.debug.negative_coverage.get_error.buffer.read_buffer

v2: Only call read_buffer_enuM_to_index when required (Eduardo).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-24 16:52:08 -07:00
Nanley Chery
a5dc3c0f02 anv: Sanitize Image extents and offsets
Prepare Image extents and offsets for internal consumption by assigning
the default values implicitly defned by the spec. Fixes textures on
several Vulkan demos in which the VkImageCopy depth is set to zero when
copying a 2D image.

v2 (Jason Ekstrand):
   Replace "prep" with "sanitize"
   Make function static inline
   Pass structs instead of pointers

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2016-03-24 16:15:00 -07:00
Jason Ekstrand
22b343a8ec nir: Add a pass to inline functions
This commit adds a new NIR pass that lowers all function calls away by
inlining the functions.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
debf23ec68 nir/builder: Add helpers for easily inserting copy_var intrinsics
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
79dec93ead nir: Add return lowering pass
This commit adds a NIR pass for lowering away returns in functions.  If the
return is in a loop, it is lowered to a break.  If it is not in a loop,
it's lowered away by moving/deleting code as needed.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
8d61d72524 nir: Add a cursor helper for getting a cursor after any phi nodes
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
18b0166749 nir/builder: Add a helper for inserting jump instructions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
97b663481c nir/cf: Make extracting or re-inserting nothing a no-op
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
7022a673cd nir: Add a function for comparing cursors
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
124f229ece nir/cf: Handle relinking top-level blocks
This can happen if a function ends in a return instruction and you remove
the return.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
364212f1ed nir: Add a pass to repair SSA form
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
ea98d415e4 nir/vars_to_ssa: Use the new nir_phi_builder helper
The efficiency should be approximately the same.  We do a little more work
per phi node because we have to sort the predecessors.  However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.

As a side-benifit, the phi builder actually handles unreachable blocks
correctly.  The original vars_to_ssa code, because of the way it iterated
the blocks and added phi sources, didn't add sources corresponding to
predecessors of unreachable blocks.  The new strategy employed by the phi
builder creates a phi source for each predecessor and should correctly
handle unreachable blocks by setting those sources to SSA undefs.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
42ddfc611f nir/dominance: Handle unreachable blocks
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:

loop {
   if (...) {
      break;
   } else {
      break;
   }
}

In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this.  First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable.  Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
e4dc82cfcf nir: Add a phi node placement helper
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis.  Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.

v2: Add better documentation

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand
9a41d94731 util/bitset: Allow iterating over const bitsets
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Rob Clark
61c7d20e4f ttn: remove stray global from header
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-24 16:04:54 -04:00
Samuel Pitoiset
b9c70fcdad nv50/ir: silence unhandled TGSI_PROPERTY_NEXT_SHADER info
radeonsi uses this property to make the best decision about which
shader to compile, but this is not currently used by our codegen.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-24 18:53:24 +01:00
Kenneth Graunke
d1bb1df87e mesa: Handle negative length in glPushDebugGroup().
The KHR_debug spec doesn't actually say we should handle this, but that
is most likely an oversight - it says to check against strlen and
generate errors if length is negative.  It appears they just forgot to
explicitly spell out that we should then proceed to actually handle it.

Fixes crashes from uncaught std::string exceptions in many
dEQP-GLES31.functional.debug.error_filters.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-24 10:47:50 -07:00
Kenneth Graunke
028459a00d mesa: Make glDebugMessageInsert deal with negative length for all types.
From the KHR_debug spec, section 5.5.5 (Externally Generated Messages):

   "If <length> is negative, it is implied that <buf> contains a null
    terminated string. The error INVALID_VALUE will be generated if the
    number of characters in <buf>, excluding the null terminator when
    <length> is negative, is not less than the value of
    MAX_DEBUG_MESSAGE_LENGTH."

This indicates that length should be set to strlen for all types, not
just GL_DEBUG_TYPE_MARKER.  We want it to be after validate_length()
so we still generate appropriate errors.

Fixes crashes from uncaught std::string exceptions in many
dEQP-GLES31.functional.debug.error_filters.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-24 10:47:45 -07:00
Kenneth Graunke
412e686da9 mesa: Include null terminator in GL_DEBUG_NEXT_LOGGED_MESSAGE_LENGTH.
From the KHR_debug spec:
"Applications can query the number of messages currently in the log by
 obtaining the value of DEBUG_LOGGED_MESSAGES, and the string length
 (including its null terminator) of the oldest message in the log
 through the value of DEBUG_NEXT_LOGGED_MESSAGE_LENGTH."

Because we weren't including the null terminator, many dEQP tests
called glGetDebugMessageLog with a bufSize parameter that was 1 too
small, and unable to contain the message, so we skipped returning it,
failing many cases.

Fixes 298 dEQP-GLES31.functional.debug.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-24 10:47:29 -07:00
Nicolai Hähnle
6b763c026d st/mesa: use RGBA instead of BGRA for SRGB_ALPHA
This fixes a regression introduced by commit a8eea696 "st/mesa: honour sized
internal formats in st_choose_format (v2)".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94657
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94671
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-24 12:23:31 -05:00
Nicolai Hähnle
7880b81d39 radeonsi: silence a coverity warning
The following Coverity warning

5378     	tmpl.fetch_args = atomic_fetch_args;
5379     	tmpl.emit = atomic_emit;
>>>     CID 1357115:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "tmpl". Field "tmpl.intr_name" is uninitialized.
5380     	bld_base->op_actions[TGSI_OPCODE_ATOMUADD] = tmpl;
5381     	bld_base->op_actions[TGSI_OPCODE_ATOMUADD].intr_name = "add";

... is a false positive, but what the hell. This change should "fix" it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-24 12:23:14 -05:00
Bas Nieuwenhuizen
f96309753b mesa: replace gl_context->Multisample._Enabled with _mesa_is_multisample_enabled.
This removes any dependency on driver validation of the number of
framebuffer samples.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-03-24 08:36:43 -06:00
Rob Clark
0bea0e7141 nir: fix dangling ssadef->name ptrs
In many places, the convention is to pass an existing ssadef name ptr
when construction/initializing a new nir_ssa_def.  But that goes badly
(as noticed by garbage in nir_print output) when the original string
gets freed.

Just use ralloc_strdup() instead, and add ralloc_free() in the two
places that would care (not that the strings wouldn't eventually get
freed anyways).

Also fixup the nir_search code which was directly setting ssadef->name
to use the parent instruction as memctx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-24 08:30:04 -04:00
Jason Ekstrand
4e060d80ff glsl: Add propagate_invariance to the other makefile
This fixes the scons build
2016-03-23 21:12:44 -07:00
Jason Ekstrand
a984e44abd nir/glsl: Propagate invariant into NIR alu ops
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:07 -07:00
Jason Ekstrand
028d6ecfe0 glsl/rebalance_tree: Don't handle invariant or precise trees
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:07 -07:00
Jason Ekstrand
b2209b2333 glsl/opt_algebraic: Don't handle invariant or precise trees
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:07 -07:00
Jason Ekstrand
89b604922d glsl: Add a pass to propagate the "invariant" and "precise" qualifiers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand
91d6272c2b nir/alu_to_scalar: Propagate the "exact" bit
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand
865e83b9ec i965/peephole_ffma: Don't fuse exact adds
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand
5f39e3e165 nir/cse: Properly handle nir_ssa_def.exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand
0dbda153aa nir/algebraic: Flag inexact optimizations
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe.  This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact".  It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:02 -07:00
Jason Ekstrand
ed3a029e80 nir/algebraic: Fix fmin detection to match the spec
The previous transformation got the arguments to fmin backwards.  When NaNs
are involved, the GLSL min/max aren't commutative so it matters.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:00 -07:00
Jason Ekstrand
89545b1314 nir/algebraic: Get rid of an invlid fxor optimization
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:58 -07:00
Jason Ekstrand
3a7cb6534c nir/algebraic: Allow for flagging operations as being inexact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:55 -07:00
Jason Ekstrand
a6f25fa7d7 nir/search: Propagate exactness into newly created expressions
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:52 -07:00
Jason Ekstrand
ded3133d47 nir/builder: Add a flag for setting exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand
4ff89377d9 nir: Add an "exact" bit to nir_alu_instr
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand
f849f53990 nir/clone: Export nir_variable_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:11 -07:00
Jason Ekstrand
5fe8959912 nir/clone: Expose nir_constant_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:08 -07:00
Jason Ekstrand
c4c373f156 nir: Fix whitespace
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:25:53 -07:00
Brian Paul
9a6da49371 docs: use latest libDRM version
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-23 12:56:32 -06:00
Lars Hamre
43c6f3f82f compiler/glsl: allow sequence op as a const expr in gles 1.0
Allow the sequence operator to be a constant expression in GLSL ES
versions prior to GLSL ES 3.0

Fixes the following piglit test:
/all/spec/glsl-es-1.0/compiler/array-sized-by-sequence-in-parenthesis.vert

This is similar to the logic from process_initializer() which performs
the same check for constant variable initialization with sequence
operators.

v2: Fixed regression pointed out by Eduardo Lima Mitev

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-23 18:13:26 +01:00
Nicolai Hähnle
c4931ae174 radeonsi: fix out-of-bounds indexing of shader images
Results are undefined but may not crash. Without this change, out-of-bounds
indexing can lead to VM faults and GPU hangs.

Constant buffers, samplers, and possibly others will eventually need similar
treatment to support GL_ARB_robust_buffer_access_behavior.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-23 11:49:53 -05:00
Nicolai Hähnle
a8f5d11426 radeonsi: cache flush/invalidation for missing PIPE_BARRIER_*_BUFFER bits (v2)
This fixes arb_shader_image_load_store-host-mem-barrier.

v2: flush TC L2 for index buffers on <= CIK (Marek)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-23 11:48:19 -05:00
Nicolai Hähnle
fc94bc2986 st/mesa: add missing MemoryBarrier bits and some explanations
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-23 11:48:15 -05:00
Nicolai Hähnle
b15b1faefd gallium: add PIPE_BARRIER_STREAMOUT_BUFFER
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-23 11:48:02 -05:00
Marek Olšák
b8ec205515 radeonsi: fix 2D array MSAA failures since image support landed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-23 12:14:15 +01:00
Jason Ekstrand
9881eab197 i965/fs: Don't constant-fold RCP
No shader-db changes on Broadwell

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 16:46:15 -07:00
Jason Ekstrand
01425c45b3 i965: Remove the RCP+RSQ algebraic optimizations
NIR already has this optimization and it can do much better than the little
peephole in the backend.

No shader-db change on Haswell or Broadwell.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 16:46:15 -07:00
Jason Ekstrand
20417b2cb0 anv/device: Advertise version 1.0.5
Nothing substantial has changed since 1.0.2
2016-03-22 16:21:23 -07:00
Jason Ekstrand
204d937ac2 anv/device: Ignore the patch portion of the requested API version
Fixes dEQP-VK.api.device_init.create_instance_name_version

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94661
2016-03-22 16:20:45 -07:00
Jason Ekstrand
4844723405 anv: Don't assert-fail if someone asks for a non-existent entrypoint 2016-03-22 16:11:53 -07:00
Jason Ekstrand
8dd86e8aa7 Update to the latest Vulkan header from Khronos 2016-03-22 16:06:53 -07:00
Ian Romanick
d7a25a9def nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick
2bb006af68 nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.

G4X
total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0

total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
cycles in affected programs: 1767232 -> 1764954 (-0.13%)
helped: 274
HURT: 81

Ironlake
total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0

total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
cycles in affected programs: 2867452 -> 2864174 (-0.11%)
helped: 422
HURT: 137

Sandy Bridge
total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0

total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
cycles in affected programs: 1239166 -> 1230988 (-0.66%)
helped: 369
HURT: 150

Ivy Bridge
total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0

total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
cycles in affected programs: 1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141

Haswell
total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0

total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149

LOST:   2
GAINED: 0

Broadwell
total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0

total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111

LOST:   2
GAINED: 0

Skylake
total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0

total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116

LOST:   2
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick
348e5a71d8 nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0

total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0

Broadwell / Skylake
total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0

total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0

G45 and Ironlake showed no change.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:47:56 -07:00
Ian Romanick
564a8b8a26 nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.

Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0

total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0

No changes on earlier platforms.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:44:57 -07:00
Ian Romanick
bf0d60aa11 nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:

                 vec1 ssa_200 = flt ssa_199, ssa_194
                 vec1 ssa_201 = b2i ssa_200
                 vec1 ssa_202 = i2b -ssa_201

shader-db results:

Sandy Bridge
total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0

total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1

Ivy Bridge
total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0

total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2

Haswell
total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0

total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1

Broadwell / Skylake
total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0

total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6

G45 and Ironlake showed no change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:43:28 -07:00
Ian Romanick
a4079f1cb2 nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse.  On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1

total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53

Unsurprisingly, no changes on later platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Ian Romanick
9442db4f89 i965: Have NIR lower flrp on pre-GEN6 vec4 backend
Previously we were doing the lowering by hand in vec4_visitor::emit_lrp.
By doing it in NIR, we have the opportunity for NIR to do additional
optimization of the expanded code.

This also enables optimizations added by the next commit.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4024401 -> 4016538 (-0.20%)
instructions in affected programs: 447686 -> 439823 (-1.76%)
helped: 2623
HURT: 0

total cycles in shared programs: 84375846 -> 84328296 (-0.06%)
cycles in affected programs: 16964960 -> 16917410 (-0.28%)
helped: 2556
HURT: 41

Unsurprisingly, no changes on later platforms.

v2: Formatting and comment changes suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Brian Paul
18c5fa1122 swrast: fix discarded const warning in s_texture.c
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-22 08:35:27 -06:00
Marc-André Lureau
530593da65 i965: fix invalid memory write
I noticed some heap corruption running virgl tests, and valgrind
helped me to track it down to the following error:

==29272== Invalid write of size 4
==29272==    at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307)
==29272==    by 0x9029A7D: brw_DO (brw_eu_emit.c:1750)
==29272==    by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999)
==29272==    by 0x904491F: brw_compile_fs (brw_fs.cpp:5685)
==29272==    by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272==    by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272==    by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272==    by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272==    by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)
==29272==    by 0x8C84325: _mesa_link_program (shaderapi.c:1042)
==29272==    by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515)
==29272==    by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880)
==29272==  Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd
==29272==    at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==29272==    by 0x8ED11F7: ralloc_size (ralloc.c:113)
==29272==    by 0x8ED1282: rzalloc_size (ralloc.c:134)
==29272==    by 0x8ED14C0: rzalloc_array_size (ralloc.c:196)
==29272==    by 0x9019C7B: brw_init_codegen (brw_eu.c:291)
==29272==    by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124)
==29272==    by 0x9044883: brw_compile_fs (brw_fs.cpp:5675)
==29272==    by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272==    by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272==    by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272==    by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272==    by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)

if_depth_in_loop is an array of size p->loop_stack_array_size, and
push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1],
thus the condition to grow the array should be
p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently
off by 2...)

This can be reproduced by running the following test with virgl test
server:
LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner
./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-21 20:50:07 -07:00
Dave Airlie
53afbc980a tgsi: drop unused set_exec/kill_mask interfaces.
These don't get used and haven't been in git history from what I can
see, so drop them.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-22 13:07:05 +10:00
Dave Airlie
1e8435ce0c docs/relnotes: update ARB_internalformat_query2 status.
Signed-off-by: Dave Airlie <Airlied@redhat.com>
2016-03-22 09:54:08 +10:00
Dave Airlie
ee7c8b9804 st/mesa: add support for internalformat query2.
Add code to handle GL_INTERNALFORMAT_PREFERRED.
Add code to deal with GL_RENDERBUFFER being passes into ChooseTextureFormat.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-22 09:49:08 +10:00
Jason Ekstrand
869e393eb3 anv/batch_chain: Fall back to growing batches when chaining isn't available 2016-03-21 15:29:30 -07:00
Anuj Phogat
4ba47f7b2a i965: Fix assert conditions for src/dst x/y offsets
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-21 14:55:18 -07:00
Anuj Phogat
65cd2f8443 swrast: Move assert for 'slice' in to check_map_teximage
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-03-21 14:55:18 -07:00
xavier
fce0b55ccb r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.
Previously it was doing this transformation for a Trine 3 shader:
     MUL     R6.x.12,    R13.x.23, 0.5|3f000000
-    MULADD     R4.x.12,    -R6.x.12, 2|40000000, 1|3f800000
+    MULADD     R4.x.12,    -R13.x.23, -1|bf800000, 1|3f800000

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-22 07:43:13 +10:00
Samuel Pitoiset
9efd8b590f nvc0: make sure to delete samplers used by compute shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-21 22:04:18 +01:00
Kenneth Graunke
4b0a5b21ae i965/blorp: Make BlitFramebuffer() do sRGB encoding in ES 3.x.
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use.  The ES 3.0 specification is completely clear, and has
always stated this.

However, the GL specification has changed behavior in 4.1, 4.2, and
4.4.  The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording.  However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.

This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-21 13:55:32 -07:00
Kenneth Graunke
8679bb7c9e i965/blorp: Refactor sRGB encoding/decoding.
Because the rules for sRGB are so insane, we change brw_blorp_miptrees
to take decode_srgb and encode_srgb flags, which control linearization
of the source and destination separately.

This should make it easy to implement whatever crazy combination of
rules people throw at us.  For now, it should be equivalent.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-21 13:54:29 -07:00
Kenneth Graunke
eee8a53906 meta: Make BlitFramebuffer() do sRGB encoding in ES 3.x.
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use.  The ES 3.0 specification is completely clear, and has
always stated this.

However, the GL specification has changed behavior in 4.1, 4.2, and
4.4.  The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording.  However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.

This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.

Meta implements several other functions in terms of BlitFramebuffer,
and many of those explicitly do not perform sRGB encoding.  So, this
patch explicitly disables sRGB encoding in those other functions,
preserving the existing (correct) behavior.

If you're from the future and are reading this, hi!  Welcome to
the "fun" of debugging sRGB problems!  Best of luck!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-21 13:53:44 -07:00
Nicolai Hähnle
b74784638d docs: mark GL_ARB_shader_image_load_store/_size as done for radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:26 -05:00
Edward O'Callaghan
5219eb15e1 radeonsi: Set PIPE_SHADER_CAP_MAX_SHADER_IMAGES
This enables ARB_shader_image_load_store and ARB_shader_image_size.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
[allow the same number of images for all shader stages and require LLVM 3.9]

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:26 -05:00
Nicolai Hähnle
6f942ac5ee radeonsi: disable early Z if the fragment shader writes to memory
Empirically, both the EXEC_ON_* flags and LATE_Z are necessary.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
79762e877c tgsi/scan: add writes_memory to flag presence of stores or atomics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
e9d935ed0e radeonsi: force the DCC enable bit off in image descriptors for writing (v2)
This avoids a lockup at least on Tonga.

v2: only force DCC off on VI+ (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
43f5ce1d20 radeonsi: implement MemoryBarrier (v2)
v2: invalidate both constant and VMEM/TC L1 for constant buffers (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
97352aa50a radeonsi: implement volatile memory access
Prevent loads from being re-ordered or coalesced.

Atomics don't need special handling by definition, and stores don't need
special handling because LLVM is unable to detect dead image or buffer
stores.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
5a61b428f4 radeonsi: implement coherent memory access (v2)
v2: set glc=1 for volatile also on buffers

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
d6fa650454 radeonsi: Lower TGSI_OPCODE_MEMBAR down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:25 -05:00
Nicolai Hähnle
f7a85a8a0a radeonsi: Lower TGSI_OPCODE_ATOM* down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:24 -05:00
Nicolai Hähnle
bfcefcb3c7 radeonsi: Lower TGSI_OPCODE_STORE down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:24 -05:00
Nicolai Hähnle
1e82dedeca radeonsi: Lower TGSI_OPCODE_LOAD down to LLVM op (v3)
v2: new signature style for buffer intrinsics (offsets)
v3: new signature style for llvm.amdgcn.buffer.load.format (overloaded return)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2016-03-21 15:34:24 -05:00
Nicolai Hähnle
136686a51d radeonsi: extract the LLVM type name construction into its own function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
02bd0cd7b1 radeonsi: Lower TGSI_OPCODE_RESQ down to LLVM op
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
75539197c7 radeonsi: extract TXQ buffer size computation into its own function
This will allow it to be reused for RESQ.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
515fb2c09c radeonsi: decompress shader images
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
f61566b77a radeonsi: update shader image descriptor for invalidated buffer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
e85cf35a65 radeonsi: implement set_shader_images (v2)
Whether DCC is disabled depends on the access flags with which the image
is bound: image_load supports DCC, but store and atomic don't.

v2: remove an unnecessary masking of images->desc.enabled_mask

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:23 -05:00
Nicolai Hähnle
b1b7268f01 gallium/radeon: make r600_texture_disable_dcc externally accessible
We will need it in radeonsi for shader images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:22 -05:00
Nicolai Hähnle
457f9c6b25 tgsi/scan: track which shader images are really buffers
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:22 -05:00
Nicolai Hähnle
fa096a14af tgsi/scan: add images_writemask
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:22 -05:00
Nicolai Hähnle
1379544081 st/mesa: translate additional flags in MemoryBarrier
Re-order flags in the order in which they appear in the OpenGL spec in the
description of MemoryBarrier().

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:22 -05:00
Nicolai Hähnle
96cd908fd3 gallium: add additional PIPE_BARRIER_* bits
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 15:34:22 -05:00
Brian Paul
86caa67aef svga: add svga_winsys_context::pipe_debug_callback pointer
The svga winsys modules can use this to send debug messages to the
state tracker and Mesa.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-03-21 13:37:40 -06:00
Charmaine Lee
f8aaf0094d svga: Fix the index buffer rebind regression
The index buffer handle saved in the hw_state structure could
be invalid after the index buffer is destroyed. Instead of
rebinding the index buffer using the saved index buffer handle,
we will reset the index buffer handle in the hw_state structure
to force resending of the index buffer.

Fixes bug 1593320

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-21 13:37:40 -06:00
Charmaine Lee
47856e5945 svga: rebind stream output targets
To ensure stream output target surfaces are available for the draw commands,
we need to rebind the current stream output targets at the first draw in the
command buffer.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-21 13:37:40 -06:00
Charmaine Lee
47cfc83440 svga: rebind index buffer
Similar to other resources, current index buffer needs to be
rebound at the first draw of the current command buffer to make
sure the buffer is available for the draw command.

Fixes bug 1587263.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-21 13:37:40 -06:00
Brian Paul
299f8ca0a7 svga: minor formatting fix, comment addition
To sync with our internal tree.

Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-21 13:37:25 -06:00
Charmaine Lee
b45b47c5c9 svga: optimize constant buffer uploads
When a constant buffer slot is allocated in the upload buffer,
the allocated slot size is always in multiple of 256. But the actual buffer
size might not be in multiple of 256. This causes a gap between
the ending offset of a slot and the starting offset of the next slot.
The gap will prevent the two slots to be updated in a single update command.
In order to maximize the chance of merging the contiguous dirty ranges,
when a slot is to be allocated in the constant upload buffer,
specify a buffer size in multiple of 256.

There is about 10% performance improvement with Lightsmark2008 and
30% with Cinebench R11.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2016-03-21 12:58:25 -06:00
Charmaine Lee
0a1d91ef97 svga: add a few more resource updates HUD query
This patch adds the following HUD queries:
.num-resource-updates  -- number of resource update. Commands include
                          UPDATE_SUBRESOURCE, UPDATE_GB_IMAGE.
.num-buffer-uploads    -- number of buffer uploads.
.num-const-buf-updates -- number of set constant buffer.
.num-const-updates     -- number of set shader constant.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2016-03-21 12:58:25 -06:00
Charmaine Lee
79e343b36a svga: add new num-readbacks HUD query
To find out how many image readback command is issued.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2016-03-21 12:58:25 -06:00
Brian Paul
dc9ecf58c0 svga: use shader sampler view declarations
Previously, we looked at the bound textures (via the pipe_sampler_views)
to determine texture dimensions (1D/2D/3D/etc) and datatype (float vs.
int).  But this could fail in out of memory conditions.  If we failed to
allocate a texture and didn't create a pipe_sampler_view, we'd default
to using 0 (PIPE_BUFFER) as the texture type.  This led to device errors
because of inconsistent shader code.

This change relies on all TGSI shaders having an SVIEW declaration for
each SAMP declaration.  The previous patch series does that.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
b56b853ab3 gallium/tests: declare sampler views in shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
38e831ca3d gallium/util: declare sampler view in util_make_fs_blit_msaa_depthstencil()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
e7b5a844e3 postprocess: declare sampler views in shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
5a9f2a2d89 hud: add sampler view declaration in text fragment shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
b3daaefadb st/mesa: emit sampler view decls in drawpixels code
v2: support both TGSI_TEXTURE_2D and _RECT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
0f0a23d4d8 st/mesa: emit sampler view declaration in bitmap shader
In June 2015, Rob Clark started updating the tgsi utility code to emit
SVIEW declarations in various shaders (for polygon stipple, blitting,
etc).  These patches do the same for the Mesa state tracker.

The VMware driver will use this.

v2: support both TGSI_TEXTURE_2D and _RECT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
72eb5a3cfe st/mesa: emit sampler view declarations for ARB vert/frag programs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
eda81fa357 st/mesa: use correct TGSI texture target in drawpix fragment shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
83b5b3d66e st/mesa: use correct TGSI texture target in bitmap fragment shader
Depending on the driver's support for NPOT textures, we might use
a RECT texture instead of 2D texture.  We should propogate that info
to the fragment shader's TEX instruction.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Brian Paul
63e020d734 gallium/tgsi: pass TGSI tex target to tgsi_transform_tex_inst()
Instead of hard-coded 2D tex target in tgsi_transform_tex_2d_inst()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-21 11:59:25 -06:00
Nicolai Hähnle
a8b315b827 st/mesa: use the texture view's format for render-to-texture
Aside from the bug below, it fixes a simplistic test I've written locally,
and I see no regression in Piglit for radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94595
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-21 11:28:38 -05:00
Hans de Goede
dcf8a4d281 gallium: Remove unused TGSI_RESOURCE_ defines
These magic file-index defines where only ever used in the nouveau code
and that no longer uses them.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2016-03-21 12:20:58 +01:00
Hans de Goede
9b4c8f6629 nouveau: codegen: Do not silently fail in handeLOAD / handleSTORE / handleATOM
handeLOAD / handleSTORE / handleATOM can only handle TGSI_FILE_BUFFER
and TGSI_FILE_MEMORY. Make things fail explictly when another
register-file is used in these functions.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
2016-03-21 12:20:48 +01:00
Hans de Goede
86e4440361 nouveau: codegen: Disable more old resource handling code
Commit c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.

Effectively all of it is dead already, if we ever enter the old code
paths in handeLOAD / handleSTORE / handleATOM we will get an exception
due to trying to access the now always zero-sized resources vector.

Disable all the dead code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
2016-03-21 12:20:40 +01:00
Hans de Goede
71e315475c nouveau: codegen: gk110: Make emitSTORE offset handling identical to emitLOAD
Make the store offset handling in CodeEmitterGK110::emitSTORE identical
to the one in CodeEmitterGK110::emitLOAD handling.

This is just a cleanup, it does not cause any functional changes.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-21 12:20:38 +01:00
Hans de Goede
c783ad0e24 nouveau: codegen: Slightly refactor Source::scanInstruction() dst handling
Use the dst temp variable which was used in the TGSI_FILE_OUTPUT
case everywhere. This makes the code somewhat easier to reads
and helps avoiding going over 80 chars with upcoming changes.

This also brings the dst handling more in line with the src
handling.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-21 12:20:32 +01:00
Hans de Goede
54cdde5eff nouveau: codegen: Add support for clover / OpenCL kernel input parameters
Add support for clover / OpenCL kernel input parameters.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
2016-03-21 12:20:28 +01:00
Hans de Goede
3788e1bf74 tgsi: Add support for global / private / input MEMORY
Extend the MEMORY file support to differentiate between global, private
and shared memory, as well as "input" memory.

"MEMORY[x], INPUT" is intended to access OpenCL kernel parameters, a
special memory type is added for this, since the actual storage of these
(e.g. UBO-s) may differ per implementation. The uploading of kernel
parameters is handled by launch_grid, "MEMORY[x], INPUT" allows drivers
to use an access mechanism for parameter reads which matches with the
upload method.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
2016-03-21 12:20:24 +01:00
Hans de Goede
43ddec2f43 tgsi: Fix decl.Atomic and .Shared not propagating when parsing tgsi text
When support for decl.Atomic and .Shared was added, tgsi_build_declaration
was not updated to propagate these properly.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
2016-03-21 12:20:19 +01:00
Iago Toral Quiroga
8f45691cda doc: document spilling options accepted by INTEL_DEBUG
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-21 08:16:49 +01:00
Hans de Goede
b72156c8e0 tgsi: Fix return of uninitialized memory in tgsi_*_instruction_memory
tgsi_default_instruction_memory / tgsi_build_instruction_memory were
returning uninitialized memory for tgsi_instruction_memory.Texture and
tgsi_instruction_memory.Format. Note 0 means not set, and thus is a
correct default initializer for these.

Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory")
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-20 18:01:53 -04:00
Ilia Mirkin
bbbdcdcf75 st/mesa: report correct precision information for low/medium/high ints
When we have native integers, these have full precision. Whether they're
low/medium/high isn't piped through the TGSI yet, but eventually those
might have differing precisions. For now they're just 32-bit ints.

Fixes the following dEQP tests:

  dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
  dEQP-GLES3.functional.state_query.shader.precision_fragment_highp_int

which expected highp ints to have full 32-bit precision, not the default
23-bit float precision.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-20 17:51:08 -04:00
Nishanth Peethambaran
eeb117a09d st/omx/dec: Correct the timestamping
Attach the timestamp to the dpb buffer and use that timestamp
while pushing buffer from dpb list to the omx client.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nishanth Peethambaran <nishanth.peethambaran@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-20 15:01:28 -04:00
Nishanth Peethambaran
46de6bbb77 st/omx: Remove trailing spaces
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nishanth Peethambaran <nishanth.peethambaran@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-20 15:01:28 -04:00
Ilia Mirkin
7d98bfedd7 nv50/ir: fix indirect texturing for non-array textures on nvc0
If a layer parameter is provided, we want to flip it to position 0 (and
combine it with any indirect params). However if the target is not an
array, there is no layer, so we have to shift all of the arguments down
by one to make room for it.

This fixes situations where there were non-coordinate parameters, such
as bias, lod, depth compare, explicit derivatives. Instead of adding a
new parameter at the front for the indirect reference, we would swap one
of those in its place.

Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.*shadow

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-20 14:14:32 -04:00
Ilia Mirkin
adb40a7399 st/mesa: only minify depth for 3d targets
We make sure that that image depth matches the level's depth before
copying it into place. However we should only be minifying the first
level's depth for 3d textures - array textures have the same depth for
all levels.

This fixes tests such as
dEQP-GLES3.functional.texture.specification.texsubimage3d_depth.* and I
suspect account for a number of other odd situations I've run into where
level > 0 of array textures was messed up.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-20 14:14:32 -04:00
Ilia Mirkin
6eeb284e4f nv50/ir: normalize cube coordinates after derivatives have been computed
In "manual" derivative mode (always used on nv50 and sometimes on nvc0
but always for cube), the idea is that using the quadop instruction, we
set up the "other" quads to have values such that the derivatives work
out, and then run the texture instruction as if nothing were strange. It
pulls values from the other lanes, and does its magic.

However cube coordinates have to be normalized - one of the 3 coords has
to be 1, to determine which is the major axis, to say which face is
being sampled. We were normalizing the coordinates first, and then
adding the derivatives. This is wrong for two reasons:

- the coordinates got normalized by a scaling factor but the
  derivatives didn't
- the result of the addition didn't end up normalized

To resolve this, we flip the logic around to normalize *after* the
per-lane coordinates are set up.

This fixes a bunch of textureGrad cube dEQP tests.

NOTE: nv50 cube arrays with explicit derivatives are still broken, to be
resolved at a later date.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-20 14:14:32 -04:00
Marek Olšák
ea2bff1d11 gallium/radeon: remove remnants of R600 TGSI->LLVM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-20 00:57:05 +01:00
Marek Olšák
4e5dc69af1 r600g: flatten if (1) statement after removal of TGSI->LLVM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-20 00:57:05 +01:00
Marek Olšák
20a09897a6 r600g: remove TGSI->LLVM translation
It was useful for testing and as a prototype for radeonsi bringup,
but it's not used anymore and doesn't support OpenGL 3.3 even.

v2: try to fix OpenCL build

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-03-20 00:57:02 +01:00
Marek Olšák
8140154ae9 gallium/radeon: remove old CS tracing
Cons:
- it was only integrated in r600g
- it doesn't work with GPUVM
- it records buffer contents at the end of IBs instead of at the beginning,
  so the replay isn't exact
- it lacks an IB parser and user-friendliness

A better solution is apitrace in combination with gallium/ddebug, which
has a complete IB parser and can pinpoint hanging CP packets.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-20 00:56:35 +01:00
Marek Olšák
a73a657def radeonsi: process TGSI property NEXT_SHADER
This allows compiling the main shader part as ES or LS.

If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.

The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-19 23:20:01 +01:00
Marek Olšák
2bdd7a46a9 st/mesa: set TGSI property NEXT_SHADER
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-19 23:20:01 +01:00
Marek Olšák
fbe6e92899 gallium: add TGSI property NEXT_SHADER
Radeonsi needs to know which shader stage will execute after a shader
in order to make the best decision about which shader variant to compile
first.

This is only set for VS and TES, because we don't need it elsewhere.

VS has 3 variants:
- next shader is FS
- next shader is GS
- next shader is TCS

TES has 2 variants:
- next shader is FS
- next shader is GS

Currently, radeonsi always assumes the next shader is FS, which is suboptimal,
since st/mesa always knows which shader is next if the GLSL program is not
a "separate shader".

By default, ureg always sets "next shader is FS".

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-19 23:20:01 +01:00
Pierre Moreau
9184d9a0bb nvc0/ir: Use double constant in handleSQRT
Fixes: a100d89d09 (nv50,nvc0: Fix invalid constant.)
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-19 15:59:52 -04:00
Kenneth Graunke
789e096594 mesa: Disallow GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME on winsys FBO.
Fixes:
dEQP-GLES3.functional.negative_api.state.get_framebuffer_attachment_parameteriv

Apparently, GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is not allowed when
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is GL_FRAMEBUFFER_DEFAULT, and
is expected to result in a GL_INVALID_ENUM error.

No GL specification actually defines what GL_FRAMEBUFFER_DEFAULT means.
It probably means the window system FBO.  It also doesn't mention the
behavior of any queries for that type.  Various ARB folks seem fairly
confused about it too.  For now, just do something vaguely like what
dEQP expects.

I think we probably need to check the visual bits against 0 for the
attachment, but we haven't been doing that thusfar, and given how
confusingly this is specified, I can't imagine anyone relying on it.

v2: Improve comments, move error condition above the
    _mesa_get_fb0_attachment call, add forgotten "return"
    (all suggested/caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-19 12:58:15 -07:00
Ilia Mirkin
d2445b0083 nv50/ir: force-enable derivatives on TXD ops
This matters especially in vertex shaders, where derivatives are
disabled by default. This fixes textureGrad in vertex shaders on nv50.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-19 13:09:49 -04:00
Ilia Mirkin
d1b85dbffa nv50: reset TFB bufctx when we no longer hold a reference to the buffers
This fix is analogous to commit ff085d014.

This fixes some use-after-free situations in dEQP when an xfb state is
removed, and then a clear is triggered, which only does a partial
validation. It would attempt to read the no-longer-valid buffers,
resulting in crashes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-19 13:09:49 -04:00
Samuel Pitoiset
902bbda81b nvc0: avoid using magic numbers for the uniform_bo offsets
Instead make use of constants to improve readability.

The first 32 bytes of the driver constant buffer are unknown... This
doesn't seem to be used in the codegen part, but if the texBindBase
offset is shifted from 0x20 to 0x00, this breaks the universe for
really weird reasons. This sounds like to be related to textures.

Anyway, name this NVC0_CB_AUX_UNK_INFO and add a todo should be
enough for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-19 18:01:08 +01:00
Samuel Pitoiset
26cc411db8 nv50/ir: make use of auxCBSlot instead of magic numbers
This avoids using magic numbers for the driver constbuf slot which
is always 15 except for compute shaders on gk104+ where the slot 0
is used.

For gk104+, some special compute-related values like the thread
index are uploaded to screen->parm which is currently bound on c0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 18:01:04 +01:00
Samuel Pitoiset
d86933e6f4 nv50,nvc0: replace resInfoCBSlot by auxCBSlot
Having two different variables for the driver constant buffer slot
is confusing and really useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 18:00:59 +01:00
Samuel Pitoiset
e05492fd7f nv50/ir: fix compilation warning in handleSharedATOM()
In release build mode only, op may be used uninitialized because
the assertion has been removed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-19 17:01:17 +01:00
Vinson Lee
a100d89d09 nv50,nvc0: Fix invalid constant.
Fix clang build error.

  CXX      codegen/nv50_ir_lowering_nvc0.lo
codegen/nv50_ir_lowering_nvc0.cpp:1783:42: error: invalid suffix 'd' on floating constant
      Value *zero = bld.loadImm(NULL, 0.0d);
                                         ^

Fixes: c1e4a6bfbf ("nv50,nvc0: handle SQRT lowering inside the driver")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-18 20:38:41 -07:00
Kenneth Graunke
46610238e0 mesa: Do proper format error checks for GenerateMipmap in ES 3.x.
According to the OpenGL ES 3.2 spec's description of GenerateMipmap:

"An INVALID_OPERATION error is generated if the levelbase array was not
 specified with an unsized internal format from table 8.3 or a sized
 internal format that is both color-renderable and texture-filterable
 according to table 8.10."

Similar text exists in the ES 3.0 specification as well.

Our existing rules are pretty close, but miss a few things.  The
OpenGL specification actually doesn't have any text about internal
format checking - our existing code comes from a Khronos bug report.
The ES 3.x spec provides a clearer description.

Fixes dEQP-GLES3.functional.negative_api.texture.generatemipmap and
dEQP-GLES2.functional.negative_api.texture.generatemipmap_zero_level
_array_compressed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:43:47 -07:00
Kenneth Graunke
f1b0573510 mesa: Add color renderable/texture filterable format info for ES 3.x.
OpenGL ES 3.x contains a table of sized internal formats and their
required properties.  In particular, each format is marked as
"Color Renderable" or "Texture Filterable".

This patch introduces two functions that can be used to query the
information from that table.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:43:23 -07:00
Kenneth Graunke
88d28aa4d9 i965: Stop XY clipping point and line primitives.
Wide points and lines are not supposed to be clipped by the viewport.
Rather, they should be rendered, and any fragments outside of the
viewport should be discarded.

The traditional use case for this behavior is rendering moving wide
point particles.  When the center of the point approaches the viewport
edge, clipping would make it pop out of view early.

Fixes:
- dEQP-GLES2.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
- dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center
- dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:42:51 -07:00
Kenneth Graunke
0de64ab788 i965: Scissor to the viewport when rendering points/lines.
We're about to start allowing wide points/lines whose vertices are
outside the viewport past the clipper.  This scissoring hack ensures
that any fragments generated are still restricted to the viewport.

It is not necessary on Gen8+ as those platforms already discard
fragments which are outside the viewport.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:42:30 -07:00
Kenneth Graunke
d000a4989f i965: Include the viewport in the scissor rectangle.
We'll need to use scissoring to restrict fragments to the viewport
soon.  It seems harmless to include it generally, so let's do that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:42:15 -07:00
Kenneth Graunke
47be5a64c7 i965: Introduce an is_drawing_lines() helper.
Similar to is_drawing_points().

v2: Account for isoline tessellation output topology.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:41:59 -07:00
Kenneth Graunke
757674e8d0 i965: Move is_drawing_points to brw_state.h.
I need to use this in multiple source files.

v2: Rebase on TES output domain fix.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 18:41:25 -07:00
Jason Ekstrand
ecfb074276 anv/allocator: Make the bo_pool dynamically sized 2016-03-18 17:25:58 -07:00
Kenneth Graunke
5b2d8c2273 i965: Fix gl_TessLevelOuter[] for isolines.
Thanks to James Legg for finding this!

From the ARB_tessellation_shader spec:
"The number of isolines generated is derived from the first outer
 tessellation level; the number of segments in each isoline is
 derived from the second outer tessellation level."

According to the PRM, "TF.LineDensity determines # lines" while
"TF.LineDetail determines # segments".  Line Density is stored at
DWord 6, while Line Detail is at DWord 7.  So, they're not reversed
like they are for triangles and quads.

Fixes Piglit's spec/arb_tessellation_shader/execution/isoline,
and about 24 dEQP isoline tests (with GL_EXT_tessellation_shader
hacked on - it's not normally enabled).

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94524
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-18 16:45:23 -07:00
Kenneth Graunke
24298b7e2f i965: Decode non-normalized coordinates bit in SAMPLER_STATE.
We weren't printing this for some reason.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-18 16:44:51 -07:00
Kenneth Graunke
8679d40dc7 i965: Account for TES in is_drawing_points().
Now that we implement tessellation shaders, the TES might be the last
stage enabled.  If it's outputting points, then the primitive type
reaching the SF is points.  We need to account for this.

Caught by Ilia Mirkin.

v2: Update dirty bit comment above caller (caught by Iago)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-18 16:44:15 -07:00
Pierre Moreau
1282146d4e nv50: Mark compute states as dirty on context switch
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
[ Samuel Pitoiset: Trivial rebase conflict ]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-19 00:18:00 +01:00
Samuel Pitoiset
a734c0f8ba nv50/ir: print SUBFM subops
Only 3d subop is currently emitted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-19 00:09:18 +01:00
Samuel Pitoiset
af0c97fb90 nv50: add a new validation path for compute
This makes use of the new state validation interface to be consistent
with 3d.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:09:14 +01:00
Samuel Pitoiset
5ed387675d nv50: rework nv50_compute_validate_program()
Reduce the amount of duplicated code by re-using
nv50_program_validate(). While we are at it, change the prototype to
return void. We don't check anymore if the translation fails but
improving the state validation is a long process.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:09:09 +01:00
Samuel Pitoiset
a07ebc1993 nv50: rework the validation path for 3D
This exposes an interface for state validation that will be also used
to rework the compute validation path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:09:05 +01:00
Samuel Pitoiset
517d2c97e1 nv50: rename 3d binding points to NV50_BIND_3D_XXX
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:09:02 +01:00
Samuel Pitoiset
9374fc1e67 nv50: rename 3d dirty flags to NV50_NEW_3D_XXX
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:08:56 +01:00
Samuel Pitoiset
e844aac40b nv50: rename NV50_COMPUTE to NV50_CP
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:08:52 +01:00
Samuel Pitoiset
dedb46f582 nv50: rename nv50_context::dirty to nv50_context::dirty_3d
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-03-19 00:08:28 +01:00
Jason Ekstrand
b1c5d45872 anv/allocator: Add a size field to bo_pool_alloc 2016-03-18 11:50:53 -07:00
Brian Paul
9211b68ad3 st/mesa: clean up st_translate_texture_target()
Reformat code.  Improve assertion.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-18 12:06:31 -06:00
Brian Paul
0f73c3ab25 st/mesa: simplify drawpixels shader code with tgsi transform helper functions
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-18 12:06:30 -06:00
Brian Paul
373910f4e7 st/mesa: simplify bitmap shader code with tgsi transform helper functions
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-18 12:06:30 -06:00
Brian Paul
e9d5e68d1b tgsi: add tgsi_transform_op3_inst() function
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-18 12:06:30 -06:00
Juan A. Suarez Romero
7a712e64d6 doc: add 'vec4' option in INTEL_DEBUG
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-18 17:30:56 +01:00
Daniel Czarnowski
d4714512e4 egl: support EGL_LARGEST_PBUFFER in eglCreatePbufferSurface(...)
Patch provides a default for a set pbuffer surface size when
EGL_LARGEST_PBUFFER is used by the client. MIN2 macro is moved
to egldefines so that it can be shared.

Fixes following Piglit test:
   egl-create-largest-pbuffer-surface

From EGL 1.5 spec:
   "Use EGL_LARGEST_PBUFFER to get the largest available pbuffer
   when the allocation of the pbuffer would otherwise fail."

Currently there exists no API to query largest available pixmap size
using xlib or xcb so right now this seems most straightforward way to
ensure that we fulfill above API and also we don't attempt to allocate
'too big' pixmap which might succeed on server side but not work in
practice when driver starts to use it as a texture.

v2: add more explanation about the change (Emil)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-18 07:35:32 +02:00
George Kyriazis
dd63fa28f1 gallium/swr: Cleaned up some context-resource management
Removed bound_to_context.  We now pick up the context from the screen
instead of the resource itself.  The resource could be out-of-date
and point to a pipe that is already freed.

Fixes manywin mesa xdemo.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-03-17 20:57:52 -05:00
Timothy Arceri
952c166170 mesa: remove remaining tabs in prog_parameter.c
Acked-by: Matt Turner <mattst88@gmail.com>
2016-03-18 12:42:53 +11:00
Timothy Arceri
ce9c042ab3 mesa: inline _mesa_add_unnamed_constant()
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-18 12:42:43 +11:00
Timothy Arceri
fa9bd6b663 mesa: simplify and inline _mesa_lookup_parameter_index()
The function has only one user and strings are always null terminated.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-18 12:42:39 +11:00
Timothy Arceri
350b1ef027 mesa: make _mesa_lookup_parameter_constant static
This is not used outside of prog_parameter.c

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-18 12:42:34 +11:00
Timothy Arceri
7794b22a84 mesa: remove unused function
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-18 12:42:30 +11:00
Nicolai Hähnle
a8eea696b8 st/mesa: honour sized internal formats in st_choose_format (v2)
The bitcasting which is possible with shader images (and texture views?)
requires that when the user specifies a sized internal format for a
texture, we really allocate that format. To this end:

(1) find_exact_format should ignore sized internal formats and

(2) some of the entries in the mapping table corresponding to sized
    internal formats are reordered to use an RGBA format instead of
    a BGRA one.

This fixes arb_shader_image_load_store-bitcast in the (work in progress)
ARB_shader_image_load_store implementation for radeonsi.

v2: don't change the mapping of GL_RGB10: the change caused a regression
    because it preferred a format with an alpha channel, and GL_RGB10
    is not among the supported formats for shader images

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-17 19:26:40 -05:00
Dongwon Kim
49eb5e75bd configure.ac: enable_asm=yes when x-compiling across same X86 arch
Currently, configure script is forcing 'enable_asm' to be 'no'
whenever cross-compilation is performed on X86 host. This is
based on an assumption that target architecture is different
from host's (i.e. ARM). But there's always a case that we do
cross-compilation for target that is also X86 based just like
host in which same ASM codes will be supported. 'enable_asm'
should not be forced to be "no" anymore in this case.

v2: corrected commit message

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
2016-03-17 16:53:23 -07:00
Timothy Arceri
d6b9202873 glsl: disable varying packing when its not safe
In GL 4.4+ there is no guarantee that interpolation qualifiers will
match between stages so we cannot safely pack varyings using the
current packing pass in Mesa.

We also disable packing on outerward facing interfaces for SSO
because in ES we need to retain the unpacked varying information
for draw time validation. For desktop GL we could allow packing for
SSO in versions < 4.4 but its just safer not to do so.

We do however enable packing on individual arrays, structs, and
matrices as these are required by the transform feedback code and it
is still safe to do so.

Finally we also enable packing when a varying is only used for
transform feedback and its not a SSO.

This fixes all remaining rendering issues with the dEQP SSO tests,
the only issues remaining with thoses tests are to do with validation.

Note: There is still one remaining SSO bug that this patch doesn't fix.
Their is a chance that VS -> TCS will have mismatching interfaces
because we pack VS output in case its used by transform feedback but
don't pack TCS input for performance reasons. This patch will make the
situation better but doesn't fix it.

V4: fix out of order function params after rebase, make sure packing
still disabled in tess stages. Update comments as to why we disable
packing on SSO.

V3: ES 3.1 *does* require interpolation to match so don't disable
packing there. Rebased on master rather than on enhanced layouts
component packing series.

V2: Make is_varying_packing_safe() a function in the varying_matches
class, fix spelling (Matt) and make sure to remove the outer array
when dealing with Geom and Tess shaders where appropriate.
Lastly fix piglit regression in new piglit test and document the
undefined behaviour it depends on:
arb_separate_shader_objects/execution/vs-gs-linking.shader_test

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-18 10:26:34 +11:00
Timothy Arceri
c0ae6eeb3b glsl: pass disable_varying_packing bool to the lowering pass
This will allow us to choose to ignore the disable which will be
useful for more fine grained control over when to enable or disable
packing.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-18 10:26:30 +11:00
Marek Olšák
4ab2ac3349 radeonsi: fix Hyper-Z hangs on P2 configs
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-17 18:30:45 +01:00
Romain Failliot
151724159d docs: Renormalize older extensions.
For older extensions, there is an explanation first and the extension
name in brackets, like that:
    Clamping controls (GL_ARB_color_buffer_float)
I inverted that so we have the extension first and then the explanation
in brackets, like that:
    GL_ARB_color_buffer_float (Clamping controls)

It will help me later to parse the few extensions that use this syntax:
    all drivers that support <GL_extension>

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-17 11:35:20 -05:00
Romain Failliot
f5d47dd428 docs: Renormalize some extensions.
This fixes some exceptions I have to deal with in mesamatrix.net.
The extensions GL_ARB_texture_buffer_object had a comment between "DONE"
and the brackets.
And the extension GL_KHR_robustness (in GL 4.5 and GLES 3.1) was using
"90% done" instead of "in progress". The "90% done" is still here
though, but as an extension comment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-17 11:35:12 -05:00
Romain Failliot
3671bb3eaf docs: Realign the "Status" column.
The "Status" column was misaligned in some GL sections.
This is a lot of diffs, but it's only spaces in the end.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-17 11:35:09 -05:00
Romain Failliot
e571f11de8 docs: howto to read and edit GL3.txt
Added a small guide on how to read and edit GL3.txt.
I think this would help as much the devs as the users reading this file.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-17 11:34:50 -05:00
Brian Paul
84b961dd53 r300g: add missing layer argument to rws->buffer_get_handle() call
Fixes compilation error since 5aea0d691.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-03-17 09:52:21 -06:00
Christian König
5aea0d6919 radeon/winsys: add layer support for BO export
Add layer support to export individual array layers.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-17 14:17:06 +01:00
Christian König
04bc082f6a radeon/winsys: add offset support for BO import/export
Add offset support to handle NV12 offsets as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-17 14:17:03 +01:00
Christian König
f1e78a48f2 gallium/winsys/drm: add layer to struct winsys_handle
For exporting a specific layer of an array texture.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-17 14:16:59 +01:00
Christian König
29d26f1522 gallium/winsys/drm: add offset to struct winsys_handle
We are going to need this for EGL_EXT_image_dma_buf_import.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-17 14:16:03 +01:00
Connor Abbott
58fe7837b8 nir: propagate bitsize information in nir_search
When we replace an expresion we have to compute bitsize information for the
replacement. We do this in two passes to validate that bitsize information
is consistent and correct: first we propagate bitsize from child nodes to
parent, then we do it the other way around, starting from the original's
instruction destination bitsize.

v2 (Iago):
- Always use nir_type_bool32 instead of nir_type_bool when generating
  algebraic optimizations. Before we used nir_type_bool32 with constants
  and nir_type_bool with variables.
- Fix bool comparisons in nir_search.c to account for bitsized types.

v3 (Sam):
- Unpack the double constant value as unsigned long long (8 bytes) in
nir_algrebraic.py.

v4 (Sam):
- Use helpers to get type size and base type from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Connor Abbott
3124ce699b nir: add a bit_size parameter to nir_ssa_dest_init
v2: Squash multiple commits addressing the new parameter in different
    files so we don't break the build (Iago)

v3: Fix tgsi (Samuel)

v4: Fix nir_clone.c (Samuel)

v5: Fix vc4 and freedreno (Iago)

v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Iago Toral Quiroga
084b24f558 nir: rename nir_const_value fields to include bitsize information
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-17 11:16:33 +01:00
Connor Abbott
9076c4e289 nir: update opcode definitions for different bit sizes
Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.

v2: fix output type for u2f (Iago)

v3: do not change vecN opcodes to be float. The next commit will add
    infrastructure to enable 64-bit integer constant folding so this is isn't
    really necessary. Also, that created problems with source modifiers in
    some cases (Iago)

v4 (Jason):
  - do not change bcsel to work in terms of floats
  - leave ldexp generic

Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.

v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)

Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Connor Abbott
6700d7e423 nir: add nir_{src,dest}_bit_size() helpers
v2: use a ternary (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jason Ekstrand
e172dbe5d2 nir: Add a bit_size to nir_register and nir_ssa_def
This really hacky commit adds a bit size to registers and SSA values.  It
also adds rules in the validator to validate that they do the right things.

It's still an open question as to whether or not we want a bit_size in
nir_alu_instr or if we just want to let it inherit from the destination.
I'm inclined to just let it inherit from the destination.  A similar
question needs to be asked about intrinsics.

v2 (Connor):
  - Relax validation: comparisons have explicit destination sizes
    and implicit source sizes.

v3 (Sam):
- Use helpers to get size and base types of nir_alu_type enum.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Connor Abbott
3d37de930d nir/types: add a function to get the bitsize of a base type
v2: fix it for GLSL_TYPE_SUBROUTINE (Iago)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Samuel Iglesias Gonsálvez
c38a25af2f i965/nir: fix check to resolve booleans to work with sized nir_alu_type
As nir_alu_type has now embedded the data size, the check for the
instruction's output type (to see if a boolean resolve is required)
should ignore the data size part.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jason Ekstrand
78f1919429 nir: Add explicitly sized types
v2: Fix size/type mask to properly handle 8-bit types.

v3: Add helpers to get the bitsize and base type of a
nir_alu_type enum.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jordan Justen
3fd308a357 Merge remote-tracking branch 'origin/master' into vulkan 2016-03-17 01:44:07 -07:00
Jordan Justen
7d021cb15e i965/nir: Lower nir compute shader shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen
b1e7cdfdcf nir: Lower shared var atomics during nir_lower_io
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen
e3cbb9d37c nir: Add support for lowering load/stores of shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen
683c359c54 nir: Add atomic operations on variables
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen
3c807607df nir: Add compute shader shared variable storage class
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen
26f8262698 nir/print: Add space after shader_storage var mode
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Iago Toral Quiroga
5be11d2236 i965: Skip execution size adjustment for instructions of width 4
This code in brw_set_dest adjusts the execution size of any instruction
with a dst.width < 8. However, we don't want to do this with instructions
operating on doubles, since these will have a width of 4, but still
need an execution size of 8 (for SIMD8). Unfortunately, we can't just check
the size of the operands involved to detect if we are doing an operation on
doubles, because we can have instructions that do operations on double
operands interpreted as UD, operating on any of its 2 32-bit components.

Previous commits have made it so we never emit instructions with a horizontal
width of 4 that don't have the correct execution size set for gen6+, so
we can skip it in this case, avoiding the conflicts with fp64 requirements.

Expanding the same fix to other hardware generations requires many more
changes but since we are not targetting fp64 support on them
wer don't really care for now.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Samuel Iglesias Gonsalvez
22a10dd030 i965/vec4/gen6: fix exec_size for MOV with a width of 4 in generate_gs_ff_sync()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Samuel Iglesias Gonsalvez
b91b9e4b00 i965/vec4/gen6: fix exec_size for instructions with destination width of 4
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Samuel Iglesias Gonsalvez
30fc3fa24d i965/vec4/gen6: fix exec_size for instructions with width of 4 in generate_gs_svb_write()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Samuel Iglesias Gonsalvez
2fafc6b98c i965/gs/gen6: fix execsize for instructions with width of 4 in gen6_sol_program()
v2:
- Add assert (Topi).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Iago Toral Quiroga
f6342b5645 i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channel
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Iago Toral Quiroga
31a8604252 i965/eu: set execution size for SEND message in brw_send_indirect_message
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:25 +01:00
Iago Toral Quiroga
2d6af62a0f i965/fs: Set exec size for gen7 pull const loads
v2 (Topi):
  - No need to set the execsize for the indirect send message,
    the next patch will handle that.
  - Set the execution size explicitly instead of taking it from
    the width of the dst that we set before.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:24 +01:00
Iago Toral Quiroga
ea45b6e96d i965/eu: set correct execution size in brw_NOP
v2: NOP should have an execsize of 1 (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-17 08:23:24 +01:00
Kenneth Graunke
9c1e01c4a8 meta: Don't use integer handles for shaders or programs.
Previously, we gave our internal clear/blit shaders actual GL handles
and stored them in the shader/program hash table.  We used ordinary
GL API entrypoints to work with them.

We thought this shouldn't be a problem because GL doesn't allow
applications to invent their own names for shaders or programs.
GL allocates all names via glCreateShader and glCreateProgram.

However, having them in the hash table is a bit risky: if a broken
application guesses the name of our shaders or programs, it could
alter them, potentially screwing up future meta operations.

Also, test cases can observe the programs in the hash table.  Running
a single dEQP process that executes the following test list:

dEQP-GLES3.functional.negative_api.buffer.clear
dEQP-GLES3.functional.negative_api.shader.compile_shader
dEQP-GLES3.functional.negative_api.shader.delete_shader

would result in the last two tests breaking.  The compile_shader test
calls glCompileShader(9) straight away, and since it hasn't even created
any shaders or programs, it expects to get a GL_INVALID_VALUE error
because there's no such name.  However, because the clear test ran
first, it created Meta programs, so an object named "9" did exist.

This patch reworks Meta to work with gl_shader and gl_shader_program
pointers directly.  These internal programs have bogus names, and are
never stored in the hash tables, so they're invisible to applications.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94485
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-16 23:57:11 -07:00
Kenneth Graunke
0fe254168b mesa: Expose compile_shader() and link_program() beyond the file.
This will allow me to use them directly from Meta, bypassing the
versions that work with GL integer handles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-16 23:57:11 -07:00
Kenneth Graunke
7753657cf2 mesa: Make link_program() take a gl_shader_program, not a GLuint.
In half the callers, we already have a pointer, and don't need
to look it up again.  This will also help with upcoming meta work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-16 23:57:11 -07:00
Kenneth Graunke
a461e0003f mesa: Make compile_shader() take a gl_shader, not a GLuint.
In half the callers, we already have a pointer, and don't need
to look it up again.  This will also help with upcoming meta work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-16 23:57:11 -07:00
Kenneth Graunke
a7e9b31d5b meta: Use the _mesa_meta_compile_and_link_program helper more places.
Less boilerplate.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-16 23:57:11 -07:00
Eric Anholt
2b9f0dffe0 vc4: Move discard handling to the condition flag.
Now that the field exists in the instruction, we can make discards less
special.  As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.

This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup.  Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.

total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs:     1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs:     8200 -> 8211 (0.13%)
2016-03-16 11:28:47 -07:00
Eric Anholt
7c9fc43915 vc4: Don't make a temporary for setting flags.
The register allocator doesn't really do anything about the temp, so it
doesn't seem like it should matter.  However, the scheduler would think
that a new def is being created.

This doesn't change anything yet, but it avoids a bunch of regressions in
the next commit.
2016-03-16 11:28:34 -07:00
Eric Anholt
b4f45f319c vc4: Add a safety check for setting flags.
If a pack was on the src reg, should it be a float, int, or mul unpack?
Just complain, instead.
2016-03-16 11:28:34 -07:00
Eric Anholt
a298fb15af vc4: Reuse list_for_each_entry_safe_rev().
This didn't exist when I wrote the code.
2016-03-16 11:28:34 -07:00
Nanley Chery
5464f0c046 anv/blit: Reduce number of VUE headers being read
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:57:23 -07:00
Nanley Chery
f33866ae0a anv/blit: Remove completed finishme for VkFilter
This task was finished as of:
d9079648d0.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:57:19 -07:00
Nanley Chery
5647de8ba5 anv/blit2d: Only use one extent in meta_emit_blit2d
Since scaling isn't involved, we don't need multiple extents.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:57:14 -07:00
Nanley Chery
92fb65f117 anv/blit2d: Remove sampler from pipeline
Since we're using texelFetch with a sampled image, a sampler is no
longer needed. This agrees with the Vulkan Spec section 13.2.4
Descriptor Set Updates:

sampler is a sampler handle, and is used in descriptor updates for types
VK_DESCRIPTOR_TYPE_SAMPLER and VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
if the binding being updated does not use immutable samplers.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:57:00 -07:00
Nanley Chery
f8f9886915 anv/blit2d: Use texel fetch in frag shader
The texelFetch operation requires that the sampled texture coordinates
be unnormalized integers. This will simplify the copy shader for
w-tiled images (stencil buffers).

v2 (Jason):
   Use f2i for texel coords
   Fix num_components indirectly
   Use float inputs for interpolation
   Nest tex_pos functions

Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:56:51 -07:00
Nanley Chery
b487acc489 Revert "anv/meta: Make meta_emit_blit() public"
This reverts commit f391683922.

Some conflicts had to be resolved in order for this revert to be
successful.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:56:46 -07:00
Nanley Chery
1a0c63b880 Revert "anv/meta: Prefix anv_ to meta_emit_blit()"
This reverts commit 514c055717.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:56:41 -07:00
Nanley Chery
997a873f0c anv/blit2d: Customize meta blit structs and functions for blit2d API
* Add fields in meta struct
* Add support in meta init/teardown
* Switch to custom meta_emit_blit2d()

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:56:22 -07:00
Nanley Chery
2d8c632117 anv/blit2d: Copy anv_meta_blit.c functions
These will be customized for blit2d operations.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-16 10:56:10 -07:00
Kenneth Graunke
b566317e7e meta: Use ARB_explicit_attrib_location in the rest of the meta shaders.
This is cleaner than using glBindAttribLocation().

Not all drivers support the extension, but I don't think those drivers
use GLSL in the first place.  Apparently some Meta shaders already use
GL_ARB_explicit_attrib_location, so I think it should be okay.

Honestly, I'm not sure how the old code worked anyway - we bound the
attribute location for "texcoords", while all the shaders capitalized
or spelled it differently.

v2: Convert another instance in brw_meta_fast_clear.c.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-16 00:09:56 -07:00
Plamena Manolova
9d9965c06f mesa: Ignore glPointSize when GL_POINT_SIZE_ARRAY_OES is enabled
When a user defines a point size array and enables it, the point
size value set via glPointSize should be ignored. To achieve this,
we can simply toggle ctx->VertexProgram.PointSizeEnabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42187
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-15 15:49:48 -07:00
Jason Ekstrand
abaa3bed22 anv/device: Flush the fence batch rather than the start of the BO 2016-03-15 15:24:24 -07:00
Jason Ekstrand
7f6a0cb29c Merge remote-tracking branch 'public/master' into vulkan 2016-03-15 14:09:50 -07:00
Varad Gautam
e103b52aec vc4: Coalesce instructions using VPM reads into the VPM read.
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.

shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs:     5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs:     5345 -> 4993 (-6.59%)

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
2016-03-15 13:09:24 -07:00
Varad Gautam
00bdbb22a9 vc4: rename file to group vpm optimizations together
This file will contain optimization passes for both vpm reads
and writes.

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-03-15 12:49:37 -07:00
Eric Anholt
1c4b077409 vc4: Fix failures with nir_extract_* since the addition of the opcodes. 2016-03-15 12:49:37 -07:00
Roland Scheidegger
bb2c5e657b llvmpipe: fix lp_rast_plane alignment on 32bit
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

CC: <mesa-stable@lists.freedesktop.org>
2016-03-15 19:42:15 +01:00
Roland Scheidegger
12a4f0bed6 draw: fix line stippling
The logic was comparing actual ints, not true/false values.
This meant that it was emitting always multiple line segments instead of just
one even if the stipple test had the same result, which looks inefficient, and
the segments also overlapped thus breaking line aa as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

CC: <mesa-stable@lists.freedesktop.org>
2016-03-15 19:41:34 +01:00
Roland Scheidegger
4b249ed4cd softpipe: fix misleading TGSI_QUAD_SIZE usage
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.

While here, also add comment about not quite correct filtering.

There's no actual code difference.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-15 19:37:59 +01:00
Roland Scheidegger
9e9d69979c softpipe: fix anisotropic filtering crash
The filt_args->offset wasn't assigned but was always used later leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
if offsets are enabled there the array is always accessed).

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>

CC: <mesa-stable@lists.freedesktop.org>
2016-03-15 16:40:05 +01:00
Nicolai Hähnle
4de25fa7b0 radeonsi: set DEPTH_BEFORE_SHADER based on FS_EARLY_DEPTH_STENCIL
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:59 -05:00
Nicolai Hähnle
0ffcc318e6 tgsi: add tgsi_full_src_register_from_dst helper function
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:49 -05:00
Nicolai Hähnle
c02d73af0b gallium/u_inlines: add util_copy_image_view
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:46 -05:00
Nicolai Hähnle
f6dc4f5558 st/mesa: set image access flags in st_bind_images
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:43 -05:00
Nicolai Hähnle
71a1b54b33 gallium: add access field to pipe_image_view
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:40 -05:00
Nicolai Hähnle
8c497b8fb5 st/glsl_to_tgsi: set FS_EARLY_DEPTH_STENCIL when required
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:37 -05:00
Nicolai Hähnle
e526f930aa tgsi: add TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:33 -05:00
Nicolai Hähnle
1c0cee8764 st/glsl_to_tgsi: set memory access type on image intrinsics
This is required to preserve the image variable's coherent/restrict/volatile
qualifiers in TGSI.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:30 -05:00
Nicolai Hähnle
dfcf420412 st/glsl_to_tgsi: provide Texture and Format information for image ops
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:26 -05:00
Nicolai Hähnle
3243b6fc97 tgsi: add Texture and Format to tgsi_instruction_memory
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-14 17:24:02 -05:00
Nicolai Hähnle
9b68bdf6f8 get: reconcile aliasing enums for MaxCombinedShaderOutputResources
The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and
MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only
appear once.

Noticed while implementing ARB_shader_image_load_store without previously
implementing SSBO.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-14 17:19:14 -05:00
Francisco Jerez
b054605722 i965/fs: Restrict inequality that can only hold equal in saturate propagation.
Should have no functional change.  The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:58:19 -07:00
Francisco Jerez
7d7990cf65 i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:58:11 -07:00
Francisco Jerez
93be4158ae i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:57:58 -07:00
Francisco Jerez
6691c03fd3 i965/fs: Add missing analysis invalidation in opt_sampler_eot().
Bug found by the liveness analysis validation pass that will be
introduced in a later commit.  opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:56:02 -07:00
Hans de Goede
4d02e91e49 clover: Fix pipe_grid_info.indirect not being initialized.
After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.

This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-14 14:12:42 -07:00
Sarah Sharp
af06190760 mesa: docs: Intel i965 hardware limits.
This should help the next person working on hardware enabling figure out
where in the Intel PRMs to find the magic platform hardware values.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2016-03-14 14:00:29 -07:00
Sarah Sharp
0f5bfc7f01 mesa: docs: i965: Use correct doxygen groupings syntax
When reading the source code, it's useful to indicate that a group of
fields in a struct are related in someway. There were several places
where people tried to group related structure members with the {@
syntax, without realizing they also needed to add the \name syntax in
order to generate correct doxygen html.

There are several files with groupings that look like this:

struct foo {
    /**
     * Related fields description
     * @{
     */
    int bar;
    char baz;
    /** @} */
    long qux;
}

However, the doxygen syntax for grouping is:

struct foo {
    /**
     * \name Related fields description
     * @{
     */
    int bar;
    char baz;
    /** @} */
    long qux;
}

https://www.stack.nl/~dimitri/doxygen/manual/grouping.html

Without the group name definition, the fields don't get properly
grouped. Instead, the group description is applied to the first field.

Fix the Intel hardware information structure, brw_device_info to
properly group the GPU hardware limitations and hardware quirks fields.

Once you've run `cd doxygen; make clean; make all`,
updated documentation can be found at

mesa/doxygen/i965/structbrw__device__info.html

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2016-03-14 14:00:29 -07:00
Bruce Cherniak
e9d68cc3da gallium/swr: Resource management
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2016-03-14 14:07:48 -05:00
Marek Olšák
7a2333e4ef configure.ac: require libdrm 2.4.66 for drmGetDevice
since 737b6ed13e
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c no longer compiles:
error: unknown type name ‘drmDevicePtr’
2016-03-14 16:42:41 +01:00
Francisco Jerez
63250d8178 i965: Remove useless IR self-destruct backend_shader method.
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid.  Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- We weren't calling it at all for a good
reason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-13 18:07:53 -07:00
Pierre Moreau
8c7acd87af nv50,nvc0: Set only NEW_CP_GLOBALS upon binding
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 22:34:50 +01:00
Rob Clark
e73ac84b93 freedreno/ir3: lower extract_byte/word
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:

commit 905ff86198
Author:     Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit:     Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800

    nir: Recognize open-coded extract_u16.

commit 76289fbfa8
Author:     Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit:     Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800

    nir: Recognize open-coded extract_u8.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 14:10:57 -04:00
Ilia Mirkin
c1e4a6bfbf nv50,nvc0: handle SQRT lowering inside the driver
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.

Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).

This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
Ilia Mirkin
b3e7fb5234 nv50/ir: avoid folding mul + add if the mul has a dnz
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
Ilia Mirkin
a651bc027d nvc0: fix blit triangle size to fully cover FB's > 8192x8192
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.

This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-13 13:17:24 -04:00
Rob Clark
01b071d530 freedreno: OUT_RELOC vs OUT_RELOCW fixes
Make sure we use OUT_RELOCW() in cases where the buffer is written to.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
f68c6951b8 freedreno/a4xx: hw binning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
b3fe196e21 freedreno/a4xx: use generated headers for draw initiator
No need to open-code this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
2224ba5976 freedreno/a4xx: remove RB_RENDER_CONTROL patching
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one.  It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
8824a765a2 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
476551a21f freedreno/a3xx: move where we deal w/ binning FS
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
dd9135c452 freedreno/a4xx: move where we deal w/ binning FS
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
09b3447344 freedreno/a3xx: constify the shader variants
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
5b955f09f7 freedreno/a4xx: constify the shader variants
Most of the driver just needs read-only access, so constify..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Rob Clark
d9395e4ed8 freedreno/a3xx: remove duplicate mark of end of binning cmds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Nicolai Hähnle
28d2a7e67c radeonsi: avoid crash when a sampler state is bound for a buffer texture
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Tested-by: Shawn Starr <shawn.starr@rogers.com>
2016-03-13 09:37:23 -05:00
Matt Turner
61b10b4eb7 i965: Use foreach_in_list_reverse_safe() macro.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-12 19:23:50 -08:00
Jason Ekstrand
98d58e7320 nir/clone: Add support for cloning a single function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
036b209484 nir/validate: Better function validation
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
f86f3c90aa nir/print: Better function argument printing
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration.  This changes
nir_print to print the type along with each parameter or return variable.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
13969565f9 nir/print: Factor variable name lookup into a helper
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
e4bebe8a02 nir: Create function parameters in function_impl_create
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
066d3c115e nir: Add a helper for creating a "bare" nir_function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
2ef4754a20 nir: Add a new "param" variable mode for parameters and return variables
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand
41ae553fda nir/glsl: Remove dead function parameter handling code
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning.  Let's just get rid
of it for now.  We can always put it back in if we decide to use NIR for
function inlining at some point in the future.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jordan Justen
b83785d86d anv/gen7: Add stall and flushes before switching pipelines
This is a port of 18c76551ee from OpenGL
to Vulkan.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 13:13:37 -08:00
Jordan Justen
c8ec65a1f5 anv: Add flush_pipeline_before_pipeline_select
flush_pipeline_before_pipeline_select adds workarounds required before
switching the pipeline.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 13:13:37 -08:00
Jordan Justen
1b126305de anv/genX: Add flush_pipeline_select_gpgpu
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 12:43:46 -08:00
Jason Ekstrand
41af9b2e51 HACK: Don't re-configure L3$ in render stages pre-BDW
This fixes a "regression" on Haswell and prior caused by merging the gen7
and gen8 flush_state functions.  Haswell should still work just fine if
you're on a 4.4 kernel, but we really should make it detect the command
parser version and do something intelligent.
2016-03-12 08:57:16 -08:00
Boyuan Zhang
6cf120ec77 st/va: add HEVC main 10 profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang
06c862d67d radeon/video: enable HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang
8be9efcce7 radeon/uvd: handle HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Jason Ekstrand
753ebe4457 anv/x11: Reset the SHM fence before presenting the pixmap
This seems to fix the flicker issue that I was seeing with dota2
2016-03-11 17:22:46 -08:00
Kristian Høgsberg Kristensen
9bff5266be anv/x11: Add present support
The old DRI3 implementation just used CopyArea instead of present.  We
still don't support all the MST fancyness, but it should at least avoid
some copies and allow for.

v2 (Jason Ekstrand):
   - Better object cleanup and destruction
   - Handle the CONFIGURE_NOTIFY event and return OUT_OF_DATE when needed
   - Track dirtyness via IDLE_NOTIFY rather than interating through the
     images sequentially
2016-03-11 16:54:17 -08:00
Jason Ekstrand
e920b184e9 anv/x11: Split image creation into a helper function
This lets us clean up error handling and make it correct.
2016-03-11 12:28:34 -08:00
Jason Ekstrand
41a147904a anv/wsi: Throttle rendering to no more than 2 frames ahead
Right now, Vulkan apps can pretty easily DOS the GPU by simply submitting a
lot of batches.  This commit makes us wait until the rendering for earlier
frames is comlete before continuing.  By waiting 2 frames out, we can still
keep the pipe reasonably full but without taking the entire system down.
This is similar to what the GL driver does today.
2016-03-11 11:31:13 -08:00
Jason Ekstrand
132f079a8c anv/gem: Use C99-style struct initializers for DRM structs
This is more consistent with the way the rest of the driver works and
ensures that all structs we pass into the kernel are zero'd out except for
the fields we actually want to fill.  We were previously doing then when
building with valgrind to keep valgrind from complaining.  However, we need
to start doing this unconditionally as recent kernels have been getting
touchier about this.  In particular, as of kernel commit b31e51360e88 from
Chris Wilson, context creation and destroy fail if the padding bits are not
set to 0.
2016-03-11 11:31:03 -08:00
Ben Widawsky
d1ab544bb8 i965/chv: Display proper branding
"Braswell" is a Cherryview based *thing*. It unfortunately requires extra
information to determine its marketing name. Unlike all previous products, and
hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to
brand string.

I put up a fight about adding any complexity to our GL renderer string code for
a very long time. However, a wise man made a comment to me that I couldn't argue
with: if a user installs Windows on their hardware, the brand string should be
the same as what we display in Linux. The Windows driver apparently does this
check, so we should too.

Note that I did manage to find a good use for this info anyway in the compute
shader thread counts.

v2: memcpy instead of strncpy, and some minor changes (Matt)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
2016-03-11 11:17:28 -08:00
Ben Widawsky
5e6a43a001 i965/chv: Update lower min for CS threads
We have better information now, and 28 was not a valid thing to support. 6 EUs
per sublice with 7 threads per EU is the minimum supported config.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
2016-03-11 11:17:28 -08:00
Ben Widawsky
3dc3dbc8d8 i965/chv: Check that compute threads are above threshold
The way we are organizing this code, the statically configured max_cs_threads
should always be the minimum value we actually support (ie. are aware of). As a
result, we can fall back to that if we get invalid numbers from the kernel (ie.
when the query succeeds, but the result is lower than expected).

I was originally planning to use an assert, but there is no reason to be so
mean.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
2016-03-11 11:17:28 -08:00
Ben Widawsky
9dd20b715a i965/chv: Use kernel provided info for max_cs_threads
With the previous patches, the code can find out the actual number of available
compute threads. It is enabled only for Cherryview since that is the only
platform I know for a fact has shipped devices which can benefit from this.  It
seems like other platforms /might/ benefit from this because of fused
configurations which /might/ have shipped. Fallback code is still there.

v2: Some minor adjustments from Matt

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
2016-03-11 11:17:28 -08:00
Ben Widawsky
38eb606884 i965: Query and store GPU properties from kernel
Certain products are not uniquely identifiable based on device id alone. The
kernel exports an interface to help deal with this. This patch merely introduces
the consumer of the interface and makes sure nothing breaks.

It is also possible to use these values for programming GPGPU mode, and I plan
to do that as well.

The interface was introduced in libdrm 2.4.60, which is already required, so it
should all be fine.

v2: Some minor changes recommended by Matt

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-11 11:17:28 -08:00
Nicolai Hähnle
9908b13af6 st/mesa: check that the image unit is valid in st_bind_images
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-11 11:53:40 -05:00
Bas Nieuwenhuizen
417b6721a0 radeonsi: Lazily re-set sampler views after disabling DCC
Clear DCC flags if necessary when binding a new sampler view.

v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-11 11:51:15 -05:00
Marek Olšák
af3454cad5 st/mesa: remove ST_NEW_MESA flag (v2)
Only used indirectly when checking dirty.st != 0

v2: also update st_cb_compute.c

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-11 16:07:18 +01:00
Nicolai Hähnle
e502801d98 r600g: clear compressed_depthtex/colortex_mask when binding buffer texture
Found by inspection of the source based on a bisected bug report.

This bug has been in the code for a long time, but the more recent PBO upload
feature exposed it because it leads to more uses of buffer textures.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94388
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-11 08:00:15 -05:00
Ilia Mirkin
f8ea98e4ec st/mesa: add GL_ARB_shader_atomic_counter_ops support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-10 22:36:17 -05:00
Ilia Mirkin
075a5742bf mesa: add GL_ARB_shader_atomic_counter_ops support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-10 22:34:46 -05:00
Ilia Mirkin
a8819fb1ff nvc0: add support for TGSI FMA ops
This will allow the nouveau backend to not try and split up ops that are
fused in GLSL.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-10 22:34:28 -05:00
Nicolai Hähnle
59c5508b9a radeonsi: update compressed_colortex_masks when a cmask is created or disabled
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:52 -05:00
Nicolai Hähnle
da68a9b215 radeonsi: move si_decompress_textures to si_blit.c
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:49 -05:00
Nicolai Hähnle
f03c9e5692 r600g: update compressed_colortex_masks when a cmask is created or disabled
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:46 -05:00
Nicolai Hähnle
784269aa40 gallium/radeon: notify all contexts when cmasks are enabled/disabled
There is an annoying corner case that I stumbled across while looking into
piglit's arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
(which can be easily adapted to demonstrate the bug without the
ARB_shader_image_load_store extension)

When we bind a texture and then clear it using glClear (by attaching it
to the current framebuffer) for the first time, we allocate a separate
cmask for the texture to do fast clear, but the corresponding bit in
compressed_colortex_mask is not set. Subsequent rendering will use
incorrect data.

Conversely, when a currently bound texture with an existing cmask is
exported leading to that cmask being disabled, the compressed_colortex_mask
bit will remain set, leading to an assertion later on in debug builds.

Since iterating through all contexts and/or remembering where every
texture is bound would be costly, and cmask enable/disable should be
rare, we will maintain a global counter to signal contexts that they
must update their compressed_colortex_masks.

This patch introduces the global counter, and subsequent patches will
do the mask update.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:00 -05:00
Kenneth Graunke
9ea00c6f6b i965: Set a proper _BaseFormat for window system renderbuffers in ES.
intel_alloc_private_renderbuffer_storage did:

   rb->_BaseFormat = _mesa_base_fbo_format(ctx, internalFormat);

Unfortunately, internalFormat was usually an unsized format (such as
GL_DEPTH_COMPONENT).  In OpenGL ES, _mesa_base_fbo_format() refuses to
accept unsized formats, and returns 0 rather than a real base format.

This meant that we ended up with a completely bogus rb->_BaseFormat for
window system buffers on OpenGL ES.  All other renderbuffer allocation
functions in intel_fbo.c instead use the mesa_format, and do:

   rb->_BaseFormat = _mesa_get_format_base_format(...);

We can do likewise, using rb->Format.  This appears to work just fine.

dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial
failed, as it tried to perform a GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE query
on the window system depth buffer.  That query relies on a proper
rb->_BaseFormat being set, so it broke because rb->_BaseFormat was 0 due
to the above bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94458
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-10 11:23:52 -08:00
Kenneth Graunke
e032e4ad5a glcpp: Fix locations when encounting "#<NEWLINE>".
We were failing to reset our location tracking when encountering a
NEWLINE in the <HASH> state.  Rip the code from the <*>{NEWLINE} rule,
which handles this properly.

Also, update 146-version-first-hash.c to have proper expectations.
When I introduced the test, I didn't verify that the line/column
numbers were correct, and it turns out they varied based on the type
of newline ending.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94447
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-10 11:23:26 -08:00
Jason Ekstrand
1f3d582cba isl/surface_state: Set the clear color 2016-03-10 10:41:52 -08:00
Jason Ekstrand
8c819b8c2b genxml/gen75: Add the clear color bits to RENDER_SURFACE_STATE 2016-03-10 10:41:52 -08:00
Jason Ekstrand
6f47ed28b4 isl: Add more helpers for determining if a format is an integer format 2016-03-10 10:41:52 -08:00
Jason Ekstrand
b0e423cc4f isl: Remove redundant check
The green channel was checked twice.
2016-03-10 10:41:52 -08:00
Tim Rowley
84f857bef7 gallium/swr: remove use of BYTE from swr driver
Remove use of a win32-style type leaked from the swr rasterizer.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-03-10 11:20:58 -06:00
Samuel Pitoiset
dad3e5f4ef nvc0: expose SM35 perf counters to AMD_performance_monitor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:40 +01:00
Samuel Pitoiset
0e511400de nvc0: add driver metrics for SM35 (GK110)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:38 +01:00
Samuel Pitoiset
bf840aa523 nvc0: add MP performance counters for SM35 (GK110)
Because compute support is not enabled by default for these chipsets,
NVF0_COMPUTE=1 needs to be used, along with GALLIUM_HUD to enable
performance counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:35 +01:00
Samuel Pitoiset
f289e99dee nvc0: explode config of Kepler hardware SM events
This is really verbose but most of the configuration will be reused
for SM35 (GK110).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:32 +01:00
Samuel Pitoiset
a0ce8536b3 nvc0: rework the driver metrics infrastructure
This follows the same design as MP perf counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:29 +01:00
Samuel Pitoiset
41fb87249a nvc0: rework the MP counters infrastructure
This mainly improves how we define the different list of queries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:26 +01:00
Marek Olšák
7b29188a3f egl: clean up typedef madness in the backend API
let's use the dd.h format

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-10 18:03:14 +01:00
Iago Toral Quiroga
3e3de9ec0a glsl: report correct number of allowed vertex inputs and fragment outputs
Before we would always report 16 for both and we would only fail if either
one exceeded 16. Now we fail if the maximum for each is exceeded, even if
it is smaller than 16 and we report the correct maximum.

Also, expand the size of to_assign[] to 32. There is code at the top
of the function handling max_index up to 32, so this just makes the
code more consistent.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-10 08:48:53 +01:00
Vinson Lee
d46feee697 nouveau: Fix clang reserved-user-defined-literal error.
CXX      codegen/nv50_ir.lo
In file included from codegen/nv50_ir.cpp:28:
./nouveau_debug.h:19:30: error: invalid suffix on literal; C++11 requires a space between literal and identifier
      [-Wreserved-user-defined-literal]
   fprintf(stderr, "%s:%d - "fmt, __FUNCTION__, __LINE__, ##args)
                             ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-09 23:00:45 -08:00
Kenneth Graunke
3823b53ff8 mesa: Make glGetInteger64v convert float/doubles to 32-bit integers.
According to the GL 4.4 core specification, section 2.2.2 ("Data
Conversions For State Query Commands"):

"If a command returning integer data is called, such as GetIntegerv or
 GetInteger64v, a boolean value of TRUE or FALSE is interpreted as one
 or zero, respectively. A floating-point value is rounded to the nearest
 integer, unless the value is an RGBA color component, a DepthRange
 value, or a depth buffer clear value. In these cases, the query command
 converts the floating-point value to an integer according to the INT
 entry of table 18.2; a value not in [−1, 1] converts to an undefined
 value."

The INT entry of table 18.2 shows that b = 32, meaning the expectation
is to convert it to a 32-bit integer value.

Fixes:
dEQP-GLES3.functional.state_query.floats.blend_color_getinteger64
dEQP-GLES3.functional.state_query.floats.color_clear_value_getinteger64
dEQP-GLES3.functional.state_query.floats.depth_clear_value_getinteger64

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94456
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-09 19:44:18 -08:00
Nanley Chery
7fbbad0170 anv/blit2d: Use the tiling enum for simplicity
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Nanley Chery
514c055717 anv/meta: Prefix anv_ to meta_emit_blit()
Follow the convention for non-static functions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Nanley Chery
627728cce5 anv/meta: Split anv_meta_blit.c into three files
The new organization is as follows:
* anv_meta_blit.c: Blit and state setup/teardown commands
* anv_meta_copy.c: Copy and update commands
* anv_meta_blit2d.c: 2D Blitter API commands

Also, change the formatting to contain most lines
within 80 columns.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Nanley Chery
f391683922 anv/meta: Make meta_emit_blit() public
This can be reverted if the only other consumer, anv_meta_blit2d(),
uses a different method.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Nanley Chery
ddbc645846 anv/meta: Store src and dst usage flags in a variable
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Nanley Chery
7ebbc3946a anv/meta: Minimize height of images used for copies
In addition to demystifying the value being added to the height,
this future-proofs the code for new tiling modes and keeps the
image height as small as possible.

v2: Actually use the smallest height possible.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-09 10:57:47 -08:00
Emil Velikov
3dc2630e45 gallium/radeon: use explicit drm_major, drm_minor check
Just like everywhere else in the radeon codebase.

v2: Don't forget about drm_major == 3 (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 17:25:22 +00:00
Emil Velikov
b9c5c4af6d egl/x11: check the return value of xcb_dri2_get_buffers_reply()
... before using it. The function can return NULL, which we should check
prior to refererencing it in the next function(s).

Cc: Fabian Vogt <fvogt@suse.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93667
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-09 17:25:22 +00:00
Emil Velikov
373f118c6c gallium: do not wrap header inclusion in
Add one missing extern C guard within include/pipe/p_video_enums.h, and
remove the wrapping throughout gallium.

On Haiku one could even use the gallium debug_printf() although
that's another topic.

v2: Leave dbghelp.h as is (Jose)

Cc: Jose Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-09 17:21:39 +00:00
Dieter Nützel
69d389c52f opencl: fix .gitignore for .install-gallium-links
Fixes: 0b6157e971 "install-gallium-links: port changes from install-lib-links"

v2: move this to the top level .gitignore and added Fixes:
    like Emil Velikov <emil.l.velikov@gmail.com> suggested

Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:52 +00:00
Emil Velikov
f3e23ead53 egl: remove remnants of MESA_drm_display
Last set in st/egl, unused in mesa-demos and superseded by
EGL_KHR_platform_gbm.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
2295a4b1e1 egl: remove final pieces of KHR_vg_parent_image
Similar to previous commit - unused/unset for a long time.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
c85544a10c glapi: remove the final function offset tags
A commit earlier this year reworked out python scripts to use a separate
file for these. Followed by removing support from the parser, and
removing all of the offset tags.

Seems like we either missed a few, or people added them by mistake.
Either way let's nuke the ones that are still around.

Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
3ffab9a89c winsys/amdgpu/addrlib: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
a07192bd63 mesa/main: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
5351dc1522 i915: limit extern "C" hack only for libdrm headers
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:51 +00:00
Emil Velikov
cf215d92f6 xmesa: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:50 +00:00
Emil Velikov
2af3a0ca6f util/sha: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:50 +00:00
Emil Velikov
d426c17550 egl/wayland: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:50 +00:00
Emil Velikov
750da80b34 gbm: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-09 17:16:50 +00:00
Nicolai Hähnle
9f06e7f5c1 st/mesa: shader image atoms must be before framebuffer update
The reason is that the shader image atoms call st_finalize_texture, which
may set ST_NEW_FRAMEBUFFER.

This fixes an assertion triggered by a subtest of piglit's
arb_shader_image_load_store-invalid.

v2: add comment explaining order constraints (suggested by Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 11:40:06 -05:00
Nicolai Hähnle
4eb416bd9d gallivm: special case TGSI_OPCODE_STORE
This instruction has the resource (buffer or image) as a destination to
represent the writemask for SSBO writes. However, this is obviously not
a "real" destination for the purpose of emitting LLVM IR.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 11:39:55 -05:00
Nicolai Hähnle
10b2b584ee tgsi: set correct output mode for RESQ
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 11:39:43 -05:00
Marek Olšák
dcb2b77823 gallium: add CAPs returning PCI device location
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-09 15:02:28 +01:00
Marek Olšák
737b6ed13e winsys/amdgpu: get PCI info
This will be queried by the OpenCL stack using an interop call.

I have tested that the values match lspci.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:28 +01:00
Marek Olšák
ec74deeb24 radeonsi: set amdgpu metadata before exporting a texture
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:28 +01:00
Nicolai Hähnle
ff7e9412be radeonsi: extract the texture descriptor computation into its own function
This will allow this code to be re-used for shader images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Nicolai Hähnle
1197c69bdd radeonsi: extract the buffer descriptor computation into its own function
This will allow it to be re-used for shader image descriptors.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Nicolai Hähnle
2bf8ee34b8 radeonsi: remove resource field from si_sampler_view
view->resource is redundant with view->base.texture, so get rid of it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
2dec5e09e1 radeonsi: accept pipe_resource in si_sampler_view_add_buffer
and rename .._buffers -> .._buffer

Based loosely on Nicolai's patch. This will make it easier to cherry-pick
Nicolai's patches from his image support branch.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
f18fc70d6f radeonsi: disable DCC on handle export if expecting write access
This should be okay except that sampler views and images are not re-set.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Bas Nieuwenhuizen
1e48ec7571 radeonsi: add DCC decompression (v2)
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.

v2: Marek: rebase, remove decompression from si_flush_resource (not needed)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
b744ac9f44 radeonsi: allocate DCC in the same backing buffer as the texture
To allow sharing textures with DCC enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
60c08aa90b gallium/radeon: disable CMASK on handle export if sharing doesn't allow it (v2)
v2: remove the list of all contexts

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
970b979da1 gallium/radeon: eliminate fast color clear before sharing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
abac6bf67a gallium/radeon: don't use fast color clear if sharing doesn't allow it
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
d4e847ea33 gallium/radeon: disallow handle export for MSAA & depth textures
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
d95f593758 gallium/radeon: remember that texture_from_handle was called and its flags
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
c034d3dde0 gallium/radeon: check that handle usage doesn't change for a resource
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
6b187bbd9f gallium/radeon: disallow reallocation of shared buffers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
ecbd3aba17 gallium/radeon: if we can't discard a whole resource, discard the range instead
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
afdaffcbdb gallium/radeon: buffer valid range tracking only works with unshared buffers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
be73d35829 gallium/radeon: don't set texture metadata for buffers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
f914779c75 gallium/radeon: set texture metadata only once
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
69d8b75114 gallium/radeon: clean up r600_texture_get_handle
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
e3cee38e13 gallium/radeon: move code initializing texture metadata to its own function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
f4aa3256ef winsys/amdgpu: allow drivers to set/get opaque metadata
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
bd1feb2827 gallium/radeon: rename winsys buffer_get/set_tiling to buffer_get/set_metadata
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:26 +01:00
Marek Olšák
6011d7cf25 gallium/radeon: remove rcs parameter from radeon_winsys::buffer_set_tiling
This was needed for DRM < 2.12.0 where the kernel was rewriting tiling flags
in IBs.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:25 +01:00
Marek Olšák
260ef9c9be gallium/radeon: use a structure for passing tiling flags from/to winsys
and call it radeon_bo_metadata

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:25 +01:00
Marek Olšák
82db518f15 gallium: add external usage flags to resource_from(get)_handle (v2)
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.

v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER

Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-03-09 15:02:25 +01:00
Axel Davy
d943ac432d dri: add backbuffer use flag
This will be used by the next commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-09 15:02:25 +01:00
Timothy Arceri
2188c77a0e glsl: dont allow undefined array sizes in ES
This applies the rule to empty declarations.

Fixes:
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_vertex
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_fragment

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-09 20:30:42 +11:00
Jason Ekstrand
248ab61740 anv/cmd_buffer: Pull the core of flush_state into genX_cmd_buffer 2016-03-08 17:10:05 -08:00
Jason Ekstrand
28cbc45b3c anv/cmd_buffer: Split flush_state into two functions 2016-03-08 16:54:07 -08:00
Jason Ekstrand
42b4c0fa6e anv: Pull all of the genX_foo functions into anv_genX.h
This way we only have to declare them each once and we get it for all gens
at a single go.
2016-03-08 16:49:08 -08:00
Tamil velan
353a4f844f radeon/uvd: increase max height to 4096 for VI and newer
With this issue 'mpv --hwdec=vdpau --vo=vdpau <stream>' fails
for vdpau decode if the stream height is 4096. Vdpau decode of
height upto 4096 is necessary usecase on amdgpu driver for VI
and newer platforms.

The fix is in driver specific implementation of "Decoder
Query Capabilities" API to return 4096 for VI and newer
platforms. With this fix vdpauinfo reports height support as
4096 and mpv for vdpau decode works fine for 4096 height streams.

Signed-off-by: Tamil velan <Tamil-Velan.Jayakumar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-08 19:01:19 -05:00
Bas Nieuwenhuizen
6373845d98 winsys/amdgpu: enlarge buffer_indices_hashlist
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.

The game I tested had CS's with up to 1500 buffers and the overhead
of amdgpu_lookup_buffer for various sizes was:

4096 1.97% (new value)
2048 4.37%
1024 6.92%
512  9.47% (old value)

(percentage of CPU usage in render thread as determined by perf)

The time spent in amdgpu_add_buffer self is ~4.2% in all cases and
for 4096 the time needed to clear the hashlist is still < 0.10%,
so I am not expecting significant regressions.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 00:52:07 +01:00
Jason Ekstrand
bbbdd32c19 anv/meta_clear: Use repclear again 2016-03-08 15:40:11 -08:00
Jason Ekstrand
dc504a51fb anv/pipeline: Unconditionally emit PS_BLEND on gen8+
Special-casing the PS_BLEND packet wasn't really gaining us anything.  It's
defined to be more-or-less the contents of blend state entry 0 only without
the indirection.  We can just copy-and-paste the contents.  If there are no
valid color targets, then blend state 0 will be 0-initialized anyway so
it's basically the same as the special case we had before.
2016-03-08 15:40:11 -08:00
Jason Ekstrand
cce65471b8 anv: Compact render targets
Previously, we would always emit all of the render targets in the subpass.
This commit changes it so that we compact render targets just like we do
with other resources.  Render targets are represented in the surface map by
using a descriptor set index of UINT16_MAX.
2016-03-08 15:40:11 -08:00
Samuel Pitoiset
32e848b016 nvc0: add a new validation path for compute
This makes use of the new state validation interface to be consistent
with 3d.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-09 00:19:21 +01:00
Samuel Pitoiset
db9b41d302 nvc0: rework the validation path for 3D
This exposes an interface for state validation that will be also used
to rework the compute validation path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-09 00:19:16 +01:00
Jordan Justen
a100a57e30 i965/hsw: Initialize SLM index in state register
For Haswell, we need to initialize the SLM index in the state
register. This can be copied out of the CS header dword 0.

v2:
 * Use UW move to avoid changing upper 16-bits of sr0.1 (mattst88)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94081
Fixes: piglit arb_compute_shader/execution/shared-atomics.shader_test
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-08 14:27:18 -08:00
Jordan Justen
d8347f12ea i965/compute: Skip SIMD8 generation if it can't be used
If the local workgroup size is sufficiently large, then the SIMD8
program can't be used. In this case we can skip generating the SIMD8
program. For complex programs this can save a significant amount of
time.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-08 14:27:18 -08:00
Jordan Justen
e1d54b1ba5 i965/fs: Allow spilling for SIMD16 compute shaders
For fragment shaders, we can always use a SIMD8 program. Therefore, if
we detect spilling with a SIMD16 program, then it is better to skip
generating a SIMD16 program to only rely on a SIMD8 program.

Unfortunately, this doesn't work for compute shaders. For a compute
shader, we may be required to use SIMD16 if the local workgroup size
is bigger than a certain size. For example, on gen7, if the local
workgroup size is larger than 512, then a SIMD16 program is required.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93840
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-08 14:27:18 -08:00
Timothy Arceri
91630d7453 glsl: don't always reject shaders with mismatching ifc blocks
Since we store some member qualifiers in the interface type
we need to be more careful about rejecting shaders just because
the pointer doesn't match. Its perfectly valid for some qualifiers
such as precision to not match across shader interfaces.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-09 09:21:42 +11:00
Timothy Arceri
3026b3565a glsl: make interstage_match() static
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-09 09:21:36 +11:00
Timothy Arceri
ebc419fcbd glsl: don't validate ifc blocks using validation meant for variables
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-09 09:21:31 +11:00
Kenneth Graunke
19f13b2096 mesa: Fix error code for GetFramebufferAttachmentParameter in ES 3.0+.
The ES 3.0+ specifications contain the exact same text as the OpenGL
specification, which says that we should return GL_INVALID_OPERATION.

ES 2.0 contains different text saying we should return GL_INVALID_ENUM.

Previously, Mesa chose the error code based on API (GL vs. ES).
This patch makes ES 3.0+ follow the GL behavior.  ES 2 remains as is.

Fixes dEQP-GLES3.functional.fbo.api.attachment_query_empty_fbo.
However, breaks the dEQP-GLES2 variant of the same test for drivers
which silently promote to ES 3.0.  This can be worked around by
exporting MESA_GLES_VERSION_OVERRIDE=2.0, but is a bug in dEQP-GLES2.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-08 12:46:28 -08:00
Kenneth Graunke
8b3496f378 mesa: Add GL_RED and GL_RG to ES3 effective internal format mapping.
The dEQP-GLES3.functional.fbo.completeness.renderable.texture.
{color0,depth,stencil}.{red,rg}_unsigned_byte tests appear to expect
GL_RED/GL_RG and GL_UNSIGNED_BYTE to map to GL_R8/GL_RG8, rather than
returning an INVALID_OPERATION error.

This makes perfect sense.  However, RED and RG are strangely missing
from the ES 3.0/3.1/3.2 spec's "Effective internal format corresponding
to external format and type" tables.  It may be worth filing a spec bug.

Fixes the 6 dEQP tests mentioned above.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-03-08 12:46:28 -08:00
Samuel Pitoiset
752769e053 nv50,nvc0: make sure to destroy the mutex used for blits
This mutex is initialized when the blitter is created, but it is never
destroyed. This doesn't hurt anything but it makes sense to destroy it
at blitter deletion.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-08 21:24:46 +01:00
Marek Olšák
3146014d5f gallium/radeon: don't use temporary buffers for persistent mappings
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-08 20:08:52 +01:00
Jason Ekstrand
14b18aba89 nir: Add a pass for lower indirect variable dereferences
This new pass lowers load/store_var intrinsics that act on indirect derefs
to if-ladder of direct load/store_var intrinsics.  The if-ladders perform a
simple binary search on the indirect.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-08 10:41:54 -08:00
Alejandro Piñeiro
ef76ea4ba9 i965/fs/nir: "surface_access::" prefix not needed
"using namespace brw::surface_access" is already present at the
top of the source file.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-08 17:55:28 +01:00
Brian Paul
6857420e79 mesa: fix malformed assertion in _image_format_class_to_glenum()
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-03-08 08:42:56 -07:00
Brian Paul
3ed8729f7b program: minor whitespace clean-ups in program_parse_extra.c 2016-03-08 08:42:56 -07:00
Christian König
37402aa4c6 st/mesa: conditionally enable GL_NV_vdpau_interop
Only enable it when we compile the state tracker as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-08 13:00:04 +01:00
Christian König
e148a3b6e9 radeon/uvd: disable MPEG1
The hardware simply doesn't support that correctly.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-08 12:57:08 +01:00
Alejandro Piñeiro
0548844e86 i965/vec4/nir: no need to use surface_access:: to call emit_untyped_atomic
Now that brw_vec4_visitor::emit_untyped_atomic was removed, there is no need
to explicitly set it.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-08 08:22:26 +01:00
Alejandro Piñeiro
d3a89a7c49 i965/vec4/nir: remove emit_untyped_surface_read and emit_untyped_atomic at brw_vec4_visitor
surface_access emit_untyped_read and emit_untyped_atomic provides the same
functionality.

v2: surface parameter of emit_untyped_atomic is a const, no need to
    specify default predicate on emit_untyped_atomic, use retype
    (Francisco Jerez).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-08 08:22:26 +01:00
Alejandro Piñeiro
0c5c2e2c93 i965/vec4: pass the correct src_sz to emit_send at emit_untyped_atomic
If the src is invalid, so src size is zero, the src_sz passed to emit
send should be zero too, instead of a default 1 if we are in a simd4x2
case. This can happens if using emit_untyped_atomic for an atomic
dec/inc.

v2: use the proper src_sz when calling emit_send, instead of just
    avoid loading src at emit_send if BAD_FILE (Francisco Jerez)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-08 08:22:26 +01:00
Kenneth Graunke
ea9fa5ff05 glcpp: Remove empty mid-rule action which changes test behavior.
Apparently this causes a slight difference in the parser's token
expectations, leading to a different error message.

It seems harmless, but I wanted to be cautious and separate it out.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-07 23:02:05 -08:00
Kenneth Graunke
e816c8b54a glcpp: Clean up most empty mid-rule actions left by previous commit.
I didn't want to pollute the previous patch with all the $4 -> $3
changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-07 23:02:03 -08:00
Kenneth Graunke
639bbe3cb4 glcpp: Delete unnecessary implicit version resolves.
We now have a bigger hammer.  The HASH_TOKEN NEWLINE rule still needs
to exist to ensure the 146-version-hash-first.c test still passes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-07 23:02:01 -08:00
Kenneth Graunke
07ec67d85c glcpp: Implicitly resolve version after the first non-space/hash token.
We resolved the implicit version directive when processing control lines,
such as #ifdef, to ensure any built-in macros exist.  However, we failed
to resolve it when handling ordinary text.

For example,

        int x = __VERSION__;

should resolve __VERSION__ to 110, but since we never resolved the implicit
version, none of the built-in macros exist, so it was left as is.

This also meant we allowed the following shader to slop through:

        123
        #version 120

Nothing would cause the implicit version to take effect, so when we saw
the #version directive, we thought everything was peachy.

This patch makes the lexer's per-token action resolve the implicit
version on the first non-space/newline/hash token that isn't part of
a #version directive, fulfilling the GLSL language spec:

"The #version directive must occur in a shader before anything else,
 except for comments and white space."

Because we emit #version as HASH_TOKEN then VERSION_TOKEN, we have to
allow HASH_TOKEN to slop through as well, so we don't resolve the
implicit version as soon as we see the # character.  However, this is
fine, because the parser's HASH_TOKEN NEWLINE rule does resolve the
version, disallowing cases like:

        #
        #version 120

This patch also adds the above shaders as new glcpp tests.

Fixes dEQP-GLES2.functional.shaders.preprocessor.predefined_macros.
{gl_es_1_vertex,gl_es_1_fragment}.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-07 23:01:43 -08:00
Jason Ekstrand
75af420cb1 anv/pipeline: Move binding table setup to its own helper 2016-03-07 22:24:31 -08:00
Jason Ekstrand
2308891ede anv: Store CPU-side fence information in the BO
This reduces the number of allocations a bit and cuts back on memory usage.
Kind-of a micro-optimization but it also makes the error handling a bit
simpler so it seems like a win.
2016-03-07 22:23:44 -08:00
Jason Ekstrand
f61d40adc2 anv/allocator: Better casting in PFL macros
We cast he constant 0xfff values to a uintptr_t before applying a bitwise
negate to ensure that they are actually 64-bit when needed.  Also, the
count variable doesn't need to be explicitly cast, it will get upcast as
needed by the "|" operation.
2016-03-07 22:23:44 -08:00
Jason Ekstrand
3d4f2b0927 anv/allocator: Move the alignment assert for the pointer free list
Previously we asserted every time you tried to pack a pointer and a counter
together.  However, this wasn't really correct.  In the case where you try
to grab the last element of the list, the "next elemnet" value you get may
be bogus if someonoe else got there first.  This was leading to assertion
failures even though the allocator would safely fall through to the failure
case below.
2016-03-07 22:23:44 -08:00
Jason Ekstrand
8c2b9d1529 anv/bo_pool: Allow freeing BOs where the anv_bo is in the BO itself 2016-03-07 22:23:44 -08:00
Tim Rowley
90f9df3210 gallium/swr: fix issues preventing a 32-bit build
Not a currently tested configuration, but these couple of small changes
allow a 32-bit build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94383
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-07 17:22:24 -06:00
Nanley Chery
181b142fbd anv/device: Up device limits for 3D and array texture dimensions
The limit for these textures is 2048 not 1024.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-07 15:21:50 -08:00
Tim Rowley
035d39b539 gallium/swr: remove use of UINT64 from swr_fence
Remove use of a win32-style type leaked from the swr rasterizer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-07 16:58:48 -06:00
Jason Ekstrand
428ffc9c13 anv/device: Actually free the CPU-side fence struct again
In 23de78768, when we switched from allocating individual BOs to using the
pool for fences, we accidentally deleted the free.
2016-03-07 14:50:52 -08:00
Kenneth Graunke
af41c0b7e0 glsl: Add function parameters to the parser symbol table.
In a shader such as:

    struct S { float f; }
    float identity(float S) { return S; }

we would think that "S" in "return S" referred to a structure, even
though it's shadowed by the "float S" parameter in the inner struct.

This led to the parser's grammar seeing TYPE_IDENTIFIER and getting
confused.

Fixes dEQP-GLES2.functional.shaders.scoping.valid.
function_parameter_hides_struct_type_{vertex,fragment}.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-07 14:09:55 -08:00
Kenneth Graunke
c4960068d5 glsl: Add single declaration variables to the symbol table too.
The lexer/parser use a symbol table to classify identifiers as
variables, functions, or structure types.

For some reason, we neglected to add variables in simple declarations
such as

    int x = 5;

but did add subsequent variables in multi-declarations:

    int x = 5, y = 6; // y gets added, but not x, for some reason

Fixes four dEQP-GLES2.functional.shaders.scoping.valid subcases:
- local_int_variable_hides_struct_type_vertex
- local_int_variable_hides_struct_type_fragment
- local_struct_variable_hides_struct_type_vertex
- local_struct_variable_hides_struct_type_fragment

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-07 14:09:31 -08:00
Kenneth Graunke
1107e48b9a mesa: Change GLboolean to bool in GenerateMipmap target checker.
This is not API facing, so just use bool.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-07 14:01:34 -08:00
Kenneth Graunke
2f8a43586e mesa: Make GenerateMipmap check the target before finding an object.
If glGenerateMipmap was called with a bogus target, then it would
pass that to _mesa_get_current_tex_object(), which would raise a
_mesa_problem() telling people to file bugs.  We'd then do the
proper error checking, raise an error, and bail.

Doing the check first avoids the _mesa_problem().  The DSA variant
doesn't take a target parameter, so we leave the target validation
exactly as it was in that case.

Fixes one dEQP GLES2 test:
dEQP-GLES2.functional.negative_api.texture.generatemipmap.invalid_target.

v2: Rebase on Antia's recent patch to this area.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-07 14:01:22 -08:00
Samuel Pitoiset
8f99c1bbce gm107/ir: add emission for ATOMS
This allows to perform atomic operations on shared memory.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 22:13:14 +01:00
Samuel Pitoiset
7f8565f0b2 tgsi: fix parsing of shared memory declarations
The SHARED TGSI keyword is only allowed with TGSI_FILE_MEMORY and not
with TGSI_FILE_BUFFER. I have found this by using the nouveau_compiler
from command line.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-03-07 22:13:08 +01:00
Samuel Pitoiset
c82086f7e9 gm107/ir: add emission for BAR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 18:39:50 +01:00
Samuel Pitoiset
8a109c0375 gk110/ir: add missing src predicate emission for BAR.RED
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 18:39:48 +01:00
Samuel Pitoiset
f4d2d49152 gk110/ir: allow to emit immediates for BAR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 18:39:46 +01:00
Samuel Pitoiset
cba89fdaa1 gk110/ir: fix wrong emission of BAR.SYNC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 18:39:43 +01:00
Samuel Pitoiset
5777e87bed nvc0/ir: make sure that thread count immediate for BAR fit
The limit of the thread count immediate value is 12 bits.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-07 18:39:41 +01:00
Brian Paul
3af78b426e svga: add new surface-write-flushes HUD query
To know when we're flushing the command buffer because we need to
write to surface in the command buffer.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-07 09:33:15 -07:00
Brian Paul
7e8cf34546 svga: add new flush-time HUD query
To measure the time spent flushing the command buffer.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-07 09:33:15 -07:00
Brian Paul
903afc370f svga: also dump SVGA3D_BUFFER surfaces in svga_screen_cache_dump()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-07 09:33:15 -07:00
Kristian Høgsberg Kristensen
32aa01663f anv: Quiet pTessellationState warning
Some application pass a dummy for pTessellationState which results in a
lot of noise. Only warn if we're actually given tessellation shadear
stages.
2016-03-06 22:06:24 -08:00
Ilia Mirkin
0941ef3dd5 mesa: flip current tf object back to default if current is being deleted
In the rather unusual case of Bind + Delete, we need to make sure that
we unbind the current tf object.

Fixes dEQP-GLES3.functional.lifetime.delete_bound.transform_feedback

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-07 00:36:08 -05:00
Ilia Mirkin
f6827e20d1 glsl: avoid stack smashing when there are too many attributes
This fixes a crash in

dEQP-GLES3.functional.transform_feedback.array_element.separate.points.lowp_mat3x2

and likely others. The vertex shader has > 16 input variables (without
explicit locations), which causes us to index outside of the to_assign
array.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-07 00:36:08 -05:00
Jason Ekstrand
23de78768b anv: Create fences from the batch BO pool
Applications may create a *lot* of fences, perhaps as much as one per
vkQueueSubmit.  Really, they're supposed to use ResetFence, but it's easy
enough for us to make them crazy-cheap so we might as well.
2016-03-06 14:26:52 -08:00
Francisco Jerez
3dd0441f6c i965/vec4: Propagate swizzles correctly during copy propagation.
This simplifies the code that iterates over the per-component values
found in the matching copy_entry struct and checks whether the
register regions that were copied to each component are similar enough
to be treated as a single (reswizzled) value which can be propagated
into the current instruction.

Aside from being scattered between opt_copy_propagation(),
try_copy_propagate(), and try_constant_propagate(), what I found
terribly confusing about the preexisting logic was that
opt_copy_propagation() tried to reorder the array of values according
to the swizzle of the instruction source, which meant one would have
had to invert the reordering applied at the top level in order to find
out which component to take from each value (we were just taking the
i-th component from the i-th value, which is not correct in general).
The saturate mask was also being swizzled incorrectly.

This consolidates the logic for matching multiple components of a
copy_entry into a single function which returns the result as a
regular src_reg on success, as if the copy had been performed with a
single MOV instruction copying all components of the src_reg into the
destination.

Fixes several ARB_vertex_program MOV test-cases from:
 https://cgit.freedesktop.org/~kwg/piglit/log/?h=arb_program

Acked-by: Matt Turner <mattst88@gmail.com>
2016-03-06 12:22:40 -08:00
Francisco Jerez
c70b7c80e3 i965: Don't try copy propagation if constant propagation succeeded.
It cannot get any better.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-06 12:22:40 -08:00
Francisco Jerez
dcf5e19e65 i965/vec4: Use swizzle() to swizzle immediates during constant propagation.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-06 12:22:40 -08:00
Francisco Jerez
ff7a2b489e i965: Add support for swizzling arbitrary immediates to (brw_)swizzle().
Scalar immediates used to be handled correctly by swizzle() (as the
identity) but since commit 58fa9d47b5 it
will corrupt the contents of the immediate.  Vector immediates were
never handled correctly, but we had ad-hoc code to swizzle VF
immediates in the vec4 copy propagation pass.  This takes care of
swizzling V and UV in addition.

v2: Don't implement swizzling of V/UV immediates (Matt).  If you need
    to swizzle an integer vector immediate in the future apply the
    following diff to go back to v1:

--- a/src/mesa/drivers/dri/i965/brw_eu.c
+++ b/src/mesa/drivers/dri/i965/brw_eu.c
@@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod)
 static unsigned
 imm_shift(enum brw_reg_type type, unsigned i)
 {
-   assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V &&
-          "Not implemented.");
-
    if (type == BRW_REGISTER_TYPE_VF)
       return 8 * (i & 3);
+   else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V)
+      return 4 * (i & 7);
    else
       return 0;
 }

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-06 12:22:40 -08:00
Francisco Jerez
537d3df974 i965: Pass symbolic swizzle to brw_swizzle() as a single argument.
And replace brw_swizzle1() with brw_swizzle().  Seems slightly cleaner
and will allow reusing brw_swizzle() in the vec4 back-end more easily.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-06 12:22:39 -08:00
Ilia Mirkin
ff085d014e nvc0: reset TFB bufctx when we no longer hold a reference to the buffers
This fixes some use-after-free situations in dEQP when an xfb state is
removed, and then a clear is triggered, which only does a partial
validation. It would attempt to read the no-longer-valid buffers,
resulting in crashes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-06 10:14:52 -05:00
Jason Ekstrand
21ee5fd326 anv: Emit null render targets
v2 (Francisco Jerez): Add the state_offset to the surface state offset
2016-03-05 20:47:10 -08:00
Ilia Mirkin
fa43c4bd99 nv50/ir: using sampleid/pos shouldn't force per-sample interpolation
See https://www.khronos.org/bugzilla/show_bug.cgi?id=1462

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-05 23:26:03 -05:00
Ilia Mirkin
313205cb8f st/mesa: don't force per-sample interp if only sampleid/pos are used
The OES extensions clarify this behaviour to differentiate between
per-sample invocation and per-sample interpolation. Using sampleid/pos
will force per-sample invocation but not per-sample interpolation.

See https://www.khronos.org/bugzilla/show_bug.cgi?id=1462

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-05 23:26:03 -05:00
Ilia Mirkin
dcbf8377be swrast: fix GL_ANY_SAMPLES_PASSED values in Result
Since commit 922be4eab, the expectation is that the query result
contains the correct value. Unfortunately swrast does not distinguish
between GL_SAMPLES_PASSED and GL_ANY_SAMPLES_PASSED. As a result, we
must fix up the query result in a post-draw fixup.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94274
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-03-05 23:25:52 -05:00
Jason Ekstrand
8502794c12 anv/pipeline: Handle null wm_prog_data in 3DSTATE_CLIP 2016-03-05 14:42:16 -08:00
Kristian Høgsberg Kristensen
7b348ab8a0 anv: Fix rebase error 2016-03-05 14:33:50 -08:00
Kristian Høgsberg Kristensen
34326f46df anv: Turn pipeline cache on by default
Move the environment variable check to cache creation time so we block
both lookups and uploads if it's turned off.
2016-03-05 13:54:24 -08:00
Kristian Høgsberg Kristensen
f2b37132cb anv: Check if shader if present before uploading to cache
Between the initial check the returns NO_KERNEL and compiling the
shader, other threads may have added the shader to the cache. Before
uploading the kernel, check again (under the mutex) that the compiled
shader still isn't present.
2016-03-05 13:54:24 -08:00
Kristian Høgsberg Kristensen
30bbe28b7e anv: Always use point size from the shader
There is no API for setting the point size and the shader is always
required to set it. Section 24.4:

   "If the value written to PointSize is less than or equal to zero, or
    if no value was written to PointSize, results are undefined."

As such, we can just always program PointWidthSource to Vertex. This
simplifies anv_pipeline a bit and avoids trouble when we enable the
pipeline cache and don't have writes_point_size in the prog_data.
2016-03-05 13:54:24 -08:00
Kristian Høgsberg Kristensen
6139fe9a77 anv: Also cache the struct anv_pipeline_binding maps
This is state the we generate when compiling the shaders and we need it
for mapping resources from descriptor sets to binding table indices.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
584f39c65e anv: Don't re-upload shaders when merging
Using anv_pipeline_cache_upload_kernel() will re-upload the kernel and
prog_data when we merge caches. Since the kernel and prog_data is
already in the program_stream, use anv_pipeline_cache_add_entry()
instead to only add the entry to the hash table.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
626559ed37 anv: Add anv_pipeline_cache_add_entry()
This function will grow the cache to make room and then add the entry.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
07441c344c anv: Rename anv_pipeline_cache_add_entry() to 'set'
This function is a helper that unconditionally sets a hash table entry
and expects the cache to have enough room. Calling it 'add_entry'
suggests it will grow the cache as needed.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
87967a2c85 anv: Simplify pipeline cache control flow a bit
No functional change, but the control flow around searching the cache
and falling back to compiling is a bit simpler.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
2b29342fae anv: Store prog data in pipeline cache stream
We have to keep it there for the cache to work, so let's not have an
extra copy in struct anv_pipeline too.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
37c5e70253 anv: Rename 'table' to 'hash_table' in anv_pipeline_cache
A little less ambiguous.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
c028ffea70 anv: Serialize as much pipeline cache as we can
We can serialize as much as the application asks for and just stop once
we run out of memory. This lets applications use a fixed amount of
space for caching and still get some benefit.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
cd812f086e anv: Use 1.0 pipeline cache header
The final version of the pipeline cache header adds a few more fields.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
26ed943eb9 anv: Fix shader key hashing
This was copied from inline code to a helper and wasn't updated to hash
a pointer instead.
2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
3baf8af947 anv: Remove excess whitespace 2016-03-05 13:50:07 -08:00
Kristian Høgsberg Kristensen
ab36eae5e7 anv: Remove left-over bits of sparse-descriptor code 2016-03-05 13:50:07 -08:00
Jason Ekstrand
1afdfc3e6e anv/pipeline: Implement the depth compare EQUAL workaround on gen8+ 2016-03-05 09:59:28 -08:00
Jason Ekstrand
7c1660aa14 anv: Don't allow D16_UNORM to be combined with stencil
Among other things, this can cause the depth or stencil test to spurriously
fail when the fragment shader uses discard.
2016-03-05 09:59:28 -08:00
Jason Ekstrand
9a90176d48 anv/pipeline: Calculate the correct max_source_attr for 3DSTATE_SBE 2016-03-05 09:59:28 -08:00
Brian Paul
a4678311be st/mesa: 78-column wrapping in st_extensions.c
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-05 09:21:05 -07:00
Brian Paul
9e6a6bd575 gallium/util: add new comments, assertions in u_debug_refcnt.c
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-05 09:20:34 -07:00
Brian Paul
b6a607b221 gallium/util: update comments and URL in u_debug_refcnt.c
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-05 09:20:28 -07:00
Brian Paul
cbca6964e2 gallium/util: make stream variable static in u_debug_refcnt.c
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-05 09:20:23 -07:00
Brian Paul
fb0abedce7 gallium/util: re-indent u_debug_refcnt.[ch]
Wrap comments to 78 columns, etc.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-03-05 09:20:14 -07:00
Brian Paul
a7ba29f6d8 gallium/tests: silence warning in compute.c
compute.c: In function ‘launch_grid’:
compute.c:435:20: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
         info.input = input;
                    ^

Maybe the pipe_grid_info::input field should be const void *?

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-05 09:15:44 -07:00
Timothy Arceri
31943e6ba5 glsl: replace remaining tabs in link_varyings.cpp
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2016-03-05 20:50:10 +11:00
Timothy Arceri
e2415e8467 glsl: replace remaining tabs in link_uniforms.cpp
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2016-03-05 20:50:05 +11:00
Jordan Justen
81f30e2f50 anv/hsw: Move query code to genX file for Haswell
This fixes many CTS cases, but will require an update to the kernel
command parser register whitelist. (The CS GPRs and TIMESTAMP
registers need to be whitelisted.)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-05 01:08:07 -08:00
Timothy Arceri
3322cb7b8d docs: mark align layout qualifier as DONE
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:39:13 +11:00
Timothy Arceri
037f68d81e glsl: apply align layout qualifier rules to block offsets
From Section 4.4.5 (Uniform and Shader Storage Block Layout
Qualifiers) of the OpenGL 4.50 spec:

  "The align qualifier makes the start of each block member have a
  minimum byte alignment.  It does not affect the internal layout
  within each member, which will still follow the std140 or std430
  rules. The specified alignment must be a power of 2, or a
  compile-time error results.

  The actual alignment of a member will be the greater of the
  specified align alignment and the standard (e.g., std140) base
  alignment for the member's type. The actual offset of a member is
  computed as follows: If offset was declared, start with that
  offset, otherwise start with the next available offset. If the
  resulting offset is not a multiple of the actual alignment,
  increase it to the first offset that is a multiple of the actual
  alignment. This results in the actual offset the member will have.

  When align is applied to an array, it affects only the start of
  the array, not the array's internal stride. Both an offset and an
  align qualifier can be specified on a declaration.

  The align qualifier, when used on a block, has the same effect as
  qualifying each member with the same align value as declared on
  the block, and gets the same compile-time results and errors as if
  this had been done. As described in general earlier, an individual
  member can specify its own align, which overrides the block-level
  align, but just for that member.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:39:07 +11:00
Timothy Arceri
5a27fefffe glsl: parse align layout qualifier
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:39:01 +11:00
Timothy Arceri
22b0082b9d docs: mark explicit byte offsets as DONE
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:55 +11:00
Timothy Arceri
802262c0af glsl: use explicit offset when lowering buffer access
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:49 +11:00
Timothy Arceri
96527c3cf2 glsl: copy explicit offset to uniform storage
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:44 +11:00
Timothy Arceri
e12a49ac12 glsl: update comment on offset field
The old comment was for the location not the offset, we now use
the field for block members so mention that also.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:39 +11:00
Timothy Arceri
9f24f42c49 glsl: add offset to glsl interface type
In this patch we also copy the offset value from the ast and
implement offset linking rules by adding it to the record_compare()
function.

From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:

   "Two blocks linked together in the same program with the same block
   name must have the exact same set of members qualified with
   offset and their integral-constant-expression values must be the
   same, or a link-time error results."

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:34 +11:00
Timothy Arceri
8abed7f185 glsl: apply compile-time rules for the offset layout qualifier
This implements the rules for the offset qualifier on block members.

From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:

   "The offset qualifier can only be used on block members of blocks
   declared with std140 or std430 layouts."

   ...

   "It is a compile-time error to specify an offset that is smaller than
   the offset of the previous member in the block or that lies within the
   previous member of the block."

   ...

   "The specified offset must be a multiple of the base alignment of the
   type of the block member it qualifies, or a compile-time error results."

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:30 +11:00
Timothy Arceri
6f45484ac7 glsl: enable offset layout qualifier for ARB_enhanced_layouts
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-03-05 19:38:26 +11:00
Timothy Arceri
1824ff1c2a glsl: reject invalid input layout qualifiers
Global in validation is already handled, this will do the validation
for variables, blocks and block members.

This fixes some CTS tests for the new enhanced layouts transform
feedback qualifiers.

V2: add some more valid input flags
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:07:09 +11:00
Timothy Arceri
bd53cc7b45 glsl: only apply default stream to output blocks
This is needed to allow invalid qualifier checks on inputs.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:07:04 +11:00
Timothy Arceri
78d3098c05 glsl: rework parsing of blocks
Previously interface blocks were giving the global default flags of
uniform blocks. This meant we could not check for invalid qualifiers
on interface blocks because they always contained invalid flags.

This changes parsing so that interface blocks now get an empty
set of layouts.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:07:00 +11:00
Timothy Arceri
d244986bf2 glsl: don't apply uniform/buffer layouts to interface blocks
If the following patch we will stop setting these layouts by default
on interface blocks, so we need to do this to avoid hitting the
assert.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-05 19:06:56 +11:00
Nanley Chery
4e75f9b219 anv: Implement VK_REMAINING_{MIP_LEVELS,ARRAY_LAYERS}
v2: Subtract the baseMipLevel and baseArrayLayer (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-04 21:25:23 -08:00
Kenneth Graunke
4ba7ad6cc1 i965: Only magnify depth for 3D textures, not array textures.
When BaseLevel > 0, we magnify the dimensions to fill out the size of
miplevels [0..BaseLevel).  In particular, this was magnifying depth,
thinking that the depth doubles at each level.  This is perfectly
reasonable for 3D textures, but dead wrong for array textures.

Changing the depth != 1 condition to a target == GL_TEXTURE_3D check
should make this only happen in the appropriate cases.

Fixes about 32 dEQP tests:
- dEQP-GLES31.functional.texture.gather.*.level_{1,2}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-03-04 21:25:08 -08:00
Jason Ekstrand
c1436e80ef anv/meta_clear: Set the right number of dynamic states 2016-03-04 19:18:20 -08:00
Juan A. Suarez Romero
2f76a9924e i965/vec4: add opportunistic behaviour to opt_vector_float()
opt_vector_float() transforms several scalar MOV operations to a single
vectorial MOV.

This is done when those MOV covers all the components of the destination
register. So something like:

mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf3.0.z:D, 0D

is transformed in:

mov vgrf3.0:F, [0F, 0F, 0F, 1F]

But there are cases where not all the components are written. For
example, in:

mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf4.0.xy:D, 1065353216D
mov vgrf4.0.w:D, 0D
mov vgrf6.0:UD, u4.xyzw:UD

Nor vgrf3 nor vgrf4 .z components are written, so the optimization is
not applied.

But it could be applied anyway with the components covered, using a
writemask to select the ones written. So we could transform it in:

mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
mov vgrf6.0:UD, u4.xyzw:UD

This commit does precisely that: opportunistically apply
opt_vector_float() when possible.

total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
instructions in affected programs: 443078 -> 433202 (-2.23%)
helped: 4998
HURT: 0

total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
cycles in affected programs: 1401686 -> 1371942 (-2.12%)
helped: 3243
HURT: 38

v2: change vectorize_mov() signature (Matt).
v3: take in account predicates (Juan).
v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2016-03-04 19:16:52 -08:00
Jason Ekstrand
cc57efc67a anv/pipeline: Fix depthBiasEnable on gen7
The first time I tried to fix this, I set the wrong fields.
2016-03-04 17:56:12 -08:00
Jason Ekstrand
653261285e anv/cmd_buffer: Reset the state streams when resetting the command buffer 2016-03-04 17:54:29 -08:00
Jason Ekstrand
f700d16a89 anv/cmd_buffer: Include Haswell in set_subpass 2016-03-04 17:54:29 -08:00
George Kyriazis
feb71117ae st/xlib: Don't destroy screen on XCloseDisplay()
screen may still be used by other resources that are not yet freed.
To correctly fix this there will be a need to account for resources
differently, but this quick fix is not any worse than the original
code that leaked screens anyway.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 18:14:46 -07:00
Nanley Chery
a6fb62a864 isl: Fix RenderTargetViewExtent for mipmapped 3D surfaces
Match the comment stated above the assignment.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-04 13:20:44 -08:00
Nanley Chery
b80c8ebc45 isl: Get rid of isl_surf_fill_state_info::level0_extent_px
This field is no longer needed.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-04 13:20:03 -08:00
Jason Ekstrand
d154a5ebd6 anv/cmd_buffer: Let the pipeline set StencilBufferWriteEnable on gen9 2016-03-04 12:23:01 -08:00
Jason Ekstrand
f374765ce6 anv/cmd_buffer: Mask stencil reference values 2016-03-04 12:22:32 -08:00
Jason Ekstrand
d61dcec64d anv/clear: Pull the stencil write mask from the pipeline
The stencil write mask wasn't getting set at all so we were using whatever
write mask happend to be left over by the application.
2016-03-04 12:03:00 -08:00
Jason Ekstrand
ec18fef88d anv/pipeline: Set StencilBufferWriteEnable from the pipeline
The hardware docs say that StencilBufferWriteEnable should only be set if
StencilTestEnable is set.  It seems reasonable to set them together.
2016-03-04 12:03:00 -08:00
Jason Ekstrand
fcd8e57185 anv/pipeline: More competent gen8 clipping 2016-03-04 12:03:00 -08:00
Jason Ekstrand
a8afd29653 anv/pipeline: Use the right provoking vertex for triangle fans 2016-03-04 12:03:00 -08:00
Jason Ekstrand
fa8539dd6b anv/pipeline: Respect pRasterizationState->depthBiasEnable 2016-03-04 12:03:00 -08:00
Matt Turner
1f862e923c i965/fs: Optimize float conversions of byte/word extract.
instructions in affected programs: 31535 -> 29966 (-4.98%)
   helped: 23

   cycles in affected programs: 272648 -> 266022 (-2.43%)
   helped: 14
   HURT: 1

The patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4374 -> 4155 instructions (-5.01%)
 #1706: 3582 -> 3363 instructions (-6.11%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner
905ff86198 nir: Recognize open-coded extract_u16.
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner
76289fbfa8 nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >>  8u) & 0xffu)
   float((val >>  0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)          g15<1>UD        g25<8,8,1>UD    0x00000010UD
   and(8)          g16<1>UD        g15<8,8,1>UD    0x000000ffUD
   mov(8)          g17<1>F         g16<8,8,1>UD

   shr(8)          g18<1>UD        g25<8,8,1>UD    0x00000008UD
   and(8)          g19<1>UD        g18<8,8,1>UD    0x000000ffUD
   mov(8)          g20<1>F         g19<8,8,1>UD

   and(8)          g21<1>UD        g25<8,8,1>UD    0x000000ffUD
   mov(8)          g22<1>F         g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)          g17<1>F         g25.2<32,8,4>UB
   mov(8)          g20<1>F         g25.1<32,8,4>UB
   mov(8)          g22<1>F         g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.

   instructions in affected programs: 28568 -> 27450 (-3.91%)
   helped: 7

   cycles in affected programs: 210076 -> 203144 (-3.30%)
   helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4520 -> 4374 instructions (-3.23%)
 #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Kenneth Graunke
9d7faadd8a anv: Fix backwards shadow comparisons
sample_c is backwards from what GL and Vulkan expect.

See intel_state.c in i965.

v2: Drop unused vk_to_gen_compare_op.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-04 11:35:46 -08:00
George Kyriazis
01e92e7010 st/xlib: Hang off screen destructor off main XCloseDisplay() callback.
This resolves some order dependencies between the already existing
callback the newly created one.

Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 10:57:24 -07:00
George Kyriazis
51e562c3ea st/xlib: Support unlimited number of display connections
There is a limit of 10 display connections, which was a
problem for apps/tests that were continuously opening/closing display
connections.

This fix uses XAddExtension() and XESetCloseDisplay() to keep track
of the status of the display connections from the X server, freeing
mesa-related data as X displays get destroyed by the X server.

Poster child is the VTK "TimingTests"

Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 10:57:09 -07:00
Brian Paul
192ee9adb1 svga: add new command-buffer-size HUD query
To plot a graph of the command buffer size.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Brian Paul
1258f907f4 svga: add new svga_winsys_context::get_command_buffer_size()
To ask how large the current command buffer is.  Will be used for
a new GALLIUM_HUD graph.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Brian Paul
6fc8d90fa9 svga: reorder SVGA_QUERY_ switch cases to match declaration order
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Sinclair Yeh
f1410c5b91 svga: Force an RGBA view creation for an RGBA resource
glXCreatePixmap() may specify a GLX_TEXTURE_FORMAT_RGB_EXT format
for an RGBA resource, causing us to create an RGBX view for an
RGBA resource, a combination vgpu10 does not support.

When this is detected, change the request to create an RGBA view
instead.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 07:57:41 -07:00
Charmaine Lee
8366701f4c svga: fix an error in svga_texture_generate_mipmap
With this patch, make sure the shader resource view is properly created
before referencing it in the generate mipmap command.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 07:57:41 -07:00
Thomas Hellstrom
395c7b8fa1 winsys/svga: Increase the fence timeout
If running with a software renderer backend, the timeout may be
insufficient, and we don't want to release busy buffers too early.

In practice, SVGA gpu lockups are extremely rare.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-03-04 13:55:23 +01:00
Thomas Hellstrom
24ad7e16cd winsys/svga: Fix an uninitialized return value
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviwed-by: Brian Paul <brianp@vmware.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-03-04 13:54:38 +01:00
Kenneth Graunke
9ec246796f i965: Set MaxFramebufferWidth/Height to 16384, not viewport.
dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width}
started hitting assertion failures when emitting SURFACE_STATE, after
commit e8fd60e789 where Samuel increased the maximum viewport size to
32768, from 16384.

MaxFramebufferWidth/Height were being set to the maximum viewport size,
but are actually limited by the SURFACE_STATE width/height field range,
which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is
exposed).  So, reduce these to 16384 explicitly.

Fixes assert fails in the above mentioned dEQP tests.  (Those tests
still fail, however.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-03 21:31:22 -08:00
Francisco Jerez
a6046d217d glsl: Improve the accuracy of the acos() approximation.
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error.  The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0, the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.

Fixes four dEQP subtests:
dEQP-GLES31.functional.shaders.builtin_functions.precision.acos.
highp_compute.{scalar,vec2,vec3,vec4}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
2795fbcae3 glsl: Parameterize asin_expr() on the fit coefficients.
This will allow us to share the implementation while using different
polynomials for asin() and acos().

Francisco Jerez did this in the SPIR-V front-end; I'm merely porting
his idea to the GLSL world.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
aa37cbdff7 mesa: Allow Get*() of several forgotten IsEnabled() pnames.
From section 6.2 ("State Tables") of the GL 2.1 specification
(the text also appears in the GL 3.0 and ES 3.1 specifications):
"However, state variables for which IsEnabled is listed as the query
 command can also be obtained using GetBooleanv, GetIntegerv, GetFloatv,
 and GetDoublev."

GL_DEBUG_OUTPUT, GL_DEBUG_OUTPUT_SYNCHRONOUS, and GL_FRAGMENT_SHADER_ATI
were missing from the glGet*() functions.  All other IsEnabled() pnames
look to be present, as far as I can tell.

Fixes 8 dEQP-GLES31.functional.debug.state_query subtests:
debug_output[_synchronous]_get{boolean,float,integer,integer64}.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
b4b50b074b mesa: Make glGet queries initialize ctx->Debug when necessary.
dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_*
tries to call glGet on GL_DEBUG_GROUP_STACK_DEPTH right away, before
doing any other debug setup.  This should return 1.

However, because ctx->Debug wasn't allocated, we bailed and returned 0.

This patch removes the open-coded locking and switches the two glGet
functions to use _mesa_lock_debug_state(), which takes care of
allocating and initializing that state on the first time.  It also
conveniently takes care of unlocking on failure for us, so we don't
need to handle that in every caller.

Fixes dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_
{getboolean,getfloat,getinteger,getinteger64}.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
3ed260f54c hack to make dota 2 menus work 2016-03-03 16:21:09 -08:00
Jason Ekstrand
56ba13c994 isl/surface_state: Set L2 bypass disable for certain BC* formats 2016-03-03 16:16:57 -08:00
Eduardo Lima Mitev
47392011c0 Update docs to advertise new support for ARB_internalformat_query2
Support in Mesa main and i965 has just been added.

v2: Include note in 'New Features' of docs/relnotes/11.3.0.html.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-03 22:19:35 +01:00
Kenneth Graunke
623ce595a9 anv: Compile shader stages in pipeline order.
Instead of the arbitrary order modules might be specified in.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:36:19 -08:00
Nanley Chery
8dddc3fb1e anv/meta: Delete unused functions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:44 -08:00
Nanley Chery
d20f6abc85 anv/meta: Use blitter API for state-handling in Buffer Update/Copy
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:42 -08:00
Nanley Chery
318b67d157 anv/meta: Use blitter API in do_buffer_copy()
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:36 -08:00
Nanley Chery
96ff4d0679 anv/meta: Use blitter API in anv_CmdCopyImage()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:35 -08:00
Nanley Chery
9b6c95d46e anv/meta: Use blitter API for copies between Images and Buffers
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:20 -08:00
Nanley Chery
91640c34c6 anv/meta: Add function which copies between Buffers and Images
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:15 -08:00
Nanley Chery
61ad78d0d1 anv/meta: Add function to create anv_meta_blit2d_surf from anv_image
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:10 -08:00
Nanley Chery
2e9b08b9b8 anv/meta: Implement the blitter API functions
Most of the code in anv_meta_blit2d() is borrowed from do_buffer_copy().

Create an image and image view for each rectangle.
Note: For tiled RGB images, ISL will align the image's row_pitch up to
the nearest tile width.

v2 (Jason):
    Keep pitch in units of bytes
    Make src_format and dst_format variables
    s/dest/dst/ in every usage
v3: Fix dst_image width

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:04 -08:00
Nanley Chery
032bf172b4 anv/meta: Modify blitter API fields
Some fields are unnecessary. The variables "pitch" and "bs" are used
for consistency with ISL.

v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:53 -08:00
Jason Ekstrand
654f79a045 anv/meta: Add the beginnings of a blitter API
This API is designed to be an abstraction that sits between the VkCmdCopy
commands and the hardware.  The idea is that it is simple enough that it
*should* be implementable using the blitter but with enough extra data that
we can implement it with the 3-D pipeline efficiently.  One design
objective is to allow the user to supply enough information that we can
handle most blit operations with a single draw call even if they require
copying multiple rectangles.
2016-03-03 11:24:45 -08:00
Nanley Chery
d1e48b9945 anv/meta: Remove redundancies in do_buffer_copy()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:42 -08:00
Nanley Chery
cfe7036750 anv/meta: Replace copy_format w/ block size in do_buffer_copy()
This is a preparatory commit that will simplify the future usage of
this function.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:38 -08:00
Nanley Chery
d50ff250ec anv/meta: Add missing command to exit meta in anv_CmdUpdateBuffer()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:21 -08:00
Nanley Chery
1d9d90d9a6 anv/image: Create a linear image when requested
If a linear image is requested, the only possible result should be a
linearly-tiled surface.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:17 -08:00
Nanley Chery
091f1da902 isl: Don't filter tiling flags if a specific tiling bit is set
If a specific bit is set, the intention to create a surface with a
specific tiling format should be respected.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:23:40 -08:00
Nanley Chery
456f5b0314 isl: Add function to get intratile offsets from x/y offsets
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 10:56:15 -08:00
Jason Ekstrand
206414f92e anv/util: Fix vector resizing
It wasn't properly handling the fact that wrap-around in the source may not
translate to wrap-around in the destination.  This really needs unit tests.
2016-03-03 08:17:36 -08:00
Antia Puentes
4f028bfcc0 i965: Enable the ARB_internalformat_query2 extension
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:08 +01:00
Eduardo Lima Mitev
cbbdf8612d i965/formatquery: Add support for INTERNALFORMAT_PREFERRED query
This pname is tricky. The spec states that an internal format should be
returned, that is compatible with the passed internal format, and has
at least the same precision. There is no clear API to resolve this.

The closest we have (and what other drivers (i.e, NVidia proprietary) do,
is to return the same internal format given as parameter. But we validate
first that the passed internal format is supported by i965.

To check for support, we have the TextureFormatSupported map'. But
this map expects a 'mesa_format', which takes a format+typen. So, we must
first "come up" with a generic type that is suited for this internal format,
then get a mesa_format, and then do the validation.

The cleanest solution here is to add a method that does exactly what
the spec wants: a driver's preferred internal format from a given
internal format. But at this point we lack a clear view of what
defines this preference, and also there seems to be no API for it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:08 +01:00
Eduardo Lima Mitev
e064f43485 mesa/glformats: Consider DEPTH/STENCIL when resolving a mesa_format
_mesa_format_from_format_and_type() is currently not considering DEPTH and
STENCIL formats, which are not array formats and are not handled anywhere.

This patch adds cases for common combinations of DEPTH/STENCIL format and
types.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
ec299602a6 mesa/formatquery: Add (GET_)TEXTURE_IMAGE_TYPE pnames
These basically reuse the default implementation of GL_READ_PIXELS_TYPE.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
23f94146c9 mesa/formatquery: Add (GET_)TEXTURE_IMAGE_FORMAT pnames
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
020671f2a3 mesa/formatquery: Add READ_PIXELS_TYPE pname
We call the driver to provide its preferred type, but also provide a
default implementation that selects a generic type based on the passed
internal format.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
bec286f724 mesa/formatquery: Add READ_PIXELS_FORMAT pname
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
09550c16a5 mesa/formatquery: Add support for READ_PIXELS query
This is supported since very early version of OpenGL, but we still call the
driver to give it the opportunity to report caveat or no support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Alejandro Piñeiro
8d7696f638 mesa/formatquery: added FILTER pname support
It discards out the targets and internalformats that explicitly
mention (per-spec) that doesn't support filter types other than
NEAREST or NEAREST_MIPMAP_NEAREST. Those are:

  * Texture buffers target
  * Multisample targets
  * Any integer internalformat

For the case of multisample targets, it was used the existing method
_mesa_target_allows_setting_sampler_parameter. This would scalate
better in the future if new targets appear that doesn't allow to
set sampler parameters.

We consider RENDERBUFFER to support LINEAR filters, because although
it doesn't support this filter for sampling, you can set LINEAR
on a blit operation using glBlitFramebuffer.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Alejandro Piñeiro
a8736a2567 mesa/texparam: make public target_allows_setting_sampler_parameters
In order to allow to be used on ARB_internalformat_query2 implementation.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
e8ab7727e1 mesa/formatquery: Added framebuffer renderability related queries
From the ARB_internalformat_query2 specification:

   "- FRAMEBUFFER_RENDERABLE: The support for rendering to the resource via
      framebuffer attachment is returned in <params>.

    - FRAMEBUFFER_RENDERABLE_LAYERED: The support for layered rendering to
      the resource via framebuffer attachment is returned in <params>.

    - FRAMEBUFFER_BLEND: The support for rendering to the resource
      via framebuffer attachment when blending is enabled is returned in
      <params>."

For all of them,
    "Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
     If the resource is unsupported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
b4ee9f56fd mesa/formatquery: Added texture gather/shadow related queries
From the ARB_internalformat_query2 specification:

   "- TEXTURE_SHADOW: The support for using the resource with shadow samplers
      is written to <params>.

    - TEXTURE_GATHER: The support for using the resource with texture gather
      operations is written to <params>.

    - TEXTURE_GATHER_SHADOW: The support for using resource with texture gather
      operations with shadow samplers is written to <params>."

For all of them,

    "Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
     If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
557939c08f mesa/formatquery: Added texture view related queries
From the ARB_internalformat_query2 specification:

   "- TEXTURE_VIEW: The support for using the resource with the TextureView
      command is returned in <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource or operation is not supported, NONE is returned.

    - VIEW_COMPATIBILITY_CLASS: The compatibility class of the resource when
      used as a texture view is returned in <params>. The compatibility
      class is one of the values from the /Class/ column of Table 3.X.2. If
      the resource has no other formats that are compatible, the resource
      does not support views, or if texture views are not supported, NONE is
      returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
04e2e0b24a mesa/textureview: Make _lookup_view_class public
It will be used by the ARB_internalformat_query2 implementation to
implement the VIEW_COMPATIBILITY_CLASS <pname> query.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
2066c7be61 mesa/formatquery: Added CLEAR_BUFFER <pname> query
From the ARB_internalformat_query2 specification:

   "- CLEAR_BUFFER: The support for using the resource with ClearBuffer*Data
      commands is returned in <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
aed633bb97 mesa/formatquery: Added compressed texture related queries
From the ARB_internalformat_query2 specification:

   "- TEXTURE_COMPRESSED: If <internalformat> is a compressed format
      that is supported for this type of resource, TRUE is returned in
      <params>. If the internal format is not compressed, or the type of
      resource is not supported, FALSE is returned.

    - TEXTURE_COMPRESSED_BLOCK_WIDTH: If the resource contains a compressed
      format, the width of a compressed block (in bytes) is returned in
      <params>. If the internal format is not compressed, or the resource
      is not supported, 0 is returned.

    - TEXTURE_COMPRESSED_BLOCK_HEIGHT: If the resource contains a compressed
      format, the height of a compressed block (in bytes) is returned in
      <params>. If the internal format is not compressed, or the resource
      is not supported, 0 is returned.

    - TEXTURE_COMPRESSED_BLOCK_SIZE: If the resource contains a compressed
      format the number of bytes per block is returned in <params>.  If the
      internal format is not compressed, or the resource is not supported,
      0 is returned.
      (combined with the above, allows the bitrate to be computed, and may be
      useful in conjunction with ARB_compressed_texture_pixel_storage)."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
467f462c75 mesa/formatquery: Added simultaneous texture and depth/stencil queries
From the ARB_internalformat_query2 specification:

   "- SIMULTANEOUS_TEXTURE_AND_DEPTH_TEST: The support for using the resource
      both as a source for texture sampling while it is bound as a buffer for
      depth test is written to <params>. For example, a depth (or stencil)
      texture could be bound simultaneously for texturing while it is bound as
      a depth (and/or stencil) buffer without causing a feedback loop, provided
      that depth writes are disabled.

    - SIMULTANEOUS_TEXTURE_AND_STENCIL_TEST: The support for using the resource
      both as a source for texture sampling while it is bound as a buffer for
      stencil test is written to <params>. For example, a depth (or stencil)
      texture could be bound simultaneously for texturing while it is bound as
      a depth (and/or stencil) buffer without causing a feedback loop,
      provided that stencil writes are disabled.

    - SIMULTANEOUS_TEXTURE_AND_DEPTH_WRITE: The support for using the resource
      both as a source for texture sampling while performing depth writes to
      the resources is written to <params>.  For example, a depth-stencil
      texture could be bound simultaneously for stencil texturing while it
      is bound as a depth buffer. Feedback loops cannot occur because sampling
      a stencil texture only returns the stencil portion, and thus writes to
      the depth buffer do not modify the stencil portions.

    - SIMULTANEOUS_TEXTURE_AND_STENCIL_WRITE: The support for using the resource
      both as a source for texture sampling while performing stencil writes to
      the resources is written to <params>.  For example, a depth-stencil
      texture could be bound simultaneously for depth-texturing while it is
      bound as a stencil buffer. Feedback loops cannot occur because sampling
      a depth texture only returns the depth portion, and thus writes to
      the stencil buffer could not modify the depth portions.

For all of them,
    "Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
     If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
bd45fb3de4 mesa/formatquery: Added queries related to image textures
From the ARB_internalformat_query2 specification:

   "- IMAGE_TEXEL_SIZE: The size of a texel when the resource when used as
      an image texture is returned in <params>.  This is the value from the
      /Size/ column in Table 3.22. If the resource is not supported for image
      textures, or if image textures are not supported, zero is returned.

    - IMAGE_COMPATIBILITY_CLASS: The compatibility class of the resource when
      used as an image texture is returned in <params>.  This corresponds to
      the value from the /Class/ column in Table 3.22. The possible values
      returned are IMAGE_CLASS_4_X_32, IMAGE_CLASS_2_X_32, IMAGE_CLASS_1_X_32,
      IMAGE_CLASS_4_X_16, IMAGE_CLASS_2_X_16, IMAGE_CLASS_1_X_16,
      IMAGE_CLASS_4_X_8, IMAGE_CLASS_2_X_8, IMAGE_CLASS_1_X_8,
      IMAGE_CLASS_11_11_10, and IMAGE_CLASS_10_10_10_2, which correspond to
      the 4x32, 2x32, 1x32, 4x16, 2x16, 1x16, 4x8, 2x8, 1x8, the class
      (a) 11/11/10 packed floating-point format, and the class (b)
      10/10/10/2 packed formats, respectively.
      If the resource is not supported for image textures, or if image
      textures are not supported, NONE is returned.

    - IMAGE_PIXEL_FORMAT: The pixel format of the resource when used as an
      image texture is returned in <params>.  This is the value
      from the /Pixel format/ column in Table 3.22. If the resource is not
      supported for image textures, or if image textures are not supported,
      NONE is returned.

    - IMAGE_PIXEL_TYPE: The pixel type of the resource when used as an
      image texture is returned in <params>.  This is the value from
      the /Pixel type/ column in Table 3.22. If the resource is not supported
      for image textures, or if image textures are not supported, NONE is
      returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
990a7200e0 mesa/shaderimage: Added func to get the GL_IMAGE_CLASS from the format
It will be used by the ARB_internalformat_query2 implementation to
implement the IMAGE_COMPATIBILITY_CLASS <pname> query.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
52c3692324 mesa/formatquery: Added SHADER_IMAGE_{LOAD,STORE,ATOMIC} <pname> queries
From the ARB_internalformat_query2 specification:

    "- SHADER_IMAGE_LOAD: The support for using the resource with image load
      operations in shaders is written to <params>.
      In this case the <internalformat> is the value of the <format> parameter
      that would be passed to BindImageTexture.

    - SHADER_IMAGE_STORE: The support for using the resource with image store
      operations in shaders is written to <params>.
      In this case the <internalformat> is the value of the <format> parameter
      that is passed to BindImageTexture.

    - SHADER_IMAGE_ATOMIC: The support for using the resource with atomic
      memory operations from shaders is written to <params>."

For all of them:

    "Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
     If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
876f7a7c08 mesa/shaderimage: Make is_image_format_supported public
It will be used by the ARB_internalformat_query2 implementation
to implement queries related to the ARB_shader_image_load_store
extension.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
fae2b10ff9 mesa/formatquery: Added queries related to texture sampling in shaders
From the ARB_internalformat_query2 specification:

   "- VERTEX_TEXTURE: The support for using the resource as a source for
      texture sampling in a vertex shader is written to <params>.

    - TESS_CONTROL_TEXTURE: The support for using the resource as a source for
      texture sampling in a tessellation control shader is written to <params>.

    - TESS_EVALUATION_TEXTURE: The support for using the resource as a source
      for texture sampling in a tessellation evaluation shader is written to
      <params>.

    - GEOMETRY_TEXTURE: The support for using the resource as a source for
      texture sampling in a geometry shader is written to <params>.

    - FRAGMENT_TEXTURE: The support for using the resource as a source for
      texture sampling in a fragment shader is written to <params>.

    - COMPUTE_TEXTURE: The support for using the resource as a source for
      texture sampling in a compute shader is written to <params>."

For all of them,

   "Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
    If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
aeb759c7d6 mesa/formatquery: Added SRGB_DECODE_ARB <pname> query
From the ARB_internalformat_query2 specification:

   "- SRGB_DECODE_ARB: The support for toggling whether sRGB decode happens at
      sampling time (see EXT/ARB_texture_sRGB_decode) for the resource is
      returned in <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
bcb2f9cdb9 mesa/formatquery: Added SRGB_{READ,WRITE} <pname> queries
From the ARB_internalformat_query2 specification:

   "- SRGB_READ: The support for converting from sRGB colorspace on read
      operations (see section 3.9.18) from the resource is returned in
      <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource or operation is not supported, NONE is returned.

    - SRGB_WRITE: The support for converting to sRGB colorspace on write
      operations to the resource is returned in <params>.
      This indicates that writing to framebuffers with this internalformat
      will encode to sRGB color spaces when FRAMEBUFFER_SRGB is enabled (see
      section 4.1.8).
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource or operation is not supported, NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
e88cbb7a51 mesa/formatquery: Added COLOR_ENCODING <pname> query.
From the ARB_internalformat_query2 specification:

   "- COLOR_ENCODING: The color encoding for the resource is returned in
      <params>.  Possible values for color buffers are LINEAR or SRGB,
      for linear or sRGB-encoded color components, respectively. For non-color
      formats (such as depth or stencil), or for unsupported resources,
      the value NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Eduardo Lima Mitev
b1755535ec mesa/glformats: Add a helper function _mesa_is_srgb_format()
Returns true if the passed format is an sRGB format, false otherwise.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
87b2de3998 mesa/formatquery: Added mipmap related <pname> queries
Specifically MIPMAP, MANUAL_GENERATE_MIPMAP and AUTO_GENERATE_MIPMAP <pname>
queries.

From the ARB_internalformat_query2 specification:

   "- MIPMAP: If the resource supports mipmaps, TRUE is returned in <params>.
      If the resource is not supported, or if mipmaps are not supported for
      this type of resource, FALSE is returned.

    - MANUAL_GENERATE_MIPMAP: The support for manually generating mipmaps for
      the resource is returned in <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource is not supported, or if the operation is not supported,
      NONE is returned.

    - AUTO_GENERATE_MIPMAP: The support for automatic generation of mipmaps
      for the resource is returned in <params>.
      Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
      If the resource is not supported, or if the operation is not supported,
      NONE is returned."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:07 +01:00
Antia Puentes
079d99b830 mesa/genmipmap: Added a function to validate the internalformat
It will be used by the ARB_internalformat_query2 implementation to
implement mipmap related queries.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
06852f4b7a mesa/genmipmap: Added a function to check if the target is valid
It will be used by the ARB_internalformat_query2 implementation to
implement mipmap related queries.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
df3a37311d mesa/formatquery: Added {COLOR,DEPTH,STENCIL}_RENDERABLE <pname> queries
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
c22ceb08bb mesa/formatquery: Added {COLOR,DEPTH,STENCIL}_COMPONENTS <pname> queries
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
e976a30db8 mesa/formatquery: support for MAX_COMBINED_DIMENSIONS
It is implemented combining the values returned by calls to the 32-bit
query _mesa_GetInternalformati32v.

The main reason is simplicity. The other option would be C&P how we
implemented the support of GL_MAX_{WIDTH/HEIGHT/DEPTH} and GL_SAMPLES.

Additionally, doing this way, we avoid adding checks on the code, as
are done by the call to the query itself.

MAX_COMBINED_DIMENSIONS is the only pname pointed on the spec of
needing a 64-bit query. We handle that possibility by packing the
returning value on the two first 32-bit integers of params. This
would work on the 32-bit query as far as the value is not greater
that INT_MAX. On the 64-bit query wrapper we unpack those values
in order to get the final value.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Eduardo Lima Mitev
c5cf16a4fc mesa/teximage: add _mesa_is_cube_map_texture utility method
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
4e33278b39 main/formatquery: support for MAX_{WIDTH/HEIGHT/DEPTH/LAYERS}
Implemented by calling GetIntegerv with the equivalent pname and
handling individually the exceptions related to dimensions.

All those pnames are used to get the maximum value for each dimension
of the given target. The only difference between this calls and
calling GetInteger with pnames like GL_MAX_TEXTURE_SIZE,
GL_MAX_3D_TEXTURE_SIZE, etc is that GetInternalformat allows to
specify a internalformat.

But at this moment, there is no reason to think that the values would
be different based on the internalformat. The spec already take that
into account, using these specific pnames as example on Issue 7 of
arb_internalformat_query2 spec.

So this seems like a hook to allow to return different values based on
the internalformat in the future.

It is worth to note that the piglit test associated to those pnames
are checking the returned values of GetInternalformat against the
values returned by GetInteger, and the test is passing with NVIDIA
proprietary drivers.

main/formatquery: support for MAX_{WIDTH/HEIGHT/DEPTH/LAYERS}

Implemented by calling GetIntegerv with the equivalent pname and
handling individually the exceptions related to dimensions.

All those pnames are used to get the maximum value for each dimension
of the given target. The only difference between this calls and
calling GetInteger with pnames like GL_MAX_TEXTURE_SIZE,
GL_MAX_3D_TEXTURE_SIZE, etc is that GetInternalformat allows to
specify a internalformat.

But at this moment, there is no reason to think that the values would
be different based on the internalformat. The spec already take that
into account, using these specific pnames as example on Issue 7 of
arb_internalformat_query2 spec.

So this seems like a hook to allow to return different values based on
the internalformat in the future.

It is worth to note that the piglit test associated to those pnames
are checking the returned values of GetInternalformat against the
values returned by GetInteger, and the test is passing with NVIDIA
proprietary drivers.

v2: use _mesa_has## instead of direct ctx->Extensions access (Nanley Chery)

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
b750144b0a mesa/formatquery: support for IMAGE_FORMAT_COMPATIBILITY_TYPE
From arb_internalformat_query2 spec:

 "IMAGE_FORMAT_COMPATIBILITY_TYPE: The matching criteria use for the
  resource when used as an image textures is returned in
  <params>. This is equivalent to calling GetTexParameter with <value>
  set to IMAGE_FORMAT_COMPATIBILITY_TYPE."

Current implementation of GetTexParameter for this case returns a
field of a texture object, so the support of this pname was
implemented creating a temporal texture object and returning that
value.

It is worth to mention that right now that field is not reassigned
after initialization. So it is somehow hardcoded. An alternative
option would be return that value. That doesn't seems really scalable
though.

v2: use _mesa_has## instead of direct ctx->Extensions access (Nanley Chery)

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
e98a3c799f mesa/formatquery: handle unmodified buffer for SAMPLES on the 64-bit query
From arb_internalformat_query2 spec:

" If <internalformat> is not color-renderable, depth-renderable, or
  stencil-renderable (as defined in section 4.4.4), or if <target>
  does not support multiple samples (ie other than
  TEXTURE_2D_MULTISAMPLE, TEXTURE_2D_MULTISAMPLE_ARRAY, or
  RENDERBUFFER), <params> is not modified."

So there are cases where the buffer should not be modified. As the
64-bit query is a wrapper over the 32-bit query, we can't just copy
the values to the equivalent 32-bit buffer, as that would fail if the
original params contained values greater that INT_MAX. So we need to
copy-back only the values that got modified by the 32-bit query. We do
that by filling the temporal buffer by negatives, as the 32-bit query
should not return negative values ever.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
580816b747 mesa/formatquery: initial implementation for GetInternalformati64v
It just does a wrapping on the existing 32-bit GetInternalformativ.

We will maintain the 32-bit query as default as it is likely that
it would be the one most used.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
7241e1b5f4 mesa/formatquery: Added INTERNALFORMAT_{X}_{SIZE,TYPE} <pname> queries
From the ARB_internalformat_query2 spec:

   "- INTERNALFORMAT_RED_SIZE
    - INTERNALFORMAT_GREEN_SIZE
    - INTERNALFORMAT_BLUE_SIZE
    - INTERNALFORMAT_ALPHA_SIZE
    - INTERNALFORMAT_DEPTH_SIZE
    - INTERNALFORMAT_STENCIL_SIZE
    - INTERNALFORMAT_SHARED_SIZE
      For uncompressed internal formats, queries of these values return the
      actual resolutions that would be used for storing image array components
      for the resource.
      For compressed internal formats, the resolutions returned specify the
      component resolution of an uncompressed internal format that produces
      an image of roughly the same quality as the compressed algorithm.
      For textures this query will return the same information as querying
      GetTexLevelParameter{if}v for TEXTURE_*_SIZE would return.
      If the internal format is unsupported, or if a particular component is
      not present in the format, 0 is written to <params>.

    - INTERNALFORMAT_RED_TYPE
    - INTERNALFORMAT_GREEN_TYPE
    - INTERNALFORMAT_BLUE_TYPE
    - INTERNALFORMAT_ALPHA_TYPE
    - INTERNALFORMAT_DEPTH_TYPE
    - INTERNALFORMAT_STENCIL_TYPE
      For uncompressed internal formats, queries for these values return the
      data type used to store the component.
      For compressed internal formats the types returned specify how components
      are interpreted after decompression.
      For textures this query returns the same information as querying
      GetTexLevelParameter{if}v for TEXTURE_*TYPE would return.
      Possible values return include, NONE, SIGNED_NORMALIZED,
      UNSIGNED_NORMALIZED, FLOAT, INT, UNSIGNED_INT, representing missing,
      signed normalized fixed point, unsigned normalized fixed point,
      floating-point, signed unnormalized integer and unsigned unnormalized
      integer components. NONE is returned for all component types if the
      format is unsupported."

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
675418182b mesa/main: Extend _mesa_get_format_bits to accept new pnames
The new pnames accepted by the function are:

        - INTERNALFORMAT_RED_SIZE
        - INTERNALFORMAT_GREEN_SIZE
        - INTERNALFORMAT_BLUE_SIZE
        - INTERNALFORMAT_ALPHA_SIZE
        - INTERNALFORMAT_DEPTH_SIZE
        - INTERNALFORMAT_STENCIL_SIZE

It will be used by the ARB_internalformat_query2 implementation to
implement those pnames.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
4a8dae6247 mesa/main: Extend _mesa_base_format_has_channel to accept new pnames
The new pnames accepted by the function are:

        - INTERNALFORMAT_RED_SIZE
        - INTERNALFORMAT_GREEN_SIZE
        - INTERNALFORMAT_BLUE_SIZE
        - INTERNALFORMAT_ALPHA_SIZE
        - INTERNALFORMAT_DEPTH_SIZE
        - INTERNALFORMAT_STENCIL_SIZE
        - INTERNALFORMAT_RED_TYPE
        - INTERNALFORMAT_GREEN_TYPE
        - INTERNALFORMAT_BLUE_TYPE
        - INTERNALFORMAT_ALPHA_TYPE
        - INTERNALFORMAT_DEPTH_TYPE
        - INTERNALFORMAT_STENCIL_TYPE

It will be used by the ARB_internalformat_query2 implementation to
implement those pnames.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
f1c789fa00 mesa/main: Make legal_get_tex_level_parameter_target public
It will be used by the ARB_internalformat_query2 implementation
to check if the target is valid for those <pnames> that are said
in the spec that should return the same values than the
'glGetTexLevelParameter{if}v' function:

    - INTERNALFORMAT_RED_SIZE
    - INTERNALFORMAT_GREEN_SIZE
    - INTERNALFORMAT_BLUE_SIZE
    - INTERNALFORMAT_ALPHA_SIZE
    - INTERNALFORMAT_DEPTH_SIZE
    - INTERNALFORMAT_STENCIL_SIZE
    - INTERNALFORMAT_SHARED_SIZE
    - INTERNALFORMAT_RED_TYPE
    - INTERNALFORMAT_GREEN_TYPE
    - INTERNALFORMAT_BLUE_TYPE
    - INTERNALFORMAT_ALPHA_TYPE
    - INTERNALFORMAT_DEPTH_TYPE
    - INTERNALFORMAT_STENCIL_TYPE
    - IMAGE_FORMAT_COMPATIBILITY_TYPE

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Eduardo Lima Mitev
eacb2c971e mesa/formatquery: Added INTERNALFORMAT_PREFERRED pname
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
56ec2dfcb1 mesa/formatquery: Added the INTERNALFORMAT_SUPPORTED <pname> query
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
4722abc630 mesa/formatquery: Added a func to check <internalformat> supported
From the ARB_internalformat_query2 specification:

  "The INTERNALFORMAT_SUPPORTED <pname> can be used to determine if
   the internal format is supported, and the  other <pnames> are defined
   in terms of whether or not the format is supported."

v2: Consider also FBO base formats when checking if the internalformat is
    supported.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
5f6e3a0370 mesa/formatquery: Added func to check if the 'resource' is supported
Checks that the 'resource', as defined by the ARB_internalformat_query2
specification, is supported by the implementation for those 'pnames'
that require this check.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
95392cfa9d mesa/main: not fill mesa_error on _mesa_legal_texture_base_format_for_target
This would allow to use this method if you are just querying if it is
allowed, like for arb_internalformat_query2.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
aaf5ad513b mesa/teximage: Make _mesa_format_no_online_compression public
It will be used by the ARB_internalformat_query2 implementation
to check if a certain compressed 'internalformat' is supported
by texture 'targets'.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
5eef355823 mesa/teximage: make public is_renderable_texture_format
It will be used by the ARB_internalformat_query2 implementation
to check if the 'internalformat' passed is supported by texture
MULTISAMPLE 'targets'.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
b5d27bc5dd mesa/main: Added empty skeleton of glGetInternalformati64v
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Alejandro Piñeiro
2453bba504 mesa: Add dispatch and extension XML for GL_ARB_internalformat_query2
Equivalent to commit bda540 (that added GL_ARB_internalformat_query)

v2: include the new xml to to API_XML list at Makefile.am (Emil Velikov)

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:06 +01:00
Antia Puentes
d432337e2d mesa/formatquery: Added boilerplate code to extend GetInternalformativ
The goal is to extend the GetInternalformativ query to implement the
ARB_internalformat_query2 specification, keeping the behaviour defined
by the ARB_internalformat_query if ARB_internalformat_query2 is not
supported.

v2: Don't require ARB_internalformat_query when profile is GLES3.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Antia Puentes
806bc2bf22 mesa/formatquery: Added a func to check if the <target> is supported
From the ARB_internalformat_query2 spec:

  "If the particular <target> and <internalformat> combination do not make
   sense, or if a particular type of <target> is not supported by the
   implementation the "unsupported" answer should be given. This is not an
   error."

This function checks if the <target> is supported by the implementation.

v2: Allow RENDERBUFFER targets also on GLES 3 profiles.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Antia Puentes
4af3e5e9f1 mesa/formatquery: Added function to set 'unsupported' responses
The ARB_internalformat_query2 specification defines which is the
reponse best representing "not supported" or "not applicable" for
each <pname>.

Queries for unsupported features, targets, internalformats, combinations
of: target and internalformat, target and pname, pname and internalformat,
do not return an error but the corresponding 'unsupported' response.
We will use that response as the default answer.

For SAMPLES the 'unsupported' response is to not modify the 'params' buffer.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Antia Puentes
a6434f41cc mesa/formatquery: Added function to validate parameters
Handles the cases where an error should be returned according
to the ARB_internalformat_query and ARB_internalformat_query2
specifications.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Antia Puentes
b89463cdfd mesa/main: Add extension tracking bit for ARB_internalformat_query2
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
a347a0f53f mesa: Completely remove QuerySamplesForFormat from driver func table
At this point, all uses have been replaced by the more general hook
QueryInternalFormat, introduced by ARB_internalformat_query2.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
993d7345b7 mesa/formatquery: Use new driver hook QueryInternalFormat
Implements SAMPLES and NUM_SAMPLE_COUNTS queries using the new generic
driver call QueryInternalFormat, which is being introduced as replacement
of QuerySamplesForFormat to support ARB_internalformat_query2.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
25ee5c60dc mesa/formatquery: Remove tracking of number of elements in the response
Currently, the number of integers returned in the response to
GetInternalFormativ is being tracked by a 'count' variable.
This is so only the modified elements from the temporary buffer are copied into
the original user buffer.

However, with the introduction of ARB_internalformat_query2, keeping track
of 'count' would complicate the code a lot, considering the high number of
queries.

So, we propose to forget about tracking count, and move all the 16 elements
in the temporary buffer, back to the user buffer (clamped to user buffer size
of course). This is basically a trade-off between performance and code clarity.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
1f0b2ce8ec mesa/multisample: Check sample count using the new driver hook
Use QueryInternalFormat instead of QuerySamplesForFormat to obtain the
highest supported sample. QuerySamplesForFormat is to be removed.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
ee31b0b1d0 st/format: Replace QuerySamplesForFormat by new QueryInternalFormat hook
The previous code for SAMPLES and NUM_SAMPLE_COUNTS is reused as a private function.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
82be7735f3 i965/formatquery: Respond queries SAMPLES and NUM_SAMPLE_COUNTS
This effectively disables old QuerySamplesForFormat driver hook, since it is
never called by Mesa anymore.

v2: Call brw_query_samples_for_format() with a dummy buffer to calculate num
    samples, to avoid modifying the original buffer.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
2dabff9068 i965: Move brw_query_samples_for_format() to brw_queryformat.c
Now that there is a dedicated source file for internal format queries, this
function belongs there.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
28144c4476 i965: Add boilerplate function for QueryInternalFormat driver hook
By default, we call back the driver's hook fallback function that has generic
implementations for the all the queries.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
45054f9702 mesa: Add a default QueryInternalFormat() function for drivers
This is a fallback function for drivers not implementing
ARB_internalformat_query2.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Eduardo Lima Mitev
93d30c3de9 mesa: Add QueryInternalFormat to device driver virtual table
This new function queries different driver parameters for a particular target
and texture format. It is basically a driver hook to support
ARB_internalformat_query2.

Since ARB_internalformat_query2 introduced several new query parameters
over ARB_internalformat_query, having one driver hook for each parameter
is no longer feasible. So this is the generic entry-point for calls
to glGetInternalFormativ and glGetInternalFormati64v.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:05 +01:00
Iago Toral Quiroga
283c8372cb glsl/opt_array_splitting: Fix indentation
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-03 09:12:41 +01:00
Iago Toral Quiroga
4a60002424 glsl/opt_array_splitting: Fix crash when doing array indexing into other arrays
When we find indirect indexing into an array, the current implementation
of the array spliiting optimization pass does not look further into the
expression tree. However, if the variable expression involves variable
indexing into other arrays, we can miss that these other arrays also have
variable indexing. If that happens, the pass will crash later on after
hitting an assertion put there to ensure that split arrays are in fact
always indexed via constants:

shader_runner: opt_array_splitting.cpp:296:
void ir_array_splitting_visitor::split_deref(ir_dereference**): Assertion `constant' failed.

This patch fixes the problem by letting the pass step into the variable
index expression to identify these cases properly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89607
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-03 09:02:30 +01:00
Oded Gabbay
914d4967d7 radeonsi: Do colorformat endian swap for PIPE_USAGE_STAGING
There is an old if statement (dated to 2011) that prevented doing
endian swap for colorformat, in case the buffer is marked as
PIPE_USAGE_STAGING.

This is now wrong because st_ReadPixels() reads into a destination
texture that is marked with PIPE_USAGE_STAGING. Therefore, even if
the texture is rendered correctly to the monitor, when reading it
back we get unswapped/wrong values.

This patch makes the check_rgba() function in gl-1.0-readpixsanity
piglit test pass in big-endian.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-03 09:20:08 +02:00
Oded Gabbay
ef5183faea r600g: Do colorformat endian swap for PIPE_USAGE_STAGING
There is an old if statement (dated to 2011) that prevented doing
endian swap for colorformat, in case the buffer is marked
as PIPE_USAGE_STAGING.

This is now wrong because st_ReadPixels() reads into a destination
texture that is marked with PIPE_USAGE_STAGING. Therefore, even if
the texture is rendered correctly to the monitor, when reading it
back we get unswapped/wrong values.

This patch makes the check_rgba() function in gl-1.0-readpixsanity
piglit test pass in big-endian.

v2: removed duplicate call to r600_colorformat_endian_swap() inside
evergreen_init_color_surface_rat()

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-03 09:20:08 +02:00
Tim Rowley
7bb193d28c mesa/build: add OpenSWR to build
Tested on Linux (centos, ubuntu, and suse variants)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:42 -06:00
Tim Rowley
d003be2a30 gallium/docs - add OpenSWR documentation
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:41 -06:00
Tim Rowley
da4f95d168 gallium/target-helpers: add OpenSWR driver
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:41 -06:00
Tim Rowley
ea37602273 gallium/auxilary: more __cplusplus exports
swr driver which is written in C++ needs access to some more
gallium utility functions than are currently exposed.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:41 -06:00
Tim Rowley
c6e67f5a93 gallium/swr: add OpenSWR rasterizer
Acked-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:41 -06:00
Tim Rowley
2b2d3680bf gallium/swr: add OpenSWR driver
OpenSWR is a new software rasterizer for x86 processors designed
for high performance and high scalablility on visualization workloads.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-02 18:38:41 -06:00
Timothy Arceri
2eec41f6f1 glsl: replace remaining tabs in ir_builder.cpp
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2016-03-03 11:25:57 +11:00
Anuj Phogat
7026f27e33 mesa: Update comment
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-02 15:06:46 -08:00
Anuj Phogat
6ccead5b48 mesa: Fix function description
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-03-02 15:06:46 -08:00
Anuj Phogat
de61849994 meta: Remove the 'allocate_storage' parameter in _mesa_meta_pbo_GetTexSubImage()
Texture is already allocated before calling this meta function. So,
the value of 'allocate_storage' passed to the function is always false.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-02 15:06:46 -08:00
Anuj Phogat
6d4ebbe9e5 meta: Fix the pbo usage in meta for GLES{1,2} contexts
OpenGL ES 1.0 doesn't support using GL_STREAM_DRAW and both
ES 1.0 and 2.0 don't support GL_STREAM_READ in glBufferData().
So, handle it correctly by calling the _mesa_meta_begin()
before create_texture_for_pbo().

V2: Remove the changes related to allocate_storage. (Ian)

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-02 15:06:45 -08:00
Matt Turner
0d047d10f1 program: Clean up after condition code removal. 2016-03-02 12:15:58 -08:00
Matt Turner
961ead6746 program: Remove variable used only in assert(). 2016-03-02 12:15:58 -08:00
Matt Turner
de2ef0401b program: Drop GL_FRAGMENT_PROGRAM_NV from switch statement. 2016-03-02 12:15:58 -08:00
Jordan Justen
98cdce1ce4 anv/gen7: Use predicated rendering for indirect compute
For OpenGL, see commit 9a939ebb47.

Fixes:
 * dEQP-VK.compute.indirect_dispatch.upload_buffer.empty_command
 * dEQP-VK.compute.indirect_dispatch.gen_in_compute.empty_command

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-02 12:03:05 -08:00
Jordan Justen
da4745104c anv: Save batch to local variable for indirect compute
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-02 12:03:05 -08:00
Jason Ekstrand
b0867ca4b2 anv: Fix make check 2016-03-02 11:45:29 -08:00
Samuel Pitoiset
b94a46aa8e gk110/ir: fix wrong emission of NOT modifier for VOTE
Spotted by Coverity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-02 20:36:18 +01:00
Jason Ekstrand
2168082a48 isl: Fix make check 2016-03-02 11:31:22 -08:00
Jason Ekstrand
8f5a64e44f gen8/cmd_buffer: Properly return flushed push constant stages
This is required on SKL so that we can properly re-emit binding table
pointers commands.
2016-03-02 10:48:40 -08:00
Thomas Hindoe Paaboel Andersen
535002f4da gallium/cso: fix indentation
Only one of these were recently introduced. However, since
we keep copy/pasting the same wrong indentation we should
probably just fix it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-02 08:55:20 -07:00
Thomas Hindoe Paaboel Andersen
37cfc51b13 st/mesa: move dereference after null check
We should not dereference shader before we have done the
null check.

Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-02 08:55:20 -07:00
Matt Turner
ad17511302 i965/gen6/gs: Replace V-immediate with VF-immediate.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-02 07:28:52 -08:00
Marek Olšák
43f74ac67c gallium: fix PIPE_BIND_QUERY_BUFFER - PIPE_BIND_SCANOUT overlap
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-02 15:32:52 +01:00
Samuel Iglesias Gonsálvez
e8fd60e789 i965: set ctx->Const.MaxViewport{Width,Height} to 32k
From ARB_viewport_array spec:

" * On GL3-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
    [-16384, 16383].
  * On GL4-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
    [-32768, 32767]."

This range is set using ctx->Const.MaxViewportWidth value, so just bump
those constants to 32k for gen7+ which can support OpenGL 4.0.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-02 07:19:01 +01:00
Samuel Iglesias Gonsálvez
add57b3fa8 main: remove MAX_VIEWPORT_WIDTH and MAX_VIEWPORT_HEIGHT constants
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-02 07:19:01 +01:00
Samuel Iglesias Gonsálvez
aa849d97a0 main: call invalidate_framebuffer_storage() with driver's viewport limits
Don't use hardcoded ones because the driver can set different ones.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-02 07:19:01 +01:00
Jason Ekstrand
5b70aa11ee anv/meta_blit: Use unorm formats for 8 and 16-bit RGB and RGBA values
While Broadwell is very good about UINT formats, HSW is more restrictive.
Neither R8G8B8_UINT nor R16G16B16_UINT really exist on HSW.  It should be
safe to just use the unorm formats.
2016-03-01 21:45:20 -08:00
Kenneth Graunke
89e421369c Merge remote-tracking branch 'origin/master' into vulkan 2016-03-01 17:11:29 -08:00
Rob Clark
c4ae047cab freedreno/ir3: enable shareable shaders
Now that we are no longer using the pctx reference in the shader, drop
it and turn on shareable shaders.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-01 19:21:45 -05:00
Rob Clark
c3f2f8cbe4 freedreno/ir3: pass ctx to constant-emit code
Rather than fishing it out of the shader.  This removes the other big
user of shader->pctx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-01 19:20:44 -05:00
Rob Clark
5fd152bae8 freedreno/ir3: add dev ptr to ir3_compiler
And use this for allocating bo's to hold the shader binary, rather than
accessing the dev via ctx ptr.  One step towards making shaders sharable
across contexts.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-01 19:20:33 -05:00
Jason Ekstrand
e941fd8470 genxml: Make the border color pointer consistent across gens 2016-03-01 14:43:05 -08:00
Jason Ekstrand
eecd1f8001 gen7/pipeline: Add competent blending
This is mostly a copy-and-paste from gen8.  Blending still isn't 100% but
it fixes about 1100 CTS blend tests on HSW.
2016-03-01 13:51:58 -08:00
Jason Ekstrand
8b091deb5e anv: Unify gen7 and gen8 state
Now that we've pulled surface state setup into ISL, there's not much to do
here.
2016-03-01 12:17:23 -08:00
Matt Turner
1be953797e mesa: Remove NV_fragment_program remnants from dlist.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:30 -08:00
Matt Turner
89abb22a85 mesa: Remove NV_fragment_program_option enable bit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:30 -08:00
Matt Turner
ed72a1c118 program: Remove NV_fragment_program opcode parsing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
5429554f09 program: Remove NV_fragment_program scalar suffix parsing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
409c24f9cc program: Remove NV_fragment_program_option parsing support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
fe2d2c7ad8 program: Remove NV_fragment_program Abs support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
0d1f6c752f program: Remove incorrect comment about OPCODE_TXD.
The table in prog_instruction.h is correct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
624d06708d program: Remove OPCODE_TXP_NV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
aaef6cf4e3 program: Clean up after previous commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
7b50b0457d program: Remove condition-code and precision support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
9e11ff7e11 program: Remove OPCODE_KIL_NV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
a0c3650ad3 program: Remove RelAddr2 support.
Looks like more never-used crap from the first geometry shader attempt.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
6b1fb4862e program: Mark table const.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
fc61b41a95 mesa: Remove EmitCondCodes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
7fe206da28 docs: Remove descriptions of long dead Emit* fields.
Dead since commit d8a366200 in 2010.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2016-03-01 11:41:29 -08:00
Matt Turner
f3b68fc5fc glsl: Initialize gl_shader_program::EmptyUniformLocations.
Commit 65dfb30 added exec_list EmptyUniformLocations, but only
initialized the list if ARB_explicit_uniform_location was enabled,
leading to crashes if the extension was not available.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-03-01 11:41:29 -08:00
Ian Romanick
1a80ca22fe i965/meta: Don't pollute the framebuffer namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
8f1b1878a0 i965/meta: Use _mesa_bind_framebuffers instead of _mesa_BindFramebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
3071da3032 meta: Don't pollute the framebuffer namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Fixes piglit tests:
    - object-namespace-pollution glGetTexImage-compressed framebuffer
    - object-namespace-pollution glGenerateMipmap framebuffer

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
91e5825b8a meta/decompress: Track framebuffer using gl_framebuffer instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
3ed44fab18 meta/generate_mipmap: Track framebuffer using gl_framebuffer instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
ec5757f9c9 meta: Use _mesa_bind_framebuffers instead of _mesa_BindFramebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
7c254f0200 meta: Use _mesa_CreateFramebuffers instead of _mesa_GenFramebuffers
This enables later patches that will stop calling _mesa_GenFramebuffers
or _mesa_CreateFramebuffers which pollute the framebuffer namespace.

For framebuffers, the Bind call is still necessary.

sed -i -e 's/_mesa_GenFramebuffers/_mesa_CreateFramebuffers/' \
    src/mesa/drivers/common/*.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:20 -08:00
Ian Romanick
6b70c9ea98 i965/meta: Use _mesa_CreateFramebuffers instead of _mesa_GenFramebuffers
This enables later patches that will stop calling _mesa_GenFramebuffers
or _mesa_CreateFramebuffers which pollute the framebuffer namespace.

For framebuffers, the Bind call is still necessary.

sed -i -e 's/_mesa_GenFramebuffers/_mesa_CreateFramebuffers/' \
    src/mesa/drivers/dri/i965/*.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Ian Romanick
f76462cb6f meta: Save and restore the framebuffer using gl_framebuffer instead of GL API object handle
Some meta operations can be called recursively.  Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used.  If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.

This also fixes another latent bug in meta.  In a multithreaded,
multicontext application, one thread can delete an object that is bound
in another thread.  That object continues to exist until it is unbound
(i.e., its refcount drops to zero).  Meta unbinds objects all over the
place.  As a result, the rebind in _mesa_meta_end could fail because the
object vanished!

See https://bugs.freedesktop.org/show_bug.cgi?id=92363#c8.

Using _mesa_reference_<object type> to save and restore the objects
prevents the refcount from going to zero.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Ian Romanick
fed9b0ed5a mesa: Refactor bind_framebuffer to make _mesa_bind_framebuffers
Fixing dd_function_table::BindFramebuffer will come later because that
change is probably not suitable for stable.

v2: Fix whitespace issue noticed by Topi.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Ian Romanick
64aff35f84 meta: Use _mesa_check_framebuffer_status instead of _mesa_CheckFramebufferStatus
sed -i -e 's/_mesa_CheckFramebufferStatus(GL_DRAW_FRAMEBUFFER/_mesa_check_framebuffer_status(ctx, ctx->DrawBuffer/' \
    -e 's/_mesa_CheckFramebufferStatus(GL_FRAMEBUFFER[^)]*/_mesa_check_framebuffer_status(ctx, ctx->DrawBuffer/' \
    -e 's/_mesa_CheckFramebufferStatus(GL_READ_FRAMEBUFFER/_mesa_check_framebuffer_status(ctx, ctx->ReadBuffer/' \
    $(grep -rl _mesa_CheckFramebufferStatus src/mesa/drivers)

The second expression catches both GL_FRAMEBUFFER and GL_FRAMEBUFFER_EXT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Ian Romanick
92266ff7a3 meta: Obvious refactor of _mesa_meta_framebuffer_texture_image
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Ian Romanick
f69c743069 meta: Convert _mesa_meta_bind_fbo_image to take a gl_framebuffer instead of a GL API handle
Also change the name of the function to
_mesa_meta_framebuffer_texture_image.  The function is basically a
wrapper around _mesa_framebuffer_texture (which is used to implement
glFramebufferTexture1D and friends), so it makes sense for it's name to
be similar to that.

The next patch will clean _mesa_meta_framebuffer_texture_image up
considerably.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-03-01 11:07:19 -08:00
Jason Ekstrand
6e20c1e058 anv/cmd_buffer: Look at both sides for stencil enable
Now it's all consistent with gen9
2016-03-01 11:03:29 -08:00
Jason Ekstrand
4cfdd16500 anv/cmd_buffer: Clean up stencil state setup on gen7 2016-03-01 11:02:21 -08:00
Jason Ekstrand
bb08d86efe anv/cmd_buffer: Clean up stencil state setup on gen8 2016-03-01 10:58:43 -08:00
Kristian Høgsberg Kristensen
22d8666d74 anv: Add in image->offset when setting up depth buffer
Fix from Neil Roberts.

https://bugs.freedesktop.org/show_bug.cgi?id=94348
2016-03-01 09:19:39 -08:00
Jason Ekstrand
38f4c11c2f anv/pipeline: Pull 3DSTATE_SBE into a shared helper 2016-03-01 08:46:32 -08:00
Dave Airlie
ac222626ad virgl: add support for passing render condition flags to host.
This just passes the extra blit info to fix the render condition
tests.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-01 15:50:00 +10:00
Jason Ekstrand
3f8df795c1 genxml: Break output detail of 3DSTATE_SBE on gen7 into a struct
This makes it work like 3DSTATE_SBE_SWIZ on gen8+ which is much more
convenient.
2016-02-29 16:47:42 -08:00
Kenneth Graunke
24994ae926 i965: Push most TES inputs in vec4 mode.
(This is commit 4a1c8a3037 for vec4 mode.)

Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache.  Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.

However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs.  An arbitrary cut-off of
24 vec4 slots (12 registers) should suffice.  (I chose this instead of
the 32 vec4 slots used in the scalar backend to avoid regressing a few
Piglit tests due to the vec4 register allocator being too stupid to
figure out what to do.  We probably ought to fix that, but it's a
separate issue.)

Improves performance in GPUTest's tessmark_x64 microbenchmark by
41.5394% +/- 0.288519% (n = 115) at 1024x768 on my Clevo W740SU
(with Iris Pro 5200).

Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by
38.3576% +/- 0.759748% (n = 42).

v2: Simplify abs/negate handling, as requested by Matt.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-29 16:12:50 -08:00
Marek Olšák
c54f38494c r600g: remove support for DRM < 2.12.0 2016-03-01 00:18:54 +01:00
Marek Olšák
b7da8fa11d r300g: remove support for DRM < 2.12.0 2016-03-01 00:18:54 +01:00
Marek Olšák
a5e2a173dd winsys/radeon: drop support for DRM 2.12.0 (kernel < 3.2)
in order to make some winsys interface changes easier

This distros should use new DRM if they want to use new Mesa:
  Distro    kernel  mesa    eol
  SLES 10   2.6.16  6.4.2   2016-07
  SLED 11   3.0     9.0.3   2022-03
  RHEL 5    2.6.18  6.5.1   2017-03
  RHEL 6    2.6.32  10.4.3  2020-11
  Debian 6  2.6.32  7.7.1   2016-02

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:54 +01:00
Marek Olšák
69a8e435ce radeonsi: also dump shaders on a VM fault
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-03-01 00:18:54 +01:00
Marek Olšák
18df72b50b radeonsi: dump full shader disassemblies into ddebug logs
including prolog and epilog disassemblies

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:54 +01:00
Marek Olšák
74b4ce81fb radeonsi: allow dumping shader disassemblies to a file
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:54 +01:00
Marek Olšák
d0f3b524cd radeonsi: use re-Z
This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).

v2: add comments

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-01 00:18:19 +01:00
Marek Olšák
09bfbd43a0 tgsi/scan: count memory instructions
for radeonsi

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-01 00:11:32 +01:00
Jason Ekstrand
097564bb8e anv/cmd_buffer: Dirty push constants when changing pipelines. 2016-02-29 14:36:24 -08:00
Jason Ekstrand
d29fd1c7cb anv/cmd_buffer: Re-emit push constants packets for all stages 2016-02-29 14:36:24 -08:00
Jason Ekstrand
9715724015 anv/pipeline: Follow push constant alignment restrictions on BDW+ and HSW gt3 2016-02-29 14:36:24 -08:00
Jason Ekstrand
6986ae35ad anv/pipeline: Avoid a division by zero 2016-02-29 14:36:24 -08:00
Jason Ekstrand
51b618285d anv/pipeline: Use dynamic checks for max push constants
The GEN_GEN macros aren't available in anv_pipeline since it only gets
compiled once for the whold driver.
2016-02-29 14:36:24 -08:00
Dave Airlie
35859d5bbb mesa/fbobject: propogate Layered when reusing attachments.
When reusing a depth attachment as a stencil, we need to propogate
the layered bit, otherwise we fail to complete the framebuffer.

discovered running ./bin/fbo-depth-array depth-layered-clear
on virgl on haswell.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-03-01 07:34:37 +10:00
Nanley Chery
74b7b59db5 isl/surface_state: Fix array spacing on Gen7
v2: Don't cast the enum to a boolean (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-29 11:43:33 -08:00
Kristian Høgsberg Kristensen
9d8bae6137 anv: Don't advertise pipelineStatisticsQuery
We don't support that just yet.

Reported-by: Jacek Konieczny <jajcus@jajcus.net>
2016-02-29 10:55:39 -08:00
Axel Davy
83bc2acfe9 st/nine: Fix second Multithreading issue with MANAGED buffers
Here is another threading issue with MANAGED buffers:

Thread 1: buffer creation
Thread 1: buffer lock
Thread 2: Draw call
Thread 1: writes data
Thread 1: Unlock

Without this patch, the buffer is initially dirty
and in the list of things to upload after its creation.
The draw call will then upload the data and unset the dirty flag,
and the Unlock won't trigger a second upload.

Fixes regression introduced by cc0114f30b:
"st/nine: Implement Managed vertex/index buffers"

Cc: "11.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-29 18:55:58 +01:00
Axel Davy
44246fe99d st/nine: Fix Multithreading issue with MANAGED buffers
d3d calls are protected by mutexes, however if app is doing in
two threads:

Thread 1: buffer Lock
Thread 2: Draw call
Thread 1: writes data
Thread 1: Unlock

Then before this patch, the Draw call would begin to upload
the buffer.

Solves this by moving the moment we add the buffer to the queue
of things to upload (We move it from Lock time to Unlock time).

Cc: "11.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-29 18:55:58 +01:00
Axel Davy
35c858c42c st/nine: Handle READONLY for buffer MANAGED pool
READONLY won't trigger an upload.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-29 18:55:58 +01:00
Axel Davy
8a8affdfda st/nine: Use Position input helper for ps3 declared inputs
When the semantic is Position (which can happen with index 0 only),
use the helper to get Position input.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-29 18:55:58 +01:00
Axel Davy
f08c990af5 st/nine: Introduce helper for Position shader input
Cc: "11.2" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-29 18:55:58 +01:00
Marc-André Lureau
f1d12e7392 virtio_gpu: Add virtio 1.0 PCI ID to driver map
Add the virtio-gpu PCI ID for virtio 1.0 (according to the
specification, "the PCI Device ID is calculated by adding 0x1040 to the
Virtio Device ID")

Support for virtio 1.0 was added in qemu 2.4 (same time virtio-gpu
landed).

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 11:31:36 +00:00
Koop Mast
04bc09fdf9 st/clover: Add libelf cflags to the build
Otherwise the build will fail, when the library is in a non default
location.

v2 [Emil Velikov]
 - drop the unneeded cflags from targets/opencl.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Fixes: 7f585a6a98 "configure.ac: use pkg-config for libelf"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93524
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 11:30:15 +00:00
Emil Velikov
c212a70cd9 mesa; add get-extra-pick-list.sh script into bin/
This is a very rudimentary script that checks if any of the applied
cherry-picks have been referenced (fixed?) by another patch. With the
latter either missing the stable tag or hasn't yet been picked.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-02-29 11:25:35 +00:00
Emil Velikov
64500f21f3 automake: explicitly set distcheck configure flags
Pretty much all of these are enabled by default. Considering the recent
updates (see previous commits) one might as well list most/all of these
here.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:45 +00:00
Emil Velikov
325bc6fb4a automake: add more missing options for make distcheck
Namely - opencl, osmesa (only the gallium flavour as it conflicts with
the classic one), surfaceless egl platform and a couple gallium drivers
(virgl and vc4).

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:45 +00:00
Emil Velikov
0b6157e971 install-gallium-links: port changes from install-lib-links
Namely:
b662d5282f mesa: Add clean-local rule to remove .lib links.
5c1aac17ad install-lib-links: don't depend on .libs directory
fece147be5 install-lib-links: remove the .install-lib-links file

With these in place, make distcheck now passes and a race condition has
been avoided.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:45 +00:00
Rob Herring
51b22bd468 r600: Make enum alu_op_flags unsigned
In builds with clang, there are several errors related to the enum
alu_op_flags like this:

src/gallium/drivers/r600/sb/sb_expr.cpp:887:8:
error: case value evaluates to -1610612736, which cannot be narrowed to
type 'unsigned int' [-Wc++11-narrowing]

These are due to the MSB being set in the enum. Fix these errors by
making the enum values unsigned as needed. The flags field that stores
this enum also needs to be unsigned.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-29 10:51:45 +00:00
Rob Herring
92dd38df5a gallium/radeon: Add space between string literal and identifier
Fix compiles with clang that have this C++11 error:

src/gallium/drivers/radeon/r600_pipe_common.h:662:34:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-29 10:51:45 +00:00
Rob Herring
0156a33aa3 freedreno: drop unnecessary -Wno-packed-bitfield-compat
Enabling this warning doesn't generate any warnings with gcc, but is an
unknown option for clang, so drop it.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Rob Clark <robdclark@gmail.com> (v1)

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
v2: keep the warning around, commented out
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:45 +00:00
Rob Herring
8949edf018 Android: clean-up and fix DRI module path handling
MESA_DRI_MODULE_PATH is only getting set for classic DRI drivers and may or
may not be set correctly for gallium_dri.so depending on the makefile
include ordering. For Android 6 and earlier it is fine, but with build
system changes in AOSP master, it is not.

Move the path variables to a single place at the top level and introduce
MESA_DRI_MODULE_REL_PATH for Android 5 and later which require relative
paths. With this, there is a single variable to change.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Rob Herring
0663edf85b Android: remove headers from LOCAL_SRC_FILES
The Android build system now spits out warnings for header files listed in
LOCAL_SRC_FILES, so strip them out.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Rob Herring
6dae9176d6 Android: add -Wno-date-time flag for clang
clang complains about date/time macros:

src/mesa/main/context.c:403:25: error: expansion of date or time macro is not reproducible [-Werror,-Wdate-time]

Disable this warning.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Rob Herring
a2f16db19b Android: glsl: fix dependence on YACC_HEADER_SUFFIX from build system
The makefile was implicitly picking up YACC_HEADER_SUFFIX from the Android
build system, but this variable is now gone. Add it locally to fix the
build with AOSP master.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Rob Herring
794221fbb7 Android: remove dependence on .SECONDEXPANSION
With the Android build system changes to ninja/kati, the use of
.SECONDEXPANSION is no longer supported. Fix this by avoiding rule specific
variables and using $(transform-generated-source).

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Rob Herring
574a92b048 Android: fix build break from nir/glsl move to compiler/
Commits a39a8fbbaa ("nir: move to compiler/") and eb63640c1d
("glsl: move to compiler/") broke Android builds. Fix them.

There is also a missing dependency between generated NIR headers and
several libraries. This isn't a new issue, but seems to have been
exposed by the NIR move.

Built with i915, i965, freedreno, r300g, r600g, vc4, and virgl enabled.

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-29 10:51:44 +00:00
Oded Gabbay
a640ad15e1 gallium/radeon: disable evergreen_do_fast_color_clear for BE
This function is currently broken for BE. I assume it's because of
util_pack_color(). Until I fix this path, I prefer to disable it so users
would be able to see correct colors on their desktop and applications.

Together with the two following patches:
- gallium/r600: Don't let h/w do endian swap for colorformat
- gallium/radeon: remove separate BE path in r600_translate_colorswap

it fixes BZ#72877 and BZ#92039

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-29 12:26:27 +02:00
Oded Gabbay
e3dfc0e095 gallium/r600: Don't let h/w do endian swap for colorformat
Since the rework on gallium pipe formats, there is no more need to do
endian swap of the colorformat in the h/w, because the conversion between
mesa format and gallium (pipe) format takes endianess into account (see
the big #if in p_format.h).

v2: return ENDIAN_NONE only for four 8-bits components
(V_0280A0_COLOR_8_8_8_8)

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-29 12:26:27 +02:00
Oded Gabbay
9559071ed6 gallium/radeon: remove separate BE path in r600_translate_colorswap
After further testing, it appears there is no need for
separate BE path in r600_translate_colorswap()

The only fix remaining is the change of the last if statement, in the 4
channels case. Originally, it contained an invalid swizzle configuration
that never got hit, in LE or BE. So the fix is relevant for both systems.

This patch adds an additional 120 available visuals for LE and BE,
as seen in glxinfo

v2:
Tested for regressions by running piglit gpu.py with CAICOS (r600g) on
x86-64 machine. No regressions found.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-29 12:26:27 +02:00
Samuel Pitoiset
07ed003faf nv50/ir: emit VOTE instruction
Changes from v2:
 - add missing NOT modifier for GK110/GM107

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-28 23:58:11 +01:00
Jordan Justen
635c0e92b7 anv: Set CURBEAllocationSize in MEDIA_VFE_STATE
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-28 11:54:49 -08:00
Jordan Justen
1af5dacd76 anv/gen7: Enable SLM in L3 cache control register
Port 1983003 to gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-28 11:54:49 -08:00
Kristian Høgsberg Kristensen
b00b42d99b nir/spirv: Use the new bare sampler type 2016-02-28 11:24:05 -08:00
Jordan Justen
72efb68d48 anv/pipeline: Set URB offset to zero if size is zero
After 3ecd357d81, it may be possible for
the VS to get assigned all of the URB space.

On Ivy Bridge, this will cause the offset for the other stages to be
16, which cannot be packed into the ConstantBufferOffset field of
3DSTATE_PUSH_CONSTANT_ALLOC_*.

Instead we can set the offset to zero if the stage size is zero.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-28 10:51:38 -08:00
Jordan Justen
ef06ddb08a anv/pipeline: Set FS URB space to zero if the FS is unused
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-28 10:51:38 -08:00
Jordan Justen
45d8ce07a5 anv/pipeline: Set stage URB size to zero if it is unused
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-28 10:49:39 -08:00
Samuel Pitoiset
b3efa0a59e gk110/ir: add ld lock/st unlock emission
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-28 19:20:20 +01:00
Ilia Mirkin
aa3b85fd18 nv50,nvc0: bump minimum texture buffer offset alignment
It appears that it actually needs to be aligned to the datum size, so it
was 1 when testing with R8, but it can be as high as 16 with RGBA32.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-02-27 16:26:34 -05:00
Jason Ekstrand
46b7c242da anv/gen7: Clean up the dummy PS case
Fix whitespace and remove dead comments
2016-02-27 11:24:09 -08:00
Jason Ekstrand
e18a2f037a anv/gen7: Set MaximumNumberofThreads in the dummy PS packet 2016-02-27 11:23:56 -08:00
Jason Ekstrand
ad50896c87 anv/gen7: Only try to get the depth format the surface has depth 2016-02-27 11:23:18 -08:00
Jason Ekstrand
4b34f2ccb8 anv/image: Use isl for filling brw_image_param 2016-02-27 10:26:14 -08:00
Jason Ekstrand
bd6470fa6c isl: Add helpers for filling out brw_image_param 2016-02-27 10:26:14 -08:00
Jason Ekstrand
7363024cbd anv: Fill out image_param structs at view creation time 2016-02-27 10:26:14 -08:00
Jason Ekstrand
e9d126f23b anv/image: Add a ussage_mask field to image_view_init
This allows us to avoid doing some unneeded work on the meta paths where we
know that the image view will be used for exactly one thing.  The meta
paths also sometimes do things that aren't quite valid like setting the
array slice on a 3-D texture and we want to limit the number of paths that
need to be able to sensibly handle the lies.
2016-02-27 10:26:14 -08:00
Jason Ekstrand
b4c16fd01a isl: Move isl_image.c to isl_storage_image.c 2016-02-27 10:26:14 -08:00
Jason Ekstrand
eb19d640eb anv: Use isl to fill buffer surface states 2016-02-27 10:26:14 -08:00
Jason Ekstrand
a0cd20eb7f isl: Add a helper for filling a buffer surface state 2016-02-27 10:26:14 -08:00
Jason Ekstrand
9d5b8f7709 anv: Remove unneeded fiels from anv_image_view 2016-02-27 10:26:14 -08:00
Jason Ekstrand
b70a8d40fa anv/state: Remove unused fill_surface_state functions 2016-02-27 10:26:14 -08:00
Jason Ekstrand
ded57c3cca anv: Use ISL to fill out surface states 2016-02-27 10:26:14 -08:00
Jason Ekstrand
4a9b805ce5 anv/device: Store the default MOCS in the device 2016-02-27 10:26:13 -08:00
Jason Ekstrand
d798762cdb isl: Add a function for filling out a surface state 2016-02-27 10:26:13 -08:00
Jason Ekstrand
6b06072ba8 isl: Create per-gen helper libraries for gens 7, 8, and 9 2016-02-27 10:26:13 -08:00
Jason Ekstrand
82d2db80bb genxml: Add MOCS fields to RENDER_SURFACE_STATE
This allows us to set MOCS as a single uint32_t on all platforms.
2016-02-27 10:26:13 -08:00
Jason Ekstrand
452782f68b gen/genX_pack: Add genxml to the pack header path
If you have an out-of-tree build, gen8_pack.h and friends will not be in
the same folder as genX_pack.h so this will be a problem.  We fixed
out-of-tree earlier by adding the genxml folder to the includes for the
vulkan driver.  However, this is not a good long-term solution because we
want to use it in ISL as well.
2016-02-27 10:26:13 -08:00
Ilia Mirkin
e2dce1a340 mesa: add GL_OES_gpu_shader5 and GL_EXT_gpu_shader5 support
The two extensions are identical, and are largely taking bits of already
existing desktop functionality. We continue to do a poor job of
supporting the 'precise' keyword, just like we do on desktop.

This passes the relevant dEQP tests that I could find.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-27 00:08:28 -05:00
Ilia Mirkin
2875183463 mesa: expose GL_EXT_texture_sRGB_decode on GLES 3.0+
Could be exposed on earlier GLES versions if we supported EXT_sRGB, but
we don't, for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-26 23:55:45 -05:00
Nanley Chery
265d4c415c isl: Fix isl_surf_get_image_intratile_offset_el()
Consecutive tiles are separated by the size of the tile, not by the
logical tile width.

v2: Remove extra subtraction (Ville)
    Add parenthesis (Jason)
v3: Update the unit tests for the function

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-26 16:59:36 -08:00
Ian Romanick
585b18f305 i965/cfg: Fix comment list punctuation
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-26 16:51:27 -08:00
Ian Romanick
5bfb302783 i965/cfg: Split out dead control flow paths to simplify both paths
v2: Fix some bad indentation.  Suggested by Curro.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 16:51:27 -08:00
Ian Romanick
2513a20240 i965/cfg: Don't handle fully empty if/else/endif
This will now never occur.  The empty if-else part would have already
been removed leaving an empty if-endif part.

No shader-db changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 16:51:27 -08:00
Ian Romanick
69bb063ec2 i965/cfg: Eliminate an empty then-branch of an if/else/endif
On BDW,

total instructions in shared programs: 8448571 -> 8448367 (-0.00%)
instructions in affected programs: 21000 -> 20796 (-0.97%)
helped: 116
HURT: 0

v2: Remove spurious attempt to combine the if_block with the (removed!)
else_block.  Suggested by Matt and Curro.  Correct the comment
describing what the new pass does.  Suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 16:51:27 -08:00
Ian Romanick
c7deee69ea i965/cfg: Track prev_block and prev_inst explicitly in the whole function
This provides a trivial simplification now, and it makes some future
changes more straight forward.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 16:51:27 -08:00
Ian Romanick
70cf0eb5c7 i965/cfg: Slightly rearrange dead_control_flow_eliminate
'git diff -w' is a bit more illustrative.  A couple declarations were
moved, the continue was removed, and the code was reindented.  This will
simplify future changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 16:51:27 -08:00
Thomas Hindoe Paaboel Andersen
6bb6b5c341 anv: remove stray ; after if
Both logic and indentation suggests that the ; were not intended here.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-26 16:05:28 -08:00
Jason Ekstrand
b7bc52b5b1 anv/gen8: Emit the 3DSTATE_PS_BLEND packet 2016-02-26 16:04:48 -08:00
Kenneth Graunke
a0294c2cf3 i965: Simplify brw_nir_lower_vue_inputs() slightly.
The same code appeared in both branches; pull it above the if statement.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
8151003ade i965: Avoid recalculating the normal VUE map for IO lowering.
The caller already computes it.  Now that we have stage specific
functions, it's really easy to pass this in.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
15b3639bf1 i965: Avoid recalculating the tessellation VUE map for IO lowering.
The caller already computes it.  Now that we have stage specific
functions, it's really easy to pass this in.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
cfbd9831f8 i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.
Now that each stage is directly calling brw_nir_lower_io(), and we have
per-stage helper functions, it makes sense to just call the relevant one
directly, rather than going through multiple switch statements.

This also eliminates stupid function parameters, such as the two that
only apply to vertex attributes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
b96ddd2e52 i965: Split brw_nir_lower_inputs/outputs into per-stage functions.
These functions are both giant switch statements where most cases don't
overlap at all.  Let's put the bulk of the work in per-stage helpers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
d33c478bed i965: Remove catch-all nir_lower_io call with specific cases.
Most cases already call nir_lower_io explicitly for input and output
lowering.  This catch all isn't very useful anymore - we can just add it
to the remaining cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
51f8797993 i965: Move optimizations from brw_nir_lower_io to brw_postprocess_nir.
This simplifies things.  Every caller of brw_nir_lower_io() immediately
calls brw_postprocess_nir().  The only real change this will have is
that we get an extra brw_nir_optimize() call when compiling compute
shaders, but that seems fine.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
dcd4a841e9 i965: Always do NIR IO lowering at specialization time.
We've now hit literally every case other than geometry shaders (and
compute shaders, but those are a no-op).  So, let's just move geometry
shaders over too and be done with it.

The only advantage to doing this at link time was to save the expense
of running the pass on recompiles.  But we're already running a lot of
passes, and the extra code complexity isn't worth it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Kenneth Graunke
fa7135107f i965: Make an is_scalar boolean in brw_compile_gs().
Shorter than compiler->scalar_stage[MESA_SHADER_GEOMETRY], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Jason Ekstrand
b3cb6e78aa i965/nir: Do lower_io late for fragment shaders
The Vulkan driver wants to be able to delete fragment outputs that are
beyond key.nr_color_regions; this is a lot easier if we lower outputs at
specialization time rather than link time.

(Rationale added to commit message by Ken)

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-26 15:55:59 -08:00
Jordan Justen
7428e6f86a i965: Set dest type to UW for several send messages
Without this, on SIMD 16 the send instruction destination will appear
to write more than one destination register, causing the simulator to
report an error.

Of course, the send instruction can actually write more than one
destination register regardless of the type set for the destination,
so this is a bit strange.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-26 12:03:56 -08:00
Samuel Pitoiset
aad48f8691 nvc0: rework nvc0_compute_validate_program()
Reduce the amount of duplicated code by re-using
nvc0_program_validate(). While we are at it, change the prototype
to return void and remove nvc0_compute.h which is now useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26 14:00:27 +01:00
Samuel Pitoiset
e1f5c76047 nvc0: make sure to validate compute global buffers on Fermi
No reason to not validate those global buffers and this might avoid
fails if someone try to use the global memory from compute programs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26 14:00:23 +01:00
Samuel Pitoiset
dcf7938833 nvc0: move nvc0_validate_global_residents() to nvc0_compute.c
While we are at it, rename it to nvc0_compute_validate_globals() and
update its prototype.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-26 14:00:18 +01:00
Derek Foreman
d085a5dff5 egl/wayland: Try to use wl_surface.damage_buffer for SwapBuffersWithDamage
Since commit d1314de293 we ignore
damage passed to SwapBuffersWithDamage.

Wayland 1.10 now has functionality that allows us to properly
process those damage rectangles, and a way to query if it's
available.

Now we can use wl_surface.damage_buffer and interpret the incoming
damage as being in buffer co-ordinates.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
2016-02-26 11:49:09 +00:00
Dave Airlie
840aa52f50 virgl: add missing CAP turned off. 2016-02-26 04:03:09 +00:00
Miklós Máté
847f1cc698 program: Remove extra reference_program()
It was already done in get_mesa_program()

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-02-25 22:02:50 +01:00
Emil Velikov
51c65a4c48 automake: add nine to make distcheck
Will allow us to catch/prevent issues, like the one in mesa 11.2.0-rc1.

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-25 19:56:07 +00:00
Emil Velikov
b08dbc84fe st/nine: don't forget to bundle the nine_limits.h file
Without this mesa 11.2.0-rc1 ended up busted :-(

Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Repored-by: Ondřej Súkup <mimi.vx@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-25 19:56:07 +00:00
Matt Turner
4009a9ead4 i965/fs: Allow saturate propagation to propagate negations into MADs.
Allows us to transform

   mad      res  src0   src1   src2
   mov.sat  dst  -res

into

   mad.sat  dst  -src0 -src1   src2

instructions in affected programs: 3712 -> 3688 (-0.65%)
helped: 24

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:15 -08:00
Matt Turner
65d3217cb0 i965/fs: Allow saturate propagation to propagate negations into ADDs.
Allows us to transform

   add      res  src0   src1
   mov.sat  dst  -res

into

   add.sat  dst  -src0 -src1

No shader-db changes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:13 -08:00
Matt Turner
7b6113bc2d i965/fs: Allow saturate propagation to propagate negations into MULs.
Allows us to transform

   mul      res  src0   src1
   mov.sat  dst  -res

into

   mul.sat  dst  src0  -src1

instructions in affected programs: 45246 -> 45054 (-0.42%)
helped: 162

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:10 -08:00
Matt Turner
1567da1e28 i965/fs: Don't CSE negated multiplies with saturation.
It's not correct to CSE these multiplies

   mul.sat dst1, -a, b
   mul.sat dst2,  a, b

by emitting a negated MOV from dst1 to dst2:

   mul.sat dst1, -a, b
   mov     dst2, -dst1

Take 2.0*2.0 for example. The first multiply would produce 0.0 and the
second would produce 1.0.

Fixes bad generated code in 18 to 22 shaders:

instructions in affected programs: 432 -> 464 (7.41%)
helped: 4
HURT: 18

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:04 -08:00
Matt Turner
3da789f1e9 glsl: Consider ubo_load to be a horizontal operation.
Unclear to me whether it actually is a horizontal operation that cannot
be vectorized, but the fact that i965 generates the same code in either
case makes me less interested in finding out.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94199
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-25 10:50:34 -08:00
Jason Ekstrand
c32273d246 anv/device: Properly handle apiVersion == 0
From the Vulkan 1.0 spec section 3.2:

"If apiVersion is 0 the implementation must ignore it"
2016-02-25 08:52:37 -08:00
Andres Gomez
d1509a5848 glsl/ast: Implicit conversion from double to float is not allowed
Also, renamed get_conversion_operation to avoid
future misunderstandings.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-25 13:10:50 +01:00
Oded Gabbay
439b5b008f gallium/radeon: return correct values for BE in r600_translate_colorswap
Because I changed the swizzle check, I also need to adapt the return
values for each check.

It's basically almost the same as before, we just cross between STD and
STD_REV, and cross between ALT and ALT_REV

This fixes the rgba test in gl-1.0-readpixsanity (piglit) and also
fixes tri-flat (mesa demos).

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-25 09:21:08 +02:00
Oded Gabbay
ff8b41b702 gallium: remove duplicate define from enum pipe_format
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-25 09:21:08 +02:00
Ian Romanick
9d9aeb91b1 glsl: Detect do-while-false loops and unroll them
Previously loops like

   do {
      // ...
   } while (false);

that did not have any other loop-branch instructions would not be
unrolled.  This is commonly used to wrap multiline preprocessor macros.

This produces IR like

    (loop (
       ...
       break
    ))

Since limiting_terminator was NULL, the loop unroller would
throw up its hands and say, "I don't know how many iterations.  How
can I unroll this?"

We can detect this another way.  If there is no limiting_terminator
and the only loop-branch is a break as the last IR, there's only one
iteration.

On my very old checkout of shader-db, this removes a loop from Orbital
Explorer, but it does not otherwise affect the shader.  The loop removed
is the one the compiler inserts surrounding the switch statement.

This change does prevent some seriously bad code generation in some
patches to meta shaders that I recently sent out for review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-24 18:43:40 -08:00
Nanley Chery
3eb476fa14 i965: Enable tiled mem_copy with sRGB-formatted resources
RGBA8 and BGRA8 unorm formats are compatible with the various
mem_copy functions. Their sRGB counterparts are also compatible
because they're also color-renderable (of importance when the
specified resource is a readbuffer) and they share the same
physical layout.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-24 14:40:34 -08:00
Kristian Høgsberg Kristensen
59f5728995 Merge remote-tracking branch 'origin/master' into vulkan 2016-02-24 13:04:54 -08:00
Kristian Høgsberg Kristensen
25c2470b24 anv: Set max_hs_threads/max_ds_threads 2016-02-24 12:21:26 -08:00
Kenneth Graunke
3ecd357d81 anv: Allocate more push constant space.
Previously we allocated 4kB of push constant space for VS, GS, and PS
(for a total of 12kB) no matter what.  This works, but doesn't fully
utilize the space - we have 16kB or 32kB of space.

This makes anv use the same method as brw - divide up the space evenly
among all active shader stages.  This means HS and DS would get space,
if those shader stages existed.

In the future, we can probably do better by inspecting how many push
constants each shader stage uses, and weight things accordingly.  But
this is strictly better than the old code, and ideally we'd justify
a fancier solution with actual performance data.
2016-02-24 11:22:05 -08:00
Kenneth Graunke
3f11517730 anv: Properly size the push constant L3 area.
We were assuming it was 32kB everywhere, reducing the available URB
space.  It's actually 16kB on Ivybridge, Baytrail, and Haswell GT1-2.
2016-02-24 11:13:08 -08:00
Kenneth Graunke
7f9b03cc8b anv: Emit 3DSTATE_PUSH_CONSTANT_ALLOC_* via a loop.
Now we're emitting HS and DS packets as well.
2016-02-24 11:13:08 -08:00
Kenneth Graunke
1024a66fc4 anv: Emit 3DSTATE_URB_* via a loop.
Rather than keeping separate {vs,hs,ds,gs}_start fields, we now store an
array indexed by the shader stage (MESA_SHADER_*).  The 3DSTATE_URB_*
commands are also sequentially numbered.  This makes it easy to just
emit them in a loop.

This simplifies the code a little, and also will make it easier to add
more credible HS and DS code later.
2016-02-24 11:13:02 -08:00
Brian Paul
c95d5c5f6f mesa: replace for loop with bitshifting in supported_buffer_bitmask()
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:32:01 -07:00
Brian Paul
ac37d0475c mesa: updates some comments in buffers.c
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:53 -07:00
Brian Paul
d8412029bb mesa: make _mesa_draw_buffers() static
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:44 -07:00
Brian Paul
24d8080507 mesa: make _mesa_draw_buffer() static
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:41 -07:00
Brian Paul
ebfcf9de43 mesa: make _mesa_read_buffer() static
Not called from any other file.  Remove _mesa_ prefix and update comments.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:37 -07:00
Brian Paul
1e41c2e135 mesa: move declaration of buffer var in handle_first_current()
Declare the var in the scopes where it's used.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:31 -07:00
Brian Paul
c8fdb42c91 mesa: use gl_buffer_index in a few places
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:28 -07:00
Brian Paul
363019e17a st/mesa: remove useless break statement
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:23 -07:00
Brian Paul
953cb24e65 st/mesa: rename st_readpixels to st_ReadPixels
To match the convention of other device driver functions.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:31:17 -07:00
Brian Paul
83b589301f st/mesa: fix frontbuffer glReadPixels regressions
The change "mesa/readpix: Don't clip in _mesa_readpixels()" caused a
few piglit regressions.  The failing tests use glReadPixels to read
from the front color buffer.  The problem is we were trying to read
from a non-existant front color buffer.  The front color buffer is
created on demand in st/mesa.  Since the missing buffer bounds were
effectively 0 x 0 the glReadPixels was totally clipped and returned
early.

The fix involves creating the real front color buffer when we're about
to try reading from it.

Tested with llvmpipe and VMware driver on Linux, Windows.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94253
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94257
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-24 08:30:07 -07:00
Jason Ekstrand
c9564fd598 nir/spirv: Allow but warn for a few capabilities
Unfortunately, glslang gives us cull/clip distance and GS streams even if
the shader doesn't use it whenever a shader is declared as version 450.
This is a glslang bug, but we can easily enough ignore it for now.
2016-02-23 22:07:25 -08:00
Jason Ekstrand
f0f7cc22f3 anv/descriptor_set: Use the correct size for the descriptor pool
The descriptor sizes array gives the total number of each type of
descriptor that will ever be allocated from the pool, not the total amount
that may be in any particular set.  In our case, this simply means that we
have to sum a bunch of things up and there we go.
2016-02-23 21:25:37 -08:00
Jason Ekstrand
040355b688 nir/spirv: Add more capabilities 2016-02-23 21:01:00 -08:00
Jason Ekstrand
bd3db3d665 anv/meta: Allocate descriptor pools on-the-fly
We can't use a global descriptor pool like we were because it's not
thread-safe.  For now, we'll allocate them on-the-fly and that should work
fine.  At some point in the future, we could do something where we
stack-allocate them or allocate them out of one of the state streams.
2016-02-23 17:04:19 -08:00
Oded Gabbay
4b7e219e61 gallium/radeon: Correctly translate colorswaps for big endian
The current code in r600_translate_colorswap uses the swizzle information
to determine which colorswap to use.

This works for BE & LE when the nr_channels is <4, but when nr_channels==4
(e.g. PIPE_FORMAT_A8R8G8B8_UNORM), this method can not be used for both BE
and LE, because the swizzle info is the same for both of them.

As a result, r600g doesn't support 24bit color formats, only 16bit, which
forces the user to choose 16bit color in X server.

This patch fixes this bug by separating the checks for LE and BE and
adapting the swizzle conditions in the BE part of the checks.

Tested on an Evergreen GPU (Cedar GL FirePro 2270) running inside POWER7
Big-Endian Machine.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "11.2" "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-23 20:55:40 +02:00
Thomas Hindoe Paaboel Andersen
1807806add mesa: use sizeof on the correct type
Before the luminance stride was based on the size of GL_FLOAT
which is just the type constant (0x1406). Change it to use the
size of GLfloat.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-23 08:55:35 -07:00
Marek Olšák
190a291b03 tgsi/scan: handle holes between VS inputs, assert-fail in other cases
"st/mesa: overhaul vertex setup for clearing, glDrawPixels, glBitmap"
added a vertex shader declaring IN[0] and IN[2], but not IN[1].

Drivers relying on tgsi_shader_info can't handle holes in declarations,
because tgsi_shader_info doesn't track that.

This is just a quick workaround meant for stable that will work for vertex
shaders.

This fixes radeonsi DrawPixels and CopyPixels crashes.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-23 16:42:16 +01:00
Jason Ekstrand
bfbb238dea anv/descriptor_set: Set descriptor type for immuatable samplers 2016-02-22 21:39:14 -08:00
Jason Ekstrand
64e1c84059 intel/genxml: Update macro documentation 2016-02-22 21:20:04 -08:00
Francisco Jerez
31a0affa28 docs: Mark off GL_OES_shader_image_atomic as done.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 19:59:56 -08:00
Francisco Jerez
058ed980c6 i965/fs: Return result of image atomic in a register of the expected type.
So the result is of float type if we're implementing the float
overload of imageAtomicExchange.  This is the only back-end change
required to support OES_shader_image_atomic AFAICT.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 19:57:09 -08:00
Francisco Jerez
81c16a2dab glsl: Implement the required built-in functions when OES_shader_image_atomic is enabled.
This is basically just the same atomic functions exposed by
ARB_shader_image_load_store, with one exception:

    "highp float imageAtomicExchange(
         coherent IMAGE_PARAMS,
         float data);"

There's no float atomic exchange overload in the original
ARB_shader_image_load_store or GL 4.2, so this seems like new
functionality that requires specific back-end support and a separate
availability condition in the built-in function generator.

v2: Move image availability predicate logic into a separate static
    function for clarity.  Had to pull out the image_function_flags
    enum from the builtin_builder class for that to be possible.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 19:56:54 -08:00
Francisco Jerez
be125af95e glsl: Add usual extension boilerplate for OES_shader_image_atomic.
v2: No need for extension enable bits (Ilia).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 19:56:35 -08:00
Francisco Jerez
009bbecf6d mesa: Add extension table entry for OES_shader_image_atomic.
v2: No need for extension enable bits (Ilia).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 19:55:35 -08:00
Jason Ekstrand
ae619a0355 anv/state: Replace a bunch of ANV_GEN with GEN_GEN 2016-02-22 19:19:00 -08:00
Jason Ekstrand
442dff8cf4 anv/descriptor_set: Stop marking everything as having dynamic offsets 2016-02-22 17:23:29 -08:00
Kristian Høgsberg Kristensen
2570a58bcd anv: Implement descriptor pools
Descriptor pools are an optimization that lets applications allocate
descriptor sets through an externally synchronized object (that is,
unlocked).  In our case it's also plugging a memory leak, since we
didn't track all allocated sets and failed to free them in
vkResetDescriptorPool() and vkDestroyDescriptorPool().
2016-02-22 17:13:51 -08:00
Kristian Høgsberg Kristensen
353d5bf286 anv/x11: Free swapchain images and memory on destroy 2016-02-22 16:23:47 -08:00
Samuel Pitoiset
2999257e0f nvc0: rename 3d binding points to NVC0_BIND_3D_XXX
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
9c6a7bfb40 nvc0: rename 3d dirty flags to NVC0_NEW_3D_XXX
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
2c48369f54 nvc0: prefix compute macros with _CP_ instead of _COMPUTE_
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
bbff97ae39 nvc0: rename NVXX_COMPUTE to NVXX_CP
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
5330ed959e nvc0: rename nvc0_context::dirty to nvc0_context::dirty_3d
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
84b9b8f0a3 nvc0/ir: add missing emission of locked load predicate
Like unlocked store on shared memory, locked store can fail and the
second dest which is a predicate must be emitted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
9f0d059d4b nvc0/ir: add ld lock/st unlock emission on GK104
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Samuel Pitoiset
6526225f88 nv50/ir: restore OP_SELP to be a regular instruction
Actually OP_SELP doesn't need to be a compare instruction. Instead we
just need to set the NOT modifier when building the instruction.
While we are at it, fix the dst register type and use a GPR.

Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-22 21:28:51 +01:00
Mark Janes
08b408311c vulkan: fix out-of-tree builds 2016-02-22 11:31:15 -08:00
Brian Paul
9de3b0273d svga: unbind index buffer when drawing non-indexed primitives
Silences a warning reported by the svga3d device.

v2: also null-out the index buffer pointer

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-02-22 12:14:48 -07:00
Kristian Høgsberg Kristensen
f843aabdd4 intel/genxml: Add README
I've had people ask about the design of the pack functions, for example,
why aren't we using bitfields. I wrote up a bit of background on why and
how we ended up with the current design and we might as well keep that
with the code.
2016-02-22 09:14:25 -08:00
Nanley Chery
7b2c63a53c anv/meta_blit: Handle compressed textures in anv_CmdCopyImage
As with anv_CmdCopyBufferToImage, compressed textures require special
handling during copies.

Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-02-22 09:04:28 -08:00
Ilia Mirkin
571bd9ac42 mesa: add GL_EXT_texture_border_clamp support
This extension is identical to GL_OES_texture_border_clamp. But dEQP has
tests that want the EXT variant.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-22 10:38:56 -05:00
Ilia Mirkin
b6654831c3 mesa: add GL_OES_texture_border_clamp support
Only minor differences to the existing ARB_texture_border_clamp support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-22 10:38:56 -05:00
Ilia Mirkin
af8ad49541 mesa: bump version
11.2 has been branched, we're on 11.3 now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-22 10:38:37 -05:00
Emil Velikov
4cd5e5b48e nouveau: update the Makefile.sources list
Reflect the nv50->g80 change and the new gm107_texture header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-22 11:40:29 +00:00
Jason Ekstrand
f49ba0f7d8 nir/spirv: Add support for multisampled textures 2016-02-21 22:02:38 -08:00
Marek Olšák
ff360a52e6 radeonsi: implement binary shaders & shader cache in memory (v2)
v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
1132910e50 gallium/radeon: remove unused radeon_shader_binary_free_* functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
50ac2612d0 radeonsi: make radeon_shader_reloc name string fixed-sized
This will simplify implementations of binary shaders.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
1fe73d55e3 radeonsi: move some struct si_shader members to new struct si_shader_info
This will be part of shader binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
10fa269f4f radeonsi: use smaller types for some si_shader members
in order to decrease the shader size for a shader cache.

v2: add & use SI_MAX_VS_OUTPUTS

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
9aaf28da62 radeonsi: enable compiling one variant per shader
Shader stats from VERDE:

Default scheduler:

Totals:
SGPRS: 491272 -> 488672 (-0.53 %)
VGPRS: 289980 -> 311093 (7.28 %)
Code Size: 11091656 -> 11219948 (1.16 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 1732608 -> 2246656 (29.67 %) bytes per wave
Max Waves: 78063 -> 77352 (-0.91 %)
Wait states: 0 -> 0 (0.00 %)

Looking at some of the worst regressions, I get:
- The VGPR increase seems to be caused by the fact that if PS has used less
  than 16 VGPRs, now it will always use 16 VGPRs and sometimes even 20.
  However, the wave count remains at 10 if VGPRs <= 24, so no harm there.
- The scratch increase seems to be caused by SGPR spilling.
  The unnecessary SGPR spilling has been an ongoing issue with the compiler
  and it's completely fixable by rematerializing s_loads or reordering
  instructions.

SI scheduler:

Totals:
SGPRS: 374848 -> 374576 (-0.07 %)
VGPRS: 284456 -> 307515 (8.11 %)
Code Size: 11433068 -> 11535452 (0.90 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 509952 -> 522240 (2.41 %) bytes per wave
Max Waves: 79456 -> 78217 (-1.56 %)
Wait states: 0 -> 0 (0.00 %)

VGPRs - same story as before. The SI scheduler doesn't spill SGPRs so much
and generally spills way less than the default scheduler.
(522240 spills vs 2246656 spills)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
754cf171e9 radeonsi: print full shader name before disassembly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
3c98e0b369 radeonsi: compile non-GS middle parts of shaders immediately if enabled
Still disabled.

Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.

VS is always compiled as hw VS.
TES is always compiled as hw VS.

LS and ES stages are always compiled on demand.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
e038f8fd49 radeonsi: rework polygon stippling for PS prolog
Don't use the pstipple module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
4636d9be4a radeonsi: add PS prolog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:58 +01:00
Marek Olšák
e79bb746ab radeonsi: add PS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
eb10919b83 radeonsi: add TCS epilog
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
e1b21696a3 radeonsi: add VS epilog
It only exports the primitive ID.
Also used by TES when it's compiled as VS.

The VS input location of the primitive ID input is v2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
70de433dea radeonsi: add VS prolog
This is disabled with use_monolithic_shaders = true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
19a92886a8 radeonsi: first bits for non-monolithic shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
0303886b10 radeonsi: add code for dumping all shader parts together (v2)
v2: unify some code into si_get_shader_binary_size

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
17eb99d8b9 radeonsi: add code for combining and uploading shaders from 3 shader parts
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
9d5bf1a3ef radeonsi: fail compilation if non-GS non-CS shaders have rodata
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
09408764c1 radeonsi: separate 2 pieces of code from create_function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
292759220c radeonsi: add samplemask parameter to si_export_mrt_color
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
e6aea08b86 radeonsi: add start_instance parameter to get_instance_index_for_fetch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
dc27456194 radeonsi: separate out shader key bits for prologs & epilogs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
d995d4830e radeonsi: compute how many input VGPRs fragment shaders have
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
fe1b6ede01 radeonsi: compute how many input SGPRs and VGPRs shaders have
Prologs (shader binaries inserted before the API shader binary) need to
know this, so that they won't change the input registers unintentionally.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Marek Olšák
36202182ac gallium/radeon: add basic code for setting shader return values
LLVMBuildInsertValue will be used on return_value.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-21 21:08:57 +01:00
Samuel Pitoiset
3c9ed2015c nvc0: enable compute shaders on Fermi
Kepler compute support is really different than Fermi and it's not
ready yet.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:42:32 +01:00
Samuel Pitoiset
14a810e9d0 nv50/ir: add atomics support on shared memory for Fermi
Changes from v3:
 - move the previous OP_SELP change to the previous commit

Changes from v2:
 - make sure the op is OP_SELP when emitting the predicate and add one
   assert
 - use bld.getSSA() for mkOp2()
 - add cross edge between tryLockAndSetBB and joinBB

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:42:32 +01:00
Samuel Pitoiset
e0371e63df nv50/ir: make OP_SELP a compare instruction
This OP_SELP insn will be used to handle compare and swap subops.

Changes from v2:
 - fix logic for GK110+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:42:29 +01:00
Samuel Pitoiset
0c930557bf nv50/ir: add lock/unlock subops for load/store
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:42:02 +01:00
Samuel Pitoiset
45e85e16f5 nv50/ir: use s[] addr space for shared buffers
Shared memory address space (FILE_MEMORY_SHARED) must be used instead
of global memory when a shared memory area is declared.

Changes from v2:
 - oops, do not remove TGSI_FILE_BUFFER in a switch in
   nv50_ir_from_tgsi.cpp

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:58 +01:00
Samuel Pitoiset
80fc67fba5 nvc0: reduce likelihood of collision for real buffers on Fermi
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.

Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:53 +01:00
Samuel Pitoiset
807901b639 nvc0: invalidate compute state when switching pipe contexts
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:48 +01:00
Samuel Pitoiset
c6293877f0 nvc0: add support for indirect compute on Fermi
When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.

Changes from v2:
 - do not launch the grid if the number of groups for a dimension is 0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:45 +01:00
Samuel Pitoiset
fa7333a742 nvc0: bind textures/samplers for compute on Fermi
Textures and samplers don't seem to be aliased between COMPUTE and 3D.

Changes from v2:
 - refactor the code to share (almost) the same logic between 3d and
   compute

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:40 +01:00
Samuel Pitoiset
917a5ff6ea nvc0: bind shader buffers for compute on Fermi
This is loosely based on 3D. Shader buffers are bound on c15 (the
driver constbuf) at offset 0x200.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:37 +01:00
Samuel Pitoiset
a9b70a86db nvc0: bind driver constbuf for compute on Fermi
Changes from v3:
 - add new validation state for COMPUTE driver constbuf

Changes from v2:
 - always bind the driver consts even if user params come in via clover

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:32 +01:00
Samuel Pitoiset
527652629d nvc0: add a new validation state for 3D driver constbuf
This will be used to invalidate 3D driver constbuf when using COMPUTE
and vice-versa. This is needed because this CB contains a bunch of
useful information like the addrs of shader buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:29 +01:00
Samuel Pitoiset
57d4251003 nvc0: bind constant buffers for compute on Fermi
Loosely based on 3D.

Changs from v3:
 - invalidate COMPUTE CBs after validating 3D CBs because they are
   aliased

Changes from v2:
 - get rid of the 's' param to nvc0_cb_bo_push() because it doesn't
   matter to upload constbufs for compute using the 3d chan

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:25 +01:00
Samuel Pitoiset
53f92bb7f9 nvc0: allocate an area for compute user constbufs
For compute shaders, we might need to upload uniforms.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-21 10:41:21 +01:00
Jason Ekstrand
f1dddeadc2 anv: Fix a typo in apply_dynamic_offsets
shader->num_uniforms is in terms of bytes in i965.
2016-02-20 21:24:31 -08:00
Jason Ekstrand
b5868d2343 anv: Zero out the WSI array when initializing the instance 2016-02-20 19:30:14 -08:00
Jason Ekstrand
bc696f1db6 isl: Stop including mesa/main/imports.h
It pulls in all sorts of stuff we don't want.
2016-02-20 10:35:25 -08:00
Samuel Pitoiset
89d25a82e8 nv50: do not advertise about compute shaders
Compute shaders are totally unsupported. This avoids Clover to
report that OpenCL is supported on Tesla because it's a lie.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-20 19:25:12 +01:00
Jason Ekstrand
853fc3e431 genxml: Add mote includes in the generated headers 2016-02-20 09:33:20 -08:00
Jason Ekstrand
1f1cf6fcb0 anv: Get rid of GENX_FUNC
It was a bad idea.
2016-02-20 09:12:38 -08:00
Jason Ekstrand
371b4a5b33 anv: Switch over to the macros in genxml 2016-02-20 09:09:28 -08:00
Jason Ekstrand
0d76aa9485 intel/genxml: Add a couple of helper headers 2016-02-20 08:35:36 -08:00
Rhys Kidd
a0f55e91cc docs: Correct typo in LLVMpipe envvar description
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-20 16:15:35 +01:00
Ilia Mirkin
0b10ec1086 st/mesa: force depth mode to GL_RED for sized depth/stencil formats
See commit 9db2098d for the i965 version of this.

This fixes depth in a bunch of dEQP EXT_texture_border_clamp tests. And
probably other ones as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2016-02-19 17:37:39 -05:00
Daniel Czarnowski
e6f1a44d14 egl_dri2: set correct error code if swapbuffers fails
A return value of '-1' means that there was error during swap with a
window drawable, in this case we set error as EGL_BAD_NATIVE_WINDOW.

v2: coding style cleanup, better commit message

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-19 18:23:19 +00:00
Dongwon Kim
d1e1563bb6 egl: move Null check to eglGetSyncAttribKHR to prevent Segfault
Null-check on "*value" is currently done in _eglGetSyncAttrib, which is
after eglGetSyncAttribKHR dereferences it.

Move the check a layer up (in the beginning of eglGetSyncAttribKHR) to
avoid segfaults.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Emil Velikov: tweak commit message, add stable tag]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-19 18:23:19 +00:00
Ilia Mirkin
b697400a97 meta/copy_image: use precomputed dst_internal_format to avoid segfault
If the destination is a renderbuffer, dst_tex_image will be NULL. This
fixes the *to_renderbuffer dEQP copy image tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2016-02-19 13:10:28 -05:00
Ilia Mirkin
a03d6f2aa3 mesa: add GL_OES_texture_stencil8 support
It's basically the same thing as GL_ARB_texture_stencil8 except that
glCopyTexImage isn't supported, so add STENCIL_INDEX to the list of
invalid GLES formats for glCopyTexImage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-02-19 12:37:22 -05:00
Ilia Mirkin
2b938a390c st/mesa: fix pbo uploads
- LOD must be provided in .w for TXF (even for buffer textures)
 - User buffer must be valid at draw time
 - Must have a sampler associated with the sampler view

This makes PBO uploads work again on nouveau.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-19 11:30:33 -05:00
Ilia Mirkin
68c4af1c19 mesa: check fbo completeness based on internal format, not driver format
The base format is a function of the user-requested format, while the
driver format is not. So we should use the base format instead.

The driver format can be anything. Specifically in the stencil-only
case, it might be a depth/stencil format. However we still want to
refuse such an attachment when bound to GL_DEPTH.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-19 11:30:33 -05:00
Jason Ekstrand
2b85807458 genxml: Stop using unicode in the pack generator
This causes python problems and problems when people don't have a locale
set properly in their shell.
2016-02-19 08:05:35 -08:00
Dave Airlie
1375cb3c27 anv: fix warning about unused width variable.
We don't use width outside the debug clause here.
2016-02-19 08:01:54 -08:00
Brian Paul
0eb7b5c2a3 mesa: small optimization of _mesa_expand_bitmap()
Avoid a per-pixel multiply.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-19 08:51:51 -07:00
Brian Paul
8a2a1a6bd6 mesa: add special case ubyte[4] / BGRA conversion function
This reduces a glTexImage(GL_RGBA, GL_UNSIGNED_BYTE) hot spot in when
storing the texture as BGRA.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-19 08:51:51 -07:00
Brian Paul
44f48fead5 st/mesa: implement a simple cache for glDrawPixels
Instead of discarding the texture we created, keep it around in case
the next glDrawPixels draws the same image again.  This is intended
to help application which draw the same image several times in a row,
either within a frame or subsequent frames.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-02-19 08:51:51 -07:00
Brian Paul
71dcc067a5 llvmpipe: add a few const qualifiers
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-19 08:51:51 -07:00
Brian Paul
6d551f9ea3 trace: assorted whitespace and formatting fixes
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-02-19 08:49:51 -07:00
Brian Paul
e8689d9df3 trace: remove unneeded inline qualifiers
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-02-19 08:49:41 -07:00
Iago Toral Quiroga
72794b0bd9 glsl: fix emit_inline_matrix_constructor for doubles
Specifically, for the case where we initialize a dmat with a source
matrix that has fewer columns/rows.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-19 14:16:05 +01:00
Iago Toral Quiroga
d1617b4088 glsl: Mark float constants as such
So we don't generate double to float conversion code

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-19 14:16:05 +01:00
Iago Toral Quiroga
ad22886ef1 glsl: fix indentation in emit_inline_matrix_constructor
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-19 14:16:05 +01:00
Rob Clark
04ad05c987 glsl: fix standalone compiler
Need to set some non-zero limits for MaxCombinedUniformComponents,
otherwise we hit an "Too many <type> shader uniform components" error
in the linker.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-19 08:02:02 -05:00
Nicolai Hähnle
d7c4ffd1ee st/mesa: disable depth/stencil/alpha tests in PBO upload
Noticed by Brian Paul.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-18 20:49:12 -05:00
Brian Paul
2f3d06d9f9 svga: allow non-contiguous VS input declarations
This fixes a glDrawPixels regression since b63fe0552b.  The new
quad-drawing utility code uses 3 vertex attributes (xyz, rgba, st).
For glDrawPixels path we don't use the rgba attribute so there's a
gap in the TGSI VS input declarations (INPUT[0] = pos, INPUT[2] =
texcoord).  The TGSI->VGPU10 translations code did not handle this
correctly.  I missed this because my VM was configured for HWv11
while testing.

Another way to fix this would be to change the tgsi_scan.c code so
that the tgsi_shader_info::num_inputs (and num_outputs) included
the unused inputs/outputs.  These counts would then actually be
"max input register index + 1" rather than "number of used inputs".
But that change could impact all drivers so put it off for now.

No regressions found with piglit or typical GL apps.

v2: also update alloc_system_value_index() to use info.file_max[]

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-02-18 15:46:17 -07:00
Oded Gabbay
a3e3c3e621 gallivm: Check whether to stop disassemble only for x86
Because the if statement that checks whether we have a return
statement is valid only on x86, surround it with X86 or X86-64
arch defines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-19 00:18:11 +02:00
Oded Gabbay
b3d42934a1 gallivm: use sstream for dissasembling
Currently, disassemble() directly prints to stdout. This has broke the
profiling support for llvmpipe JIT code.

This patch redirects the output to an sstream object, which is then
either gets printed to stdout (for assembly debugging) or gets written
to a file in /tmp/ (for profiling support).

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-19 00:18:11 +02:00
Rob Clark
93c62fdee9 trace: fix new gcc6 warnings
src/gallium/drivers/trace/tr_context.c:1713:39: warning: ‘rbug_blocker_flags’ defined but not used [-Wunused-const-variable]
 static const struct debug_named_value rbug_blocker_flags[] = {
                                       ^~~~~~~~~~~~~~~~~~

Note that use of rbug_blocker_flags was removed in:

commit 5494332128
Author: Jakob Bornecrantz <jakob@vmware.com>
Date:   Wed May 12 19:26:19 2010 +0100

    trace: Remove rbug from trace

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-02-18 17:10:55 -05:00
Rob Clark
5051d85b03 gallium/auxiliary: fix new gcc6 warnings
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c: In function ‘mm_bufmgr_create_from_buffer’:
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c:288:4:
warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
    if(mm->map)
    ^~
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c:286:1: note:
...this ‘if’ clause, but it is not
 if(mm->heap)
 ^~

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-02-18 17:10:55 -05:00
Rob Clark
bba836ea6a gallium/hud: fix new gcc6 warnings
src/gallium/auxiliary/hud/font.c:234:22: warning: ‘Fixed8x13_Character_159’ defined but not used [-Wunused-const-variable]
 static const GLubyte Fixed8x13_Character_159[] = {  9,  0,  0,  0,  0, 0,  0,170,  0,  0,  0,130,  0,  0,  0,130,  0,  0,  0,130,  0,  0, 0,170,  0,  0,  0,  0,  0};
                      ^~~~~~~~~~~~~~~~~~~~~~~
.... many more..

These are simply unused, just #if 0 them out for now, in case someone
wants to use them in the future.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-02-18 17:10:55 -05:00
Rob Clark
7d5372bfe8 mesa: fix new gcc6 warnings
src/mesa/main/texstore.c:92:22: warning: ‘map_1032’ defined but not used [-Wunused-const-variable]
 static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, ONE };
                      ^~~~~~~~
src/mesa/main/texstore.c:91:22: warning: ‘map_3210’ defined but not used [-Wunused-const-variable]
 static const GLubyte map_3210[6] = { 3, 2, 1, 0, ZERO, ONE };
                      ^~~~~~~~
src/mesa/main/texstore.c:90:22: warning: ‘map_identity’ defined but not used [-Wunused-const-variable]
 static const GLubyte map_identity[6] = { 0, 1, 2, 3, ZERO, ONE };
                      ^~~~~~~~~~~~

These appear to be unused since:

commit 8ec6534b26
Author:     Iago Toral Quiroga <itoral@igalia.com>
AuthorDate: Wed Oct 15 13:42:11 2014 +0200

    mesa: Use _mesa_format_convert to implement texstore_rgba.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 17:10:55 -05:00
Rob Clark
b01575ec99 glsl: fix new gcc6 warnings
src/compiler/glsl/lower_discard_flow.cpp:79:1: warning: ‘ir_visitor_status {anonymous}::lower_discard_flow_visitor::visit_enter(ir_loop_jump*)’ defined but not used [-Wunused-function]
 lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~

The base class method that was intended to be overridden was
'visit(ir_loop_jump *ir)', not visit_enter().

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 17:10:55 -05:00
Rob Clark
e93caca071 glsl: fix new gcc6 warnings
src/compiler/glsl/ast_to_hir.cpp: In function ‘unsigned int ast_process_struct_or_iface_block_members(exec_list*, _mesa_glsl_parse_state*, exec_list*, glsl_struct_field**, bool, glsl_matrix_layout, bool, ir_variable_mode, ast_type_qualifier*,
unsigned int, unsigned int)’:
src/compiler/glsl/ast_to_hir.cpp:6339:52: warning: ‘first_member_has_explicit_location’ may be used uninitialized in this function [-Wmaybe-uninitialized]
             if (!layout->flags.q.explicit_location &&
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
                 ((first_member_has_explicit_location &&
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   !qual->flags.q.explicit_location) ||
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  (!first_member_has_explicit_location &&
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   qual->flags.q.explicit_location))) {
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 17:10:55 -05:00
Rob Clark
e2060aaf57 i965: fix new gcc6 warnings
src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning:
‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function]
 fs_copy_prop_dataflow::dump_block_data() const
 ^~~~~~~~~~~~~~~~~~~~~

From looking at git history, it looks like this is intended to be unused
(ie. just for adding on-demand debug prints)

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 17:10:55 -05:00
Rob Clark
a13442ac67 util: fix new gcc6 warnings
src/util/hash_table.h:111:23: warning: ‘_mesa_fnv32_1a_offset_bias’ defined but not used [-Wunused-const-variable]
 static const uint32_t _mesa_fnv32_1a_offset_bias = 2166136261u;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 17:10:55 -05:00
Jason Ekstrand
698ea54283 anv/pipeline: Fix a typo in the pipeline layout code 2016-02-18 13:55:57 -08:00
Jason Ekstrand
d5bb23156d anv/allocator: Set is_winsys_bo to false for block pool BOs 2016-02-18 13:55:57 -08:00
Kenneth Graunke
1c694a6c20 glcpp: Disallow "defined" as a macro name.
Both GCC and Clang disallow this, and glslang has recently started
disallowing it as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94188
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-18 13:38:50 -08:00
Mark Janes
1b37276467 vulkan: fix out-of-tree build
We need to be able to find the generated gen*pack.h headers.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-18 12:30:27 -08:00
Jason Ekstrand
e0565f40ea anv/pipeline: Use nir's num_images for allocating image_params 2016-02-18 11:44:26 -08:00
Jason Ekstrand
79c0781f44 nir/gather_info: Count textures and images 2016-02-18 11:42:36 -08:00
Samuel Pitoiset
dfc95ad6d1 gallium/cso: only enable compute shaders when TGSI is supported
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94186
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-18 20:41:25 +01:00
Jason Ekstrand
e881c73975 anv/pipeline: Don't leak the binding map 2016-02-18 11:09:30 -08:00
Jason Ekstrand
8c23392c26 anv/formats: Don't use a compound literal to initialize a const array
Doing so makes older versions of GCC rather grumpy.  Newere GCC fixes this,
but using a compound literal isn't really gaining us anything anyway.
2016-02-18 10:44:08 -08:00
Jason Ekstrand
9851c8285f Move the intel vulkan driver to src/intel/vulkan 2016-02-18 10:37:59 -08:00
Jason Ekstrand
47b8b08612 Move isl to src/intel 2016-02-18 10:34:47 -08:00
Jason Ekstrand
f6d9587688 vulkan: Move XML and generator into src/intel/genxml 2016-02-18 10:30:29 -08:00
Kristian Høgsberg Kristensen
542c38df36 anv/meta: Initialize blend state for the right attachment
We were always initializing only RT 0. We need to initialize the RT
we're creating the clear pipeline for.
2016-02-18 10:22:50 -08:00
Kristian Høgsberg Kristensen
05f75a3026 anv/meta: Don't use the blit ds layout in resolve code 2016-02-18 10:22:50 -08:00
Rob Herring
5c7f97426d Android: disable unused-parameter warning
Android builds with -Wunused-parameter enabled which results in spewing
lots of warnings. Disable it so more meaningful warnings are more visible.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Rob Herring
7efc273df1 Android: enable building on arm64
Use the LOCAL_CFLAGS_{32/64} instead of arch specific variants to define
the DEFAULT_DRIVER_DIR. This enables building for arm64.

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Rob Herring
1f53a57b2f Android: Fix building secondary arch in mixed 32/64-bit builds
TARGET_CC is not defined for the secondary arch on combined 32/64-bit
builds. The build system uses 2ND_TARGET_CC instead and it is not meant
to be used in module makefiles. LOCAL_CC was used to provide C only
flags as -std=c99 is not valid for C++ files. Since Android 4.4,
LOCAL_CONLYFLAGS was added to set compiler flags on C files only, so it
can be used now instead of LOCAL_CC.

This will break on pre-4.4 versions of Android, but it unlikely anyone
is using current Mesa with such an old version of Android.

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Rob Herring
ba06ea1a37 egl: android: clean-up config attribute setting
Pass the additional config attributes to dri2_add_config to set them
instead of open coding them. This is in preparation to add more attributes.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Varad Gautam
e35c5af337 egl: android: fix visuals declaration
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Rob Herring
64d2f398f6 Android: fix build break in libmesa_program
Commit 5fd848f6c9 ("program: Use _mesa_geometric_samples to calculate
gl_NumSamples") broken Android builds. Add the missing include path "main"
to framebuffer.h like other includes in prog_statevars.c.

Cc: Neil Roberts <neil@linux.intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-18 17:47:33 +00:00
Ilia Mirkin
12e3ad2ae9 mesa: gl_NumSamples should always be at least one
From ARB_sample_shading:

    "gl_NumSamples is the total number of samples in the framebuffer,
     or one if rendering to a non-multisample framebuffer"

So make sure to always pass in at least 1.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O`Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2016-02-18 12:35:28 -05:00
Plamena Manolova
65dfb3048e compiler/glsl: Fix uniform location counting.
This patch moves the calculation of current uniforms to
link_uniforms, which makes use of UniformRemapTable which
stores all the reserved uniform locations.

Location assignment for implicit uniforms now tries to use
any gaps left in the table after the location assignment
for explicit uniforms. This gives us more space to store more
uniforms.

Patch is based on earlier patch with following changes/additions:

   1: Move the counting of explicit locations to
      check_explicit_uniform_locations and then pass
      the number to link_assign_uniform_locations.
   2: Count the number of empty slots in UniformRemapTable
      and store them in a list_head.
   3: Try to find an empty slot for implicit locations from
      the list, if that fails resize UniformRemapTable.

Fixes following CTS tests:
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93696
2016-02-18 11:53:35 +02:00
Jason Ekstrand
40c76d4efa Delete nir_lower_samplers.cpp
Somehow, in one of the merges with mesa master, the old file must have been
kept when nir_lower_samplers.cpp was moved to nir_lower_samplers.c.
2016-02-17 20:16:11 -08:00
Roland Scheidegger
d335b6abc0 gallivm, tgsi: provide fake sample_i_ms implementations
Just like the rest of the msaa "implementation" it's just fake for now...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-18 05:00:03 +01:00
Brian Paul
06d3b0a006 st/mesa: new st_DrawAtlasBitmaps() function for drawing bitmap text
This basically saves the current pipeline state, sets up state for
rendering, constructs a set of textured quads, renders, then restores
the previous pipeline state.

It shouldn't be hard to implement a similar function for non-gallium
drives.  With some code refactoring, the vertex definition code could
probably be shared.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-17 19:57:48 -07:00
Brian Paul
b26ddda12f mesa: implement a display list / glBitmap texture atlas
This improves the performance of applications which use glXUseXFont()
or wglUseFontBitmaps() and glCallLists() to draw bitmap text.

Basically, we collect all the glBitmap images from the display lists
and put them into a texture atlas.  To render the bitmaps for a
glCallLists() command, we render a set of textured quads where each
quad is textured with one bitmap image.  Actually, the rendering part
has to be done by the Mesa driver or Mesa/gallium state tracker.

Note that GLUT demos that use glutBitmapCharacter() don't benefit
from this.

v2, per Nicolai Hähnle:
- check the max tex rect size is at least 1024.
- add comment in dd.h that texture_rectangle is required.
- in _mesa_DeleteLists(), try to delete the atlas before the list(s)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-17 19:57:48 -07:00
Ilia Mirkin
6f4a725073 st/mesa: apply DepthMode swizzle to stencil texturing as well
Gallium doesn't present these as GL_RED-style. A swizzle is necessary to
present the proper data in the unused components.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-17 21:20:24 -05:00
Jason Ekstrand
005b9ac758 anv: Gut anv_pipeline_layout
Almost none of the data in anv_pipeline_layout is used anymore thanks to
doing real layout in the pipeline itself.
2016-02-17 18:04:40 -08:00
Kristian Høgsberg Kristensen
c2581a9375 anv: Build the real pipeline layout in the pipeline
This gives us the chance to pack the binding table down to just what the
shaders actually need.  Some applications use very large descriptor sets
and only ever use a handful of entries.  Compacted binding tables should be
much more efficient in this case.  It comes at the down-side of having to
re-emit binding tables every time we switch pipelines, but that's
considered an acceptable cost.
2016-02-17 18:04:39 -08:00
Jason Ekstrand
581e4468f9 nir/spirv: Add some more capabilities 2016-02-17 18:04:39 -08:00
Jason Ekstrand
fed8b7f817 anv/pipeline: Delete out-of-bounds fragment shader outputs 2016-02-17 18:04:39 -08:00
Jason Ekstrand
979732fafc nir: Add a helper for getting the one function from a shader 2016-02-17 18:04:39 -08:00
Jason Ekstrand
8c05b44bbb nir: Add a nir_foreach_variable_safe helper 2016-02-17 18:04:39 -08:00
Jason Ekstrand
d67d84f5e5 i965/nir: Do lower_io late for fragment shaders 2016-02-17 18:04:39 -08:00
Jason Ekstrand
7c26d8d471 anv/gen7_pipeline: Set WriteDisable = true if we have no color attachments 2016-02-17 18:04:39 -08:00
Jason Ekstrand
9f9cd3de44 anv/gen8_pipeline: Default color attachments to WriteDisable = true 2016-02-17 18:04:39 -08:00
Jason Ekstrand
da9fd74d34 anv: Pull StencilBufferWriteEnable from both sides 2016-02-17 18:04:39 -08:00
Nanley Chery
9963af8bbd anv: Ignore unused dimensions in vkCreateImage's anv_image
We ignore unused dimensions in the isl surface; do the same for the
resulting anv_image.

Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2016-02-17 17:32:26 -08:00
Ben Widawsky
20e8ee3662 i965/skl: Update Skylake renderer strings
Also adds some of the Iris/Pro parts which we previously didn't have named.

v2: 0x192d is gt3, not gt4
Adding some 'e' tags for eDRAM parts

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
2016-02-17 16:50:59 -08:00
Ben Widawsky
644c8a5151 i965/skl: Add two missing device IDs
The Iris part is left unbranded because we did not have these with original SKL.

v2: 0x192d is gt3, not gt4

v3: Forgot to update the temporary brand string when I did v2.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
2016-02-17 16:50:59 -08:00
Ilia Mirkin
f3cd62a765 mesa: allow multisampled format info to be returned on GLES 3.1
The restriction on multisampled integer texture formats only applies to
GLES 3.0, so don't apply it to GLES 3.1 contexts. This fixes a slew of

dEQP-GLES31.functional.state_query.internal_format.*

tests, which now all pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-17 19:30:40 -05:00
Kristian Høgsberg Kristensen
b8da261dc7 spirv: Fix SpvOpFwidth, SpvOpFwidthFine and SpvOpFwidthCoarse
"Result is the same as computing the sum of the absolute values of
    OpDPdx and OpDPdy on P."

We were doing sum of absolute values of OpDPdx of P and OpDPdx of NULL.
2016-02-17 15:28:52 -08:00
Kristian Høgsberg Kristensen
ae3e249d57 anv: Remove hacky PIPE_CONTROL in vkCmdEndRenderPass()
The vkCmdPipelineBarrier() command should work as intended now and we
need to pull the plug on this old hack.
2016-02-17 15:19:07 -08:00
Kristian Høgsberg Kristensen
5e92e91c61 anv: Rework vkCmdPipelineBarrier()
We don't need to look at the stage flags, as we don't really support any
fine-grained, stage-level synchronization. We have to do two
PIPE_CONTROLs in case we're both flushing and
invalidating.

Additionally, if we do end up doing two PIPE_CONTROLs, the first,
flusing one also has to stall and wait for the flushing to finish, so we
don't re-dirty the caches with in-flight rendering after the second
PIPE_CONTROL invalidates.
2016-02-17 15:18:06 -08:00
Ben Widawsky
2bf041d94f i965: Extract push constant state to a new file
Every stage has a corresponding 3DSTATE_CONSTANT_XS packet, so having
the code to create and emit push constant buffers in genX_vs_state.c
is a little strange.  Moving it to a separate file seems more logical.

v2 [Ken]: Rebase on master, explain motivation in the commit message.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-17 12:34:23 -08:00
Matt Turner
0e9dc59a58 i965: Make emit_minmax return an instruction*.
And use it in brw_fs_nir.cpp.
2016-02-17 12:35:27 -08:00
Matt Turner
2f2c00c727 i965: Lower min/max after optimization on Gen4/5.
Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.

On Ironlake:

   total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
   instructions in affected programs: 326604 -> 323322 (-1.00%)
   helped: 1411

   total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
   cycles in affected programs: 18950290 -> 18867176 (-0.44%)
   helped: 2419
   HURT: 328

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-17 12:35:27 -08:00
Matt Turner
378d98f87e i965/vec4: Initialize force_writemask_all in vec4_builder().
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-17 12:35:27 -08:00
Kristian Høgsberg Kristensen
3b9b908054 anv: Ignore unused dimensions in vkCreateImage
We would assert on unused dimensions (eg extent.depth for
VK_IMAGE_TYPE_2D) not being 1, but the specification doesn't put any
constraints on those. For example, for VK_IMAGE_TYPE_1D:

   "If imageType is VK_IMAGE_TYPE_1D, the value of extent.width must be
    less than or equal to the value of
    VkPhysicalDeviceLimits::maxImageDimension1D, or the value of
    VkImageFormatProperties::maxExtent.width (as returned by
    vkGetPhysicalDeviceImageFormatProperties with values of format,
    type, tiling, usage and flags equal to those in this structure) -
    whichever is higher"

We'll fix up the arguments to isl to keep isl strict in what it expects.
2016-02-17 12:21:51 -08:00
Kristian Høgsberg Kristensen
b63e28c0e1 anv: Set correct write domain on window system BOs
We need to make sure GEM understands that we're writing to the BO, in
case it needs to synchronize with other rings (blitter use in display
server, for example).
2016-02-17 11:19:56 -08:00
Tom Stellard
dc7cf07af3 radeon/llvm: Add TargetLibraryInfo to the pass manager
This will prevent optimization passes from introducing unsupported
library calls.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Tom Stellard
4f351a6cb1 radeon/llvm: Set the target triple on the module
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Tom Stellard
77f4e1c7ff gallivm: Add helpers for creating and destroying TargetLibraryInfo
This functionality is not exposed via the LLVM C API.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-17 19:06:41 +00:00
Samuel Pitoiset
cfd1dd0500 nvc0: invalidate all buffers when switching pipe contexts
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-17 21:14:24 +01:00
Ilia Mirkin
49c67926c7 st/mesa: fix up result_src.type when doing i2u/u2i conversions
Even though it's a no-op, it's important to keep track of the type so
that we can pick the properly-signed op later on.

This fixes dEQP-GLES3.functional.shaders.precision.uint.highp_div_fragment,
which ended up using IDIV instead of UDIV.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2016-02-17 13:30:33 -05:00
Brian Paul
5e52df2198 st/mesa: use cso_set_viewport_dims() in try_pbo_upload_common()
Note that this results in a different transformation for the viewport's
Z axis (depth range), but that doesn't matter for this case.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-17 11:25:02 -07:00
Jordan Justen
9a939ebb47 i965/gen7: Use predicated rendering for indirect compute
On gen7 (Ivy Bridge, Haswell), we will get a GPU hang if an indirect
dispatch is used, but one of the dimensions is 0.

Therefore we use predicated rendering on the GPGPU_WALKER command to
handle this case.

Fixes piglit test: spec/arb_compute_shader/zero-dispatch-size

From the ARB_compute_shader spec, under DispatchCompute:

"If the work group count in any dimension is zero, no work groups are
 dispatched."

And then for DispatchComputeIndirect:

... "is equivalent (assuming no errors are generated) to calling
DispatchCompute with <num_groups_x>, <num_groups_y> and
<num_groups_z>" ...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94100
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-17 09:25:47 -08:00
Rob Clark
37d540ba70 freedreno: expose time-elapsed query
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
ba194630cc freedreno/a4xx: implement time-elapsed query
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
62fa868728 freedreno/a4xx: better occlusion/sample counting
This seems to give more reliable results.  More similar to what we do on
a3xx, although I think it breaks the a3xx theory that the four sets of
results map to each MRT (since we appear to still only have four sets on
a4xx).  The divide-by-two is a bit odd, but seems to be needed for some
reason.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
87eb406791 freedreno/query: fix refcnt'ing issue
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
0e91dccf9c freedreno/query: some queries don't have ->begin_query()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
9d23d7b7cb freedreno/query: align counter snapshot locations
Some hw queries need their sample memory locations to have certain
alignment.  At the moment that isn't an issue, since the only hw query
is occlusion, so all samples have the same size.  But when others are
added with different sample sizes, this starts to be a problem.

All current and immediately upcoming hw queries simply need their
sample address aligned to their size, so let's use that for now.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
8529e210ec freedreno/query: add optional enable hook
Add enable hook for hw query providers.  Some will need to configure
perfctr selector registers, which we want to do at the start of the
submit.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
45ab5b1c34 freedreno: query max gpu freq
This will be needed to support converting from cycle counts to time for
performance related queries (initially time-elapsed, but there are some
additional performance counters that could be wired up).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
dcb69185a0 freedreno: update generated headers
Mostly to pull in perf ctrs.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-17 10:41:55 -05:00
Rob Clark
2a7ceb5957 freedreno/ir3: fix new gcc6 errors
src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c: In function ‘emit_tex’:
src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c:1368:26: warning: unused variable ‘const_off’ [-Wunused-variable]
  struct ir3_instruction *const_off[4];
                          ^~~~~~~~~
unused since:

commit 8750299a42
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Feb 9 14:51:28 2016 -0800

    nir: Remove the const_offset from nir_tex_instr

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-02-17 10:41:55 -05:00
Kristian Høgsberg Kristensen
5caa995c32 Revert "anv: Disable snooping for allocator pools again"
This reverts commit c136672c59.

We still have the intermittent missing flush for VkEvent in certain
vulkancts cases:

  piglit.deqp-vk.api.command_buffers.execute_large_primary
  piglit.deqp-vk.api.command_buffers.submit_count_non_zero,

Let's reenable the snooping until we figure out the root cause.
2016-02-16 23:23:49 -08:00
Kristian Høgsberg Kristensen
ecc67f1aac anv: Make driver and icd file installable
Change the name of the .so to libvulkan_intel.so and add an installable
icd with the installed paths.  Keep the icd file with build-tree paths,
but rename to dev_icd.json to make it clear that it's for development
purposes.
2016-02-16 23:23:17 -08:00
Kristian Høgsberg Kristensen
4a2d17f606 anv: Revise PhysicalDeviceFeatures and remove FINISHME 2016-02-16 15:43:12 -08:00
Karol Herbst
edf774bb7e nv50/ir: we can't do the add to mad conversion when the mul saturates
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 18:20:10 -05:00
Karol Herbst
068e9848ba nv50/ir: optimize neg(and(set, 1)) to set
helps shaders in saints row IV, bioshock infinite and shadow warrior

total instructions in shared programs : 1914931 -> 1903900 (-0.58%)
total gprs used in shared programs    : 247920 -> 247785 (-0.05%)
total local used in shared programs   : 5673 -> 5673 (0.00%)
total bytes used in shared programs   : 17558272 -> 17457320 (-0.57%)

                local        gpr       inst      bytes
    helped           0         137         719         719
      hurt           0          12           0           0

v2: remove this opt for OP_SLCT and check against float for OP_SET
v3: simplified the code

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 18:20:10 -05:00
Ilia Mirkin
ca23c8081f nv50/ir: fix quadop emission in the presence of predication
When there's a predicate, it just goes onto the sources list. If the
quadop only has a single regular source, we will end up thinking that
the predicate is the second source. Check explicitly for the predSrc so
that we don't accidentally emit the wrong thing.

This fixes a bunch of dEQP-GLES3.functional.shaders.derivate.* tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-02-16 18:20:10 -05:00
Ilia Mirkin
1d1ddfe5f8 nv50,nvc0: enable/disable seamless cubemap texturing as requested
In a situation where the seamless setting isn't available on a
per-texture basis (G200+ Teslas, and all Fermis), assume that all
samplers will have it identically set, and enable accordingly.

This fixes arb_seamless_cubemap piglit test on Fermi and Tesla.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 18:20:10 -05:00
Philipp Zabel
ecd1d94d1c anv: pCreateInfo->pApplicationInfo parameter to vkCreateInstance may be NULL
Fix a NULL pointer dereference in anv_CreateInstance in case
the pApplicationInfo field of the supplied VkInstanceCreateInfo
structure is NULL [1].

[1] https://www.khronos.org/registry/vulkan/specs/1.0/apispec.html#VkInstanceCreateInfo

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
2016-02-16 14:42:26 -08:00
Rob Clark
d49307435a st/mesa: add missing ETC2 entries to format_map
Noticed by Ilia when I was trying to figure out why some app was failing
to use ETC2.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:53:43 -05:00
Samuel Pitoiset
3d5f61a262 nvc0: enable compute support on GK110:GM200 with an envvar
Without this NVF0_COMPUTE environment variable, compute support is
initialized by default and this is not what we want for now because
it might break 3D. It will be enabled by default once we are sure it
won't break anything.

Please note that compute support on GM200+ is not enabled yet because
it needs to be double-checked.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 21:39:00 +01:00
Samuel Pitoiset
6d74fa5756 nvc0: add compute support for GM107
Fortunately, compute support on GM107 is very close to GK110, except
the GK110_COMPUTE.UNK02C4 which is invalid and should not be used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 21:39:00 +01:00
Samuel Pitoiset
bc331dd838 nvc0: fix compute state initialization on GK110+
Because our firmware doesn't support the GK110_COMPUTE.FIRMWARE[0x6]
method the GPU hangs when it is used. Removing it fix the issue and
allow to launch compute shaders on GK110+.

Tested on GK208 and GM107.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 21:39:00 +01:00
Timothy Arceri
a61823b584 glsl: remove duplicate interpolation_string() function
We already have one in the IR code that can be used everywhere its
needed in the AST code so remove the one from the AST.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-02-17 07:26:38 +11:00
Timothy Arceri
e70ece4eea glsl: remove unused helper
Seems to have become unused when i965 moved to NIR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-17 07:25:10 +11:00
Timothy Arceri
07e6a37332 glsl: set user defined varyings to smooth by default in ES
This is usually handled by the backends in order to handle the
various interactions with the gl_*Color built-ins.

The problem is this means linking will fail if one side on the
interface adds the smooth qualifier to the varying and the other
side just uses the default even though they match.

This fixes various deqp tests. The spec is not clear what to for
desktop GL so leave it as is for now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92743
2016-02-17 07:23:49 +11:00
Samuel Pitoiset
f638512890 gm107/ir: add ATOM CAS emission
This fixes the following dEQP test and the other compswap variants.

dEQP-GLES31.functional.ssbo.atomic.compswap.highp_int

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 20:53:39 +01:00
Samuel Pitoiset
09446cf5f6 st/mesa: do not init limits when compute shaders are not supported
When the number of uniform blocks is less than 12,
ARB_uniform_buffer_object can't be enabled and the maximum GL version
is not even 3.1...

This fixes a regression introduced in 7c79c1e (st/mesa: add compute
shader state) if the maximum number of uniform blocks allowed for
compute shaders is less than 12. This happens on Kepler but this might
also affect other Gallium drivers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2016-02-16 20:53:35 +01:00
Jordan Justen
f28d80fabf mesa: Don't call driver when there is no compute work
The ARB_compute_shader spec says:

  "If the work group count in any dimension is zero, no work groups
   are dispatched."

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 09:25:20 -08:00
Jordan Justen
8514c75a26 i965: Set compute shader shared memory max to 64k
See Ivy Bridge PRM, Volume 2, Part 2, 1.8.4 INTERFACE_DESCRIPTOR_DATA:

DWORD 5, bits 20:16: "This field indicates how much shared local
memory the thread group requires. The amount is specified in 4k
blocks, but only powers of 2 are allowed: 0, 4k, 8k, 16k, 32k and 64k
per half-slice."

For Haswell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:

DWORD 5, bits 20:16: With text identical to the Ivy Bridge PRM.

For Broadwell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:

DWORD 6, bits 20:16: With text identical to the Ivy Bridge PRM.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 09:25:20 -08:00
Brian Paul
f90801cd40 st/mesa: use new CSO_BITS_ALL_SHADERS
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-16 10:22:32 -07:00
Brian Paul
1bf8fa8277 cso: add CSO_BITS_ALL_SHADERS
For saving/restoring all shader stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-16 10:22:32 -07:00
Brian Paul
a0636157c4 st/mesa: simplify st->ctx, ctx->st usage in a various places 2016-02-16 10:22:32 -07:00
Brian Paul
5239832cf1 st/mesa: use _mesa_geometric_width/height() in glDrawPixels code
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 10:22:32 -07:00
Brian Paul
b92d48fb6b st/mesa: rename attr variable in st_DrawTex()
Rename to 'tex_attr' to be a bit more clear.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
5ce1f1245d st/mesa: use 'cso' instead of 'st->cso_context' in st_DrawTex()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
79ffe94c8b st/mesa: fix whitespace and add comment in st_DrawTex()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
4277618235 st/mesa: used _mesa_num_tex_faces() in st_finalize_texture()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
ffa1a1dd21 cso: make most of the cso_save/restore_x() functions static
Users of the CSO save/restore facility all use the new
cso_save/restore_state() functions instead.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
223ffd8a08 postprocess: use new cso_save/restore_state() functions
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
70e8a4f734 gallium/hud: use new cso_save/restore_state() functions
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
66889d8f84 gallium/util: use new cso_save/restore_state() functions
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
38db9a4e26 st/mesa: use cso_save/restore_state() in st_cb_texture.c
This simplifies the error handling code too.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
33fc248606 st/mesa: use new cso_save/restore_state() functions
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
9403571755 cso: add new cso_save/restore_state() functions
cso_save_state() takes a bitmask of state items to save.  Calling
cso_restore_state() restores those states.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
017a003f1c cso: remove comment
There's a similar comment just a few lines before.
2016-02-16 10:22:32 -07:00
Brian Paul
347b9418ac st/mesa: use new cso_set_viewport_dims() helper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
f7af12ae85 cso: add new cso_set_viewport_dims() helper
To simplify some viewport setting code in the state tracker.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
f88c859cd3 st/mesa: use 'cso' local var instead of st->cso_context
Just a little cleaner.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
d7d4fe90c4 st/mesa: consolidate quad drawing code
The glClear, glBitmap and glDrawPixels code now use a new st_draw_quad()
helper function.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:32 -07:00
Brian Paul
b63fe0552b st/mesa: overhaul vertex setup for clearing, glDrawPixels, glBitmap
Define a new st_util_vertex structure which is a bit smaller (9 floats
versus the previous 12 floats per vertex).  Clean up the glClear,
glDrawPixels and glBitmap code that sets up the vertex data and does the
drawing so it's all very similar.  This can lead to more consolidation.

v2: add assertion that vertex buffer slot == 0 to catch possible future
change in cso_get_aux_vertex_buffer_slot() behavior.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:31 -07:00
Brian Paul
2b1535f82f st/mesa: include u_draw.h, not u_draw_quad.h in st_draw.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-16 10:22:31 -07:00
Jason Ekstrand
48087cfc4e anv/icd.json: Update the ABI version 2016-02-16 08:02:17 -08:00
Jason Ekstrand
0a3324e66c anv: Pull Khronos stuff from the README 2016-02-16 07:43:21 -08:00
Jan Vesely
04085afcbf configure: Bail out on llvm-config component error
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-16 10:09:33 -05:00
Matthew Dawson
0bba5ca468 Handle removal of LLVMAddTargetData in SVN revision 260919
LLVM removed LLVMAddTargetData for the 3.9 release in r260919.  For the two
places in mesa where this is called, only enable the lines when compiling
for less then 3.9.

For the radeon driver, I'm not sure how to check if any other LLVM calls need
to be adjusted.  I think since the target data used is extracted from the
LLVMModule, it isn't necessary to pass it back to LLVM again.

The code does compile, and at least for radeonsi does run OpenGL games.

[ Michel Dänzer: Move #if closer to LLVMAddTargetData in lp_bld_init.c,
  and add HAVE_LLVM < 0x0309 guards around now unused occurrences of TD
  and data_layout ]

Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-16 16:18:35 +09:00
Topi Pohjolainen
7287cc8440 i965: Expose logic telling if non-msrt mcs is supported
Alos use the opportunity to mark inputs constant. (Context has to be
given as read-write to intel_miptree_supports_non_msrt_fast_clear()
to support debug output).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
dd37b6aaa9 i965/gen9: Refactor msrt mcs initialization
This will be re-used to initialize auxiliary buffers in lossless
compression case.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
2bd58790e2 i965: Add a few assertions on lossless compression
v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
56f29911ec i965: Add a flag telling color resolve pass to ignore CCS_E
v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
97f4ca90b8 i965: Add resolve option for lossless compression
v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
0e79bff957 i965: Allow fast clear to be used with lossless compression
v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression.
v3 (Ben): Squash with "i965: Resolve color buffer also in
          lossless compression case" and clarify simple
          non-compressed fast clear case.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:24 +02:00
Topi Pohjolainen
4b801116d3 i965: Add helper for detecting lossless compression
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-16 08:52:23 +02:00
Topi Pohjolainen
36b7c0dad9 Revert "i965: Restore vbo after color resolve during brw_try_draw_prims()"
This got pushed accidentally in the first place but wasn't reverted
as it didn't regress piglit but instead fixed one newly introduced
test exercising a corner in case in i965 driver. However, saving and
restoring vertex buffer context is complicated and requires more
thought.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94150

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
2016-02-16 08:52:14 +02:00
Ben Skeggs
33ace5544e nvc0: initial support for GM20x GPUs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:16 +10:00
Ben Skeggs
97fc3fd559 nvc0: implement support for maxwell texture headers
Adds support for the new TIC layout that's present on Maxwell GPUs,
heavily based on the code for the existing layout.

This code is required for GM20x support.  While GM10x supports the older
layout still, this commit switches it to use the updated version instead.

Piglit testing shows zero regressions on GM107.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:13 +10:00
Ben Skeggs
7333b0c20c nvc0: import maxwell texture header definitions from rnndb
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:10 +10:00
Ben Skeggs
733c8f8c73 nv50-: split tic format specification
We previously stored texture format information as it would appear in
the TIC.

We're about to support the new TIC layout that appeared with Maxwell,
so it makes more sense to store the data in a split-out format.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:07 +10:00
Ben Skeggs
a928cbc205 nv50-: remove nv50_texture.xml.h
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:05 +10:00
Ben Skeggs
ff1af29dd9 nvc0: switch nvc0_tex.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:03 +10:00
Ben Skeggs
c999736c18 nvc0: switch nvc0_surface.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:02 +10:00
Ben Skeggs
63880dca12 nv50: switch nv50_tex.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:57:00 +10:00
Ben Skeggs
a15c08c95c nv50: switch nv50_surface.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:58 +10:00
Ben Skeggs
59d93ad1be nv50: switch nv50_state.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:56 +10:00
Ben Skeggs
1a45b7afb6 nv50-: switch nv50_formats.c to updated g80_texture.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:54 +10:00
Ben Skeggs
d5ac81295d nv50: import updated g80_texture.xml.h from rnndb
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:52 +10:00
Ben Skeggs
7235b6250d nv50-: remove nv50_defs.xml.h
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:50 +10:00
Ben Skeggs
b04b16754c nv50-: switch nv50_formats.c to updated g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:48 +10:00
Ben Skeggs
3444f83077 nv50-: improved macros to handle format specification
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:45 +10:00
Ben Skeggs
346d7a24ea nv50-: separate vertex formats from surface format descriptions
We've previously had identical naming between vertex and texture
formats, so it mostly made sense to define these together.

However, upcoming patches are going to transition the driver over to
using updated texture header definitions using NVIDIA's naming, and this
will no longer be the case.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:42 +10:00
Ben Skeggs
3e2dd50d81 nvc0: remove unnecessary includes
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:40 +10:00
Ben Skeggs
e8eda47898 nvc0: switch nvc0_tex.c to updated g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:38 +10:00
Ben Skeggs
546ccf3f82 nvc0: switch nvc0_surface.c to updated g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:36 +10:00
Ben Skeggs
0a0d8e4497 nv50: remove unnecessary include
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:33 +10:00
Ben Skeggs
9c4b7748db nv50: switch nv50_transfer.c to g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:31 +10:00
Ben Skeggs
577eeb7984 nv50: switch nv50_tex.c to updated g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:29 +10:00
Ben Skeggs
114d41feb2 nv50: switch nv50_surface.c to updated g80_defs.xml.h
Verified (binary diff) to produce identical code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:27 +10:00
Ben Skeggs
413cc25753 nv50: import updated g80_defs.xml.h from rnndb
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-16 15:56:12 +10:00
Nicolai Hähnle
2de9317d5f st/mesa: count shader images in MaxCombinedShaderOutputResources
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-15 22:22:34 -05:00
Ilia Mirkin
1edbe0157d st/mesa: enable GL image extensions when backend supports them
This enables ARB_shader_image_load_store and ARB_shader_image_size when
the backend claims support for these. It will also implicitly enable the
image component of ARB_shader_texture_image_samples.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
2e0a84208b st/mesa: convert GLSL image intrinsics into TGSI
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
672257dc69 st/mesa: allow st_format.h to be included from C++ files
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Nicolai Hähnle
ef27190a34 st/mesa: set pipe_image_view layers correctly for 3D textures
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-15 22:22:33 -05:00
Nicolai Hähnle
f1b0bda6bc st/mesa: call st_finalize_texture from image atoms
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
78093167b1 st/mesa: add an image atom for shader images
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
e2a1ec5f0f tgsi: show textual format representation
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
9fbfa1abb2 gallium: add PIPE_SHADER_CAP_MAX_SHADER_IMAGES
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
bceff68114 gallium: make image views non-persistent objects
Make them akin to shader buffers, with no refcounting/etc. Just used to
pass data about the bound image in ->set_shader_images.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-15 22:22:33 -05:00
Ilia Mirkin
cfbf25ac8f st/mesa: empty buffer binding if the buffer's not really there
This can happen with 0-sized buffers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-15 22:22:33 -05:00
Kristian Høgsberg Kristensen
a3672a241b anv/genxml: Include MBO bits for gen7 and gen75 2016-02-15 17:57:03 -08:00
Kristian Høgsberg Kristensen
c2b2ebf1ed anv: Add missing gen75_cmd_buffer_set_subpass() prototype 2016-02-15 17:40:15 -08:00
Adam Jackson
80ec20351c anv: Bump to 1.0.3
Probably this should be picked up from <vulkan.h> directly, or we should
just assume that any 1.0.x is legal.
2016-02-15 17:38:26 -08:00
Kristian Høgsberg Kristensen
b53edea76c anv/gen7: Make disabling the FS work
We disable the fragment shader for depth/stencil-only pipelines. This
commit makes that work for gen7.
2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
85f67cf16e anv: Deduplicate render pass code
This lets us share the renderpass code and depth/stencil state code
between gen 7 and gen 8.
2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
ac4fd0ed21 anv/gen7: Fix pipeline selection in init_device_state()
We need the 3D pipeline for the initial setup, not GPGPU.
2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
ea694637ac anv/gen7: Set 3DSTATE_SF depth buffer format correctly
We need to pull this from the render pass information at state flush
time.
2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
18dd59538b anv/gen7: Call flush_pipeline_select_3d() from CmdBeginRenderPass 2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
832f73f512 anv: Share flush_pipeline_select_3d() between gen7 and gen8 2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
53eaa0a6b8 anv: Fix warning 3DSTATE_VERTEX_ELEMENTS setup
This is a little more subtle. If elem_count is 0, nothing else happens
in this function, so we return early to avoid warning about
uninitialized 'p'.
2016-02-15 17:32:07 -08:00
Kristian Høgsberg Kristensen
5d72d7b12d anv: Fix misc simple warnings 2016-02-15 17:32:07 -08:00
Rhys Kidd
76e2af3dd4 docs: Document VC4_DEBUG envvar
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Rhys Kidd
aa82cc4b22 vc4: Add missing braces in initializer
Silences the following GCC warning:

mesa/src/gallium/drivers/vc4/vc4_qir_schedule.c: In function 'qir_schedule_instructions':
mesa/src/gallium/drivers/vc4/vc4_qir_schedule.c:578:16: warning: missing braces around initializer [-Wmissing-braces]
         struct schedule_state state = { 0 };
                ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Rhys Kidd
c75ced3623 vc4: Correct typo setting 'handled_qinst_cond'
Variable was previously always set to true. Accordingly, the later
assert() served no active purpose.

Found with GCC warning and code inspection:

mesa/src/gallium/drivers/vc4/vc4_qpu_emit.c: In function'vc4_generate_code':
mesa/src/gallium/drivers/vc4/vc4_qpu_emit.c:315:22: warning: variable 'handled_qinst_cond' set but not used [-Wunused-but-set-variable]
                 bool handled_qinst_cond = true;
                      ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Eric Anholt
655fa0f465 vc4: Don't treat conditional MOVs as raw MOV.
The two consumers want to know that the destination will be exactly the
source, which is not true if we might not set the destination.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Timothy Arceri
00a1bd13b5 glsl: warn in GL as well as ES when varying not written
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93339
2016-02-16 11:15:43 +11:00
Ilia Mirkin
6d39075c06 docs: update GLES 3.1 section for recent nvc0 additions
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-15 17:43:37 -05:00
Jason Ekstrand
08ecd8a8d1 anv/meta_resolve: Set origin_upper_left on gl_FragCoord
It's required by the spec and any shaders that don't set it will be broken.
I'm not really sure how multisampling was even working before...
2016-02-15 12:45:03 -08:00
Ilia Mirkin
4360ba0caf mesa: need to check resource and set length even if bufSize is 0
This fixes a number of dEQP tests, such as:

dEQP-GLES31.functional.program_interface_query.buffer_limited_query.resource_query

It was expecting the length to be set even in the bufSize == 0 case.
Also _mesa_get_program_resourceiv does some error checking on the
resource which should probably happen even in the bufSize == 0 case as
well although there's no dEQP test for that.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-02-15 12:20:25 -05:00
Ben Widawsky
66c790720b i965/bxt: Production thread counts
v2: Forgot to squash in the comment removal

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-15 07:48:09 -08:00
Daniel Czarnowski
5d87a7c894 egl_dri2: NULL check for xcb_dri2_get_buffers_reply()
Without the check, unsuccessful xcb_dri2_get_buffers_reply(...) causes
segmentation fault in dri2_get_buffers.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
2016-02-15 07:43:27 +02:00
Edward O'Callaghan
331f963b7e nv50,nvc0: Remove duplicate logic from nvc0_set_framebuffer_state()
We already have this logic in the gallium/util functions so
lets reduce some entropy while here.

V.2:
  Apply change to nv50 also as suggested by Samuel Pitoiset.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-02-14 23:56:54 +01:00
Samuel Pitoiset
cbf24a01dd nv50: add missing PIPE_SHADER_CAP_SUPPORTED_IRS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-14 22:56:02 +01:00
Kenneth Graunke
8122d21d15 i965: Fix gl_DrawID in the vec4 backend.
brw_draw_upload.c uploads VertexID/InstanceID first, then DrawID.
So we need to assign the attribute mapping in that order as well.

Fixes the following Pigit tests with the vec4 backend:
- arb_shader_draw_parameters-drawid vertexid
- arb_shader_draw_parameters-drawid-indirect basevertex

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-02-14 13:24:07 -08:00
Brian Paul
816c987b67 mesa: move assertion in _mesa_cube_face_target()
Fixes piglit arb_texture_view-sampling-2d-array-as-2d-layer regression.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94134
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-14 09:16:22 -07:00
Serge Martin
a4cff1859e clover: fix build failure since bfd695e
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-14 11:00:29 +01:00
Kenneth Graunke
565aa69970 glsl: Fix overflow of ImageAccess[] array.
The ImageAccess array is statically sized to MAX_IMAGE_UNIFORMS:

   GLenum ImageAccess[MAX_IMAGE_UNIFORMS];

There was no bounds checking ensuring we don't overflow.  Passing in a
shader with too many uniforms would cause writes to extend into other
fields, such as sh->NumImages.

Later linker checks already handle reporting an error when there are too
many images, so just avoid corrupting structures here.

This rearranges the logic a bit to look more like the sampler case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 21:12:18 -08:00
Ilia Mirkin
6411444c36 mesa: default FixedSampleLocations to true when using a dummy image
GL_ARB_texture_multisample and GLES 3.1 expect the initial value to be
GL_TRUE. This fixes

dEQP-GLES31.functional.state_query.texture_level.texture_2d_multisample_array.fixed_sample_locations_integer

and a few related tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-02-13 23:41:28 -05:00
Jason Ekstrand
7410c60988 nir/types: Add more type constructor functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
f05f576803 nir/types: Add a few more glsl_type_is_ functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
914829f766 nir/types: Add helpers for working with sampler and image types
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
d140b13fd5 nir/types: Add helpers for function types
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
b9e94ad806 glsl/types: Expose glsl_struct_field and glsl_function_param to C
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
954d46184f glsl/types: Add a helper for getting image types
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
95ea9f7708 glsl/types: Add support for function types
SPIR-V has a concept of a function type that's used fairly heavily.  We
could special-case function types in SPIR-V -> NIR but it's easier if we
just add support to glsl_types.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
5ec6a65388 glsl/types: Add a bare "sampler" type
This is to be used by SPIR-V for representing a sampler that isn't attached
to any particular image.  In SPIR-V, all of the interesting bits such as
dimensionality, sampled type, etc. come from the image, the bare "sampler"
type simply uses a sampled type of VOID and 0 values for the rest.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Jason Ekstrand
ac089126b9 glsl/types: Rename sampler_type to sampled_type
It's a bit more descriptive since it is the base type that you get when you
sample from it.  Also, the next commit adds a bare "sampler" type and we
need glsl_type::sampler_type available for a public static member.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-13 17:22:36 -08:00
Vinson Lee
4ed4c1d921 llvmpipe: Do not use barriers if not using threads.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94088
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-13 14:42:05 -08:00
Francisco Jerez
9e30d66b7c i965: Reupload push and pull constants when we get new shader image unit state.
Fixes several of the
"dEQP-GLES31.functional.image_load_store*load_store*single_layer" dEQP
tests that use image formats we implement using untyped surface
messages.

Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-13 14:33:32 -08:00
Samuel Pitoiset
40fcb6b9f9 i965: fix MAX_COMPUTE_SHARED_SIZE constant value
MAX_COMPUTE_SHARED_SIZE should be set to 32768. This fixes a regression
introduced in be27f77 (mesa: do not use a constant for
MAX_COMPUTE_SHARED_SIZE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94139
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-13 23:13:31 +01:00
Samuel Pitoiset
7f0a19400e nv50/ir: add missing SV_TID and SV_CTAID sysvals on GM107
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 22:26:38 +01:00
Samuel Pitoiset
d11266aa06 nv50/ir: add MEMBAR emission for GM107
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 22:06:15 +01:00
Alejandro Piñeiro
a150101125 docs: document MESA_GLES_VERSION_OVERRIDE envvar
v2: Removed reference to FC not being an allowed suffix (Brian Paul)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-13 20:21:06 +01:00
Samuel Pitoiset
b410ed9215 st/mesa: fix pipe_grid_info initializer
Fixes MSVC build error which doesn't allow empty initializers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-13 17:08:24 +01:00
Samuel Pitoiset
628b0e8571 trace: add all compute related functions
Changes from v3:
 - dump the TGSI compute program

Changes from v2:
 - remove use of MALLOC()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:01:02 +01:00
Samuel Pitoiset
fe0b55f39e st/mesa: implement limits for ARB_compute_shader
According to the spec, this also increases the following minimum values:
 - MAX_COMBINED_TEXTURE_IMAGE_UNITS     96 (6*16), was 80
 - MAX_UNIFORM_BUFFER_BINDINGS          72 (6*12), was 60

ARB_compute_shader is not enabled by default because images support is
still not implemented yet. If you want to use it you need to set
MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader.

Changes from v2:
 - make use of the new PIPE_CAP_SHADER_SUPPORTED_IRS cap instead of
   enabling the extension when PIPE_CAP_COMPUTE is enabled.
 - query for PIPE_CAP_COMPUTE first
 - s/shader_supported_irs/compute_supported_irs/
 - disable ARB_compute_shader and add a comment which explains why

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:01:02 +01:00
Samuel Pitoiset
8aa666981b st/mesa: add compute program dispatch callbacks
This state tracker implements DispatchCompute() and DispatchComputeIndirect().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:01:01 +01:00
Samuel Pitoiset
805d92e540 st/mesa: add state validation for compute shaders
This binds atomics, constants, samplers, ssbos, textures and ubos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:01:01 +01:00
Samuel Pitoiset
61c87cd2c0 st/mesa: add mappings for compute shader sysvals
LOCAL_INVOCATION_ID, WORK_GROUP_ID and NUM_WORK_GROUPS are respectively
mapped to THREAD_ID, BLOCK_ID and GRID_SIZE.

Changes from v2:
 - add assertions in st_translate_program()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 16:01:00 +01:00
Samuel Pitoiset
e8db4e4e0a st/mesa: keep track of shared memory declarations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 16:01:00 +01:00
Samuel Pitoiset
dfa58f0ff0 st/mesa: add intrinsics for shared variables
This adds GLSL intrinsics for load/store and atomic operations.

Changes from v2:
 - use PROGRAM_MEMORY instead of PROGRAM_BUFFER

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:01:00 +01:00
Samuel Pitoiset
44e04dc809 st/mesa: add conversion for compute shaders
According to the spec, there are no predefined inputs nor any
fixed-function outputs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 16:01:00 +01:00
Samuel Pitoiset
7c79c1e3e2 st/mesa: add compute shader states
Changes from v2:
 - use as much common code as possible (eg. st_basic_variant)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 16:00:54 +01:00
Samuel Pitoiset
08c46025c8 st/mesa: add a second pipeline for compute
Compute needs a new and different validation path.

Changes from v2:
 - make use of unreachable() instead of assert() when the pipeline is
   invalid
 - move the st_pipeline enumeration to st_context.h instead of st_api.h

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
a8328e3a50 tgsi/ureg: add shared variables support for compute shaders
This introduces TGSI_FILE_MEMORY for shared, global and local memory.
Only shared memory is currently supported.

Changes from v2:
 - introduce TGSI_FILE_MEMORY

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
5e09ac78e5 gallium: add PIPE_SHADER_CAP_SUPPORTED_IRS
This cap indicates the supported representations of programs. It should
be a mask of pipe_shader_ir bits. It will allow to enable
ARB_compute_shader if the underlying driver supports TGSI.

Changes from v2:
 - improve description of PIPE_SHADER_CAP_SUPPORTED_IRS

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
43f4420fba gallium: add indirect compute parameters to pipe_grid_info
Like indirect draw, we need to store a resource and an offset that
needs to be 4 byte aligned. When indirect is used, the size of the
grid (in blocks) is stored with three 32-bit integers.

Changes from v2:
 - s/most values/block sizes/

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
bfd695e1d2 gallium: add a new interface for pipe_context::launch_grid()
This introduces pipe_grid_info which contains all information to
describe a launch_grid call. This will be used to implement indirect
compute in the same fashion as indirect draw.

Changes from v2:
 - correctly initialize pipe_grid_info for nv50/nvc0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
61ed09c7ea gallium/cso: add support for compute shaders
Changes from v2:
 - removed cso_{save,restore}_compute_shader() functions and the
   compute_shader_saved variable because disabling compute shaders for
   meta ops is not currently needed

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
ffd9c7fd74 mesa: add PROGRAM_MEMORY
This will be used for shared, global and local memory areas.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
a9eb1327be mesa: store shared size in gl_compute_program
The size of shared variables needs to be stored in gl_compute_program
in order to set up pipe_compute_state::req_local_mem.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Samuel Pitoiset
be27f772e8 mesa: do not use a constant for MAX_COMPUTE_SHARED_SIZE
This will allow to query the underlying drivers for the maximum
total storage size of all variables declared as <shared> with
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-13 15:51:17 +01:00
Ilia Mirkin
f2547883cf mesa: make compute maximums reflect driver-provided values
Looks like the various max's were never plumbed through.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-13 15:51:17 +01:00
Topi Pohjolainen
f709a08457 i965: Add means for limiting color resolves
Until now there has been only one type of color buffer that needs
to resolved - namely single sampled fast clear. As even the
sampler engine in GPU doesn't understand the associated meta data,
the color values need to be always resolved prior to reading them.

From SKL onwards there is new scheme supported called the lossless
compression of single sampled color buffers. This is something that
is understood by the sampling engine and therefore resolving of
these types of buffers is not necessary before sampling.
This patch adds means to make the distinction when considering if
resolve is needed.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-13 09:50:24 +02:00
Topi Pohjolainen
7513c5c782 i965: Refactor resolving of auxiliary mode
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-13 09:30:36 +02:00
Topi Pohjolainen
9002bcdb35 i965: Don't try to create aux buffer for non-msrt aux-buffer
In addition to simply calling miptree_create() the higher level
call intel_miptree_create() also considers if the buffer should
be associated with an auxiliary buffer based on the given format.

Here we are allocating an auxiliary buffer which in turn has such
format that would mislead intel_miptree_create_layout() later on
to try to associate the auxiliary buffer with an auxiliary buffer.
To prevent this the actual buffer creation logic was split out
into its own function. Lets invoke that instead.

v2 (Ben): Do not signal msaa layout with explicit argument but
          using layout_flags instead.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-13 09:28:41 +02:00
Jason Ekstrand
88042b9f10 nir: Get rid of the C++ NIR_SRC/DEST_INIT macros
These were originally added to reduce compiler warnings but aren't really
needed.  Getting rid of them reduces the diff between the Vulkan branch and
master, so we might as well.
2016-02-12 21:35:02 -08:00
Ben Widawsky
5743fd9571 i965: Rename optimizer debug 00 filename
This allows ls, and scripts to get the file names in the correct order of
optimization.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-12 20:52:28 -08:00
Kenneth Graunke
c8b0020f2f i965: Make brw_clear_cache NULL out stale program pointers.
The L3 partitioning code tries to look at all programs - both render
programs (VS/TCS/TES/GS/FS) and compute (CS).

After calling brw_clear_cache, all prog_data pointers are invalid and
point to freed data.  The intention was that flagging the dirty bits for
all programs would cause the next draw call to re-run the atoms for each
program stage, uploading new programs and installing new, valid pointers.

However, this doesn't quite work in our new multi-pipeline world.  When
drawing or dispatching a compute workload, we only consider the programs
for the appropriate pipeline: drawing sets up VS/TCS/TES/GS/FS, but not
CS, and vice versa.  This leaves pointers dangling a bit longer than
intended.

The L3 configuration code tries to inspect the prog_data for all shader
stages, so that we avoid having to reconfigure it when swapping back and
forth between render and compute workloads.  So we can't have dangling
pointers.

The fix is simple: have brw_clear_cache NULL out stale prog_data
pointers, making it safe to inspect.  The next L3 configuration pass
will see either the render shaders or compute shader as missing for
one go around, but will pick them up when both pipelines have run.

In other words, we'll simply reconfigure L3 twice, which is safe,
if a tiny bit wasteful - but then again, we just threw every compiled
shader we had on the floor and started recompiling the from scratch,
which is massively more wasteful, so it's not much of a concern.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
2016-02-12 20:35:34 -08:00
Ilia Mirkin
f56b5de877 mesa: avoid segfault in GetProgramPipelineInfoLog when no length
If there is no pipe info log, we would unconditionally deref length,
which was only optionally there. _mesa_copy_string handles the source
being null, as well as the length, so may as well just always call it.

Fixes a segfault in

dEQP-GLES31.functional.state_query.program_pipeline.info_log

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:50 -05:00
Ilia Mirkin
f82ff6207c mesa: reset offset/size to 0 when removing atomic binding
Similar to commit dd9d2963d6 (mesa: AtomicBufferBindings should be
initialized to zero.), we should reset these to zero when unbinding.
This fixes a number of dEQP failures due to cross-test pollution. The
tests properly unbound everything, but when querying the values again,
the expectation was that they would be 0.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-12 18:22:49 -05:00
Ilia Mirkin
b7e246d89a mesa: recognize enums GL_COLOR_ATTACHMENT8-31 as valid
Similar as for AUX1-3, these enums aren't invalid (i.e. -1) but also not
supported by mesa. Returning BUFFER_COUNT causes the proper error to be
returned by ReadBuffer and other functions. This resolves some failures
in

dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.read_buffer

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:49 -05:00
Ilia Mirkin
a663aa2a37 mesa/clear: update ClearBufferfv error handling for GL 4.5 spec
This fixes

dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.clear_bufferfv

and brings the logic up to spec with GL 4.5

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:49 -05:00
Ilia Mirkin
3a0051bea9 mesa/clear: update ClearBufferuiv error handling for GL 4.5 spec
This fixes

dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.clear_bufferuiv

and brings the logic up to spec with GL 4.5

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:49 -05:00
Ilia Mirkin
758162923b mesa/clear: simplify ClearBufferiv error handling
Might as well handle everything in the same error call.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:49 -05:00
Ilia Mirkin
86fd9d6b8e mesa/clear: remove dead code handling ClearBufferiv(GL_DEPTH)
There's a hunk above which sets INVALID_ENUM for GL_DEPTH
unconditionally.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 18:22:48 -05:00
Ilia Mirkin
d33ef19479 mesa: allow DEPTH_STENCIL_TEXTURE_MODE queries in GLES 3.1 contexts
This fixes

dEQP-GLES31.functional.state_query.texture.texture_2d_multisample.depth_stencil_mode_integer

and a few related tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-02-12 18:22:48 -05:00
Kenneth Graunke
2a0fc82864 i915: include teximage.h
To get _mesa_num_tex_faces() prototype.
2016-02-12 15:20:29 -08:00
Kristian Høgsberg Kristensen
c136672c59 anv: Disable snooping for allocator pools again
The race we were seeing on cherryview was caused by the multi-submit
problem with fences. We can now turn snooping off again an rely on
clflush and we intended.
2016-02-12 15:11:31 -08:00
Kristian Høgsberg Kristensen
b0c30b77d4 anv: Submit fence bo only after all command buffers
We were submitting the fence bo after each command buffer in a multi
command buffer submit, causing us to occasionally complete the fence too
early.
2016-02-12 15:08:09 -08:00
Brian Paul
320ccf710e i965: include teximage.h
To get _mesa_num_tex_faces() prototype.
2016-02-12 15:42:54 -07:00
Axel Davy
cc0114f30b st/nine: Implement Managed vertex/index buffers
We were implementing those the same way than
the default pool, which is sub-optimal.

The buffer is supposed to return pointer to
a ram copy when user locks, and automatically
update the vram copy when needed.

v2: Rename NineBuffer9_Validate to NineBuffer9_Upload
Rename validate_buffers to update_managed_buffers
Initialize NineBuffer9 managed fields after the resource
is allocated. In case of allocation failure, when the dtor
is executed, This->base.pool is then rightfully set.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-12 23:26:36 +01:00
Axel Davy
77d6c11f8f st/nine: Align stack for entry points
For 32 bits, incoming stack is 4-byte aligned.
We need to realign the stack to 16-byte at some point,
or there are issues later (crash with SSE, llvm, etc).

This patch chooses to align the stack at API entry points.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-12 23:26:36 +01:00
Axel Davy
d7a5468da9 st/nine: Drop path for ureg_NRM and ureg_CLAMP
using MIN/MAX is fine instead of CLAMP.
NRM doesn't exist anymore.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
6b43f5b1d4 st/nine: Remove usage of SQRT in ff code
SQRT is not supported everywhere, so replace
it by RSQ + MUL and handle case <= 0.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-12 23:26:36 +01:00
Axel Davy
17078d92ea st/nine: Fix stateblocks crashes with lights
We had several issues of crashes with it.
This should fix it.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
6cba347530 st/nine: SCRATCH does support all formats
Add new argument to d3d9_to_pipe_format_checked to
be able to bypass format support checks. This argument
is set to TRUE when the requested Pool is SCRATCH.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
dbcb4f46ad st/nine: Add format checks to create_zs_or_rt_surface
Returns INVALIDCALL when trying to create a surface
of unsupported format.

In practice, apps are supposed to check for format
support before trying to create a render target
of that format. However some bad behaving apps
could just try to create the surface and deduce if
it failed that it wasn't supported.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
3a2e0c7784 st/nine: Support ATI1/ATI2 for CubeTexture
Texture and CubeTexture use common code,
and thus ATI1/ATI2 is already implemented
for CubeTexture.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
6c4774bbe4 st/nine: Clean pSharedHandle Texture ctors checks
Clarify the behaviour and clean the checks

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
bb65b189f3 st/nine: Move texture creation checks
We were having checks at both Create*Texture functions
and in ctors.

Move all Create*Texture checks to ctors.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
d973a525d3 st/nine: Clean useless code in texture9.c
This->base.base.resource is worth NULL
for SYSTEMMEM textures.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
36b4bb303c st/nine: Do not set SHARED flag for shared textures.
We do not support shared textures, thus no need to set
the shared flag.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Axel Davy
77a5871c1d st/nine: Do not set resource usage for SYSTEMMEM
We do not create a resource for SYSTEMMEM textures,
thus we do not need to set resource usage.

The only exception is vertexbuffer SYSTEMMEM, since
we do use a pipe resource for them.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-12 23:26:36 +01:00
Brian Paul
9675fb6c68 mesa: move _mesa_num_tex_faces() to teximage.h
So it's near the other cube map helper functions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 15:11:38 -07:00
Brian Paul
6e09df24b5 mesa: simplify some code with new _mesa_cube_face_target() function
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 15:11:38 -07:00
Brian Paul
82db969ac0 mesa: add _mesa_cube_face_target() helper
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 15:11:24 -07:00
Brian Paul
d73f5a3133 mesa: make _mesa_tex_target_to_face() an inline function
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 15:10:37 -07:00
Brian Paul
6a08673c5e mesa: remove _ARB suffix from cube map enums
Just minor clean-up so we're consistent everywhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 15:10:15 -07:00
Brian Paul
ae70d0d68c docs: Visual Studio 2013 or later is now required 2016-02-12 15:08:35 -07:00
Timothy Arceri
4e59362d1b glsl: replace _strtoui64() with strtoull() for MSVC
Now that MSVC 2013 is required we can remove this.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-13 08:57:01 +11:00
Kristian Høgsberg Kristensen
39a120aefe anv: Implement VkPipelineCache
We hash the input SPIR-V, specialization constants, entrypoint and the
shader key using SHA1 to determine a unique identifier for the
combination. A VkPipelineCache is then a hash table mapping these
identifiers to the corresponding prog_data and kernel data.
2016-02-12 11:53:49 -08:00
Chad Versace
03bea8fda7 anv/meta_blit: Remove references to clearing
Long ago, the blit code used to handle clearing and blitting.

- Fix any comments that refer to clearing.
- Rename shader var 'attr' to 'tex_pos'. The name 'attr' is an artifact
  of the time when the shader was used for blitting as well as clearing.
2016-02-12 11:29:29 -08:00
Chad Versace
97b5a07378 anv/meta_blit: Coalesce glsl_vec4_type vars
Just a refactor. No behavior change.

Several expressions have the same value: they point to
glsl_vec4_type(). Coalesce them into a single variable.
2016-02-12 11:29:29 -08:00
Jason Ekstrand
699f21216f anv/device: clflush simple batches if !LLC 2016-02-12 11:00:42 -08:00
Jason Ekstrand
42155abdd7 anv: Add a clfush_range helper function 2016-02-12 11:00:08 -08:00
Jason Ekstrand
3c8dc1afd1 nir/spirv/glsl: Clean up the row-skipping swizzle logic a bit 2016-02-12 10:40:39 -08:00
Chad Versace
37f4dfb19d anv/meta: Move blit code to anv_meta_blit.c
The clear code lived in anv_meta_clear.c. The resolve code in
anv_meta_resolve.c. Only the blit code lived in anv_meta.c, alongside
the shareed meta code.

This is just a copy-paste patch. No change in behavior.
2016-02-12 09:56:24 -08:00
Chad Versace
cf7fd53850 anv/meta: Hardcode smooth texcoord interpolation in blit shaders
Trivial cleanup. No change in behavior.

Function argument 'attr_flat', in anv_meta.c:build_nir_vertex_shader(),
was always false.
2016-02-12 09:15:58 -08:00
Jose Fonseca
950da38164 mesa: Use _aligned_malloc/free for MinGW too.
We already use these for gallium in
src/gallium/auxiliary/os/os_memory_stdc.h and it's always better to
minimize divergences between MinGW and MSVC.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-12 14:51:28 +00:00
Jose Fonseca
c69ef377c8 mesa: Remove support for MSVC2008.
Spotted by Emil Velikov.

Trivial.
2016-02-12 10:31:15 +00:00
Jose Fonseca
5bc8d34526 util/u_atomic: Remove MSVC 2008 support.
Spotted by Emil Velikov.

Trivial.
2016-02-12 10:31:15 +00:00
Topi Pohjolainen
30711d984f i965: Stop considering if msrt aux buffers need aux buffer
Auxiliary buffers are always created with sample number of zero
which effectively prevents intel_miptree_create_layout() from trying
to associate auxiliary buffers with auxiliary buffers.

Now that there is more direct path available lets start using it
instead and stop even checking for such (im)possibility.

v2 (Ben): Do not signal msaa layout with explicit argument but
          using layout_flags instead.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-12 09:17:29 +02:00
Topi Pohjolainen
422b1386d7 i965: Separate miptree creation from auxiliary buffer setup
Currently the logic allocating and setting up miptrees is closely
combined with decision making when to re-allocate buffers in
X-tiled layout and when to associate colors with auxiliary buffers.

These auxiliary buffers are in turn also represented as miptrees
and are created by the same miptree creation logic calling itself
recursively. This means considering in vain if the auxiliary buffers
should be represented in X-tiled layout or if they should be
associated with auxiliary buffers again.
While this is somewhat unnecessary, this doesn't impose any problems
currently. Miptrees for auxiliary buffers are created as simgle-sampled
fusing the consideration for multi-sampled compression auxiliary
buffers. The format in turn is such that is not applicable for
single-sampled fast clears (that would require accompaning auxiliary
buffer).
But once the driver starts to support lossless compression of color
buffers the auxiliary buffer will have a format that would itself
be applicable for lossless compression. This would be rather
difficult and ugly to detect in the current miptree creation logic,
and therefore this patch seeks to separate the association logic
from the general allocation and setup steps.

v2 (Ben):
   - Do not reconsider for X-tiling in intel_miptree_create()
     as it was just forced to Y-tiling in miptree_create().
   - Do not drop checks for allocation failures.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-02-12 09:13:07 +02:00
Topi Pohjolainen
d089f2d932 i965: Isolate aligned dimensions for stencil only
This makes the logic a little more explicit and helps to keep
subsequent patches easier to read.

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-02-12 09:13:07 +02:00
Topi Pohjolainen
0dcd9a09d1 i965: Restore vbo after color resolve during brw_try_draw_prims()
Part of brw_try_draw_prims() is a check to validate textures
(brw_validate_textures()). In case of textures that currently have
only level zero but are marked for mipmap generation, i965 driver
will decide to replace the underlying buffer with a larger one
capable of holding also the additional levels. This results into
blit from the original buffer to the newly allocated (see
intel_miptree_copy_teximage()). This blit is currently handled with
blitter engine and hence it won't effect the ongoing draw operation.
However, this blit in turn may trigger color resolve on the source
buffer. In principle, this should be possible with fast cleared
buffers but I only started hitting it when I enabled lossless
compression (that reguires similar resolve to fast cleared buffers).

Now, the color resolve is a meta operation and uses the same drawing
path we are already in middle of. After quite a bit of debugging I
realized that the resolve will modify the current vbo setup but it
won't restore it afterwards resulting in the original draw call
using wrong vertex data.
When brw_try_draw_prims() gets called, the vbo logic in the Mesa
core (see vbo_draw_arrays()) has just bound the vbo (see
vbo_bind_arrays() and recalculate_input_bindings()). Color resolve
operation will overwrite the vbo setup by calling vbo_bind_arrays()
against the resolve rectangle (see brw_draw_rectlist()). Once the
color resolve is done the vbo setup is left to the resolve rectangle
state and the original drawing call yields bogus results.

This patch aims to restore the original state after the color
resolve by calling vbo_bind_arrays() yet again after the vertex
array state in the core context have been restored.

Now having said all this, I'd also like to state that I'm quite
uncomfortable with the nested meta operations. Ths original draw
call in this case is in fact a meta operation itself. It is a blit
from level zero to level one when generating the additional mipmap
levels (see _mesa_meta_GenerateMipmap()). Imagine the complexity
if the blit in the middle from buffer to another would go to meta
path also instead of blitter.

I would very tempted to try to move all the resolves to happen
before a meta operation is started.
Additionally I still feel that work I did earlier in the spring/
summer time moving meta operations to use direct state upload
bypassing the core context would make sense.

v2: Force input recalculation by setting the flag explicitly

v3: Do not attempt to restore vbo for opengles1 which doesn't
    support vertex buffer objects.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-02-12 09:13:07 +02:00
Topi Pohjolainen
779429d063 i965: Validate textures before altering driver state
Validation may kick off copies and subsequently color resolves.
Color resolves (and the copies themselves if ending up in meta path)
will overwrite the internal driver state but are not prepared to
restore it. Instead of adding that capability the validation can be
simply performed before the state is updated.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-02-12 09:13:07 +02:00
Kenneth Graunke
76f6f59c6e i965: Make brw_clear_cache flag all the bits on both pipelines.
Setting brw->ctx.NewDriverState and brw->ctx.NewGLState affects
the dirty bits for the current pipeline.  But, we need to flag
everything dirty on *both* pipelines, so that when we switch
back, we'll realize our programs are stale and re-upload them.

To accomplish this, flag the saved state for both pipelines.
Only one of them should matter, but this way we don't have to
check which we need to set.  It's harmless to set the other.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-11 22:53:19 -08:00
Samuel Iglesias Gonsálvez
61ceb36ead glsl: Allow invariant qualifer in block members in desktop OpenGL.
Feedback from Khronos is that 'invariant' should be allowed on block
members for desktop OpenGL. Fix piglit regression added by fe1e89a0:
invariant-qualifier-in-out-block-01.vert

v2:
- Allow it for in/out blocks in OpenGL ES too, so when OES_shader_io_blocks
is supported we don't need to do any change (Timothy)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89330
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-12 07:20:47 +01:00
Jason Ekstrand
ea93041ccc anv/device: Use a normal BO in submit_simple_batch 2016-02-11 21:39:15 -08:00
Jason Ekstrand
3a2b23a447 anv: Add a vk_icdGetInstanceProcAddr entrypoint
Aparently there are some issues in symbol resolution if an application
packages its own loader and you have a system-installed one.  I don't
really understand the details, but it's not onorous to add.
2016-02-11 21:20:12 -08:00
Kenneth Graunke
e9644cb1f9 i965: Consider tessellation in get_pipeline_state_l3_weights.
I think this was just missed; Curro and I were probably writing
code simultaneously and forgot to combine them at the end.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-11 19:15:17 -08:00
Kenneth Graunke
f275c61c30 i965: Split brw_upload_texture_surfaces into compute/render atoms.
When uploading state for the compute pipeline, we don't want to
look at VS/TCS/TES/GS/FS programs, as they might be stale, and
aren't relevant anyway.  Likewise, the render pipeline shouldn't
look at CS.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-11 19:15:08 -08:00
Jason Ekstrand
25b09d1b5d anv/event: Use a 64-bit value
The immediate write from PIPE_CONTROL is 64-bits at least on BDW.  This
used to work on 64-bit archs because the compiler would align the following
anv_state struct up for us.  However, in 32-bit builds, they overlap and it
causes problems.
2016-02-11 19:00:56 -08:00
Marek Olšák
f3943614ff radeonsi: fix build with LLVM 3.6
Broken by this cleanup: 3dc1cb0cc7

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-12 00:41:36 +01:00
Jason Ekstrand
3086c5a5e1 gen8/pipeline: Properly set bits in PS_EXTRA for W, depth, and samaple mask 2016-02-11 15:22:18 -08:00
Jason Ekstrand
4016619931 nir/spirv: Allow the clip distance capability. 2016-02-11 15:14:46 -08:00
Jason Ekstrand
da4a6bbbea gen8/pipeline: Pull gs_vertex_count from prog_data 2016-02-11 15:13:54 -08:00
Jason Ekstrand
ff8895ba56 Merge remote-tracking branch 'mesa-public/master' into vulkan 2016-02-11 15:09:30 -08:00
Jason Ekstrand
9f8c01b03c i965/gs: Pass VerticesIn though prog_data
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-11 15:07:20 -08:00
Jason Ekstrand
56eb9c44ad i965/fs: Pass usage of depth, W, and sample mask through prog_data
We really need to stop pulling information directly out of shaders for
state setup.  For one thing, if we want any sort of an on-disk shader
cache, having all of this metadata in one place is going to be crucial.
Also, passing it all through prog_data cleans up the compiler <-> state
setup API substantially.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-11 15:07:20 -08:00
Jason Ekstrand
ae3543950c i965/fs: Refactor setup_payload_gen6 to assume FS
It's extremely FS specific so the fact that we have a stage check in the
middle of it is rather bogus.  While were here, we rename
setup_payload_gen4 and setup_payload_gen6 to make it obvious that they are
both FS specific.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-11 15:07:20 -08:00
Samuel Pitoiset
d759f0ddf1 nv50,nvc0: remove unused parameter in nvXX_state_validate()
This 'words' parameter is there since 2011 but it has never been used.
While we are at it, get rid of the extern declaration.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-11 23:14:16 +01:00
Timothy Arceri
b600247035 glsl: don't validate interface blocks twice
We already check for opaque types so don't recheck for atomics
and images.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-12 09:12:23 +11:00
Timothy Arceri
98d3cc9fbc glsl: remove duplicate embedded struct validation
Commit c98deb18d5 in 2010 disallowed embedded struct definitions
in ES. Then in 2013 d9bb8b7b56 disallowed it for everything but
GLSL 1.10.

Commit c98deb18d5 seemed the cleanest way to do the check so its
been extended to cover GL and the other version has been removed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-12 09:06:49 +11:00
Jose Fonseca
0d4898ae80 include,gallium: Remove pre-MSVC 2013 compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-11 21:36:00 +00:00
Jose Fonseca
a97a955b92 scons: Eliminate MSVC2008 compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-11 21:36:00 +00:00
Jose Fonseca
1cadfe08c4 configure: Eliminate MSVC2008 compatibility.
We no longer need to build any part of Mesa with Windows SDK 7.0.7600 or
MSVC 2008.  MSVC 2013 will be the oldest we support.

In practice this means people are now free to declare variables in the
middle of blocks, on the whole Mesa tree.

Care should still be taken with variable length arrays and void pointer
arithmetic.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Hella-acked-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-11 21:36:00 +00:00
Chris Forbes
a2c8b5ece5 i965: ir: dump floats as %-g rather than %f, so we can see denormals
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-11 12:10:29 -08:00
Jordan Justen
9f36070c2f i965/gen7: Require kernel cmd_parser 5 for ARB_compute_shader
The indirect dispatch registers were whitelisted in command parser
version 5. (Version 5 is available as of Linux 4.4)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-11 10:49:13 -08:00
Marek Olšák
a8aa73f768 st/mesa: release GLSL IR in LinkShader after it's not needed
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-11 17:31:40 +01:00
Marek Olšák
906ecab450 mesa: call build_program_resource_list inside Driver.LinkShader
to allow LinkShader to free the GLSL IR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-11 16:56:28 +01:00
Marek Olšák
0f235c960c st/mesa: use correct pipe functions to create tess shaders
Broken by one of my cleanups. Spotted by luck.

Radeonsi doesn't care, because all shader create callbacks go to the same
function.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-11 16:56:28 +01:00
Marek Olšák
100796c15c gallium/radeon: drop support for LLVM 3.5
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

v2: adjust the comment in the amdgpu winsys
2016-02-11 16:48:30 +01:00
Marek Olšák
3dc1cb0cc7 radeonsi: obtain commonly used LLVM types only once
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-11 16:48:30 +01:00
Marek Olšák
1643dca513 radeonsi: cleanup shader codegen
si_shader_ctx -> ctx
type * ptr -> type *ptr
si_shader_context *shader -> si_shader_context *ctx

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-11 16:48:30 +01:00
Marek Olšák
1c8a1a8fed radeonsi: fix a crash when binding a sampler buffer
Buffers don't contain r600_texture.

Broken by 7aedbbacae:
"radeonsi: put image, fmask, and sampler descriptors into one array"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94091
2016-02-11 16:48:30 +01:00
Kristian Høgsberg Kristensen
2009e304f7 anv/pack: Handle case where a struct field covers multiple dwords
We also didn't add start to field.end to get the absolute field end
position.
2016-02-10 22:36:38 -08:00
Emil Velikov
0f3cea95ab docs: add news item and link release notes for 11.1.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-02-11 01:47:16 +00:00
Emil Velikov
0802afd92d docs: add sha256 checksums for 11.1.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e49dd21bcb)
2016-02-11 01:45:27 +00:00
Emil Velikov
323782aa57 docs: add release notes for 11.1.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 7bcd827806)
2016-02-11 01:45:25 +00:00
Jason Ekstrand
f710f3ca37 Merge remote-tracking branch 'mesa-public/master' into vulkan
This also reverts commit 1d65abfa58 because
now NIR handles texture offsets in a much more sane way.
2016-02-10 17:12:11 -08:00
Jason Ekstrand
7ef3e47c27 Merge commit '85f5c18fef1ff2f19d698f150e23a02acd6f59b9' into vulkan 2016-02-10 17:09:56 -08:00
Kristian Høgsberg Kristensen
d2623a3247 anv: Handle dwords that are all MBZ correctly
A few packets have dwords in them that are all MBZ and we failed to
write those. This change makes sure we iterate through all dwords and
write them all.
2016-02-10 16:36:47 -08:00
Jason Ekstrand
8750299a42 nir: Remove the const_offset from nir_tex_instr
When NIR was originally drafted, there was no easy way to determine if
something was constant or not.  The result was that we had lots of
special-casing for constant values such as this.  Now that load_const
instructions are SSA-only, it's really easy to find constants and this
isn't really needed anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-02-10 16:33:50 -08:00
Jason Ekstrand
70dff4a55e nir/lower_vec_to_movs: Better report channels handled by insert_mov
This fixes two issues.  First, we had a use-after-free in the case where
the instruction got deleted and we tried to return mov->dest.write_mask.
Second, in the case where we are doing a self-mov of a register, we delete
those channels that are moved to themselves from the write-mask.  This
means that those channels aren't reported as being handled even though they
are.  We now stash off the write-mask before remove unneeded channels so
that they still get reported as handled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-02-10 16:33:14 -08:00
Kristian Høgsberg Kristensen
09bb7ea4b7 anv: Fix out-of-tree build
We need to be able to find the generated nir_opcodes.h header.
2016-02-10 15:54:28 -08:00
Kristian Høgsberg Kristensen
9cc939d82f nir: Fix out-of-tree build for spirv2nir
This needs to be able to find the generated nir_opcodes.h header.
2016-02-10 15:54:28 -08:00
Jason Ekstrand
9be5a4bc29 nir/spirv: Fix handling of OpGroupMemberDecorate
We were pulling the member index from the wrong dword
2016-02-10 15:36:42 -08:00
Jason Ekstrand
ac04c6de2c nir/spirv: Assert that struct member ids are in-bounds 2016-02-10 15:36:41 -08:00
Marek Olšák
6ee1c386fe radeonsi: don't emit unnecessary NULL exports for unbound targets (v3)
v2: remove semantic index == 0 checks
    add the else statement to remove shadowing of args
v3: fix fbo-alphatest-nocolor regression

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
2016-02-10 23:53:17 +01:00
Mark Janes
8179834030 nir/spirv: fix build_mat_subdet stack smasher
The sub-determinate implementation pattern fixed by
6a7e2904e0 has a second instance in the
same file.

With the previous algorithm, when row and j are both 3, the index
overruns the array.  This only impacts the stack on 32 bit builds.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-10 14:43:03 -08:00
Kristian Høgsberg Kristensen
51c01e292c anv: Generate pack headers from XML definition
This huge commit switches us over to using a simple xml format (genxml)
for defining our command streamer commands and a python script for
generating the pack headers we use in the driver.
2016-02-10 14:31:26 -08:00
Ben Widawsky
088280e022 i965: Make sure we blit a full compressed block
This fixes an assertion failure in [at least] one of the Unreal Engine Linux
demo/games that uses DXT1 compression. Specifically, the "Vehicle Game".

At some point, the game ends up trying to blit mip level whose size is 2x2,
which is smaller than a DXT1 block. As a result, the assertion in the blit path
is triggered. It should be safe to simply make sure we align the width and
height, which is sadly an example of compression being less efficient.

NOTE: The demo seems to work fine without the assert, and therefore release
builds of mesa wouldn't stumble over this. Perhaps there is some unnoticeable
corruption, but I had trouble spotting it.

Thanks to Jason for looking at my backtrace and figuring out what was going on.

v2: Use NPOT alignment to make sure ASTC is handled properly (Ilia)
Remove comment about how this doesn't fix other bugs, because it does.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93358
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-10 14:08:46 -08:00
Marek Olšák
79d0082c64 radeon/uvd: silence a warning 2016-02-10 20:16:17 +01:00
Marek Olšák
d9c8a8fe61 r300g: silence warnings 2016-02-10 20:16:17 +01:00
Ian Romanick
0ecc9d907e meta/decompress: Don't pollute the renderbuffer namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Fixes piglit 'object-namespace-pollution glGetTexImage-compressed
renderbuffer' test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:55 -08:00
Ian Romanick
3aeff21fbf meta: Use internal functions for renderbuffer access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:53 -08:00
Ian Romanick
4087c17832 meta/decompress: Track renderbuffer using gl_renderbuffer instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:50 -08:00
Ian Romanick
47a5aa4bfa i965/meta: Don't pollute the renderbuffer namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:47 -08:00
Ian Romanick
03506c9ef1 i965/meta: Use internal functions for renderbuffer access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:44 -08:00
Ian Romanick
4c6b0e017c i965/meta: Return struct gl_renderbuffer* from brw_get_rb_for_slice instead of GL API handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:42 -08:00
Ian Romanick
ab2b631703 meta: Don't save or restore the renderbuffer binding
Nothing left in meta does anything with the RBO binding, so we don't
need to save or restore it.  The FBO binding is still modified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:40 -08:00
Ian Romanick
e273bbd60b meta: Use _mesa_CreateRenderbuffers instead of _mesa_GenRenderbuffers and _mesa_BindRenderbuffer
This has the advantage that it does not pollute the global binding
state.  It also enables later patches that will stop calling
_mesa_GenRenderbuffers / _mesa_CreateRenderbuffers which pollute the
renderbuffer namespace.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:36 -08:00
Ian Romanick
1e055e9211 i965/meta: Use _mesa_CreateRenderbuffers instead of _mesa_GenRenderbuffers and _mesa_BindRenderbuffer
This has the advantage that it does not pollute the global binding
state.  It also enables later patches that will stop calling
_mesa_GenRenderbuffers / _mesa_CreateRenderbuffers which pollute the
renderbuffer namespace.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:33 -08:00
Ian Romanick
eb5bc62e97 mesa: Refactor renderbuffer_storage to make _mesa_renderbuffer_storage
Pulls the parts of renderbuffer_storage that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:31 -08:00
Ian Romanick
9ae42ab1ec mesa: Refactor _mesa_framebuffer_renderbuffer
This function previously was only used in fbobject.c and contained a
bunch of API validation.  Split the function into
framebuffer_renderbuffer that is static and contains the validation, and
_mesa_framebuffer_renderbuffer that is suitable for calling from
elsewhere in Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-02-10 10:59:28 -08:00
Marek Olšák
7aedbbacae radeonsi: put image, fmask, and sampler descriptors into one array
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state

By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.

This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line

v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
    at the same time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-10 19:41:49 +01:00
Marek Olšák
796ee76e2e winsys/radeon: fix the num_tile_pipes comment to silence warnings 2016-02-10 19:41:49 +01:00
Alexandre Demers
111602e159 winsys/radeon: better explain the num_tile_pipes fixup for TAHITI (v2)
v2: Clarify the relation between num_tiles_pipes and GB_TILE_MODE and the fix
 needed for Tahiti as suggested by Marek.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-02-10 19:29:41 +01:00
Samuel Pitoiset
5e8db898fd st/mesa: check ureg_create() retval in create_pbo_upload_vs()
This avoids a possible NULL dereference because ureg_create() might
return a NULL pointer.

Spotted by coverity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-10 18:26:20 +01:00
Bernhard Rosenkränzer
e86ba7844f freedreno/ir3: Get rid of nested functions
This allows building Freedreno with clang

Signed-off-by: Bernhard Rosenkränzer <bero@linaro.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-10 11:26:48 -05:00
Chris Forbes
43d23e879c i965/blorp: Fix hiz ops on MSAA surfaces
Two things were broken here:
- The depth/stencil surface dimensions were broken for MSAA.
- Sample count was programmed incorrectly.

Result was the depth resolve didn't work correctly on MSAA surfaces, and
so sampling the surface later produced garbage.

Fixes the new piglit test arb_texture_multisample-sample-depth, and
various artifacts in 'tesseract' with msaa=4 glineardepth=0.

Fixes freedesktop bug #76396.

Not observed any piglit regressions on Haswell.

v2: Just set brw_hiz_op_params::dst.num_samples rather than adding a
    helper function (Ken).

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>

v3: moved the alignment needed for hiz+msaa to brw_blorp.cpp, as
    suggested by Chad Versace (Alejandro Piñeiro on behalf of Chris
    Forbes)

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>

Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-02-10 09:00:05 +01:00
Topi Pohjolainen
878b2b8964 i965/gen8: Remove dead assertion
The assertion is inside a condition mandating num_samples > 1 and
therefore the first half of the constraint is always met. The
second half in turn would only be applicable for single sampled
case and moreover it is trying to falsely check against surface
type instead of format.
Subsequent patches will introduce proper support for the lossless
compression and dropping this here makes the patches a little
simpler.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-10 09:11:34 +02:00
Topi Pohjolainen
3c432d48bf i965: Use constant pointer when checking for compression
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-10 09:10:45 +02:00
Brian Paul
85fab1f09a mesa: fix trivial comment typo in dlist.c 2016-02-09 20:09:30 -07:00
Kenneth Graunke
85f5c18fef i965/vec4: Drop support for ATTR as an instruction destination.
This is no longer necessary...and it doesn't make much sense to
have inputs as destinations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-02-09 17:01:45 -08:00
Kenneth Graunke
67c5d00273 i965/vec4/gs: Stop munging the ATTR containing gl_PointSize.
gl_PointSize is delivered in the .w component of the VUE header, while
the language expects it to be a float (and thus in the .x component).

Previously, we emitted MOVs to copy it over to the .x component.
But this is silly - we can just use a .wwww swizzle and access it
without copying anything or clobbering the value stored at .x
(which admittedly is useless).

Removes the last use of ATTR destinations.

v2: Use BRW_SWIZZLE_WWWW, not SWIZZLE_WWWW (caught by GCC).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-02-09 17:01:45 -08:00
Kenneth Graunke
d56ae2d160 i965: Apply VS attribute workarounds in NIR.
This patch re-implements the pre-Haswell VS attribute workarounds.
Instead of emitting shader code in the vec4 backend, we now simply
call a NIR pass to emit the necessary code.

This simplifies the vec4 backend.  Beyond deleting code, it removes
the primary use of ATTR as a destination.  It also eliminates the
requirement that the vec4 VS backend express the ATTR file in terms
of VERT_ATTRIB_* locations, giving us a bit more flexibility.

This approach is a little different: rather than munging the attributes
at the top, we emit code to fix them up when they're accessed.  However,
we run the optimizer afterwards, so CSE should eliminate the redundant
math.  It may even be able to fuse it with other calculations based on
the input value.

shader-db does not handle non-default NOS settings, so I have no
statistics about this patch.

Note that the scalar backend does not implement VS attribute
workarounds, as they are unnecessary on hardware which allows SIMD8 VS.

v2: Do one multiply for FIXED rescaling and select components from
    either the original or scaled copy, rather than multiplying each
    component separately (suggested by Matt Turner).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-02-09 17:01:45 -08:00
Jason Ekstrand
09b3e30dc6 anv: Fix up spirv for new texture/sampler split stuff 2016-02-09 16:48:36 -08:00
Brian Paul
cac54d7987 st/mesa: clarify some texture target code in st_cb_drawpix.c
Use st->internal_target instead of PIPE_TEXTURE_2D when choosing the
texture format.  Probably no real difference, but let's be consistent.

Simplify a test when determining whether we need normalized texcoords.

Add a new assertion.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-09 17:48:26 -07:00
Brian Paul
5e4de781fa st/mesa: fix bitmap texture target code and simplify tex sampler state
Bitmaps may be drawn with a PIPE_TEXTURE_2D or PIPE_TEXTURE_RECT resource
as determined at context creation by checking if PIPE_CAP_NPOT_TEXTURES is
supported.  But many places in the bitmap code were hard-coded to use
PIPE_TEXTURE_2D.  Use st->internal_target instead.

I think an older NV chip is the only case where a gallium driver does not
support NPOT textures.  Bitmap drawing was probably broken for that GPU.

Also, we only need one sampler state with texcoord normalization set up
according to st->internal_target.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-09 17:48:25 -07:00
Brian Paul
9e2a9d5743 st/mesa: use MAX3() macro, as we do for sampler view code below
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-09 17:48:25 -07:00
Brian Paul
a5b8ede253 st/mesa: move some st_cb_drawpixels.c code, add comments 2016-02-09 17:47:42 -07:00
Jason Ekstrand
b14f4c1fd3 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in the separate texture/sampler stuff from upstream
2016-02-09 16:47:37 -08:00
Jason Ekstrand
e01dd59b73 vtn: Use const_index helpers 2016-02-09 16:32:38 -08:00
Jason Ekstrand
e15f7551d1 anv/apply_pipeline_layout: Use the new const_index helpers 2016-02-09 16:32:38 -08:00
Jason Ekstrand
768bd7f272 Merge commit '8b0fb1c152fe191768953aa8c77b89034a377f83' into vulkan
This pulls in Rob Clark's const_index changes for NIR
2016-02-09 15:30:39 -08:00
Nanley Chery
c624241ef4 mesa/readpix: Dedent former _mesa_readpixels() if block
Formatting patch split out for easy reviewing.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-09 15:13:07 -08:00
Nanley Chery
b89a8a15c2 mesa/readpix: Don't clip in _mesa_readpixels()
The clipping is performed higher up in the call-chain.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-09 15:13:07 -08:00
Nanley Chery
605832736a mesa/readpix: Clip ReadPixels() area to the ReadBuffer's
The fast path for Intel's ReadPixels() unintentionally omits clipping
the specified area to a valid one. Rather than clip in various
corner-cases, perform this operation in the API validation stage.

The bug in intel_readpixels_tiled_memcpy() showed itself when the winsys
ReadBuffer's height was smaller than the one specified by ReadPixels().
yoffset became negative, which was an invalid input for tiled_to_linear().

v2: Move clipping to validation stage (Jason)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Marta Löfstedt <marta.lofstedt@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-09 15:13:07 -08:00
Nanley Chery
55d56d34e0 mesa/image: Make _mesa_clip_readpixels() work with renderbuffers
v2: Use gl_renderbuffer::{Width,Height} (Jason)

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-09 15:13:07 -08:00
Jason Ekstrand
d03e5d5255 i965/vec4: Plumb separate surfaces and samplers through from NIR
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
f88027f7bd i965/vec4: Separate the sampler from the surface in generate_tex
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
b8ab9c8c86 i965/fs: Plumb separate surfaces and samplers through from NIR
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
c0c14de130 i965/fs: Separate the sampler from the surface in generate_tex
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
a37b8110c1 i965/fs: Add an enum for keeping track of texture instruciton sources
These logical texture instructions can have a *lot* of sources.  It's much
safer if we have symbolic names for them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
5ec456375e nir: Separate texture from sampler in nir_tex_instr
This commit adds the capability to NIR to support separate textures and
samplers.  As it currently stands, glsl_to_nir only sets the texture deref
and leaves the sampler deref alone as it did before and nir_lower_samplers
assumes this.  Backends can still assume that they are combined and only
look at only at the texture index.  Or, if they wish, they can assume that
they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir
all set both texture and sampler index whenever a sampler is required (the
two indices are the same in this case).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
ee85014b90 nir/tex_instr: Rename sampler to texture
We're about to separate the two concepts.  When we do, the sampler will
become optional.  Doing a rename first makes the separation a bit more
safe because drivers that depend on GLSL or TGSI behaviour will be fine to
just use the texture index all the time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand
3f42184994 nir: Add some braces around loops and ifs 2016-02-09 15:00:17 -08:00
Kenneth Graunke
830b075e86 i965: Explicitly write the "TR DS Cache Disable" bit at TCS EOT.
Bit 0 of the Patch Header is "TR DS Cache Disable".  Setting that bit
disables the DS Cache for tessellator-output topologies resulting in
stitch-transition regions (but leaves it enabled for other cases).

We probably shouldn't leave this to chance - the URB could contain
garbage - which could result in the cache randomly being turned on
or off.

This patch makes the final EOT write 0 to the first DWord (which
only contains this one bit).  This ensures the cache is always on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-09 14:54:26 -08:00
Rob Clark
8b0fb1c152 freedreno/ir3: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-09 17:30:33 -05:00
Rob Clark
ced8d3e773 nir: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark
6921762de6 ptn: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark
ead05e8670 ttn: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark
b1770235ed ttn: small logic cleanup
The only case where dim!=NULL is where op==load_ubo.  But using
op==load_ubo is less confusing.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-09 17:30:33 -05:00
Rob Clark
b6cf98bc82 gtn: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark
1df3ecc1b8 nir: const_index helpers
Direct access to intr->const_index[n], where different slots have
different meanings, is somewhat confusing.

Instead, let's put some extra info in nir_intrinsic_infos[] about which
slots map to what, and add some get/set helpers.  The helpers validate
that the field being accessed (base/writemask/etc) is applicable for the
intrinsic opc, for some extra safety.  And nir_print can use this to
dump out decoded const_index fields.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Chad Versace
4c5dcccfba anv/image: Fix usage for depthstencil images
The tests assertion-failed in vkCmdClearDepthStencilImage because the
isl surface lacked ISL_SURF_USAGE_DEPTH_BIT.

Fixes: https://gitlab.khronos.org/vulkan/mesa/issues/26
Fixes: dEQP-VK.pipeline.timestamp.transfer_tests.host_stage_with_clear_depth_stencil_image_method
Fixes: dEQP-VK.pipeline.timestamp.transfer_tests.transfer_stage_with_clear_depth_stencil_image_method
2016-02-09 12:54:30 -08:00
Chad Versace
c5e521f391 anv/image: Refactor choose_isl_surf_usage()
- Rename local var isl_flags -> isl_usage.
- Fix comment.
2016-02-09 12:54:30 -08:00
Chad Versace
2f4bb00c2b anv/image: Fix choose_isl_surf_usage()
Don't translate VkImageCreateInfo::usage into an isl_surf_usage bitmask.
Instead, translate anv_image::usage, which is a superset of
VkImageCreateInfo::usage.

For-Issue: https://gitlab.khronos.org/vulkan/mesa/issues/26
2016-02-09 12:54:04 -08:00
Kenneth Graunke
8b0f6de73d glsl: Disallow transform feedback varyings with compute shaders.
If the only stage is MESA_SHADER_COMPUTE, we should complain that
there's nothing coming out of the geometry shader stage just as
we would if the first stage were MESA_SHADER_FRAGMENT.

Also, it's valid for tessellation shaders to be the stage producing
transform feedback varyings, so mention those in the compiler error.

Found by inspection.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-09 12:34:11 -08:00
Marek Olšák
329181ae33 radeonsi: enable denorms for 64-bit and 16-bit floats
This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
17fe3fa312 gallium: pass the robust buffer access context flag to drivers
radeonsi will not do bounds checking for loads if this is not set.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
d611fce23d gallium/radeon: add a function for adding llvm function attributes
This will be used for setting the new InitialPSInputAddr attribute.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
de2e28366a radeonsi: compile geometry shaders immediately
they have only 1 variant

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
f7a8b6fff5 radeonsi: split out code for deleting si_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
e21142087c radeonsi: move code writing tess factors into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
dc5fc3c2f6 radeonsi: make LLVM IR dumping less messy
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
c1041366db radeonsi: move a few r600_can_dump_shader calls to where they're needed
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
b6d5666fbf radeonsi: remove useless code that handles dx10_clamp_mode
"enable-no-nans-fp-math" is a wrong string and there was a disagreement
about fixing it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
57271d5364 radeonsi: dump SPI_PS_INPUT values along with shader stats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
5a53628f45 radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
9483fcc7f2 radeonsi: don't force gl_SampleMaskIn to 1 for smoothing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
c379c2540b radeonsi: split PS input interpolation code into its own function
This will be used by the fragment shader prolog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
b9126dcda8 radeonsi: implement forcing per-sample_interpolation using the shader key only
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.

This also removes the spi_ps_input states and moves the registers
to the PS state.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
4596f3c1b8 radeonsi: remove si_shader::ps_input_interpolate
tgsi_shader_info has this too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
6dda2455c8 radeonsi: move BCOLOR PS input locations after all other inputs
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).

Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.

This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
606e4185f3 radeonsi: move SPI_PS_INPUT_CNTL value computation to a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
90cbbe1c12 radeonsi: generate a color_two_side variant only if the shader reads colors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
4bbbaaf191 radeonsi: move si_shader_context initialization into a separate function
This will be re-used later.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
a3e9a5f9f8 st/mesa: remove st_is_program_native
The default scenario sets GL_TRUE too.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-09 21:19:51 +01:00
Marek Olšák
7046c588eb st/mesa: unify destroy_program_variants cases for TCS, TES, GS
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-09 21:19:50 +01:00
Marek Olšák
75be3ee9f9 st/mesa: unify get_variant functions for TCS, TES, GS
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-09 21:19:50 +01:00
Marek Olšák
b8d31fdedf st/mesa: unify variants and delete functions for TCS, TES, GS
no difference between those

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-09 21:19:50 +01:00
Chad Versace
bdab29a312 isl: Add more assertions to isl_surf_get_depth_format()
R16_UNORM and R32_FLOAT are illegal formats for interleaved depthstencil
surfaces.
2016-02-09 11:40:08 -08:00
Jason Ekstrand
1d65abfa58 nir/spirv: Better handle constant offsets in texture lookups 2016-02-09 10:29:05 -08:00
Jason Ekstrand
209820739b nir/spirv: Set the vtn_mode and interface type for sampler parameters 2016-02-09 10:29:05 -08:00
Jason Ekstrand
de6c9c5f2e nir/inline_functions: Don't shadown variables when it isn't needed
Previously, in order to get things working, we just always shadowed
variables.  Now, we rewrite derefs whenever it's safe to do so and only
shadow if we have an in or out variable that we write or read to
respectively.
2016-02-09 10:29:05 -08:00
Jason Ekstrand
b6c00bfb03 nir: Rework function parameters 2016-02-09 10:29:05 -08:00
Jason Ekstrand
a485567d3a anv/WSI/X11: Use the right allocator for freeing swapchains 2016-02-09 10:29:05 -08:00
Brian Paul
fe14110f35 mesa: fix incorrect viewport position when GL_CLIP_ORIGIN = GL_LOWER_LEFT
Ilia Mirkin found/fixed the mistake.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93813
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-09 11:27:48 -07:00
Brian Paul
0193e20df5 mesa: rewrite save_CallLists() code
When glCallLists() is compiled into a display list, preserve the call
as a single glCallLists rather than 'n' glCallList calls.  This will
matter for an upcoming display list optimization project.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-09 11:27:48 -07:00
Brian Paul
711d5347cf mesa: add missing error check in _mesa_CallLists()
Generate GL_INVALID_VALUE if n < 0.  Return early if n==0 or lists==NULL.

v2: fix formatting, also check for lists==NULL.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-09 11:27:48 -07:00
Brian Paul
b1ddc03633 mesa: whitespace clean-ups in dlist.h
And remove 'extern' qualifiers.
2016-02-09 11:27:48 -07:00
Brian Paul
7d18faf8e7 st/mesa: don't allocate bitmap drawing state until needed
Most apps don't use glBitmap so don't allocate the bitmap cache or
gallium state objects/shaders/etc until the first call to st_Bitmap().

v2: simplify a conditional, per Gustaw Smolarczyk.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 11:27:48 -07:00
Brian Paul
a5799de3dc st/mesa: move the setup_bitmap_vertex_data() code into draw_bitmap_quad()
Now all the code to setup the vertex data and draw it is in one place.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 11:27:48 -07:00
Brian Paul
130d34ce65 st/mesa: refactor some bitmap drawing code
Move setup/restoration of rendering state into helper functions.
This makes the draw_bitmap_quad() function much more concise.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 11:27:47 -07:00
Chad Versace
e6d3432c81 anv: Replace anv_format::depth_format with ::has_depth
isl now understands depth formats. We no longer need depth formats in
the anv_format table.
2016-02-09 10:02:50 -08:00
Chad Versace
0a93067993 isl: Add func isl_surf_get_depth_format()
For depth surfaces, it gets the value for
3DSTATE_DEPTH_BUFFER.SurfaceFormat.
2016-02-09 10:02:50 -08:00
Chad Versace
4d037b551e anv: Rename anv_format::surface_format -> isl_format
Because that's what it is, an isl format.
2016-02-09 10:02:50 -08:00
Ilia Mirkin
922be4eab9 mesa: remove hack to fix up GL_ANY_SAMPLES_PASSED results
Both st/mesa and i965 should return a true/false result now, and the
only other driver implementing queries (radeon) doesn't support
ARB_occlusion_query2 which added that pname.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 11:59:35 -05:00
Ilia Mirkin
7aca4bb9b1 st/mesa: make use of the occlusion predicate query
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-09 11:59:35 -05:00
Ilia Mirkin
50235ab3ab nv50: add PIPE_QUERY_OCCLUSION_PREDICATE support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-02-09 11:59:35 -05:00
Ilia Mirkin
0cb1dda36e nv30: add PIPE_QUERY_OCCLUSION_PREDICATE support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-09 11:59:35 -05:00
Ilia Mirkin
0d04ec2fd2 ilo: add PIPE_QUERY_OCCLUSION_PREDICATE support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2016-02-09 11:59:27 -05:00
Nicolai Hähnle
c260175677 draw: use util_pstipple_* function for stipple pattern textures and samplers
This reduces code duplication.

Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-09 10:01:57 -05:00
Nicolai Hähnle
452e51bf1e draw: use util_pstipple_create_fragment_shader
This reduces code duplication. It also adds support for drivers where the
fragment position is a system value.

Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-02-09 10:01:32 -05:00
Marek Olšák
83b4d701c0 winsys/radeon: fix a wrong NUM_TILE_PIPES value from the kernel
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94019

Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-09 15:26:40 +01:00
Timothy Arceri
1aae5e8ced nir: remove unused nir_variable fields
These are used in GLSL IR to removed unused varyings and match
transform feedback variables. There is no need to use these in NIR.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:49:06 +11:00
Timothy Arceri
6235b69134 glsl: remove unrequired forward declaration
This was added in 2548092ad8 although I don't see why as it
was already in the linker.h header.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:48:55 +11:00
Timothy Arceri
9dd6a4ea79 glsl: clean up and fix bug in varying linking rules
The existing code was very hard to follow and has been the source
of at least 3 bugs in the past year.

The existing code also has a bug for SSO where if we have a
multi-stage SSO for example a tes -> gs program, if we try to use
transform feedback with gs the existing code would look for the
transform feedback varyings in the tes stage and fail as it can't
find them.

V2: Add more code comments, always try to remove unused inputs
to the first stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:44:22 +11:00
Timothy Arceri
fd0b89ad8d glsl: simplify ES Vertex/Fragment shader requirements
We really just needed to skip the existing ES < 3.1 check if we have
a compute shader, all other scenarios are already covered.

* No shaders is a link error.
* Geom or Tess without Vertex is a link error which means we always
  require a Vertex shader and hence a Fragment shader.
* Finally a Compute shader linked with any other stage is a link error.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:44:15 +11:00
Timothy Arceri
55fa3c44bc glsl: simplify required stages for linking rules
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:44:11 +11:00
Timothy Arceri
20823992b4 glsl: small tidy up now that link_shaders() exits early with 0 shaders
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:44:07 +11:00
Timothy Arceri
76cfb47207 glsl: don't attempt to link empty program
Previously an empty program would go through the entire
link_shaders() function and we would have to be careful
not to cause a segfault.

In core profile also now set link_status to false by
generating an error, it was previously set to true.

From Section 7.3 (PROGRAM OBJECTS) of the OpenGL 4.5 spec:

   "Linking can fail for a variety of reasons as specified in the
   OpenGL Shading Language Specification, as well as any of the
   following reasons:

    - No shader objects are attached to program."

V2: Only generate an error in core profile and add spec quote (Ian)

V3: generate error in ES too, remove previous check which was only
applying the rule to GL 4.5/ES 3.1 and above. My understand is that
this spec change is clarifying previously undefined behaviour and
therefore should be applied retrospectively. The ES CTS tests for
this are in ES 2 I suspect it was passing because it would have
generated an error for not having both a vertex and fragment shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:44:02 +11:00
Matt Turner
371c4b3c48 nir: Recognize open-coded bitfield_reverse.
Helps 11 shaders in UnrealEngine4 demos.

I seriously hope they would have given us bitfieldReverse() if we
exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
anyway?).

instructions in affected programs: 4875 -> 4633 (-4.96%)
cycles in affected programs: 270516 -> 244516 (-9.61%)

I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
handling of this. We'd actually like to match, e.g., step2 by matching
step1 once and then doing a pointer comparison for the second instance
of step1, but unfortunately we generate an enormous tuple for instead.

The .text size increases by 6.5% and the .data by 17.5%.

   text     data  bss    dec    hex  filename
  22957    45224    0  68181  10a55  nir_libnir_la-nir_opt_algebraic.o
  24461    53160    0  77621  12f35  nir_libnir_la-nir_opt_algebraic.o

I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
in a GL 4.0 context once we expose GL 4.0.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 21:20:58 -08:00
Matt Turner
2d0d9755da nir: Handle large unsigned values in opt_algebraic.
The next patch adds an algebraic rule that uses the constant 0xff00ff00.

Without this change, the build fails with

   return hex(struct.unpack('I', struct.pack('i', self.value))[0])
   struct.error: 'i' format requires -2147483648 <= number <= 2147483647

The hex() function handles integers of any size, and assigning a
negative value to an unsigned does what we want in C. The pack/unpack is
unnecessary (and as we see, buggy).

Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
2016-02-08 20:38:17 -08:00
Matt Turner
7be8d07732 nir: Do opt_algebraic in reverse order.
Walking the SSA definitions in order means that we consider the smallest
algebraic optimizations before larger optimizations. So if a smaller
rule is part of a larger rule, the smaller one will happen first,
preventing the larger one from happening.

instructions in affected programs: 32721 -> 32611 (-0.34%)
helped: 106

In programs whose nir_optimize loop count changes (129 of them):

   before:  1164 optimization loops
   after:   1071 optimization loops

Of the 129 affected, 16 programs' optimization loop counts increased.

Prevents regressions and annoyances in the next commits.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner
a8f0960816 nir: Recognize product of open-coded pow()s.
Prevents regressions in the next commit.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner
9f02e3ab03 nir: Add opt_algebraic rules for xor with zero.
instructions in affected programs: 668 -> 664 (-0.60%)
helped: 4

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Timothy Arceri
3fd4280759 glsl: validate arrays of arrays on empty type delclarations
Fixes:
dEQP-GLES31.functional.shaders.arrays_of_arrays.invalid.empty_declaration_without_var_name_fragment
dEQP-GLES31.functional.shaders.arrays_of_arrays.invalid.empty_declaration_without_var_name_vertex

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-09 13:52:52 +11:00
Kenneth Graunke
74f956c416 i965: Use nir_lower_load_const_to_scalar().
I don't know why, but we never hooked up this pass Eric wrote.
Otherwise, you can end up with stupid scalarized code such as:

   vec4 ssa_7 = load_const (0.0, 0.0, 0.0, 0.0)
   vec4 ssa_8 = ...
   vec1 ssa_9 = feq ssa_8, ssa_7
   vec1 ssa_10 = feq ssa_8.y, ssa_7.y
   vec1 ssa_11 = feq ssa_8, ssa_7.z
   vec1 ssa_12 = feq ssa_8.y, ssa_7.w

ssa_8.xyxy == <0, 0, 0, 0> should only take two feq instructions.

shader-db on Skylake:

total instructions in shared programs: 9121153 -> 9120749 (-0.00%)
instructions in affected programs: 32421 -> 32017 (-1.25%)
helped: 277
HURT: 69

total cycles in shared programs: 69003364 -> 69000912 (-0.00%)
cycles in affected programs: 899186 -> 896734 (-0.27%)
helped: 313
HURT: 403

This also prevents regressions when disabling channel expressions.

v2: Don't call opt_cse afterwards (requested by Matt).  It should
    happen in the optimization loop below anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-08 18:10:34 -08:00
Timothy Arceri
184afd8fd9 mesa: remove now unused sampler index handing code
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-09 12:03:02 +11:00
Timothy Arceri
edc108765e mesa: compute sampler index in ir_to_mesa rather than using UniformHash
The aim of this is to work towards removing UniformHash from the program
struct so that we don't need to hold onto it in memory and pass it around
outside the linker.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-02-09 12:02:58 +11:00
Kenneth Graunke
d0e1d6b7e2 i965: Don't add barrier deps for FB write messages.
There are never render target reads, so there are no scheduling hazards.

Giving the extra flexibility to the scheduler makes it possible to do
FB writes as soon as their sources are available, reducing register
pressure.  It also makes it possible to do the payload setup for more
than one FB write message at a time, which could better hide latency.

shader-db results on Skylake:

total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
instructions in affected programs: 2898 -> 2855 (-1.48%)
helped: 3
HURT:   0
LOST:   0
GAINED: 1

A reduction in instruction counts is surprising, but legitimate:
the three shaders helped were spilling, and reducing register
pressure allowed us to issue fewer spills/fills.

total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
cycles in affected programs: 4412402 -> 4306114 (-2.41%)
helped: 4457
HURT: 213

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-02-08 16:59:35 -08:00
Dave Airlie
6502b3f60e st/mesa: enable AoA for gallium drivers reporting GLSL 1.30
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-09 10:52:09 +10:00
Dave Airlie
b74e8c89a6 st/mesa: add atomic AoA support
reuse the sampler deref handling code to do the same
thing for atomics.

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-09 10:52:09 +10:00
Dave Airlie
90bbe3d781 mesa: drop unused nonconst sampler functions.
Since we fixed the glsl->tgsi conversion we no longer need
this function.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-09 10:52:08 +10:00
Dave Airlie
bb8bbe34e3 st/mesa: handle indirect samplers in arrays/structs properly (v4.1)
The state tracker never handled this properly, and it finally
annoyed me for the second time so I decided to fix it properly.

This is inspired by the NIR sampler lowering code and I only realised
NIR seems to do its deref ordering different to GLSL at the last
minute, once I got that things got much easier.

it fixes a bunch of tests in
tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/

v2: fix AoA tests when forced on.
I was right I didn't need all that code, fixing the AoA code
meant cleaning up a chunk of code I didn't like in the array
handling.

v3: start generalising the code a bit more for atomics.
v3.1: use UniformRemapTable

v4: handle uniforms differently using the param_index,
and go back to UniformStorage
fix issues identified by Timothy with deref handling.
v4.1: squash const fix and move handling 1D const out
of recursive function.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-09 10:52:08 +10:00
Dave Airlie
52801766a0 glsl/ir: add param index to variable.
We have a requirement to store the index into the mesa parameterlist
for uniforms. Up until now we've overwritten var->data.location with
this info. However this then stops us accessing UniformStorage,
which is needed to do proper dereferencing.

Add a new variable to ir_variable to store this value in, and change
the two uses to use it correctly.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-09 10:52:08 +10:00
Francisco Jerez
53739fddc6 i965: Rename define for the PIPE_CONTROL DC flush bit.
Its previous name was somewhat misleading, this really behaves like a
RW cache flush rather than an invalidation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-08 15:48:00 -08:00
Francisco Jerez
10d84ba9f0 i965: Invalidate state cache before L3 partitioning set-up.
The state cache is also L3-backed so it seems sensible to make sure
it's clean as we do for other RO caches before repartitioning the L3.
This wasn't part of my original L3 partitioning code because I was
able to reproduce hangs on Gen7 hardware when the state cache
invalidation happened asynchronously with previous 3D rendering, which
should no longer be possible after the previous change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-08 15:47:21 -08:00
Francisco Jerez
0aa4f99f56 i965: Fix cache pollution race during L3 partitioning set-up.
We need to split the stalling flush from the RO cache invalidation
into a different PIPE_CONTROL command to make sure that the top of the
pipe invalidation happens after any previous rendering is complete.
Otherwise it's possible for previous rendering to pollute the L3 cache
in the short window of time between RO invalidation and the completion
of the stalling flush.  Fixes rendering artifacts on Unigine Heaven,
Metro Last Light Redux and Metro 2033 Redux.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93540
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93599
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-08 15:45:44 -08:00
Francisco Jerez
1817e3c07a i965/fs: Don't emit unnecessary SEL instruction from emit_image_atomic().
The SEL instruction with predication mode NONE emitted when the atomic
operation doesn't need to be predicated is a no-op and might rely on
undocumented hardware behaviour.  Noticed by chance while looking at
the assembly output.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-08 15:43:05 -08:00
Matt Turner
c300559fbf i965/vec4: Update vec4 unit tests for commit 01dacc83ff.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94050
2016-02-08 15:32:12 -08:00
Francisco Jerez
cec6fe2ad8 vtn: Clean up acos implementation.
Parameterize build_asin() on the fit coefficients so the
implementation can be shared while still using different polynomials
for asin and acos.  Also switch back to implementing acos in terms of
asin -- The improvement obtained from cancelling out the pi/2 terms
was negligible compared to the approximation error.
2016-02-08 15:23:43 -08:00
Francisco Jerez
f50a651726 nir/spirv: Create integer types of correct signedness.
vtn_handle_type() creates a signed type regardless of the value of the
signedness flag, which usually doesn't make much of a difference
except when the type is used as base sampled type of an image type,
what will cause the base type of the NIR image variable to be
inconsistent with its format and cause an assertion failure in the
back-end (most likely only reproducible on Gen7), and may change the
semantics of the image intrinsic subtly (e.g. UMIN may become IMIN).
2016-02-08 15:23:35 -08:00
Brian Paul
01dacc83ff dri/common: include debug_output.h to silence warning 2016-02-08 10:52:02 -07:00
Brian Paul
59251610ed tgsi: minor whitespace fixes in tgsi_scan.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-08 09:29:38 -07:00
Brian Paul
42246ab1f5 tgsi: s/true/TRUE/ in tgsi_scan.c
Just to be consistent.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-08 09:29:38 -07:00
Brian Paul
da6e879a6c tgsi: use switches instead of big if/else ifs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-08 09:29:38 -07:00
Brian Paul
37eb3f0400 tgsi: break gigantic tgsi_scan_shader() function into pieces
New functions for examining instructions, declarations, etc.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-08 09:29:38 -07:00
Brian Paul
3c3ef69696 st/mesa: minor formatting fixes in st_cb_bitmap.c 2016-02-08 09:29:38 -07:00
Brian Paul
5fdbfb8d6f mesa: move GL_ARB_debug_output code into new debug_output.c file
The errors.c file had grown quite large so split off this extension
code into its own file.  This involved making a handful of functions
non-static.

Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-02-08 09:29:38 -07:00
Brian Paul
6691ba1fe8 gallium/util: whitespace, formatting fixes in u_debug_stack.c 2016-02-08 09:29:38 -07:00
Brian Paul
5d2539cb49 gallium/util: whitespace, formatting fixes in u_staging.[ch] files
Still some nonsensical comments.
2016-02-08 09:29:38 -07:00
Brian Paul
c84a8911fc gallium/util: switch over to new u_debug_image.[ch] code
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-08 09:29:38 -07:00
Brian Paul
3917c8f3f9 gallium/util: put image dumping functions into separate file
To try to reduce the clutter in u_debug.[ch]

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-08 09:29:38 -07:00
Brian Paul
6c7d4a7173 gallium/util: whitespace, formatting fixes in u_debug.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-08 09:29:38 -07:00
Samuel Pitoiset
efe5829578 trace: add missing pipe_context::clear_texture()
This fixes a crash with bin/arb_clear_texture-base-formats and
probably some other tests which use clear_texture().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-08 00:06:32 +01:00
Samuel Pitoiset
1dacbb7b46 trace: remove useless MALLOC() in trace_context_draw_vbo()
There is no need to allocate memory when unwrapping the indirect buf.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-08 00:06:22 +01:00
Vinson Lee
ccaf734275 mesa/extensions: Fix NVX_gpu_memory_info lexicographical order.
Fixes MesaExtensionsTest.AlphabeticallySorted.

Fixes: 1d79b99580 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-07 14:42:00 -08:00
Ilia Mirkin
88519c6087 glsl: return cloned signature, not the builtin one
The builtin data can get released with a glReleaseShaderCompiler call.
We're careful everywhere to clone everything that comes out of builtins
except here, where we accidentally return the signature belonging to the
builtin version, rather than the locally-cloned one.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
2016-02-07 17:23:58 -05:00
Ilia Mirkin
ac57577e29 glsl: make sure builtins are initialized before getting the shader
The builtin function shader is part of the builtin state, released
when glReleaseShaderCompiler is called. We must ensure that the
builtins have been (re)initialized before attempting to link with the
builtin shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
2016-02-07 17:23:57 -05:00
Samuel Pitoiset
04c2ca5038 tgsi: use TGSI_WRITEMASK_XYZW instead of hardcoding the mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
2016-02-06 20:24:41 +01:00
Kristian Høgsberg Kristensen
6c4c04690f anv: Deduplicate dispatch calls
This can all be shared between gen8+ and pre-gen8.
2016-02-05 22:36:53 -08:00
Timothy Arceri
ea7f64f74d glsl: don't generate transform feedback candidate when not required
If we are not even looking for one don't bother generating a candidate
list.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-06 14:34:43 +11:00
Timothy Arceri
c1bbaff1e8 glsl: replace unreachable code with an assert()
All interface blocks will have been lowered by this point so just
use an assert. Returning false would have caused all sorts of
problems if they were not lowered yet and there is an assert to
catch this later anyway.

We also update the tests to reflect this change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-06 14:34:35 +11:00
Jan Vesely
e377037bef r600, compute: Do not overwrite pipe_resource.screen
found by inspection.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-05 21:17:15 -05:00
Kristian Høgsberg Kristensen
bdefaae2b9 anv: Deduplicate anv_CmdDraw calls
These were all duplicated between gen7_cmd_buffer.c and
gen8_cmd_buffer.c. This commit consolidates both copies in
genX_cmd_buffer.c.
2016-02-05 16:41:56 -08:00
Kristian Høgsberg Kristensen
6cdada0360 anv: Move invariant state to small initial batch
We use the simple batch helper to submit a batch at driver startup time
which holds all the state that never changes.  We don't have a whole lot
and once we enable tesselation there'll be even less. Even so, it's a
simple mechanism and reduces our steady state batch sizes a bit.
2016-02-05 16:13:53 -08:00
Kristian Høgsberg Kristensen
c9c3344c4f anv: Split out batch submit helper from anv_DeviceWaitIdle
We'll reuse this mechanism in the next commit.
2016-02-05 16:13:52 -08:00
Kristian Høgsberg Kristensen
381d85545a anv: Share scratch_space helper between gen7 and gen8+
The gen7 pipeline has a useful helper function for this, let's use it in
gen8_pipeline.c too.  The gen7 function has an off-by-one bug though: we
have to compute log2(size / 1024) - 1, but we divide by 2048 instead so
as to avoid the case where size is less than 1024 and we'd return -1.
2016-02-05 16:13:52 -08:00
Kristian Høgsberg Kristensen
d1617dbec3 anv: Share URB setup between gen7 and gen8+ 2016-02-05 16:13:52 -08:00
Jason Ekstrand
9401516113 Merge remote-tracking branch 'mesa-public/master' into vulkan 2016-02-05 15:21:11 -08:00
Jason Ekstrand
741744f691 Merge commit mesa-public/master into vulkan
This pulls in the patches that move all of the compiler stuff around
2016-02-05 15:03:44 -08:00
Jason Ekstrand
9645b8eb1f Merge branch mesa-public/master into vulkan 2016-02-05 14:21:13 -08:00
Jan Vesely
5b51b2e000 r600g: Ignore format for PIPE_BUFFER targets
Fixes compute since 7dd31b81fe
gallium/radeon: support PIPE_CAP_SURFACE_REINTERPRET_BLOCKS

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 20:23:56 +01:00
Marek Olšák
d8e4908b63 mesa/get: fix a breakage after rebase
trivial.
2016-02-05 19:39:13 +01:00
Matt Turner
9f2e22bf34 i965/vec4: don't copy ATTR into 3src instructions with complex swizzles
The vec4 backend, at the end, does this:

    if (inst->is_3src()) {
       for (int i = 0; i < 3; i++) {
          if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
             assert(brw_is_single_value_swizzle(inst->src[i].swizzle));

So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.

Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.

Co-authored-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-02-05 09:33:19 -08:00
Marek Olšák
1106e79ed9 docs/relnotes: document memory info extensions 2016-02-05 17:47:59 +01:00
Marek Olšák
635555af6a gallium/radeon: implement query_memory_info (v2)
v2: don't use DIV_ROUND_UP (no so useful)
    also return eviction stats

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-02-05 17:31:58 +01:00
Marek Olšák
5f51a24a77 st/mesa: implement and enable memory info extensions (v2)
v2: assert and return if query_memory_info is not set
    rebase

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-02-05 17:31:53 +01:00
Marek Olšák
837f74aa51 mesa: implement GL_ATI_meminfo (v2)
v2: rebase

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-02-05 17:31:20 +01:00
Marek Olšák
1d79b99580 mesa: implement GL_NVX_gpu_memory_info (v2)
v2: implement eviction queries properly
    add gl_memory_info structure

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-02-05 17:30:07 +01:00
Marek Olšák
d2e4c9e737 gallium: add interface for querying memory usage and sizes (v2)
If you're worried about the duplication of some CAPs, we can remove them
later.

v2: add fields for memory eviction stats

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-02-05 17:29:38 +01:00
Marek Olšák
c577f2843a gallium/radeon: remove radeon_info::r600_tiling_config
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:29:19 +01:00
Marek Olšák
4f96846d9d gallium/radeon: get pipe_interleave_bytes AKA group_bytes from the winsys
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:59 +01:00
Marek Olšák
276621da45 gallium/radeon: set num_banks in the winsys
amdgpu doesn't have to set this, because radeonsi gets it from tile mode
arrays by default.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:40 +01:00
Marek Olšák
294ec530c9 gallium/radeon: just get num_tile_pipes from the winsys
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:24 +01:00
Marek Olšák
0f3556d308 winsys/amdgpu: add an assertion to cik_get_num_tile_pipes (v2)
v2: print an error to stderr

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:18 +01:00
Marek Olšák
a2291f7b57 winsys/amdgpu: remove an r600-only setting
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:12 +01:00
Marek Olšák
1e864d7379 gallium/radeon: rename & reorder members of radeon_info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-05 17:28:00 +01:00
Steinar H. Gunderson
feb53912f8 mesa: Fix locking of GLsync objects.
GLsync objects had a race condition when used from multiple threads
(which is the main point of the extension, really); it could be
validated as a sync object at the beginning of the function, and then
deleted by another thread before use, causing crashes. Fix this by
changing all casts from GLsync to struct gl_sync_object to a new
function _mesa_get_and_ref_sync() that validates and increases
the refcount.

In a similar vein, validation itself uses _mesa_set_search(), which
requires synchronization -- it was called without a mutex held, causing
spurious error returns and other issues. Since _mesa_get_and_ref_sync()
now takes the shared context mutex, this problem is also resolved.

Fixes bug #92757, found while developing Nageru, my live video mixer
(due for release at FOSDEM 2016).

v2: Marek: silence warnings, fix declaration after code

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 17:18:17 +01:00
Nicolai Hähnle
156e81f305 radeonsi: add placeholder MC and SRBM performance counter groups
Yet another change motivated by AMD GPUPerfStudio compatibility. These groups
are not directly accessible from userspace, and AMD GPUPerfStudio does not
actually query them - it just requires them to be there. Hence, adding
a placeholder for now.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:25:33 -05:00
Nicolai Hähnle
988f4b31f3 radeonsi: re-order the SQ_xx performance counter blocks
This is yet another change motivated by appeasing AMD GPUPerfStudio's
hardcoding of performance counter group numbers.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:25:30 -05:00
Nicolai Hähnle
75affd73b0 radeonsi: re-order the perfcounter hardware blocks
As documented in the comment, AMD GPUPerfStudio unfortunately hardcodes the
order of performance counter groups. Let's do the pragmatic thing and present
the same order as Catalyst/Crimson.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:25:27 -05:00
Nicolai Hähnle
b0e32548c8 gallium/radeon: add GPIN driver query group
This group was used by older versions of AMD GPUPerfStudio (via
AMD_performance_monitor) to identify the GPU family, and GPUPerfStudio
still complains when it isn't available.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:24:59 -05:00
Nicolai Hähnle
4b672b8310 radeonsi: Allow dumping LLVM IR before optimization passes
Set R600_DEBUG=preoptir to dump the LLVM IR before optimization passes,
to allow diagnosing problems caused by optimization passes.

Note that in order to compile the resulting IR with llc, you will first
have to run at least the mem2reg pass, e.g.

opt -mem2reg -S < shader.ll | llc -march=amdgcn -mcpu=bonaire

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> (original patch)
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (w/ debug flag)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:22:04 -05:00
Nicolai Hähnle
5aafc169ca gallium/radeon: emit LLVM ret void before radeon_llvm_finalize_module
This allows dumping a consumable LLVM module before the initial optimization
passes are run.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:54 -05:00
Nicolai Hähnle
7e9670c8bc st/mesa: bail out of try_pbo_upload_common when constant upload fails
Also fixes a resource leak when an upload_mgr is used for constants.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:51 -05:00
Nicolai Hähnle
a01e44adcc st/mesa: bail out of try_pbo_upload_common when vertex upload fails
At the same time, fix a memory leak noticed by Ilia Mirkin.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:48 -05:00
Nicolai Hähnle
b27c79bd81 st/mesa: reduce the scope of sampler_view in try_pbo_upload_common
We can get rid of our reference immediately, since the driver will hold
onto it for us.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:44 -05:00
Nicolai Hähnle
13e21e3ec5 st/mesa: do uploads earlier in try_pbo_upload_common
While rather unlikely, uploads _can_ fail. Doing them earlier means
we'll have to restore less state when they do fail, and it's slightly
easier to check the restore code.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-05 09:21:27 -05:00
Neil Roberts
eb9cf3cfc9 main: Use a derived value for the default sample count
Previously the framebuffer default sample count was taken directly
from the value given by the application. On the i965 driver on HSW if
the value wasn't one that is supported by the hardware it would hit an
assert when it tried to program the state for it. This patch fixes it
by adding a derived sample count to the state for the default
framebuffer. The driver can then quantize this to one of the valid
values in its UpdateState handler when the _NEW_BUFFERS state changes.
_mesa_geometric_samples is changed to use the new derived value.

Fixes the piglit test arb_framebuffer_no_attachments-query

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93957
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-05 11:05:10 +00:00
Neil Roberts
5fd848f6c9 program: Use _mesa_geometric_samples to calculate gl_NumSamples
Otherwise it won't take into account the default samples for
framebuffers with no attachments.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-05 11:05:06 +00:00
Neil Roberts
4995d9c9a0 main: Use _mesa_geometric_samples to calculate GL_SAMPLE_BUFFERS
Otherwise it won't take into account the default samples for
framebuffers with no attachments.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-05 11:05:01 +00:00
Neil Roberts
d8d4661ddb main: Use _mesa_geometric_samples to calculate the value of GL_SAMPLES
Otherwise it won't take into account the default samples for
framebuffers with no attachments.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-05 11:04:44 +00:00
Ilia Mirkin
2065e380b2 nvc0: avoid negatives in PUSH_SPACE argument
Fixup to commit 03b3eb90d - the number of buffers could be larger than
the number of elements, in which case we'd pass a negative argument to
PUSH_SPACE, which would be bad. While we're at it, merge it with the
other PUSH_SPACE at the top of the function.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-02-05 00:49:51 -05:00
Ilia Mirkin
03b3eb90d7 nvc0: add some missing PUSH_SPACE's
nvc0_vbo has explicit push space checking enabled, so we must run
PUSH_SPACE by hand. A few spots missed that.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-02-05 00:41:43 -05:00
Ilia Mirkin
1a0fde1f52 nvc0/ir: fix converting between predicate and gpr
The spill logic will insert convert ops when moving between files. It
seems like the emission logic wasn't quite ready for these converts.

Tested on fermi, and visually looked at nvdisasm output for maxwell.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-02-05 00:41:33 -05:00
Ilia Mirkin
2fed18b8a5 nvc0: add support for ARB_query_buffer_object
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-04 21:21:30 -05:00
Ilia Mirkin
9cd5bb9f9f st/mesa: add query buffer support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Ilia Mirkin
f9e6f46335 gallium: add PIPE_CAP_QUERY_BUFFER_OBJECT
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Ilia Mirkin
40d7f02c67 gallium: add a way to store query result into buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Ilia Mirkin
386a9ec77b mesa: add core implementation of ARB_query_buffer_object
Forwards query result writes to drivers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Ilia Mirkin
7c3f4b2fd8 mesa: add driver interface for writing query results to buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Rafal Mielniczuk
3efcd4df01 mesa: Handle QUERY_BUFFER_BINDING in GetIntegerv
Signed-off-by: Rafal Mielniczuk <rafal.mielniczuk2@gmail.com>
[imirkin: move to GL/GL_CORE section]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Rafal Mielniczuk
2d0ec0c272 mesa: Add QueryBuffer to context
Add QueryBuffer and initialise it to NullBufferObj on start

Signed-off-by: Rafal Mielniczuk <rafal.mielniczuk2@gmail.com>
[imirkin: also release QueryBuffer on free]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Rafal Mielniczuk
c5bab061da mesa: Add ARB_query_buffer_object extension flag
Signed-off-by: Rafal Mielniczuk <rafal.mielniczuk2@gmail.com>
[imirkin: add string to extensions.c]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Rafal Mielniczuk
4913d381a0 glapi: Add xml infrastructure for ARB_query_buffer_object
Signed-off-by: Rafal Mielniczuk <rafal.mielniczuk2@gmail.com>
[imirkin: move definition to gl_API.xml as it is very short]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-04 21:21:30 -05:00
Timothy Arceri
23e24e27ac glsl: simplify setting of image access qualifiers
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-05 10:05:40 +11:00
Timothy Arceri
815929bd15 mesa: remove dead program parameter functions
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-05 09:11:00 +11:00
Axel Davy
94d91c6707 st/nine: Use align_free when needed
Use align_free to free memory allocated
with align_malloc.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
6b12fe77ea st/nine: Disallow non-argb8888 cursors
Only argb8888 cursors are allowed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
24ddadbba9 st/nine: Enforce centroid for color input when multisampling is on
The color inputs must automatically use centroid whether
multisampling is used or not.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
d5389bb92d st/nine: Fix centroid flag
sem.reg.mod & NINED3DSPDM_CENTROID is worth 4 when
centroid is requested, whereas
TGSI_INTERPOLATE_LOC_CENTROID is worth 1.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
ee31f0fed4 st/nine: Use fast clears more often for MRTs
This enables to use fast clears in the following
case:

pixel shader renders to 1 RT
4 RT bound
clear
new pixel shader bound that renders to 4 RTs

Previously the fast clear path wouldn't be hit,
because when trying the fast clear path,
the framebuffer state would be configured for 1 RT,
instead of 4.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
e85ef7d8e5 st/nine: Use linear filtering for shadow mapping
Some docs say linear filtering is always used when
app does shadow mapping.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
0b35da59de st/nine: Respect block alignment on surface lock
Respect block alignment for ATI1/ATI2 format when trying to lock a
surface using LockRect().
Fixes failing WINE tests device.c test_surface_blocks() tests.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
56b4222b29 st/nine: Add Render state validation layer
Testing Win behaviour seems to show wrong states
are accepted, but then depending on the states
some specific 'good' behaviours happen.

This adds some validation to catch invalid
states and have these 'good' behaviours
when it happens.

Also reorders SetRenderState to match the expected
optimisation:
(Value == previous Value) => return immediately,
which affects D3D9 hacks too.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
7132617436 DRI_CONFIG: Add option to override vendor id
Add config option override_vendorid to report a fake card in d3dadapter9 drm.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
1a893ac886 st/nine: Implement NineDevice9_GetAvailableTextureMem
Implement a device private memory counter similar to Win 7.

Only textures and surfaces increment vidmem and may return
ERR_OUTOFVIDEOMEMORY. Vertexbuffers and indexbuffers creation always
succeedes, even when out of video memory.

Fixes "Vampire: The Masquerade - Bloodlines" allocating resources until crash.
Fixes "Age of Conan" allocating resources until crash.
Fixes failing WINE test device.c test_vidmem_accounting().

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
a961ec335d st/nine: Handle Window Occlusion
Apps can know if the window is occluded by checking for
specific error messages. The behaviour is different
for Device9 and Device9Ex.

This allow games to release the mouse and stop rendering
until the focus is restored.

In case of multiple swapchain we do care only of the device one.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
e59908e57f st/nine: Store minor version num
To keep compatible with older ID3DPresent interfaces (used to talk
with Wine), store the minor version num accessible to all
statetracker functions (in the NineDevice9 structure).

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
0ac01a9fd7 st/nine: Call flush_resource before flush
flush_resource needs to be called before flush (for
fast clear resolve, etc).

Removes useless computation of resource (it is
already set correctly).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
f481b9b952 st/nine: Fix remaining swapchain tests
Return D3DERR_INVALIDCALL instead of E_POINTER.
On error set ppBackBuffer to NULL.

Multiple swapchains can only be created in windowed mode as
windowed swapchain.

Set backbuffer to NULL in NineDevice9_GetBackBuffer, but not
in NineSwapChain9_GetBackBuffer.

This fixes all WINE's device.c test_swapchain() tests.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
cbbd3c65cc st/nine: Fix crash NineDevice9_CreateAdditionalSwapChain
When no window is specified, we should revert to the focus window.

This deserves more tests however (what if the device swapchain is
already using the focus window ?)

Fixes crash for FFXIV

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
996f76bd8a st/nine: Fix possible crash on error
In case swapchain creation fails This->swapchains[i] might be NULL and
causes a crash.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
40a0b97ebd st/nine: Test more presentation params
Return errors in case of invalid presentation parameters.
Fixes failing WINE tests device.c test_swapchain_parameters().

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
827fee059e st/nine: Fix resource9 private data
Store a copy of GUID in the header that is under our control and use it
as key for the hashtable instead of using the application provided pointer.
The application might change the memory after leaving the function.

Fixes a crash for issue https://github.com/iXit/Mesa-3D/issues/130

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
5c79bd666b st/nine: Print GUID instead of pointer
To ease debugging print the GUID instead of the pointer to it.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
2a4d1509c8 st/nine: Fix use of uninitialized memory
The values of box.z and box.depth weren't set and lead to a crash.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
924038c08f st/nine: Fix clear for multisample mismatch depth-stencil
Tests show in case of multisample mismatch between the depth-stencil
buffer and the render target, then it is not cleared.

Fixes failing WINE test visual.c test_multisample_mismatch().

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
7f58ba45a8 st/nine: Fix Volumetexture9_LockBox
Check for valid locked box dimensions.

Fixes failing wine tests device.c test_lockbox_invalid.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
35047681ff st/nine: Fix ATI2 pitch for non-square
Fixes crash for non-square textures.
We were using the height instead of the
width for some calculations.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
eeeab8d6b4 st/nine: Support D3DFMT_R8G8B8
Add support for D3DFMT_R8G8B8. It allows format conversion for
surfaces of pool scratch.

Usually gallium formats equivalents for d3d9 formats
have their names reversed.

The gallium format PIPE_FORMAT_R8G8B8_UNORM is the right
equivalent here, and its name is likely wrong (reversed).

Fixes a crash in TmNationsForever.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
a3e7525ada st/nine: Use cso for viewport
Use CSO to catch redundant viewport changes.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
495727af6b st/nine: Fix shade mode flat
Shade mode flat is only working if pixelshaders have interpolate
set to TGSI_INTERPOLATE_COLOR on color inputs.

Fixes failing WINE tests visual.c test_shademode().

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
fa887ba65b st/nine: Clear rendertarget on creation
Clear every rendertarget on creation.
Fixes https://github.com/iXit/Mesa-3D/issues/139

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
b142f61621 st/nine: Allow ColorFill on D3DFMT_NULL surfaces
Report success instead of failing as there's no resource for those surfaces.
Fixes a crash in Crysis: Warhead.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
04e22a04a6 st/nine: Introduce STREAMFREQ state
Previous vertex elements code update
was protected by
'if ((group & (NINE_STATE_VDECL | NINE_STATE_VS)) ||
state->changed.stream_freq & ~1)'
itself protected by
'if (group & (NINE_STATE_COMMON | NINE_STATE_VS))'

If no state is changed except the stream frequency,
no update would happen.

This patch solves the problem by adding a new
NINE_STATE_STREAMFREQ state.
Another way would be to add state->changed.stream_freq & ~1
check to the main test.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
15ce2778fb st/nine: Catch redundant SetStreamSourceFreq calls
Some apps do redundant SetStreamSourceFreq calls.
Catch them to improve performance.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
ea3f504f7c st/nine: Squash indexbuffer9 and vertexbuffer9
The indexbuffer9 codebase was lagging behind the one of vertexbuffer9.

Add buffer9 as common code base for indexbuffer9 and vertexbuffer9.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
b6bb8d561a st/nine: Unset vtxbuf on reset
We forgot to reset vtxbuf.
This fixes some crashes.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
b63c144d1e st/nine: Use pipe_resource_reference for vtxbuf
This seems cleaner to actually reference the resources for vtxbuf,
rather than relying on the fact the bound d3d streams do.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
b5876e4762 st/nine: Use ff vertex shader when position_t is used
When an application sets a vertex shader, we are supposed
to use it, and when no vertex shader are set, we are supposed
to revert to fixed function vertex shader.

It seems there is an exception: when the vertex declaration
has a position_t index, we should revert to fixed function
vertex shader.

Up to know we were checking if device->state.vs is set
to know whether to use programmable shader or not.

With this commit we determine whether we use programmable shader
or not when vertex shader/declaration are set, but
stateblocks do complicate things a bit.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
531acbc56b st/nine: Don't increment refcount on VertexDeclaration creation failure
NineUnknown_ctor increments the refcount even in case of an error.
Restructure the code to prevent refcount increments.
Fixes a couple of wine tests.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
b39fd5b1da st/nine: Change StretchRect check order
Textures in SYSTEMMEM don't have resources attached.
Instead of returning an error for them, StretchRect
was crashing.
This changes the check order to fix that case.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
a82e67812a st/nine: Initialize lights in stateblocks
This fixes a crash.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
9c1d93f8e7 st/nine: Fix fixed-function blendweights
The last weighted element is one minus the sum of all previous weights.
Fixes WINE test visual.c test_vertex_blending.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
cc830dc214 st/nine: Always normalize hitDir
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
ed7e1046b6 st/nine: Replace r[0] with tmp
Replace r[0] with tmp to ease code reading.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
9856203f5a st/nine: Fix ff calculation of midVec
In case of non local viewer the value has to be subtracted.
Fixes failing WINE tests in test_specular_lighting() (visual.c)

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
921f3eac58 st/nine: Implement D3DRS_SPECULARENABLE
Implement fixed function D3DRS_SPECULARENABLE.
Fixes failing WINE tests in test_specular_lighting() (visual.c)

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
9c26fa1b13 st/nine: Fix D3DRS_LOCALVIEWER being ignored
Set key->localviewer to D3DRS_LOCALVIEWER.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Axel Davy
aa4454ae85 st/nine: Fix rounding issue with vs1.1 a0 reg
vs1.1 rounds a0 to lowest integer, while
other versions do round to closest.

To use the same path as the other versions (with ARR),
we were substracting 0.5 for vs1.1 to get round to lowest.

This gives wrong result if a0 is set to 0:
round(0 - 0.5) = -1

Instead just use ARL for vs1.1

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Axel Davy
dbb03f6b5b st/nine: Fix D3DPMISCCAPS_FOGANDSPECULARALPHA support
The documentation of the flag doesn't make sense.
To sum up the doc, if not set, specular alpha contains fog,
and if set specular alpha contains 0 (except for ff).

However in practice when the flag is there, apps do use specular alpha
as if it could be used normally, which makes much more sense than the doc.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-02-04 22:12:17 +01:00
Patrick Rudolph
9298a0b81b st/nine: Fix AlphaCmpCaps
AlphaCmpCaps should advertise D3DPCMPCAPS_NEVER as well.

Fixes https://github.com/iXit/Mesa-3D/issues/142

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-02-04 22:12:17 +01:00
Chad Versace
3eebf3686b anv: Drop anv_image::needs_*_surface_state
anv_image::needs_sampler_surface_state was a redundant member,
identical to (usage & VK_IMAGE_USAGE_SAMPLED_BIT). Likewise for the
other needs_* members.
2016-02-04 12:20:51 -08:00
Chad Versace
42b9320fbf anv/image: Rename nonrt_surface_state
Let's call it what it is, not what it is not. Rename it to
'sampler_surface_state'.
2016-02-04 12:20:51 -08:00
Marek Olšák
bff640b3e0 radeonsi: implement PK2H and UP2H opcodes
Based on a gallivm patch by Ilia Mirkin.

+8 piglit regressions due to precision issues (I blame the tests)

The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead
of emulation with integer instructions. They are GLSL 4.00 intrinsics.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-02-04 19:52:28 +01:00
Matt Turner
973ba3f4d4 glsl: Ensure glsl/ exists before making the lexer/parser.
Reported-by: Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93989
2016-02-04 09:31:17 -08:00
Matt Turner
8c7a42b3e8 i965/fs: Allocate single register at a time for constants.
No instruction counts changed, but:

  total cycles in shared programs: 64834502 -> 64781530 (-0.08%)
  cycles in affected programs: 16331544 -> 16278572 (-0.32%)
  helped: 4757
  HURT: 4288

  GAINED: 66
  LOST:   20

I remember trying this when I first wrote the pass, but it wasn't
helpful at the time.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-02-04 09:30:58 -08:00
Marek Olšák
8ec24678ac radeonsi: fix Hyper-Z on Stoney
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-04 16:47:41 +01:00
Patrick Baggett
9c78cfd547 mesa: Use SSE prefetch instructions rather than 3DNow instructions
64-bit Pentium 4 CPUs don't have the 3DNow prefetch instructions
which results in an Illegal instruction crash.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Timothy Arceri <t_arceri@yahoo.com.au>
https://bugs.freedesktop.org/show_bug.cgi?id=27512
2016-02-04 22:02:31 +11:00
Jason Ekstrand
1f5d56304f anv/descriptor_set: Fix descriptor copies
We weren't pulling the actual binding location information out of the set
layout.  The new code mirrors the descriptor write code.
2016-02-03 22:44:33 -08:00
Ilia Mirkin
edd494ddf0 nv50/ir: make sure to fetch all sources before creating instruction
We must fetch all sources into the instruction stream before generating
the instruction that uses them. Otherwise we'll define values after
using them, which won't work so well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-02-03 18:40:38 -05:00
Ilia Mirkin
a9d5c64c34 nv50: avoid freeing the symbols if they're about to be stored
Spotted by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-02-03 18:40:26 -05:00
Ilia Mirkin
9284fd9c0d st/mesa: fix potential null deref if no shader is passed in
Spotted by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-02-03 18:40:13 -05:00
Ilia Mirkin
5ac7f0433b glx: update to updated version of EXT_create_context_es2_profile
The EXT spec has been updated to:
 - logically combine the es2_profile and es_profile exts
 - allow any legal version to be requested

dEQP tests request a specific ES version when using GLX, so this allows
dEQP upstream to run against GLX with the appropriate X server patch
(which had similar disabling logic).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Adam Jackson <ajax@redhat.com> (v3)

v1 -> v2:
 - distinguish between DRI_API_GLES{,2,3}
 - add GLX_EXT_create_context_es_profile client-side support
v2 -> v3:
 - fix error in computing mask
2016-02-03 15:44:51 -05:00
Ilia Mirkin
ad0e48e518 dir-locals.el: set case-label offset to 0
While this is the default, private .emacs files might have it set to
something else. No harm in forcing it to 0.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-02-03 15:44:51 -05:00
Jose Fonseca
1c0f95f602 appveyor: Bump shallow clone depth.
To prevent build failures when a large patch series is committed, like
happened in https://ci.appveyor.com/project/jrfonseca-fdo/mesa/build/322
due to 10 commits between dac2964f3e and
6f428328d3 where submitted before the
build slave started the git clone.

100 commits should be bigger than any patch series seen in practice, and
it takes practically the same time to download as 5 commits.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-03 19:37:19 +00:00
Rob Clark
029c89a0cc Revert "compiler: removed unused Makefile.sources"
Whoops, didn't mean to push this one.

This reverts commit 78f4c555b9.
2016-02-03 14:35:10 -05:00
Rob Clark
1be9184ff3 compiler: fix .gitignore for glsl_compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-03 13:32:46 -05:00
Rob Clark
78f4c555b9 compiler: removed unused Makefile.sources
We seem to end up w/ duplication between compiler/Makefile.sources and
compiler/glsl/Makefile.sources.  The latter appears unused.  Delete it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-03 13:19:45 -05:00
Nicolai Hähnle
43a401a792 gallium: fix the documentation of PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE
This parameter is equivalent to the corresponding OpenGL implementation
limit which is in texels, not bytes.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:12:37 +01:00
Nicolai Hähnle
7dd31b81fe gallium/radeon: support PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
This is already used internally in si_resource_copy_region for compressed
textures, so the only real change here is the adjusted surface size
computation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:37 +01:00
Nicolai Hähnle
4b02f16537 st/mesa: implement PBO upload for glCompressedTex(Sub)Image
v2:
- use st->pbo_upload.enabled flag

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:37 +01:00
Nicolai Hähnle
f38bb36f57 st/mesa: redirect CompressedTexSubImage to our own implementation
This is where PBO upload will go.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:36 +01:00
Nicolai Hähnle
16c2ea1fcc st/mesa: inline the implementation of _mesa_store_compressed_teximage
We will write our own version of texsubimage for PBO uploads, and we will
want to call that here as well.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:36 +01:00
Nicolai Hähnle
c99f2fe70e st/mesa: implement PBO upload for multiple layers
Use instancing to generate two triangles for each destination layer and use
a geometry shader to route the layer index.

v2:
- directly write layer in VS if supported by the driver (Marek Olšák)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:36 +01:00
Fredrik Höglund
757071ca7c st/mesa: Accelerate PBO uploads
Create a PIPE_BUFFER sampler view on the pixel-unpack buffer, and draw
the image on the texture with a fragment shader that maps fragment
coordinates to buffer coordinates.

Modifications by Nicolai Hähnle:
- various cleanups and fixes (e.g. error handling, corner cases)
- split try_pbo_upload into two functions, which will allow code to be
  shared with compressed texture uploads
- modify the source format selection to only test for support against
  the PIPE_BUFFER target

v2:
- update handling of TGSI_SEMANTIC_POSITION for recent changes in master
- MaxTextureBufferSize is number of texels, not bytes (Ilia Mirkin)
- only enable when integers are supported (Marek Olšák)
- try harder to hit the TextureBufferOffsetAlignment
- remove unnecessary MOV from the fragment shader

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:35 +01:00
Nicolai Hähnle
4a448a63ad st/mesa: use the correct address generation functions in st_TexSubImage blit
We need to tell the address generation functions about the dimensionality of
the texture to correctly implement the part of Section 3.8.1 (Texture Image
Specification) of the OpenGL 2.1 specification which says:

    "For the purposes of decoding the texture image, TexImage2D is
    equivalent to calling TexImage3D with corresponding arguments
    and depth of 1, except that
      ...
      * UNPACK SKIP IMAGES is ignored."

Fixes a low impact bug that was found by chance while browsing the spec and
extending piglit tests.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:35 +01:00
Nicolai Hähnle
6af6d7b08a gallium: Add PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
This cap indicates whether pipe->create_surface can reinterpret a texture
as a surface with a format of different block width/height (but equal
block size).

v2: fix whitespace

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:34 +01:00
Nicolai Hähnle
3abb548ef6 gallium: Add PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY
This cap indicates that the driver only supports R, RG, RGB and RGBA
formats for PIPE_BUFFER sampler views.

v2: move into "unsupported features" section for nouveau (Ilia Mirkin)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-03 14:10:34 +01:00
Nicolai Hähnle
bc8a6842a9 mesa: add MESA_NO_MINMAX_CACHE environment variable
When set to a truish value, this globally disables the minmax cache for all
buffer objects.

No #ifdef DEBUG guards because this option can be interesting for
benchmarking.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:04:11 +01:00
Nicolai Hähnle
761c7d59c4 vbo: disable the minmax cache when the hit rate is low
When applications stream their index buffers, the caches for those BOs become
useless and add overhead, so we want to disable them. The tricky part is
coming up with the right heuristic for *when* to disable them.

The first question is which hit rate to aim for. Since I'm not aware of any
interesting borderline applications that do something like "draw two or three
times for each upload", I just kept it simple.

The second question is how soon we should give up on the caching. Applications
might have a warm-up phase where they fill a buffer gradually but then keep
reusing it. For this reason, I count the number of indices that hit and miss
(instead of the number of calls that hit or miss), since comparing that to
the size of the buffer makes sense.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:04:06 +01:00
Nicolai Hähnle
115c643b16 mesa: add USAGE_DISABLE_MINMAX_CACHE flag to buffer UsageHistory
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:59 +01:00
Nicolai Hähnle
6b057f8ecc vbo: cache/memoize the result of vbo_get_minmax_indices (v3)
Some games developers are unaware that an index buffer in a VBO still needs
to be read by the CPU if some varying data comes from a user pointer (unless
glDrawRangeElements and friends are used). This is particularly bad when
they tell us that the index buffer should live in VRAM.

This cache helps, e.g. lifting This War Of Mine (a particularly bad
offender) from under 10fps to slightly over 20fps on a Carrizo.

Note that there is nothing prohibiting a user from rendering from multiple
threads simultaneously with the same index buffer, hence the locking. (The
internal buffer map taken for the buffer still leads to a race, but at least
the locks are a move in the right direction.)

v2: disable the cache on USAGE_TEXTURE_BUFFER as well (Chris Forbes)

v3:
- use bool instead of GLboolean for MinMaxCacheDirty (Ian Romanick)
- replace the sticky USAGE_PERSISTENT_WRITE_MAP bit by a direct
  AccessFlags check

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:49 +01:00
Nicolai Hähnle
1a570d96a6 vbo: move vbo_get_minmax_indices into its own source file
We will add more code for caching/memoization. Moving the existing code
into its own file helps keep things modular.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:48 +01:00
Nicolai Hähnle
46b7a526f5 mesa/main: bail earlier for size == 0 in _mesa_clear_buffer_sub_data
Note that the conversion of the clear data (when data != NULL) can fail due
to an out of memory condition, but it does not check any error conditions
mandated by the spec. Therefore, it is safe to skip when size == 0.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:46 +01:00
Nicolai Hähnle
fd7229b437 mesa/main: add USAGE_PIXEL_PACK_BUFFER flag to buffer UsageHistory
We will want to disable minmax index caching for buffers that are used in this
way.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:45 +01:00
Nicolai Hähnle
54c4a9803b mesa/main: add USAGE_TRANSFORM_FEEDBACK_BUFFER flag to buffer UsageHistory
We will want to disable minmax index caching for buffers that are used in this
way.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:41 +01:00
Nicolai Hähnle
55fb921d69 util/hash_table: add _mesa_hash_table_num_entries
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-03 14:03:35 +01:00
Nicolai Hähnle
8b11d8cfbf util/hash_table: add _mesa_hash_table_clear (v4)
v4: coding style change (Matt Turner)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
2016-02-03 14:03:25 +01:00
Leo Liu
6ad2e55a14 st/omx/dec/h264: fix corruption when scaling matrix present flag set
The scaling list should be filled out with zig zag scan

v2: integrate zig zag scan for list 4x4 to vl(Christian)
v3: move list determination out from the loop(Ilia)

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-02-02 20:29:47 -05:00
Leo Liu
4f598f2173 vl: add zig zag scan for list 4x4
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-02-02 20:29:43 -05:00
Roland Scheidegger
848a023c05 llvmpipe: use scissor_planes_needed helper function
So it doesn't get out of sync in multiple places.
2016-02-03 01:25:45 +01:00
Jordan Justen
141ef75569 i965/gen8: Initialize aux_mode to GEN8_SURFACE_AUX_MODE_NONE
GEN8_SURFACE_AUX_MODE_NONE is 0, so this is a no-op.

Yet, this also makes it clear that we can compare aux_mode to the
other GEN8_SURFACE_AUX_MODE_ values. We will want to compare to
GEN8_SURFACE_AUX_MODE_HIZ.

v2: Some very minor cherry-pick conflicts due to moving it around in the series.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-02-02 15:44:18 -08:00
Mark Janes
6a7e2904e0 nir/spirv: fix build_mat4_det stack smasher
When generating a sub-determinate matrix, a 3-element swizzle array was
indexed with clever inline boolean logic.  Unfortunately, when i and j
are both 3, the index overruns the array, smashing the next variable on
the stack.

For 64 bit builds, the alignment of the 3-element unsigned array leaves
32 bits of spacing before the next local variable, hiding this bug.  On
i386, a subcolumn pointer was smashed then dereferenced.
2016-02-02 15:30:54 -08:00
Mark Janes
ea8c2d118a anv: Fix anv_descriptor_set reference error on deletion
anv_descriptor_set_destroy uses the descriptor sets's set_layout member
to iterate the set's buffer views.  However, the set_layout reference
may have previously been freed.

On 64 bit builds, this bug generated valgrind errors but did not affect
CTS test results.  On 32 bit builds, it reliably produces assertions and
memory corruption.
2016-02-02 15:28:01 -08:00
Kristian Høgsberg Kristensen
5a06bac4a0 anv: Use @LIB_DIR@ in anv_icd.json
Otherwise we may get a lib vs lib64 mismatch.
2016-02-02 14:36:22 -08:00
Ilia Mirkin
18f688d62a mesa: use default geometry's samples when there are no attachments
Whether multisampling is turned on depends, in part, on whether
attachments are themselves multisample surfaces. However when there are
no attachments, we should rely on the default geometry for this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-02-02 17:08:46 -05:00
Ilia Mirkin
095da3b550 mesa: invalidate framebuffer when changing parameters
This fixes dEQP-GLES31.functional.fbo.completeness.no_attachments

When the width or height are 0, the framebuffer is incomplete. We may
also not have been passing the new state down to the driver when the
widths/heights/etc changed. Make sure to dirty the state so that the
framebuffer state is revalidated at draw time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 17:08:46 -05:00
Ilia Mirkin
beac7b1b8b mesa: use geometric helper for computing min samples
In case we have a draw buffer without attachments, we should be looking
at the default number of samples.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-02-02 17:08:46 -05:00
Ilia Mirkin
2d4976fa19 mesa: the _mesa_geometric_* functions require full types from mtypes.h
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 17:08:46 -05:00
Jason Ekstrand
fd99f3d658 anv/device: Improve version error reporting 2016-02-02 13:16:13 -08:00
Jason Ekstrand
c7f26bbed9 vulkan: Bump the header to 1.0.3 2016-02-02 13:08:47 -08:00
Jason Ekstrand
0d2145b50f anv/fence: Default to not ready
This is kind-of silly.  We *really* need to do a better job of making sure
all objects have all their default values set.  We probably also want to,
eventually, put everything into the BO (to save memory) and, more
specifically, make the GPU write the "ready" flag.  That way GetFenceStatus
won't ever have to call into the kernel.
2016-02-02 12:22:03 -08:00
Niels Ole Salscheider
fb44cfadce winsys/radeon: Do not deinit the pb cache if it was not initialized
This fixes a crash in pb_cache_release_all_buffers.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-02-02 21:11:15 +01:00
Marek Olšák
84a6d2d7d6 tgsi/scan: add tgsi_shader_info::reads_samplemask 2016-02-02 21:04:52 +01:00
Marek Olšák
0d68b91220 radeonsi: rework RB+ for Stoney
This fixes it.

States which also need to be taken into account:
- SPI color formats - each down-conversion format supports only a limited set
  of SPI formats
- whether MSAA resolving and logic op are enabled

These need special handling:
- blending
- disabled channels

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák
066d76c2f4 radeonsi: rename cb_target_mask state to cb_render_state
and rename a variable in the function.

SX_PS_DOWNCONVERT will be emitted here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:19 +01:00
Marek Olšák
5f0f9a5619 radeonsi: treat intensity render targets exactly like red
The motivation is to simplify the Stoney RB+ code.
Intensity is already treated as red except here.

No piglit regressions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-02-02 21:03:18 +01:00
Marek Olšák
f96f94966d tgsi: set correct src type for UP2H
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-02-02 21:02:26 +01:00
Connor Abbott
19db71807f util/hash_table: don't compare deleted entries
The equivalent of the last patch for the hash table. I'm not aware of
any issues this fixes.

v2:
- use entry_is_deleted (Timothy)

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2016-02-02 14:42:40 -05:00
Connor Abbott
8fc2f652a2 util/set: don't compare against deleted entries
When we delete entries in the hash set, we mark them "deleted" by
setting their key to the deleted_key, which points to a dummy
deleted_key_value. When searching for an entry, we normally skip over
those, but set_add() had some code for searching for duplicate entries
which forgot to skip over deleted entries. This led to a segfault inside
the NIR vectorization pass, since its key comparison function
interpreted the memory where deleted_key_value resides as a pointer and
tried to dereference it.

v2:
- add better commit message (Timothy)
- use entry_is_deleted (Timothy)

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2016-02-02 14:42:32 -05:00
Jordan Justen
bd97b62525 glsl: Disable tree grafting optimization for shared variables
Fixes:
 * dEQP-GLES31.functional.compute.basic.shared_atomic_op_multiple_groups
 * dEQP-GLES31.functional.compute.basic.shared_atomic_op_multiple_invocation
 * dEQP-GLES31.functional.compute.basic.shared_atomic_op_single_group
 * dEQP-GLES31.functional.compute.basic.shared_atomic_op_single_invocation

From https://android.googlesource.com/platform/external/deqp

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-02 10:50:40 -08:00
Jordan Justen
afef1422cb glsl: Enable debug prints for do_common_optimization
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-02 10:50:40 -08:00
Roland Scheidegger
5e090079e1 Revert "i965: Provide sse2 version for rgba8 <-> bgra8 swizzle"
This reverts commit ab30426e33.

Apparently the memory isn't quite as aligned when this gets called
as it should be, causing crashes. (Albeit this looks independent
from this code, should crash just as well if ssse3 is enabled when
compiling without this patch.)
https://bugs.freedesktop.org/show_bug.cgi?id=93962
2016-02-02 15:45:59 +01:00
Dave Airlie
e7a27f70b9 virgl: mark function as static
This is fallout from the previous changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93961

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 17:55:40 +10:00
Roland Scheidegger
7221b8aec6 gallivm: add PK2H/UP2H support
Add support for these opcodes, the conversion functions were already
there albeit need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due to those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:20 +01:00
Roland Scheidegger
5171ec9ca9 gallivm: add PK2H/UP2H support
Add support for these opcodes, the conversion functions were already
there albeit need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).
2016-02-02 05:58:19 +01:00
Roland Scheidegger
dc16086e3b tgsi: add PK2H/UP2H support
The util functions handle the half-float conversion.
Note that piglit won't like it much due to:
a) The util functions use magic float mul conversion but when run inside
softpipe/llvmpipe, denorms are flushed to zero, therefore when the conversion
is from/to f16 denorm the result will be zero. This is a bug which should be
fixed in these functions (should not rely on denorms being available), but
will happen elsewhere just the same (e.g. conversion to f16 render targets).
b) The util functions use trunc round mode rather than round-to-nearest. This
is NOT a bug (as it is a d3d10 requirement). This will result of rounding not
representable finite values to MAX_F16 rather than INFINITY. My belief is the
piglit tests are wrong here but it's difficult to tell (generally glsl
rounding mode is undefined, however I'm not sure if rounding mode might need
to be consistent for different operations). Nevertheless, for gl it would be
better to use round-to-nearest, but using different rounding for GL and d3d10
is an unsolved problem (as it affects things like conversion to f16 render
targets, clear colors, this shader opcode).

Hence for now don't enable the cap bit (so the code is unused).
(Code is from imirkin, comment from sroland)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmvware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:19 +01:00
Roland Scheidegger
99bd96abbb llvmpipe: drop scissor planes early if the tri is fully inside them
If the tri is fully inside a scissor edge (or rather, we just use the
bounding box of the tri for the comparison), then we can drop these
additional scissor "planes" early. We do not even need to allocate
space for them in the tri.

The math actually appears to be slightly iffy due to bounding boxes
being rounded, but it doesn't matter in the end.

Those scissor rects are costly - the 4 planes from the scissor are
already more expensive to calculate than the 3 planes from the tri itself,
and it also prevents us from using the specialized raster code for small
tris.

This helps openarena performance by about 8% or so. Of course, it helps
there that while openarena often enables scissoring (and even moves the
scissor rect around) I have not seen a single tri actually hit the
scissor rect, ever.

v2: drop individual scissor edges, and do it earlier, not even allocating
space for them.
v3: help the compiler a bit with simpler code, suggested by Brian.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:19 +01:00
Roland Scheidegger
9d2a34e105 llvmpipe: minor cleanup of sse2 for calc_fixed_position
Just slightly simpler assembly.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:19 +01:00
Roland Scheidegger
8aa168eb8f llvmpipe: use vector loads for (optimized) tri raster funcs
When we switched to 64bit rasterization, we could no longer use straight
aligned loads for loading the plane data. However, what the code actually
does for loading 3 planes, is 12 scalar loads + 9 unpacks, and then there's
another 8 unpacks for the transpose we need (!).

It would be possible to do the (scalar) loads of course already transposed
(at least saving the additional unpacks), however instead just use
(un)aligned vector loads, and recalculate the eo values, which is much less
instructions (note in case of the triangle_32_3_4 case, the eo values are
not even used, making the scalar loads + unpacks for them all the more
pointless).

This drops execution time of the triangle_32_3_4 function considerably,
albeit it doesn't really make a measurable difference (for small tris we're
essentially limited by vertex throughput in any case), for triangle_32_3_16
it's essentially noise (the loop is more costly than the initial code there).

(I'm thinking about just ditching storing the eo values in the plane data,
so could switch back to using aligned planes, however right now they are
still used in the other raster functions dealing with planes with scalar
code. Also not touching the ppc code, might not be that bad there in any
case.)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-02-02 05:58:19 +01:00
Roland Scheidegger
ab30426e33 i965: Provide sse2 version for rgba8 <-> bgra8 swizzle
The existing code used ssse3, and because it isn't compiled in a separate
file compiled with that, it is usually not used (that, of course, could
be fixed...), whereas sse2 is always present at least with 64bit builds.
This should be pretty much as fast as the pshufb version, albeit those
code paths aren't really used on chips without llc in any case.

v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-02 05:58:19 +01:00
Roland Scheidegger
116e4dc995 mesa: fix typo in python scripts
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-02-02 05:58:19 +01:00
Rob Herring
f0f4259324 virgl: also build vtest for Android
Enabling swrast on Android causes a link error because vtest is missing.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:51 +10:00
Rob Herring
2d3301e4d5 virgl: fix reference counting of prime handles
The virgl reference counting of buffers is broken for prime fd buffers.
Each prime fd passed into virgl_drm_winsys_resource_create_handle creates
a new resource. The solution requires creating a separate hash table to
track flink names separately from prime handles.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:29 +10:00
Rob Herring
f87330dbce virgl: reuse screen when fd is already open
It is necessary to share the screen between mesa and gralloc to
properly ref count resources. This implements a hash lookup on
the file description to re-use an already created screen. This is
a similar implementation as freedreno and radeon.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-02 09:58:29 +10:00
Mark Janes
ac0589b213 i965: fix unsigned long overflows for i386
bit-shifts on 32 bit unsigned longs overflow in several places.  The
intention was for 64 bit integers to be used.
2016-02-01 14:52:22 -08:00
Mauro Rossi
6711592c2f nouveau/video: wrap assertion within #ifndef NDEBUG
The change is necessary to avoid the following building error in android:

external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c: In function 'nouveau_vp3_bsp_next':
external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c:269:14: error: 'bsp_bo' undeclared (first use in this function)
       assert(bsp_bo->size >= str_bsp->w0[0] + num_bytes[i]);
              ^
This matches the declaration of the variables in question.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-01 17:45:19 -05:00
Ilia Mirkin
047b917718 st/mesa: treat a write as a read for range purposes
We use this logic to detect live ranges and then do plain renaming
across the whole codebase. As such, to prevent WaW hazards, we have to
treat a write as if it were also a read.

For example, the following sequence was observed before this patch:

 13: UIF TEMP[6].xxxx :0
 14:   ADD TEMP[6].x, CONST[6].xxxx, -IN[3].yyyy
 15:   RCP TEMP[7].x, TEMP[3].xxxx
 16:   MUL TEMP[3].x, TEMP[6].xxxx, TEMP[7].xxxx
 17:   ADD TEMP[6].x, CONST[7].xxxx, -IN[3].yyyy
 18:   RCP TEMP[7].x, TEMP[3].xxxx
 19:   MUL TEMP[4].x, TEMP[6].xxxx, TEMP[7].xxxx

While after this patch it becomes:

 13: UIF TEMP[7].xxxx :0
 14:   ADD TEMP[7].x, CONST[6].xxxx, -IN[3].yyyy
 15:   RCP TEMP[8].x, TEMP[3].xxxx
 16:   MUL TEMP[4].x, TEMP[7].xxxx, TEMP[8].xxxx
 17:   ADD TEMP[7].x, CONST[7].xxxx, -IN[3].yyyy
 18:   RCP TEMP[8].x, TEMP[3].xxxx
 19:   MUL TEMP[5].x, TEMP[7].xxxx, TEMP[8].xxxx

Most importantly note that in the first example, the second RCP is done
on the result of the MUL while in the second, the second RCP should have
the same value as the first. Looking at the GLSL source, it is apparent
that both of the RCP's should have had the same source.

Looking at what's going on, the GLSL looks something like

  float tmin_8;
  float tmin_10;
  tmin_10 = tmin_8;
... lots of code ...
  tmin_8 = tmpvar_17;
... more code that never looks at tmin_8 ...

And so we end up with a last_read somewhere at the beginning, and a
first_write somewhere at the bottom. For some reason DCE doesn't remove
it, but even if that were fixed, DCE doesn't handle 100% of cases, esp
including loops.

With the last_read somewhere high up, we overwrite the previously
correct (and large) last_read with a low one, and then proceed to decide
to merge all kinds of junk onto this temp. Even if that weren't the
case, and there were just some writes after the last read, then we might
still overwrite a merged value with one of those.

As a result, we should treat a write as a last_read for the purpose of
determining the live range.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2016-02-01 17:40:18 -05:00
Jason Ekstrand
8776d3cb8e nir/spirv: Fix UBO loads of a single element of a row-major matrix 2016-02-01 14:03:05 -08:00
Jason Ekstrand
499f7c2f0b nir/spirv: Handle the LOD parameter of OpImageQuerySizeLod 2016-02-01 14:03:05 -08:00
Jason Ekstrand
b1a1623293 nir/spirv: Add support for SpvOpImage 2016-02-01 14:03:05 -08:00
Jason Ekstrand
593f88c0db nir/spirv: Fix the UBO loading case of a single row-major matric column 2016-02-01 14:03:05 -08:00
Jason Ekstrand
abc0e5c1b8 nir/spirv: Fix the UBO loading case of a single row-major matric column 2016-02-01 13:26:59 -08:00
Jason Ekstrand
2d2c6fc6bb anv/wsi/wayland: Advertise sRGB 2016-02-01 13:06:35 -08:00
Jason Ekstrand
443c578bca anv/wsi/x11: Expose SRGB all the time
After a long discussion with Eric Anholt and Owen Taylor, I learned that
X11 is basically always sRGB as that's what the scanout hardware does and X
doesn't modify anything.  Therefore, we should just always expose sRGB
formats.
2016-02-01 13:06:35 -08:00
Chad Versace
afb327a985 anv: Structify a one-member union
anv_descriptor contained a union with one member.
2016-02-01 12:18:10 -08:00
Kristian Høgsberg Kristensen
dc5fdcd6b7 anv: Advertise robustBufferAccess
The GPU does most of this for us as long as we set up tight bounds for
the buffers, which we do. Additionally, we range check dynamically
buffers in the shader. With that it's safe to turn on robustBufferAccess.
2016-02-01 12:00:05 -08:00
Chad Versace
ffbc32f8d9 anv/meta: Strip trailing whitespace 2016-02-01 10:51:01 -08:00
Chad Versace
aa5e257860 anv: Update MSAA status in README 2016-02-01 10:46:24 -08:00
Matt Turner
75c9def8ee i965/gen7+: Use NIR for lowering of pack/unpack opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
f4952421cd i965/vec4: Implement nir_op_pack_uvec2_to_uint.
And mark nir_op_pack_uvec4_to_uint unreachable, since it's only produced
by lowering pack[SU]norm4x8 which the vec4 backend does not need.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
955d052058 nir: Add lowering support for unpacking opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
9b8786eba9 nir: Add lowering support for packing opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
1dc312e295 i965/fs: Implement support for extract_word.
The vec4 backend will lower it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
68f8c5730b nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
8709dc0713 glsl: Remove 2x16 half-precision pack/unpack opcodes.
i965/fs was the only consumer, and we're now doing the lowering in NIR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
1a53a4fc7a i965/fs: Switch from GLSL IR to NIR for un/packHalf2x16 scalarizing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
9ce901058f nir: Add lowering of nir_op_unpack_half_2x16.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
e4278a847e i965: Make separate nir_options for scalar/vector stages.
We'll want to have different lowering options set for scalar/vector
stages.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
252d497d4c i965: Move brw_compiler_create() to new brw_compiler.c.
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner
140a886c41 nir: Make argument order of unop_convert match binop_convert.
Strangely the return and parameter types were reversed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Jason Ekstrand
a88b1eeb13 Update the README 2016-02-01 06:10:51 -08:00
Jason Ekstrand
ea63663a72 wsi/x11: Remove B8G8R8_UNORM
We don't actually support that format yet because ISL doesn't have an enum
for it.  We need to beef up the formats table to allow for tiled-only
formats.
2016-02-01 06:00:50 -08:00
Marta Lofstedt
77a60ab5dc mesa: enable enums for OES_geometry_shader
Enable GL_OES_geometry_shader enums for OpenGL ES 3.1.

V4: EXTRA tokens updated according to comments from Ilia Mirkin.

V5: Account for check_extra does not evaluate "or" lazy. Fix issues
with EXTRA_EXT_FB_NO_ATTACH_CS.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-02-01 09:30:50 +01:00
François Tigeot
a48afb92ff gallium: Add DragonFly support
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-31 11:56:09 +00:00
Jordan Justen
f96a6c65a3 anv/gen7: Rename gen7_batch_lr* to emit_lr*
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 15:06:03 -08:00
Jordan Justen
b207a6b5aa anv/gen7: Set BypassGatewayControl in MEDIA_VFE_STATE
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 15:06:03 -08:00
Ilia Mirkin
7f19e29305 nv50/ir: get rid of memory stores with nop values
This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which ends up being a lot of register moves that carefully preserve the
nop value. Instead don't do the stores.

total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs    : 910139 -> 895501 (-1.61%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst
    helped           0        7442        4693
      hurt           0          90        2687

Most of the helped/hurt instruction changes are by one or two ops
because can no longer do quad-wide stores in all cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-30 17:18:41 -05:00
Ilia Mirkin
3ca941d60e nv50/ir: fix false global CSE on instructions with multiple defs
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do

    a, b = tex()
    if () c = a
    else c = b

which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.

This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 17:18:41 -05:00
Ilia Mirkin
3ca2001b53 nv50,nvc0: fix buffer clearing to respect engine alignment requirements
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.

This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.

As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
2016-01-30 16:01:41 -05:00
Jordan Justen
2d8726a4b7 anv/genX_pipeline: Remove unnecessary #include files
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:30:54 -08:00
Rob Clark
f15447e7c9 freedreno/ir3: ignore clip-vertex varying
Since we emulate clip-planes, the clip-vertex is used within the VS
itself (thanks to nir_lower_clip).  So just ignore it as a VS output.
Fixes a boatload of piglit tests that were asserting on unknown
varying slot.

(Also unrelated spelling/typo fix.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:29:21 -05:00
Rob Clark
f20cf22b54 freedreno/ir3: don't ignore local vars
With glsl_to_nir we end up with local variables, instead of global, for
arrays.

Note that we'll eventually have to do something more clever, I think,
when we support multiple functions, but that will probably take some
work in a few places.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:57 -05:00
Rob Clark
8039a2a6b3 freedreno/ir3: handle tex instrs w/ const offset
Something we start to see with glsl_to_nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:27:27 -05:00
Rob Clark
f212d7dc50 freedreno/ir3: support load_front_face intrinsic
With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE.
But glsl_to_nir plus nir_lower_system_values this becomes an intrinsic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:11:54 -05:00
Jordan Justen
8e48ff3ad6 anv/gen7: Set SLM size in interface descriptor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:10:54 -08:00
Rob Clark
9e05e8cb75 freedreno: limit string marker to max packet size
Experimentally derived max size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-30 12:10:13 -05:00
Jordan Justen
ab0d8608d2 anv: Support MEDIA_VFE_STATE for gen7
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:08:34 -08:00
Jordan Justen
dd2effb0e7 anv/gen7: Subtract 1 from num_elements when setting up buffer surface state
e8f51fe4 for gen7

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:00:00 -08:00
Jordan Justen
4bb1e7937a anv/gen7: Disable fs dispatch for depth/stencil only pipelines
292031a for gen7

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:00:00 -08:00
Jordan Justen
f5b3a2fe32 anv/gen7: Add support for gl_NumWorkGroups
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:00:00 -08:00
Jordan Justen
7e46cc8603 anv/gen7/compute: Setup push constants and local ids
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 09:00:00 -08:00
Jordan Justen
b1158ced45 anv/genX: Add genX_pipeline.c for compute_pipeline_create
Adds initial compute_pipeline_create implementation for gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-30 08:58:11 -08:00
Jason Ekstrand
1a442a7923 Merge branch 'vulkan' into 'vulkan'
Vulkan WSI Wayland fixes

Two small fixes to make mailbox mode actually work again.

See merge request !4
2016-01-30 10:28:12 -05:00
Jason Ekstrand
c668dc9f75 anv/pass: Initialize has_resolve 2016-01-30 07:16:33 -08:00
Jason Ekstrand
ad813b072a anv/wsi: Set the platform field of VkIcdSurfaceBase 2016-01-30 07:05:53 -08:00
Jason Ekstrand
5acc4e2ebf anv/wsi/x11: Actually pull information from the window's visual 2016-01-30 03:51:47 -08:00
Jason Ekstrand
66e8b5cf2b anv/wsi/x11: Actually check for DRI3 2016-01-30 03:50:31 -08:00
Jason Ekstrand
44ec860cd6 anv/WSI: Support more usage bits
They're just images and we have no intention of stompping alpha channels
(at least not yet), so there's no reason why you can't sample.
2016-01-29 20:52:44 -08:00
Jason Ekstrand
337c1e0871 anv/formats: Add more compressed formats
This adds support for the DX compression formats.  Given that ETC and EAC
are working fine, these should be ok too.
2016-01-29 20:46:31 -08:00
Jason Ekstrand
c688e4db11 anv/wsi: Rework to be compatable with the loader 2016-01-29 20:39:21 -08:00
Jason Ekstrand
d4953fb340 vulkan: Import vk_icd.h 2016-01-29 20:37:45 -08:00
Jason Ekstrand
a19ceee46c anv/device: Fix version check
The bottom-end check was wrong so it was only working on <= 1.0.0.  Oops.
2016-01-29 20:36:58 -08:00
Ilia Mirkin
438d421f8b nvc0: avoid crashing when there are holes in vertex array bindings
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-29 22:10:42 -05:00
Ilia Mirkin
899b1b98a4 nvc0: enable atomic counters and ssbo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 22:10:42 -05:00
Ilia Mirkin
48cf392c0e nv50/ir: handle new TGSI MEMBAR opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
df043f0764 nvc0/ir: fix atomic compare-and-swap arguments
Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
7b9a77b905 nv50/ir: add support for indirect buffer loading
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:48 -05:00
Ilia Mirkin
2c4eeb0b5c nv50/ir: add SUQ op by reading the info from driver constbuf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin
c3083c7082 nv50/ir: add support for BUFFER accesses
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:47 -05:00
Ilia Mirkin
abe427ebd2 nvc0: handle shader buffer memory barrier
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:22:38 -05:00
Ilia Mirkin
fe01be4ad5 nvc0: add state management for shader buffers
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:07 -05:00
Ilia Mirkin
b4688c4615 nvc0: double per-shader stage driver constants area
We need to store a lot more info now with per-buffer address/size.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-29 21:06:06 -05:00
Ilia Mirkin
ae725d5746 trace: add support for set_shader_buffers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add arg_begin/arg_end around buffer array
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
fea25db925 st/mesa: enable ARB_shader_storage_buffer_object when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
6fb8fac853 st/mesa: add shader buffer barrier bit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
792bab24ac st/mesa: add support for memory barrier intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)

v1 -> v2: use TGSI_MEMBAR defines
2016-01-29 21:05:47 -05:00
Ilia Mirkin
c0e1c54a4f st/mesa: use RESQ to find buffer size
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-29 21:05:47 -05:00
Ilia Mirkin
6880036694 st/mesa: add support for SSBO binding and GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

v1 -> v2: some 80 char reformatting
2016-01-29 21:05:46 -05:00
Ilia Mirkin
9d6f9ccf6b st/mesa: add atomic counter support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:46 -05:00
Ilia Mirkin
0fddb677e6 mesa: add PROGRAM_IMMEDIATE, PROGRAM_BUFFER
This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and
adds PROGRAM_BUFFER to the list. These are used purely inside
glsl_to_tgsi conversion.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:05:35 -05:00
Ilia Mirkin
35f8488668 glsl: keep track of ssbo variable being accessed, add access params
Currently any access params (coherent/volatile/restrict) are being lost
when lowering to the ssbo load/store intrinsics. Keep track of the
variable being used, and bake its access params in as the last arg of
the load/store intrinsics.

If the variable is accessed via an instance block, then 'variable'
points to the instance block variable and not the field inside the
instance block that we are accessing. In order to check access
parameters for the field itself we need to detect this case and keep
track of the corresponding field struct so we can extract the specific
field access information from there instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add tracking of struct field
v2 -> v3: minor adjustments based on Iago's feedback
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-29 21:05:08 -05:00
Ilia Mirkin
2b089c7ffe glsl: always initialize image_* fields, copy them on interface init
Interfaces can have image properties set in case they are buffer
interfaces. Make sure not to lose this information.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-29 21:04:56 -05:00
Ilia Mirkin
2ccc42fd2c tgsi: add MEMBAR opcode to handle memoryBarrier* GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-29 21:04:36 -05:00
Kristian Høgsberg Kristensen
f28645f71c anv: Don't disable snooping for mempools
There's an intermittent flushing problem with VkEvent that we need to
root cause. For now, using the snooping feature keeps the memory pools
up to date with GPU writes and fixes the problem.
2016-01-29 17:19:51 -08:00
Kristian Høgsberg Kristensen
0c4ef36360 anv: clflush is only orderered against mfence
We can't use the more fine-grained load and store fence commands (lfence
and mfence), since clflush is only guaranteed to be ordered with respect
to mfence.
2016-01-29 14:56:41 -08:00
Kristian Høgsberg Kristensen
31d3486bd2 anv: Limit flushing to the range of mapped memory 2016-01-29 14:56:41 -08:00
Ben Widawsky
89ec36f221 anv/cmd_buffer: Emit gen9 style SF state for CHV
The state for line width changes on Cherryview to use the GEN9 bits (for extra
precision).
2016-01-29 14:12:32 -08:00
Ben Widawsky
31508bd0ce anv/gen8: Extract SF state
For upcoming patch to address difference in Cherryview.
2016-01-29 14:11:53 -08:00
Michel Dänzer
30fcf241e1 winsys/amdgpu: Process RADEON_FLAG_* independently from RADEON_DOMAIN_*
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Michel Dänzer
62f837e2ea winsys/amdgpu: Handle RADEON_FLAG_NO_CPU_ACCESS
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862

(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-29 16:06:06 +09:00
Karol Herbst
29d09f8747 nv50/ir: optimize mad/fma with third argument 0 to mul
Very modest effect, but it's clearly the right thing to do.

total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs    : 910157 -> 910131 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          55          85          85
      hurt           0          26          20          20

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:59:41 -05:00
Karol Herbst
3aa681449e nv50/ir: run DCE backwards
Reduces calls up to 50%

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:29 -05:00
Karol Herbst
978ae28ca2 nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))
Following shader-db results on GK110:

total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs    : 910187 -> 910157 (-0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)

                local        gpr       inst      bytes
    helped           0          18         821         821
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-28 15:34:22 -05:00
Chad Versace
f8a4abcd15 anv: Do resolves at end of subpass 2016-01-28 10:49:50 -08:00
Chad Versace
bef8456ede anv/meta: Remove unneeded resolve pipeline
Vulkan does not allow resolving a single-sample image. So remove that
pipeline from anv_meta_state::resolve::pipelines.
2016-01-28 10:45:11 -08:00
Chad Versace
ac5594fa71 anv/meta_resolve: Remove redundant initialization params 2016-01-28 10:14:39 -08:00
Chad Versace
142da00486 anv: Drop const on anv_framebuffer::attachments
The attachments should be const, but the driver's function signatures
are generally not const-friendly.

Drop the const because it conflicts with upcoming
anv_cmd_buffer_resolve_subpass().
2016-01-28 10:03:00 -08:00
Chad Versace
22258e279d anv: Add anv_subpass::has_resolve
Indicates that the subpass has at least one resolve attachment.
2016-01-28 10:03:00 -08:00
Chad Versace
3d863e8dad anv/meta_resolve: Save/Restore viewport and scissor 2016-01-28 10:03:00 -08:00
Chad Versace
8487569fa7 anv/meta_resolve: Begin pass outside emit_resolve()
This refactor is preparation for handling subpass resolve attachments.
2016-01-28 10:03:00 -08:00
Chad Versace
2bab3cd681 anv/image: Update usage flags for multisample images
Meta resolves multisample images by binding them as textures. Therefore
we must add VK_IMAGE_USAGE_SAMPLED_BIT.
2016-01-28 10:03:00 -08:00
Ilia Mirkin
089f605439 glsl: disallow implicit conversions in ESSL shaders
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-28 11:31:19 -05:00
Axel Davy
dda7a84986 radeonsi: Add option for SI scheduler
Add a debug option to select the LLVM SI Machine Scheduler.
R600_DEBUG=sisched

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-28 17:22:44 +01:00
Jason Ekstrand
608b411e9f anv/device: Add a better version check.
We now check that the requested version is precicely within the range of
versions that we support.
2016-01-28 08:19:40 -08:00
Samuel Iglesias Gonsálvez
f9c43dd22f glsl: double-precision values don't support interpolation
ARB_gpu_shader_fp64 spec says:

  "This extension does not support interpolation of double-precision
  values; doubles used as fragment shader inputs must be qualified as
  "flat"."

Fixes the regressions added by commit 781d278:

arb_gpu_shader_fp64-double-gettransformfeedbackvarying
arb_gpu_shader_fp64-tf-interleaved
arb_gpu_shader_fp64-tf-interleaved-aligned
arb_gpu_shader_fp64-tf-separate

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93878
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-28 11:35:03 +01:00
Jason Ekstrand
6286a74f6b anv/device: Advertise 1.0.2 2016-01-27 22:02:51 -08:00
Jason Ekstrand
ec80d6388a anv/formats: Properly set FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT
This was added last minute and the API bumped to 1.0.2.
2016-01-27 22:02:06 -08:00
Jason Ekstrand
ac75746448 vulkan.h: Update to 1.0.2 2016-01-27 21:59:00 -08:00
Jason Ekstrand
c64bc5463d anv/device: Improve the api version check to allow 1.0.X 2016-01-27 21:56:46 -08:00
Eric Anholt
3fba517bdd vc4: Throttle outstanding rendering after submission.
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more.  Prevents up to
seconds of lag with window movement in X with xcompmgr -c.  There may
be useful tuning to do in the future, but for now this gets us
usability.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Eric Anholt
2a449ce7c9 vc4: Don't record the seqno of a failed job submit.
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-01-27 20:05:37 -08:00
Francisco Jerez
4604b2871a vtn: Improve accuracy of acos approximation.
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error.  The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0, the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.
2016-01-27 19:55:21 -08:00
Jason Ekstrand
7fb35a8228 An alternate arccosine implementation 2016-01-27 19:55:21 -08:00
Jason Ekstrand
983db2b804 anv/meta_resolve: Fix a bug in the meta pipeline destroy path 2016-01-27 19:48:43 -08:00
Chad Versace
9b240a1e3d anv/skl: Fix crash in 16x multisampling
We built meta clear and resolve pipelines for only up to 8x samples.
There were no 16x pipelines.
2016-01-27 18:38:15 -08:00
Chad Versace
61d3d49820 anv: Fix comment for anv_meta_state arrays
Array element i is for 2^i samples, not log2(i) samples.
2016-01-27 18:32:05 -08:00
Ben Widawsky
0e06f76a84 i965/skl: Utilize new 5th bit for gateway messages
Modify comment as spotted by Matt, and Chris Forbes

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-27 17:12:56 -08:00
Ben Widawsky
2af3281fee anv/push constants: Use constant buffer #2
SKL has a workaround which requires either some weird programming of buffer 3,
OR, just never using buffer 0. Since we don't actually use multiple constant
buffers, it's easier to just not use 0.

Only SKL requires this workaround, but there is no harm in applying it to all
platforms. The big change here is that buffer #0 is relative to dynamic state
base normally (depending upon ISTPM), where buffer 1-3 is a GPU virtual address.
2016-01-27 17:09:36 -08:00
Chad Versace
5d4f3298ae anv/meta: Implement multisample clears 2016-01-27 17:01:59 -08:00
Chad Versace
eb6fb65fd1 anv/meta: Simplify failure handling during clear init
Remove all the fine-grained cleanup in
anv_device_init_meta_clear_state(). Instead, if anything fails during
initialization, simply call anv_device_finish_meta_clear_state() and let
it clean up the partially initialized anv_meta_state::clear.
2016-01-27 17:01:56 -08:00
Chad Versace
4085f1f230 anv/meta: Implement vkCmdResolveImage
This handles multisample color images that have a floating-point or
normalized format and have a single array layer.

This does not yet handle integer formats nor multisample array images.
2016-01-27 16:55:59 -08:00
Chad Versace
8cc1e59d61 anv/meta: Add func anv_meta_get_iview_layer()
This function is just meta_blit_get_dest_view_base_array_slice(), but
moved to the shared header anv_meta.h.

Will be needed by anv_meta_resolve.c.
2016-01-27 16:52:30 -08:00
Chad Versace
8cc6f058ce anv/gen8: Begin enabling pipeline multisample state
As far as I can tell, this patch sets all pipeline multisample state
except:
    - alpha to coverage
    - alpha to one
    - the dispatch count for per-sample dispatch
2016-01-27 16:52:27 -08:00
Chad Versace
57e4a5ea99 anv/gen8: Set multisample surface state 2016-01-27 16:48:20 -08:00
Chad Versace
9b3d660878 anv/meta: Merge anv_meta_clear.h into anv_meta.h
The header was too small.
2016-01-27 16:48:20 -08:00
Kenneth Graunke
32e4c5ae30 vtn: Make tanh implementation even stupider
The dEQP "precision" test tries to verify that the reference functions

   float sinh(float a) { return ((exp(a) - exp(-a)) / 2); }
   float cosh(float a) { return ((exp(a) + exp(-a)) / 2); }
   float tanh(float a) { return (sinh(a) / cosh(a)); }

produce the same values as the built-ins.  We simplified away the
multiplication by 0.5 in the numerator and denominator, and apparently
this causes them not to match for exactly 1 out of 13,632 values.

So, put it back in, fixing the test, but making our code generation
(and precision?) worse.
2016-01-27 15:34:50 -08:00
Jason Ekstrand
8f0ef9bbeb nir/opt_algebraic: Use a more elementary mechanism for lowering ldexp 2016-01-27 15:21:28 -08:00
Jason Ekstrand
f7d6b8ccfe gen8/state: Fix QPitch for compressed textures on Broadwell 2016-01-27 15:12:42 -08:00
Jason Ekstrand
162c662585 anv/image: Use the entire image height for compressed meta blits 2016-01-27 15:12:42 -08:00
Nanley Chery
235abfb7e6 anv/image: Enlarge the image level 0 extent
The extent previously was supposed to match the mip at a given level
under the assumption that the base address would be that of the mip
as well.

Now however, the base address only matches the offset of the
containing tile. Therefore, enlarge the extent to match that of
phys_slice0, so that we don't draw/fetch in out of bounds territory.

This solution isn't perfect because the base adress isn't always at
the first tile, therefore the assumed valid memory region by the HW
contains some number of invalid tiles on two edges.
2016-01-27 15:12:42 -08:00
Jason Ekstrand
96cf5cfee1 anv/image: Minify before dividing by block dimensions 2016-01-27 15:12:42 -08:00
Jason Ekstrand
1bea1eff38 anv/meta: Don't double-call choose_buffer_format
This fixes all the renderpass tests
2016-01-27 15:12:42 -08:00
Nanley Chery
dd22b5c914 anv/meta: Modify make_image_for_buffer()'s image
Always use a valid buffer format and convert the extent to units of
elements with respect to original image format.
2016-01-27 15:12:42 -08:00
Nanley Chery
d3c1fd53e2 anv/image: Use custom VkBufferImageCopy for iview initialization
Use a custom VkBufferImageCopy with the user-provided struct as
the base. A few fields are modified when the iview is uncompressed
and the underlying image is compressed.
2016-01-27 15:12:42 -08:00
Nanley Chery
6a579ded87 anv: Add offset parameter to anv_image_view_init()
This is the offset of the tile that contains the mip specified by
isl_surf_get_image_intratile_offset_el(). Used to draw to/from the
specified mip.
2016-01-27 15:12:42 -08:00
Nanley Chery
4a0075feeb anv/meta: Calculate mip offset for compressed texture
This value will be used in a later commit.
2016-01-27 15:12:42 -08:00
Nanley Chery
1c87cb51be anv/meta: Disambiguate slice variable value
This will simplify the usage of
isl_surf_get_image_intratile_offset_el().
2016-01-27 15:12:42 -08:00
Nanley Chery
8c0c25abde gen8_state: use iview extent to program RENDER_SURFACE_STATE
When creating an uncompressed ImageView on an compressed Image, the
SurfaceFormat is updated to match the ImageView's. The surface
dimensions must also change so that the HW sees the same size image
instead of a 4x larger one.

Fixes the following error which results from running many VulkanCTS
compressed tests in one shot:
  ResourceError (vk.queueSubmit(queue, 1, &submitInfo, *m_fence):
  VK_ERROR_OUT_OF_DEVICE_MEMORY at
  vktPipelineImageSamplingInstance.cpp:921)

Makes all compressed format tests with a height > 1 pass.
2016-01-27 15:12:42 -08:00
Nanley Chery
3f01bbe7f3 anv/image: Scale iview extent by backing image
Aligns with formula's presented in Vulkan spec concerning CopyBufferToImage.
18.4 Copying Data Between Buffers and Images

This won't conflict with valid API usage, because:
1) Users are not allowed to create an uncompressed ImageView with a
compressed Image.
see: VkSpec - 11.5 Image Views - VkImageViewCreateInfo's Valid Usage box

2) If users create a differently formatted compressed ImageView with a
compressed Image, the block dimensions will still match.
see: VkSpec - 28.3.1.5 Format Compatibility Classes - Table 28.5
2016-01-27 15:12:42 -08:00
Nanley Chery
010ab34839 anv/meta: Set depth to 0 for buffer image in CopyBufferToImage()
The buffer image is a flat 2D surface. Each surface represents an
array/depth layer, therefore, the Z-offset is 0 when blitting.
2016-01-27 15:12:42 -08:00
Nanley Chery
2fb8b859f6 anv/meta: Use the uncompressed rectangle when blitting
For an uncompressed ImageView of a compressed Image, the
dimensions and offsets are all divided by the appropriate
block dimensions.

We are not yet using an uncompressed ImageView for a
compressed Image, but will do so in a future commit.
2016-01-27 15:12:42 -08:00
Nanley Chery
c3546685ed i965: Update the surface_format table for ETC formats
Enable ETC support for BDW+. In Vulkan, an array lookup on
surface_format[] is used to determine HW support for certain
formats. In contrast, Mesa dynamically populates an array
which reports this information.
2016-01-27 15:12:42 -08:00
Nanley Chery
308ec0279b anv/image: Update usages of isl_surf_get_image_offset_sa 2016-01-27 15:12:42 -08:00
Nanley Chery
02629a16d1 isl: Add logical z offset to GEN4_2D surfaces
3D surfaces in Skylake are stored with ISL_DIM_LAYOUT_GEN4_2D. Any
delta in the logical z offset causes an equivalent delta in the
surface's array layer.
2016-01-27 15:12:42 -08:00
Chad Versace
a6ecfe1dd3 isl/tests: Add some tests for intratile offsets
Test isl_surf_get_image_intratile_offset_el() in the tests:
  test_bdw_2d_r8g8b8a8_unorm_512x512_array01_samples01_noaux_tiley0
  test_bdw_2d_r8g8b8a8_unorm_1024x1024_array06_samples01_noaux_tiley0
2016-01-27 15:12:42 -08:00
Chad Versace
7ab0d2e2c0 isl: Add func isl_get_intratile_image_offset_el() 2016-01-27 15:12:42 -08:00
Chad Versace
18a83eaa8c isl/tests: Rename t_assert_offset()
Rename it to t_assert_offset_el(), clarifying that the offset in units
of surface elements, not samples.
2016-01-27 15:12:42 -08:00
Chad Versace
fa08f95ff5 isl: Add func isl_surf_get_image_offset_el()
This replaces function isl_surf_get_image_offset_sa()
2016-01-27 15:12:42 -08:00
Chad Versace
ea44d31528 isl: Fix row pitch for compressed formats
When calculating row pitch, the row's width in samples must be divided
by the format's block width. The commit below accidentally removed the
division.

    commit eea2d4d059
    Author: Chad Versace <chad.versace@intel.com>
    Date:   Tue Jan 5 14:28:28 2016 -0800
    Subject: isl: Don't align phys_slice0_sa.width twice
2016-01-27 15:12:42 -08:00
Chad Versace
45ecfcd637 isl: Add func isl_surf_get_tile_info() 2016-01-27 15:12:42 -08:00
Kenneth Graunke
9f954310e8 vtn: Fix atan2 for non-scalars.
The if/then/else block was bogus, as it can only take a scalar
condition, and we need to select component-wise.  The GLSL IR
implementation of atan2 handles this by looping over components,
but I decided to try and do it vector-wise, and messed up.

For now, just bcsel.  It means that we do the atan1 math even if
all components hit the quick case, but it works, and presumably
at least one component will hit the expensive path anyway.
2016-01-27 15:07:42 -08:00
Kenneth Graunke
f92a35d831 vtn: Fix Modf.
We were botching this for negative numbers - floor of a negative rounds
the wrong way.  Additionally, both results are supposed to retain the
sign of the original.

To fix this, just take the abs of both values, then put the sign back.
There's probably a better way to do this, but this works for now.
2016-01-27 14:21:08 -08:00
Kenneth Graunke
4acfc9effb i965: Fix SIN/COS precision problems.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-27 13:56:54 -08:00
Ilia Mirkin
34c2c7c61e glsl: only expose double mod when doubles are available
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-27 15:15:10 -05:00
Kristian Høgsberg Kristensen
b833e7a63c anv: Put back code to grow shader scratch space
This was lost in commit a71e614d33.
2016-01-27 11:36:56 -08:00
Kenneth Graunke
38a3a535eb anv: Update the device limits.
Fixes dEQP-VK.api.info.device.properties.  I haven't tested any others.
2016-01-26 23:09:45 -08:00
Jason Ekstrand
d3607351fe gen7/cmd_buffer: SCISSOR_RECT structs are tightly packed
The pointer has to be 32-byte aligned, but the structs themselves are 2
dwords each, tightly packed.
2016-01-26 22:10:14 -08:00
Jason Ekstrand
f2f03c5b65 anv/pipeline: Set MaximumVPIndex in 3DSTATE_CLIP 2016-01-26 21:52:59 -08:00
Jason Ekstrand
dc3de6f8df anv/pipeline: Only lower input indirects if EmitNoIndirectInput is set 2016-01-26 21:45:21 -08:00
Jason Ekstrand
9ac624751e anv/formats: Use is_power_of_two instead of is_rgb to determine renderability 2016-01-26 20:29:16 -08:00
Jason Ekstrand
2af3acd061 HACK/i965/surface_formats: Mark A4B4G4R4 as being supported
The table has this marked as unsupported on all gens, but I don't really
believe that given how early it is in the table.  I've tested and it seems
to work on Broadwell.  The Bspec says that it sould be renderable on SKL+
but alpha blending is questionable.

Side note: We really need to audit the format table again.
2016-01-26 20:29:16 -08:00
Jordan Justen
c20f78dc5d anv: Support swizzled formats.
Some formats require a swizzle in order to map them to actual hardware
formats.  This allows us to turn on two new Vulkan formats.
2016-01-26 20:29:16 -08:00
Jason Ekstrand
9bc72a9213 anv/image: Do swizzle remapping in anv_image.c
TODO: At some point, we really need to make an image_view_init_info that's
a flyweight and stop stuffing everything into image_view.
2016-01-26 20:23:59 -08:00
Jason Ekstrand
7d84fe9b1f HACK: Expose support for stencil blits
If someone actually tries to use them, they won't work, but at least we
don't fail to return format properties now.
2016-01-26 17:29:49 -08:00
Kenneth Graunke
32dcfc953e vtn: Delete references to IMix opcode.
This is being removed in SPIR-V.

Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=15452
2016-01-26 17:02:35 -08:00
Ben Widawsky
c5dc6cdf26 i965/skl: Utilize new 5th bit for gateway messages
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-01-26 15:44:48 -08:00
Jason Ekstrand
a1ea45b857 genX/pipeline: Don't make vertex bindings with holes 2016-01-26 15:44:18 -08:00
Jason Ekstrand
7ef0d39cb2 anv/cmd_buffer: Put base_instance in the second component 2016-01-26 15:44:02 -08:00
Francisco Jerez
6840cc1513 anv/image: clflush surface state map in anv_fill_buffer_surface_state().
Some of its users had the required clflush on non-LLC platforms, some
didn't.  Put the clflush in anv_fill_buffer_surface_state() so we
don't forget.
2016-01-26 15:14:50 -08:00
Francisco Jerez
fc7a7b31c5 anv/image: clflush the right state map in anv_fill_image_surface_state().
It was clflushing the nonrt_surface_state structure regardless of
which state structure was actually being initialized.
2016-01-26 15:14:50 -08:00
Francisco Jerez
a50dc70e21 anv/image: Upload raw buffer surface state for untyped storage image and texel buffer access. 2016-01-26 15:14:50 -08:00
Francisco Jerez
d2ec510dda anv/image: Fix image parameter initialization. 2016-01-26 15:14:50 -08:00
Francisco Jerez
d9e0b9a06a isl/gen9: Fix slice offset calculation for 1D array images.
The X component of the offset is set to the layer index times layer
height which is obviously bogus, return the vertical offset of the
slice as Y component instead.  Fixes a few image load/store tests that
use 1D arrays on SKL when forcing it to fall back to untyped reads and
writes.
2016-01-26 15:14:50 -08:00
Jason Ekstrand
cc065e0ad7 i965/fs_surface_builder: Mask signed integers after conversion 2016-01-26 15:14:50 -08:00
Jason Ekstrand
ba393c9d81 anv/image: Actually fill out brw_image_param structs 2016-01-26 15:14:50 -08:00
Jason Ekstrand
aa9987a395 anv/image_view: Add base mip and base layer fields
These will be needed by image_load_store
2016-01-26 15:14:50 -08:00
Jason Ekstrand
42cd994177 gen7: Add support for base vertex/instance 2016-01-26 14:56:37 -08:00
Jason Ekstrand
4bf3cadb66 gen8: Add support for base vertex/instance 2016-01-26 14:56:37 -08:00
Jason Ekstrand
6ba67795db nir/spirv: Add proper support for InstanceIndex 2016-01-26 14:56:37 -08:00
Jason Ekstrand
1c3b7fe1ee nir/lower_io: Lower INSTNACE_INDEX 2016-01-26 14:56:37 -08:00
Jason Ekstrand
b2b7c93318 glsl/enums: Add an enum for Vulkan instance index 2016-01-26 14:56:37 -08:00
Jason Ekstrand
da75492879 genX/pipeline: Break emit_vertex_input out into common code
It's mostly the same and contains some non-trivial logic, so it really
should be shared.  Also, we're about to make some modifications here that
we would really like to share.
2016-01-26 14:56:37 -08:00
Karol Herbst
19ae5de981 nv50/ir: fix memory corruption when spilling and redoing RA
When RA fails, and we spill, we have to clean everything up before doing
RA again. We were forgetting to reset the hi/lo linked lists - at
least the hi list is guaranteed to still have pointers to now-deleted
RIG nodes.

Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2016-01-26 17:55:06 -05:00
Kristian Høgsberg Kristensen
fe6ccb6031 anv: Remove long unused anv_aub.h 2016-01-26 14:53:00 -08:00
Kristian Høgsberg Kristensen
074a7c7d7c anv: Dirty fragment shader descriptors in meta restore
We need to reemit render targets, so dirtying VK_SHADER_STAGE_VERTEX_BIT
doesn't help us much.
2016-01-26 14:44:02 -08:00
Kristian Høgsberg Kristensen
725d969753 anv: Reemit STATE_BASE_ADDRESS after second level cmd buffers
Otherwise the primary batch will continue using the state base addresses
set by the secondary.  Fixes remaining renderpass tests.
2016-01-26 14:44:02 -08:00
Timothy Arceri
d580a979a4 glsl: remove old FINISHME
This should have been removed long ago.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-27 09:15:21 +11:00
Chad Versace
df5f6d824b anv/meta: Fix sample mask in clear pipelines
Once we begin emitting the correct sample mask,
genX_3DSTATE_SAMPLE_MASK_pack will hit an assertion if the mask contains
too many bits.
2016-01-26 11:04:44 -08:00
Marek Olšák
98cebc913c configure.ac: don't require EGL/DRM and GBM if OpenGL is disabled
This allows building VDPAU/OMX/VA drivers without OpenGL and its
dependencies.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-01-26 19:07:03 +01:00
Jan Vesely
efc4142acd r600,compute: Plug few memory leaks
v2: drop inline keyword
    drop radeon_llvm_dispose_kernel_module wrapper

v3: move definitions to .c file
    use in radeonsi

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 19:04:38 +01:00
Jan Vesely
e1dcd333e4 r600: Typos and whitespace fixes
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-26 19:01:22 +01:00
Marek Olšák
2924ca131f radeonsi: fix clover crash
caused by ce1e7784d0

Trivial.
2016-01-26 18:53:41 +01:00
Marek Olšák
af57507e4f radeonsi: fix shader precompilation for shader-db
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)

v2: also set the alpha func (trivial)
2016-01-26 18:49:50 +01:00
Ilia Mirkin
38c63abf09 glsl: add GL_OES_geometry_point_size and conditionalize gl_PointSize
For now this will be enabled in tandem with GL_OES_geometry_shader.
Should a driver come along that wants to separate them out, another
enable can be added.

Also adds the missed GL_OES_geometry_shader define in glcpp.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-26 12:36:15 -05:00
Emil Velikov
eb63640c1d glsl: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:33 +00:00
Emil Velikov
a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00
Emil Velikov
f694da80c7 compiler: move the glsl_types C wrapper alongside their C++ brethren
At a later stage we might want to split out the NIR specific [XXX:
which one was it], as to make things move obvious and rename the files
appropriately. This patch aims to split it out of nir.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:27 +00:00
Emil Velikov
24f984f64a nir: move glsl_types.{cpp,h} to compiler
Allows us to remove the SCons workaround :-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:24 +00:00
Emil Velikov
1a882fd2ee nir: move shader_enums.[ch] to compiler
This way one can reuse it in glsl, nir or other infrastructure without
pulling nir as dependency.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:20 +00:00
Emil Velikov
2f86383091 compiler: introduce a libcompiler static library
Currently it's an empty library, although it'll be used to store common
code between GLSL and NIR that is compiler specific (rather than generic
as the one in src/util).

XXX: strictly speaking we could add a python/mako parser to generate the
relevant files instead including builtin_type_macros.h in such a manner.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:07:27 +00:00
Nicolai Hähnle
41875ac4ed gallium/ddebug: add 'verbose' option
This currently just writes out the name of dump files, which can be useful
to easily correlate those files with other log outputs (driver debug output,
apitrace calls, etc.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:58:55 -05:00
Nicolai Hähnle
f4c8fa4e49 gallium/ddebug: make 'noflush' also affect 'always' mode
This changes the default behavior of 'always' mode to be consistent with
hang detection mode.

I have used this to more easily compare dumped command streams using diff.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:58:49 -05:00
Nicolai Hähnle
8894b5f008 radeonsi: use llvm.amdgcn.s.barrier instead of llvm.AMDGPU.barrier.local
The new name for the intrinsic was introduced in LLVM r258558.

v2: use ternary operator instead of preprocessor

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-26 09:57:06 -05:00
Jason Ekstrand
725fb3623f i965/compiler: Set nir_options.vertex_id_zero_based 2016-01-25 16:10:28 -08:00
Jason Ekstrand
6b6a8a99f8 HACK/i965: Default to scalar GS on BDW+ 2016-01-25 15:52:53 -08:00
Ben Widawsky
a443b5b732 i965/bxt: Fix conservative wm thread counts.
When setting the conservative thread counts, I halved everything. That isn't
correct for the wm, which has nothing to do with actual thread counts. I suck.

BXT only has 1 slice, and there is some ambiguity about subslices, so just
reserve the max possible for now. It looks like this might fix:
piglit.spec.glsl-1_50.execution.variable-indexing.gs-output-array-vec4-index-wr.bxtm64.
I kind of question why that is, but it is what Jenkins says.

Mark is current running some of the other blacklisted tests on this patch. (it
effects anything requiring scratch space).

Cc: mesa-stable <mesa-stable@lists.freedesktop.org>
Cc: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-01-25 15:51:17 -08:00
Jason Ekstrand
e462d4d815 Merge remote-tracking branch 'mattst88/nir-lower-pack-unpack' into vulkan 2016-01-25 15:50:31 -08:00
Jason Ekstrand
6bbf3814dc gen7/state: Apply min/mag filters individually for samplers
This fixes tests which apply different min and mag filters, and depend on
the min filter to be correct.
2016-01-25 15:33:08 -08:00
Ben Widawsky
9c69f4632d gen8/state: Apply min/mag filters individually for samplers
This fixes tests which apply different min and mag filters, and depend on the
min filter to be correct.
2016-01-25 15:29:18 -08:00
Jason Ekstrand
2434ceabf4 i965/fs: Feel free to spill partial reads/writes
Now that we properly handle write-masking, this should be safe.
2016-01-25 15:23:10 -08:00
Jason Ekstrand
9c0109a1f6 i965/fs: Properly write-mask spills
For unspills (scratch reads), we can just set WE_all all the time because
we always unspill into a new GRF.  For spills, we have two options: If the
instruction has a 32-bit-per-channel destination and "normal" regioning,
then we just do a regular write and it will interleave channels from
different control-flow paths properly.  If, on the other hand, the the
regioning is non-normal, then we have to unspill, run the instruction, and
spill afterwards.  In this second case, we need to do the spill with
we_ALL.
2016-01-25 15:23:10 -08:00
Kristian Høgsberg Kristensen
8e07f7942e anv: Remove a few finished finishme 2016-01-25 15:16:13 -08:00
Kristian Høgsberg Kristensen
76c096f0e7 anv: Remove stale assert
This goes back to when we didn't have the subpass number in the command
buffer begin info.
2016-01-25 15:15:59 -08:00
Matt Turner
874ede4983 i965/gen7+: Use NIR for lowering of pack/unpack opcodes. 2016-01-25 14:48:34 -08:00
Matt Turner
5deba3f00a i965/vec4: Implement nir_op_pack_uvec2_to_uint.
And mark nir_op_pack_uvec4_to_uint unreachable, since it's only produced
by lowering pack[SU]norm4x8 which the vec4 backend does not need.
2016-01-25 14:24:07 -08:00
Matt Turner
8bb22dc351 nir: Add lowering support for unpacking opcodes. 2016-01-25 14:24:07 -08:00
Matt Turner
d7781038f5 nir: Add lowering support for packing opcodes. 2016-01-25 14:24:07 -08:00
Matt Turner
6c1b3bc950 i965/fs: Implement support for extract_word.
The vec4 backend will lower it.
2016-01-25 14:24:07 -08:00
Matt Turner
26f0444ead nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.
2016-01-25 14:24:07 -08:00
Nanley Chery
2c94f659e8 anv/meta: Fix CopyBuffer when size matches HW limit
Perform a copy when the copy_size matches the HW limit (max_copy_size).
Otherwise the current behavior is that we fail the following assertion:

      assert(height < max_surface_dim);

because the values are equal.
2016-01-25 12:26:39 -08:00
Kristian Høgsberg Kristensen
c21de2bf04 anv: Don't use uninitialized barycentric_interp_modes
If we don't have a fragment shader, wm_prog_data in undefined.
2016-01-25 11:34:32 -08:00
Kristian Høgsberg Kristensen
292031a1a5 anv: Disable fs dispatch for depth/stencil only pipelines
Fixes most renderpass bugs.
2016-01-25 11:26:19 -08:00
Matt Turner
26b2cc6f3a glsl: Remove 2x16 half-precision pack/unpack opcodes.
i965/fs was the only consumer, and we're now doing the lowering in NIR.
2016-01-25 11:12:36 -08:00
Matt Turner
24d385f85c i965/fs: Switch from GLSL IR to NIR for un/packHalf2x16 lowering. 2016-01-25 11:11:56 -08:00
Matt Turner
5eb1145434 nir: Add lowering of nir_op_unpack_half_2x16. 2016-01-25 11:11:56 -08:00
Matt Turner
84166aed92 i965: Make separate nir_options for scalar/vector stages.
We'll want to have different lowering options set for scalar/vector
stages.
2016-01-25 11:11:26 -08:00
Matt Turner
b6bb3b9bcd i965: Move brw_compiler_create() to new brw_compiler.c.
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.
2016-01-25 11:11:25 -08:00
Matt Turner
b126039784 nir: Make argument order of unop_convert match binop_convert.
Strangely the return and parameter types were reversed.
2016-01-25 11:11:08 -08:00
Ian Romanick
2542871387 meta: Use internal functions to set texture parameters
_mesa_texture_parameteriv is used because (the more obvious)
_mesa_texture_parameteri just stuffs the parameter in an array and calls
_mesa_texture_parameteriv.  This just cuts out the middleman.

As a side bonus we no longer need check that ARB_stencil_texturing is
supported.  The test doesn't allow non-supporting implementations to
avoid any work, and it's redundant with the value-changed test.

Fix bug #93717 because the state restore commands at the bottom of
_mesa_meta_GenerateMipmap no longer depend on the bound state.

Fixes  piglit   arb_direct_state_access-generatetexturemipmap  with  the
changes  recently sent  to the  piglit mailing  list.  See  the bugzilla
entry for more info.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93717
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-25 10:43:47 -08:00
Ian Romanick
18b0ba340b meta/blit: Restore GL_DEPTH_STENCIL_TEXTURE_MODE state for GL_TEXTURE_RECTANGLE
Commit c246828c added the code to save and restore the stencil
texturing mode.  The restore, however, was erroneously inside the
'target != GL_TEXTURE_RECTANGLE' block.

Fixes piglit test 'arb_stencil_texturing-blit_corrupts_state
GL_TEXTURE_RECTANGLE'.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-25 10:43:47 -08:00
Ian Romanick
f7800fadff meta/copy_image: Fix typo in comment
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-25 10:43:47 -08:00
Ian Romanick
bae8a4f05b mesa: Don't include meta.h
Commit 055093e removed the call to _mesa_meta_in_progress, and meta.h
has not been necessary in src/mesa/main/enable.c since.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-25 10:43:47 -08:00
Nicolai Hähnle
1067e6eb55 radeonsi: add DCC buffer for sampler views on new CS
This fixes a VM fault and possible lockup in high memory pressure situations.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-25 10:16:12 -05:00
Nicolai Hähnle
0bacbf5b7e radeonsi: emit rw_buffers for tes_shader only if tes_shader present
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:08 -05:00
Nicolai Hähnle
2385b253c6 radeonsi: do not set the shader->key for gs copy shaders
The key for a geometry shader would be interpreted as the key for a vertex
shader further down the line, which really doesn't make sense.

This does not affect the contents of shader->key because geometry shaders
don't have any key entries anyway.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:05 -05:00
Nicolai Hähnle
46c0ba60c6 radeonsi: si_llvm_emit_vs_epilogue is never used with gs copy shaders
Hence remove the misleading branch on is_gs_copy_shader.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:02 -05:00
Nicolai Hähnle
c55b9499d5 radeonsi: move is_gs_copy_shader to si_shader_context
It is only used during shader creation now, so no need to keep it around
afterwards.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:16:00 -05:00
Nicolai Hähnle
a7754ffd31 radeonsi: replace use of is_gs_copy_shader in si_shader_vs
We now have an explicit parameter that contains the same information, and
this will allow us to get rid of is_gs_copy_shader in the si_shader struct.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:55 -05:00
Nicolai Hähnle
004fcd4230 radeonsi: ensure that VGT_GS_MODE is sent when necessary
Specifically, when the API switches from using a GS to not using a GS and then
back to using the same GS again, we do not have to re-send all the GS state,
but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part
of the VS state.

This fixes a rendering bug in Dolphin, but surely other applications are
affected as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:31 -05:00
Nicolai Hähnle
9f89bd69df radeonsi: extract the VGT_GS_MODE calculation into its own function
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-25 10:15:08 -05:00
Samuel Pitoiset
429371f22a trace: fix a segfault when tracing indirect draw calls
Like other resources, the indirect draw buffer must be unwrapped.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-24 19:53:53 +01:00
Marek Olšák
24ea81a491 Revert "mesa: enable enums for OES_geometry_shader"
This reverts commit 67e3098703.

It breaks a bunch of geometry shader tests, such as "spec@!opengl 3.2@minmax"
and others depending on the glGet queries.
2016-01-24 15:47:39 +01:00
Marek Olšák
e707b9d8ba winsys/amdgpu: optionally use buffer lists with all allocated buffers
Set RADEON_ALL_BOS=1 to use it.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-23 17:01:54 +01:00
Jason Ekstrand
a804d82ef6 anv/cmd_buffer: Zero out binding tables and samplers in state_reset
This fixes a use of an undefined value if the client uses push constants in
a stage without ever setting any descriptors on GEN8-9.
2016-01-22 22:57:05 -08:00
Jason Ekstrand
9e0bc29f80 nir/opcodes: Properly flush denormals in fquantize2f16 2016-01-22 22:18:31 -08:00
Jason Ekstrand
89672d81f3 i965/nir: Properly flush denormals in nir_op_fquantize2f16 2016-01-22 22:18:31 -08:00
Kenneth Graunke
ae9f73ea40 glsl: Conditionalize atan2 math.
In the old hand-writen implementation of atan2, the calculation of
atan(y/x) was performed conditionally in the "then" block of the
outermost if statement.  I believe I accidentally lifted this out
into unconditional code when converting to IR builder.

For reference, the original hand-written IR is visible in commit
722eff674b.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Erik Faye-Lund <kusmabite@gmail.com>
2016-01-22 21:03:00 -08:00
Jason Ekstrand
2bfb9f29b8 anv/format: Add a helpful comment about format names 2016-01-22 19:14:41 -08:00
Jason Ekstrand
259e1bdf79 anv/formats: Add support for 3 more formats 2016-01-22 19:03:27 -08:00
Jason Ekstrand
0b6c1275d0 anv/pipeline: Add a default L3$ setup 2016-01-22 19:02:55 -08:00
Rob Herring
7ee8954753 virgl: enable building on Android
This is just a copy-n-paste and rename of vc4 Android makefiles.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-23 12:35:29 +10:00
Rob Herring
657dc4f533 virtio_gpu: Add PCI ID to driver map
Add the virtio-gpu PCI ID so the driver probing works.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-23 12:35:24 +10:00
Chad Versace
99a4885328 anv/formats: Rename ambiguous func parameter
vkGetPhysicalDeviceImageFormatProperties has multiple 'flags'
parameters.
2016-01-22 17:51:24 -08:00
Chad Versace
149d5ba64d anv/formats: Advertise multisample formats
Teach vkGetPhysicalDeviceImageFormatProperties() to advertise
multisampled formats.
2016-01-22 17:50:15 -08:00
Chad Versace
d96d78c3b6 anv/image: Drop assertion that samples == 1 2016-01-22 17:19:57 -08:00
Chad Versace
fda074b23f isl: Fix gen8_choose_msaa_layout()
Gen8 requires any Y tiling, not any *standard* Y tiling.
2016-01-22 17:19:57 -08:00
Chad Versace
2fa1f745ea isl: Add func isl_tiling_is_any_y() 2016-01-22 17:19:57 -08:00
Chad Versace
fa5f45e8aa anv/meta: Assert correct sample counts for blit funcs
Add assertions to:
    anv_CmdBlitImage
    anv_CmdCopyImage
    anv_CmdCopyImageToBuffer
    anv_CmdCopyBufferToImage
2016-01-22 17:19:57 -08:00
Chad Versace
dfcb4ee6df anv: Add anv_image::samples
It's set but not yet used.
2016-01-22 17:19:57 -08:00
Chad Versace
1c5d7b38e2 anv: Use isl_device_get_sample_counts()
Use it in vkGetPhysicalDeviceProperties.
2016-01-22 17:19:57 -08:00
Chad Versace
14b753f666 isl: Add func isl_device_get_sample_counts() 2016-01-22 17:19:57 -08:00
Nanley Chery
d4de918ad0 gen8/state: Remove SKL special-casing for MinimumArrayElement
MinimumArrayElement carries the same meaning for BDW and SKL.
Suggested by Jason.

No regressions in dEQP-VK.pipeline.image.view_type.cube_array.*
Fixes a number of cube tests, including cube_array_base_slice
and cube_base_slice tests.
2016-01-22 17:10:14 -08:00
Chad Versace
6a03c69adb anv/state: Dedupe code for lowering surface format
Add helper anv_surface_format().
2016-01-22 16:49:17 -08:00
Francisco Jerez
11d5c1905c anv/meta: Set sampler type and instruction arrayness consistently in blit shader. 2016-01-22 16:43:18 -08:00
Francisco Jerez
bf151b8892 anv/meta: Fix meta blit fragment shader for 1D arrays. 2016-01-22 16:43:15 -08:00
Jason Ekstrand
53b83899e0 genX/state: Set CubeSurfaceControlMode to OVERRIDE
This makes it act like the address mode is set to TEXCOORDMODE_CUBE
whenever this sampler is combined with a cube surface.  This *should* be
what we need for Vulkan.  Interestingly, the PRM contains a programming
note for this field that says simply, "This field must be set to
CUBECTRLMODE_PROGRAMMED".  However, emprical evidence suggests that it does
what the PRM says it does and OVERRIDE is just fine.
2016-01-22 16:34:13 -08:00
Jason Ekstrand
35879fe829 gen8/state: Divide depth by 6 for cube maps for GEN8
For Broadwell cube maps, MinimumArrayElement is in terms of 2d slices (a
multiple of 6) but Depth is in terms of whole cubes.
2016-01-22 16:14:54 -08:00
Nanley Chery
3cd8c0bb04 gen8_state: Enable all cube faces
These fields are ignored for non-cube surfaces. For cube surfaces
these fields should be enabled when using TEXCOORDMODE_CLAMP and
TEXCOORDMODE_CUBE.

TODO: Determine if these are the only two modes used in Vulkan.
2016-01-22 16:12:52 -08:00
Kenneth Graunke
b3340cd32a i965: Implement a drirc workaround for broken dual color blending.
OpenGL's dual color blending feature was specified so that an
implementation could support both multiple render targets (MRT) and
dual source blending.  Fragment shader outputs specify both "location"
(the render target number) and "index" (either color 0 or 1).

I believe DirectX only has the notion of "location" - if using dual
color blending, location 0 or 1 will specify the operands.  If not,
then location means the render target index.  The two features can't
be used together.

As such, some applications mistakenly try to use <loc = 0, index = 0>
and <loc = 1, index = 0> in a shader used for dual color blending with
a single render target, rather than the correct <loc = 0, index = 0>
and <loc = 0, index = 1>.

In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug.
Unigine is aware of the problem, and quickly developed a fix, but has
not bothered to change the download link on their website to a working
copy in over a year.  People were still using the broken version and
complaining.  We tried working around this by disabling dual color
blending, but that apparently hurts performance, and people were once
again unhappy.

On i965, dual source blending is achieved by using different framebuffer
write messages than normal rendering.  So, we have to compile different
code for the two cases.  We're not being pedantic: we actually have to
know in order to function.

Normally, dual source blending is detectable in the shader: if a shader
has an output with index = 1, then it's meant for blending, not MRT.
With the broken inputs, they're indistinguishable, so we can only tell
by looking at the current GL state.

This patch implements a new drirc workaround:

   export dual_color_blend_by_location=true

which makes the i965 driver detect when OpenGL state is configured for
dual source blending, and recompile the fragment shader to use the right
messages.  In that case, we allow either location = 1 or index = 1 to
specify the second source for the blending equations.

It also re-enables GL_ARB_blend_func_extended for Unigine.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-22 14:14:26 -08:00
Marek Olšák
cd9c07e7cd radeonsi: add ETC1 support for Stoney
It's a subset of ETC2. Tested.

For more information, see page 42 and onward:
http://www.graphicshardware.org/previous/www_2007/presentations/strom-etc2-gh07.pdf

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-22 22:05:42 +01:00
Marek Olšák
b3bac55621 radeonsi: change LLVM intrinsics for BREV, CLAMP, EX2
Requested by Matt Arsenault.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Marek Olšák
ce1e7784d0 radeonsi: add max waves / SIMD to shader stats (v2)
v2: account for LDS usage in PS
    the limit is per SIMD, not per CU

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Marek Olšák
5944f3d2fc radeonsi: enable late VS allocation (v3)
v2: take the number of CUs into account
v3: change in LS allocation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Marek Olšák
97648229e4 radeonsi: allow using all CUs for tessellation and on-chip GS (v2)
v2: After more discussion with hw teams, the kernel already contains the
    optimal settings allowing us to use all CUs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 22:05:42 +01:00
Jeremy Huddleston Sequoia
7c99557f53 Revert "mesa: Deal with size differences between GLuint and GLhandleARB in GetAttachedObjectsARB"
This reverts commit 739ac3d39d.

This will be done a differnet way.
See http://lists.freedesktop.org/archives/mesa-dev/2016-January/105642.html
2016-01-22 13:02:01 -08:00
Jason Ekstrand
107a109d1c isl/format_layout: R11G11B10_FLOAT is unsigned 2016-01-22 11:57:49 -08:00
Jason Ekstrand
e5558ffa64 anv/image: Move common code to anv_image.c 2016-01-22 11:57:01 -08:00
Jason Ekstrand
84612f4014 anv/state: Refactor surface state setup into a "fill" function 2016-01-22 11:40:56 -08:00
Francisco Jerez
448285ebf2 anv/state: Add missing clflushes for storage image surface state. 2016-01-22 11:12:09 -08:00
Francisco Jerez
d533c3796d anv/state: Factor out surface state calculation from genX_image_view_init.
Some fields of the surface state template were dependent on the
surface type, which is dependent on the usage of the image view, which
wasn't known until the bottom of the function after the template had
been constructed.  This caused failures in all image load/store CTS
tests using cubemaps.  Refactor the surface state calculation into a
function that is called once for each required usage.
2016-01-22 11:12:09 -08:00
Jason Ekstrand
16780632c2 i965/nir: Temporariliy disable mul+add fusion
We don't want to do this in the long-run but it's needed for passing the
NoContraction tests at the moment.  Eventually, we want to plumb this
through NIR properly.
2016-01-22 11:10:54 -08:00
Ben Widawsky
315cda6715 i965/fs: Remove unused count from vs urb setup
This was originally removed here:
commit 031d350132
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Tue Aug 25 16:59:12 2015 -0700

    i965/vs: Unify URB entry size/read length calculations between backends.

Then added back:
commit bd198b9f0a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Aug 14 16:01:33 2015 -0700

    i965/vs: Simplify fs_visitor's ATTR file.

Note that the authorship dates are out of order, but the above reflects the
order of the commit dates.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-22 10:38:41 -08:00
Chad Versace
d9abbbe0d8 isl: Fix indentation of isl_format_layout comment 2016-01-22 09:48:11 -08:00
Chad Versace
65f3c420c3 isl/tests: Give tests less cryptic names 2016-01-22 09:46:48 -08:00
Chad Versace
f9d4d09549 isl: Fix isl_surf_get_image_offset_sa for gen4_3d layout
Bug found by unit test
test_bdw_3d_r8g8b8a8_unorm_256x256x256_levels09_tiley0.
2016-01-22 09:45:22 -08:00
Chad Versace
891ed5ca8c isl/tests: Add test for bdw 3d surface
test_bdw_3d_r8g8b8a8_unorm_256x256x256_levels09_tiley0

Currently fails.
2016-01-22 09:45:21 -08:00
Nicolai Hähnle
d76bd85c35 Revert "radeonsi: fix discard-only fragment shaders (v2)"
This reverts commit 843855bbf0.

It became redundant due to Marek's earlier pushed 8667a1ae which achieves
the same thing.
2016-01-22 12:40:26 -05:00
Nicolai Hähnle
843855bbf0 radeonsi: fix discard-only fragment shaders (v2)
When a fragment shader is used that has no outputs but does conditional
discard (KILL_IF), all fragments are killed without this patch.

By comparing various register settings, my conclusion is that the exec mask
is either not properly forwarded to the DB by NULL exports or ends up being
unused, at least when there is _only_ a NULL export (the ISA documentation
claims that NULL exports can be used to override a previously exported exec
mask).

Of the various approaches I have tried to work around the problem, this one
seems to be the least invasive one.

v2: take discard by alpha test into account as well

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-22 11:59:50 -05:00
Marta Lofstedt
3e640c256a mesa: Update _mesa_has_geometry_shaders
Updates the _mesa_has_geometry_shaders function to also look
for OpenGL ES 3.1 contexts that has OES_geometry_shader enabled.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-22 17:13:55 +01:00
Marta Lofstedt
ae4e4ba06d glsl: add support for GL_OES_geometry_shader
This adds glsl support of GL_OES_geometry_shader for
OpenGL ES 3.1.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-22 17:13:55 +01:00
Marta Lofstedt
67e3098703 mesa: enable enums for OES_geometry_shader
Enable GL_OES_geometry_shader enums for OpenGL ES 3.1.

V4: EXTRA tokens updated according to comments from Ilia Mirkin.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-22 17:13:55 +01:00
Marta Lofstedt
af5a14d1e0 glapi: add GL_OES_geometry_shader extension
Add xml definitions for the GL_OES_geometry_shader extension
and expose the extension for OpenGL ES 3.1.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-22 17:13:55 +01:00
Emil Velikov
bb58b59998 docs: correct 11.1.1 release year
Seems like I wasn't ready to let 2015 go :-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-22 15:50:48 +00:00
Emil Velikov
45c5000ffc docs: add news item and link release notes for 11.0.9
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-22 15:49:47 +00:00
Emil Velikov
87b0a52de8 docs: add sha256 checksums for 11.0.9
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-22 15:47:12 +00:00
Emil Velikov
51e8152186 docs: add release notes for 11.0.9
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-22 15:47:11 +00:00
Chad Versace
fbc87ce4be isl/tests: Remove copy-paste assertion 2016-01-22 07:18:04 -08:00
Chad Versace
63d999b762 isl/tests: Fix build
isl_device_init() acquired a new param for bit6 swizzling.
2016-01-22 07:17:57 -08:00
Marek Olšák
a9d5842ec0 radeonsi: add ETC2 support for Stoney
Tested and working.
2016-01-22 15:36:14 +01:00
Marek Olšák
6f428328d3 radeonsi: implement SAMPLEPOS system value without a constant buffer load
We always get per-sample input position.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
2b66bc87d4 winsys/amdgpu: compute num_good_compute_units correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
0d8e4f958f gallium/radeon: rename max_compute_units -> num_good_compute_units
radeon sets this correctly, but not amdgpu

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
99dfeb01bd radeonsi: disable SPI color outputs the shader doesn't write
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
f6360de8c0 radeonsi: use all SPI color formats
because not using SPI_SHADER_32_ABGR doubles fill rate.

We should also get optimal performance if alpha isn't needed or blending
isn't enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
933e3c4145 radeonsi: use 32_AR for alpha-to-coverage without a color buffer
This avoids the fp16 packing instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
f1f0158837 radeonsi: add shader conversion code for all SPI color formats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
e28b8530b9 radeonsi: set CB_SHADER_MASK according to SPI color formats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
8667a1aea2 radeonsi: use SPI_SHADER_COL_FORMAT fields instead of export_16bpc
This does change the behavior slightly:
  If a shader writes COLOR[i] and that color buffer isn't bound,
  the shader will export MRT_NULL instead and discard the IR tree that
  calculates the output. The only exception is alpha-to-coverage, which
  requires an alpha export.

v2: - update a comment about 16BPC
    - account for MRTZ when when fixing alpha-test/kill

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Marek Olšák
0446ea9d08 radeonsi: don't enable blending if colormask == 0
most likely useless, but doesn't hurt

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-22 15:02:40 +01:00
Ilia Mirkin
dac2964f3e glsl: always compute proper varying type, irrespective of varying packing
Normally there's a producer and consumer, and the producer var gets
picked. In both the vertex->gs and tes->gs cases, that's the un-arrayed
version.

In the SSO case, however, there is no producer. So we picked the arrayed
GS variable, and as a result, used more slots than we should. More
critically, these slots would also no longer line up with the producer's
calculation. To fix this, we need to fix up the type of the variable
based on stage no matter what.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93650
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-22 08:48:27 -05:00
Emil Velikov
54702c2fa1 egl/dri2: expose srgb configs when KHR_gl_colorspace is available
Otherwise the user has no way of using it, and we'll try to access the
linear one.

v2:
 - Bail out when KHR_gl_colorspace is missing and srgb is set (Marek)

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Fixes: c2c2e9ab604(egl: implement EGL_KHR_gl_colorspace (v2))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91596
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
2016-01-22 11:55:54 +00:00
Emil Velikov
f29a772a7e targets/dri: android: use WHOLE static libraries
By using whole static libraries the android buildsystem provides
whole-archive (alike) solution. This means that we don't need to worry
about the order of the static libraries and any reverse, recursive or
circular dependencies that they have between one another.

Without this the linker will discard any unused hunks of one library
and we'll end up with unresolved symbols as those are required by
another static library. This issue has become more prominent with the
introduction of pipe-loader.

Whole static libraries has been used in i915/i965 for a very long
time, so we might do the same.

v2:
 - Better commit message (Ilia)
 - Keep external dependencies as [normal] static libs (Mauro)

Cc: mesa-stable@lists.freedesktop.org
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-22 11:55:34 +00:00
Emil Velikov
72fda2b710 i915: correctly parse/set the context flags
With an earlier commit we've spit the flags parsing to a separate
function, but forgot to update all the dri modules to use it.

Noticed when we've enabled KHR_debug for every dri module - fdo#93048

Fixes: 38366c0c6e "dri_util: Don't assume __DRIcontext->driverPrivate
is a gl_context"
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-01-22 11:54:01 +00:00
Iago Toral Quiroga
ab0c7c0829 glsl/lower_instructions: fix regression in dldexp_to_arith
The commit b4e198f47f changed the offset and bits parameters of the
bitfield insert operation from scalars to vectors. However, the lowering
of ldexp on doubles operates on each vector component and emits scalar
code (since it has to deal with the lower and upper 32-bit chunks of
each double component), so it needs its bits and offset parameters to
be scalars.

Fixes fp64 regression (crash) in:
spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-ldexp-dvec4.shader_test

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-22 08:14:11 +01:00
Francisco Jerez
2e54381622 anv/batch_chain: Fix patching up of block pool relocations on Gen8+.
Relocations are 64 bits on Gen8+.  Most CTS tests that send
non-trivial work to the GPU would fail when run from a single deqp-vk
invocation because they were effectively relying on reloc presumed
offsets to be wrong so the kernel would come and apply relocations
correctly.
2016-01-21 16:30:44 -08:00
Jason Ekstrand
13aaf90048 nir/spirv: Ignore cull distance 2016-01-21 16:20:39 -08:00
Jason Ekstrand
13858a1c1a nir/lower_system_values: Use the correct invication id for CS 2016-01-21 16:20:39 -08:00
Jason Ekstrand
d8c0e0805b nir/spirv: Properly assign locations to split structures 2016-01-21 16:20:39 -08:00
Jason Ekstrand
514507825c nir/spirv: Improve handling of variable loads and copies
Before we were asuming that a deref would either be something in a block or
something that we could pass off to NIR directly.  However, it is possible
that someone would choose to load/store/copy a split structure all in one
go.  We need to be able to handle that.
2016-01-21 16:20:39 -08:00
Jason Ekstrand
7e5e64c8a9 nir/spirv: Make vectors a proper array time with an array_element
This makes dealing with single-component derefs easier
2016-01-21 16:20:39 -08:00
Jason Ekstrand
a8af0f536c nir/spirv: Rework access chains a bit to allow for literals
This makes them much easier to construct because you can also just specify
a literal number and it doesn't have to be a valid SPIR-V id.
2016-01-21 16:20:39 -08:00
Jason Ekstrand
5d9a6fd526 vtn/variables: Compact local loads/stores into one function
This is similar to what we did for block loads/stores.
2016-01-21 16:20:39 -08:00
Jason Ekstrand
b298743d7b nir/spirv: Add an actual variable struct to spirv_to_nir
This allows us, among other things, to do structure splitting on-the-fly to
more correctly handle input/output structs.
2016-01-21 16:20:39 -08:00
Jason Ekstrand
2892693d56 nir/spirv: Split variable handling out into its own file
It's 1300 lines all by itself and it will only grow.
2016-01-21 16:20:39 -08:00
Jason Ekstrand
1112bf633f nir/spirv: Rework access chains
Previously, we were creating nir_deref's immediately.  Now, instead, we
have an intermediate vtn_access_chain structure.  While a little more
awkward initially, this will allow us to more easily do structure splitting
on-the-fly.
2016-01-21 16:18:37 -08:00
Eduardo Lima Mitev
263f829d2e i965/vec4/tcs: Return NULL instead of false in brw_compile_tcs()
brw_compile_tcs() is expected to return 'const unsigned *', so the compiler
complains.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-21 16:16:26 -08:00
Kenneth Graunke
824f776355 nir/spirv: Implement ModfStruct opcode. 2016-01-21 14:57:47 -08:00
Kenneth Graunke
f89d5cb807 nir/spirv: Delete stray fmod remnants.
Jason left these stray code fragments in
22804de110.
2016-01-21 14:54:20 -08:00
cstout
13b87e02b9 freedreno/a4xx: Add support for adreno 430
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-21 17:20:11 -05:00
Christian Gmeiner
66672e791c freedreno: make opc array static const
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-21 17:20:11 -05:00
Rob Clark
bc1a37378c freedreno: implement emit_string_marker
Writes string to cmdstream in payload of a no-op packet.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-21 17:20:11 -05:00
Rob Clark
d6408372eb gallium: add GREMEDY_string_marker
Since the GREMEDY extensions are normally only exposed by the gremedy
debugger (and could possibly trigger debug paths in the app), we don't
expose the extension by default, but instead only with
ST_DEBUG=gremedy.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-21 17:19:56 -05:00
Rob Clark
a6a99fbf05 mesa: wire up EmitStringMarker for KHR_debug
The extension spec[1] describes DEBUG_TYPE_MARKER as "Annotation of the
command stream".  So for DEBUG_TYPE_MARKER, also pass the buf to the
driver's EmitStringMarker() to be inserted in the command stream.

[1] https://www.opengl.org/registry/specs/KHR/debug.txt

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-21 17:19:05 -05:00
Rob Clark
1f7a96e005 mesa: add GREMEDY_string_marker
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-21 17:19:05 -05:00
Kristian Høgsberg Kristensen
ac60e98a58 vk: Do render cache flush for GEN8+
This is needed for SKL as well.
2016-01-21 14:18:52 -08:00
Kristian Høgsberg Kristensen
9eab8fc683 vk: Emit surface state base address before renderpass
If we're continuing a render pass, make sure we don't emit the depth and
stencil buffer addresses before we set the state base addresses.

Fixes crucible func.cmd-buffer.small-secondaries
2016-01-21 14:18:52 -08:00
Neil Roberts
cbf0e64ee1 texobj: Remove redundant checks that the texture cube faces match size
The texture mipmap completeness checking code was checking whether all
of the faces have the same size. However this is pointless because the
code just above it checks whether the face has the expected size
calculated for the mipmap level anyway so the error condition could
never be reached. This patch just removes it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-21 21:45:53 +00:00
Neil Roberts
666d96d169 texobj: Fix the completeness checks for cube textures
According to the GL 1.4 spec section 3.8.10, a cubemap texture is only
complete if:

• The level base arrays of each of the six texture images making up
  the cube map have identical, positive, and square dimensions.
• The level base arrays were each specified with the same internal
  format.
• The level base arrays each have the same border width.

Previously the texture completeness code was only checking the first
point. This patch makes it additionally check the other two.

This fixes the following two dEQP tests:

deqp-gles2.functional.texture.completeness.cube.format_mismatch_rgba_rgb_level_0_neg_z
deqp-gles2.functional.texture.completeness.cube.format_mismatch_rgb_rgba_level_0_pos_z

And also this Piglit test:

spec/!opengl 2.0/incomplete-cubemap-format

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93792
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-21 21:45:18 +00:00
Grazvydas Ignotas
0153ff8379 r600g: don't leak driver const buffers
The buffers are referenced from r600_update_driver_const_buffers()
 -> r600_set_constant_buffer() -> u_upload_data(), but nothing
ever releases the reference. Similar case with driver_consts.
Found using valgrind.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-21 15:36:24 -05:00
Kristian Høgsberg Kristensen
c5490d0277 vk: Fix indirect push constants
This currently sets the base and size of all push constants to the
entire push constant block. The idea is that we'll use the base and size
to eventually optimize the amount we actually push, but for now we don't
do that.
2016-01-21 11:10:11 -08:00
Kristian Høgsberg Kristensen
83c86e09a8 Merge remote-tracking branch 'jekstrand/wip/i965-uniforms' into vulkan 2016-01-21 11:09:58 -08:00
Jeremy Huddleston Sequoia
739ac3d39d mesa: Deal with size differences between GLuint and GLhandleARB in GetAttachedObjectsARB
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Nicolai Hähnle <nhaehnle@gmail.com>
2016-01-21 09:18:06 -08:00
Jeremy Huddleston Sequoia
b20d6bf96d mesa: Fix format warnings
main/shaderapi.c:1318:51: warning: format specifies type 'unsigned int' but the argument has type 'GLhandleARB' (aka 'unsigned long') [-Wformat]
      _mesa_debug(ctx, "glDeleteObjectARB(%u)\n", obj);
                                          ~~      ^~~
                                          %lu

Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-21 09:18:06 -08:00
Jeremy Huddleston Sequoia
a087a09fa8 mesa: Fix some function prototype mismatching
main/api_exec.c:543:36: warning: incompatible pointer types passing 'void (GLhandleARB, GLuint, const GLcharARB *)' (aka 'void (unsigned long, unsigned int, const char *)') to
parameter of
      type 'void (*)(GLuint, GLuint, const GLchar *)' (aka 'void (*)(unsigned int, unsigned int, const char *)') [-Wincompatible-pointer-types]
      SET_BindAttribLocation(exec, _mesa_BindAttribLocation);
                                   ^~~~~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:7590:88: note: passing argument to parameter 'fn' here
static inline void SET_BindAttribLocation(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint, GLuint, const GLchar *)) {
                                                                                       ^
main/api_exec.c:547:31: warning: incompatible pointer types passing 'void (GLhandleARB)' (aka 'void (unsigned long)') to parameter of type 'void (*)(GLuint)' (aka 'void (*)(unsigned
int)')
      [-Wincompatible-pointer-types]
      SET_CompileShader(exec, _mesa_CompileShader);
                              ^~~~~~~~~~~~~~~~~~~
./main/dispatch.h:7612:83: note: passing argument to parameter 'fn' here
static inline void SET_CompileShader(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint)) {
                                                                                  ^
main/api_exec.c:568:33: warning: incompatible pointer types passing 'void (GLhandleARB, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLcharARB *)' (aka 'void (unsigned long,
unsigned int,
      int, int *, int *, unsigned int *, char *)') to parameter of type 'void (*)(GLuint, GLuint, GLsizei, GLsizei *, GLint *, GLenum *, GLchar *)' (aka 'void (*)(unsigned int,
unsigned int,
      int, int *, int *, unsigned int *, char *)') [-Wincompatible-pointer-types]
      SET_GetActiveAttrib(exec, _mesa_GetActiveAttrib);
                                ^~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:7711:85: note: passing argument to parameter 'fn' here
static inline void SET_GetActiveAttrib(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint, GLuint, GLsizei , GLsizei *, GLint *, GLenum *, GLchar *)) {
                                                                                    ^
main/api_exec.c:571:35: warning: incompatible pointer types passing 'GLint (GLhandleARB, const GLcharARB *)' (aka 'int (unsigned long, const char *)') to parameter of type
      'GLint (*)(GLuint, const GLchar *)' (aka 'int (*)(unsigned int, const char *)') [-Wincompatible-pointer-types]
      SET_GetAttribLocation(exec, _mesa_GetAttribLocation);
                                  ^~~~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:7744:88: note: passing argument to parameter 'fn' here
static inline void SET_GetAttribLocation(struct _glapi_table *disp, GLint (GLAPIENTRYP fn)(GLuint, const GLchar *)) {
                                                                                       ^
main/api_exec.c:585:33: warning: incompatible pointer types passing 'void (GLhandleARB, GLsizei, GLsizei *, GLcharARB *)' (aka 'void (unsigned long, int, int *, char *)') to
parameter of
      type 'void (*)(GLuint, GLsizei, GLsizei *, GLchar *)' (aka 'void (*)(unsigned int, int, int *, char *)') [-Wincompatible-pointer-types]
      SET_GetShaderSource(exec, _mesa_GetShaderSource);
                                ^~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:7788:85: note: passing argument to parameter 'fn' here
static inline void SET_GetShaderSource(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint, GLsizei, GLsizei *, GLchar *)) {
                                                                                    ^
main/api_exec.c:597:29: warning: incompatible pointer types passing 'void (GLhandleARB)' (aka 'void (unsigned long)') to parameter of type 'void (*)(GLuint)' (aka 'void (*)(unsigned
int)')
      [-Wincompatible-pointer-types]
      SET_LinkProgram(exec, _mesa_LinkProgram);
                            ^~~~~~~~~~~~~~~~~
./main/dispatch.h:7909:81: note: passing argument to parameter 'fn' here
static inline void SET_LinkProgram(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint)) {
                                                                                ^
main/api_exec.c:628:30: warning: incompatible pointer types passing 'void (GLhandleARB, GLsizei, const GLcharARB *const *, const GLint *)' (aka
      'void (unsigned long, int, const char *const *, const int *)') to parameter of type 'void (*)(GLuint, GLsizei, const GLchar *const *, const GLint *)' (aka 'void (*)(unsigned
int, int,
      const char *const *, const int *)') [-Wincompatible-pointer-types]
      SET_ShaderSource(exec, _mesa_ShaderSource);
                             ^~~~~~~~~~~~~~~~~~
./main/dispatch.h:7920:82: note: passing argument to parameter 'fn' here
static inline void SET_ShaderSource(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint, GLsizei, const GLchar * const *, const GLint *)) {
                                                                                 ^
main/api_exec.c:653:28: warning: incompatible pointer types passing 'void (GLhandleARB)' (aka 'void (unsigned long)') to parameter of type 'void (*)(GLuint)' (aka 'void (*)(unsigned
int)')
      [-Wincompatible-pointer-types]
      SET_UseProgram(exec, _mesa_UseProgram);
                           ^~~~~~~~~~~~~~~~
./main/dispatch.h:8173:80: note: passing argument to parameter 'fn' here
static inline void SET_UseProgram(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint)) {
                                                                               ^
main/api_exec.c:655:33: warning: incompatible pointer types passing 'void (GLhandleARB)' (aka 'void (unsigned long)') to parameter of type 'void (*)(GLuint)' (aka 'void (*)(unsigned
int)')
      [-Wincompatible-pointer-types]
      SET_ValidateProgram(exec, _mesa_ValidateProgram);
                                ^~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:8184:85: note: passing argument to parameter 'fn' here
static inline void SET_ValidateProgram(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint)) {

main/dlist.c:9457:26: warning: incompatible pointer types passing 'void (GLhandleARB)' (aka 'void (unsigned long)') to parameter of type 'void (*)(GLuint)' (aka 'void (*)(unsigned
int)')
      [-Wincompatible-pointer-types]
   SET_UseProgram(table, save_UseProgramObjectARB);
                         ^~~~~~~~~~~~~~~~~~~~~~~~
./main/dispatch.h:8173:80: note: passing argument to parameter 'fn' here
static inline void SET_UseProgram(struct _glapi_table *disp, void (GLAPIENTRYP fn)(GLuint)) {
                                                                               ^
1 warning generated.

Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-21 09:18:06 -08:00
Andreas Boll
5d4b20267d glapi: Build glapi_gentable.c only on Darwin
Removes the public symbol _glapi_create_table_from_handle from
libGL.so.1.2.0 on all platforms except Darwin.

Since the symbol is not used on other platforms it makes sense to
build glapi_gentable.c only on Darwin.

As a side effect it accelerates the build a bit and reduces the size
of libGL.so.1.2.0 as follows:

size lib/libGL.so.1.2.0 on my system shows
   text	   data	    bss	    dec	    hex	filename
 469211	  21848	   2720	 493779	  788d3	lib/libGL.so.1.2.0 before
 420988	  11240	   2720	 434948	  6a304	lib/libGL.so.1.2.0 after

A little bit of history:

_glapi_create_table_from_handle was introduced in

commit 85937f4c0d
Author: Jeremy Huddleston <jeremyhu@apple.com>
Date:   Thu Jun 9 16:59:49 2011 -0700

    glapi: Add API that can create a _glapi_table from a dlfcn handle

    Example usage:

    void *handle = dlopen(opengl_library_path, RTLD_LOCAL);
    struct _glapi_table *disp = _glapi_create_table_from_handle(handle,
"gl");

    Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>

and the only user in mesa was added in

commit f35913b96e
Author: Jeremy Huddleston <jeremyhu@apple.com>
Date:   Thu Jun 9 17:29:51 2011 -0700

    apple: Use _glapi_create_table_from_handle to initialize our
dispatch table

    Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>

gl_gentable.py was also used for XQuartz in xserver 1.11 - 1.14.

v2: Fix typos in commit message
    Add missing XORG_GLAPI_OUTPUTS += \ into src/mapi/glapi/gen/Makefile.am
    Add glapi_gentable.c to EXTRA_DIST for inclusion in the release
    tarball

v3: Fix commit message: s/gl_gentable.c/glapi_gentable.c/

Reported-by: Arlie Davis <arlied@google.com>
Cc: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-21 15:04:02 +01:00
Arlie Davis
daa775b58e mesa: Reduce libGL.so binary size by about 15%
This patch significantly reduces the size of the libGL.so binary. It does
not change the (externally visible) behavior of libGL.so at all.

gl_gentable.py generates a function, _glapi_create_table_from_handle.
This function allocates a large dispatch table, consisting of 1300 or so
function pointers, and fills this dispatch table by doing symbol lookups
on a given shared library.  Previously, gl_gentable.py would generate a
single, very large _glapi_create_table_from_handle function, with a short
cluster of lines for each entry point (function).  The idiom it generates
was a NULL check, a call to snprintf, a call to dlsym / GetProcAddress,
and then a store into the dispatch table.  Since this function processes
a large number of entry points, this code is duplicated many times over.

We can encode the same information much more compactly, by using a lookup
table.  The previous total size of _glapi_create_table_from_handle on x64
was 125848 bytes.  By using a lookup table, the size of
_glapi_create_table_from_handle (and the related lookup tables) is reduced
to 10840 bytes.  In other words, this enormous function is reduced by 91%.
The size of the entire libGL.so binary (measured when stripped) itself drops
by 15%.

So the purpose of this change is to reduce the binary size, which frees up
disk space, memory, etc.

size lib/libGL.so.1.2.0 on my system shows (Andreas)
   text	   data	    bss	    dec	    hex	filename
 565947	  11256	   2720	 579923	  8d953	lib/libGL.so.1.2.0 before
 469211	  21848	   2720	 493779	  788d3	lib/libGL.so.1.2.0 after

v2: Incorporate Matt's feedback.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Tested-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
2016-01-21 15:03:53 +01:00
Jordan Justen
b1a7a27d60 nir/spirv: Handle compute shared atomics
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
a7e5b683ca nir/spirv: Support workgroup (shared) variable translation
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
bc035db3c8 anv/gen8: Set SLM size in interface descriptor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
819cb69434 anv/gen8+9: Invalidate color calc state when switching to the GPGPU pipeline
Port 044acb9256 to anv.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
19830031cb anv/gen8: Enable SLM in L3 cache control register
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
97b09a9268 anv/pipeline: Set size of shared variables in prog_data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
86daceb7f2 i965/nir: Lower nir compute shader shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
ca55817fa1 nir: Lower shared var atomics during nir_lower_io
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
36157cd5ea nir: Add support for lowering load/stores of shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
7a9a54b5c8 nir: Add atomic operations on variables
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
10db985fa0 nir: Add compute shader shared variable storage class
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
65a5407931 nir/print: Add space after shader_storage var mode
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Jordan Justen
9f4a72c9e3 i965/fs/nir: Move shared variable load/store to nir_emit_cs_intrinsic
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-21 00:31:29 -08:00
Ilia Mirkin
daa0fd7843 nv50/ir: 64-bit splitting fixes
Take reading shader outputs into account, and use setFlagsDef for the
carry since we rely on having i->flagsDef being set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:34 -05:00
Ilia Mirkin
c0b66d96d7 gk110/ir: allow carry to be set/read by imad
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:34 -05:00
Ilia Mirkin
73c9ca7544 gm107/ir: add carry emission to LOP and IADD
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:34 -05:00
Ilia Mirkin
71a489633b gm107/ir: add ATOM and CCTL support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:34 -05:00
Ilia Mirkin
57b0025814 gm107/ir: set LD/ST address width bit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:34 -05:00
Ilia Mirkin
2e533ab74b gk110/ir: fix double-wide vm address 2016-01-20 19:37:34 -05:00
Ilia Mirkin
8c2dfe05c5 gk110/ir: add OP_CCTL handling 2016-01-20 19:37:33 -05:00
Ilia Mirkin
7d9a97d6be gk110/ir: add atomic op emission, fix gmem loads
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 19:37:33 -05:00
Chad Versace
5ce5a7d021 anv/image: Stop including gen8_pack.h in common file 2016-01-20 15:42:17 -08:00
Chad Versace
8ab527de03 isl: Add a README
Most of the file-level comment in isl.h is moved to the README.
2016-01-20 15:24:40 -08:00
Roland Scheidegger
dc8b9bd0aa llvmpipe: warn about illegal use of objects in different contexts
Doing that is clearly a bug. We can't quite assert as st/mesa may hit this,
but increase at least visibility of it a bit.
(For the non-refcounted objects it would be illegal too, but we can't detect
that unless we'd store the context ourselves. Plus, those don't tend to cause
random crashes at context or object destruction time... So just sampler views,
surfaces and so targets for now.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-21 00:09:55 +01:00
Roland Scheidegger
e925ec8811 llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info
I removed this mistakenly in 2dbc20e456. I
actually thought it should not be necessary and a piglit run didn't show
any differences, but this shouldn't have been in there.
draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER.
The new polygon-mode-facing test indeed shows why this is necessary, there's
lots of invalid reads and writes with valgrind (also crashes without
valgrind), because the pre-pipeline vertex size doesn't match the
post-pipeline vertex size (note this won't help much with stages which don't
have the prepare hook which can grow the vertex size, in particular the wide
point stage, but this isn't used by llvmpipe). The test still won't pass, of
course, but it is only usage of uninitialized values now, which is much
less dangerous...
(Albeit I'm pretty sure for i915 it really is not needed anymore as it
doesn't care about the extra outputs and doesn't call
draw_prepare_shader_outputs().)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-21 00:09:55 +01:00
Ilia Mirkin
dc3ac418bf nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers
Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm))
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 18:03:36 -05:00
Kristian Høgsberg Kristensen
7b7a7c2bfc vk: Make maxSamplerAllocationCount more reasonable
We can't allocate 4 billion samplers. Let's go with 64k.
2016-01-20 14:36:52 -08:00
Ilia Mirkin
3a63576168 gk110/ir: fix load from shared memory
It was accidentally using the store opcode.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 17:16:09 -05:00
Ilia Mirkin
9f23007a7a gk110/ir: add partial BAR support
This is enough for the plain TGSI BARRIER implementation.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-20 17:16:09 -05:00
Kristian Høgsberg Kristensen
8ef002dd7a vk/tests: Add stub for anv_gem_get_bit6_swizzle() 2016-01-20 13:47:40 -08:00
Kristian Høgsberg Kristensen
420e8664cb vk/tests: Add isl include path 2016-01-20 13:47:40 -08:00
Kenneth Graunke
b76e4458f9 nir/spirv/glsl450: Use fabs not iabs in ldexp.
This was just wrong.
2016-01-20 12:18:02 -08:00
Tapani Pälli
f1152c3455 Revert "glsl: move uniform calculation to link_uniforms"
This reverts commit 4475d8f916.
2016-01-20 22:04:46 +02:00
Kristian Høgsberg Kristensen
947ebd9c71 isl: Add ish.h to libsil_la_SOURCES 2016-01-20 12:03:46 -08:00
Jason Ekstrand
21b2d87408 nir/spirv/glsl450: Implement FrexpStruct 2016-01-20 11:36:41 -08:00
Jason Ekstrand
c7896d1868 spirv/nir/glsl450: Use vtn_create_ssa_value to create SSA values 2016-01-20 11:36:26 -08:00
Jason Ekstrand
e45748bade anv/device: Default to scalar GS on BDW+ 2016-01-20 11:16:44 -08:00
Jason Ekstrand
34f9a5f301 nir/spirv: Pull texture dimensionality out of the image when available 2016-01-20 11:11:30 -08:00
Jason Ekstrand
59ef7c6507 anv/meta: fix UpdateBuffer in the case where we do multiple updates 2016-01-20 07:56:48 -08:00
Jason Ekstrand
a0516cfbac anv/meta: Fix a finishme 2016-01-20 07:33:41 -08:00
Tapani Pälli
4475d8f916 glsl: move uniform calculation to link_uniforms
Patch moves uniform calculation to happen during link_uniforms, this
is possible with help of UniformRemapTable that has all the reserved
locations.

Location assignment for implicit locations is changed so that we
utilize also the 'holes' that explicit uniform location assignment
might have left in UniformRemapTable, this makes it possible to fit
more uniforms as previously we were lazy here and wasting space.

Fixes following CTS tests:
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array

v2: code cleanups, increment NumUniformRemapTable correctly, fix
    find_empty_block to work properly and add some more comments.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-20 07:24:39 +02:00
Timothy Arceri
0a6a05c8ea glsl: add missing explicit_image_format flag to has_layout()
Fixes piglit regression after fixes to duplicate layout rules.

Previously catching multiple layouts was relying on the code
meant to catch duplicates within a single layout(...), this
change triggers the rules for multiple layouts.

Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-01-20 15:45:56 +11:00
Jason Ekstrand
c7203aa621 nir/spirv: Move OpPhi handling to vtn_cfg.c
Phi handling is somewhat intrinsically tied to the CFG.  Moving it here
makes it a bit easier to handle that.  In particular, we can now do SSA
repair after we've done the phi node second-pass.  This fixes 6 CTS tests.
2016-01-19 19:00:00 -08:00
Jason Ekstrand
891564adb9 nir/spirv: Handle OpLine and OpNoLine in foreach_instruction
This way we don't have to explicitly handle them everywhere.
2016-01-19 19:00:00 -08:00
Kenneth Graunke
e79f8a4926 nir: Lower ldexp to arithmetic.
This is a port of Matt's GLSL IR lowering pass to NIR.  It's required
because we translate SPIR-V directly to NIR, bypassing GLSL IR.

I haven't introduced a lower_ldexp flag, as I believe all current NIR
consumers would set the flag.  i965 wants this, vc4 doesn't implement
this feature, and st_glsl_to_tgsi currently lowers ldexp
unconditionally anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-19 18:10:30 -08:00
Kenneth Graunke
b3cc10f3b2 nir: Let nir_opt_algebraic rules contain unsigned constants > INT_MAX.
struct.pack('i', val) interprets `val` as a signed integer, and dies
if `val` > INT_MAX.  For larger constants, we need to use 'I' which
interprets it as an unsigned value.

This patch makes us use 'I' for all values >= 0, and 'i' for negative
values.  This should work in all cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-19 18:10:30 -08:00
Jason Ekstrand
eb2a119da2 anv/meta: Implement UpdateBuffer 2016-01-19 16:53:35 -08:00
Jason Ekstrand
0ae1bd321e anv/meta: Implement CmdFillBuffer 2016-01-19 16:53:35 -08:00
Jason Ekstrand
46eef31311 anv/meta_clear: Call emit_clear directly in ClearImage
Using the load op means that we end up with recursive meta.  We shouldn't
be doing that.
2016-01-19 16:53:35 -08:00
Jason Ekstrand
6325a75011 anv/meta_clear: Do save/restore in actual entry points 2016-01-19 16:53:35 -08:00
Jason Ekstrand
56dbf13045 anv: Add support for VK_WHOLE_SIZE several places 2016-01-19 16:53:35 -08:00
Kenneth Graunke
549be68258 nir/spirv/glsl450: Implement Frexp. 2016-01-19 16:46:03 -08:00
Roland Scheidegger
b21973acaa llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats
If we have a d24x8 format, there is no stencil. Therefore, we can always
clear these bits too, which means this will be some kind of memset rather
than read-modify-write.
This is good for some 7% increase or so in gears with huge window size -
seems to have a bigger effect if things aren't in caches. Of course, any
real app won't spend nearly as much time comparatively in clearing
depth buffer in the first place, so the speedup will be much lower.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-20 01:45:56 +01:00
Kenneth Graunke
68c9ca1a94 nir/spirv/glsl450: Blindly implement Atan2.
This is untested and probably broken.

We already passed the atan2 CTS tests before implementing this opcode.
Presumably, glslang or something was giving us a plain Atan opcode
instead of Atan2.  I don't know why.
2016-01-19 16:14:05 -08:00
Kenneth Graunke
2ab3efa0ad nir/spirv/glsl450: Implement Atan. 2016-01-19 16:14:05 -08:00
Kenneth Graunke
bc9d9bc2e3 nir/spirv/glsl450: Implement Asin and Acos. 2016-01-19 16:14:05 -08:00
Francisco Jerez
f8ac314cc2 i965: Implement compute sampler state atom.
Fixes a number of GLES31 CTS failures and hangs on various hardware:

 ES31-CTS.texture_gather.plain-gather-depth-2d
 ES31-CTS.texture_gather.plain-gather-depth-2darray
 ES31-CTS.texture_gather.plain-gather-depth-cube
 ES31-CTS.texture_gather.offset-gather-depth-2d
 ES31-CTS.texture_gather.offset-gather-depth-2darray
 ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader
 ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader
 ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers
 ES31-CTS.compute_shader.resources-texture

Some of them were actually passing by luck on some generations even
though we weren't uploading sampler state tables explicitly for the
compute stage, most likely because they relied on the cached sampler
state left from previous rendering to be close enough.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725
Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-19 16:11:04 -08:00
Francisco Jerez
9e4c8acd78 i965: Trigger CS state reemission when new sampler state is uploaded.
This reuses the NEW_SAMPLER_STATE_TABLE state bit (currently only used
on pre-Gen7 hardware) to signal that the sampler state tables have
changed in order to make sure that the GPGPU interface descriptor is
updated.

Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-19 16:11:04 -08:00
Kenneth Graunke
4fc018576b glsl: Don't abbreviate tessellation shader stage names.
I have a patch that writes shaders as .shader_test files, and it uses
this function to create the headers (i.e. [vertex shader]).

[tess ctrl shader] isn't a valid shader_runner header - it's spelled
out as [tessellation control shader].

There's no real reason to abbreviate it, so spell it out.

v2: Rebase on Rob's patches to move the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-19 14:57:42 -08:00
Timothy Arceri
11fc7ad62e mesa: remove link validation that should be done elsewhere
Even if re-linking fails rendering shouldn't fail as the previous
succesfully linked program will still be available. It also shouldn't
be possible to have an unlinked program as part of the current rendering
state.

This fixes a subtest in:
ES31-CTS.sepshaderobjs.StateInteraction

This change should improve performance on CPU limited benchmarks as noted
in commit d6c6b186cf.

>From Section 7.3 (Program Objects) of the OpenGL 4.5 spec:

   "If a program object that is active for any shader stage is re-linked
    unsuccessfully, the link status will be set to FALSE, but any existing
    executables and associated state will remain part of the current rendering
    state until a subsequent call to UseProgram, UseProgramStages, or
    BindProgramPipeline removes them from use. If such a program is attached to
    any program pipeline object, the existing executables and associated state
    will remain part of the program pipeline object until a subsequent call to
    UseProgramStages removes them from use. An unsuccessfully linked program may
    not be made part of the current rendering state by UseProgram or added to
    program pipeline objects by UseProgramStages until it is successfully
    re-linked."

   "void UseProgram(uint program);

   ...

   An INVALID_OPERATION error is generated if program has not been linked, or
   was last linked unsuccessfully.  The current rendering state is not modified."

V2: apply the rule to both core and compat.

Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-20 09:35:04 +11:00
Timothy Arceri
6a660a5f5d glsl: allow multiple layout qualifiers for a single declaration
From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single
   declaration. If the same layout-qualifier-name occurs in
   multiple layout qualifiers for the same declaration, the
   last one overrides the former ones."

The parser was already failing correctly when the extension is
not available but testing for duplicates within a single layout
qualifier was still causing this to fail when available as both
cases share the same function for merging.

Here we add a parameter to differentiate between the two uses
and apply it to the duplicate test.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-20 08:06:50 +11:00
Timothy Arceri
564009986f glsl: update parser to allow duplicate default layout qualifiers
In order to only create a single node for each default declaration
we add a new boolean parameter to the in/out merge function to
only create one once we reach the rightmost layout qualifier.

From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single
   declaration. If the same layout-qualifier-name occurs in
   multiple layout qualifiers for the same declaration, the
   last one overrides the former ones."

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-20 08:06:45 +11:00
Timothy Arceri
a0a93470e3 glsl: move default layout qualifier rules out of the parser
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-20 08:06:40 +11:00
Timothy Arceri
fd612e4547 glsl: split layout_defaults into specific types
This will allow merging of duplicate layout qualifiers as allowed
by ARB_shading_language_420pack

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-20 08:06:35 +11:00
Timothy Arceri
c8b8c578d1 glsl: allow duplicate layout-qualifier-names
This is added by ARB_enhanced_layouts although it doesn't fit
into any of the six main changes so we enable this independently.

From the ARB_enhanced_layouts spec:

   "More than one layout qualifier may appear in a single
   declaration. Additionally, the same layout-qualifier-name
   can occur multiple times within a layout qualifier or across
   multiple layout qualifiers in the  same declaration"

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-20 08:06:29 +11:00
Matt Turner
866a6bf9f7 i965/vec4: Spaces around operators. 2016-01-19 12:12:38 -08:00
Matt Turner
e734fb0326 i965: Inform compiler of variable range to silence warning.
Extends commit 6531ccb70 to silence the warning in release builds as
well.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-19 12:08:59 -08:00
Matt Turner
a439788c59 glsl: Restore Mesa-style to shader_enums.c/h. 2016-01-19 12:08:59 -08:00
Jason Ekstrand
5e57a87dcf anv/pipeline: Fix point size 2016-01-19 12:03:13 -08:00
Daniel Stone
f9ca780ea4 anv/wsi: Mark Wayland buffers as busy
We were diligently setting Wayland buffers as non-busy, but nowhere in
the code did we set them to busy when submitted to the server. This
meant that acquire_next_image would only ever find the same buffer in
a loop, over and over.

Signed-off-by: Daniel Stone <daniels@collabora.com>
2016-01-19 16:54:55 +00:00
Daniel Stone
ba5ef49dcb anv/wsi: Avoid stuck Wayland connection
In acquire_next_image, we are waiting for a wl_buffer::release to arrive
and release one of the buffers in our swapchain. Most compositors don't
explicitly flush release events, so we may need to perform a roundtrip
instead, to ensure the event arrives.

Signed-off-by: Daniel Stone <daniels@collabora.com>
2016-01-19 16:54:55 +00:00
Christian König
f3b067af86 st/va: fix motion adaptive deinterlacing
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-19 17:28:38 +01:00
Nicolai Hähnle
e6281a2850 util/u_pstipple.c: copy immediates during transformation
Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...

Fixes a bug in Kodi with radeonsi reported by Christian König.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-19 10:52:35 -05:00
Marta Lofstedt
2bcacc69b9 mesa: Move sanity check of BindVertexBuffer for OpenGL ES 3.1
Sanity check of BindVertexBuffer for OpenGL ES in
_mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-19 13:08:42 +01:00
Timothy Arceri
d018619d7f glsl: fix interface block error message
Print the stream value not the pointer to the expression,
also use the unsigned format specifier.

Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-19 14:51:31 +11:00
Jason Ekstrand
3276610ea6 getX/state: Set LOD pre-clamp to OpenGL mode
This gets us another couple hundred sampler tests
2016-01-18 17:51:35 -08:00
Jason Ekstrand
580b2e85e4 isl/device: Add a flag for bit 6 swizzling 2016-01-18 17:21:05 -08:00
Jason Ekstrand
587842a0ca anv/gem: Add a helper for getting bit6 swizzling information 2016-01-18 17:21:05 -08:00
Jason Ekstrand
c2a6f4302e nir/spirv: Patch through image qualifiers 2016-01-18 17:21:05 -08:00
Jason Ekstrand
56c8a5f2b8 nir/spirv: Implement ImageQuerySize for storage iamges
SPIR-V only has one ImageQuerySize opcode that has to work for both
textures and storage images.  Therefore, we have to special-case that one a
bit and look at the type of the incoming image handle.
2016-01-18 17:21:05 -08:00
Jason Ekstrand
bb8cadd169 nir/spirv: Insert movs around image intrinsics
Image intrinsics always take a vec4 coordinate and always return a vec4.
This simplifies the intrinsics a but but also means that they don't
actually match the incomming SPIR-V.  In order to compensate for this, we
add swizzling movs for both source and destination to get the right number
of components.
2016-01-18 17:21:05 -08:00
Ilia Mirkin
a31819cff8 nv50/ir: swap the least-ref'd source into src1 when both const/imm
The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.

This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
total local used in shared programs   : 30372 -> 30288 (-0.28%)
total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)

                local        gpr       inst      bytes
    helped          21         822        3332        3332
      hurt           0         278         565         565

And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.

On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
total local used in shared programs   : 3552 -> 3552 (0.00%)
total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)

                local        gpr       inst      bytes
    helped           0        1380        3252        3774
      hurt           0         287        1710        1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-18 17:52:07 -05:00
Ilia Mirkin
af686e7de3 st/mesa: restore the stObj's size if it was cleared out
An issue could still occur if the base level is set, but fixing that
would require a lot more logic.

This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-18 17:52:07 -05:00
Jason Ekstrand
6f956b0b22 anv/meta: Improve meta clear cleanup a bit 2016-01-18 14:07:46 -08:00
Jason Ekstrand
45d17fcf9b anv: Misc allocation scope fixes 2016-01-18 14:04:13 -08:00
Jason Ekstrand
378af64e30 anv/meta: Add a meta allocator that uses SCOPE_DEVICE
The Vulkan spec requires all allocations that happen for device creation to
happen with SCOPE_DEVICE.  Since meta calls into other things that allocate
memory, the easiest way to do this is with an allocator.
2016-01-18 14:03:24 -08:00
Rob Clark
805e080ba0 freedreno/a4xx: use smaller threadsize for more registers
Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize.  (The vertex shader already
runs at TWO_QUADS, which is the minimum.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Rob Clark
6062941e4d freedreno: per-generation OUT_IB packet
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet.  So allow for PFD vs PFE per
generation.  Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Jason Ekstrand
3dfa6a881c anv/meta: Initialize a handle to null 2016-01-18 13:05:02 -08:00
Jason Ekstrand
d49298c702 gen8: Fix border color
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address.  The way the packing functions are currently
set up, we need to provide it with (offset >> 6) because it just shoves the
bits in where the PRM says they go and isn't really aware that it's an
address.
2016-01-18 12:16:31 -08:00
Jason Ekstrand
bfcc744892 genX/pack: Add a __gen_fixed helper and use it for TextureLODBias
The __gen_fixed helper properly clamps the value and also handles negative
values correctly.  Eventually, we need to make the scripts generate this
and use it for more things.
2016-01-18 11:35:04 -08:00
Jason Ekstrand
5a67df2546 anv/pack: Make TextureLODBias a proper 4.8 float
XXX: We need to update the generators so this doesn't get stompped.
2016-01-18 10:36:53 -08:00
Jason Ekstrand
15e6af0708 nir/spirv: Handle if's where the merge is also a break or continue 2016-01-18 10:10:47 -08:00
Jason Ekstrand
14ebd0fdd7 nir/spirv: Hanle continues that use SSA values from the loop body
Instead of emitting the continue before the loop body we emit it
afterwards.  Then, once we've finished with the entire function, we run
nir_repair_ssa to add whatever phi nodes are needed.
2016-01-18 09:43:12 -08:00
Jason Ekstrand
61ba97522e nir/lower_returns: Repair SSA after doing return lowering 2016-01-18 09:43:12 -08:00
Jason Ekstrand
b11825590d nir: Add a pass to repair SSA form 2016-01-18 09:43:12 -08:00
Jason Ekstrand
a7a5e8a2de nir/vars_to_ssa: Use the new nir_phi_builder helper
The efficiency should be approximately the same.  We do a little more work
per phi node because we have to sort the predecessors.  However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
2016-01-18 09:18:42 -08:00
Jason Ekstrand
8aab4a7bd2 nir: Add a phi node placement helper
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis.  Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
2016-01-18 09:18:42 -08:00
Jason Ekstrand
b1f1200e80 util/bitset: Allow iterating over const bitsets 2016-01-18 09:18:42 -08:00
Emil Velikov
c03f3dd0a5 gallium: bundle the compat header u_pwr8.h in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov
7bc714509b mapi: include gl.xml in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov
a78e08e88f i965: adding missing headers to the dist tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Christian König
eaf7ec9cfc st/va: add motion adaptive deinterlacing v2
v2: minor cleanup

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-18 10:59:32 +01:00
Michel Dänzer
ad20be1f30 gallium/radeon: Rename do_invalidate_resource to invalidate_buffer
And only call it from r600_invalidate_resource for buffer resources.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
0491dd1deb st/dri: Don't call invalidate_resource for NULL depth/stencil buffers
Fixes crash in 4 EGL piglit tests with radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
a9ab7172a6 radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer
4297259fc8 radeonsi: Print "LLVM emitted unknown config register" warning only once
Say "LLVM" instead of "Compiler" for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Oded Gabbay
679a654a77 llvmpipe: use vpkswss when dst is signed
This patch fixes a bug when building a pack instruction.

For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.

This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale

I've also corrected some coding style issues in the function.

v2: Returned else statements to vmware style

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-18 09:45:25 +02:00
Dave Airlie
119bef9543 glsl: fix subroutine lowering reusing actual parmaters
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).

this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-18 15:02:34 +10:00
Timothy Arceri
9258d9f23d glsl: remove special case for detecting stream duplicates
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:28 +11:00
Timothy Arceri
eac2cece31 glsl: add missing explicit_stream flag to has_layout()
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:16 +11:00
Timothy Arceri
86677f1016 mesa: fix segfault in glUniformSubroutinesuiv()
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:

   "The command

       void UniformSubroutinesuiv(enum shadertype, sizei count,
                                  const uint *indices);

   will load all active subroutine uniforms for shader stage
   shadertype with subroutine indices from indices, storing
   indices[i] into the uniform at location i. The indices for
   any locations between zero and the value of
   ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
   used will be ignored."

V2: simplify NULL check suggested by Jason.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731
2016-01-18 11:53:24 +11:00
Timothy Arceri
50376e0c0e glsl: fix segfault linking subroutine uniform with explicit location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
2016-01-18 11:30:45 +11:00
Ilia Mirkin
4ac1274caa gm107/ir: don't do indirect frag shader inputs on GM107
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
3281ae96c8 tgsi: initialize Atomic field in tgsi_default_declaration
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-17 16:37:04 -05:00
Ilia Mirkin
5a81b48ad0 nvc0: bsp_bo can't be null
We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Oded Gabbay
529aa8249a llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-17 21:07:27 +02:00
Rob Clark
02ac91d717 freedreno/ir3: fix mad 3rd src delay calc
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-17 12:21:45 -05:00
Rob Clark
2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
Rob Clark
6a33c5c0df freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:20 -05:00
Rob Clark
ddede497b8 freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions.  This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:22:43 -05:00
Rob Clark
ebd3a1fc17 ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:52 -05:00
Rob Clark
fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark
cc7ed34df9 freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:46 -05:00
Rob Clark
680664dff9 freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:37 -05:00
Rob Clark
2809c87f90 freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:59 -05:00
Rob Clark
fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00
Rob Clark
d430f443de freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:33 -05:00
Rob Clark
6f0377d651 ttn: add missing writemask on store_output
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-16 13:35:44 -05:00
Rob Clark
683794fd60 nir/print: const_index is signed
Noticed this with $piglit/bin/vp-address-01

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-16 13:35:44 -05:00
Rob Clark
211b0644e6 nir: few missing struct names
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'.  But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.

So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-16 13:35:43 -05:00
Ilia Mirkin
32a9fe013b nv50/ir: add saturate support on ex2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-16 00:10:56 -05:00
Jeff Muizelaar
e5fefe49f2 gallivm: avoid crashing in mod by 0 with llvmpipe
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-16 03:36:29 +01:00
Kenneth Graunke
d54a70aa18 glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions.  Implementations of
current language revisions may or may not perform them.

This patch makes Mesa apply implicti conversions even on current
language versions.  Applications appear to expect this behavior,
and there's really no downside to doing so.

Fixes shader compilation in Shadow of Mordor.

Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-15 17:53:44 -08:00
Jason Ekstrand
61b0cfd84e i965/fs: Always set channel 2 of texture headers in some stages
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers.  However, in compute,
geometry, and tessellation stages, the hardware is not so nice.  In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15.  As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask.  This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled.  Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
9870f798be i965/fs/generator: Take an actual shader stage rather than a string
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
0a6811207f i965/vec4: Use UW type for multiply into accumulator on GEN8+
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."

Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
f509a89082 nir/lower_system_values: Lower vertexID to id+base if needed 2016-01-15 16:15:50 -08:00
Jason Ekstrand
6b64dddd71 anv/batch_chain: Remove padding from the BO before emitting BUFFER_END 2016-01-15 15:59:58 -08:00
Jason Ekstrand
67bf74f020 anv/batch_chain: Don't call current_batch_bo() again
We call it once at the top of the function and then hold on to the pointer.
It shouldn't have changed, so there's no reason to query for it again.
2016-01-15 15:49:32 -08:00
Jason Ekstrand
117cac75d0 nir/spirv: Stop trusting the SPIR-V for the number of texture coordinates 2016-01-15 11:13:51 -08:00
Roland Scheidegger
03f66dfb4b llvmpipe: ditch additional ref counting for vertex/geometry sampler views
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Roland Scheidegger
2f9a325b6a llvmpipe: fix "leaking" textures
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...

v2: rename var

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Chad Versace
0e420cb67f anv: Populate SURFACE_STATE more safely
genX_image_view_init allocates up to 3 separate SURFACE_STATE structures,
and populates each from a single template. Stop mutating the template
between each final SURFACE_STATE.
2016-01-15 11:00:22 -08:00
Chad Versace
eab6212efd anv/meta: Stop leaking renderpass and framebuffer 2016-01-15 10:14:07 -08:00
Chad Versace
482a1f5eab anv/meta: Reuse code for vkCmdClear{Color,DepthStencil}Image
The two function bodies were very similar. Move common code to
anv_cmd_clear_image().

Fixes all 'dEQP-VK.renderpass.formats.*' on Skylake.
2016-01-15 07:46:10 -08:00
Chad Versace
1afe33f8b3 anv/gen8: Fix SF_CLIP_VIEWPORT's Z elements
SF_CLIP_VIEWPORT does not clamp Z values. It only scales and shifts
them. Clamping to VkViewport::minDepth,maxDepth is instead handled by
CC_VIEWPORT.

Fixes dEQP-VK.renderpass.simple.depth on Broadwell.
2016-01-14 22:53:05 -08:00
Chad Versace
842b424d3b anv/meta: Implement vkCmdClearDepthStencilImage 2016-01-14 22:53:05 -08:00
Chad Versace
e4b17a2e1a anv/meta: Implement vkCmdClearAttachments 2016-01-14 22:53:05 -08:00
Chad Versace
0038ae2e4a anv/meta: Add VkClearRect param to emit_clear()
Prepares for vkCmdClearAttachments.
2016-01-14 22:53:05 -08:00
Chad Versace
11f5433715 anv: Distinguish between subpass setup and subpass start
vkCmdBeginRenderPass, vkCmdNextSubpass, and vkBeginCommandBuffer with
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT, all *setup* the
command buffer for recording commands for some subpass.  But only the
first two, vkCmdBeginRenderPass and vkCmdNextSubpass, can *start*
a subpass.

Therefore, calling anv_cmd_buffer_begin_subpass() inside
vkCmdBeginCommandBuffer is misleading. Clarify its purpose by renaming
it to anv_cmd_buffer_set_subpass() and adding comments.
2016-01-14 22:53:05 -08:00
Chad Versace
deb8dd89b5 anv: Emit load clears at start of each subpass
This should improve cache residency for render targets.

Pre-patch, vkCmdBeginRenderPass emitted all the meta clears for
VK_ATTACHMENT_LOAD_OP_CLEAR before any subpass began. Post-patch,
vCmdBeginRenderPass and vkCmdNextSubpass emit only the clears needed for
that current subpass.
2016-01-14 22:53:05 -08:00
Chad Versace
0679bef49f anv/meta: Create 8 pipelines for color clears
This prepares for moving the clear ops from the start of the render pass
into each subpass.

Pipeline N will be used to clear color attachment N of the current
subpass. Currently meta color clears still create a throwaway subpass
with exactly one attachment, so currently only pipeline 0 is used.

This is an ugly hack to workaround the compiler's current inability to
dynamically set the render target index in the render target write
message.
2016-01-14 22:53:05 -08:00
Chad Versace
2997b0da4a anv: Allow override of pipeline color attachment count
Add anv_graphics_pipeline_create_info::color_attachment_count. If
non-negative, then it overrides the color attachment count in the
pipeline's subpass. Useful for meta. (All the hacks for meta!)
2016-01-14 22:53:05 -08:00
Chad Versace
13610c03a7 anv/meta: Name the nir shaders
The names appear in debug output.
2016-01-14 22:53:05 -08:00
Chad Versace
6a1a760e3c anv: Move MAX_* defs to top of anv_private.h
Because I need to use MAX_RTS in struct anv_meta_state.
2016-01-14 22:53:05 -08:00
Chad Versace
4c2bafb9bf anv: Define zero() macro
zero(x) memsets x to zero. Eliminates bugs due to errors in memset's
size param.
2016-01-14 22:53:05 -08:00
Chad Versace
f2700d665c anv/meta: Rename emit_load_*_clear funcs
The functions will soon handle clears unrelated to
VK_ATTACHMENT_LOAD_OP_CLEAR, namely vkCmdClearAttachments. So remove
"load" from their name:

    emit_load_color_clear -> emit_color_clear
    emit_load_depthstencil_clear -> emit_depthstencil_clear
2016-01-14 22:53:05 -08:00
Chad Versace
356f952f87 anv/meta: Use anv_cmd_state::attachments for clears
Rewrite anv_cmd_buffer_clear_attachments, which emits the top-of-pass
clears, to use the data provided in anv_cmd_state::attachments. This
prepares for deferring each attachment clear to the first subpass that
uses the attachment.
2016-01-14 22:53:05 -08:00
Chad Versace
a4b045ca44 anv: Add anv_cmd_state::attachments
This array contains attachment state when recording a renderpass instance.
It's populated on each call to anv_cmd_buffer_set_pass.

The data is currently set but unused. We'll use it later to defer each
attachment clear to the subpass that first uses the attachment.
2016-01-14 22:53:05 -08:00
Samuel Iglesias Gonsálvez
781d2787bc glsl: restrict consumer stage condition to modify interpolation type
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.

If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.

Fixes the following CTS test:
   ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate

Fixes the following dEQP tests:

dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-15 07:06:41 +01:00
Jason Ekstrand
5d1c2736b6 i965/fs/generator: Change a comment as per jordan's suggestion 2016-01-14 22:03:15 -08:00
Kenneth Graunke
3657cbf24f i965: Apply add_const_offset_to_base for vec4 VS inputs too.
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
a3500f943e i965: Make add_const_offset_to_base() work at the shader level.
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
824d82025d i965: Make an is_scalar boolean in brw_compile_vs().
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
bb6612f06b nir/builder: Add a nir_build_ivec4() convenience helper.
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Tapani Pälli
cf96bce0ca glsl: mark explicit uniforms as explicit in other stages too
If shader declares uniform explicit location in one stage but
implicit in another, explicit location should be used. Patch marks
implicit uniforms as explicit if they were explicit in previous stage.
This makes sure that we don't treat them implicit later when assigning
locations.

Fixes following CTS test:
   ES31-CTS.explicit_uniform_location.uniform-loc-implicit-in-some-stages3

v2: move check to cross_validate_globals (Timothy)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-15 07:12:42 +02:00
Jason Ekstrand
6be517b20e i965/fs: Always set hannel 2 of texture headers in some stages 2016-01-14 20:42:47 -08:00
Jason Ekstrand
e1d13cd058 i965/fs/generator: Take an actual shader stage rather than a string 2016-01-14 20:27:56 -08:00
Francisco Jerez
0556b87de4 i965/gen7.5+: Disable resource streamer during GPGPU workloads.
The RS and hardware binding tables are only supported on the 3D
pipeline and can lead to corruption if left enabled during a GPGPU
workload.  Disable it when switching to the GPGPU (or media) pipeline
and re-enable it when switching back to the 3D pipeline.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2016-01-14 19:26:24 -08:00
Francisco Jerez
c8df0e7bf3 i965/gen7: Emit stall and dummy primitive draw after switching to the 3D pipeline.
This hardware bug can supposedly lead to a hang on IVB and VLV.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-14 19:26:23 -08:00
Francisco Jerez
635be1402c i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.
AFAIK brw_emit_select_pipeline() is only called once during context
init on Gen4-5, at which point the pipeline is likely to be already
idle so it may just happen to work by luck regardless of the MI_FLUSH.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-14 19:26:23 -08:00
Francisco Jerez
18c76551ee i965/gen6-7: Implement stall and flushes required prior to switching pipelines.
Switching the current pipeline while it's not completely idle or the
read and write caches aren't flushed can lead to corruption.  Fixes
misrendering of at least the following Khronos CTS test:

 ES31-CTS.shader_image_load_store.basic-allTargets-store-fs

The stall and flushes are no longer required on Gen8+.

v2: Emit PIPE_CONTROL with non-zero post-sync op before the write
    cache flush on SNB due to hardware bug. (Ken)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-14 19:26:23 -08:00
Francisco Jerez
044acb9256 i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.
This hardware bug can cause a hang on context restore while the
current pipeline is set to GPGPU (BDWGFX HSD 1909593).  In addition to
clearing the valid bit, mark the CC state as dirty to make sure that
the CC indirect state pointer is re-emitted when we switch back to the
3D pipeline.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-14 19:26:23 -08:00
Francisco Jerez
22ac1f6922 i965: Add state bit to trigger re-emission of color calculator state.
This will be used on Gen8+ to make sure that the color calculator
state pointers are re-emitted when switching back to the 3D pipeline
after some GPGPU workload due to a hardware workaround.  There are
other state bits already defined that could be used to achieve the
same effect but they all cause a ton of unrelated state to be
re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new
one, state bits are cheap.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-14 19:26:23 -08:00
Jason Ekstrand
47af950df5 anv/apply_pipeline_layout: Stomp texture array size to 1 2016-01-14 18:58:25 -08:00
Jason Ekstrand
6483d3f8fe nir/spirv: Fix texture return types
We were just hard-coding everything to a vec4.  This meant we weren't
handling shadow samplers at all and integer things were getting the wrong
return type.
2016-01-14 18:48:57 -08:00
Ilia Mirkin
fffb559129 nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:

total local used in shared programs   : 54116 -> 30372 (-43.88%)

Probably modest advantage to execution, but it's an imporant
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 20:14:01 -05:00
Ilia Mirkin
e231f59b6d nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.

This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs    : 944051 -> 945131 (0.11%)
total local used in shared programs   : 54116 -> 54116 (0.00%)

A typical case is for register usage to double and for instructions to
halve. A future commit can also optimize local memory usage size to be
reduced with better packing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 20:13:59 -05:00
Ilia Mirkin
37b67db6ae nvc0/ir: be careful about propagating very large offsets into const load
Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.

This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.

Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 18:20:27 -05:00
Kristian Høgsberg Kristensen
2eb52198ff vk: Fix struct field indentation 2016-01-14 15:18:40 -08:00
Chad Versace
5dea9d0039 anv: Document anv_cmd_state::current_pipeline
It's the value of PIPELINE_SELECT.PipelineSelection.
2016-01-14 13:18:40 -08:00
Chad Versace
ed33ccde63 anv: Make vkBeginCommandBuffer reset the command buffer
If its the command buffer's first call to vkBeginCommandBuffer, we must
*initialize* the command buffer's state. Otherwise, we must *reset* its
state. In both cases, let's use anv_ResetCommandBuffer.

From the Vulkan 1.0 spec:

   If a command buffer is in the executable state and the command buffer
   was allocated from a command pool with the
   VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag set, then
   vkBeginCommandBuffer implicitly resets the command buffer, behaving
   as if vkResetCommandBuffer had been called with
   VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT not set. It then puts
   the command buffer in the recording state.
2016-01-14 13:14:40 -08:00
Chad Versace
ea20389320 anv: Add FIXME for vkResetCommandPool
vkResetCommandPool currently destroys its command buffers. The Vulkan
1.0 spec requires that it only reset them:

    Resetting a command pool recycles all of the resources from all of
    the command buffers allocated from the command pool back to the
    command pool. All command buffers that have been allocated from the
    command pool are put in the initial state.
2016-01-14 13:14:40 -08:00
Chad Versace
20fd816b6b anv: Remove duplicate func prototype
anv_private.h declared anv_cmd_buffer_begin_subpass twice.
2016-01-14 13:14:40 -08:00
Chad Versace
0415dfcfe7 anv/meta: Add FINISHME for clearing multi-layer framebuffers 2016-01-14 13:14:40 -08:00
Jason Ekstrand
32f8bcb84f i965/vec4: Use UW type for multiply into accumulator on GEN8+
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."
2016-01-14 12:04:25 -08:00
Jason Ekstrand
45349acad0 Merge remote-tracking branch 'mesa-public/master' into vulkan
This fixes the bitfieldextract and bitfieldinsert CTS tests
2016-01-14 11:36:27 -08:00
Ilia Mirkin
7a521ddf36 nvc0: allow fragment shader inputs to use indirect indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-14 14:28:04 -05:00
Ilia Mirkin
e94ef885bb st/mesa: use surface format to generate mipmaps when available
This fixes the recently posted mipmap + texture views piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-14 14:28:04 -05:00
Marek Olšák
dc96a18d24 radeonsi: don't miss changes to SPI_TMPRING_SIZE
I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-14 19:55:41 +01:00
Charmaine Lee
6303231a1d svga: add DXGenMips command support
For those formats that support hw mipmap generation, use the
DXGenMips command. Otherwise fallback to the mipmap generation utility.

Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)

v2: make sure the texture surface was created with the render target bind flag
    set relocation flag to SVGA_RELOC_WRITE for the texture surface

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:44:25 -07:00
Charmaine Lee
78e628ae43 svga: add num-generate-mipmap HUD query
The actual increment of the num-generate-mipmap counter will be done
in a subsequent patch when hw generate mipmap is supported.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:39:53 -07:00
Charmaine Lee
3038e8984d gallium/st: add pipe_context::generate_mipmap()
This patch adds a new interface to support hardware mipmap generation.
PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify
if this new interface is supported; if not supported, the state tracker will
fallback to mipmap generation by rendering/texturing.

v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
v3: add format to the generate_mipmap interface to allow mipmap generation
    using a format other than the resource format
v4: fix return type of trace_context_generate_mipmap()

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-14 10:39:53 -07:00
Brian Paul
b1e11f4d71 st/mesa: declare struct pipe_screen in st_cb_bufferobjects.h
To silence a compiler warning.  Trivial.
2016-01-14 10:38:18 -07:00
Matt Turner
b82e26a6a4 nir: Lower bitfield_extract.
The OpenGL specifications for bitfieldExtract() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as

   bitfieldExtract:
      bits > 31 ? value : bfe(value, offset, bits)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-14 09:28:01 -08:00
Matt Turner
15640ee77a nir: Handle <bits>=32 case in bitfield_insert lowering.
The OpenGL specifications for bitfieldInsert() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.

This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as

   bitfieldInsert:
      bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-14 09:27:52 -08:00
Jason Ekstrand
f46f4e4886 nir/spirv: Add initial support for Vertex/Instance index 2016-01-14 09:12:32 -08:00
Jason Ekstrand
3d0fac7aca vulkan.h: Pull in 1.0.1 header 2016-01-14 08:37:54 -08:00
Jason Ekstrand
24a6fcba77 vulkan-1.0.0: Bump the version to 1.0.0 2016-01-14 08:26:37 -08:00
Jason Ekstrand
c310fb032d vulkan-1.0.0: Rework memory barriers 2016-01-14 08:09:39 -08:00
Brian Paul
6470435190 st/mesa: add check for color logicop in blit_copy_pixels()
We check that a bunch of raster operations are disabled in
blit_copy_pixels().  We also need to check that color logicop is
disabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:08:21 -07:00
Jason Ekstrand
b14a78cfb8 vulkan-1.0.0: No-op WSI changes 2016-01-14 08:02:44 -08:00
Jason Ekstrand
6d3322d0e5 vulkan-1.0.0: Make extents unsigned 2016-01-14 08:00:18 -08:00
Jason Ekstrand
b57c72d964 vulkan-1.0.0: Rework blits to use four offsets 2016-01-14 07:59:37 -08:00
Jason Ekstrand
f6cae99294 vulkan-1.0.0: Split out command buffer inheritance info 2016-01-14 07:45:15 -08:00
Jason Ekstrand
f99f847412 vulkan-1.0.0: Re-order some structs in the header 2016-01-14 07:43:05 -08:00
Jason Ekstrand
aab9517f3d vulkan-1.0.0: Misc. field and argument renames 2016-01-14 07:41:45 -08:00
Jason Ekstrand
d877095e66 vulkan-1.0.0: Get rid of MIPMAP_MODE_BASE 2016-01-14 07:32:16 -08:00
Jason Ekstrand
7b81637762 vulkan-1.0.0: Convert pPreserveAttachments to a uint32_t 2016-01-14 07:30:46 -08:00
Jason Ekstrand
802f00219a anv/device: Update features and limits 2016-01-14 07:30:46 -08:00
Jason Ekstrand
08735ba91c anv/cmd_buffer: Fix setting of viewport/scissor count 2016-01-14 07:30:46 -08:00
Jason Ekstrand
ed4fe3e9ba anv/state: Respect SamplerCreateInfo.anisotropyEnable 2016-01-14 07:30:46 -08:00
Jason Ekstrand
8a81d136f8 anv/image: Fill out VkSubresourceLayout.arrayPitch 2016-01-14 07:30:46 -08:00
BogDan Vatra
102c74277f WIP: Partially upgrade to vulkan v0.221.0
TODO, make use of:
- VkPhysicalDeviceFeatures.drawIndirectFirstInstance,
- VkPhysicalDeviceFeatures.inheritedQueries
- VkPhysicalDeviceLimits.timestampComputeAndGraphics
- VkSubmitInfo.pWaitDstStageMask
- VkSubresourceLayout.arrayPitch
- VkSamplerCreateInfo.anisotropyEnable
2016-01-14 07:30:46 -08:00
Nicolai Hähnle
e976860638 gallium/radeon: do not reallocate user memory buffers
The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:41:24 -05:00
Nicolai Hähnle
321140d563 gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:41:04 -05:00
Nicolai Hähnle
08c71740ad gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
This accomodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:40:00 -05:00
Nicolai Hähnle
70e66c57bb st/mesa: implement Driver.InvalidateBufferSubData
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:39:57 -05:00
Nicolai Hähnle
9e2240e892 st/mesa: use pipe->invalidate_resource instead of buffer re-allocation
Drivers are expected to avoid unnecessary work when possible in this code
path.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:39:53 -05:00
Nicolai Hähnle
654670b404 gallium: add PIPE_CAP_INVALIDATE_BUFFER
It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:39:38 -05:00
Nicolai Hähnle
6f4ae81005 mesa: add Driver.InvalidateBufferSubData
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-14 09:39:30 -05:00
Nicolai Hähnle
53c77494aa mesa: fix the checks in _mesa_InvalidateBuffer(Sub)Data
Change the check to be in line with what the quoted spec fragment says.

I have sent out a piglit test for this as well.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-14 09:39:22 -05:00
Nicolai Hähnle
cbcdef7b40 winsys/radeon: fix warnings about incompatible pointer types
Some confusion between pb_buffer and radeon_bo as well as between
radeon_drm_winsys and radeon_winsys.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-14 09:33:58 -05:00
Neil Roberts
06b526de05 texobj: Check completeness with InternalFormat rather than Mesa format
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA and the other as RGB because they would both
have an RGBA Mesa format. If we check the InternalFormat instead then
we are directly checking the API usage which I think better matches
the intention of the check.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93700
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-01-14 12:18:24 +00:00
Jordan Justen
8ce2b0e140 nir/spirv: Add support for ArrayLength op
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-13 23:34:45 -08:00
Jason Ekstrand
4507d8a57a nir/spirv/alu: Properly implement mod/rem 2016-01-13 16:53:02 -08:00
Jason Ekstrand
7d5ae2d34b i965: Implement nir_op_irem and nir_op_srem 2016-01-13 16:53:02 -08:00
Ben Widawsky
f4ab7340ca i965: Remove unused hw_must_use_separate_stencil
I spotted this while looking for what needs updating in future platforms.

I'm too lazy to go through the git logs, but it was probably missed by Jason
when all the brw refactoring happened.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-13 16:41:04 -08:00
Matt Turner
138a7dc826 i965: Drop extra newline from shader compile messages.
Ilia changed shader-db's run.c to not expect messages to contain a
newline in shader-db commit 51bbc8035.
2016-01-13 16:19:18 -08:00
Jason Ekstrand
cac99fffdb nir: Add more modulus and remainder opcodes
SPIR-V makes a distinction between "modulus" and "remainder" for both
floating-point and signed integer variants.  The difference is primarily
one of which source they take their sign from.  The "remainder" opcode for
integers is equivalent to the C/C++ "%" operation while the "modulus"
opcode is more mathematically correct (at least for an unsigned divisor).
This commit adds corresponding opcodes to NIR.
2016-01-13 15:18:36 -08:00
Jason Ekstrand
0079523a0d nir/spirv: Add support for OpSpecConstantOp 2016-01-13 15:18:36 -08:00
Jason Ekstrand
8c408b9b81 nir/spirv/alu: Factor out the opcode table 2016-01-13 15:18:36 -08:00
Jason Ekstrand
9b7e08118b anv/pipeline: Pass through specialization constants 2016-01-13 15:18:36 -08:00
Jason Ekstrand
c95c3b2c21 nir/spirv: Add initial support for specialization constants 2016-01-13 15:18:36 -08:00
Matt Turner
74cff779eb nir: Change bfm's semantics to match Intel/AMD/SM5.
Intel/AMD's hardware instructions do not handle arguments of 32.
Constant evaluation should not produce a result different from the
hardware instruction.

The s/1ull/1u/ change is intentional: previously we wanted defined
behavior for the "1 << 32" case, but we're making this case undefined so
we can make it 1u and save ourselves a 64-bit operation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 11:22:40 -08:00
Matt Turner
a5fcff6628 glsl: Fix undefined shifts.
Shifting into the sign bit is undefined, as is shifting by 32.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 11:22:11 -08:00
Matt Turner
966a0dd720 glsl: Handle failure of Python codegen scripts.
If a Python codegen script failed, it would write a zero-byte file,
which on subsequent invocations of make would trick it into thinking the
file was appropriately generated.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
84d6130c21 glsl, nir: Make ir_triop_bitfield_extract a vectorized operation.
We would like to be able to combine

   result.x = bitfieldExtract(src0.x, src1.x, src2.x);
   result.y = bitfieldExtract(src0.y, src1.y, src2.y);
   result.z = bitfieldExtract(src0.z, src1.z, src2.z);
   result.w = bitfieldExtract(src0.w, src1.w, src2.w);

into a single ivec4 bitfieldInsert operation.  This should be possible
with most drivers.

This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN.  The type of all three operands will be the same,
for simplicity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
b4e198f47f glsl, nir: Make ir_quadop_bitfield_insert a vectorized operation.
We would like to be able to combine

   result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
   result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
   result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
   result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);

into a single ivec4 bitfieldInsert operation.  This should be possible
with most drivers.

This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN.  The type of all four operands will be the same,
for simplicity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
b85a229e1f glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly.  NIR can handle ir_quadop_bitfield_insert as well.

These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:35:12 -08:00
Matt Turner
92f1773869 nir: Fix constant evaluation of bfm.
NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's
ir_binop_bfm takes <bits> as src0 and <offset> as src1.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:35:12 -08:00
Matt Turner
7dc2e5f940 i965/fs: Skip assertion on NaN.
A shader in Unreal4 uses the result of divide by zero in its color
output, producing NaN and triggering this assertion since NaN is not
equal to itself.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:32:53 -08:00
Matt Turner
64800933b8 i965/fs: Add debugging to constant combining pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:32:53 -08:00
Brian Paul
9638c03a4e meta: remove const qualifier on _mesa_meta_fb_tex_blit_begin()
To silence a compiler warning about a const/non-const mismatch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 08:02:25 -07:00
Brian Paul
235a299534 st/mesa: fix incorrect buffer token passed to _mesa_BindFramebuffer()
I added this code right at the end, and got it wrong.
Only used by the WGL_ARB_render_texture code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-13 08:01:56 -07:00
Emil Velikov
2065ffb4cf docs: add news item and link release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-13 15:27:50 +02:00
Emil Velikov
183b5ff109 docs: add sha256 checksums for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 4b2d9f29e9)
2016-01-13 15:25:32 +02:00
Emil Velikov
8f16739528 docs: add release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 330aa44a0d)
2016-01-13 15:25:31 +02:00
Neil Roberts
cda886a485 i965/gen9: Don't allow the RGBX formats for texturing/rendering
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-01-13 12:16:31 +00:00
Marek Olšák
4ea0febcb0 radeonsi: move POSITION and FACE fragment shader inputs to system values
And FACE becomes integer instead of float.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Marek Olšák
caf3c2abea radeonsi: simplify gl_FragCoord behavior
It will become a system value, not an input.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Samuel Iglesias Gonsálvez
69c4c75264 glsl: add image_format check in cross_validate_globals()
Fixes CTS test:

ES31-CTS.shader_image_load_store.negative-linkErrors

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93410

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-13 07:01:55 +01:00
Tapani Pälli
e937fd779f mesa: do not validate io of non-compute and compute stage
Fixes regression on SSO tests that have both non-compute and
compute programs in a program pipeline.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93532
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-13 07:31:57 +02:00
Tapani Pälli
6b0706b2aa glsl: add packed varyings for outputs with single stage program
Commit 8926dc8 added a check where we add packed varyings of output
stage only when we have multiple stages,  however duplicates are already
handled by changes in commit 0508d950 and we want to add outputs also in
case where we have only one stage.

Fixes regression caused by 8926dc8 for following test:
   ES31-CTS.program_interface_query.separate-programs-vertex

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-13 07:30:46 +02:00
Roland Scheidegger
38cdcb000d llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts
some compiler was unhappy.
2016-01-13 04:48:41 +01:00
Roland Scheidegger
49ec647c3b llvmpipe: avoid most 64 bit math in rasterization
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane)) this
is never a problem.
The idea isnt't all that revolutionary, in fact something similar was tried
ages ago (9773722c2b) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).

This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:50:57 +01:00
Roland Scheidegger
16530fdc82 llvmpipe: scale up bounding box planes to subpixel precision
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:59 +01:00
Roland Scheidegger
0298f5aca7 llvmpipe: add sse code for fixed position calculation
This is quite a few less instructions, albeit still do the 2 64bit muls
with scalar c code (they'd need way more shuffles, plus fixup for the signed
mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed
scalar muls natively just fine after all (even on 32bit).

(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:09 +01:00
Roland Scheidegger
9422999e40 draw: fix key comparison with uninitialized value
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).

v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
2016-01-13 02:43:04 +01:00
Jason Ekstrand
610aa00cdf nir/spirv: Add support for OpQuantize 2016-01-12 15:36:38 -08:00
Jason Ekstrand
282a837317 i965: Implement nir_op_fquantize2f16 2016-01-12 15:35:00 -08:00
Jason Ekstrand
15a56459d7 nir: Add a fquantize2f16 opcode
This opcode simply takes a 32-bit floating-point value and reduces its
effective precision to 16 bits.
2016-01-12 15:33:02 -08:00
Timothy Arceri
6143e2d651 mesa: print the invalid enum when CreateShader fails
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-13 09:46:56 +11:00
Jason Ekstrand
aee970c844 anv/device: Bump the max program size again
No one will ever need more than 128K, right?
2016-01-12 13:49:05 -08:00
Kenneth Graunke
c034dbeda8 glsl: Make read_from_write_only_variable_visitor ignore .length().
.length() on an unsized SSBO variable doesn't actually read any data
from the SSBO, and is allowed on variables marked 'writeonly'.

Fixes compute shader compilation in Shadow of Mordor.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-12 12:20:02 -08:00
Kenneth Graunke
9095847c25 i965: Mark TCS URB writes as having side effects.
This adds barrier dependencies around TCS_OPCODE_URB_WRITE, preventing
reads and writes from being incorrectly scheduled.

Fixes rendering in GFXBench 4.0's tessellation demo.

For some reason, we haven't ever listed URB writes as having
side-effects.  This hasn't been a problem because in most stages, we
never read from the URB, and only write to each location once.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93526
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-12 12:19:47 -08:00
Kristian Høgsberg Kristensen
d7a193327b vk: Implement workaround for occlusion queries
We have an issue with occlusion queries (PIPE_CONTROL depth writes)
after using the pipeline with the VS disabled. We work around it by
using a depth cache flush PIPE_CONTROL before doing a depth write.

Fixes dEQP-VK.query_pool.*
2016-01-12 11:50:36 -08:00
Jason Ekstrand
6fc278ae4f anv/UpdateDescriptorSets: Respect write.dstArrayElement 2016-01-12 11:45:12 -08:00
Kristian Høgsberg Kristensen
af422fe9b3 Merge ../mesa into vulkan
Merge master again to get the brw_device_info with the
correct slice counts for KBL.
2016-01-12 10:54:26 -08:00
Kristian Høgsberg Kristensen
7df20f0c14 vk: Support SpvBuiltInViewportIndex 2016-01-12 10:53:59 -08:00
Kristian Høgsberg Kristensen
2b4bacb84b vk: Use the correct stride for CC_VIEWPORT structs 2016-01-12 10:53:59 -08:00
Tom St Denis
56fc2986d5 st/omx: Avoid segfault in deconstructor if constructor fails
If the constructor fails before the LIST_INIT calls the pointers
will be null and the deconstructor will segfault.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-01-12 19:13:19 +01:00
Christian König
6f898f740c vl: use preferred format for deinterlacing
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:42 +01:00
Christian König
5fdd4a5aef vl: improve motion adaptive deinterlacer
Handle other formats than YV12 as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:39 +01:00
Christian König
e945235aed st/va: add BOB deinterlacing v2
Tested with MPV.

v2: correctly handle compositor deinterlacing as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:35 +01:00
Christian König
3949cf0e02 st/va: add NV12 -> NV12 post processing v2
Usefull for mpv and GStreamer.

v2: use common functionality for size adjustment.

Signed-off-by: Indrajit-kumar Das <Indrajit-kumar.Das@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:28 +01:00
Christian König
9f644295dc st/va: use vl_video_buffer_adjust_size
Use the new helper function instead of open coding it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:24 +01:00
Christian König
da39637764 st/vdpau: use vl_video_buffer_adjust_size
Use the new helper function instead of open coding it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:21 +01:00
Christian König
52ca9a9b8b vl/buffers: extract vl_video_buffer_adjust_size helper
Useful for the state trackers as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:16 +01:00
Christian König
8479782361 st/va: make the implementation thread safe v2
Otherwise we might crash with MPV.

v2: minor cleanups suggested on the list.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-01-12 13:26:24 +01:00
Jason Ekstrand
62e56492c3 nir/spirv: Allow non-block variables with interface types in lists
The original objective was to disallow UBO and SSBO variables from the
variable lists.  This was accidentally broken in b208620fd when fixing some
other interface issues.
2016-01-12 01:32:19 -08:00
Jason Ekstrand
4141d13de5 nir/spirv: Handle matrix decorations on arrays of matrices
Connor's original shallow-copy plan works great except that a couple of the
decorations apply to a matrix which may be some levels down in an array.
We weren't properly unpacking that.  This fixes most of the remaining SSBO
and UBO layout tests.
2016-01-12 01:04:44 -08:00
Tapani Pälli
8926dc87af mesa: use gl_shader_variable in program resource list
Patch changes linker to allocate gl_shader_variable instead of using
ir_variable. This makes it possible to get rid of ir_variables and ir
in memory after linking.

v2: check that we do not create duplicate entries with
    packed varyings

v3: document 'patch' bit (Ilia Mirkin)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-12 09:07:10 +02:00
Tapani Pälli
4985159ad6 glsl: track total amount of uniform locations used
Linker missed a check for situation where we exceed max amount of
uniform locations with explicit + implicit locations. Patch adds this
check to already existing iteration over uniforms in linker.

Fixes following CTS test:
   ES31-CTS.explicit_uniform_location.uniform-loc-negative-link-max-num-of-locations

v2: use var->type->uniform_locations() (Timothy)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-12 07:52:44 +02:00
Jason Ekstrand
b208620fd2 nir/spirv: Allow creating local/global variables from interface types
Not sure if this is actually allowed, but it's not that hard to just strip
the interface information from the type.
2016-01-11 17:45:54 -08:00
Jason Ekstrand
350bbd3d15 nir/spirv: Allow base derefs in get_vulkan_resource_index 2016-01-11 17:45:24 -08:00
Jason Ekstrand
1c5393d57d nir/spirv: Allow OpBranchConditional without a merge
This can happen if you have a predicated break/continue.
2016-01-11 17:03:52 -08:00
Jason Ekstrand
24523e98a4 nir/spirv/cfg: Allow breaking from the continue block 2016-01-11 17:03:16 -08:00
Jason Ekstrand
c381906bbd nir/spirv: Stop wrapping carry/borrow in b2i
The upstream versions now return an integer like GLSL/SPIR-V want.
2016-01-11 17:02:30 -08:00
Jason Ekstrand
dee09d7393 nir/spirv: Better handle OpCopyMemory 2016-01-11 16:29:38 -08:00
Jason Ekstrand
1ca97cefb0 nir/spirv: Add no-op support for OpSourceContinued 2016-01-11 16:06:11 -08:00
Erik Faye-Lund
395b53dad6 main: get rid of needless conditional
We already check if the driver changed the completeness, we don't
need to duplicate that check. Let's just early out there instead.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-12 11:02:16 +11:00
Erik Faye-Lund
2a15dc0dd5 gallium/util: removed unused header-file
This hasn't been in use since c476305 ("gallium/util: pregenerate
half float tables"), where the last bit of run-time init using this
was killed. So let's just get rid of the pointless header.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-12 11:02:08 +11:00
Samuel Pitoiset
e67f5cac79 nvc0: do not force re-binding of compute constbufs on Fermi
Re-binding compute constant buffers after launching a grid have no effects
because they are not currently validated and because dirty_cp is not updated
accordingly. This might also prevent weird future behaviours when UBOs will
be bound for compute.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-12 00:47:20 +01:00
Ian Romanick
5be700e5cc meta: Unconditionally set GL_SKIP_DECODE_EXT
The path that depends on this will be avoided (by fallback_required) if
the extension is not supported.  _mesa_set_sampler_srgb_decode does not
generate GL errors (by design), so there are no problems there.

I kept this change separate and last because it is one of the few in the
series that is not a candidate for the stable branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:04 -08:00
Ian Romanick
1799eddb51 meta: Only bind the sampler in one place
All of the calls after the first _mesa_bind_sampler call are DSA style
calls that don't depend on the current binding.

I kept this change separate and last because it is one of the few in the
series that is not a candidate for the stable branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:04 -08:00
Ian Romanick
ae50157363 meta/decompress: Don't pollute the sampler object namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:04 -08:00
Ian Romanick
b03ee127d8 meta/decompress: Save and restore the sampler using gl_sampler_object instead of GL API object handle
Some meta operations can be called recursively.  Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used.  If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:04 -08:00
Ian Romanick
d4094f64c1 meta/decompress: Track sampler using gl_sampler_object instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
1998af813a meta/decompress: Use internal functions for sampler object access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
b85c5fe526 meta/generate_mipmap: Don't pollute the sampler object namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
d6782712a1 meta/generate_mipmap: Save and restore the sampler using gl_sampler_object instead of GL API object handle
Some meta operations can be called recursively.  Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used.  If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
36f413209f meta/generate_mipmap: Track sampler using gl_sampler_object instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
b94e7f398d meta/generate_mipmap: Use internal functions for sampler object access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
963065b76c meta/blit: Don't pollute the sampler object namespace in _mesa_meta_setup_sampler
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
533320e4d1 meta/blit: Save and restore the sampler using gl_sampler_object instead of GL API object handle
Some meta operations can be called recursively.  Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used.  If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.

v2: Add a comment explaining why samp_obj_save is set to NULL in
_mesa_meta_fb_tex_blit_begin.  This came out of review feedback from
Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
d796b491cc meta/blit: Use internal functions for sampler object access
This requires tracking the sampler object using the gl_sampler_object*
instead of the object name.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
ad5b1b41ae meta/blit: Group the SamplerParameteri calls with the other sampler operations
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
adb4b31bc3 mesa: Refator _mesa_BindSampler to make _mesa_bind_sampler
Pulls the parts of _mesa_BindSampler that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
4cf5c85ec7 mesa: Add _mesa_set_sampler_srgb_decode method
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
ecba76d3c0 mesa: Add _mesa_set_sampler_filters method
v2: Add filter enum assertions.  Suggested by Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Ian Romanick
08822b4b43 mesa: Add _mesa_set_sampler_wrap method
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-01-11 15:38:03 -08:00
Jason Ekstrand
bb5882e6af nir/spirv/cfg: Handle unreachable instructions 2016-01-11 15:35:15 -08:00
Jason Ekstrand
fc3f659aa9 nir/vars_to_ssa: Add phi sources for unreachable predecessors
It is possible to end up with unreachable blocks if, for instance, you have
an "if (...) { break; } else { continue; } unreachable()".  In this case,
the unreachable block does not show up in the dominance tree so it never
gets visited.  Instead, we go and visit all of those in follow-on pass.
2016-01-11 15:33:44 -08:00
Samuel Pitoiset
3029d60de7 nvc0: remove useless goto in nvc0_launch_grid()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-12 00:19:34 +01:00
Ian Romanick
5318bd351e mesa: Mark Identity as const
I was going to send this as review for dce1e1a8, but I missed that
window.  This saves 64 bytes of unshared data and prelaces it with 96
bytes shared text.  My guess is that some of the calls to memcpy get
optimized to something else.

   text	   data	    bss	    dec	    hex	filename
7847613	 220208	  27432	8095253	 7b8615	i965_dri.so before
7847709	 220144	  27432	8095285	 7b8635	i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Brian Paul <brianp@vmware.com>
2016-01-11 14:34:38 -08:00
Jason Ekstrand
c974b94578 nir/spirv: Properly handle OpConstantNull 2016-01-11 14:30:46 -08:00
Jason Ekstrand
96683065f2 nir/spirv: Assert that matrix types are valid 2016-01-11 14:30:46 -08:00
Jason Ekstrand
d032ede26f nir/types: Add an is_error helper 2016-01-11 14:30:46 -08:00
Jason Ekstrand
17cfafd83a nir/spirv: Handle OpNoLine 2016-01-11 14:30:46 -08:00
Chad Versace
52d4af6a3c anv/gen7: Remove unheeded helper begin_render_pass()
The helper didn't help much. It looks like a leftover from past
code-reuse.  Now it's called from exactly one location,
gen7_CmdBeginRenderPass(). So fold it into its caller.
2016-01-11 14:08:30 -08:00
Oded Gabbay
647d8e95d1 configure.ac: always define __STDC_CONSTANT_MACROS
The ISO C99 standard (7.18.4) specifies that C++
implementations should define UINT64_C only when
__STDC_CONSTANT_MACROS is defined.

Because we now use UINT64_C in our cpp files (since commit
208bfc493d), we need to add this define.

This also solves compilation errors with GCC 4.8.x on ppc64le machines.

v2: add this define to SCons build system

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-11 23:28:23 +02:00
Kenneth Graunke
aa6aa39a8f i965: Upload 3DSTATE_BINDING_TABLE_POINTERS_HS when !TCS on Gen9+.
Gen9+ requires us to emit 3DSTATE_BINDING_TABLE_POINTERS_HS for the
hull shader push constants to take effect.  The passthrough TCS uses
push constants for the default tessellation levels.  So, when those
change, we need to re-upload the binding table as well.

Fixes five Piglit tests on Skylake:
- spec/arb_tessellation_shader/vs-tes-vertex
- spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-quads
- spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-tris
- spec/arb_tessellation_shader/tes-read-texture
- spec/arb_tessellation_shader/tess_with_geometry

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-11 12:10:00 -08:00
Mark Janes
f2c8913536 Add missing platform information for KBL
In testing KBL, I found:

 - urb size was not set for slices gt1.5, gt2, and gt3.  The value I
   used for these slices (384) was taken from an earlier patch authored
   by Ben Widawsky.

 - slice count was missing.  This field was added by
   a403ad4f5a

With this commit, KBL passes piglit at parity with SKL.

Note: As requested by Kristian, Sarah modified this patch to drop
setting urb size for gt1.5, gt2, and gt3, since the correct default is
set in the GEN9 macro by commit c1e38ad370
"i965/skl: Use larger URB size where available."

Signed-off-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2016-01-11 11:24:20 -08:00
Jason Ekstrand
790565b06e anv/pipeline: Handle output lowering in anv_pipeline instead of spirv_to_nir
While we're at it, we delete any unused variables.  This allows us to prune
variables that are not used in the current stage from the shader.
2016-01-11 11:06:06 -08:00
Jason Ekstrand
b8ec48ee76 anv/pipeline: Only delete functions for SPIR-V shaders
We can assume that direct NIR shaders only have one entrypoint
2016-01-11 11:06:06 -08:00
Jason Ekstrand
30883adfb8 nir/spirv: Get rid of a bunch of stage asserts
Since we may have multiple entrypoints from different stages, we don't know
what stage we are actually in so these asserts are invalid.
2016-01-11 11:06:06 -08:00
Jason Ekstrand
9f4ba499d1 nir/spirv: Take an entrypoint stage as well as a name 2016-01-11 11:06:06 -08:00
Jason Ekstrand
83bf1f752d nir/dead_variables: Add a a mode parameter
This allows dead_variables to be used on any type of variable.
2016-01-11 11:06:06 -08:00
Ilia Mirkin
f21df5c513 nv50/ir: the whole point of data array is to hand out regular registers
Fixes: 0d3051f75a (nv50/ir: Fix scratch allocation size and file)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-11 13:01:11 -05:00
Dave Airlie
a9eace326e mesa/uniform_query: add IROUNDD and use for doubles->ints (v2)
For the case where we convert a double to an int, we should
round the same as we do for floats.

This fixes GL41-CTS.gpu_shader_fp64.state_query

v2: add IROUNDD (Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-11 02:27:51 +00:00
Timothy Arceri
124c9c2b97 glsl: replace unreachable code path with assert
The lower_named_interface_blocks() pass is called before we try
assign locations to varyings so this shouldn't be reachable.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-11 09:24:05 +11:00
Timothy Arceri
cf757f48ea Revert "glsl: replace unreachable code path with assert"
This reverts commit 98270fd20d.

Something went terribly wrong the commit is not what the commit
message says.
2016-01-11 09:20:39 +11:00
Timothy Arceri
98270fd20d glsl: replace unreachable code path with assert
The lower_named_interface_blocks() pass is called before we try
assign locations to varyings so this shouldn't be reachable.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-11 09:18:51 +11:00
Timothy Arceri
e4c5ace6a9 glsl: combine if blocks
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-11 09:18:45 +11:00
Rhys Kidd
7b4f8c827d mesa: Update todo regarding StencilOp and StencilOpSeparate.
OpenGL 2.0 function StencilOp() is in part internally implemented via
StencilOpSeparate(). This change happened some time ago, however the
accompanying doxygen todo comment was not accordingly updated.

Replace the outdated portion of this doxygen todo comment, leaving the
remainder unchanged.

Also better respect the 80 character suggested line length in this file.

v2: Fully remove comment, following code review by t_arceri@yahoo.com.au

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-11 09:10:17 +11:00
Kenneth Graunke
5e3edd4b28 glsl: Make bitfield_insert/extract and bfi/bfm non-vectorizable.
Currently, opt_vectorize() tries to combine:

    result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
    result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
    result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
    result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);

into a single ir_quadop_bitfield_insert opcode, which operates on
ivec4s.  However, GLSL IR's opcodes currently require the bits and
offset parameters to be scalar integers.  So, this breaks.

We want to be able to vectorize this eventually, but for now, just
chicken out and make opt_vectorize() bail by marking all the bitfield
insert/extract related opcodes as horizontal.  This is a relatively
uncommon case today, so we'll do the simple fix for stable branches,
and fix it properly on master.

Fixes assertion failures when compiling Shadow of Mordor vertex shaders
on i965 in vec4 mode (where OptimizeForAOS enables opt_vectorize()).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-09 15:46:37 -08:00
Pierre Moreau
0d3051f75a nv50/ir: Fix scratch allocation size and file
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-09 12:58:21 -05:00
Kristian Høgsberg Kristensen
a9c0e8f00f vk: Handle uninitialized FS inputs and gl_PrimitiveID
These show up as varying_to_slot[attr] == -1. Instead of storing -1 - 2
in swiz.Attribute[input_index].SourceAttribute, handle it correctly.
2016-01-09 01:03:20 -08:00
Kristian Høgsberg Kristensen
b538ec5409 vk: Support reseting timestamp query pools 2016-01-09 00:51:50 -08:00
Kristian Høgsberg Kristensen
925ad84700 vk: Advertise number of timestamp bits
We have 36 bits.
2016-01-09 00:51:14 -08:00
Kristian Høgsberg Kristensen
dae800daa8 vk: Expose correct timestampPeriod for SKL
Skylake uses 83.333ms per tick.
2016-01-09 00:50:04 -08:00
Kristian Høgsberg Kristensen
ec8e261208 vk: Mark VkEvent and VkSemaphore as done 2016-01-09 00:48:41 -08:00
Kristian Høgsberg Kristensen
bbb2a85c81 vk: Assert on use of uninitialized surface state
This exposes a case where we want to anv_CmdCopyBufferToImage() on an
image that wasn't created with VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT and
end up using uninitialized color_rt_surface_state from the meta image
view.
2016-01-08 23:51:11 -08:00
Kristian Høgsberg Kristensen
a8cdef3dce vk: Only begin subpass if we're continuing a render pass
If VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT is not set in
pBeginInfo->flags, we don't have a render pass or framebuffer. Change
the condition that guard looking up render pass and framebuffer to test
for VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT instead of
VK_COMMAND_BUFFER_LEVEL_SECONDARY.

Fixes all remaining crashes in dEQP-VK.api.command_buffers.*.
2016-01-08 23:02:46 -08:00
Kristian Høgsberg Kristensen
7c5e1fd998 vk: Remove unsupported warnings for Skylake and Broxton
These are working as well as Broadwell and Cherryiew. The recent merge
from mesa master brings in Kabylake device info and that should be all
we need to enable that.
2016-01-08 22:29:06 -08:00
Kristian Høgsberg Kristensen
f0993f81c7 Merge ../mesa into vulkan 2016-01-08 22:16:43 -08:00
Jason Ekstrand
cfdc955fd5 anv/reloc_list: Make valgrind explicitly check relocation data 2016-01-08 16:44:54 -08:00
Nicolai Hähnle
da5d4583e5 mesa: merge bind_atomic_buffers_{base|range}
Reduced code duplication should make the code more maintainable.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 19:37:38 -05:00
Nicolai Hähnle
5eb104d6ab mesa: merge bind_shader_storage_buffers_{base|range}
Reduced code duplication should make the code more maintainable.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 19:37:38 -05:00
Nicolai Hähnle
e8dd7cc303 mesa: merge bind_uniform_buffers_{base|range}
Reduced code duplication should make the code more maintainable.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 19:37:37 -05:00
Nicolai Hähnle
b3ca26cded mesa: merge bind_xfb_buffers_{base|range}
Reduced code duplication should make the code more maintainable.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 19:37:37 -05:00
Kristian Høgsberg Kristensen
81f7fd3c54 glsl: Don't add nir files to libglsl_la_SOURCES
SCons doesn't understand nir yet and doesn't want to compile the glsl to
nir pass. Move the files to their own variable so we can add it only for
automake.

Tested-by: Brian Paul <brianp@vmware.com>
2016-01-08 16:15:49 -08:00
Jason Ekstrand
7a1c4a0ccc nir/spirv: Add matrix determinants and inverses 2016-01-08 16:02:30 -08:00
Ilia Mirkin
e3706a7118 nv50,nvc0: use a face sysval to avoid the useless back-and-forth conversion
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-08 17:40:52 -05:00
Kristian Høgsberg Kristensen
82ad571abf glsl: Move _mesa_shader_stage_to_string/abbrev to shader_enums.c
These are used by code that doesn't necessarily link to libglsl.la. Move
them to shader_enums.[ch] where we keep similar helpers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-08 14:26:20 -08:00
Kristian Høgsberg Kristensen
1d25ef6ae7 i965: Move GLSL lowering passes out of libi965_compiler.la
The scope of libi965_compiler.la is to be able to take nir shaders and
generate i965 EU code.  As such, we don't want the GLSL IR lowering
passes in the library. With this change, libi965_compiler.la no longer
needs to link to libglsl.la.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-08 14:26:16 -08:00
Kristian Høgsberg Kristensen
e97caba1f6 glsl: Move glsl_to_nir files to LIBGLSL_FILES
libglsl_la_SOURCES includes both NIR_FILES and LIBGLSL_FILES, so for
libglsl.la consumers, this is a no-op. libnir.la however no longer uses
any GLSL IR infrastructure and can be used without also linking to
libglsl.la.

Acked-by: Matt Turner <mattst88@gmail.com>
2016-01-08 14:26:12 -08:00
Jordan Justen
1d54ac6c9f mesa: Use separate indices for UBO & SSBO during binding
Previously we were treating the binding index for Uniform Buffer
Objects and Shader Storage Buffer Objects as being part of the
combined BufferInterfaceBlocks array.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93322
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-08 13:11:31 -08:00
Jordan Justen
cf66a8ffb7 mesa: Map program UBOs and SSBOs to Interface Blocks
v2:
 * Fill UboInterfaceBlockIndex and SsboInterfaceBlockIndex in
   split_ubos_and_ssbos (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-08 13:10:28 -08:00
Jordan Justen
c7f6e42a7d anv: Increate dynamic pool block size from 2k to 16k
This is needed because compute push constant data is replicated per
invocation. For gen7, this can be up to 64. With a push constant data
max of 128 bytes, this is 8k of data. We need additional space for
local-id payloads, so we are going with 16k for now.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-08 13:03:30 -08:00
Jason Ekstrand
4e15d26e47 nir/spirv: Fix a small bug in row-major matrix loading 2016-01-08 12:27:25 -08:00
Sarah Sharp
5d349fab46 mesa: docs: Add link to planet.freedesktop.org
The freedesktop.org blog feeds aren't mentioned on either mesa3d.org or
any of the graphics project wikis (including the DRI wiki) on
freedeskop.org.  Fix that by linking to it from the sidebar.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 12:18:12 -08:00
Ilia Mirkin
dff1caccac freedreno: add ir3_compiler to gitignore
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-08 15:16:37 -05:00
Ilia Mirkin
90ba06618e gallium: add a RESQ opcode to query info about a resource
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:33 -05:00
Ilia Mirkin
ebfb5446c7 gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:33 -05:00
Ilia Mirkin
266d001261 gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:33 -05:00
Ilia Mirkin
8cb493acc7 tgsi: update atomic op docs
Specify that the operation only applies to the x component, not
per-component as previously specified. This is unnecessary for GL and
creates additional complications for images which need to support these
operations as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:33 -05:00
Ilia Mirkin
bdef02ff26 tgsi: add a is_store property
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:33 -05:00
Ilia Mirkin
50b8488926 tgsi: provide a way to encode memory qualifiers for SSBO
Each load/store on most hardware can specify what caching to do. Since
SSBO allows individual variables to also have separate caching modes,
allow loads/stores to have the qualifiers instead of attempting to
encode them in declarations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:32 -05:00
Ilia Mirkin
888ddd632d ureg: add buffer support to ureg
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:32 -05:00
Ilia Mirkin
8cc9a8aa2a tgsi: add ureg support for image decls
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-08 15:10:32 -05:00
Jose Fonseca
208bfc493d glsl: Ensure 64bits shift is used.
I believe that `1u << x`, where x >= 32 yields undefined results
according to the C standard.

Particularly MSVC says `warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)`.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:06:59 +00:00
Jose Fonseca
e378184d9c mesa/main: Avoid void function returning a value warning.
Trivial.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:06:59 +00:00
Oded Gabbay
6613042c4e configure.ac: add --enable-profile
For profiling mesa's code, especially llvmpipe, PROFILE should be
defined. Currently, this define can only be generated if mesa is
built using scons.
This patch makes it possible to generate this define also when building
mesa through automake tools.

v2:

- Change --enable-llvmpipe-profile to --enable-profile
- Add -fno-omit-frame-pointer to CFLAGS and CXXFLAGS when enabling profile

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-08 21:59:47 +02:00
Jason Ekstrand
fe2f44f2a4 nir/spirv: Use create_ssa_value for block_load_store 2016-01-08 11:50:34 -08:00
Jason Ekstrand
8b9dfb4b6d nir/spirv: Add real support for outer products 2016-01-08 11:38:59 -08:00
Jason Ekstrand
927ef0ea4e nir/spirv: Add support for add, subtract, and negate on matrices 2016-01-08 11:26:43 -08:00
Jason Ekstrand
393562f47b nir/spirv: Split ALU operations out into their own file 2016-01-08 11:26:43 -08:00
Marek Olšák
1e463d20ba nine: allow fragment shader POSITION and FACE to be system values
Reported-by: Axel Davy <axel.davy@ens.fr>
2016-01-08 20:07:16 +01:00
Marek Olšák
d0cf66d835 vl: allow fragment shader POSITION to be a system value
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:07:16 +01:00
Marek Olšák
69f43c2cc9 util/pstipple: allow fragment shader POSITION to be a system value
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:07:16 +01:00
Marek Olšák
8a13ce14fd st/mesa: add support for POSITION and FACE system values
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:07:15 +01:00
Marek Olšák
c00e534283 tgsi/scan: update for POSITION and FACE sytem values
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:07:15 +01:00
Marek Olšák
34738a92de gallium: add caps for POSITION and FACE system values
v2: document the integer behavior

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:07:15 +01:00
Marek Olšák
24737f2298 program: add a helper for rewriting FP position input to sysval
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:06:23 +01:00
Marek Olšák
4191c1a57c glsl: optionally declare gl_FragCoord & gl_FrontFacing as system values
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 20:06:23 +01:00
Marek Olšák
c07cf5f5a9 tgsi/ureg: handle redundant declarations in ureg_DECL_system_value
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-08 20:06:22 +01:00
Marek Olšák
c886422656 tgsi/ureg: remove index parameter from ureg_DECL_system_value
It can be trivially derived from the number of already declared system
values. This allows ureg users not to worry about which index to choose.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-08 20:06:22 +01:00
Marek Olšák
91e8f2b0a5 st/mesa: remove dead code from mesa_to_tgsi
These aren't part of ARB_fragment_program.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-08 20:06:22 +01:00
Edward O'Callaghan
cb513485a0 radeon, si: Use TGSI chan name defines in lp_build_emit_fetch() calls
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-08 12:18:36 -05:00
Edward O'Callaghan
b42254eff3 gallium/aux: Use TGSI chan name defines inplace of literals
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-08 12:18:24 -05:00
Nicolai Hähnle
d6db7ceedf mesa: check that internalformat of CopyTexImage*D is not 1, 2, 3, 4
The piglit copyteximage check has recently been augmented to test this, but
apparently it hasn't been fixed in Mesa so far.

This language also already appears in the OpenGL 2.1 spec (Ian).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-08 10:58:27 -05:00
Jason Ekstrand
72bff62e7f nir/spirv: Add support for SSBO atomics 2016-01-07 22:13:46 -08:00
Jason Ekstrand
fe57ad62a6 nir/spirv: Rework UBOs and SSBOs
This completely reworks all block load/store operations.  In particular, it
should get row-major matrices working.
2016-01-07 22:13:46 -08:00
Chad Versace
1818463733 anv/gen9: Fix cube surface state
For gen9 SURFTYPE_CUBE, the RENDER_SURFACE_STATE's Depth,
MinimumArrayElement, and RenderTargetViewExtent is in units of full
cubes and so must be divided by 6.

Fixes 'dEQP-VK.pipeline.image.view_type.cube_array.cube_array.*'.

Now all of 'dEQP-VK.pipeline.image.*' passes.
2016-01-07 17:20:25 -08:00
Chad Versace
24d82a3f79 anv/gen8: Refactor genX_image_view_init()
Drop the temporary variables for RENDER_SURFACE_STATE's Depth and
RenderTargetViewExtent. Instead, assign them in-place.

This simplifies the next commit, which fixes gen9 cube surfaces.
2016-01-07 17:20:25 -08:00
Kristian Høgsberg Kristensen
1b1dca75a4 vk: Make sure we emit binding table pointers after push constants
SKL needs this to make sure we flush the push constants. It gets a
little tricky, since we also need to emit binding tables before push
constants, since that may affect the push constants (dynamic buffer
offsets and storage image parameters).  This patch splits emitting
binding tables from emitting the pointers so that we can emit push
constants after binding tables but before emitting binding table
pointers.
2016-01-07 16:31:57 -08:00
Kristian Høgsberg Kristensen
a18b5e642c vk: Implement VK_QUERY_RESULT_WITH_AVAILABILITY_BIT 2016-01-07 16:31:57 -08:00
Kristian Høgsberg Kristensen
bbf3fc815b vk: Add missing DepthStallEnable to OQ pipe control 2016-01-07 16:31:57 -08:00
Kristian Høgsberg Kristensen
067dbd7a17 vk: Issue PIPELINE_SELECT before setting up render pass
We need to make sure we're selected the 3D pipeline before we start
setting up depth and stencil buffers.
2016-01-07 16:31:57 -08:00
Jordan Justen
d24e88b98e anv/gen7: Setup state to enable barrier() function
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 17:11:46 -08:00
Jordan Justen
36a2304686 anv/gen8: Setup state to enable barrier() function
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 17:11:46 -08:00
Jason Ekstrand
040e314143 i965/compiler: Enable more lowering in NIR
We don't need these for GLSL or ARB, but we need them for SPIR-V

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-07 16:14:42 -08:00
Jason Ekstrand
d00abcc283 nir/algebraic: Add more lowering
This commit adds lowering options for the following opcodes:

 - nir_op_fmod
 - nir_op_bitfield_insert
 - nir_op_uadd_carry
 - nir_op_usub_borrow

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-07 16:14:38 -08:00
Jason Ekstrand
b0d4ee520e nir/opcodes: Fix up uadd_carry and usub_borrow
Both were defined as returning bool but the gpu_shader5 functions are
defined to return int.  Also, we had the parameters for usub borrwo
backwards in the folding expression.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-07 16:14:25 -08:00
Ilia Mirkin
67b31b3c59 nvc0: add ARB_indirect_parameters support
I chose to make separate macros for this due to the additional
complexity and extra scratch usage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
9a54ccf30a st/mesa: expose ARB_indirect_parameters when the backend driver allows
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
e1eab5a76f mesa: add support for ARB_indirect_parameters draw functions
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
9327e2d312 mesa: add parameter buffer, used for ARB_indirect_parameters
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
b3e2c21fe5 glapi: add ARB_indirect_parameters definitions
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
7ca67c752b nvc0: add support for real ARB_multi_draw_indirect
The draw groups are now split up into groups of 32 if there's a
non-packed stride, or in groups of 400-500 if the draw data is packed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
d3e43baffe nvc0: adjust indirect draw macros to handle multiple draws at once
These are still invoked one at a time, but the underlying macro can
handle multiple draws.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
2860f20859 st/mesa: add support for new mesa indirect draw interface
This shifts all indirect draws to go through the new function. If the
driver doesn't have support for multi draws, we break those up and
perform N draws. Otherwise, we pass everything through for just a single
draw call.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
d67b9ba9a1 gallium: add caps to expose support for multi indirect draws
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
3e11656694 gallium: add sufficient draw interface to allow new indirect features
This makes it possible to support indirect multidraws as well as having
the number of such draws to come from a separate GPU resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
60d0cfd429 vbo: create a new draw function interface for indirect draws
All indirect draws are passed to the new draw function. By default
there's a fallback implementation which pipes it right back to
draw_prims, but eventually both the fallback and draw_prim's support for
indirect drawing should be removed.

This should allow a backend to properly support ARB_multi_draw_indirect
and ARB_indirect_parameters.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 18:38:45 -05:00
Roland Scheidegger
2923c7a0ed llvmpipe: do 64bit plane calculations in the sse path
The sse path was pretty much disabled for practical purposes because the
largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations.
This is actually not that difficult, though a problem is that we can't do
a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall,
the code still looks reasonable, though it's not like changes there in
setup really make much of a difference in the end...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:14 +01:00
Roland Scheidegger
fad283ba9e llvmpipe: don't store eo as 64bit int
eo, just like dcdx and dcdy, cannot overflow 32bit.
Store it as unsigned though just in case (it cannot be negative, but
in theory twice as big as dcdx or dcdy so this gives it one more bit).
This doesn't really change anything, albeit it might help minimally on
32bit archs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:14 +01:00
Roland Scheidegger
b61b9a377e llvmpipe: use aligned data for the assembly program in setup
Back in the day (before 24678700ed) the values
were not actually in a struct but even then I can't see why we didn't simply
align the values. Especially since it's trivial to do so.
(Not that it actually matters since the code is pretty much unused for now.)

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-01-08 00:34:13 +01:00
Roland Scheidegger
9db7309595 draw: initialize prim header flags when clipping lines
Otherwise, clipped lines would have undefined stippling reset bit if line
stippling is enabled.
(Untested, and I just assume copying over the bits from the original line
is actually the right thing to do.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-08 00:34:13 +01:00
Roland Scheidegger
64da11f052 draw: fix line stippling with unfilled prims
The unfilled stage was not filling in the prim header, and the line stage
then decided to reset the stipple counter or not based on the uninitialized
data. This causes some failures in conform linestipple test (albeit quite
randomly happening depending on environment).
So fill in the prim header in the unfilled stage - I am not entirely sure
if anybody really needs determinant after that stage, but there's at least
later stages (wide line for instance) which copy over the determinant as well.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:13 +01:00
Timothy Arceri
5cf156c6b4 glsl: replace null check with assert
This was added in 54f583a20 since then error handling has improved.

The test this was added to fix now fails earlier since 01822706ec

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-08 09:12:45 +11:00
Nicolai Hähnle
051603efd5 i965: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:12 -05:00
Nicolai Hähnle
1b74c02e83 i915: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:09 -05:00
Nicolai Hähnle
8882b46226 radeon: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:03 -05:00
Nicolai Hähnle
1c2187b1c2 st/mesa: use _mesa_delete_buffer_object
This is more future-proof than the current code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-07 17:06:58 -05:00
Nicolai Hähnle
6aed083b93 mesa/bufferobj: make _mesa_delete_buffer_object externally accessible
gl_buffer_object has grown more complicated and requires cleanup. Using this
function from drivers will be more future-proof.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:05:54 -05:00
Chad Versace
4c7f4c25d0 anv/meta: Fix hardcoded format size in anv_CmdCopy*
When looping through VkBufferImageCopy regions, for each region we
incremented the offset into the VkBuffer assuming the format size was 4.

Fixes CTS tests dEQP-VK.pipeline.image.view_type.cube_array.3d.* on
Skylake.
2016-01-07 13:56:58 -08:00
Oded Gabbay
f41b6cfb07 llvmpipe: use sse2 conv code for altivec
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.

This patch increase the FPS in POWER arch across the board
between 10%-25%

I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-07 22:07:02 +02:00
Chad Versace
a50c78a5cf isl: Add missing break statement in array pitch calculation
Fixes regression in ed98c374bd3f1952fbab3031afaf5ff4d178ef41.
2016-01-07 11:08:12 -08:00
Chad Versace
d1e6c1b29b isl/gen9: Fix array pitch of 3d surfaces
For tiled 3D surfaces, the array pitch must aligned to the tile height.

From the Skylake BSpec >> RENDER_SURFACE_STATE >> Surface QPitch:

   Tile Mode != Linear: This field must be set to an integer multiple of
   the tile height

Fixes CTS tests 'dEQP-VK.pipeline.image.view_type.3d.format.r8g8b8a8_unorm.*'.
Fixes Crucible tests 'func.miptree.r8g8b8a8-unorm.aspect-color.view-3d.*'.
2016-01-07 11:04:17 -08:00
Chad Versace
0af77fe5b6 isl: Refactor func isl_calc_array_pitch_sa_rows
Update the function to calculate the array pitch is *element rows*, and
it rename it accordingly to isl_calc_array_pitch_el_rows.
2016-01-07 11:04:17 -08:00
Jordan Justen
2f0a10149c isl: Assert that alignments are not 0 for isl_align
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Jordan Justen
4d68c477ad anv: Assert that alignments are not 0 for align_*
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Jordan Justen
be91f23e3b isl: Fix image alignment calculation
The previous code was resulting in an alignment of 0.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Marek Olšák
bca18057a3 radeonsi: adjust the parameters of si_shader_dump
The function will be extended to dump all binaries shaders will consist of,
so si_shader* makes sense here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
0a51b010e5 radeonsi: move si_shader_dump call out of si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
b0df5f4c19 radeonsi: inline si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
c9c031f3d0 radeonsi: move si_shader_dump call out of si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
f8b34fe093 radeonsi: separate shader dumping code to si_shader_dump and *_dump_stats
Eventually, I'd like to dump stats for several combined binaries, which is
why you don't see a binary parameter in si_shader_dump_stats

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
ccd7d7e13d radeonsi: add si_shader_destroy_binary
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
5c9f104567 radeonsi: don't pass si_shader to si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
54ed83669e radeonsi: move si_shader_binary_upload out of si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
f20a76a4fd radeonsi: always keep shader code, rodata, and relocs in memory
We won't compile shaders in draw calls, but we will concatenate shader
binaries according to states in draw calls, so keep the binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
63345cfc3a radeonsi: don't pass si_shader to si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
2d3a96448a radeonsi: don't pass si_shader to si_shader_binary_read_config
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
20b9b5d7f5 radeonsi: add struct si_shader_config
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
890873d106 radeonsi: move NULL exporting into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
a72ed2f6bc radeonsi: move MRT color exporting into a separate function
This will be used by a fragment shader epilog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
0ffe3d3772 radeonsi: use EXP_NULL for pixel shaders without outputs
This never happens currently.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
677c65968b radeonsi: only use LLVMBuildLoad once when updating color outputs at the end
without LLVMBuildStore.

So:
- do LLVMBuildLoad
- update the values as necessary
- export

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
185267a6fd radeonsi: export "undef" values for undefined PS outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
1ce659f820 radeonsi: move MRTZ export into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
5f3e6b5b0f radeonsi: simplify setting the DONE bit for PS exports
First find out what the last export is and simply set the DONE bit there.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
e00f3f23b1 radeonsi: set SPI color formats and CB_SHADER_MASK outside of compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
4e597c25c7 radeonsi: write all MRTs only if there is exactly one output
This doesn't fix a known bug, but better safe than sorry.

Also, simplify the expression in si_shader.c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
746a7a7498 radeonsi: determine SPI_SHADER_Z_FORMAT outside of shader compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
2cb8bf90cd radeonsi: determine DB_SHADER_CONTROL outside of shader compilation
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
ff7e77724e tgsi/scan: set which color components are read by a fragment shader
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
18ec76730a tgsi/scan: fix tgsi_shader_info::reads_z
This has no users in Mesa.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
f3658be108 tgsi/scan: set if a fragment shader writes sample mask
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Kenneth Graunke
3e8f644ed3 glsl: Disallow vectorization of vector_insert/extract.
vector_insert takes a vector, a scalar location, and a scalar value,
and produces a new vector with that component updated.  As such, it
can't be vectorized properly.

vector_extract takes a vector and a scalar location, and returns
that scalar component of the vector.  Vectorization doesn't really
make any sense.

Treating both as horizontal operations makes sure the vectorizer
won't try to touch these.

Found by inspection.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-06 21:22:06 -08:00
Jason Ekstrand
d8cd5e333e anv/state: Pull sampler vk-to-gen maps into genX_state_util.h 2016-01-06 19:53:45 -08:00
Jason Ekstrand
195c60deb4 nir/spirv: Wrap borrow/carry ops in b2i
NIR specifies them as booleans but SPIR-V wants ints.
2016-01-06 17:13:06 -08:00
Jason Ekstrand
000eb00862 nir/spirv/cfg: Only set fall to true at the start of a case
Previously, we were setting it to true at the top of the switch statement.
However, this causes all of the cases to get executed until you hit a
break.  Instead, you want to be not executing at the start, start executing
when you hit your case, and end at a break.
2016-01-06 17:00:55 -08:00
Roland Scheidegger
8d4039ecdb softpipe: tell draw about the vertex layout we want
This makes it more similar to llvmpipe. It also allows us to let draw emit
code handle things like getting zeros for non-existing vs outputs
automatically. There probably isn't really any overhead either way, there isn't
really any "simply copy everything" code in the emit path it would copy each
attrib individually just the same. Likewise, we still do another mapping step
in softpipe as the layout may still not match exactly (same as in llvmpipe,
should probably nuke the pointless mapping in both drivers).

This fixes the piglit arb_fragment_layer_viewport no_gs/no_write tests.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 02:00:04 +01:00
Roland Scheidegger
8e3a76791f llvmpipe: use ints not unsigned for slots
They can't actually be 0 (as position is there) but should avoid confusion.

This was supposed to have been done by af7ba989fb
but I accidentally pushed an older version of the patch in the end...
Also prettify slightly. And make some notes about the confusing and useless
fs input "map".

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:59:17 +01:00
Roland Scheidegger
2dbc20e456 draw: nuke the interp parameter from vertex_info
draw emit couldn't care less what the interpolation mode is...
This somehow looked like it would matter, all drivers more or less
dutifully filled that in correctly. But this is only used for emit,
if draw needs to know about interpolation mode (for clipping for instance)
it will get that information from the vs anyway.
softpipe actually used to depend on that interpolation parameter, as it
abused that structure quite a bit but no longer.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:58:05 +01:00
Roland Scheidegger
892e2d1395 softpipe: don't abuse the draw vertex_info struct for something different
softpipe would calculate two "vertex layouts". The second one was however
just used for internal purposes, draw would know nothing about it even though
it looked exactly the same as the other one we tell draw about.
So, store that information separately as this was just confusing.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:57:21 +01:00
Roland Scheidegger
b64d008052 softpipe: fix mapping of "special" vs outputs
Unlike llvmpipe, softpipe always tells draw to emit the vertices as-is.
The two vertex layouts it calculates are a bit confusing, one which is just
used to tell draw to emit vertices as-is, and the other which has draw written
all over it but draw is completely unaware of and is used only to look up the
correct interpolation info later in setup.
Thus, the slots used are different to what llvmpipe does (I'm going to clean
up the confusing two layout stuff).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:56:43 +01:00
Roland Scheidegger
01761a38e8 llvmpipe: scratch some special handling of vp_index/layer
It was actually slightly buggy (missing initialization / setup not dependent
on new vs albeit I didn't see issues), but the case of non-existing attributes
is now handled by draw emit code so don't need that anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:55:45 +01:00
Roland Scheidegger
afa035031f draw: rework handling of non-existing outputs in emit code
Previously the code would just redirect requests for attributes which
don't exist to use output 0. Rework this to output all zeros instead which
seems more useful - in particular some extensions like
ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by
previous stages. That way, drivers don't have to special case this depending
if the vs/gs outputs some attribute or not.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:52:39 +01:00
Jordan Justen
de65d4dcaf anv: Fix build without VALGRIND
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-06 15:54:51 -08:00
Jason Ekstrand
5bbf060ece i965/compiler: Enable more lowering in NIR
We don't need these for GLSL or ARB, but we need them for SPIR-V
2016-01-06 15:30:53 -08:00
Jason Ekstrand
573351cb0f nir/algebraic: Add more lowering
This commit adds lowering options for the following opcodes:

 - nir_op_fmod
 - nir_op_bitfield_insert
 - nir_op_uadd_carry
 - nir_op_usub_borrow
2016-01-06 15:30:53 -08:00
Jason Ekstrand
1f503603d3 nir/opcodes: Fix the folding expression for usub_borrow 2016-01-06 15:30:53 -08:00
Jason Ekstrand
22804de110 nir/spirv: Properly implement Modf 2016-01-06 15:30:53 -08:00
Jason Ekstrand
1f3593d8a1 nir/builder: Add a helper for storing to a deref 2016-01-06 15:30:53 -08:00
Sarah Sharp
39c41be50d mesa: Add KBL PCI IDs and platform information.
Add PCI IDs for the Intel Kabylake platforms.  The IDs are taken
directly from the Linux kernel patches, which are under review:

http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2

The Kabylake PCI IDs taken from the kernel are rearranged to be in order
of GT type, then PCI ID.

Please note that if this patch is backported, the following fixes will
need to be added before this patch:

commit 28ed1e08e8 "i965/skl: Remove early platform support"
commit c1e38ad370 "i965/skl: Use larger URB size where available."

Thanks to Ben for fixing a bug around setting urb.size, and being
patient with my questions about what the various fields mean.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (KBL-GT2)
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2016-01-06 15:11:00 -08:00
Sinclair Yeh
0819287f56 svga: Rename SVGA_HINT_FLAG_DRAW_EMITTED
Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH
because preemptive flush can be unblocked by more commands than
draw.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:04:45 -07:00
Sinclair Yeh
9ccc716534 svga: allow preemptive flushing on DMA, update, and readback commands
The existing code effectively turns off preemptive flushing for all
but the regions used for draws.  This turns out to be overly
restrictive as some memory regions, e.g. GMR, may never get a draw
when used as a DMA upload staging area, causing problems for apps
that upload a large amount of textures, e.g. Unigine Heaven.

This patch fixes the Unigine Heaven memory allocation error and
has been verified to not cause a regression in the previous extended
retina display issue.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:03:33 -07:00
Charmaine Lee
b074a5b02d svga: skip vertex attribute instruction with zero usage_mask
In emit_input_declarations(), we are skipping declarations for those
registers that are not being used. But in emit_vertex_attrib_instructions(),
we are still emitting instructions to tweak the vertex attributes even if
they are not being used. This causes an assert in the backend because an
input register is not declared in the shader. This patch fixes the problem
by skipping the instruction if the vertex attribute is not being used.
Changes in this patch is originated from the code snippet from Jose as
suggested in bug 1530161.

Tested with piglit, Heaven, Turbine, glretrace.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:01:38 -07:00
Brian Paul
b59fad8478 st/mesa: minor clean-ups in st_atom.c
Remove useless comment.  Reformat code.
2016-01-06 15:53:47 -07:00
Brian Paul
85444ab08b st/mesa: replace bitmap size checks with assertion
The _mesa_Bitmap() caller already checks for zero-sized bitmaps.
2016-01-06 15:53:47 -07:00
Brian Paul
18038b9fd6 st/mesa: check texture target in allocate_full_mipmap()
Some kinds of textures never have mipmaps.  3D textures seldom have
mipmaps.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:47 -07:00
Brian Paul
c032ae85ee st/mesa: move mipmap allocation check logic into a function
Better readability and easier to extend.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
0d39b5fc3b main: s/GLuint/GLbitfield for state bitmasks
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c81ddc2092 vbo: s/GLuint/GLbitfield/ for state bitmasks
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
3c0521cd0f st/mesa: use GLbitfield in st_state_flags, add comments
Use GLbitfield instead of GLuint to be consistent with other variables.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
4cd1bd46ed s/GLuint/GLbitfield/ for st_invalidate_state() parameter
To match dd_function_table::UpdateState().

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
2cc52801c0 st/mesa: be more careful about state validation in st_Bitmap()
If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can
skip state validation before drawing a bitmap since that state doesn't
effect bitmap rendering.

This further increases the performance of the ipers demo on llvmpipe
to about what it was before commit 36c93a6fae.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
b6bcf08641 st/mesa: move bitmap cache flushing out of state validation
Just do it where needed (before drawing, clearing, etc).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c28d72a347 st/mesa: check state->mesa in early return check in st_validate_state()
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache.  And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.

This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit commit 36c93a6fae.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c75d00e054 st/mesa: protect debug printf() with a conditional instead of comment 2016-01-06 15:53:46 -07:00
Brian Paul
72d6bbca5b st/mesa: fix comment indentation in st_flush_bitmap_cache() 2016-01-06 15:53:46 -07:00
Timothy Arceri
e58be8ac0e glsl: fix varying slot allocation for blocks and structs with explicit locations
Previously each member was being counted as using a single slot,
count_attribute_slots() fixes the count for array and struct members.

Also don't assign a negitive to the unsigned expl_location variable.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 09:44:32 +11:00
Timothy Arceri
47dde2bd45 glsl: don't try adding built-ins to explicit locations bitmask
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:26 +11:00
Timothy Arceri
ac6e2c2056 glsl: fix overlapping of varying locations for arrays and structs
Previously we were only reserving a single location for arrays and
structs.

We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.

This patch fixes both issues.

V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.

Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:20 +11:00
Timothy Arceri
5907a02ab6 glsl: create helper to remove outer vertex index array used by some stages
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:16 +11:00
Timothy Arceri
30991d7389 glsl: remove unused varyings before packing them
Previously we would pack varyings before trying to remove them, this
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.

V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:12 +11:00
Krzysztof Sobiecki
0d7477a289 gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP
ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this problems.

Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-06 16:09:12 -05:00
Eric Anholt
bbd29f1375 vc4: Fix driver build from last minute rebase fix.
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
2016-01-06 12:49:45 -08:00
Eric Anholt
25aa436e86 vc4: Optimize out a comparison for bcsel based on an ALU comparison
We routinely have code like:

	vec1 ssa_220 = fge ssa_104, ssa_61
	vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105

and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.

total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs:     9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs:     17232 -> 16717 (-2.99%)
2016-01-06 12:43:09 -08:00
Eric Anholt
7a9eb76786 vc4: Add missing sRGB decode to texel fetches.
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
2016-01-06 12:43:09 -08:00
Eric Anholt
f01ca9eeda vc4: Add support for GL_ARB_texture_swizzle.
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
2016-01-06 12:43:09 -08:00
Eric Anholt
12519a972f vc4: Use NIR texture lowering for texture swizzling.
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.

This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.

instructions in affected programs:     165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs:     461 -> 451 (-2.17%)
2016-01-06 12:43:08 -08:00
Eric Anholt
71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
Eric Anholt
0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Chad Versace
8284786c5d anv/gen9: Teach gen9_image_view_init() about 1D surface qpitch
QPitch is usually expressed as rows of surface elements (where a surface
element is an compression block or a single surface sample. Skylake 1D
is an outlier; there QPitch is expressed as individual surface
elements.
2016-01-06 09:38:57 -08:00
Chad Versace
e05b307942 isl: Add isl_surf_get_array_pitch_el()
Will be needed to program SurfaceQPitch for Skylake 1D arrays.
2016-01-06 09:38:57 -08:00
Chad Versace
c1e890541e isl/gen9: Support ISL_DIM_LAYOUT_GEN9_1D 2016-01-06 09:38:57 -08:00
Chad Versace
eea2d4d059 isl: Don't align phys_slice0_sa.width twice
It's already aligned to the format's block width. Don't align it again
in isl_calc_row_pitch().
2016-01-06 09:38:57 -08:00
Chad Versace
39d043f94a isl: Fix the documented units of isl_surf::row_pitch
It's the pitch between surface elements, not between surface samples.
2016-01-06 09:38:57 -08:00
Chad Versace
dcb9c11dc7 anv/gen9: Fix oob lookup of surface halign, valign
For 1D surfaces and for surfaces with Yf or Ys tiling, the hardware
ignores SurfaceVerticalAlignment and SurfaceHorizontalAlignment.
Moreover, the anv_halign[] and anv_valign[] lookup tables may not even
contain the surface's actual alignment values. So don't do the lookup
for those surfaces.
2016-01-06 09:38:57 -08:00
Chad Versace
94566d9b68 anv/meta: Teach meta how to blit from a 1D image
Meta needed a VkShader with a 1D sampler type.
2016-01-06 09:38:57 -08:00
Edward O'Callaghan
1953cee6d7 gallium/drivers/svga: Use unsigned for loop index
Fix a 's/unsigned int/unsigned/' consistency case while here.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
8e2a8ec731 gallium/drivers/r600: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
76a7d6f412 gallium/drivers/ilo: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
5071c192cc gallium: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
bfabd5e74a gallium/drivers: Remove unnecessary semicolons
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
67d4b4b28c gallium: Remove unnecessary semicolons
Fix silly issue with MSVC case fall-though support to need
a extra 'break;'

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Oded Gabbay
9d59b9d00c llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
 Name            Before     After    Delta
------------------------------------------------
openarena        16.35      16.7     2.14%
xonotic          4.707      4.97     5.57%

glmark2 didn't show a significant (more than 1%) difference.

v2: Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
925c46cfc4 llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   139.8      142.7    2.07%

openarena and xonotic didn't show a significant (more than 1%)
difference.

v2: Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
3bbe16ea79 llvmpipe: Optimize do_triangle_ccw for POWER8
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   136.6      139.8    2.34%
openarena         16.14      16.35    1.30%
xonotic           4.655      4.707    1.11%

v2:

- Convert loads to use aligned loads
- Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
e99555ef0b llvmpipe: add POWER8 portability file - u_pwr8.h
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.

All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from SSE-based function, one needs to simply
replace the "_mm_" prefix of the SSE function being called to "vec_".

Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, when doing such
conversion hurt the performance, I preferred to implement a more ad-hoc
solution. For example, converting the _mm_shuffle_epi32 needed to be done
using ad-hoc masks instead of a generic function.

All the functions in this file support both little-endian and big-endian
but currently the file is build only on POWER8 LE machine.

All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).

v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
afe88f66a8 configure.ac: Detect if running on POWER8 arch
To determine if we could use special POWER8 assembly directives, we first
need to detect whether we are running on POWER8 architecture. This patch
adds this detection to configure.ac and adds the necessary compilation
flags accordingly.

v2:

- Add option to disable POWER8 instructions generation
- Detect whether building on BE or LE machine and build with
  -mpower8-vector only on LE machine
- Make the printed messages more standard

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Kenneth Graunke
7295f4fcc2 nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.
The nir_opt_algebraic rule

(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),

can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv.  (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point.  So, make NIR do it for us.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-05 19:22:11 -08:00
Kenneth Graunke
bd21b54607 i965: Only turn on ARB_compute_shader if we can write registers.
Compute shaders require reconfiguring the L3 for shared local memory
support.  We have to be able to write the L3 registers to do that.

This effectively turns off compute shaders prior to Kernel 4.2.

(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly.  I've moved it out in this patch.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-01-05 18:07:27 -08:00
Kenneth Graunke
25b7e4a01f i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.
That's what it's for.  Plus, we actually implement rcp.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-05 18:07:27 -08:00
Timothy Arceri
3d402d4450 mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders
This fixes some piglit subtests for ARB_program_interface_query.

V3: remove some of the unnecessary parentheses
V2: fix alignment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-06 12:01:09 +11:00
Jason Ekstrand
7a069bea5d nir/spirv: Fix switch statements with duplicate cases 2016-01-05 16:18:01 -08:00
Jason Ekstrand
506a467f16 nir/spirv/cfg: Assert that blocks only ever get added once
This effectively prevents infinite loops in cfg_walk_blocks.
2016-01-05 15:56:59 -08:00
Timothy Arceri
e1e1b67878 glsl: don't change the varying type in validation code
There is a function dedicated to demoting unused varyings lets
trust it to do its job.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:58 +11:00
Timothy Arceri
21590a307c glsl: move lowering after matching validation
After lowering the matching flag is_unmatched_generic_inout is lost so
we need to move this validation before lowering.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:54 +11:00
Timothy Arceri
0508d9504a glsl: only add outward facing varyings to resourse list for SSO
An SSO program can have multiple stages and we only want to add the externally
facing varyings. The current code was adding both the packed inputs and outputs
for the first and last stage of each program.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-06 10:52:48 +11:00
Jason Ekstrand
71a25a0b07 nir/spirv: Simplify phi node handling
Instead of trying to crawl through predecessor chains and build phi nodes,
we just do a poor-man's out-of-ssa on the spot.  The into-SSA pass will
deal with putting the actual phi nodes in for us.
2016-01-05 14:59:40 -08:00
Anuj Phogat
4d2a7f5111 i965/gen9: Modify the conditions to use blitter on skl+
Conditions modified allow skl+ to use blitter:
 - for all tiling formats
 - to write data to YF/YS tiled surfaces

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
0bf037c0fe i965/gen9: Return false in place of assert in intelEmitCopyBlit()
This allows the fallback paths to handle it correctly.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
5cbe01c83f i965/gen9: Remove regions overlap check in fast copy blit
Overlapping blits are anyway undefined in OpenGL. So no need
of overlap check here.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Anuj Phogat
3c8b97a45b i965/gen9: Don't use fast copy blit in case of non power of 2 cpp
Fast copy blit is currently enabled for use only with Yf/Ys tiling.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-05 13:43:32 -08:00
Jason Ekstrand
ec899f6b42 anv/pipeline: Lower indirect temporaries and inputs 2016-01-05 13:42:52 -08:00
Jason Ekstrand
bff45dc44e nir: Add an indirect deref lowering pass 2016-01-05 13:42:52 -08:00
Ian Romanick
ee4676aa57 i915/i965: Fix typo in perf_debug message
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-05 13:18:45 -08:00
Brian Paul
a13e9adbee st/mesa: minor indentation fixes 2016-01-05 13:04:46 -07:00
Kristian Høgsberg Kristensen
30521fb19e vk: Implement a basic pipeline cache
This is not really a cache yet, but it allows us to share one state
stream for all pipelines, which means we can bump the block size without
wasting a lot of memory.
2016-01-05 12:03:21 -08:00
Kristian Høgsberg Kristensen
f551047751 vk: Destroy device->mutex when destroying the device 2016-01-05 12:03:21 -08:00
Brian Paul
f4caa7d2fc draw: minor indentation fix 2016-01-05 13:03:05 -07:00
Brian Paul
dce1e1a8eb mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:05 -07:00
Brian Paul
95d412181d util: add debug_dump_ubyte_rgba_bmp()
Like debug_dump_float_rgba_bmp() but takes ubyte values.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
f04d7439a0 mesa: check for z=0 in _mesa_Vertex3dv()
It's very rare that a GL app calls glVertex3dv(), but one in particular
calls it lot, always with Z = 0.  Check for that condition and convert
the call into glVertex2f.  This reduces VBO memory used and reduces
the number of times we have to switch between float[2] and float[3]
vertex formats in the svga driver.  This results in a small but
measurable performance improvement.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
eec8d7e7e0 svga: fix test for SVGA_NEW_STIPPLE
We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon
stipple state changes.  Before, we only set the flag when we were
enabling stipple, but not disabling.

We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state
set since it's a subset of SVGA_NEW_RAST, but let's be explicit.

This doesn't fix any known bugs.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
993b04ee2c svga: add some comments in svga_state_vs.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
fc07658895 svga: change svga_hw_view_state::dirty to boolean
Since it's a true/false value.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
077aa3be93 svga: avoid emitting redundant SetVertexBuffers() commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
b11bd20889 svga: check for no-ops in svga_bind_sampler_states()
and svga_set_sampler_views().  If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Chad Versace
8d6f0a1b80 isl: Don't force linear for 1d surfaces in gen7_filter_tiling()
gen7_filter_tiling() should filter out only tiling flags that are
incompatible with the surface. It shouldn't make performance decisions,
such as forcing linear for 1D; that's the role of the caller.
2016-01-05 11:37:32 -08:00
Chad Versace
8135786605 isl: Document gen7_filter_tiling() 2016-01-05 11:35:13 -08:00
Chad Versace
33f06842be isl: Prefer linear tiling for 1D surfaces 2016-01-05 11:35:13 -08:00
Chad Versace
98af1cc6d7 isl: Remove isl_format_layout::bpb
struct isl_format_layout contained two near-redundant members: bpb (bits
per block) and bs (block size). There do exist some hardware formats for
which bpb != 8 * bs, but Vulkan does not use them. Therefore we don't
need bpb.
2016-01-05 10:00:39 -08:00
Chad Versace
89b68dc8d0 anv: Use isl_format_layout::bs instead of ::bpb
For all formats used by Vulkan, 8 * bs == bpb.
(bs=block_size_in_bytes, bpb=bits_per_block)
2016-01-05 10:00:39 -08:00
Chad Versace
a1d64ae561 isl: Align isl_surf::phys_level0_sa to the format's compression block 2016-01-05 09:52:07 -08:00
Chad Versace
2172f0e9bb isl: Fix mis-documented units of isl_surf::phys_level_sa
It's in physical surface samples. Hence the _sa suffix.
2016-01-05 09:52:07 -08:00
Ilia Mirkin
6531ccb705 i965: quieten compiler warning about out-of-bounds access
gcc 4.9.3 shows the following error:

brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
    return brw_names[slot - VARYING_SLOT_MAX];

This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-05 12:07:53 -05:00
Julien Isorce
777d1453f1 build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach strcmp(driver_name, "nouveau") code.
Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch define the macro the same it is done for dri and
vdpau targets.

Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11

LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple                  :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG4Simple            :	VAEntrypointVLD
      VAProfileMPEG4AdvancedSimple    :	VAEntrypointVLD
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileH264Baseline           :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce
abb30b9c8b nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce
7ba27f60f7 nouveau: split nouveau_vp3_bsp in begin/next/end
It allows to call nouveau_vp3_bsp_next multiple times
between one begin/end.

It is required to support st/va.

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
[imirkin: create strparm_bsp function, simplified w0 calculation]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
Julien Isorce
851e7e12aa st/va: count number of slices
The counter was not set but used by the nouveau driver.
It is required otherwise visual output is garbage.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
2016-01-05 15:02:47 +00:00
Ilia Mirkin
14f21f53d5 i965/wm: use binding size for ubo/ssbo when automatic size is unset
This fixes the same tests that commit 8cf2e892f was attempting to fix:

ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize

as confirmed by Samuel.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-05 01:30:09 -05:00
Ilia Mirkin
a1d664a0b7 Revert "i965/wm: use proper API buffer size for the surfaces."
This reverts commit 8cf2e892fc. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-05 01:29:49 -05:00
Jason Ekstrand
8b403d599b nir/spirv: Add support for the ControlBarrier instruction 2016-01-04 22:08:24 -08:00
Jason Ekstrand
ba7b5edc26 anv/UpdateDescriptorSets: Use the correct index for the buffer view 2016-01-04 21:36:11 -08:00
Jason Ekstrand
b8f0bea07a nir/spirv: Implement extended add, sub, and mul 2016-01-04 20:59:16 -08:00
Jason Ekstrand
3a3c4aecf1 nir/spirv: Add support for bitfield operations 2016-01-04 17:37:10 -08:00
Jason Ekstrand
01ba96e059 nir/spirv: Add support for msb/lsb opcodes 2016-01-04 17:37:10 -08:00
Jason Ekstrand
f32370a536 nir/spirv: Add a documenting assert for OpConstantSampler 2016-01-04 17:37:10 -08:00
Jason Ekstrand
0309199802 nir/spirv: Add initial support for ConstantNull 2016-01-04 17:37:10 -08:00
Chad Versace
8cc21d3aea isl: Align single-level 2D surfaces to compression block
This fixes an assertion failure at isl.c:1003.

Reported-by: Nanley Chery <nanley.g.chery@intel.com>
2016-01-04 16:48:58 -08:00
Jason Ekstrand
151694228d anv/formats: Hand out different formats based on tiled vs. linear 2016-01-04 16:08:05 -08:00
Jason Ekstrand
f665fdf0e7 anv/image_view: Separate vulkan and isl formats
Previously, anv_image_view had a anv_format pointer that we used for
everything.  This commit replaces that pointer with a VkFormat enum copied
from the API and an isl_format.  In order to implement RGB formats, we have
to use a different isl_format for the actual surface state than the obvious
one from the VkFormat.  Separating the two helps us keep things streight.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
ceb05131da anv_get_isl_format: Support depth+stencil aspect value
You just get the depth format in this case.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
a7cc12910d anv/image: Do more work in anv_image_view_init
There was a bunch of common code in gen7/8_image_view_init that we really
should be sharing.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
87dd59e578 anv/formats: Rework GetPhysicalDeviceFormatProperties
It now calls get_isl_format to get both linear and tiled views of the
format and determines linear/tiled properties from that.  Buffer properties
are determined from the linear format.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
2712c0cca3 anv/formats: Add a tiling parameter to get_isl_format
Currently, this parameter does nothing.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
603a3a9439 isl/format: Add some helpers for working with RGB formats 2016-01-04 16:08:05 -08:00
Jason Ekstrand
0639f44d0f isl: Add a file for format helpers 2016-01-04 16:08:05 -08:00
Jason Ekstrand
5f5fc23e7c genX/state: Pull some generic helpers into a shared header 2016-01-04 16:08:05 -08:00
Jason Ekstrand
ad9ff4f2b2 meta/blit: Rework how format and aspect choices are made
This commit does two things.  First, it introduces choose_* functions for
chosing formats and aspects.  Second, it changes the copy (not blit) code
to use appropreately sized UINT formats for everything except depth.  There
are two main reasons for this:  First, it means that compressed and other
non-renderable texture upload should "just work" because it won't be
tripping over non-renderable formats.  Second, it allows us to easly copy
an RGB buffer to and from an RGBX image because the formats will get
switched over to their UINT variants and the shader will deal with the
extra channel for us.
2016-01-04 16:08:05 -08:00
Jason Ekstrand
3200a81a55 anv/image: Add a vk_format field
We've been trying to move away from anv_format for a while and this should
help with the transition.  There are cases (mostly in meta) where we need
the original format for the image and not the isl_format.  These will be
moved over to the new vk_format and everythign else will use the isl_format
from the particular anv_surface.
2016-01-04 16:08:05 -08:00
Nicolai Hähnle
2123bfcc9c st/mesa: make KHR_debug output independent of context creation flags (v2)
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.

v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-04 18:40:49 -05:00
Chad Versace
0d7614dce6 isl: Document mnemonic in Yf and Ys tiling
The 'f' means "four K". The 's' means "sixty-four K".
2016-01-04 15:37:39 -08:00
Kristian Høgsberg Kristensen
0f34a4ec4e isl: Use isl_align_npot for row_pitch
Many formats are not power-of-two bytes per pixels and we need the
non-power-of-two align macro here.

This reverts the revert from 4f9a211b, but keeps the change from a827b553
that fixed the yuv if-else mix-up.
2016-01-04 10:53:47 -08:00
Kristian Høgsberg Kristensen
abc1c9878f vk: Don't leak pipeline if initialization fails 2016-01-04 10:42:50 -08:00
Kristian Høgsberg Kristensen
fca1c08e34 vk: Allocate subpass attachment in one big block
This avoids making a lot of small allocations and handles allocation
failure correctly.

Fixes dEQP-VK.api.object_management.alloc_callback_fail.* failures.
2016-01-04 10:07:10 -08:00
Kristian Høgsberg Kristensen
5526c1782a vk: Handle allocation failures in meta init paths
Fixes dEQP-VK.api.object_management.alloc_callback_fail.* failures.
2016-01-04 10:07:08 -08:00
Kristian Høgsberg Kristensen
b2ad2a20b6 vk: Handle allocation failure in anv_pipeline_init()
Fixes dEQP-VK.api.object_management.alloc_callback_fail.* failures.
2016-01-04 10:06:50 -08:00
Kristian Høgsberg Kristensen
3954594eb4 vk: Call vk_error when we generate a VK_ERROR 2016-01-04 10:02:50 -08:00
Kristian Høgsberg Kristensen
75e01c8b2d vk: Only finish wayland wsi if we created it
Failure during instance creation will leave instance->wayland_wsi
undefined. When we then try to clean that up we crash. Set
instance->wayland_wsi to NULL on failure and only clean it up if it's
non-NULL.

Fixes part of dEQP-VK.api.object_management.alloc_callback_fail.*
2016-01-04 10:02:50 -08:00
Chad Versace
05c22f2d74 isl: Fix row pitch for linear buffers
isl always aligned the row pitch to the surface's image alignment.  This
was sometimes wrong when the surface backed a VkBuffer. For a VkBuffer,
the surface's row pitch is set by VkBufferImageCopy::bufferRowLength,
whose required alignment is only that of the VkFormat.

In particular, VkBuffer rows are packed in many dEQP and Crucible tests.
And packed rows are rarely aligned to the surface's image alignment.

Fixes: dEQP-VK.pipeline.image.view_type.2d.format.r8g8b8a8_unorm.size.13x13
2016-01-04 09:57:25 -08:00
Chad Versace
a827b553d9 isl: Fix swapped if-else in isl_calc_row_pitch
The YUV case was applied to non-YUV formats. Oops.
2016-01-04 09:57:23 -08:00
Ilia Mirkin
b16c9be4a5 nvc0: scale up inter_bo size so that it's 16M for a 4K video
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-04 11:32:45 -05:00
Ilia Mirkin
b5f2f7073f nv50,nvc0: fix crash when increasing bsp bo size for h264
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-04 11:32:45 -05:00
Samuel Iglesias Gonsálvez
8cf2e892fc i965/wm: use proper API buffer size for the surfaces.
Commit 5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. Hovewer it picked the wrong value.

This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:

ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-04 07:52:24 +01:00
Marek Olšák
86fa48426c radeonsi: remove unused parameter from si_shader_binary_read_config
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
b6d95248f0 radeonsi: move si_shader_binary_upload out of si_shader_binary_read
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
7fa6bb47e3 gallium/radeon: dump LLVM module outside of radeon_llvm_compile
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
fb98acb5a1 gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5
It's the same behavior that we use for later LLVM.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
cd7f252b11 gallium/radeon: r600_can_dump_shader should get TGSI processor type directly
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
fd7000bd78 radeonsi: pass TGSI processor type to si_shader_binary_read for dumping
the parameter will be used later

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
3ce0a2fd7f radeonsi: pass TGSI processor type to si_compile_llvm for dumping
the parameter will be used later

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Marek Olšák
dd79034ca6 radeonsi: rename shader parameter definitions and variables for more clarity
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-03 22:41:16 +01:00
Ilia Mirkin
34217018c4 nvc0/ir: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 16:20:52 -05:00
Ilia Mirkin
20dee333f3 st/mesa: use PK2H/UP2H when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:47 -05:00
Ilia Mirkin
e9f43d6333 gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:41 -05:00
Ilia Mirkin
459e4532af tgsi: update PK2H/UP2H channel behavior info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:20:27 -05:00
Ilia Mirkin
6eb74b87b8 gallium: document PK2H/UP2H
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-03 16:19:57 -05:00
Samuel Pitoiset
0ab2c21b93 st/mesa: fix parameter names for tesseval/tessctrl prototypes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 22:01:18 +01:00
Ilia Mirkin
bf34748b39 nouveau: fix double-const qualifier
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 11:32:15 -05:00
Rob Clark
3684e899ea freedreno/ir3: use NIR_PASS helper macros
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark
317628dbb3 nir: extract out helper macros for running passes
Note these are a bit uglier, due to avoidance of GNU C extensions.  But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-03 09:11:27 -05:00
Rob Clark
23bd6affb2 freedreno/ir3: we require block_index metadata
Found during NIR_TEST_CLONE=1 piglit run.  We were using block->index
but forgetting to require it.  Causing things to not work with a cloned
shader which didn't preserve block_index.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark
74135f804a freedreno/ir3: refactor NIR IR handling
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass.  This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Rob Clark
ab4efb19dc freedreno/ir3: drop unnecessary unreachable() case
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-03 09:11:27 -05:00
Samuel Pitoiset
6a49fcfb1f gallium/tests: fix build with clang compiler
Nested functions are supported as an extension in GNU C, but Clang
don't support them.

This fixes compilation errors when (manually) building compute.c,
or by setting --enable-gallium-tests to the configure script.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-03 12:18:00 +01:00
Samuel Pitoiset
53dddab78c nv50,nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% case because usually coherent
buffers are not so used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-03 12:17:05 +01:00
Kenneth Graunke
28dea26626 i965: Make TCS precompile use the TES primitive mode when available.
If there's a linked TES program, we should just use the actual
primitive mode.  If not, just guess triangles (as we did before).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke
4a1c8a3037 i965: Push most TES inputs in SIMD8 mode.
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache.  Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.

However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs.  An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.

Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.

Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.

Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).

Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke
b022150d70 i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.
We need a MOV to replicate g0.0<0,1,0> to all 8 channels.  Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD.  However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.

All input loads can use the same header - we don't need to re-expand
g0 every time.  CSE accomplishes this, saving instructions.

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0

total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke
53a9b6223f i965: Move 3-src subnr swizzle handling into the vec4 backend.
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums.  When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.

In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum.  This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].

This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.

This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend.  I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Eric Anholt
64253fdb2e vc4: Fix build from upload changes. 2016-01-02 17:33:19 -08:00
Nicolai Hähnle
8f384d07a8 gallium/radeon: send LLVM diagnostics as debug messages
Diagnostics sent during code generation and the every error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.

Removed an fprintf that could never be triggered.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
255ccd1e99 gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)
This will allow us to send shader debug info via the context's debug callback.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
f8cd11403a radeonsi: send shader info as debug messages in addition to stderr output
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.

Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
4bb1c8dfec radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)
This will allow us to send shader debug info.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Nicolai Hähnle
b6847062dd gallium/radeon: implement set_debug_callback
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Jason Ekstrand
f6c4658cde nir/spirv: Fix group decorations
They were completely bogus before.  For one thing, OpDecorationGroup
created a value of type undef rather than decoration_group.  Also
OpGroupMemberDecorate didn't properly apply the decoration to the different
members of the different groups.  It *should* be correct now but there's no
good way to test it yet.
2016-01-02 11:53:36 -08:00
Jason Ekstrand
6b0b57225c anv/device: Only allocate whole pages in AllocateMemory
The kernel is going to give us whole pages anyway, so allocating part of a
page doesn't help.  And this ensures that we can always work with whole
pages.
2016-01-02 07:52:24 -08:00
Marek Olšák
ecb2da1559 u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák
37d0aea772 u_upload_mgr: remove alignment parameter from u_upload_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák
1bb79c3a7b u_upload_mgr: pass alignment to u_upload_buffer manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
e0f932846c u_upload_mgr: pass alignment to u_upload_data manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
020009f7cc u_upload_mgr: pass alignment to u_upload_alloc manually
The fixed alignment of u_upload_mgr will go away.
This is the first step.

The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
ffc4716e97 u_upload_mgr: rework the application of alignment
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.

Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.

This a prerequisite for allowing a variable alignment for suballocations.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
36c93a6fae st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.

v2: don't remove st_upload_constants calls, clarify why they're needed

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00
Marek Olšák
294ed5cd13 program: add _mesa_reserve_parameter_storage
The next commit will use this.

Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00
Jordan Justen
a2942d8f26 mesa: Fix warning with MESA_VERBOSE=api for BindBufferRange
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-01 17:27:14 -08:00
Ilia Mirkin
c1d14c6817 nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-01 19:52:41 -05:00
Ilia Mirkin
33a415310b st/mesa: sort extensions enablement array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-01 19:50:02 -05:00
Rob Clark
816ddee6b8 nir/lower_clip: add missing writemask on store
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-01 15:32:46 -05:00
Jordan Justen
3dce7bf268 mesa: Add MESA_VERBOSE=api for GL_ARB_program_interface_query
v2:
 * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-01 12:00:51 -08:00
Jordan Justen
36db91c4c4 mesa: Add MESA_VERBOSE=api for several indexed BindBuffer variants
v2:
 * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-01 12:00:51 -08:00
Jason Ekstrand
f076d5330d anv/device: Handle non-4k-aligned calls to MapMemory
As per the spec:

   minMemoryMapAlignment is the minimum required alignment, in bytes, of
   host-visible memory allocations within the host address space. When
   mapping a memory allocation with vkMapMemory, subtracting offset bytes
   from the returned pointer will always produce a multiple of the value of
   this limit.
2016-01-01 09:29:29 -08:00
Dave Airlie
b835255992 st/glsl_to_tgsi: fix block movs for doubles
While playing with fp64, I disable varying packing to debug
something else, and noticed we never emitted half the output
movs for double matrix arrays.

We should be moving the left index two slots for dual
source doubles, and the right index two slots for non-vs
input doubles.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
d214ce86cf st/glsl_to_tgsi: handle different attrib size
vertex inputs are counted differently in some cases, with
vertex inputs we need to make sure we don't double count them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
dc7b33c1f3 st/glsl_to_tgsi: readd the double_reg2 for input index mapping
Otherwise we end up emitting the wrong index for the second
double.

This fixes dmat-vs-gs-tcs-tes.shader_test and dvec3-vs-gs-tcs-tes.shader_test

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
84dbf3c4ff st/glsl_to_tgsi: when doing reladdr get vec4 of correct type
This fixes fp64 relative addressing, in the upcoming
dmat-vs-gs-tcs-tes.shader_test.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
d87894b98f st/glsl_to_tgsi: handle double immediates in matrices properly.
This handles matrix initialisation properly.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
7351c7684f st/glsl_to_tgsi: setup writemask for double arrays and matricies.
It's important for the double instruction emission code that
the writemasks are correct going in for double so it know
which channels to replicate.

This fixes it for the array and matrix cases.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
14506dcae2 st/glsl_to_tgsi: handle doubles in array shrinking code.
This code takes into account double inputs in the array
shrinking code. This fixes some issues with doubles
and geom/tess inputs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
aab0c6c9c4 st/glsl_to_tgsi: handle doubles outputs in arrays.
This handles the case where a double output is stored
in an array, and tracks it for use in the double
instruction emit code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
fc890d703e st/glsl_to_tgsi: store if dst is double in array
This is just a precursor patch to a fix for doubles with
tessellation that I've written.

We need to descend into output arrays in that case and
mark dst's as double.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Jason Ekstrand
6b5cbdb317 anv/format: Get rid of num_channels 2015-12-31 12:07:43 -08:00
Jason Ekstrand
3fe1f118f8 anv/cmd_buffer: Fix a pointer-cast typo 2015-12-31 12:07:43 -08:00
Chad Versace
86ecb28ec6 isl: Document some isl_surf::phys_level0_sa invariants
isl_dim_layout restricts the range of isl_surf::phys_level0_sa.
2015-12-31 12:06:02 -08:00
Jason Ekstrand
5318424d49 anv/pipeline: Better vertex input channel setup
First off, it now uses isl formats instead of anv_format.  Also, it
properly handles integer vs. floating-point default channels and can
properly handle alpha-only channels.  (Not sure if those are allowed).
2015-12-31 12:02:08 -08:00
Jason Ekstrand
c6364495b2 anv/pipeline: Move vk_to_gen tables into a shared header 2015-12-31 12:02:08 -08:00
Chad Versace
d25cff687b isl: Better document surface units
Logical pixels, physical surface samples, and physical surface elements.

Requested-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-31 11:56:13 -08:00
Chad Versace
373fd89e4b isl: Document the 3D block extent of isl_format 2015-12-31 11:55:48 -08:00
Jason Ekstrand
1ddcbbf05f nir/spirv: Add a missing break statement in handle_image 2015-12-30 21:57:04 -08:00
Jason Ekstrand
4f9a211b4a Revert "isl: Fix assertion failure for npot pixel formats"
This reverts commit 96d1baa88d.
2015-12-30 21:01:55 -08:00
Jason Ekstrand
0bb103d010 nir/spirv: Handle push constants after decorations 2015-12-30 20:54:27 -08:00
Jason Ekstrand
3421ba1843 anv/device: Place memory types at heapIndex == 0
Previously, they were at heapIndex == 1 even though we only advertised one
heap.
2015-12-30 19:32:43 -08:00
Jason Ekstrand
cf6ce424e0 nir/spirv: Fix constant num_elements and allocation
Thanks to the addition of nir_clone, we now have a num_elements field in
nir_constant which we weren't setting.  Also, constants have to be parented
to the variable they initialize, so we have to make a copy.
2015-12-30 18:51:59 -08:00
Jason Ekstrand
601b7d5f98 nir/lower_outputs_to_temporaries: Reparent constant initializers 2015-12-30 18:51:06 -08:00
Jason Ekstrand
7d57528233 nir/clone: Expose nir_constant_clone 2015-12-30 18:44:19 -08:00
Jason Ekstrand
fed98df428 nir/gather_info: Add support for end_primitive_with_counter 2015-12-30 17:45:43 -08:00
Jason Ekstrand
5afac62b28 nir/spirv: Handle OpLine 2015-12-30 17:45:43 -08:00
Jason Ekstrand
149f35bbba nir/spirv: Let OpEntryPoint act as an OpName 2015-12-30 17:45:43 -08:00
Jason Ekstrand
5f7f88524c nir/lower_outputs_to_temporaries: Take a nir_function entrypoint 2015-12-30 17:45:43 -08:00
Jason Ekstrand
0fe4580e64 nir/spirv: Add support for multiple entrypoints per shader
This is done by passing the entrypoint name into spirv_to_nir.  It will
then process the shader as if that were the only entrypoint we care about.
Instead of returning a nir_shader, it now returns a nir_function.
2015-12-30 17:45:43 -08:00
Jason Ekstrand
e993e45eb1 nir/spirv: Get the shader stage from the SPIR-V
Previously, we depended on it being passed in.
2015-12-30 17:45:43 -08:00
Jason Ekstrand
db3a64fcea nir/spirv: Use shader stage for determining variable locations 2015-12-30 17:45:43 -08:00
Jason Ekstrand
d7ae2200f9 nir/spirv: Get rid of default GS info
shaderc has been fixed for a while now.
2015-12-30 17:45:43 -08:00
Jason Ekstrand
d9c9a117dc nir/spirv: Handle execution modes as decorations
They're basically the same thing.
2015-12-30 17:45:43 -08:00
Jason Ekstrand
2b6bcaf91a nir/spirv: Separate handling of preamble from type/var/const instructions 2015-12-30 17:45:43 -08:00
Chad Versace
96d1baa88d isl: Fix assertion failure for npot pixel formats
When aligning to isl_format_layout::bs (which is the number of bytes in
the pixel), use isl_align_npot() instead of isl_align(), because
isl_align() works only for power-of-2 alignment.

Fixes assertion in
dEQP-VK.pipeline.image.view_type.1d.format.r16g16b16_sfloat.size.512x1.
2015-12-30 16:28:19 -08:00
Kenneth Graunke
65d3f85eb3 nvc0: Set winding order regardless of domain.
Quads need to respect winding order, too - not just triangles.

Fixes rendering in GFXBench 4.0's tessellation benchmark.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-30 16:04:12 -08:00
Kenneth Graunke
7cdc2b9ca0 glsl: Fix varying struct locations when varying packing is disabled.
varying_matches::record tries to compute the number of components in
each varying, which varying_matches::assign_locations uses to assign
locations.  With varying packing, it uses glsl_type::component_slots()
to come up with a reasonable value.

Without varying packing, it fell back to an open-coded computation
that didn't bother to handle structs at all.  I believe we can simply
use 4 * glsl_type::count_attribute_slots(false), which already handles
these cases correctly.

Partially fixes rendering in GFXBench 4.0's tessellation benchmark.
(NVE0 is almost right after this, but i965 is still mostly garbage.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-30 16:04:12 -08:00
Kenneth Graunke
4acf71c89b drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't
specify which fragment shader output is which, so there's at best a
50/50 chance of us guessing it correctly.  This is invalid.

Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago,
but hasn't actually released them for whatever reason.  So, add the
workaround back so that it works for most people.

Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge.  For whatever
reason, Broadwell worked.  4.1 and 1.1 have always worked.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-12-30 16:04:12 -08:00
Ilia Mirkin
5ac15f788b glsl: add GL_ARB_shader_draw_parameters define
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-30 18:59:18 -05:00
Jason Ekstrand
07b4f17aaf nir/spirv/GLSL450: Add support for SAbs 2015-12-30 14:41:49 -08:00
Kenneth Graunke
e6cd0c0e1c nir/spirv: Implement IsInf and IsNan built-ins. 2015-12-30 14:10:44 -08:00
Jason Ekstrand
a7e827192b isl: Tile-align height in image size calculation
This fixes a bunch of gpu hangs on the dEQP-VK.glsl.ShaderExecutor.common
group of CTS tests.
2015-12-30 14:03:47 -08:00
Ilia Mirkin
517a93b346 nvc0: add ARB_shader_draw_parameters support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-30 16:55:57 -05:00
Ilia Mirkin
89bda9772d st/mesa: add GL_ARB_shader_draw_parameters support
Hooks up the new system values, passes the drawid in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
daaf0bdf46 gallium: add a drawid to pipe_draw_info
This will allow the state tracker to inform the driver where in a
broken-up multidraw we currently are. This can then be passed into the
vertex shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
87b4e4e29f gallium: add PIPE_CAP_DRAW_PARAMETERS
This allows the state tracker to know that the various draw parameters
are available in vertex shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
bb52ea45cc gallium: add baseinstance/drawid semantics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Kenneth Graunke
9f23116bfa Revert "nir/spirv: Update to the 1.0 GLSL.std.450 header"
This reverts commit b33f5d3889,
and also removes the (empty) case statements for the new built-ins.

It doesn't look like glslang has updated yet, so updating the header
just breaks everything, as we no longer agree on opcode numbers.
2015-12-30 13:26:56 -08:00
Jason Ekstrand
e6fc170afb anv/allocator: Rework state streams again
If we're going to hav valgrind verify state streams then we need to ensure
that once we choose a pointer into a block we always use that pointer until
the block is freed.  I was trying to do this with the "current_map" thing.
However, that breaks down because you have to use the map from the block
pool to get to the stream_block to get at current_map.  Instead, this
commit changes things to track the stream_block by pointer instead of by
offset into the block pool.
2015-12-30 11:40:38 -08:00
Jason Ekstrand
28243b2fba gen7/8/cmd_buffer: Allocate the correct ammount for COLOR_CALC_STATE
We were allocating 6 bytes when we should have been allocating 6 dwords.
2015-12-30 10:37:57 -08:00
Jason Ekstrand
a0b2829f20 anv/stream_alloc: Properly manage valgrind NOACCESS and UNDEFINED status
When I first did the valgrindifying for stream allocators, I misunderstood
some things about valgrind's expectations for NOACCESS and UNDEFINED.
First off, valgrind expects things to be marked NOACCESS before you
allocate out of them.  Since our blocks came from a pool backed by a
mmapped memfd, they came in as UNDEFINED; we needed to mark them as
NOACCESS.  Also, I didn't realize that VALGRIND_MEMPOOL_CHANGE only updated
the mempool allocation state and didn't actually change definedness; we had
to add a VALGRIND_MAKE_MEM_UNDEFINED to get rid of the NOACCESS on the
newly allocated portion.
2015-12-30 10:36:19 -08:00
Ilia Mirkin
d50e6128b8 nv50/ir: attempt to do more constant folding on mad -> add conversion
The add might actually have a 0 as an argument, which would convert it
into a mov. Make sure to detect that. Also avoid the hack of putting the
immediate directly into the instruction, instead use a mov to put it
into place and let the later LoadPropagation pass place it if possible.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-30 12:29:07 -05:00
Marta Lofstedt
97685ff10e i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+
The imulExtended tests of the shader bitfield tests of the
OpenGL ES 3.1 CTS, fail on gen8+, when BRW_REGISTER_TYPE_W
is used for SHADER_OPECODE_MULH.

Also, remove unused helper function:
static inline bool type_is_signed(unsigned type)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-30 09:29:14 +01:00
Kristian Høgsberg Kristensen
91d93f7908 nir/spirv: Lower gl_GlobalInvocationID correctly
Use nir_intrinsic_load_local_invocation_id, not
nir_intrinsic_load_invocation_id (missing 'local'), which is a geometry
shader built-in.
2015-12-30 00:03:54 -08:00
Jason Ekstrand
451fe2670c nir/spirv/cfg: Handle discard 2015-12-29 19:23:25 -08:00
Jason Ekstrand
5693637faa nir/print: Handle variables with var->name == NULL 2015-12-29 16:58:00 -08:00
Jason Ekstrand
8cc55780fd nir/inline_functions: Switch to inlining everything 2015-12-29 16:58:00 -08:00
Timothy Arceri
0d4cd045c8 glsl: tidy up struct with a single member
There used to be more members but they now share other fields
in order to keep memory use low.

Also making the naming more generic will allow us to reuse the
field for explicit byte offsets within blocks for
ARB_enhanced_layouts.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:52:05 +11:00
Emil Velikov
2c1a215409 glsl/linker: annotate static functions as such
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:51:58 +11:00
Emil Velikov
c704b89fe4 glsl: annotate ast_process_struct_or_iface_block_members() as static
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:51:51 +11:00
Kenneth Graunke
7cdcee3bed nir/spirv/glsl450: Enumerate more built-in opcodes. 2015-12-29 16:06:35 -08:00
Kenneth Graunke
ccd84848f0 anv/state: Fix reversed MIN vs. MAX in levelCount handling.
The point is to promote a levelCount of 0 to 1 before subtracting 1.
This needs MAX, not MIN.
2015-12-29 15:51:14 -08:00
Jason Ekstrand
2a58cb03d0 nir/spirv: Use instr_rewrite_src for updating phi sources
You can't just add a new source to a phi because use/def information won't
get updated properly.  Instead, you have to use one of the core helpers.
Some day, we may want to add a nir_phi_instr_add_src helper.
2015-12-29 15:44:39 -08:00
Jason Ekstrand
69d5838aee nir/validate: Don't validate the return deref for void function calls 2015-12-29 15:35:29 -08:00
Jason Ekstrand
51b04d03d5 nir/dominance: Handle unreachable blocks
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:

loop {
   if (...) {
      break;
   } else {
      break;
   }
}

In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this.  First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable.  Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.
2015-12-29 15:29:27 -08:00
Kenneth Graunke
b4a1c9b506 nir/spirv/glsl450: Implement inverse hyperbolic trig built-ins. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
2ea111664c nir/spirv/glsl450: Implement Refract built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
74529a2c50 nir/spirv/glsl450: Implement hyperbolic trig built-ins. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
0b1a436ac8 nir/spirv/glsl450: implement Reflect built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
659a3623b0 nir/spirv/glsl450: Implement FaceForward built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
b10af36d93 nir/spirv/glsl450: Implement SmoothStep. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
6a0fa2d758 nir/spirv/glsl450: Implement Cross built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
083fd6ec2a nir/spirv/glsl450: Implement Clamp/SClamp/UClamp. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
034010924e nir/spirv/glsl450: Implement the Log built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
ffc5ae7c9e nir/spirv/glsl450: Implement Exp built-in. 2015-12-29 15:27:03 -08:00
Kenneth Graunke
227e250005 nir/spirv/glsl450: Add a helper for doing fclamp(). 2015-12-29 15:27:03 -08:00
Kenneth Graunke
0f801752f2 nir/spirv/glsl450: Add helpers for calculating exp() and log(). 2015-12-29 15:27:03 -08:00
Kenneth Graunke
9c9edd1ce8 nir/spirv/glsl450: Add an 'nb' shortcut variable.
"nb" is shorter and more convenient than "&b->nb", especially
when several operations are composed together into a larger expression
tree.
2015-12-29 15:27:03 -08:00
Jason Ekstrand
5f04a61219 nir/lower_returns: Don't just change the type of a jump.
It doesn't give core NIR the opportunity to update predecessors and
successors.  Instead, we have to remove and re-insert the instruction.
2015-12-29 14:51:47 -08:00
Jason Ekstrand
6fa47c9c17 nir/builder: Add a nir_jump helper 2015-12-29 14:48:34 -08:00
Jason Ekstrand
37a38548d4 glsl/types.cpp: Fix function_key_compare 2015-12-29 14:32:10 -08:00
Jason Ekstrand
b33f5d3889 nir/spirv: Update to the 1.0 GLSL.std.450 header 2015-12-29 14:29:03 -08:00
Jason Ekstrand
a33fcc0fd4 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in nir_builder_init_simple_shader and allows us to delete
anv_nir_builder.h entirely.
2015-12-29 13:53:41 -08:00
Jason Ekstrand
0119773ffc nir/builder: Add an init function that creates a simple shader for you
A hugely common case when using nir_builder is to have a shader with a
single function called main.  This adds a helper that gives you just that.
This commit also makes us use it in the NIR control-flow unit tests as well
as tgsi_to_nir and prog_to_nir.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-29 13:44:05 -08:00
Jason Ekstrand
5dd4386b92 nir/spirv: Use a C99-style initializer for structure fields
This ensures that all unknown fields get zero-initizlied so we don't have
undefined values floating around.
2015-12-29 13:15:20 -08:00
Jason Ekstrand
e10b0e2b49 anv/pipeline: Use vs_prog_data.inputs_read when computing vb_used 2015-12-29 13:03:01 -08:00
Jason Ekstrand
0a2ab87947 nir/spirv: Move CF emit code into vtn_cfg.c 2015-12-29 12:50:31 -08:00
Jason Ekstrand
4e22cd2e32 nir/spirv: Add support for switch statements 2015-12-29 12:50:31 -08:00
Jason Ekstrand
cf555dc1c2 nir/spirv: A couple simple loop fixes 2015-12-29 12:50:31 -08:00
Jason Ekstrand
303d095f58 nir/spirv: Add an actual CFG data structure
The current data structure doesn't handle much that we couldn't handle
before.  However, this will be absolutely crucial for doing swith
statements.  Also, this should fix structured continues.
2015-12-29 12:50:31 -08:00
Kristian Høgsberg Kristensen
55ca5b0e74 mesa/st: Pad out _mesa_sysval_to_semantic for new SYSTEM_VALUE_* enums
GL_ARB_shader_draw_parameters added two new system values.  This gets us
back to mapping mesa system values to the right TGSI semantics.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-29 12:15:01 -08:00
Ilia Mirkin
724134f683 nv50/ir: float(s32 & 0xff) = float(u8), not s8
Make sure to make conversion unsigned when we're ANDing the high bits
away. Fixes corruption in dolphin.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-29 15:08:20 -05:00
Kristian Høgsberg Kristensen
581f81860e i965: Reemit vertex state between indirect multi draws
If we're doing an indirect draw, prims[i].basevertex is always 0 and the
real base vertex value is in the indirect parameter buffer. We try to
avoid flagging BRW_NEW_VERTICES if prims[i].basevertex doesn't change,
which then breaks down for indirect draws. Thus, if a program uses base
vertex or base instance, and the draw call is indirect, always flag
BRW_NEW_VERTICES.  A new piglit test,
spec/ARB_shader_draw_parameters/drawid-indirect-vertexid tests this.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
f9283f2668 nir: Teach nir_opt_algebraic about adding and subtracting the same thing
This optimizes a + b - b to just a. Modest shader-db results (BDW):

  total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
  instructions in affected programs:     61938 -> 61348 (-0.95%)
  total loops in shared programs:        2131 -> 2131 (0.00%)
  helped:                                263
  HURT:                                  0
  GAINED:                                0
  LOST:                                  0

but the optimization turns

  gl_VertexID - gl_BaseVertexARB

into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the
i965 hardware supports natively. That means we can avoid using the
internal vertex buffer for gl_BaseVertexARB in this case.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
cddfc2cefa i965: Add support for gl_DrawIDARB and enable extension
We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway.  This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
17ebb55a14 i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB
We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
b70616f3e7 i965: Assert that SYSTEM_VALUE_VERTEX_ID gets lowered
fs_visitor::emit_vs_system_value() looks like it's trying to handle
SYSTEM_VALUE_VERTEX_ID, but we should never see that value in the
backend.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
1a59aeaebd mesa: Add core mesa support for GL_ARB_shader_draw_parameters
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-29 10:39:25 -08:00
Kristian Høgsberg Kristensen
42dd2c028d mesa/vbo: Add draw_id field to struct _mesa_prim
The drivers will need this for passing in gl_DrawIDARB. For indirect
multidraw calls, we get the prim array and prim[i].draw_id == i and is
redundant. But for non-indirect calls, we get one primitive at a time
and need the draw_id field.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-29 10:39:25 -08:00
Aaron Watry
70d8dbc9a1 nir: Remove function overload in control flow test
Fixes make check.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-29 09:42:14 -08:00
Jason Ekstrand
bbf99511d0 gen7/8/pipeline: s/vb_used/elements in emit_vertex_input 2015-12-29 09:40:22 -08:00
Nicolai Hähnle
7b8db37abb radeonsi: add RADEON_REPLACE_SHADERS debug option
This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occuring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).

v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-29 09:07:04 -05:00
Nicolai Hähnle
7d1fc2cf51 radeonsi: count compilations in si_compile_llvm
This changes the count slightly (because of si_generate_gs_copy_shader), but
this is only relevant for the driver-specific num-compilations query. It sets
the stage for the next commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-29 09:07:01 -05:00
Nicolai Hähnle
4711170239 gallium/util: add DEBUG_GET_ONCE_OPTION
This is analogous to the alreading existing macros for BOOL, NUM, and FLAGS.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-29 09:06:57 -05:00
Grazvydas Ignotas
da0e216e06 r600: fix constant buffer size programming
When buffer size is less than 16, zero ends up being programmed as
size, which prevents the hardware from fetching the correct values.
Fix it by combining shift and align so that the value is always
rounded up.

Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92229
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-12-29 09:05:55 -05:00
Kristian Høgsberg Kristensen
fc03723bcd vk: Fill out buffer surface state when updating descriptor set
We can do this when we update the descriptor set instead of on the
fly.
2015-12-28 21:57:56 -08:00
Kristian Høgsberg Kristensen
a00524a216 vk: Unstub VkSemaphore implementation
There really is nothing to do for us here, at least with the current
kernel interface.
2015-12-28 21:57:56 -08:00
Jason Ekstrand
5fab35d090 gen7/pipeline: Actually use inputs_read from the VS for laying out inputs 2015-12-28 18:21:11 -08:00
Jason Ekstrand
b090f9dce1 gen8/pipeline: Actually use inputs_read from the VS for laying out inputs 2015-12-28 18:21:11 -08:00
Jason Ekstrand
3eb108ef87 anv/meta: Fix the pos_out location for the vertex shader 2015-12-28 18:21:11 -08:00
Jason Ekstrand
b005fd62f9 nir/spirv: Add GLSL.std.450.h
It accidentally got removed during the mass rename.
2015-12-28 15:46:22 -08:00
Jason Ekstrand
9c84b6cce0 anv/device: Set device->info sooner in CreateDevice
anv_block_pool_init calls anv_block_pool_grow which checks
device->info.has_llc to see if it needs to set caching parameters.
If we don't set device->info early enough, this reads an undefined value
which is probably 0 and not what we want on llc platforms.

Found with valgrind.
2015-12-28 13:29:01 -08:00
Jason Ekstrand
763176a3e2 nir/lower_returns: Fix a bug in loop lowering 2015-12-28 13:22:09 -08:00
Kenneth Graunke
dfce9759ab docs: Mark ARB_tessellation_shader as done on all i965 platforms.
We now support all Intel GPUs which can do tessellation.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:17:08 -08:00
Kenneth Graunke
381a89cf2a i965: Enable ARB_tessellation_shader on Gen7-7.5.
We've resolved all the GPU hangs, and everything seems to be working.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:17:05 -08:00
Kenneth Graunke
bd8ab8dedb i965: Don't set interleave or complete on TCS EOT message.
Setting interleave on the TCS EOT message causes Ivybridge hardware to
GPU hang like crazy.  Individual tests would pass, but running even a
simple test like nop.shader_test in a loop would hang within 1-3 runs.
Adding sleep delays worked around the problem, somehow.

Interleave doesn't make much sense given that we only have one patch
URB handle, not two.  Complete doesn't seem useful either.

There's no reason to actually set those bits.  We were just being lazy.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:17:03 -08:00
Kenneth Graunke
b7793783b3 i965: Relase input URB Handles on Gen7/7.5 when TCS threads finish.
Pre-Broadwell hardware requires us to manually release the ICP Handles
by issuing URB read messages with the "Complete" bit set.  We can do
this in pairs to use fewer URB read messages.

Based heavily on work from Chris Forbes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:17:00 -08:00
Kenneth Graunke
6ceabb72ea i965: Use proper TCS barrier ID bits for Ivybridge/Baytrail.
Gen7 uses bits 15:12 while Gen7+ uses bits 16:13.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:57 -08:00
Kenneth Graunke
5898cbae24 i965: Use proper TCS Instance ID bits for Ivybridge/Baytrail.
Gen7 uses 22:16 while Gen7.5+ uses 23:17.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:54 -08:00
Kenneth Graunke
1245724f72 i965: Port tessellation evaluation shaders to vec4 mode.
This can be used on Broadwell by setting INTEL_SCALAR_TES=0.
More importantly, it will be used for Ivybridge and Haswell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:48 -08:00
Kenneth Graunke
889d987904 i965: Emit a real 3DSTATE_DS on Gen7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:45 -08:00
Kenneth Graunke
2c240b05e9 i965: Emit a real 3DSTATE_HS on Gen7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:34 -08:00
Kenneth Graunke
74b83fe368 i965: Add the TCS/TES state upload atoms to the gen7_atoms list.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-28 13:16:19 -08:00
Jason Ekstrand
7aaed91581 nir/spirv: Move to its own directory 2015-12-28 11:49:39 -08:00
Jason Ekstrand
d5fa51bdee Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in the removal of nir_function_overload
2015-12-28 10:56:31 -08:00
Jason Ekstrand
d9dcfafacc nir/spirv: Use nir_build_alu for alu instructions 2015-12-28 10:35:31 -08:00
Jason Ekstrand
237f2f2d8b nir: Get rid of function overloads
When Connor originally drafted NIR, he copied the same function+overload
system that GLSL IR had with a few names changed.  However, this
double-indirection is not really needed and has only served to confuse
people.  Instead, let's just have functions which may not have unique names
and may or may not have an implementation.  If someone wants to do overload
resolving, they can hav a hash table based function+overload system in the
overload resolving pass.  There's no good reason to keep it in core NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>

ir3 bits are

Reviewed-by: Rob Clark <robclark@gmail.com>
2015-12-28 09:59:53 -08:00
Jason Ekstrand
ea77b384e8 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in tessellation and the store_var changes that go with it.
2015-12-27 23:23:05 -08:00
Jason Ekstrand
f948767471 nir/lower_returns: Better algorithm as per connor 2015-12-27 22:50:45 -08:00
Jason Ekstrand
3489f66056 nir: Add a cursor helper for getting a cursor after any phi nodes 2015-12-27 22:50:14 -08:00
Ilia Mirkin
109c348284 nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion
Also release the scratch allocation if any.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-27 21:33:36 -05:00
Ilia Mirkin
28e07fdd4a nv50,nvc0: add a note when converting vertex elements using CPU
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-27 19:49:44 -05:00
Jason Ekstrand
c60456dfaa nir/gather_info: Handle multi-slot variables in io bitfields 2015-12-24 00:47:20 -08:00
Jason Ekstrand
bbebd2de13 nir: Add a helper for getting the bitmask for a variable's location 2015-12-24 00:47:20 -08:00
Jason Ekstrand
4ff4310a78 nir/types: Expose glsl_type::count_attribute_slots() 2015-12-24 00:47:19 -08:00
Jason Ekstrand
0bc1b0fd23 nir/lower_return: Do it for real this time 2015-12-24 00:47:19 -08:00
Jason Ekstrand
e1b1d58bec nir/cf: Make extracting or re-inserting nothing a no-op 2015-12-23 23:46:04 -08:00
Jason Ekstrand
eae352e75c nir: Add a function for comparing cursors 2015-12-23 18:09:42 -08:00
Connor Abbott
41c7912d04 gallium/auxiliary: don't build NIR sources with MSVC2008 flags
NIR has never been built with MSVC2008, so we shouldn't add
MSVC2008_COMPAT_CFLAGS to anything that uses it. This allows us to get
rid of the pragma in tgsi_to_nir.c.

Build tested with freedreno.

v2: Use MSVC2013_COMPAT_CLFAGS instead.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-12-23 20:46:48 -05:00
Jason Ekstrand
54c870ff61 nir/spirv: Add support for undefs in vtn_ssa_value() 2015-12-23 14:14:39 -08:00
Jason Ekstrand
2e823d5754 nir/spirv: Properly handle vector times matrix 2015-12-23 13:49:56 -08:00
Jason Ekstrand
452ba4db2b nir/spirv: Create the correct type if a matrix-vector multiply produces a vector 2015-12-23 13:49:56 -08:00
Jason Ekstrand
5b30132388 nir/spirv: Fix some mem_ctx issues with create_vec 2015-12-23 13:49:56 -08:00
Jason Ekstrand
66168a798b nir/spirv: Better document vtn_ssa_value.transposed 2015-12-23 13:49:56 -08:00
Jason Ekstrand
3b391892aa anv/descriptor_set: Use anv_foreach_stage 2015-12-23 13:49:56 -08:00
Jason Ekstrand
72ceb99bab anv: Mask out invalid stages in foreach_stage 2015-12-23 13:49:56 -08:00
Jason Ekstrand
5644b1cece nir/spirv: Handle LogicalNot 2015-12-23 13:49:56 -08:00
Jason Ekstrand
6219a69589 nir/spirv: Handle derefs in vtn_ssa_value
This is kind of a hack, but it makes vtn_ssa_value insert a load if the
value requested is actually a deref.  This shouldn't happen normally but,
thanks to the impedence mismatch of the NIR function parameter model vs.
the SPIR-V model, this can happen for function arguments.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
3ab1b7afa8 nir/spirv: Do boolean fixup on block loads
We used to do it for variable loads on things of type "uniform" but that
never got ported to block loads.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
af74ce5a19 spirv/nir: Handle non-vector extractions in vtn_composite_extract 2015-12-23 13:49:56 -08:00
Jason Ekstrand
79b8b42081 nir/spirv: Handle function calls 2015-12-23 13:49:56 -08:00
Jason Ekstrand
95990c96cc nir: Create the params array in function_impl_create 2015-12-23 13:49:56 -08:00
Jason Ekstrand
a7f3e113ad i965/nir: Remove return handling
This was added because we were getting spurrious returns coming out of
SPIR-V.  Now that we're calling lower_returns, we don't need this.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
ac975b73cf anv/pipeline: Run lower_returns and inline_functions after spirv_to_nir 2015-12-23 13:49:56 -08:00
Jason Ekstrand
8fba4bf79f nir: Add a function inlining pass 2015-12-23 13:49:56 -08:00
Jason Ekstrand
b21db9cea5 nir/builder: Add a copy_deref_var helper 2015-12-23 13:49:56 -08:00
Jason Ekstrand
23cfa683d5 nir: move nir_copy_var from anv_nir_builder to nir_builder 2015-12-23 13:49:56 -08:00
Jason Ekstrand
4aac03fe61 nir/clone: Add support for cloning a single function_impl
This will be useful for things such as function inlining.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
98291b8f2c nir: Add a helper for creating a "bare" nir_function_impl
This is useful if you want to clone a single function_impl if, for
instance, you wanted to do function inlining.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
86772c2488 nir/control_flow: Handle relinking top-level blocks
This can happen if a function ends in a return instruction and you remove
the return.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
1749e667ea nir: Add a stub function inlining pass
All it does is remove the return at the end, but it's good enough for
simple functions.
2015-12-23 13:49:56 -08:00
Jason Ekstrand
413a9d3517 nir/print: Factor variable name lookup into a helper
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.
2015-12-23 13:49:56 -08:00
Anuj Phogat
52865efc41 i965: Add tr_mode and mip tail information in surface state dump
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-12-23 13:20:45 -08:00
Jordan Justen
8326eb13f2 i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data
The BDW PRM Vol2a: Command Reference: Instructions, section MEDIA_CURBE_LOAD,
says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
64-byte aligned. This is different from previous gens, that were 32-byte
aligned.

v2 (Jordan):
  - CURBE Data Start Address is also 64-byte aligned.
    - The call to brw_state_batch should also use 64-byte alignment.
      - Improve PRM reference.

v3:
 * New patch from Jordan. Always align base and size to 64 bytes.

Fixes the following SSBO CTS tests on BDW:
ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs

And many other CS CTS tests as reported by Marta Lofstedt.

(Commit message is from Iago, but in v3, code is from Jordan.)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-22 23:54:02 -08:00
Rob Clark
843cec6d3a freedreno/ir3: spelling..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-23 00:28:24 -05:00
Rob Clark
dc21747838 nir/print: print variable constant-initializers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-23 00:28:01 -05:00
Kenneth Graunke
6524897606 docs: Clarify that ARB_tessellation_shader is only done on i965/gen8+.
Requested by kisak on IRC.
2015-12-22 20:14:35 -08:00
Kenneth Graunke
209d130dd1 docs: Mark ARB_tessellation_shader as done on i965/gen8+. 2015-12-22 18:50:38 -08:00
Kenneth Graunke
7738f3a988 i965: Enable ARB_tessellation_shader on Gen8+.
Everything is in place and I'm not aware of any further issues.

Tested with:
- Piglit
- Tessmark
- Unigine Heaven
- Shadow of Mordor
- GRID Autosport

I have patches to backport this to Haswell, Ivybridge, and Baytrail as
well (the first Intel hardware to support tessellation), but there are
still a lot of GPU hangs left to debug.  So that will come later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:14 -08:00
Kenneth Graunke
794eb9d727 i965: Handle mix-and-match TCS/TES with separate shader objects.
GL_ARB_separate_shader_objects allows the application to mix-and-match
TCS and TES programs separately.  This means that the interface between
the two stages isn't known until the final SSO pipeline is in place.

This isn't a great match for our hardware: the TCS and TES have to agree
on the Patch URB entry layout.  Since we store data as per-patch slots
followed by per-vertex slots, changing the number of per-patch slots can
significantly alter the layout.  This can easily happen with SSO.

To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead
bitfields in the TCS/TES program keys, introducing program recompiles.
brw_upload_programs() decides the layout for both TCS and TES, and
passes it to brw_upload_tcs/tes(), which store it in the key.

When creating the NIR for a shader specialization, we override
nir->info.inputs_read (and friends) to the program key's values.
Since everything uses those, no further compiler changes are needed.
This also replaces the hack in brw_create_nir().

To avoid recompiles, brw_precompile_tes() looks to see if there's a
TCS in the linked shader.  If so, it accounts for the TCS outputs,
just as brw_upload_programs() would.  This eliminates all recompiles
in the non-SSO case.  In the SSO case, there should only be recompiles
when using a TCS and TES that have different input/output interfaces.

Fixes Piglit's mix-and-match-tcs-tes test.

v2: Pull the brw_upload_programs code into a brw_upload_tess_programs()
    helper function (requested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:11 -08:00
Kenneth Graunke
01b1b44d31 i965: Defer input lowering for tessellation stages until specialization.
With tessellation shaders and SSO, we won't be able to always decide on
VUE map layouts at LinkProgram time.  Unfortunately, we have to delay it
until shader specialization time.

However, uniform lowering cannot be deferred - brw_codegen_*_prog()
reads nir->num_uniforms.  Fortunately, we don't need to defer it -
uniform, system value, atomic, and sampler lowering can safely stay
where it is.  This patch moves those to brw_lower_nir()'s only caller,
renames brw_lower_nir() to brw_nir_lower_io(), and introduces calls
to that.

For non-tessellation stages, I chose to call brw_nir_lower_io() from
brw_create_nir(), so it's still done at the same time.  There's no
need to defer it, and doing it at LinkProgram time is nice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:10 -08:00
Kenneth Graunke
8bc073d601 i965: Automatically create a passthrough TCS when needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:09 -08:00
Kenneth Graunke
4ec3f0f4b9 i965: Start program_string_id from 1, not 0.
This way, I can safely use brw_tcs_prog_key::program_string_id == 0
to mean "not filled out because no program exists", which avoids the
need for adding an extra boolean to that struct.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:08 -08:00
Kenneth Graunke
2432643e89 i965: Create and set a new brw_tcs_prog_data::outputs_written field.
When the application hasn't supplied a TCS, and we have to create one,
we need to know what VS outputs to copy to TES inputs.

To do this, we create a new program key field, and set it to the TES
InputsRead bitfield.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:06 -08:00
Kenneth Graunke
239a4bdcd4 i965: Upload HS push constants whenever default tess. levels change.
When using tessellation on OpenGL without a TCS, default values for
gl_TessLevelOuter/gl_TessLevelInner are provided via the API.

Core Mesa will flag ctx->DriverFlags.NewDefaultTessLevels whenever those
values change.  We add a corresponding BRW_NEW_DEFAULT_TESS_LEVELS flag
and hook it up to HS push constants (which will be used to upload these
default values to the autogenerated TCS).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:05 -08:00
Kenneth Graunke
0d5cb4aef4 i965: Only call _mesa_load_state_parameters if prog exists.
With the automatic-TCS creation, we won't have a prog, but still need to
upload push constants.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:04 -08:00
Kenneth Graunke
a122af696c i965: Switch TCS gl_program/gl_shader_program checks over to TES.
Tessellation control shaders are optional, but evaluation shaders will
always be present when using tessellation.  However, we'll always enable
the TCS (HS) hardware stage when using tessellation - we'll just create
a program on the fly.

That program, however, won't have a gl_program or gl_shader_program.
So we shouldn't check brw->tess_ctrl_program or
shader_prog->_LinkedShaders[MESA_SHADER_TESS_CTRL] - if we want to know
whether tessellation is enabled, we should look for a TES.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:03 -08:00
Kenneth Graunke
9d35fecfb9 i965: Remove unnecessary brw->tess_ctrl_program assertions.
This is trying to enforce the fact that the hardware requires HS, TE,
and DS to be enabled or disabled together.  But it's kind of an ad-hoc
attempt, and not too useful.

More importantly, we aren't going to have a gl_shader_program for the
TCS which is automatically generated when none is present.  (We'll just
handle it in the driver backend.)  So, these will trip for no reason.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:02 -08:00
Kenneth Graunke
f46dbfaed9 i965: Consolidate BRW_NEW_TESS_{CTRL,EVAL}_PROGRAM flags.
For several reasons, I don't think it's particularly useful to have
separate flags:

1. Most of the time, tessellation shaders are paired, so both will be
   replaced at the same time.

2. The data layout is tightly coupled.  Both need to agree on the number
   of per-patch slots in the VUE map.  Even adding extra TCS outputs
   that aren't read by the TES will trigger the need for recompiles.

3. The TCS is optional from an API perspective, but required by the
   hardware whenever tessellation is enabled.  So, atoms that deal with
   the TCS must check brw->tess_eval_program (BRW_NEW_TESS_EVAL_PROGRAM?)
   rather than brw->tess_ctrl_program to tell whether tessellation is
   enabled.

So, not only is it unlikely to be useful, it's a bit confusing to get
right.  Simply using one flag for both simplifies this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:22:00 -08:00
Kenneth Graunke
8498cb4a45 i965: Only call brw_upload_tcs/tes_prog when using tessellation.
If there's no evaluation shader, tessellation is disabled.  The upload
functions would just bail.  Instead, don't bother calling them.

This will simplify the optional-TCS case a bit, as brw_upload_tcs can
assume that we're doing tessellation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:21:59 -08:00
Kenneth Graunke
2bcf989407 nir: Add a glsl_vec_type() helper.
I need access to glsl_type::vec2_type from C.  Wrapping vec() also gives
us access to vec3 if we need it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 17:21:47 -08:00
Kenneth Graunke
0daf51e130 nir: Use writemasked store_vars in glsl_to_nir.
Instead of performing the read-modify-write cycle in glsl->nir, we can
simply emit a partial writemask.  For locals, nir_lower_vars_to_ssa will
do the equivalent read-modify-write cycle for us, so we continue to get
the same SSA values we had before.

Because glsl_to_nir calls nir_lower_outputs_to_temporaries, all outputs
are shadowed with temporary values, and written out as whole vectors at
the end of the shader.  So, most consumers will still not see partial
writemasks.

However, nir_lower_outputs_to_temporaries bails for tessellation control
shader outputs.  So those remain actual variables, and stores to those
variables now get a writemask.  nir_lower_io passes that through.  This
means that TCS outputs should actually work now.

This is a functional change for tessellation control shaders.

v2: Relax the nir_validate assert to allow partial writemasks.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-22 15:57:59 -08:00
Kenneth Graunke
7d539080c1 nir: Add a writemask to store intrinsics.
Tessellation control shaders need to be careful when writing outputs.
Because multiple threads can concurrently write the same output
variables, we need to only write the exact components we were told.

Traditionally, for sub-vector writes, we've read the whole vector,
updated the temporary, and written the whole vector back.  This breaks
down with concurrent access.

This patch prepares the way for a solution by adding a writemask field
to store_var intrinsics, as well as the other store intrinsics.  It then
updates all produces to emit a writemask of "all channels enabled".  It
updates nir_lower_io to copy the writemask to output store intrinsics.

Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks
by doing a read-modify-write cycle (which is safe, because local
variables are specific to a single thread).

This should have no functional change, since no one actually emits
partial writemasks yet.

v2: Make nir_validate momentarily assert that writemasks cover the
    complete value - we shouldn't have partial writemasks yet
    (requested by Jason Ekstrand).

v3: Fix accidental SSBO change that arose from merge conflicts.

v4: Don't try to handle writemasks in ir3_compiler_nir - my code
    for indirects was likely wrong, and TTN doesn't generate partial
    writemasks today anyway.  Change them to asserts as requested by
    Rob Clark.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
2015-12-22 15:57:59 -08:00
Tapani Pälli
50fc4a9256 mesa: update gl_HelperInvocation support status in docs
Was enabled for i965 and nvc0 by following commits:

	c875e3cdd2
	39f51ec96f

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-12-22 15:14:02 +02:00
Tapani Pälli
f2be5b8ba4 mesa: fix interface matching done in validate_io
Patch makes following changes for interface matching:

   - do not try to match builtin variables
   - handle swizzle in input name, as example 'a.z' should
     match with 'a'
   - add matching by location
   - check that amount of inputs and outputs matches

These changes make interface matching tests to work in:
   ES31-CTS.sepshaderobjs.StateInteraction

The test still does not pass completely due to errors in rendering
output. IMO this is unrelated to interface matching.

Note that type matching is not done due to varying packing which
changes type of variable, this can be added later on. Preferably
when we have quicker way to iterate resources and have a complete
list of all existed varyings (before packing) available.

v2: add spec reference, return true on desktop since we do not
    have failing cases for it, inputs and outputs amount do not
    need to match on desktop.

v3: add some more spec reference, remove desktop specifics since
    not used for now on desktop, add match by location qualifier,
    rename input_stage and output_stage as producer and consumer
    as suggested by Timothy.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-22 14:50:25 +02:00
Iago Toral Quiroga
5f8bb6fbb1 mesa: add SSBOs to the list of fragment shader side effects
The i965 driver uses this function to decide if it can disable the
FS unit in the absence of color/depth writes. We don't want to disable
the unit in the presence of SSBOs, since the fragment shader could
be writing to it.

We could go a step further and check not just for the presence of SSBOs
but also if the shader code writes to them. Does not look worth the trouble
though and we are not doing this for atomic buffers either anyway.

v2: put this into a generic _mesa_active_fragment_shader_has_side_effects
    function instead of having one specific for SSBOs (Jason).

Fixes the following CTS test:
ES31-CTS.shader_storage_buffer_object.advanced-usage-sync-vsfs

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-22 12:38:48 +01:00
Iago Toral Quiroga
9bbdd0eda4 i965: Ensure FS execution in presence of atomic buffers
On Haswell we need to set the UAV_ONLY WM state bit when there are no colour
or depth buffer writes and on all hardware we should set the early
depth/stencil control field to PSEXEC unless early fragment tests are enabled
to make sure that the fragment shader is executed regardless of whether
per-fragment tests pass or not as the spec requires.

So far we have been doing this for images only, but we should apply the same
treatment to all side effectful scenarios. Suggested by Curro.

This is not strictly required for compliance with the original
ARB_shader_atomic_counters extension, it's only necessary to get the execution
semantics specified in GL4.2+ right.

v2:
- Mark active_fs_has_side_effects as constant. (Curro)
- Mention that this is only only necessary to get the execution semantics
specified in GL4.2+ right. (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-22 12:38:48 +01:00
Iago Toral Quiroga
1a95b87dad mesa: Add a _mesa_active_fragment_shader_has_side_effects helper
Some drivers can disable the FS unit if there is nothing in the shader code
that writes to an output (i.e. color, depth, etc). Right now, mesa has
a function to check for atomic buffers and the i965 driver also checks for
images. Refactor this logic into a generic function that we can use for
any source of side effects in a fragment shader. Suggested by Jason.

v2:
- Use '_Shader', as suggested by Tapani, to fix the following CTS test:

ES31-CTS.shader_atomic_counters.advanced-usage-many-draw-calls2

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-22 12:38:48 +01:00
Kenneth Graunke
57f7c85dcf i965: Implement gl_PatchVerticesIn by baking it into brw_tcs_prog_key.
The hardware provides us no decent way of getting at the number of input
vertices in the patch topology from the tessellation control shader.
It's actually very surprising - normally this sort of information would
be available in the thread payload.

For the precompile, we guess that the number of vertices will be the
same for both the input and output patches.  This usually seems to be
the case.

On Gen8+, we could pass in an extra push constant containing this value.
We may be able to do that on Haswell too.  It's quite a bit trickier on
Ivybridge, however.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 02:12:05 -08:00
Kenneth Graunke
24be658d13 i965: Add tessellation control shaders.
The TCS is the first tessellation shader stage, and the most
complicated.  It has access to each of the control points in the input
patch, and computes a new output patch.  There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.

One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce.  Otherwise, it simply writes outputs that are passed along
to the TES.

We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF.  We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.

v2: Update comments (requested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 02:12:05 -08:00
Kenneth Graunke
a5038427c3 i965: Add tessellation evaluation shaders
The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input.  Otherwise,
they're very straightforward.

This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread).  I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).

We currently handle all inputs via the pull model.

v2: Improve comments (suggested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-22 02:12:05 -08:00
Timothy Arceri
54daffef16 nir: remove field only used in GLSL IR when assigning varying locations
This field is used as a flag to optimise out any varyings that don't have
a matching varying on the other side of the interface.

The value should be the same for all varyings (except for SSO but we can't
optimise those) by the time they reach nir and are no longer be needed.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-22 17:08:03 +11:00
Ben Skeggs
a8c4747602 nouveau: enable use of new kernel interfaces
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:17 +10:00
Ben Skeggs
5b614b141a nvc0: remove use of deprecated sw class identifier
Also emits a method to properly bind the class to a subchannel, which
was missing previously.  The kernel currently doesn't care, but this
will break if it ever decides to (ie. to support multiple sw classes).

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:13 +10:00
Ben Skeggs
33a3ba8c59 nv50: fix g98+ vdec class allocation
The kernel previously exposed incorrect classes for some of the chipsets
that this code supports.  It no longer does, but the older object ioctls
have compatibility to avoid breaking userspace.

This needs to be fixed before switching over to the newer interfaces.

Rather than hardcoding chipset->class like the rest of the driver does,
this makes use of (new) sclass queries to determine what's available.

v2.
- update to use symbolic class identifier from <nvif/class.h>

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:10 +10:00
Ben Skeggs
791a3e1850 nouveau: remove use of deprecated nouveau_device_wrap()
Switching to the newer libdrm entry-points tells libdrm that it's OK to
make use of newer kernel interfaces.

We want to be able to isolate any bugs to either the interfaces changes,
or the use of NVIF itself.  As such, this commit has a slight hack which
forces libdrm to continue using the older kernel interfaces.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:08 +10:00
Ben Skeggs
323d4da372 nouveau: fix screen creation failure paths
The winsys layer would attempt to cleanup the nouveau_device if screen
init failed, however, in most paths the pipe driver would have already
destroyed it, resulting in accesses to freed memory etc.

This commit fixes the problem by allowing the winsys to detect whether
the pipe driver's destroy function needs to be called or not.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:05 +10:00
Ben Skeggs
6c1bfff66c nouveau: return nouveau_screen from hw-specific creation functions
Kills off a void cast.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:03 +10:00
Ben Skeggs
1a9ec8e062 nouveau: remove use of deprecated nouveau_device::drm_version
v2. update for libdrm nouveau_drm::lib_version removal

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:24:01 +10:00
Ben Skeggs
a458ffacba nouveau: remove use of deprecated nouveau_device::fd
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:23:59 +10:00
Ben Skeggs
a8abdf2f35 nouveau: bump required libdrm version to 2.4.66
v2. forgot bump for non-gallium driver

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-22 13:23:27 +10:00
Dave Airlie
d19106649f r600: fix viewport clipping handling (v2)
If oViewport is written, vertex reuse need to be turned off.
If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE
need to be set. (we don't have enough info to program VPORT_PROVOKE).

Fixes: arb_viewport_array-render-viewport-2 and some CTS tests.

v2: drop vport provoke write, drop initial state writing this
on evergreen, only program it on evergreen.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-22 09:09:56 +10:00
Dave Airlie
73e7c5fd7f radeonsi: fix viewport clipping handling. (v2)
If oViewport is written, vertex reuse need to be turned off.
If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE
need to be set. (We don't know if oViewport is constant so we
skip this.)

Fixes: arb_viewport_array-render-viewport-2 and some CTS tests.

v2: drop writing to provoke disable, drop write in initial
state.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-22 09:09:52 +10:00
Dave Airlie
847f91f4e5 r600: drop VTX_CNT_EN write from initial state
we always program this in shader stages atom now.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-22 09:09:48 +10:00
Nicolai Hähnle
ea8c0b16ec gallium/radeon: fix regression in a number of driver queries
This rather silly mistake was introduced by commit 01910676.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-21 15:47:10 -05:00
Ben Widawsky
0865088cca i965: Only apply CS stall workaround pre-SKL
As per the docs.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-21 10:42:42 -08:00
Ilia Mirkin
f7b7145123 glx/dri3: a drawable might not be bound at wait time
A trace of Alien Isolation hit this on nouveau.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-21 06:43:58 -05:00
Emil Velikov
37186c43b5 docs: add news item and link release notes for 11.0.8
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-21 10:13:17 +00:00
Emil Velikov
1c1994da58 docs: add sha256 checksums for 11.0.8
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b9b19162ee)
2015-12-21 10:11:28 +00:00
Emil Velikov
bb5adf065f docs: add release notes for 11.0.8
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 261daab6b4)
2015-12-21 10:11:27 +00:00
Kristian Høgsberg Kristensen
220ac9337b vk: Only require wc bo mmap for !llc GPUs 2015-12-19 22:25:57 -08:00
Kristian Høgsberg Kristensen
b49aaf5de0 vk: Remove stale 48 bit addresses FIXMEs
This has worked fine for a long time.
2015-12-19 22:20:45 -08:00
Kristian Høgsberg Kristensen
c4802bc44c vk/gen8: Implement VkEvent for gen8
We use PIPE_CONTROL for setting and resetting the event from cmd buffers
and MI_SEMAPHORE_WAIT in polling mode for waiting on an event.
2015-12-19 22:17:19 -08:00
Dave Airlie
97eee90547 glsl: count attributes for vertex inputs properly.
This function deals with vertex inputs and fragment
outputs, so we should count the attribute locations
correctly for the vertex inputs.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-19 17:57:43 +10:00
Kenneth Graunke
14193e4643 ralloc: Fix ralloc_adopt() to the old context's last child's parent.
I was cleverly using one iteration to obtain a pointer to the last item
in ralloc's singly list child list, while also setting parents.

Unfortunately, I forgot to set the parent on that last item.

Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-18 23:30:51 -08:00
Dave Airlie
b476c587e3 glsl: fix transform feedback for 64-bit outupts.
This fixes the calculations for transform feedback for doubles.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-19 11:42:26 +10:00
Dave Airlie
64cfacf319 glsl: fix partial marking for fp64 types.
This doubles the element width for the types that are greater
than 2 elements wide.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-19 11:42:26 +10:00
Dave Airlie
1fc39dae22 glsl: only update doubles inputs for vertex inputs.
This doesn't apply to other stages. This is only
used in the mesa/st code, which needs further fixes.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-19 11:42:25 +10:00
Kristian Høgsberg Kristensen
8ac46d84ff vk: Fix check for I915_PARAM_MMAP_VERSION
Comparing the wrong thing for < 1.
2015-12-18 17:24:19 -08:00
Eric Anholt
f1fb85e544 vc4: Do instruction scheduling on the QIR to hide texture fetch latency.
This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR.  Texture
fetch can probably take as much as the rest of the cycles of the program,
so it's important to hide our other cycles during it (which is hard to do
after register allocation).  Also, we can queue up multiple texture
requests before collecting the resulting samples, so that we keep the
texture unit busy more of the time.

High-settings openarena performance +2.35849% +/- 0.221154% (n=7).  Also
about 2-3% on the multiarb demo.  8 piglit tests
(ext_framebuffer_multisample accuracy depthstencil) go from failing in
rendering to failing in register allocation, but hopefully I can fix that
up with some better register pressure handling here.

total instructions in shared programs: 87723 -> 88448 (0.83%)
instructions in affected programs:     78411 -> 79136 (0.92%)
total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
estimated cycles in affected programs:     265691 -> 235414 (-11.40%)
2015-12-18 17:12:10 -08:00
Eric Anholt
5278c64de5 vc4: Fix latency handling for QPU texture scheduling.
There's only high latency between a complete texture fetch setup and
collecting its result, not between each step of setting up the texture
fetch request.
2015-12-18 17:09:03 -08:00
Eric Anholt
960f48809f vc4: Keep sample mask writes from being reordered after TLB writes
Fixes a regression I noticed after introducing scheduling on the QIR.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-12-18 17:09:03 -08:00
Dave Airlie
5dc22cadb5 glsl: fix count_attribute_slots to allow for different 64-bit handling
So vertex shader input attributes are handled different than internal
varyings between shader stages, dvec3 and dvec4 only count as
one slot for vertex attributes, but for internal varyings, they
count as 2.

This patch comments all the uses of this API to clarify what we
pass in, except one which needs further investigation

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-19 12:00:00 +11:00
Dave Airlie
69ea66231e glsl: use dual slot helper in the linker code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-19 11:59:55 +11:00
Dave Airlie
d97b060e6f glsl/fp64: add helper for dual slot double detection.
The old function didn't work for matrices, and we need this
in other places to fix some other problems, so move to a helper
in glsl type and fix the one user so far.

A dual slot double is one that has 3 or 4 components in it's
base type.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-19 11:59:49 +11:00
Dave Airlie
9fbcd8e847 glsl: pass stage into mark function
Don't use a bool here, as for some 64-bit fixes we need
the stage.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-19 11:59:42 +11:00
Rob Herring
b201a6ed9f freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled
Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit
builds.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-18 14:01:07 -05:00
Matt Turner
bb9eb59933 i965/vec4: Optimize predicate handling for any/all.
For a select whose condition is any(v), instead of emitting

   cmp.nz.f0(8)    null<1>D        g1<0,4,1>D      0D
   mov(8)          g7<1>.xUD       0x00000000UD
   (+f0.any4h) mov(8) g7<1>.xUD    0xffffffffUD
   cmp.nz.f0(8)    null<1>D        g7<4,4,1>.xD    0D
   (+f0) sel(8)    g8<1>UD         g4<4,4,1>UD     g3<4,4,1>UD

we now emit

   cmp.nz.f0(8)    null<1>D        g1<0,4,1>D      0D
   (+f0.any4h) sel(8) g9<1>UD      g4<4,4,1>UD     g3<4,4,1>UD

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-18 13:20:13 -05:00
Matt Turner
c8a74e3a4e nir: Delete bany, ball, fany, fall.
As in the previous patches, these can be implemented as

   any(v) -> any_nequal(v, false)
   all(v) -> all_equal(v, true)

and their removal simplifies the code in the next patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-18 13:20:13 -05:00
Matt Turner
21cd298aec glsl: Implement all(v) as all_equal(v, true).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-18 13:20:13 -05:00
Matt Turner
2268a50ffd glsl: Remove ir_unop_any.
The GLSL IR to TGSI/Mesa IR paths for any_nequal have the same
optimizations the ir_unop_any paths had.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-18 13:20:12 -05:00
Matt Turner
249bb89617 glsl: Implement any(v) as any_nequal(v, false).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-18 13:20:12 -05:00
Nicolai Hähnle
0a6a17b9d7 gallium/radeon: only dispose locally created target machine in radeon_llvm_compile
Unify the cleanup paths of the function rather than duplicating code.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-18 12:17:40 -05:00
Jordan Justen
5e82a91324 anv/gen8: Add support for gl_NumWorkGroups
Co-authored-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-18 01:45:11 -08:00
Jason Ekstrand
d7f66f9f6f nir/spirv: Array lengths are constants not literals 2015-12-17 16:36:29 -08:00
Roland Scheidegger
61e5f8d073 gallium/util: (trivial) include p_shader_tokens.h in u_simple_shaders.h
as it uses definition from it (enum tgsi_return_type).
2015-12-18 01:02:16 +01:00
Roland Scheidegger
6743c68a11 draw: fix clip test with NaNs
NaNs mean it should be clipped, otherwise the NaNs might get passed to the
next stages (if clipping didn't happen for another reason already), which
might cause all kind of problems.
The llvm path got this right already (possibly by luck), but this isn't used
when there's a gs active.
Found by code inspection, verified with some hacked piglit test and some more
hacked debug output.
(Note the clipper can still itself incorrectly generate NaN and INF position
values in its output prims (at least after w divide / viewport transform) even
if the inputs weren't NaNs, if the position data of the vertices is
"sufficiently bad".)

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-18 00:57:07 +01:00
Roland Scheidegger
44e87b7b7b draw: fix pstipple and aaline stages wrt sampler_views/samplers
Those stages only really work for OGL-style texturing (so number of samplers
and views mostly the same, certainly for the max values).
These get often set up all at once, thus there might be max number of both
even if all of them are just NULL. We must not set the max number of samplers
and views to the same value since that will lead to terrible things if a driver
supports more views than samplers (and the state tracker set up all the views).
(This will not make these stages magically work if a shader uses dx10-style
texturing, they might still replace an actually used sview in that case.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-18 00:55:35 +01:00
Jason Ekstrand
1473a8dc6f anv/formats: Add more 64-bit formats 2015-12-17 13:51:09 -08:00
Jason Ekstrand
167809365b anv/formats: Add more PACK32 formats 2015-12-17 13:44:50 -08:00
Miklós Máté
6723b61753 swrast: move two global defines to the only place where they are used
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-17 12:09:58 -08:00
Miklós Máté
555f67c3d7 mesa: improve debug log in atifragshader
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-17 12:09:58 -08:00
Miklós Máté
5150d56ec4 program: fix comment about the fog formula
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-17 12:09:58 -08:00
Miklós Máté
7279453da5 mesa: Don't leak ATIfs instructions in DeleteFragmentShader
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-17 12:09:58 -08:00
Jason Ekstrand
952bf05897 anv/image: Properly report buffer features 2015-12-17 11:52:31 -08:00
Jason Ekstrand
3395ca17d1 isl: Add a is_storage_image_format helper 2015-12-17 11:45:04 -08:00
Jason Ekstrand
b1325404c5 anv/device: Handle zero-sized memory allocations 2015-12-17 11:00:38 -08:00
Oded Gabbay
6e44bbe0f5 configura.ac: fix test for SSE4.1 assembler support
This patch modifies the SSE4.1 test in configure.ac to use a global
variable to initialize vector variables. In addition, we now return the
value of the computation instead of 0.

This is done so gcc 4.9 (and lower) won't optimize the SSE4.1 assembly
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE4.1 instructions (the test will pass because the compiler does support the intrinsics).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-17 09:37:24 +00:00
Jonathan Gray
4ef44bb484 configure: check for python2.7 for PYTHON2
Check for a 'python2.7' binary, 'python' and 'python2' are not
provided by the OpenBSD python 2.7.x packages.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-17 09:37:24 +00:00
Jonathan Gray
7f585a6a98 configure.ac: use pkg-config for libelf
Use PKG_CHECK_MODULES to get the flags to link libelf

v2: keep AC_CHECK_LIB as a fallback for elfutils provided
libelf that doesn't install a pkg-config file.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-17 09:37:24 +00:00
Jordan Justen
e97b207654 i965/screen: Allow OpenGLES 3.1 for gen8+
OpenGLES 3.1 cannot be enabled for gen 7 (Ivy Bridge, Haswell) since
they are still missing ARB_stencil_texturing.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-12-16 20:37:40 -08:00
Jordan Justen
3b5d442661 i965: Enable compute shaders in more cases for OpenGLES 3.1
Previously we were checking the desktop OpenGL ARB_compute_shader
requirements, but for OpenGLES 3.1, the requirements are lower.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-12-16 20:37:23 -08:00
Jordan Justen
3e8a6e468b main/version: Don't require ARB_compute_shader for OpenGLES 3.1
The OpenGL ARB_compute_shader extension specfication requires at least
1024 for GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, whereas OpenGLES 3.1
only required 128.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-16 20:36:16 -08:00
Jordan Justen
a9d934726e main: Allow compute shaders to be compiled with OpenGLES 3.1
Previous OpenGLES 3.1 testing had been done when ARB_compute_shader
was overridden to enabled.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-12-16 20:35:55 -08:00
Jordan Justen
3507d0b7f9 main: Add MESA_VERBOSE=api for LinkProgram & UseProgram
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-16 20:35:51 -08:00
Matt Turner
257fb76403 ir_to_mesa: Skip useless comparison instructions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-16 19:59:05 -08:00
Kenneth Graunke
4a5cff24d7 glsl: Remove inverse() from GLSL 1.20 and 1.30.
I apparently regressed this when rewriting the built-ins using
ir_builder, in 76d2f73643.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93387
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-16 19:32:21 -08:00
Samuel Pitoiset
695ae816da nv50: free memory allocated by the prog which reads MP perf counters
This fixes a memory leak introduced in 6a9c151
("nv50: add compute-related MP perf counters on G84+")

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-12-16 21:52:43 -05:00
Brian Paul
f992d02ba2 st/osmesa: add OSMesaCreateContextAttribs() function
As with the previous commit, except for gallium.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-16 19:39:05 -07:00
Brian Paul
a34e7612dc osmesa: add new OSMesaCreateContextAttribs function
This allows specifying a GL profile and version so one can get a core-
profile context.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-16 19:38:51 -07:00
Brian Paul
c2c0983215 svga: don't use debug code in update_state() in release builds
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-16 19:38:15 -07:00
Jason Ekstrand
c643e9cea8 anv/state: Allow levelCount to be 0
This can happen if the client is creating an image view of a textureable
surface and they only ever intend to render to that view.
2015-12-16 17:34:57 -08:00
Samuel Pitoiset
aeee7f2a4d nv50,nvc0: free memory allocated by performance metrics
The destroy_query() helper was actually never called. This fixes
a memory leak while monitoring performance metrics.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-12-16 23:03:08 +01:00
Samuel Pitoiset
9aca60bfb0 nvc0: free memory allocated by the prog which reads MP perf counters
This fixes a long time ago memory leak (even before all my query
related changes).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-16 22:00:57 +01:00
Samuel Pitoiset
8022c7480e nvc0: fix metric-achieved_occupancy calculation on Kepler
The maximum number of resident warps per multiprocessor is 64 on
Kepler instead of 48 on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-16 22:00:57 +01:00
Christian König
a87a1420d6 st/va: remove fence handling v3
It's nonsense to drain the pipeline like this.

v2: keep the drain for DMA-buf exports.
v3: flush before the export and after compositing and add TODO comment.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2015-12-16 21:13:42 +01:00
Neil Roberts
61cdb7665f Revert "i965: Use MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals"
This reverts commit 839793680f.

The patch was breaking DRI3 because driGLFormatToImageFormat does not
handle MESA_FORMAT_B8G8R8X8_SRGB which ended up making it fail to
create the renderbuffer and it would later crash. It's not trivial to
add this format because there is no __DRI_IMAGE_FORMAT nor
__DRI_IMAGE_FOURCC define for the format either. I'm not sure how
difficult adding this would be and whether adding a new format would
require some sort of new version for DRI. Seeing as this might take a
while to fix I think it makes sense to just revert the patch in the
meantime in order to avoid regressing master.

It is also not handled in intel_gles3_srgb_workaround and there may be
other cases where it breaks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93388
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-16 17:35:33 +00:00
Neil Roberts
8c5310da9d i965: Fix crash when calling glViewport with no surface bound
If EGL_KHR_surfaceless_context is used then glViewport can be called
with NULL for the draw and read surfaces. This was previously causing
a crash because the i965 driver tries to use this point to invalidate
the surfaces and it was derferencing the NULL pointer.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93257
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2015-12-16 16:39:29 +00:00
Neil Roberts
4c7c9e4602 mesa/blit: Don't require the same format for mulitisample blits
Previously the GL spec required that whenever glBlitFramebuffer is
used with either buffer being multisampled, the internal formats must
match. However the GL 4.4 spec was later changed to remove this
restriction. In the section entitled “Changes in the released
Specification of July 22, 2013” it says:

“Relax BlitFramebuffer in section 18.3.1 so that format conversion can
 take place during multisample blits, since drivers already allow this
 and some apps depend on it.”

If most drivers already allowed this in earlier versions I think it's
safe to assume that this is a spec bug and it should also be allowed
in all versions.

This patch just removes the restriction on desktop GL. For GLES there
are conformance tests that assert the previous behaviour so it is
probably safer to leave it in.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92706
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-16 16:20:36 +00:00
Julien Isorce
89eb342def st/va: retrieve size from the temporary img variable
"image" is not ready yet since it will be set at
the end of the function by: *image = *img;

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
2015-12-16 14:12:31 +00:00
Roland Scheidegger
8e195a6251 draw: handle edge flags in llvm path
We just ignored them altogether. While this feature is rather old-fashioned
supporting it is actually rather trivial.
This fixes the associated piglit tests (2 gl-1.0-edgeflag, 2 gl-2.0-edgeflag
and all (7) of point-vertex-id).

v2: comment fixes, and make the use of the edgeflag in clipmask consistent
with when it's actually there (should be impossible to hit a case where the
difference would actually matter but still...)

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-16 03:55:25 +01:00
Roland Scheidegger
13c0b1c780 draw: don't set start_instance and instance id for pt emit
This just adds confusion, these parameters are used when fetching vertices
by translate, but certainly not when emitting hw vertices for drivers, they
make no sense there (setting them has no consequences otherwise since there
won't be any elements with instance_divisor set). So just set them to 0 (the
draw_pipe_vbuf code for emitting vertices when the draw pipeline is run
already does exactly that).
Also while here do some whitespace cleanup.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-16 03:55:14 +01:00
Jason Ekstrand
b2fe8b4673 nir/spirv: Add a missing break statement 2015-12-15 17:24:18 -08:00
Jason Ekstrand
1c51d91bfe anv/pipeline: Allow the user to pass a null MultisampleCreateInfo
According to section 5.2 of the Vulkan spec, this is allowed for color-only
rendering pipelines.
2015-12-15 16:26:10 -08:00
Jason Ekstrand
d61ff1ed08 anv/descriptor_set: Initialize immutable_samplers to NULL
Previously this wasn't a problem.  However, with the new API update,
descriptor sets can now be sparse so the client doesn't have to provide an
entry for every binding.  This means that it's possible for a binding to be
uninitialized other than the memset.  In that case, we want to have a null
array of immutable samplers.
2015-12-15 16:24:22 -08:00
Jason Ekstrand
d7cb1634d2 nir/lower_system_values: Refactor and use the builder.
Now that we have a helper in the builder for system values and a helper in
core NIR to get the intrinsic opcode, there's really no point in having
things split out into a helper function.  This commit "modernizes" this
pass to use helpers better and look more like newer passes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-12-15 14:12:31 -08:00
Jason Ekstrand
f6910f072a nir/builder: Add a load_system_value helper
While we're at it, go ahead and make nir_lower_clip use it.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-12-15 14:12:31 -08:00
Jason Ekstrand
ca5be008bc nir/lower_system_values: Stop supporting non-SSA
The one user of this (i965) only ever calls it while in SSA form.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-12-15 14:12:31 -08:00
Samuel Pitoiset
276837cbe4 nvc0: remove old comment related to metric calculations
I forgot to remove it when I refactored all performance metrics.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-15 22:49:37 +01:00
Eric Anholt
3858722740 vc4: Add support for dumping executed commands to a file.
The VC4_DEBUG=cl,qpu is nice and all, but I want to be able to get more
detailed dumps, and to replay the same exact commands in simulation.  For
that I need a dump with all of the VBOs, shaders, shader recs, etc.  This
dump can be parsed by vc4-gpu-tools.

For now this is only doable from simulator mode, because otherwise we
don't have access to the RCL contents generated by the kernel.
2015-12-15 12:05:48 -08:00
Eric Anholt
07570edb98 vc4: Import updated vc4_drm.h with hang state. 2015-12-15 12:02:54 -08:00
Eric Anholt
c5b886b028 vc4: Only update vc4->msaa when the framebuffer changes.
Any update here should have been the same as in
vc4_set_framebuffer_state(), except for the point where vc4_blit.c
temporarily sets different state for its different buffers.
2015-12-15 12:02:53 -08:00
Eric Anholt
f2cf2a63f1 vc4: Don't consider nr_samples==1 surfaces to be MSAA.
This is apparently a weirdness of gallium -- nr_samples==1 is occasionally
used and means the same thing as nr_samples==0.  Fixes a bunch of
ARB_framebuffer_srgb blit cases in piglit.
2015-12-15 12:02:53 -08:00
Eric Anholt
da92f16c50 vc4: Fix min() wrapper definition for the simulator's kernel code. 2015-12-15 12:02:53 -08:00
Eric Anholt
02bcb443ee vc4: Warn instead of abort()ing on exec ioctl failures.
It's really harsh to abort() the X Server because of a momentary failure
(particularly -ENOMEM).  I don't see a way to pass an -ENOMEM up the stack
from here, but we can at least log to stderr before proceeding on.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-12-15 12:02:44 -08:00
Jason Ekstrand
28c4ef9d6c anv/device: Bump the size of the instruction block pool
Some CTS test shaders were failing to compile.  At some point soon, we
really need to make a real pipeline cache and stop using a block pool for
this.
2015-12-15 11:49:28 -08:00
Jason Ekstrand
306abbead3 anv/pipeline: Properly set IncludeVertexHandles in 3DSTATE_GS 2015-12-15 11:37:18 -08:00
Jason Ekstrand
2d4b7eda23 nir/spirv: Add support for more CS intrinsics 2015-12-15 10:20:23 -08:00
Jason Ekstrand
1035108a7f nir/lower_system_values: Add support for computed builtins.
In particular, this commit adds support for computing gl_GlobalInvocationID
and gl_LocalInvocationIndex from other intrinsics.
2015-12-15 10:20:23 -08:00
Jason Ekstrand
630b9528b3 shader_enums: Add enums for gl_GlobalInvocationID and gl_LocalInvocationIndex 2015-12-15 10:20:23 -08:00
Jason Ekstrand
7ebd84fa4b nir/lower_system_values: Refactor and use the builder.
Now that we have a helper in the builder for system values and a helper in
core NIR to get the intrinsic opcode, there's really no point in having
things split out into a helper function.  This commit "modernizes" this
pass to use helpers better and look more like newer passes.
2015-12-15 10:20:23 -08:00
Jason Ekstrand
c26e889a44 nir/builder: Add a load_system_value helper
While we're at it, go ahead and make nir_lower_clip use it.

Cc: Rob Clark <robclark@gmail.com>
2015-12-15 10:20:23 -08:00
Jason Ekstrand
de67456d6d nir/lower_system_values: Stop supporting non-SSA
The one user of this (i965) only ever calls it while in SSA form.
2015-12-15 10:20:23 -08:00
Andreas Boll
a2140b0571 docs: Replace sourceforge logo with a text link
Fixes the following Lintian (Debian package checker) error:

privacy-breach-logo

  usr/share/doc/mesa-common-dev/contents.html
    (http://sourceforge.net/sflogo.php?group_id=3&amp;type=1)
  usr/share/doc/mesa-common-dev/thanks.html
    (http://sourceforge.net/sflogo.php?group_id=3&amp;type=1)

The extended description of this tag is:

    This package creates a potential privacy breach by fetching a logo
at runtime.

    Before using a local copy you should check that the logo is suitable
for main. You can get help with determining this by posting a link to
the logo and a copy of, or a link to, the logo copyright and license
information to the debian-legal mailing list.

    Please replace any scripts, images, or other remote resources with
non-remote resources. It is preferable to replace them with text and
links but local copies of the remote resources are also acceptable as
long as they don't also make calls to remote services. Please ensure
that the remote resources are suitable for Debian main before making
local copies of them.

    Severity: serious, Certainty: possible

    Check: files, Type: binary, udeb

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-15 17:57:25 +01:00
Chad Versace
64f0ee73e0 isl: Add func isl_surf_get_image_offset_sa
The function calculates the offset to a subimage within the surface, in
units of surface samples.

All unit tests pass with `make check`. (Admittedly, though, there are
too few unit tests).
2015-12-15 08:46:09 -08:00
Chad Versace
53504b884e isl: Fix calculation of array pitch for layout GEN4_2D
The height of the miptree's right half was not large enough.

Found by `make check` in test_isl_surf_get_offset, which is added in the
next commit.
2015-12-15 08:46:09 -08:00
Chad Versace
f7e36f9f66 isl: Move it a standalone directory
The plan all along was to eventualyl move isl out of the Vulkan
directory, because I intended i965 and anvil to share it.

A small problem I encountered when attempting to write unit tests for
isl precipitated the move.  I discovered that it's easier to get isl
unit tests to build if I remove the extra, unneeded dependencies
injected by src/vulkan/Makefile.am. And the easiest way to remove those
unneeded dependencies is to move isl out of src/vulkan. (Unit tests come
in subsequent commits).
2015-12-15 08:45:49 -08:00
Nicolai Hähnle
c8d9d289ff radeonsi: fix perfcounter selection for SI_PC_MULTI_BLOCK layouts
The incorrectly computed register count caused lockups.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-15 11:23:40 -05:00
Nicolai Hähnle
149d049676 gallium/radeon: remove unnecessary test in r600_pc_query_add_result
This test is a left-over of the initial development. It is unneeded and
misleading, so let's get rid of it.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-15 11:23:40 -05:00
Nicolai Hähnle
819543adb4 mesa/main: use BITSET_FOREACH_SET in perf_monitor_result_size
This should make the code both faster and slightly clearer.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-15 11:23:40 -05:00
Emil Velikov
9c0773958e docs: add news item and link release notes for 11.1.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-15 15:07:03 +00:00
Emil Velikov
b8394ef3df docs: add sha256 checksums for 11.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 525f3c2c28)
2015-12-15 15:07:02 +00:00
Emil Velikov
5497e119a5 docs: Update 11.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5a616125ac)
2015-12-15 15:07:02 +00:00
Rob Clark
e677b3047b freedreno/a4xx: fix fragcoord.z + fragdepth
It seems like disabling earlyz on a4xx also, by defaults, disables
fragcoord.z to the FS.  For frag shaders that both read fragcoord(.z)
and write fragdepth, we need to set some extra bits to prevent a
lockup.

This lets us get rid of the hack of disabling fragcoord.z (which
prevented 0ad from lockups, but resulted in rendering corruption).  Also
fixes fbo-depth-sample-compare.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-15 09:40:54 -05:00
Rob Clark
cad0920d11 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-15 09:39:10 -05:00
Rob Clark
249b2be3bc freedreno/ir3/cmdline: don't dump nir by default
By default we only want the disasm dumped, which we get anyways.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-15 09:39:10 -05:00
Christian König
10b7a7c344 st/va: remove nonesense HEVC picture id handling
The picture id in this case is a VA-API surface handle, checking
for a certain value can't be correct.

Signed-off-by: Christian König <christian.koenig@amd.com>
2015-12-15 11:25:02 +01:00
Chris Forbes
af5ca43f26 i965: Allocate URB space for HS and DS stages when required.
v2: (by Ken, incorporating feedback from Matt Turner):
- Rewrite the push constant allocation code to be clearer.
- Only apply the minimum VS entries workaround on Gen 8.

v3: (by Ken)
- Fix a bug in v2 where we failed to allocate the full push constant
  space when the number of enabled stages didn't divide the available
  push constant space evenly.  (Any left over space is now allocated
  to the PS, as it was in v1.)
- Fix an off-by-one error in v2's number of enabled stages calculation.
- Use DIV_ROUND_UP for nicer formatting.
- Line wrapping fixes.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-15 02:16:14 -08:00
Jason Ekstrand
8224571ef8 vec4/generator: Actually pass the sampler into generate_tex
This is an artifact of the way the separate samplers/textures series ended
up getting sent out and rebased.  This should fix a number of CTS tests
involving geometry shaders.
2015-12-14 21:13:52 -08:00
Jordan Justen
7edcc59a7b anv: Rename gs_vec4 to gs_kernel
The code generated may be vec4 or simd8 depending on how we start the
compiler.

To run the GS in SIMD8, set the INTEL_SCALAR_GS environment variable.
This was added in:

    commit 36fd653817
    Author: Kenneth Graunke <kenneth@whitecape.org>
    Date:   Wed Mar 11 23:14:31 2015 -0700

        i965: Add scalar geometry shader support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 18:23:14 -08:00
Jordan Justen
a3c5c339a8 nir/spirv_to_nir: Use a minimum of 1 for GS invocations
glslang is giving us 0, which causes the SIMD8 GS compile to hit an
assert.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 18:23:14 -08:00
Timothy Arceri
8c0963f9d3 docs: mark input/output block locations as DONE
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-15 13:10:51 +11:00
Timothy Arceri
0aeb9b3e5e glsl: add support for explicit locations inside interface blocks
This change also adds explicit location support for structs and interfaces which
is currently missing in Mesa but is allowed with SSO and GLSL 1.50+.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-15 13:10:44 +11:00
Timothy Arceri
183c606066 glsl: simplify interface matching
This makes the code easier to follow, should be more efficient
and will makes it easier to add matching via explicit locations
in the following patch.

This patch also replaces the hash table with the newer
resizable hash table this should be more suitable as the table
is likely to only contain a small number of entries.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-15 13:10:39 +11:00
Roland Scheidegger
8e264765a4 draw: remove clip_vertex from vertex header
vertex header had both clip_pos and clip_vertex.
We only really need one (clip_pos) because the draw llvm shader would
overwrite the position output from the vs with the viewport transformed.
However, we don't really need the second one, which was only really used
for gl_ClipVertex - if the shader didn't have that the values were just
duplicated to both clip_pos and clip_vertex. So, just use this from the vs
output instead when we actually need it.
Also change clip debug to output both the data from clip_pos and the
clipVertex output (if available).
Makes some things more complex, some things less complex, but seems more
easy to understand what clipping actually does (and what values it uses
to do its magic).

Reviewed-by: Brian Paul <brianp@vmware.com
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-15 02:03:40 +01:00
Roland Scheidegger
1775400a20 draw: use clip_pos, not clip_vertex for the fake guardband xy point clipping
Seems obvious now this should use the data from position and not clip_vertex
(albeit might not really make a difference).

Reviewed-by: Brian Paul <brianp@vmware.com
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-15 02:03:40 +01:00
Roland Scheidegger
8575ddb644 draw: rename vertex header members
clip -> clip_vertex and pre_clip_pos -> clip_pos.
Looks more obvious to me what these values actually represent (so use
something resembling the vs output names).

Reviewed-by: Brian Paul <brianp@vmware.com
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-15 02:03:40 +01:00
Roland Scheidegger
1b22815af6 draw: don't pretend have_clipdist is per-vertex
This is just for code cleanup, conceptually the have_clipdist really
isn't per-vertex state, so don't put it there (just dependent on the
shader). Even though there wasn't really any overhead associated with
this, we shouldn't store random shader information in the vertex header.

Reviewed-by: Brian Paul <brianp@vmware.com
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-15 02:03:40 +01:00
Roland Scheidegger
9e3f2af3c3 draw: use position not clipVertex output for xyz view volume clipping
I'm pretty sure this should use position (i.e. pre_clip_pos) and not
the output from clipVertex. Albeit piglit doesn't care. It is what we
use in the clip test, and it is what every other driver does (as they
don't even have clipVertex output and lower the additional planes to
clip distances).

Reviewed-by: Brian Paul <brianp@vmware.com
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-15 02:03:40 +01:00
Jason Ekstrand
f46544dea1 anv: Fix CUBE storage images 2015-12-14 16:59:59 -08:00
Jason Ekstrand
783a21192c anv: Add support for storage texel buffers 2015-12-14 16:51:12 -08:00
Jason Ekstrand
1f98bf8da0 anv: Pass an isl_format into fill_buffer_surface_state 2015-12-14 16:14:20 -08:00
Jason Ekstrand
091b6156dd i965/fs: Push small uniform arrays
Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations.  The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing.  Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up.  To do this,
we use an algorithm similar to the on in split_virtual_grfs.
2015-12-14 15:58:10 -08:00
Jason Ekstrand
63c313de84 i965/fs: Rename demote_pull_constants to lower_constant_loads 2015-12-14 15:58:10 -08:00
Jason Ekstrand
75f33a6420 i965/vec4: Get rid of the uniform_size array 2015-12-14 15:58:09 -08:00
Jason Ekstrand
eb76f226cf i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD 2015-12-14 15:58:09 -08:00
Jason Ekstrand
a487f0284f i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants
This commit moves us to an instruction based model rather than a
register-based model for indirects.  This is more accurate anyway as we
have to emit instructions to resolve the reladdr.  It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.

One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants.  This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.
2015-12-14 15:58:09 -08:00
Jason Ekstrand
46f5396846 i965/vec4: Inline get_pull_constant_offset
It's not really doing enough anymore to justify a helper function.
2015-12-14 15:58:09 -08:00
Jason Ekstrand
9c36c40845 i965/fs: Get rid of the param_size array 2015-12-14 15:58:09 -08:00
Jason Ekstrand
9024353db3 i965/fs: Stop relying on param_size in assign_constant_locations
Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode.  With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore.  The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other.  In order to deal with
this, we make the first pass just flag what needs pulling and handle
assigning pull constant locations until later.
2015-12-14 15:58:09 -08:00
Jason Ekstrand
9f46af9e41 i965/fs: Get rid of reladdr
We aren't using it anymore.
2015-12-14 15:58:09 -08:00
Jason Ekstrand
a3cd95a884 i965/fs: Use MOV_INDIRECT for all indirect uniform loads
Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load.  We also have to
rework some of the other bits of the backend to handle this new form of
uniform load.  The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.
2015-12-14 15:58:09 -08:00
Jordan Justen
c4219bc6ff anv/cmd_buffer: Gen 8 requires 64 byte alignment for push constant data
See MEDIA_CURBE_LOAD, CURBE Data Start Address & CURBE Total Data Length

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 15:39:07 -08:00
Jason Ekstrand
f0313a5569 anv: Add initial support for cube maps
This fixes 486 cubemap CTS tests.
2015-12-14 15:36:30 -08:00
Kenneth Graunke
77cc2666b1 i965: Use DIV_ROUND_UP() in gen7_urb.c code.
This is a newer convention, which we prefer over ALIGN(x, n) / n.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-14 14:56:14 -08:00
Kenneth Graunke
9f0944d15b i965: Make TES inputs match TCS outputs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 14:48:29 -08:00
Kenneth Graunke
4fac950010 i965: Force VS -> TCS varyings to use the SSO VUE map layout.
The compact VUE map only works when varying packing is in use.
Unfortunately, varying packing is disabled for TCS inputs.

This is needed to fix Piglit's tcs-input-read-array-interface test.

v2: Make lines fit in 80 columns (caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 14:48:18 -08:00
Kenneth Graunke
bee42cc1f7 i965: Handle TCS outputs and TES inputs.
TCS outputs and TES inputs both refer to a common "patch URB entry"
shared across all invocations.  First, there are some number of
per-patch entries.  Then, there are per-vertex entries accessed via
an offset for the variable and a stride times the vertex index.

Because these calculations need to be done in both the vec4 and scalar
backends, it's simpler to just compute the offset calculations in NIR.
It doesn't necessarily make much sense to use per-vertex intrinsics
afterwards, but that at least means we don't lose the per-patch vs.
per-vertex information.

v2: Use is_input/is_output helpers (suggested by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 14:48:13 -08:00
Kenneth Graunke
31140d097a i965: Handle TCS inputs and TES outputs.
TES outputs work exactly like VS outputs, so we can simply add a case
statement for those.

TCS inputs are very similar to geometry shaders - they're arrays of
per-vertex data.  We use the same method I used for the scalar GS
backend.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 14:48:07 -08:00
Kenneth Graunke
1f46163acb i965: Add tessellation shader VUE map code.
Based on a patch by Chris Forbes, but largely rewritten by Ken.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-14 14:48:01 -08:00
Kenneth Graunke
9f3917bf37 i965: Fix partial variable access for geometry shaders in SSO mode.
Without varying packing, if a VS writes a compound variable, and the GS
only reads part of it, the base location of the variable may not
actually be in the VUE map.

To cope with this, we do lowering in terms of varying slots, add any
constant offsets to the base, and then do the VUE map remapping.  This
ensures we only look up VUE map entries for slots which actually exist.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-14 14:39:38 -08:00
Kenneth Graunke
8c4deb10df i965: Separate base offset/constant offset combining from remapping.
My tessellation branch has two additional remap functions.  I don't want
to replicate this logic there.

v2: Handle inputs/outputs separately (suggested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-14 14:39:34 -08:00
Jason Ekstrand
7ba70b1b51 nir: Add another index to load_uniform to specify the range read 2015-12-14 14:28:31 -08:00
Jason Ekstrand
4115648a6b i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT 2015-12-14 14:28:31 -08:00
Jason Ekstrand
2f1455dbb0 i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant.  This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.
2015-12-14 14:28:31 -08:00
Jason Ekstrand
4be9a1c7bb i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
The subnr field is in bytes so we don't need to multiply by type_sz.
2015-12-14 14:28:31 -08:00
Jason Ekstrand
653d8044ab i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
It should work fine without it and the visitor can set it if it wants.
2015-12-14 14:28:30 -08:00
Jason Ekstrand
2c90f08bf7 i965/fs: Add support for doing MOV_INDIRECT on uniforms 2015-12-14 14:28:30 -08:00
Kenneth Graunke
106c3a8a48 nir: Fix number of indices on shared variable store intrinsics.
Shared variables and input reworks landed around the same time.
Presumably, this was some sort of mistake in rebase conflict resolution.

This really only affects the num_indices field in nir_intrinsic_infos,
which is rarely used.  However, it's used by the printer.

Found by inspection.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-14 14:27:38 -08:00
Jason Ekstrand
dba28da075 anv/buffer_view: Store a bo + offset instead of buffer pointer
This is what image_view does.  Also, we really need to do this so that we
can properly handle the combined offsets from the buffer and from
pCreateInfo.

This fixes some of the nonzero offset buffer view CTS tests.
2015-12-14 14:10:40 -08:00
Ian Romanick
96dc732ed8 meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER
GL_DRAW_FRAMEBUFFER does not exist in OpenGL ES 1.x, and since
_mesa_meta_begin hasn't been called yet, we have to work-around API
difficulties.  The whole reason that GL_DRAW_FRAMEBUFFER is used instead
of GL_FRAMEBUFFER is that the read framebuffer may be different.  This
is moot in OpenGL ES 1.x.

I have another patch series that would also fix this (by removing the
calls to _mesa_BindFramebuffer and friends), but it's not quite ready
yet... and I think it may be a bit heavy for some stable branches.
Consider this a stop-gap fix.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93215
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-14 13:09:15 -08:00
Chad Versace
ee57062e1e anv: Remove anv_image::surface_type
When building RENDER_SURFACE_STATE, the driver set
SurfaceType = anv_image::surface_type, which was calculated during
anv_image_init(). This was bad because the value of
anv_image::surface_type was taken from a gen-specific header,
gen8_pack.h, even though the anv_image structure is used for all gens.

Replace anv_image::surface_type with a gen-specific lookup function,
anv_surftype(), defined in gen${x}_state.c.

The lookup function contains some useful asserts that caught some nasty
bugs in anv meta, which were fixed in the previous commit.
2015-12-14 10:46:27 -08:00
Samuel Pitoiset
71135e275f nvc0: check return value of nvc0_program_validate()
Spotted by Coverity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-14 19:08:42 +01:00
Samuel Pitoiset
54f58210c2 nv50: check return value of nouveau_object_new()
When ret == 0, obj is not NULL. Spotted by Coverity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-14 19:08:39 +01:00
Samuel Pitoiset
3f7462b792 nv50,nvc0: make use of unreachable() when invalid texture target happens
Spotted by Coverity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-14 19:08:25 +01:00
Chad Versace
f0d11d5a81 anv/meta: Fix VkImageViewType
Meta unconditionally used VK_IMAGE_VIEW_TYPE_2D in the functions below.
This caused some out-of-bound memory accesses.
  anv_CmdCopyImage
  anv_CmdBlitImage
  anv_CmdCopyBufferToImage
  anv_CmdClearColorImage

Fix it by adding a new function, anv_meta_get_view_type().
2015-12-14 09:03:58 -08:00
Chad Versace
0bebaeacd7 isl: Rename s/lod_align/image_align/ for consistency
Regarding the subimages within a surface, sometimes isl called them
"images" and sometimes "LODs". This patch make isl consistently refer to
them as "images".  I choose the term "image" over "LOD" because LOD is
an misnomer when applied to 3D surfaces. The alignment applies to each
individual 2D subimage, not to the LOD as a whole.

This patch changes no behavior. It's just a manually performed,
case-insensitive, replacement s/lod/image/ that maintains correct
indentation.  any behavior.
2015-12-14 09:01:51 -08:00
Chad Versace
85a6384014 anv/tests: gitignore block_pool_no_free 2015-12-14 09:00:28 -08:00
Chad Versace
0da776b733 anv: Fix build for unit tests
Clearly no one has been running `make check`, because the unittestbuild
has been broken for a long time. After this buildfix, all tests now
pass.
2015-12-14 09:00:28 -08:00
Christian König
8b52fa71ac st/va: handle default post process regions
Avoid referencing NULL pointers.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2015-12-14 11:54:55 +01:00
Christian König
f6dd31c1cf st/va: fix unused variable warning
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
2015-12-14 11:54:55 +01:00
Christian König
025d97381e st/va: clean up post process includes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2015-12-14 11:54:54 +01:00
Christian König
27a276f625 st/va: cleanup filter color standard handling
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: ulien Isorce <j.isorce@samsung.com>
2015-12-14 11:54:54 +01:00
Tapani Pälli
8b79258cfe meta: clear_state structure cleanup
Remove unused variables from clear_state and use a hardcoded location
for color uniform to get rid of 2 more variables. Modify shaders to use
explicit location for vertex attribute too as extension is enabled.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-14 08:01:49 +02:00
Ilia Mirkin
eca8f38dcf glsl: assign varying locations to tess shaders when doing SSO
GRID Autosport uses SSO shaders. When a tessellation evaluation shader
is passed through this, it triggers assertion failures down the line
with unassigned varying locations. Make sure to do this when the first
shader in the pipeline is not a vertex shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-13 11:35:28 -05:00
Neil Roberts
839793680f i965: Use MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals
Previously if the visual didn't have an alpha channel then it would
pick a format that is not sRGB-capable. I don't think there's any
reason not to always have an sRGB-capable visual. Since 28090b30 there
are now visuals advertised without an alpha channel which means that
games that don't request alpha bits in the config would end up without
an sRGB-capable visual. This was breaking supertuxkart which assumes
the winsys buffer is always sRGB-capable.

The previous code always used an RGBA format if the visual config
itself was marked as sRGB-capable regardless of whether the visual has
alpha bits. I think we don't actually advertise any sRGB-capable
visuals (but we just use sRGB formats anyway) so it shouldn't make any
difference. However this patch also changes it to use RGBX if an
sRGB-capable visual is requested without alpha bits for consistency.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92759
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-13 14:29:42 +00:00
Neil Roberts
43f4be5f06 i965: Add B8G8R8X8_SRGB to the alpha format override
brw_init_surface_formats overrides the render format for RGBX formats
which aren't supported for rendering so that they internally use RGBA
instead. However, B8G8R8X8_SRGB was missing so it wasn't marked as a
renderable format. This patch just adds it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-13 14:29:41 +00:00
Neil Roberts
c769efda93 i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format
This will be used in a subsequent patch as the format for RGB visuals.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-13 14:29:38 +00:00
Jason Ekstrand
c56186026f anv: Add initial support for texel buffers 2015-12-12 16:11:23 -08:00
Jason Ekstrand
fd944197f2 i965/nir: Provide a default LOD for buffer textures
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures.  GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless.  This commit allows other NIR producers to be
more lazy and not provide one at all.
2015-12-12 16:09:54 -08:00
Ilia Mirkin
7752bbc44e gk104/ir: simplify and fool-proof texbar algorithm
With the current algorithm, we only look at tex uses. However there's a
write-after-write hazard where we might decide to, on some path, not use
a texture's output at all, but instead to write a different value to
that register. However without the barrier, the texture might complete
later and overwrite that value.

This fixes Unreal Elemental demo on GK110/GK208, flightgear on GK10x,
and likely other random-looking failures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-12-12 18:10:16 -05:00
Ilia Mirkin
d35695096d nv50/ir: combine sequences of conversions
In some cases shaders want non-default rounding when converting float to
integer. This can be done in one go, so merge the two ops. This comes up
in the packUnorm4x8 & co functions, as well as a few random shaders.
Overall shader-db impact is minimal, helping a handful of witcher2 and
other misc shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:16 -05:00
Ilia Mirkin
dbca0f3eba nv50/ir: manually optimize multiplication expansion logic
The conversion of 32-bit integer multiplies into 16-bit ones happens
after the regular optimization loop. However it's fairly common to
multiply by a small integer, rendering some of the expansion pointless.

Firstly, propagate immediates when possible into mul ops, secondly just
remove the ops when they are unnecessary.

Including the change to generate imad immediates, the effect is:

total instructions in shared programs : 6365463 -> 6351898 (-0.21%)
total gprs used in shared programs    : 728684 -> 728684 (0.00%)
total local used in shared programs   : 9904 -> 9904 (0.00%)
total bytes used in shared programs   : 44001576 -> 44036120 (0.08%)

                local        gpr       inst      bytes
    helped           0           0        3288           4
      hurt           0           0           0         842

It's easy for this to hurt bytes since we end up always generating the
8-byte form, while we can't always get rid of the immediate in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:16 -05:00
Ilia Mirkin
3af83c4bc7 nv50/ir: fix imul emission in the presence of an immediate
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:15 -05:00
Ilia Mirkin
a0b5d5beed nv50/ir: teach post-ra immediate folding into mad about integers
There will usually be a split before the mad op, peer through that and
pick out the right word of the immediate.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:15 -05:00
Ilia Mirkin
ab70ea1353 nv50/ir: add short imad support
Support emission of the short imad, but also include it in the various
logic that tries to make it possible to emit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:15 -05:00
Ilia Mirkin
6aca7fecb7 nv50/ir: can't have predication and immediates
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-12 18:10:15 -05:00
Ilia Mirkin
69e8b476d0 nv50/ir: fix texture grad for cubemaps
We were ignoring the partial derivatives on the last dim.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:15 -05:00
Ilia Mirkin
a27548400e nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg units
On NV50, we use 16-bit reg units (to make it all work with half-regs). A
few places assumed that it was always in 32-bit units.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-12 18:10:15 -05:00
Nicolai Hähnle
d640f179d3 gallium/ddebug: regularly log the total number of draw calls
This helps in the use of GALLIUM_DDEBUG_SKIP: first run a target application
with skip set to a very large number and note how many draw calls happen
before the bug. Then re-run, skipping the corresponding number of calls.
Despite the additional run, this can still be much faster than not skipping
anything.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-12 15:23:50 -05:00
Nicolai Hähnle
b86d5ccae2 gallium/ddebug: add GALLIUM_DDEBUG_SKIP option
When we know that hangs occur only very late in a reproducible run (e.g.
apitrace), we can save a lot of debugging time by skipping the flush and hang
detection for earlier draw calls.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-12 15:23:34 -05:00
Roland Scheidegger
af7ba989fb llvmpipe: fix layer/vp input into fs when not written by prior stages
ARB_fragment_layer_viewport requires that if a fs reads layer or viewport
index but it wasn't output by gs (or vs with other extensions), then it reads
0. This never worked for llvmpipe, and is surprisingly non-trivial to fix.
The problem is the mechanism to handle non-existing outputs in draw is rather
crude, it will simply redirect them to whatever is at output 0, thus later
stages will just get garbage. So, rather than trying to fix this up (which
looks non-trivial), fix this up in llvmpipe setup by detecting this case there
and output a fixed zero directly.
While here, also optimize the hw vertex layout a bit - previously if the gs
outputted layer (or vp) and the fs read those inputs, we'd add them twice
to the vertex layout, which is unnecessary.
And do some minor cleanup, slots don't require that many bits, there was some
bogus (but harmless) float/int mixup for psize slot too, make the slots all
unsigned (we always put pos at pos zero thus everything else has to be positive
if it exists), and make sure they are properly initialized (layer and vp index
slot were not which looked fishy as they might not have got set back to zero
when changing from a gs which outputs them to one which does not).

This fixes the failures in piglit's arb_fragment_layer_viewport group
(3 each for layer and vp).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-12 01:59:15 +01:00
Brian Paul
27d5be0b8f svga: avoid emitting redundant SetSamplers() commands
This greatly reduces the number of SetSamplers() commands for some
applications.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-12-11 16:54:58 -07:00
Brian Paul
1291e910d5 svga: avoid emitting redundant SetIndexBuffer commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-12-11 16:54:44 -07:00
Brian Paul
71f19dd201 st/mesa: trivial indentation fix 2015-12-11 16:53:20 -07:00
Brian Paul
c877f1aeef util/blitter: minor formatting fixes 2015-12-11 16:53:20 -07:00
Jason Ekstrand
1c605c8dfa Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in a shared local memory fix.
2015-12-11 14:29:13 -08:00
Jason Ekstrand
b8425bb1e8 i965/fs: Use the correct source for local memory load offsets
The offset for loads is in src[0].  This was a copy+paste error in the
nir_intrinsic_load/store refactoring.  This commit fixes a segfault in
ES31-CTS.compute_shader.work-group-size.  I have no idea how piglit failed
to catch this...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93348
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-11 13:56:34 -08:00
Jason Ekstrand
d12ea21dd5 gen8/pipeline: Support vec4 vertex shaders
In order to actually get them, you need INTEL_DEBUG=vec4.
2015-12-11 13:25:17 -08:00
Kenneth Graunke
fadf378497 i965: Add Gen8+ tessellation control shader state (3DSTATE_HS).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
b3c32f5f34 i965: Add Gen7+ tessellation engine state (3DSTATE_TE).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
37b0b11cef i965: Add Gen8+ tessellation evaluation shader state (3DSTATE_DS).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
86a6eda9bc i965: Add tessellation shader push constant support.
Based on a patch by Chris Forbes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
c59d1b1fd1 i965: Add tessellation shader sampler support.
Based on code by Chris Forbes and Fabian Bieler.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
f34c04fda6 i965: Add tessellation shader surface support.
This is brw_gs_surface_state.c copy and pasted twice with search and
replace.

brw_binding_table.c code is similarly copy and pasted.

v2: Drop dword_pitch related fields.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
82455e5396 i965: Make fs_visitor::emit_urb_writes set EOT for TES as well.
Tessellation evaluation shaders work almost identically to vertex
shaders - we have a set of URB writes at the end of the program, and the
last one should terminate it.

Geometry shaders really are the special case, where multiple
EmitVertex() calls trigger URB writes in the middle of the program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
7e0c22d461 i965: Don't hardcode g1 for URB handles in fs_visitor::emit_urb_writes().
Tessellation evaluation shaders will use g4 instead.  For now, make an
fs_reg called urb_handle and use that in place of hardcoding g1.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-11 13:11:15 -08:00
Kenneth Graunke
77b338d63b i965: Make brw_set_message_descriptor() non-static.
I want to use this directly from brw_vec4_generator.cpp.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-11 13:11:15 -08:00
Kristian Høgsberg Kristensen
e803276148 Revert "i965/HACK: Build brw_cs into libcompiler"
This reverts commit 6df7963531.
2015-12-11 13:09:42 -08:00
Kristian Høgsberg Kristensen
21d5e52da8 Merge ../mesa into vulkan 2015-12-11 13:09:06 -08:00
Kristian Høgsberg Kristensen
c51f133197 i965: Move brw_cs_fill_local_id_payload() to libi965_compiler
This is a helper function for setting up the local invocation ID
payload according to the cs_prog_data generated by the compiler. It's
intended to be available to users of libi965_compiler so move it there.
2015-12-11 13:07:25 -08:00
Eric Anholt
076551116e vc4: Add quick algebraic optimization for clamping of unpacked values.
GL likes to saturate your incoming color, but if that color's coming from
unpacking from unorms, there's no point.  Ideally we'd have a range
propagation pass that cleans these up in NIR, but that doesn't seem to be
going to land soon.  It seems like we could do a one-off optimization in
nir_opt_algebraic, except that doesn't want to operate on expressions
involving unpack_unorm_4x8, since it's sized.

total instructions in shared programs: 87879 -> 87761 (-0.13%)
instructions in affected programs:     6044 -> 5926 (-1.95%)
total estimated cycles in shared programs: 349457 -> 349252 (-0.06%)
estimated cycles in affected programs:     6172 -> 5967 (-3.32%)

No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15.
2015-12-11 12:36:16 -08:00
Eric Anholt
e3efc4b023 vc4: When doing algebraic optimization into a MOV, use the right MOV.
If there were src unpacks, changing to the integer MOV instead of float
(for example) would change the unpack operation.
2015-12-11 12:21:22 -08:00
Eric Anholt
2591beef89 vc4: Fix handling of src packs on in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.
2015-12-11 12:21:22 -08:00
Eric Anholt
b70a2f4d81 vc4: Add missing progress note in opt_algebraic. 2015-12-11 12:21:22 -08:00
Eric Anholt
5989ef2b0f vc4: Add debugging of the estimated time to run the shader to shader-db. 2015-12-11 12:21:22 -08:00
Eric Anholt
53b2523c6e vc4: Fix handling of sample_mask output.
I apparently broke this in a late refactor, in such a way that I decided
its tests were some of those interminable ones that I should just
blacklist from my testing.  As a result, the refactors related to it were
totally wrong.
2015-12-11 12:21:22 -08:00
Edward O'Callaghan
53609de762 softpipe: enable GL_ARB_viewport_array support, update GL3.txt doc
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-11 20:09:21 +01:00
Edward O'Callaghan
00f97ad5de softpipe: implement some support for multiple viewports
Mostly related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-11 20:09:21 +01:00
Roland Scheidegger
6c2c1e0ffe draw: don't assume fixed offset for data in struct vertex_info
Otherwise, if struct vertex_info is changed, you're in for some surprises...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-11 20:09:21 +01:00
Neil Roberts
583a5778f4 i965/gen9: Don't do fast clears when GL_FRAMEBUFFER_SRGB is enabled
When GL_FRAMEBUFFER_SRGB is enabled any single-sampled renderbuffers
are resolved in intel_update_state because the hardware can't cope
with fast clears on SRGB buffers. In that case it's pointless to do a
fast clear because it will just be immediately resolved.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-11 18:05:56 +00:00
Neil Roberts
0033c81344 i965/gen9: Allow fast clears for non-MSRT SRGB buffers
SRGB buffers are not marked as losslessly compressible so previously
they would not be used for fast clears. However in practice the
hardware will never actually see that we are using SRGB buffers for
fast clears if we use the linear equivalent format when clearing and
make sure to resolve the buffer as a linear format before sampling
from it.

This is an important use case because by default the window system
framebuffers are created as SRGB so without this fast clears won't be
used there.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-11 18:05:56 +00:00
Neil Roberts
82d459a423 i965/gen9: Resolve SRGB color buffers when GL_FRAMEBUFFER_SRGB enabled
SKL can't cope with the CCS buffer for SRGB buffers. Normally the
hardware won't see the SRGB formats because when GL_FRAMEBUFFER_SRGB
is disabled these get mapped to their linear equivalents. In order to
avoid relying on the CCS buffer when it is enabled this patch now
makes it flush the renderbuffers.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-11 18:05:56 +00:00
Neil Roberts
eb291d7013 i965/gen8+: Don't upload the MCS buffer for single-sampled textures
For single-sampled textures the MCS buffer is only used to implement
fast clears. However the surface always needs to be resolved before
being used as a texture anyway so the the MCS buffer doesn't actually
achieve anything. This is important for Gen9 because in that case SRGB
surfaces are not supported for fast clears and we don't want the
hardware to see the MCS buffer in that case.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-11 18:05:56 +00:00
Neil Roberts
44902ed1fa i965/meta-fast-clear: Disable GL_FRAMEBUFFER_SRGB during clear
Adds MESA_META_FRAMEBUFFER_SRGB to the meta save state so that
GL_FRAMEBUFFER_SRGB will be disabled when performing the fast clear.
That way the render surface state will be programmed with the linear
equivalent format during the clear. This is important for Gen9 because
the SRGB formats are not marked as losslessly compressible so in
theory they aren't support for fast clears. It shouldn't make any
difference whether GL_FRAMEBUFFER_SRGB is enabled for the fast clear
operation because the color is not actually written to the framebuffer
so there is no chance for the hardware to apply the SRGB conversion on
it anyway.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-11 18:05:56 +00:00
Marek Olšák
369afdb7b6 winsys/amdgpu: clear the buffer cache on mmap failure and try again
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
84a38bfc29 winsys/radeon: clear the buffer cache on mmap failure and try again
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
eb1e1af676 winsys/amdgpu: clear the buffer cache on allocation failure and try again
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
f9d6fe8001 winsys/radeon: clear the buffer cache on allocation failure and try again
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
cf811faeff gallium/radeon: remove radeon_winsys_cs_handle
"radeon_winsys_cs_handle *cs_buf" is now equivalent to "pb_buffer *buf".

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
cf422d20ff winsys/radeon: use pb_cache instead of pb_cache_manager
This is a prerequisite for the removal of radeon_winsys_cs_handle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
ebc9497fcb winsys/radeon: use radeon_bomgr less
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
a450f96ba9 winsys/radeon: rename radeon_bomgr_init_functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
38ac20f7dd winsys/radeon: move variables from radeon_bomgr to radeon_drm_winsys
radeon_bomgr is going away.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:13 +01:00
Marek Olšák
3d090223ef winsys/radeon: remove redundant radeon_bomgr::va
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
1e05812fcd winsys/amdgpu: don't use the "rws" abbreviation for amdgpu_winsys
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
6f4e74d165 winsys/amdgpu: use pb_cache instead of pb_cache_manager
This is a prerequisite for the removal of radeon_winsys_cs_handle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
3fbf250dfa gallium/pb_bufmgr_cache: use the new pb_cache module
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
2b396eeed9 gallium/pb_cache: add a copy of cache bufmgr independent of pb_manager
This simplified (basically duplicated) version of pb_cache_manager will
allow removing some ugly hacks from radeon and amdgpu winsyses and
flatten simplify their design.

The difference is that winsyses must manually add buffers to the cache
in "destroy" functions and the cache doesn't know about the buffers before
that. The integration is therefore trivial and the impact on the winsys
design is negligible.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
1a24f443b4 radeonsi: implement fast stencil clear
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
8ee96ce834 radeonsi: re-enable Hyper-Z for stencil
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
99e63338fb r600g: remove a Hyper-Z workaround that's likely not needed anymore
FORCE_OFF == 0, no need to set that

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
96e8d38ac4 r600g: re-enable Hyper-Z for stencil on Evergreen & Cayman
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
d3c08309ab gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 correctly
This is the recommended setting according to hw people and it makes Hyper-Z
stable. Just the two magic states.

This fixes Evergreen, Cayman, SI, CI, VI (using the Cayman code).

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
7c29bf26bb radeonsi: don't use the CP DMA workaround on Fiji and newer
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
787ada6bf6 radeonsi: apply the streamout workaround to Fiji as well
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
62d82193b8 radeonsi: also print hexadecimal values for register fields in the IB parser
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
de887ba90c radeonsi: implement RB+ for Stoney (v2)
v2: fix dual source blending

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
0f9519b938 radeonsi: don't call of u_prims_for_vertices for patches and rectangles
Both caused a crash due to a division by zero in that function.
This is an alternative fix.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
51603af390 radeonsi: use tgsi_shader_info::colors_written
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:11 +01:00
Marek Olšák
b5b87c4ed1 r600g: write all MRTs only if there is exactly one output (fixes a hang)
This fixes a hang in
piglit/arb_blend_func_extended-fbo-extended-blend-pattern_gles2 on REDWOOD.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:11 +01:00
Marek Olšák
eb4813a952 tgsi/scan: add flag colors_written
This is a prerequisite for the following r600g fix.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-11 15:25:11 +01:00
Marek Olšák
37208c4fd7 Revert "radeonsi: disable DCC on Stoney"
This reverts commit 32f05fadbb.

It turned out the problem with Stoney was caused by incorrect handling of
a non-power-two VRAM size in the kernel driver.
This is an optional BIOS setting and can be worked around by choosing
a different VRAM size in the BIOS.

Cc: 11.1 <mesa-stable@lists.freedesktop.org>
2015-12-11 15:25:11 +01:00
Timothy Arceri
4b9a79b7b8 nir: silence uninitialized warning
Reviewed-by: Rob Clark <robdclark@gmail.com>
2015-12-11 19:26:20 +11:00
Jason Ekstrand
6ae4e59fac anv/pipeline: Get rid of the no kernel input parameters hack
Previously, meta would pass null shaders in for the VS when it intended to
disable the VS.  However, this meant that we didn't know what inputs we had
and would dead-code things in the FS.  In order to solve this, we
hard-coded a number.  Now meta passes in a VS even if it plans to disable
the stage so this is no longer needed.
2015-12-10 22:37:30 -08:00
Jason Ekstrand
bd0e25d41e anv/apply_pipeline_layout: Multiply uniform sizes by 4
This is because uniforms are now in terms of bytes everywhere.
2015-12-10 22:36:49 -08:00
Jason Ekstrand
6df7963531 i965/HACK: Build brw_cs into libcompiler
We need it for CS push constants
2015-12-10 22:36:07 -08:00
Dave Airlie
18ad641c3b mesa/shader: return correct attribute location for double matrix arrays
If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.

I think this code is also wrong for float matricies as well.

There is now a piglit for the float case.

This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test

[airlied: update with Tapani suggestion to clean it up].

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-11 16:28:29 +10:00
Jason Ekstrand
21cf55ab54 gen8/cmd_buffer: Don't push CS constants if there aren't any
Issuing MEDIA_CURB_LOAD with a size of zero causes GPU hangs on BDW.
2015-12-10 18:56:27 -08:00
Jason Ekstrand
3893e11f4b anv: Use 4 instead of sizeof(gl_constant_value)
We no longer have access to gl_constant_value and, really, it's 4 because
our uniform layout code works entirely in dwords.
2015-12-10 18:55:16 -08:00
Jason Ekstrand
13d1dd465c nir/spirv: Put SSBO store writemasks in the right index
It moved with the nir_intrinsic_load/store update.
2015-12-10 18:54:44 -08:00
Jason Ekstrand
d5c9955d3e Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in nir_intrinsic_load/store changes and the switch of all
uniforms in i965 to bytes.  This accounts for the Vulkan changes.
2015-12-10 18:29:36 -08:00
Roland Scheidegger
64c59b0624 draw: fix clipping with linear interpolated values and gl_ClipVertex
Discovered this when working on other clip code, apparently didn't work
correctly - the combination of linear interpolated values and using
gl_ClipVertex produced wrong values (failing all such combinations
in piglits glsl-1.30 interpolation tests, named
interpolation-noperspective-XXX-vertex).
Use the pre-clip-pos values when determining the interpolation factor to
fix this.
Noone really understands this code well, but everybody agrees this looks
sane... This fixes all those failing tests (10 in total) both with
the llvm and non-llvm draw paths, with no piglit regressions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-12-11 02:21:39 +01:00
Dave Airlie
5362e53a06 r600: add missing return value check.
Pointed out by coverity scan.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-11 09:37:20 +10:00
Jason Ekstrand
8beea9d45b anv/icd: Advertise the right ABI version 2015-12-10 12:27:13 -08:00
Jason Ekstrand
78b81be627 nir: Get rid of *_indirect variants of input/output load/store intrinsics
There is some special-casing needed in a competent back-end.  However, they
can do their special-casing easily enough based on whether or not the
offset is a constant.  In the mean time, having the *_indirect variants
adds special cases a number of places where they don't need to be and, in
general, only complicates things.  To complicate matters, NIR had no way to
convdert an indirect load/store to a direct one in the case that the
indirect was a constant so we would still not really get what the back-ends
wanted.  The best solution seems to be to get rid of the *_indirect
variants entirely.

This commit is a bunch of different changes squashed together:

 - nir: Get rid of *_indirect variants of input/output load/store intrinsics
 - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
 - nir/lower_io: Get rid of load/store_foo_indirect
 - i965/fs: Get rid of load/store_foo_indirect
 - i965/vec4: Get rid of load/store_foo_indirect
 - tgsi_to_nir: Get rid of load/store_foo_indirect
 - ir3/nir: Use the new unified io intrinsics
 - vc4: Do all uniform loads with byte offsets
 - vc4/nir: Use the new unified io intrinsics
 - vc4: Fix load_user_clip_plane crash
 - vc4: add missing src for store outputs
 - vc4: Fix state uniforms
 - nir/lower_clip: Update to the new load/store intrinsics
 - nir/lower_two_sided_color: Update to the new load intrinsic

NIR and i965 changes are

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

NIR indirect declarations and vc4 changes are

Reviewed-by: Eric Anholt <eric@anholt.net>

ir3 changes are

Reviewed-by: Rob Clark <robdclark@gmail.com>

NIR changes are

Acked-by: Rob Clark <robdclark@gmail.com>
2015-12-10 12:25:16 -08:00
Jason Ekstrand
f3970fad9e i965/fs_nir: Refactor store_output, load_input, and load_uniform
There was way too much incrementing of things going on.  Instead, let's
just start everything off at the right base location, and then increment in
the loop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-10 12:25:16 -08:00
Patrick Rudolph
79bff488bc gallium/util: return correct number of bound vertex buffers
In case a state tracker unbinds every slot by a seperate
pipe->set_vertex_buffers() call, starting from slot zero, the number
of bound buffers would not reach zero at all.
The current algorithm does not account for pre-existing holes in the
buffer list.

Unbinding all buffers at once or starting at the top-most slot results
in correct behaviour.

Calculating the correct number of bound buffers fixes a NULL pointer
dereference in nvc0_validate_vertex_buffers_shared().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-10 13:55:53 -05:00
Neil Roberts
ba67739b66 blit: Don't take into account the Mesa format when checking MSRT blit
According to the GLES3 spec, blitting between multisample FBOs with
different internal formats should not be allowed. The
compatible_resolve_formats function implements this check. Previously
it had a shortcut where if the Mesa formats of the two renderbuffers
were the same then it would assume the blit is ok. However some
drivers map different internal formats to the same Mesa format, for
example it might implement both GL_RGB and GL_RGBA textures with
MESA_FORMAT_R8G8B8A_UNORM. The function is used to generate a GL error
according to what the GL spec requires so the blit should not be
allowed in that case. This patch just removes the shortcut so that it
only ever looks at the internal format.

Note that I posted a related patch to disable this check altogether
for desktop GL. However this function is still used on GLES3 because
there are conformance tests that require this behaviour so this patch
is still useful.

Cc: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-10 11:03:58 +00:00
Neil Roberts
3f10774cba i965: Check base format to determine whether to use tiled memcpy
The tiled memcpy doesn't work for copying from RGBX to RGBA because it
doesn't override the alpha component to 1.0. Commit 2cebaac479 added
a check to disable it for RGBX formats by looking at the TexFormat.
However a lot of the rest of the code base is written with the
assumption that an RGBA texture can be used internally to implement a
GL_RGB texture. If that is done then this check breaks. This patch
makes it instead check the base format of the texture which I think
more directly matches the intention.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-12-10 11:03:49 +00:00
Neil Roberts
9a31d9870b i965/gen8: Allow rendering to B8G8R8X8
Since Gen8 this is allowed as a rendering target so we don't need to
override it to B8G8R8A8. This is helpful on Gen9+ where using this
override causes fast clears not to work.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-12-10 11:03:49 +00:00
Neil Roberts
d151338594 i965/gen9: Allow fast clear for MSRT formats matching render
Previously fast clear was disallowed on Gen9 for MSRTs with the claim
that some formats don't work but we didn't understand why. On further
investigation it seems the formats that don't work are the ones where
the render surface format is being overriden to a different format
than the one used for texturing. The one used for texturing is not
actually a renderable format. It arguably makes sense that the sampler
hardware doesn't handle the fast color correctly in these cases
because it shouldn't be possible to end up with a fast cleared surface
that is non-renderable.

This patch changes the limitation to prevent fast clear for surfaces
where the format for rendering is overriden.

Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-12-10 11:03:49 +00:00
Neil Roberts
e1a16b901b i965/gen9/fast-clear: Handle linear→SRGB conversion
If GL_FRAMEBUFFER_SRGB is enabled when writing to an SRGB-capable
framebuffer then the color will be converted from linear to SRGB
before being written. There is no chance for the hardware to do this
itself because it can't modify the clear color that is programmed in
the surface state so it seems pretty clear that the driver should be
handling this itself.

Note that this wasn't a problem before Gen9 because previously we were
only able to do fast clears to 0 or 1 and those values are the same in
linear and SRGB space.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-10 11:03:49 +00:00
Jordan Justen
83e8e07a2b docs: Add ARB_compute_shader to 11.2.0 release notes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
1c0d059c02 docs: Mark ARB_compute_shader as done for i965
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
d04612b60d i965: Enable ARB_compute_shader extension on supported hardware
Enable ARB_compute_shader on gen7+, on hardware that supports the
OpenGL 4.3 requirements of a local group size of 1024.

With SIMD16 support, this is limited to Ivy Bridge and Haswell.

Broadwell will work with a local group size up to 896 on SIMD16
meaning programs that use this size or lower should run when setting
MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
e288b4a133 i965/nir: Implement shared variable atomic operations
v3:
 * Update based on latest SSBO code (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
d584b2313e nir: Add nir intrinsics for shared variable atomic operations
v3:
 * Update min/max based on latest SSBO code (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
fc21a7c26e glsl: Disable several optimizations on shared variables
Shared variables can be accessed by other threads within the same
local workgroup. This prevents us from performing certain
optimizations with shared variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
f821a3ec4f glsl: Buffer atomics are supported for compute shaders
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
7333593cf3 glsl: Translate atomic intrinsic functions on shared variables
When an intrinsic atomic operation is used on a shared variable, we
translate it to a new 'shared variable' specific intrinsic function
call.

For example, a call to __intrinsic_atomic_add when used on a shared
variable will be translated to a call to
__intrinsic_atomic_add_shared.

v3:
 * Fix stale comments copied from SSBOs (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
614ad9b40b glsl: Check for SSBO variable in check_for_ssbo_store
The compiler probably already blocks this earlier on, but we should be
checking for an SSBO here.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
c2e6cfbd78 glsl: Check for SSBO variable in SSBO atomic lowering
When an atomic function is called, we need to check to see if it is
for an SSBO variable before lowering it to the SSBO specific intrinsic
function.

v2:
 * is_in_buffer_block => is_in_shader_storage_block (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
a108e14d1c glsl: Replace atomic_ssbo and ssbo_atomic with atomic
The atomic functions can also be used with shared variables in compute
shaders.

When lowering the intrinsic in lower_ubo_reference, we still create an
SSBO specific intrinsic since SSBO accesses can be indirectly
addressed, whereas all compute shader shared variable live in a single
shared variable area.

v2:
 * Also remove the _internal suffix from ssbo atomic intrinsic names (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
23da6aeb17 glsl: Allow atomic functions to be used with shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
d3625d4071 i965: Lower shared variable references to intrinsic calls
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
b1fe3af0da i965: Enable shared local memory for CS shared variables
v3:
 * Check shared variable size at link time

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
faddb301ff i965/fs: Handle nir shared variable store intrinsic
v4:
 * Apply similar optimization for shared variable stores as
   0cb7d7b4b7. This was causing a
   OpenGLES 3.1 CTS failure, but
   867c436ca8 fixes that.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
8613206bd3 i965/fs: Handle nir shared variable load intrinsic
v3:
 * Remove extra #includes (Iago)
 * Use recently added GEN7_BTI_SLM instead of BRW_SLM_SURFACE_INDEX (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
e128a62318 i965: Disable vector splitting on shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
aa12a92626 nir: Translate glsl shared var store intrinsic to nir intrinsic
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
03b0439938 nir: Translate glsl shared var load intrinsic to nir intrinsic
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
1078d712d7 glsl: Add lowering pass for shared variable references
In this lowering pass, shared variables are decomposed into intrinsic
calls.

v2:
 * Send mem_ctx as a parameter (Iago)

v3:
 * Shared variables don't have an associated interface block (Iago)
 * Always use 430 packing (Iago)
 * Comment / whitespace cleanup (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Iago Toral Quiroga
f22ab2e8b3 glsl: Don't assert on shared variable matrices with 'inherited' layout
We use column-major for shared variable matrices.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
66eaef7737 glsl: Don't lower_variable_index_to_cond_assign for shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
c43a7e605e glsl: Remove mem_ctx as member variable in lower_ubo_reference_visitor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
ee005df2f9 glsl ubo/ssbo: Move common code into lower_buffer_access::setup_buffer_access
This code will also be usable by the pass to lower shared variables.

Note, that *const_offset is adjusted by setup_buffer_access so it must
be initialized before calling setup_buffer_access.

v2:
 * Add comment for lower_buffer_access::setup_buffer_access

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
99c8196458 glsl ubo/ssbo: Move is_dereferenced_thing_row_major into lower_buffer_access
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
afa4129cf6 glsl ubo/ssbo: Add lower_buffer_access class
This class has code that will be shared by lower_ubo_reference and
lower_shared_reference. (lower_shared_reference will be used to
support compute shader shared variables.)

v2:
 * Add lower_buffer_access.h to makefile (Emil)
 * Remove static is_dereferenced_thing_row_major from
   lower_buffer_access.cpp. This will become a lower_buffer_access
   method in the next commit.
 * Pass mem_ctx as parameter rather than using a member variable (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
ad3c65e792 glsl ubo/ssbo: Split buffer access to insert_buffer_access
This allows the code in emit_access to be generic enough to also be
for lowering shared variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Jordan Justen
05667ecc52 glsl ubo/ssbo: Use enum to track current buffer access type
v2:
 * Rename ssbo_get_array_length to ssbo_unsized_array_length_access (Iago)
 * Use always use this-> when referencing buffer_access_type (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 23:50:38 -08:00
Tapani Pälli
8cc372b6d9 glsl: do not loose always_active_io when packing varyings
Otherwise packed and inactive varyings get optimized away. This needs
to be prevented when using separate shader objects where interface
needs to be preserved.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-10 07:51:55 +02:00
Tapani Pälli
2377db2c4e mesa: invalidate pipeline status after glUseProgramStages
This will cause validation to run during next draw, this is done
because possible changes in used stages and programs can cause
invalid pipeline state.

This fixes a subtest in following CTS test:
	ES31-CTS.sepshaderobjs.StateInteraction

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-10 07:51:40 +02:00
Dave Airlie
21abaad8fe mesa/varray: set double arrays to non-normalised.
Doesn't have any effect in practice I don't think, but
CTS reads back using GetVertexAttrib.

This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-10 13:51:44 +10:00
Michel Dänzer
b4a03e7f8f clover: Fix build against LLVM 3.8 SVN >= r255078
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-10 10:45:29 +09:00
Chad Versace
5ba9121fe8 anv/image: Remove some vkCreateImage validation
Don't validate the baseArrayLayer and layerCount of cube images.  This
allows us to remove a bloated lookup table and an unneeded struct
definition (anv_image_view_info).
2015-12-09 16:33:23 -08:00
Chad Versace
9a9c551f3e anv/image: Drop unused halign, valign lookup tables 2015-12-09 15:36:39 -08:00
Brian Paul
e1815bcc47 mesa: fix ID usage for buffer warnings
We need a different ID pointer for each call site.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-09 16:06:35 -07:00
Brian Paul
de5bb7fe78 docs: remove stray <ul> tag from 11.0.5.html file to fix indentation 2015-12-09 15:55:11 -07:00
Serge Martin
2b930327e8 freedreno: little clean up in fd_create_surface
in order to avoid returing invalid adress if CALLOC_STRUCT return NULL.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-09 17:32:41 -05:00
Serge Martin
0149e7a944 freedreno: change to goto fail
in fd_resource_transfer_map, like the others error cases

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-09 17:31:16 -05:00
Serge Martin
e63fec29a1 freedreno: fix bind_sampler_states when hwcso is NULL
src/gallium/tests/trivial/compute.c expects samplers to be cleaned
when the samplers list is NULL.
Like in radeon, the function behave like when the number of samplers
parameter is set to 0.

[small s/hwsco/hwcso/ typo fix]
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-09 17:30:58 -05:00
Edward O'Callaghan
f32f80e19d gallium/util: Make u_prims_for_vertices() safe
Let us avoid trapping in hardware from a SIGFPE and instead
assert on a zero divisor.

Hint: This can occur if a PIPE_PRIM_? is not handled in
      u_prim_vertex_count() that results in ' info ' not
      being initialized in the expected manner.

Further, we also fix a possibly NULL pointer dereference
from ' info ' being NULL from a u_prim_vertex_count() call.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-09 22:51:56 +01:00
Andreas Boll
63fe600c7a docs: add news item for mesa-demos 8.3.0 release
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
2015-12-09 22:44:52 +01:00
Jason Ekstrand
46bcf9d777 vulkan: Pull in the 0.210.1 vk_platform header
Somehow this got missed in the API update.
2015-12-09 11:55:38 -08:00
Jordan Justen
47e5fb52f4 gen8/compute: Setup push constants and local ids
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-09 11:04:30 -08:00
Jordan Justen
f8d5fb4293 anv: Add anv_cmd_buffer_cs_push_constants
Similar to anv_cmd_buffer_push_constants, but handles the compute
pipeline, which requires different setup from the other stages.

This also handles initializing the compute shader local IDs.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-09 11:02:20 -08:00
Patrick Rudolph
432a798cf5 nv50,nvc0: fix use-after-free when vertex buffers are unbound
Always reset the vertex bufctx to make sure there's no pointer to
an already freed pipe_resource left after unbinding buffers.
Fixes use after free crash in nvc0_bufctx_fence().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
[imirkin: simplify nvc0 fix, apply to nv50]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-09 13:38:15 -05:00
Andreas Boll
f876346cdd mesa: Fix a typo in a comment
s/suports/supports/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:29:24 +01:00
Andreas Boll
0560e835f3 glx: Fix a typo in a comment
s/suports/supports/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:29:21 +01:00
Andreas Boll
9246df2280 st/osmesa: Fix a typo in a comment
s/suport/support/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:29:18 +01:00
Andreas Boll
7af9930ab4 meta: Fix a typo in a print message
s/Unkown/Unknown/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:29:15 +01:00
Andreas Boll
c83e161c91 mesa: Fix typos in print messages
s/inconsistant/inconsistent/
s/occurences/occurrences/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:29:11 +01:00
Andreas Boll
5c27cb3da3 glsl: Fix a typo in a comment
s/suports/supports/

Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-09 18:26:47 +01:00
Brian Paul
aa9af32752 svga: initialize pipe_driver_query_info entries with a macro
To be safe, set all the fields in case the enums ordering/values
ever change.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-12-09 09:43:47 -07:00
Brian Paul
ab0651ccfd mesa: detect inefficient buffer use and report through debug output
When a buffer is created with GL_STATIC_DRAW, its contents should not
be changed frequently.  But that's exactly what one application I'm
debugging does.  This patch adds code to try to detect inefficient
buffer use in a couple places.  The GL_ARB_debug_output mechanism is
used to report the issue.

NVIDIA's driver detects these sort of things too.

Other types of inefficient buffer use could also be detected in the
future.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-12-09 09:43:47 -07:00
Emil Velikov
7d3df58125 docs: add news item and link release notes for 11.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-09 16:12:32 +00:00
Emil Velikov
61b91d0811 docs: add sha256 checksums for 11.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit f9715bc449)
2015-12-09 16:11:12 +00:00
Emil Velikov
d432be32e2 docs: add release notes for 11.0.7
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bec983b738)
2015-12-09 16:11:11 +00:00
Francisco Jerez
595c818071 i965: Resolve color and flush for all active shader images in intel_update_state().
Fixes arb_shader_image_load_store/execution/load-from-cleared-image.shader_test.

Couldn't reproduce any significant FPS regression in CPU-bound
benchmarks from the Finnish benchmarking system on neither VLV nor BSW
after 30 runs with 95% confidence level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92849
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-12-09 15:12:59 +02:00
Francisco Jerez
3dc97a1586 i965: Document inconsistent units the URB size is represented in.
Every other gen the representation of the URB size was changed and
previous ones weren't updated.  I'd be willing to write a series
normalizing this to be KB on all generations if anybody else cares.
2015-12-09 14:00:30 +02:00
Francisco Jerez
228d5a3f75 i965: Hook up L3 partitioning state atom.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:59:03 +02:00
Francisco Jerez
1fc797e8e4 i965: Work around L3 state leaks during context switches.
This is going to require some rather intrusive kernel changes to fix
properly, in the meantime (and forever on at least pre-v4.1 kernels)
we'll have to restore the hardware defaults at the end of every batch
in which the L3 configuration was changed to avoid interfering with
the DDX and GL clients that use an older non-L3-aware version of Mesa.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>

v2: Optimize look-up of the default configuration by assuming it's the
    first entry of the L3 config array in order to avoid an FPS
    regression in GpuTest Triangle and SynMark OglBatch2-7 on most
    affected platforms.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-09 13:57:40 +02:00
Francisco Jerez
09d9638dd0 i965: Add debug flag to print out the new L3 state during transitions.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
acc77947ca i965: Implement L3 state atom.
The L3 state atom calculates the target L3 partition weights when the
program bound to some shader stage is modified, and in case they are
far enough from the current partitioning it makes sure that the L3
state is re-emitted.

v2: Fix for inconsistent units the context URB size is expressed in.
    Clamp URB size to 1008 KB on SKL due to FF hardware limitation.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
95ad0bd33b i965: Calculate appropriate L3 partition weights for the current pipeline state.
This calculates a rather conservative partitioning of the L3 cache
based on the shaders currently bound to the pipeline and whether they
use SLM, atomics, images or scratch space.  The result is intended to
be fine-tuned later on based on other pipeline state.

Note that the L3 partitioning calculated for VLV in the non-SLM non-DC
case differs from the hardware defaults in that it doesn't include a
DC partition and has twice as much RO cache space -- This is an
intentional functional change that improves performance in several
bandwidth-bound benchmarks on VLV (5% significance): SynMark
OglTexFilterAniso by 14.18%, SynMark OglTexFilterTri by 7.15%, Unigine
Heaven by 4.91%, SynMark OglShMapPcf by 2.15%, GpuTest Fur by 1.83%,
SynMark OglDrvRes by 1.80%, SynMark OglVsTangent by 1.71%, and a few
other benchmarks from the Finnish system by less than 1%.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
fa1300f75e i965: Implement selection of the closest L3 configuration based on a vector of weights.
The input of the L3 set-up code is a vector giving the approximate
desired relative size of each partition.  This implements logic to
compare the input vector against the table of validated configurations
for the device and pick the closest compatible one.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
353abb294b i965: Define and use REG_MASK macro to make masked MMIO writes slightly more readable.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
fa043698d2 i965/hsw: Enable L3 atomics.
Improves performance of the arb_shader_image_load_store-atomicity
piglit test by over 25x (which isn't a real benchmark it's just heavy
on atomics -- the improvement in a microbenchmark I wrote a while ago
seemed to be even greater).  The drawback is one needs to be
extra-careful not to hang the GPU (in fact the whole system).  A DC
partition must have been allocated on L3, the "convert L3 cycle for DC
to UC" bit may not be set, the MOCS L3 cacheability bit must be set
for all surfaces accessed using DC atomics, and the SCRATCH1 and
ROW_CHICKEN3 bits must be kept in sync.

A fairly recent kernel is required for the command parser to allow
writes to these registers.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
6907175a4f i965: Implement programming of the L3 configuration.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
b22bebe966 i965: Import tables enumerating the set of validated L3 configurations.
It should be possible to use additional L3 configurations other than
the ones listed in the tables of validated allocations ("BSpec »
3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
» L3 Allocation and Programming"), but it seems sensible for now to
hard-code the tables in order to stick to the hardware docs.  Instead
of setting up the arbitrary L3 partitioning given as input, the
closest validated L3 configuration will be looked up in these tables
and used to program the hardware.

The included tables should work for Gen7-9.  Note that the quantities
are specified in ways rather than in KB, this is because the L3
control registers expect the value in ways, and because by doing that
we can re-use a single table for all GT variants of the same
generation (and in the case of IVB/HSW and CHV/SKL across different
generations) which generally have different L3 way sizes but allow the
same combinations of way allocations.

v2: Use slice count from the devinfo structure instead of the gt
    number to implement get_l3_way_size().

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
a403ad4f5a i965: Add slice count to the brw_device_info structure.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
c8ff045fdb i965/gen8: Don't add workaround bits to PIPE_CONTROL stalls if DC flush is set.
According to the hardware docs a DC flush is sufficient to make
CS_STALL happy, there's no need to add STALL_AT_SCOREBOARD whenever
it's present.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:05 +02:00
Francisco Jerez
2405b75bc9 i965: Define state flag to signal that the URB size has been altered.
This will make sure that we recalculate the URB layout anytime the URB
size is modified by the L3 partitioning code.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:04 +02:00
Francisco Jerez
4841cab01a i965: Keep track of whether LRI is allowed in the context struct.
This stores the result of can_do_pipelined_register_writes() in the
context struct so we can find out later whether LRI can be used to
program the L3 configuration.

v2:
 * Split change of gen check in can_do_pipelined_register_writes (jljusten)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:04 +02:00
Francisco Jerez
50c2713726 i965: Adjust gen check in can_do_pipelined_register_writes
Allow for pipelined register writes for gen < 7.

v2:
 * Split from another patch and adjust comment (jljusten)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:04 +02:00
Francisco Jerez
5912da45a6 i965: Define symbolic constants for some useful L3 cache control registers.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-12-09 13:46:04 +02:00
Dave Airlie
e307cfa7d9 radeonsi: handle loading doubles as geometry shader inputs.
This adds the double code to the geometry shader input handling.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 17:04:04 +10:00
Dave Airlie
8c9e40ac22 radeonsi: handle doubles in lds load path.
This handles loading doubles from LDS properly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.fedoraproject.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 17:03:38 +10:00
Dave Airlie
cce3864046 r600: handle geometry dynamic input array index
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.

We need to load the AR register with the value from the index reg

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 15:07:53 +10:00
Dave Airlie
38542921c7 r600g: fix geom shader input indirect indexing.
This fixes:
gs-input-array-vec4-index-rd

The others run out of gprs unfortunately.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 15:07:47 +10:00
Dave Airlie
e97ac006d7 r600g: fix outputing to non-0 buffers for stream 0.
This fixes:
arb_transform_feedback3-ext_interleaved_two_bufs_gs
arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
transform-feedback-builtins

If we are only emitting one ring, then emit all output
buffers on it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 15:07:01 +10:00
Edward O'Callaghan
1f61447ce1 r600: Add ARB_copy_image support
[airlied: update relnotes]

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 14:41:46 +10:00
Edward O'Callaghan
d13ac27200 r600g: allow copying between compatible un/compressed formats
See: `commit e82c527f1fc2f8ddc64954ecd06b0de3cea92e93`

which is where a block in src maps to a pixel in dst and vice versa.
    e.g. DXT1 <-> R32G32_UINT
         DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 14:40:32 +10:00
Ilia Mirkin
f920f8eb02 nv50/ir: fix cutoff for using r63 vs r127 when replacing zero
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:

total bytes used in shared programs   : 44154976 -> 44139880 (-0.03%)

Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-08 23:15:29 -05:00
Ilia Mirkin
44260d9080 nv50/ir: prefer to color mad def and src2 with the same color
This allows us to use the short encoding, and potentially fold
immediates in later on.

total instructions in shared programs : 6379731 -> 6367861 (-0.19%)
total gprs used in shared programs    : 728502 -> 728683 (0.02%)
total local used in shared programs   : 9904 -> 9904 (0.00%)
total bytes used in shared programs   : 44661008 -> 44154976 (-1.13%)

                local        gpr       inst      bytes
    helped           0          51        7267       20306
      hurt           0         232         125         274

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 23:15:29 -05:00
Ilia Mirkin
c1c1248b94 nv50/ir: reduce degree limit on ops that can't encode large reg dests
Operations that take immediates can only encode registers up to 64. This
fixes a shader in a "Powered by Unity" intro.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 23:15:29 -05:00
Ilia Mirkin
99581ca393 nv50/ir: only unspill once ahead of a group of instructions
We already semi-did this but the list of uses as unsorted, so it was
unreliable. Sort the uses by bb and serial, and don't unspill for each
instruction in a sequence. (And also don't unspill multiple times for a
single instruction that uses the value in question multiple times.)

This causes a minor reduction in generated instructions for shader-db
(as few programs spill) but more importantly it brings determinism to
each run's output.

On SM10:

total instructions in shared programs : 6387945 -> 6379359 (-0.13%)
total gprs used in shared programs    : 728544 -> 728544 (0.00%)
total local used in shared programs   : 9904 -> 9904 (0.00%)

                local        gpr       inst      bytes
    helped           0           0         322         322
      hurt           0           0           0           0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 23:15:29 -05:00
Ilia Mirkin
0f647bd65b nv50/ir: check if the target supports the new offset before inlining
Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93300
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 23:15:29 -05:00
Dave Airlie
a13b14930d llvmpipe: fix fp64 inputs to geom shader.
This fixes the fetching of fp64 inputs to the geometry shader,

this fixes the recently posted piglit's
arb_gpu_shader_fp64/execution/gs-fs-vs-double-array.shader_test
arb_vertex_attrib_64bit/execution/gs-fs-vs-attrib-double-array.shader_test

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-09 13:56:39 +10:00
Jordan Justen
974bdfa9ad i965: Move brw_cs_fill_local_id_payload to brw_compiler.h
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-08 18:09:31 -08:00
Jordan Justen
d28df86c87 anv/compute: Fix thread width max off by 1
See cooresponding code in:

commit 8d87070af2
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Thu Aug 28 14:47:19 2014 -0700

    i965/cs: Implement brw_emit_gpgpu_walker

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-08 18:09:31 -08:00
Matt Turner
3a7f95b3aa nir: Optimize useless comparisons against true/false.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]

v2: Move new rule to Boolean simplification section
    Add a a@bool != true simplification

Suggested-by: Neil Roberts <neil@linux.intel.com>
2015-12-08 15:41:08 -08:00
Matt Turner
9e9e6fc8f1 glsl: Switch opcode and avail parameters to binop().
To make it match unop().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-08 15:39:47 -08:00
Matt Turner
dd3c16c94b glsl_to_tgsi: Skip useless comparison instructions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 15:38:03 -08:00
Matt Turner
eca846e7ae glsl: Relax qualifier ordering restriction in ES 3.1.
... and allow the "binding" qualifier in ES 3.1 as well.

GLSL ES 3.1 incorporates only a few features from the extension
ARB_shading_language_420pack: the relaxed qualifier ordering
requirements and the binding qualifier.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-08 15:36:57 -08:00
Matt Turner
79da7220db glsl: Use has_420pack().
These features would not have been enabled with #version 420 otherwise.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-08 15:36:57 -08:00
Matt Turner
c200e606f7 glsl: Allow binding of image variables with 420pack.
This interaction was missed in the addition of ARB_image_load_store.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93266
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-08 15:36:57 -08:00
Jose Fonseca
a9a0c693e5 appveyor: Cache winflexbison archive.
Unforunately the Appveyor -> SourceForge connection seems a bit
unreliable, causing frequent build failures while downloading
winflexbison (approx once every 2 days).

Fetching winflexbison archive into Appveyor's cache should eliminate
these.

Fetching Python modules from PyPI doesn't seem to be a problem, so they
are left alone for now, though they could eventually get the same
treatment.
2015-12-08 22:49:38 +00:00
Chad Versace
db66424218 anv: Remove unused anv_image_view_info_for_vk_image_view_type() 2015-12-08 14:25:28 -08:00
Eric Anholt
f61ceeb3fd vc4: Enable MSAA.
We still have several failures in the newly enabled tests in simulation:
sRGB downsampling is done as if it was just linear, stencil blits are not
supported on MSAA either, and derivatives are still not supported
(breaking some MSAA simulation shaders).  So, other than sRGB downsampling
quality, things seem to be in good shape.
2015-12-08 10:09:52 -08:00
Eric Anholt
fc4a1bfb88 vc4: Add support for mapping of MSAA resources.
The pipe_transfer_map API requires that we do an implicit
downsample/upsample and return a mapping of that.
2015-12-08 09:49:56 -08:00
Eric Anholt
6b4dfd53ae vc4: Add support for texel fetches from MSAA resources.
This is the core of ARB_texture_multisample.  Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves.  We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.
2015-12-08 09:49:55 -08:00
Eric Anholt
a97b40dca4 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.
2015-12-08 09:49:54 -08:00
Eric Anholt
edc3305de7 vc4: Add a workaround for HW-2905, and additional failure I saw with MSAA.
I only stumbled on this while experimenting due to reading about HW-2905.
I don't know if the EZ disable in the Z-clear is actually necessary, but
go with it for now.
2015-12-08 09:49:54 -08:00
Eric Anholt
edfd4d853a vc4: Add support for drawing in MSAA. 2015-12-08 09:49:53 -08:00
Eric Anholt
e7c8ad0a6c vc4: Add kernel RCL support for MSAA rendering. 2015-12-08 09:49:53 -08:00
Eric Anholt
568d3a8e32 vc4: Rename color_ms_write to color_write.
I was thinking this was the only MSAA resolve thing, so it should be noted
separately, but actually load/store general also do MSAA resolve.
2015-12-08 09:49:52 -08:00
Eric Anholt
bf92017ace vc4: Allow RCL blits to the edge of the surface.
The recent unaligned fix successfully prevented RCL blits that weren't
aligned inside of the surface, but we also want to be able to do RCL blits
for the whole surface when the width or height of the surface aren't
aligned (we don't care what renders inside of the padding).
2015-12-08 09:49:52 -08:00
Eric Anholt
fb4877dbab vc4: Add disabled debug printf for describing blits.
I keep typing variants of this while debugging RCL blits for MSAA.
2015-12-08 09:49:51 -08:00
Eric Anholt
2792d118f1 vc4: Fix check for tile RCL blits with mismatched y.
This was a typo in 3a508a0d94 that didn't
show up in testcases at that moment.
2015-12-08 09:49:51 -08:00
Eric Anholt
1529f138ff vc4: Fix compiler warning from size_t change.
I missed this when bringing over the kernel changes.
2015-12-08 09:49:50 -08:00
Olivier Pena
a5256012ef scons: support for LLVM 3.7.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-08 13:53:31 +00:00
Dave Airlie
bd47fcd57b docs/GL3.txt: consolidate r600 GL4.1.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-08 20:13:14 +10:00
Jason Ekstrand
18069dce4a i965: Make uniform offsets be in terms of bytes
This commit pushes makes uniform offsets be terms of bytes starting with
nir_lower_io.  They get converted to be in terms of vec4s or floats when we
cram them in the UNIFORM register file but reladdr remains in terms of
bytes all the way down to the point where we lower it to a pull constant
load.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
813f0eda8e i965/nir_uniforms: Replace comps_per_unit with an is_scalar boolean
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
22c273de2b i965/nir: Remove unused indirect handling
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it.  This commit gets rid of the dead indirect handling code.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
abb569ca18 i965/state: Get rid of dword_pitch arguments to buffer functions
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
05bdc21f84 i965/vec4: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92909
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
13ad8d03f2 i965/fs: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
e3e70698c3 i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge.  On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s.  We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:51:23 -08:00
Jason Ekstrand
f4aee5d82f gen8/cmd_buffer: Flush push constants after descriptor sets
This is because, if storage images are used, flushing descriptor sets can
cause push constants to become dirty.
2015-12-07 21:45:43 -08:00
Jason Ekstrand
43ac954e25 anv: Add initial support for pushing image params
The helper to fill out the image params data-structure is stilly a dummy,
but this puts the infastructure in place.
2015-12-07 21:08:26 -08:00
Jason Ekstrand
1eb731d9fe anv/descriptor_set: Add support for storage images in layouts 2015-12-07 21:08:26 -08:00
Jason Ekstrand
ff05f634f6 anv/image: Add a separate storage image surface state
Thanks to hardware limitations, storage images may need a different surface
format and/or other bits in the surface state.
2015-12-07 21:08:22 -08:00
Jason Ekstrand
8f83222d37 isl: Add initial support for storage images 2015-12-07 21:08:08 -08:00
Ben Widawsky
6ef8149bcd i965: Fix texture views of 2d array surfaces
It is legal to have a texture view of a single layer from a 2D array texture;
you can sample from it, or render to it. Intel hardware needs to be made aware
when it is using a 2d array surface in the surface state. The texture view is
just a 2d surface with the backing miptree actually being a 2d array surface.
This caused the previous code would not set the right bit in the surface state
since it wasn't considered an array texture.

I spotted this early on in debug but brushed it off because it is clearly not
needed on other platforms (since they all pass). I have no idea how this works
properly on other platforms (I think gen7 introduced the bit in the state, but I
am too lazy to check). As such, I have opted not to modify gen7, though I
believe the current code is wrong there as well.

Thanks to Chris for helping me debug this.

v2: Just use the underlying mt's target type to make the array determination.
This replaces a bug in the first patch which was incorrectly relying only
on non-zero depth (not sure how that had no failures). (Ilia)

Cc: Chris Forbes <chrisf@ijw.co.nz>
Reported-by: Mark Janes <mark.a.janes@intel.com> (Jenkins)
References: https://www.opengl.org/registry/specs/ARB/texture_view.txt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92609
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-07 18:47:04 -08:00
Nicolai Hähnle
d5a5dbd71f radeonsi: last_gfx_fence is a winsys fence
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-12-07 21:15:59 -05:00
Jason Ekstrand
42b4417031 HACK/i965: Disable assign_var_locations on uniforms
This conflicts with the way we're doing uniforms in Vulkan.
2015-12-07 17:19:55 -08:00
Jason Ekstrand
cd75ff5d17 anv/pipeline: Only apply a pipeline layout if we have one 2015-12-07 16:56:02 -08:00
Ilia Mirkin
f97f755192 nvc0/ir: fix up mul+add -> mad algebraic opt, enable for integers
For some reason this has been disabled for integers ever since codegen
was merged, despite there being emission code for IMAD. Seems to work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-07 18:49:28 -05:00
Ilia Mirkin
1d708aacb7 gk110/ir: fix imad sat/hi flag emission for immediate args
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-07 18:49:28 -05:00
Kenneth Graunke
87a1166310 i965: Add brw_device_info::min_ds_entries field.
From the 3DSTATE_URB_DS documentation:

"Project: IVB, HSW
 If Domain Shader Thread Dispatch is Enabled then the minimum number of
 handles that must be allocated is 10 URB entries."

"Project: BDW+
 If Domain Shader Thread Dispatch is Enabled then the minimum number of
 handles that must be allocated is 34 URB entries."

When the HS is run in SINGLE_PATCH mode (the only mode we support
today), there is no minimum for HS - it's just zero.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:55 -08:00
Chris Forbes
42ca675cc9 i965: Add state bits for tess stages
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:55 -08:00
Chris Forbes
80ea18d1a1 i965: Add backend structures for tess stages
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:55 -08:00
Chris Forbes
5340f37902 i965: Set core tessellation-related limits
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:54 -08:00
Kenneth Graunke
a9e6a56a02 i965: Request lowering of gl_TessLevel* from float[] to vec4s.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:54 -08:00
Kenneth Graunke
7a17356800 i965: Create new files for HS/DS/TE state upload code.
For now, this just splits the existing code to disable these stages into
separate atoms/files.  We can then replace it with real code.

v2: Bump the render atoms in this patch so it compiles (in my branch,
    I'd bumped it in an earlier patch).  61 seems to be the minimum
    that works, which doesn't match the old value + the number of atoms
    I added in this patch, so apparently we had some slop before.

v3: Actually disable the DS unit on Gen8+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-07 14:48:54 -08:00
Ilia Mirkin
63b850403c gk104/ir: sampler doesn't matter for txf
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-07 16:22:54 -05:00
Marek Olšák
32f05fadbb radeonsi: disable DCC on Stoney
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-07 22:01:08 +01:00
Sonny Jiang
2618886600 winsys/amdgpu: addrlib - port a Fiji bug fix
Fiji: Fixed tiled resource failures

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

v2: fix a compile failure (typo) - Marek
2015-12-07 21:58:42 +01:00
Sonny Jiang
338d7bf053 winsys/amdgpu: addrlib - port Checks mip 0 for czDispCompatible
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-07 21:58:42 +01:00
Sonny Jiang
676bc25140 winsys/amdgpu: addrlib - port fix error for workaround for 1D tiling
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-07 21:58:42 +01:00
Christian König
a2c5200a4b st/va: disable MPEG4 by default v2
The workarounds are too hacky to enable them by default
and otherwise MPEG4 doesn't work reliably.

v2: add docs/envvars.html, CC stable and fix typos

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Cc: "11.1.0" <mesa-stable@lists.freedesktop.org>
2015-12-07 20:34:17 +01:00
Christian König
ca3e2b76c0 st/va: move HEVC functions into separate file v2
v2: actually copy all of it

Signed-off-by: Christian König <christian.koenig@amd.com>
2015-12-07 20:34:17 +01:00
Alejandro Piñeiro
3d260cc653 mesa: remove _mesa_tex_target_is_array
_mesa_is_array_texture provides the same functionality and:

1. it returns bool instead of GLboolean
2. it's not related to the texture format (texformat.c)
3. the name's a little shorter

v2: remove _mesa_tex_target_is_array instead (Brian Paul)

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-07 20:31:20 +01:00
Alejandro Piñeiro
b16e0ff34e i965: use _mesa_is_array_texture instead of _mesa_tex_target_is_array
Both methods provide the same functionality, so one would be
removed.

v2: use _mesa_is_array_texture and not the other way (Brian Paul)

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-07 20:30:24 +01:00
Ilia Mirkin
db072d2086 gk110/ir: fix imul hi emission with limm arg
The elemental demo hits this case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-07 13:30:17 -05:00
Chad Versace
9098e0f074 anv/image: Refactor anv_image_make_surface()
Reduce the number of function parameters. Deduce the
anv_image::*_surface from the parameters instead of requiring the caller
to do that.
2015-12-07 09:28:14 -08:00
Chad Versace
3d85a28e90 anv: Assert the succes of isl_surf_init() 2015-12-07 08:54:59 -08:00
Chad Versace
64e8af69b1 anv: Use isl_tiling_flags in anv_image_create_info
Replace
    anv_image_create_info::force_tiling
    anv_image_create_info::tiling
with the bitmask
    anv_image_create_info::isl_tiling_flags

This allows us to drop the function
anv_image.c:choose_isl_tiling_flags().
2015-12-07 08:50:28 -08:00
Chad Versace
c97d8af9aa anv: Fix anv_gem_set_tiling to respect tiling param
Function anv_gem_set_tiling() ignored its 'tiling' parameter. It
unconditionally set the bo's tiling to I915_TILING_X.
2015-12-07 08:42:11 -08:00
Chad Versace
01e2932d6a anv: Remove unused anv_format_s8_uint
This is no longer needed after migrating to isl.
2015-12-07 08:40:14 -08:00
Brian Paul
32a6e081c3 svga: use the debug callback to report issues to the state tracker
Use the new debug callback hook to report conformance, performance
and fallbacks to the state tracker.  The state tracker, in turn can
report this issues to the user via the GL_ARB_debug_output extension.

More issues can be reported in the future; this is just a start.

v2: remove conditionals around pipe_debug_message() calls since the
check is now done in the macro itself.
v3: remove unneeded dummy %s substitutions

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>,
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-12-07 08:57:49 -07:00
Brian Paul
5effc3ae74 gallium/util: check callback pointers for non-null in pipe_debug_message()
So the callers don't have to do it.

v2: also check cb!=NULL in the macro

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-12-07 08:56:51 -07:00
Abdiel Janulgue
b19546abf3 i965: Add defines for gather push constants
v2 (Francisco Jerez):
   - Rename HSW_GATHER_CONSTANTS_RESERVED to HSW_GATHER_POOL_ALLOC_MUST_BE_ONE.
   - Rename BRW_GATHER_* prefix to HSW_GATHER_CONSTANT_*.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-12-07 14:58:12 +02:00
Timothy Arceri
9214664aed mesa: move GLES checks for SSO input/output validation
This function is unfinished there is a bunch more validation rules
that need to be applied here. We will still want to call it for desktop
GL we just don't want to validate precision so move the ES check to
reflect this.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-07 21:41:14 +11:00
Timothy Arceri
ad02621854 mesa: move GL_INVALID_OPERATION error to rendering call
The validation api doesn't trigger this error so just move it to the
code called during rendering.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:41:09 +11:00
Timothy Arceri
4dd096d741 mesa: move pipeline input/output validation inside _mesa_validate_program_pipeline()
This allows validation to be done on rendering calls also.

Fixes 3 dEQP-GLES31.functional.separate tests.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
2015-12-07 21:41:05 +11:00
Timothy Arceri
da1a01361b glsl: re-validate program pipeline after sampler change
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
https://bugs.freedesktop.org/show_bug.cgi?id=93180
2015-12-07 21:41:00 +11:00
Dave Airlie
41e82f4f96 r600: apply SIMD workaround to cayman also.
At last on ARUBA this is required to stop tessellation hanging
in heaven.

This removes one of the SIMDs from use by the HS/LS.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Tested-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 18:57:34 +10:00
Dave Airlie
6bf6bdbc2b r600: fix regression introduced with ring emit changes.
This was adding one after a CUT which broke end primitive
2015-12-07 05:44:55 +00:00
Dave Airlie
fc276bda22 r600: remove stale tessellation comment
pointed out by Marek.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 11:04:48 +10:00
Dave Airlie
5ca9825758 docs: consolidate r600 entry in GL3.txt
Though fp64 emulation still needs to be done for a lot of the evergreen hw.
2015-12-07 10:06:44 +10:00
Dave Airlie
7fa2914b06 docs: update with r600 tessellation status.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
33404f1415 r600: enable tessellation for evergreen/cayman (v2)
This enables tessellation for evergreen/cayman,

This will need changes before committing depending
on what hw works etc.
working are CAYMAN/REDWOOD/BARTS/TURKS/SUMO/CAICOS

v2: only enable on evergreen and above.
2015-12-07 09:59:02 +10:00
Dave Airlie
a2885d9cf9 r600g: reduce number of ps thread on caicos
this allows tess apps to start

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
fe64a0c8bf r600g: adjust ls/hs thread counts for sumo
these stop tess hangs here.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
e7ce9e3bb8 r600/asm: enable nstack check for tess ctrl/eval shaders.
This just makes sure they register at least one stack
usage frame like vertex shaders.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
bb44c1f036 r600/asm: handle lds read operations.
Reads from the queue shouldn't be merged for now read operations.

Reads from the queue shouldn't be merged for now, or put in
T slots.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
8ec2cb13e5 r600/asm: add LDS ops and barrier to the once per group restriction.
LDS ops must be scheduled in X slot, and barrier should be on its
own in a group.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
18871ac576 r600: move VGT_VTX_CNT_EN into shader stages atom.
This should be enabled for tessellation shaders as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:02 +10:00
Dave Airlie
958d617d98 r600: enable tcs/tes dumping for R600_DUMP_SHADERS.
Trivial patch just to enable dumping more.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
b8df7d03c8 r600: handle SIMD allocation issue with HS/LS
At least one SIMD must be kept away from the HS/LS
stages in order to avoid a hw issue on evergreen/cayman.

This patch implements this workaround.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
7b5878ee04 r600/shader: increase number of inputs/outputs to 64.
Tessellation exceeds these sometimes, so increase them for now.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Edward O'Callaghan
22058f69fb r600: handle barrier opcode.
This handles the barrier opcode for EG/CM.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
9662a43d23 r600/shader: handle tess related system-values.
This adds handling for TESSINNER/TESSOUTER in the TES
where they need to be fetched from LDS,
and TESSCOORD which comes in via r0.

It also handle primitive ID and invocation ID.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
92fbf856f4 r600/shader: allow multi-dimension arrays for tcs/tes inputs/outputs.
This just allows multi-dim arrays to be processed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
30d56d1c00 r600/shader: handle TES exports and streamout
when tessellation is enabled the TES shader is responsible
for handling streamout and exports.

This adds the streamout and export workarounds to TES,
and also makes sure TES sets up spi_sid.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
2239f3eaff r600/shader: emit tessellation factors to GDS at end of TCS.
When we are finished the shader, we read back all the tess factors
from LDS and write them to special global memory storage using
GDS instructions.

This also handles adding NOP when GDS or ENDLOOP end the TCS.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
cfc2818e23 r600/shader: handle TCS output writing.
TCS outputs whenever they are written in the shader,
need to be written to LDS not temporaries, this handles
this case. It also fixes up the case where the output
is a relative addressed output, so we don't try to apply
the relative address at the wrong time.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
892cc65fa3 r600/shader: handle VS shader writing to the LDS outputs. (v1.1)
This writes the VS shaders outputs to the LDS memory in
the correct places.

v1.1: use 24-bit
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
8b2024196f r600/shader: handle fetching tcs/tes inputs and tcs outputs
This handles the logic for doing fetches from LDS for
TCS and TES. For TCS we need to fetch both inputs and outputs,
for TES only inputs need to be fetched.

v2: use 24-bit ops.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
4477be2404 r600/shader: add get_lds_offset0 helper
This retrievs the offset into the LDS for a patch or
non-patch variable, it takes the RelPatch channel
and a temporary register.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
2a9639e41f r600/shader: add function to get tess constants info
This function retrieves the tess input/output info
from the tess constant buffer that is bound to the shader.

This uses a vfetch to get the values into the shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
0696ebc899 r600/shader: add utility functions to do single slot arithmatic
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.

It handles CAYMAN MULLO differences as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
09d25a9b37 r600/eg: workaround bug with tess shader and dynamic GPRs.
When using tessellation on eg/ni chipsets, we must disable
dynamic GPRs to workaround a hw bug where the GPU hangs
when too many things get queued.

This implements something like the r600 code to emit
the transition between static and dynamic GPRs, and to
statically allocate GPRs when tessellation is enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
d87f54f225 r600/shader: move get_temp and last_instruction helpers up
These are required for tess to be used earlier.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:01 +10:00
Dave Airlie
7933ba4d9c r600: bind geometry shader ring to the correct place
When tess/gs are enabled, the geom shader ring needs
to bind to the tess eval not the vertex shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
e3ecc28e99 r600: create fixed function tess control shader fallback.
If we have no tess control shader, then we have to use a fallback
one that just writes the tessellation factors.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
731ff3766f r600: create LDS info constants buffer and write LDS registers. (v2)
This creates a constant buffer with the information about
the layout of the LDS memory that is given to the vertex, tess
control and tess evaluation shaders.

This also programs the LDS size and the LS_HS_CONFIG registers,
on evergreen only.

v2: calculate lds hs num waves properly (Marek)
Emit the state only when something has changed (airlied).

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
38b5ee4796 r600/eg: update shader stage emission/tf param for tess.
This update the setting of the shader stages register
when tess is enabled and add the setting of the VGT_TF_PARAM
register from the tess shader properties.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
8874725c84 r600: hook TES/TCS shaders to the selection logic.
This hooks the TES/TCS bindings to the HW stages up.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
79d88afd5c r600: workout bitmask for the used tcs inputs/outputs.
This is used later to setup the constants to be given
to the tessellation shaders.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
839dae0dc0 r600: port over the get_lds_unique_index from radeonsi
On r600 this needs to subtract 9 due to texcoord interactions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
420afe06d1 r600: add set_tess_state callback.
This just stores the values in the context to be used later
when emitting the constant buffers.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
7db24b740c r600/eg: init tess registers to defaults (v1.1)
This initialises the tess min/max using fglrx values,
and also initialises a number of other registers related
to tessellation.

v1.1: caicos doesn't have some registers.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
25f96c1120 r600: hook up constants/samplers/sampler view for tessellation
This hooks the resources to the correct hw shaders when tess
is enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
9f86741863 r600: add create/bind/delete shader hooks for tessellation
This hooks up the gallium API for the tessellation shaders.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
797012bb67 r600/sb: add LS/HS hw shader types.
This just adds printing for the hw shader types, and hooks it up.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
382e2a2901 r600/blit: add tcs/tes shader saves.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
bdf7dadda8 r600: disable SB for now on tess related shaders.
Note we have to disable on vertex shaders when we are
operating in tes mode.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
8849867b8a r600: update correct hw shaders depending on configuration.
This updates the tess hw shaders from the sw ones routing
things correctly.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:59:00 +10:00
Dave Airlie
b1da110b71 r600: add shader key entries for tcs and tes.
with tessellation vs can now run on ls, and tes can
run on vs or es, tcs runs on hs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
a131ac73e6 r600: add PATCHES to the pipe conversion.
This just converts the value to the hw value.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
0b08a8ade6 r600: add functions to update ls/hs state.
This just adds the two functions, these will get hooked up
later in the shader code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Glenn Kennard
b2fa64b161 r600g/sb: Support LDS ops in SB bytecode I/O
This just adds the LDS ops to the SB bytecode reader/writers.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
816bb30245 r600: add support for LDS instruction encoding.
These are used in tessellation shaders to read/write values
between VS/TCS/TES.

This splits the eg alu assembler out to handle these
instructions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
fe4eb49df9 r600/sb: add support for GDS to the sb decoder/dump. (v1.1)
This just adds support to the decoder, not actual SB support.

v1.1: fixup GDS relative mode. (Glenn).

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
2b25d9ac7f r600: add support for GDS clause to the assembler.
This just adds enough for the tessellation shaders,
which require TF_WRITE to work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
4f83184eff r600: use macros for updating the various stages.
These macros will make things easier to see when tess
is added to the mix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
85131a5490 r600: add SET_NULL_SHADER macro.
This is used to set a hw shader to NULL.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
f395ed8d4c r600: move clip misc and streamout stream updates to a single place
This will be updated in a macro later.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
8a0e21fc5a r600: move selecting shaders into earlier code.
select the ps/gs/vs in that order then process the results.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
3a7232a9a9 r600: use a macro to remove common shader selection code.
This function is going to get a lot messier with tessellation
so I'm going to use some macros to try and clean some bits
of common code up.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
19799a5928 r600: move to using hw stages array for hw stage atoms
This moves to using an array of hw stages for the atoms.

Note this drops the 23 from the vertex shader, this value
is calculated internally when shaders are bound, so not
required here.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
bb2b8778cb r600: make adjust_gprs use hw stages.
This changes the r600 specific GPR adjustment code
to use the stage defines, and arrays.

This is prep work for the tess changes later.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:59 +10:00
Dave Airlie
d1b90839c0 r600: introduce HW shader stage defines
Add a list of defines for the HW stages.

We will use this for GPR calculations amongst other things.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:58 +10:00
Dave Airlie
bd71f3e4fe r600: fix masks for two of the unused evergreen regs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-07 09:58:58 +10:00
Edward O'Callaghan
d108b69d2c gallium: Remove redundant NULL ptr checks
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:23 +01:00
Edward O'Callaghan
13eb5f596b gallium/drivers: Sanitize NULL checks into canonical form
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:23 +01:00
Edward O'Callaghan
150c289f60 gallium/auxiliary: Sanitize NULL checks into canonical form
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:23 +01:00
Edward O'Callaghan
147fd00bb3 gallium/auxiliary: Trivial code style cleanup
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:22 +01:00
Edward O'Callaghan
25b3d554c4 gallium/drivers: Trivial code-style cleanup
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:22 +01:00
Edward O'Callaghan
34782eec31 gallium/auxiliary: Fix zero integer literal to pointer comparison
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:10:02 +01:00
Edward O'Callaghan
3edae10601 winsys/amdgpu: Make use of ARRAY_SIZE macro
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:09:54 +01:00
Edward O'Callaghan
82871081fc svga: Make use of ARRAY_SIZE macro
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:09:52 +01:00
Edward O'Callaghan
70d2d3ef7f llvmpipe: Make use of ARRAY_SIZE macro
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:09:47 +01:00
Edward O'Callaghan
be51020f2a gallium/drivers/nouveau: Make use of ARRAY_SIZE macro
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 17:03:17 +01:00
Edward O'Callaghan
7e43a28079 gallium/radeon*: Remove useless casts
These are unnecessary and are likely just left overs from prior
work.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-06 11:52:16 +01:00
Ilia Mirkin
0ef5c8ab74 nv50/ir: fold shl + mul with immediates
On SM20 this gives:

total instructions in shared programs : 6299222 -> 6294240 (-0.08%)
total gprs used in shared programs    : 944139 -> 944068 (-0.01%)
total local used in shared programs   : 54116 -> 54116 (0.00%)

                local        gpr       inst      bytes
    helped           0         126        2781        2781
      hurt           0          55          11          11

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-05 18:56:43 -05:00
Ilia Mirkin
abd326e81b nv50/ir: propagate indirect loads into instructions
This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4].

On SM35:

total instructions in shared programs : 6206257 -> 6185058 (-0.34%)
total gprs used in shared programs    : 911045 -> 910722 (-0.04%)
total local used in shared programs   : 39072 -> 39072 (0.00%)

                local        gpr       inst      bytes
    helped           0         417        4195        4195
      hurt           0         280           0           0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-05 17:50:23 -05:00
Ilia Mirkin
31fde8faba nv50/ir: flip shl(add, imm) into add(shl, imm)
This works when the add also has an immediate. This often happens in
address calculations. These addresses can then be inlined as well.

On code targeted to SM35:

total instructions in shared programs : 6223346 -> 6206257 (-0.27%)
total gprs used in shared programs    : 911075 -> 911045 (-0.00%)
total local used in shared programs   : 39072 -> 39072 (0.00%)

                local        gpr       inst      bytes
    helped           0         119        3664        3664
      hurt           0          74          15          15

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-05 17:50:23 -05:00
Eric Anholt
a4eff86f4a vc4: Fix accidental scissoring when scissor is disabled.
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.

Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.
2015-12-05 13:12:27 -08:00
Eric Anholt
d16d666776 vc4: Disable RCL blitting when scissors are enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it.  If you're doing a scissored blit, you're
probably a testcase.

Fixes piglit's fbo-scissor-blit fbo
2015-12-05 13:12:27 -08:00
Eric Anholt
0afe83078d vc4: Bring over cleanups from submitting to the kernel. 2015-12-05 13:12:27 -08:00
Samuel Pitoiset
9f6ff76fdc nvc0: expose a group of performance metrics for SM30 (Kepler)
This allows to monitor these performance metrics through
GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
0afd8f7bd7 nvc0: re-introduce performance metrics for SM30 (Kepler)
This implements more performance metrics than the previous support,
but some other metrics still need to be figured out.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
af275b8839 nvc0: remove useless counting operations for MP counters
Those bits were related to old performance metrics support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
6667355d4b nvc0: remove old performance metrics support on Kepler
These performance metrics will be re-introduced in an upcoming
patch that will follow the same design as Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
662eb434ee nvc0: remove wrong inst_issued HW SM perf counter on Kepler
inst_issued is performance metric not a hardware event on Kepler (SM30).
It will be re-introduced in an upcoming patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
342ea31193 nvc0: add missing HW SM perf counters for SM30 (Kepler)
SM30 is the compute capability version for GK104/GK106/GK107.
This also introduces a new signal group selection called UNK0F.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Samuel Pitoiset
7f42688017 nvc0: fix the comment that describe MP counters storage on Kepler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-12-05 19:23:34 +01:00
Rob Clark
58efff89a2 freedreno/ir3: nir shader prints with 'disasm' debug option
Move these to 'disasm' instead of the more verbose 'optmsgs' since, like
the tgsi dumps, it is useful without the more verbose compiler logging
enabled.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-05 08:48:19 -05:00
Kristian Høgsberg Kristensen
0a5bee1fe6 vk: Don't override and hardcode autoconf CFLAGS
To disable optimizations pass CFLAGS="-O0 -g" on the configure command
line.
2015-12-04 21:24:15 -08:00
Kristian Høgsberg Kristensen
7337870036 vk: Move isl files to libisl.la helper library
These will be in their own library eventually - let's just do that now.
2015-12-04 21:24:15 -08:00
Chad Versace
2f270f0d15 anv/image: Fix choice of isl_surf_usage for depthstencil images
Fixes assertion in vkCreateImage when VkFormat is combined depthstencil.

Fixed many vulkancts tests that use combined depthstencil. For example,
fixes dEQP-VK.pipeline.depth.format.d16_unorm_s8_uint.compare_ops.\
not_equal_less_or_equal_not_equal_greater.
2015-12-04 16:37:05 -08:00
Chad Versace
a09b4c298c anv: Add func anv_get_isl_format() 2015-12-04 16:37:05 -08:00
Chad Versace
8b9ceda9f1 anv/image: Delete old ifdef'd out code 2015-12-04 16:37:05 -08:00
Jason Ekstrand
4dd5ef9e09 vk: Add needed builddir subdirectories to the include path
This fixes out-of-tree builds and closes #1
2015-12-04 15:48:27 -08:00
Kristian Høgsberg Kristensen
f1f78a371e vk: gem handles are uint32_t
No functional difference, but lets be consistent with the kernel API.
2015-12-04 12:53:27 -08:00
Ilia Mirkin
a3f90ef0a6 gallium/util: fix pipe_debug_message macro to allow 0 args
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-12-04 15:24:17 -05:00
Jason Ekstrand
8f722c2fa3 vk: Update the README for 0.210.1 2015-12-04 11:08:45 -08:00
Kristian Høgsberg
dac57750db vk: Turn on Bay Trail, Cherryview and Broxton support 2015-12-04 09:51:47 -08:00
Kristian Høgsberg Kristensen
bbb6875f35 vk: Map uncached, coherent memory as write-combine
This gives us the required characteristics for the memory type.
2015-12-04 09:51:47 -08:00
Kristian Høgsberg Kristensen
c3c61d210f vk: Expose two memory types for non-LLC GPUs
We're required to expose a host-visible, coherent memory type. On big
core GPUs that share, LLC, we can expose one such memory type that's
also cached.  However, on non-LLC GPUs we can't both be cached and
coherent. Thus, we expose both the required coherent type and the cached
but non-coherent combination.
2015-12-04 09:51:47 -08:00
Kristian Høgsberg
773592051b vk: clflush all state for non-LLC GPUs 2015-12-04 09:51:47 -08:00
Kristian Høgsberg
b431cf59a3 vk: Set I915_CACHING_NONE for userptr BOs when !llc
Regular objects are created I915_CACHING_CACHED on LLC platforms and
I915_CACHING_NONE on non-LLC platforms. However, userptr objects are
always created as I915_CACHING_CACHED, which on non-LLC means
snooped. That can be useful but comes with a bit of overheard.  Since
we're eplicitly clflushing and don't want the overhead we need to turn
it off.
2015-12-04 09:51:47 -08:00
Kristian Høgsberg
e0b5f0308c vk: Implement vkFlushMappedMemoryRanges()
We'll do a runtime switch on device->info.has_llc for now.
2015-12-04 09:51:47 -08:00
Eric Anholt
a69ac4e89c vc4: Add debug dumping of MSAA surfaces. 2015-12-04 09:24:36 -08:00
Eric Anholt
3c3b1184eb vc4: Add support for laying out MSAA resources.
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format.  Since they're full resolution buffers, we have to
align their size to full tiles.
2015-12-04 09:24:36 -08:00
Eric Anholt
74c4b3b80c vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
2015-12-04 09:23:55 -08:00
Eric Anholt
3a508a0d94 vc4: Fix up tile alignment checks for blitting using just an RCL.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile.  We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.
2015-12-04 09:10:53 -08:00
Eric Anholt
a664233042 vc4: Add support for loading sample mask. 2015-12-04 09:10:53 -08:00
Rob Clark
4b18d51756 freedreno/ir3: convert scheduler back to recursive algo
I've played with a few different approaches to tweak instruction
priority according to how much they increase/decrease register pressure,
etc.  But nothing seems to change the fact that compared to original
(pre-multiple-block-support) scheduler, in some edge cases we are
generating shaders w/ 5-6x higher register usage.

The problem is that the priority queue approach completely looses the
dependency between instructions, and ends up scheduling all paths at the
same time.

Original reason for switching was that recursive approach relied on
starting from the shader outputs array.  But we can achieve more or less
the same thing by starting from the depth-sorted list.

shader-db results:

total instructions in shared programs:          113350 -> 105183 (-7.21%)
total dwords in shared programs:                219328 -> 211168 (-3.72%)
total full registers used in shared programs:   7911 -> 7383 (-6.67%)
total half registers used in shader programs:   109 -> 109 (0.00%)
total const registers used in shared programs:  21294 -> 21294 (0.00%)

                 half       full      const      instr     dwords
    helped           0         322           0         711         215
      hurt           0         163           0          38           4

The shaders hurt tend to gain a register or two.  While there are also a
lot of helped shaders that only loose a register or two, the more
complex ones tend to loose significanly more registers used.  In some
more extreme cases, like glsl-fs-convolution-1.shader_test it is more
like 7 vs 34 registers!

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Rob Clark
ad2cc7bddc freedreno/ir3: don't reuse a0.x across blocks
It causes confusion in sched if we need to split_addr() since otherwise
we wouldn't easily know which block the new addr instr will be scheduled
in.  So just side-step the whole situation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Rob Clark
8e52344dc1 freedreno/ir3: rename ir3_block::bd
We'll need to add similar for ir3_instruction, but following the pattern
to use 'id' seems confusing.  Let's just go w/ generic 'data' as the
name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Giuseppe Bilotta
d566382a98 util: fix comment typo
Undefining the NDEBUG is relevant for release build, as they are the
ones that set it.

[Emil Velikov: split from previous patch]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-04 14:06:41 +00:00
Giuseppe Bilotta
efaac624af xvmc: force assertion in XvMC tests
This follows the src/util/u_atomic_test.c model of undefining NDEBUG
unconditionally throughouth the XvMC tests, to force asserts regardless
of debug mode.

The comment on u_atomic_test.c is also fixed (read 'debug' where it
should have been 'release').

v2: s/debug/release/ in relevant comments

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
[Emil Velikov: keep the src/util/ hunk as separate patch]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-04 14:06:41 +00:00
Giuseppe Bilotta
4839353634 radeon: const correctness
Add missing `const` specifier for pointer pointing to a const struct.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-04 14:06:41 +00:00
Giuseppe Bilotta
d61802b5e0 radeon: whitespace cleanup
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-04 14:06:38 +00:00
Emil Velikov
1074e38fbb mesa/tests: add KHR_debug GLES glGetPointervKHR entry points
Should have been part of commit f53f9eb8d4 "glapi: add GetPointervKHR
to the ES dispatch".

v2: comment out the ES1.1 symbol and use the same description (pattern)
as elsewhere (Matt)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93235
Fixes: f53f9eb8d4 "glapi: add GetPointervKHR to the ES dispatch".
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-04 13:56:43 +00:00
Jason Ekstrand
b715e6d528 i965/vec4: Stop pretending to support indirect output stores
Since we're using nir_lower_outputs_to_temporaries to shadow all our
outputs, it's impossible to actually get an indirect store.  The code we
had to "handle" this was pretty bogus as it created a register with a
reladdr and then stuffed it in a fixed varying slot without so much as a
MOV.  Not only does this not do the MOV, it also puts the indirect on the
wrong side of the transaction.  Let's just delete the broken dead code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-03 20:58:12 -08:00
Jason Ekstrand
aa35b0c2c7 i965/vec4: Get rid of the nir_inputs array
It's not really buying us anything at this point.  It's just a way of
remapping one offset namespace onto another.  We can just use the location
namespace the whole way through.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-03 20:58:12 -08:00
Jason Ekstrand
c6bcc23369 nir/lower_io: Pass the builder and type_size into get_io_offset
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-03 20:58:12 -08:00
Ilia Mirkin
204f803ce0 nv50/ir: replace zeros in movs as well
The original change to put zeroes directly into instructions created
conditional mov's with the zero immediate. However that can't be
emitted, so make sure to replace the zero with r63.

Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-03 23:46:02 -05:00
Ilia Mirkin
a3722b81f5 nv50/ir: fold fma/mad when all 3 args are immediates
This happens pretty rarely, but might as well do it when it does.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-03 23:02:57 -05:00
Ilia Mirkin
2b98914fe0 nv50/ir: avoid looking at uninitialized srcMods entries
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-03 23:02:57 -05:00
Ilia Mirkin
49692f86a1 nv50/ir: fix DCE to not generate 96-bit loads
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-03 23:02:57 -05:00
Roland Scheidegger
51140f452a draw: fix clipping of layer/vp index outputs
This was just plain broken. It used always the value from v0 (for vp_index)
but would pass the value from the provoking vertex to later stages - but only
if there was a corresponding fs input, otherwise the layer/vp index would get
lost completely (as it would try to interpolate the (unsigned) values as
floats).
So, make it obey provoking vertex rules (drivers relying on draw will need to
do the same). And make sure that the default interpolation mode (when no
corresponding fs input is found) for them is constant.
Also, change the code a bit so constant inputs aren't interpolated then
copied over later.

Fixes the new piglit test gl-layer-render-clipped.

v2: more consistent whitespaces fixes for function defs, and more tab killing
(overall still not quite right however).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Roland Scheidegger
5ea5b169e9 softpipe: use provoking vertex for layer
Same as for llvmpipe, albeit softpipe only really handles multiple layers,
not multiple viewports/scissors.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Roland Scheidegger
ddaf8d7b10 llvmpipe: use provoking vertex for layer/viewport
d3d10 actually requires using provoking (first) vertex. GL is happy with
any vertex (as long as we say it's undefined in the corresponding queries).
Up to now we actually used vertex 0 for viewport index, and vertex 1 for
layer (for tris), which really didn't make sense (probably a typo). Also,$
since we reorder vertices of clockwise triangle, that actually meant we used
a different vertex depending if the traingle was cw or ccw (still ok by gl).
However, it should be consistent with what draw (clip) does, and using
provoking vertex seems like the sensible choice (draw clip will be fixed
next as it is totally broken there).
While here, also use the correct viewport always even when not needed
in setup (we pass it down to jit fragment shader it might be needed there
for getting correct near/far depth values).

No piglit changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-04 03:42:19 +01:00
Jason Ekstrand
cb2382882e nir/spirv: Update to SPIR-V version 1.0 2015-12-03 18:28:10 -08:00
Eric Anholt
83e65ca831 vc4: Add the RCL to CL debug dumping when in simulator mode.
We can't dump it in the real driver, since the kernel doesn't give us a
handle to it (except after a GPU hang, using a root ioctl).  In the
simulator we can.
2015-12-03 18:20:39 -08:00
Chad Versace
371fc2bc20 anv/gen9: Fix SURFACE_STATE halign and valign
Pre-Skylake, RENDER_SUFFACE_STATE.SurfaceVerticalAlignment is in units
of surface samples. A surface sample is equivalent to a pixel in all
surfaces except interleaved multisample surfaces.

In Skylake, it is in units of surface elements. A surface element is
equivalent to a surface sample except for compressed formats, in which
case the element is a compression block.
2015-12-03 15:33:08 -08:00
Chad Versace
981ef2f02d anv: Embed isl_surf into anv_surface
This reduces struct anv_surface to just two members: an offset and the
embedded isl_surf.
2015-12-03 15:31:00 -08:00
Chad Versace
594e673fcc anv/image: Drop assertions on SURFTYPE extent limits
In anv_image_create(), stop asserting that VkImageCreateInfo::extent
does not exceed the hardware limits for the given SURFTYPE. The
assertions were incorrect because they did not take into account the
hardware gen. Anyways, these types of assertions belong in isl, not
anvil.
2015-12-03 15:29:52 -08:00
Chad Versace
b369389640 anv/image: Use isl to calculate surface layout
Remove the surface layout calculations in anv_image_make_surface().  Let
isl_surf_init() do the heavy lifting.

Fixes 8 Crucible tests and regresses none. (hw=Broadwell and
crucible@33d91ec).
2015-12-03 15:29:08 -08:00
Chad Versace
afdadec77f isl: Implement isl_surf_init() for gen4-gen9
This is a big code push. The patch is about 3000 lines.

Function isl_surf_init() calculates the physical layout of a surface.

The implementation is "complete" (but untested) for all 1D, 2D, 3D, and
cube surfaces for gen4 through gen9, except:
    * gen9 1D surfaces
    * gen9 Ys multisampled surfaces
    * auxiliary surfaces (such as hiz, mcs, ccs)
2015-12-03 15:26:11 -08:00
Chad Versace
bda43a0f59 isl: Rename legacy Y tiling to ISL_TILING_Y0
Rename legacy Y tiling from ISL_TILING_Y to ISL_TILING_Y0 in order to
clearly distinguish it from Yf and Ys.  Using ISL_TILING_Y to denote
legacy Y tiling would lead to confusion with i965, because i965 uses
I195_TILE_Y to denote *any* Y tiling.
2015-12-03 15:26:11 -08:00
Chad Versace
57941b61ab anv/image: Vulkan's depthPitch is in bytes, not rows
Fix for VkGetImageSubresourceLayout.
2015-12-03 15:26:11 -08:00
Jason Ekstrand
bfeaf67391 anv/device: Give a version of 0.210.1 in apiVersion 2015-12-03 15:23:33 -08:00
Jason Ekstrand
d666487dc6 vk: Add new WSI support and bump the API to 0.210.1 2015-12-03 15:15:29 -08:00
Marek Olšák
dd27825c8c radeonsi: fix Fiji for LLVM <= 3.7
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-12-03 23:55:23 +01:00
Marek Olšák
bfc14796b0 radeonsi: fix occlusion queries on Fiji
Tested.
2015-12-03 23:46:37 +01:00
Marek Olšák
0b03f2def0 radeonsi: dump init_config IBs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
3a6de8c86e radeonsi: print framebuffer info into ddebug logs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
a0bfb2798d gallium/radeon: print more info about HTILE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
1cca259d99 gallium/radeon: print more info about CMASK
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
84fbb0aff9 gallium/radeon: rename fmask::pitch -> pitch_in_pixels
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
19eaceb6ed gallium/radeon: print more information about textures
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
2d712d35c5 gallium/radeon: move printing texture info into a separate function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
c60d49161e gallium/radeon: remove unused r600_texture::pitch_override
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Marek Olšák
75d64698f0 gallium/radeon: remove DBG_TEXMIP
we don't need 2 flags for dumping texture info

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-03 23:41:23 +01:00
Edward O'Callaghan
a5055e2f86 gallium/aux/util: Trivial, we already have format use it
No need to dereference again, fixup for clarity.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-03 23:41:23 +01:00
Jason Ekstrand
fde60c1684 anv/entrypoints: Run the headers through the preprocessor first
This allows us to filter based on preprocessor directives.  We could build
a partial preprocessor into the generator, but we would likely get it
wrong.  This allows us to filter out, for instance, windows-specific WSI
stuff.
2015-12-03 14:13:55 -08:00
Jose Fonseca
5294debfa4 automake: Fix typo in MSVC2008 compat flags.
It should be MSVC2008_COMPAT_CFLAGS and not MSVC2008_COMPAT_CXXFLAGS.

This is why the recent util_blitter breakage went unnoticed on autotools
builds.

Trivial.
2015-12-03 22:00:49 +00:00
Jose Fonseca
071af9a511 ttn: Whitelist from -Werror=declaration-after-statement.
nir is the exception among gallium/auxiliary -- we don't need to compile
it with MSVC2008 yet.  And this enables us to use
-Werror=declaration-after-statement in the next commit as we should,
without complicated fixes to tgsi_to_nir module.

Trvial.  Tested with GCC and Clang.
2015-12-03 22:00:49 +00:00
Jason Ekstrand
4c19243562 vk/0.210.0: Advertise version 0.210.0 2015-12-03 13:44:02 -08:00
Jason Ekstrand
888744cabf vk/0.210.0: Update queries to the new API 2015-12-03 13:44:02 -08:00
Jason Ekstrand
924fbfc9a1 vk/0.210.0: Fix how we handle access flags in barriers
The initial implementation in the 0.210.0 API update was misguieded as to
what the access flags meant.  This should be more correct.
2015-12-03 13:44:02 -08:00
Jason Ekstrand
fa2435de3c vk/0.210.0: Update the VkFormat enum 2015-12-03 13:44:02 -08:00
Jason Ekstrand
4e904a0310 vk/0.210.0: Rework vkQueueSubmit 2015-12-03 13:44:02 -08:00
Jason Ekstrand
5757ad2959 vk/0.210.0: Remove depth clip and add depth clamp 2015-12-03 13:43:59 -08:00
Jason Ekstrand
d689745303 vk/0.210.0: Rework device features and limits 2015-12-03 13:43:54 -08:00
Jason Ekstrand
74c4c4acb6 vk/0.210.0: Rework QueueFamilyProperties 2015-12-03 13:43:54 -08:00
Jason Ekstrand
fed3586f34 vk/0.210.0: Rework result and structure type enums
By and large, this is just moving enum values around.  However, it also
removed VK_UNSUPPORTED which we were returning a number of places.  Those
places now return VK_ERROR_INCOMPATABLE_DRIVER.
2015-12-03 13:43:54 -08:00
Jason Ekstrand
a5f19f64c3 vk/0.210.0: Remove the VkShaderStage enum
This made for an unfortunately large amount of work since we were using it
fairly heavily internally.  However, gl_shader_stage does basically the
same things, so it's not too bad.
2015-12-03 13:43:54 -08:00
Jason Ekstrand
e10dc002e9 vk/0.210.0: Remove VkShader 2015-12-03 13:43:54 -08:00
Jason Ekstrand
e6ab06ae7f vk/0.210.0: Rework memory property flags 2015-12-03 13:43:54 -08:00
Jason Ekstrand
93071482f9 vk/0.210.0: Remove some unused enum values 2015-12-03 13:43:54 -08:00
Jason Ekstrand
b264012fcf vk/0.210.0: Update VkPipelineStageFlagBits 2015-12-03 13:43:54 -08:00
Jason Ekstrand
1aaf15bf19 vk/0.210.0: Trivial function argument name change 2015-12-03 13:43:53 -08:00
Jason Ekstrand
938a2939c8 vk/0.210.0: We now allocate command buffers; not create them 2015-12-03 13:43:53 -08:00
Jason Ekstrand
5a02441789 vk/0.210.0: Rename a parameter to GetImageSparseMemoryRequirements 2015-12-03 13:43:53 -08:00
Jason Ekstrand
a9fc0ce0e3 vk/0.210.0: Delete three no longer existant entrypoints 2015-12-03 13:43:53 -08:00
Jason Ekstrand
fcfb404a58 vk/0.210.0: Rework allocation to use the new pAllocator's 2015-12-03 13:43:53 -08:00
Jason Ekstrand
d3547e7334 vk/0.210.0: Use VkSampleCountFlagBits for sample counts 2015-12-03 13:43:53 -08:00
Jason Ekstrand
9349625d60 vk/0.210.0: Rework VkInstanceCreateInfo 2015-12-03 13:43:53 -08:00
Jason Ekstrand
c30a021820 vk/0.210.0: More function argument renaming 2015-12-03 13:43:53 -08:00
Jason Ekstrand
b1cd025b88 vk/0.210.0: Replace MemoryInput/OutputFlags with AccessFlags 2015-12-03 13:43:53 -08:00
Jason Ekstrand
43f3e92348 vk/0.210.0: Rework render pass description structures 2015-12-03 13:43:53 -08:00
Jason Ekstrand
299f8f1511 vk/0.210.0: More structure field renaming 2015-12-03 13:43:53 -08:00
Jason Ekstrand
407b8cc5e0 vk/0.210.0: Get rid of VkImageAspect 2015-12-03 13:43:53 -08:00
Jason Ekstrand
3f6abd0161 vk/0.210.0: Rework descriptor sets 2015-12-03 13:43:52 -08:00
Jason Ekstrand
6a6da54ccb vk/0.210.0: Rename parameters to memory binding/mapping functions 2015-12-03 13:43:52 -08:00
Jason Ekstrand
aadb7dce9b vk/0.210.0: Update to the new instance/device create structs 2015-12-03 13:43:52 -08:00
Jason Ekstrand
607fe31598 vk/0.210.0: More trivial struct/enum changes 2015-12-03 13:43:52 -08:00
Jason Ekstrand
dde7172a8a vk/0.210.0: Trivial flag enum updates 2015-12-03 13:43:52 -08:00
Jason Ekstrand
4cf0b57bbf vk/0.210.0: Rename ChannelFlags to ColorComponentFlags 2015-12-03 13:43:52 -08:00
Jason Ekstrand
7f2284063d vk/0.210.0: s/raster/rasterization/ 2015-12-03 13:43:52 -08:00
Jason Ekstrand
1ab9f843bc vk/0.210.0: Don't allow chaining of description structs 2015-12-03 13:43:52 -08:00
Jason Ekstrand
17486b8664 vk/0.210.0: More fun with flags fields 2015-12-03 13:43:52 -08:00
Jason Ekstrand
f5ba1f994a vk/0.210.0: Make pCode a uint32_t pointer 2015-12-03 13:43:52 -08:00
Jason Ekstrand
5f348bd0e5 vk/0.210.0: Rename origin fields of VkViewport 2015-12-03 13:43:52 -08:00
Jason Ekstrand
9fa6e328eb vk/0.210.0: Move alphaToOne and alphaToCoverate to multisample state 2015-12-03 13:43:52 -08:00
Jason Ekstrand
f97c3b6d58 vk/0.210.0: Add flags fields to various pipeline create structs 2015-12-03 13:43:51 -08:00
Jason Ekstrand
e673d64209 vk/0.210.0: Change field names in vertex input structs 2015-12-03 13:43:51 -08:00
Jason Ekstrand
fd53603e42 vk/0.210.0: Misc. no-op structure changes
The only non-trivial change is to sparse resources that we don't handle
anyway.
2015-12-03 13:43:51 -08:00
Jason Ekstrand
fe644721aa vk/0.210.0: Rename property pCount parameters 2015-12-03 13:43:51 -08:00
Jason Ekstrand
e8f2294cd2 vk/0.210.0: Rework sampler filtering and mode enums 2015-12-03 13:43:51 -08:00
Jason Ekstrand
2e10ca5748 vk/0.210.0: Misc. function argument renames 2015-12-03 13:43:51 -08:00
Jason Ekstrand
569f70be56 vk/0.210.0: Rework copy/clear/blit API 2015-12-03 13:43:47 -08:00
Emil Velikov
5a23f6bd8d mesa: rework the meaning of gl_debug_message::length
Currently it stores strlen(buf) whenever the user originally provided a
negative value for length.

Although I've not seen any explicit text in the spec, CTS requires that
the very same length (be that negative value or not) is returned back on
Pop.

So let's push down the length < 0 checks, tweak the meaning of
gl_debug_message::length and fix GetDebugMessageLog to add and count the
null terminators, as required by the spec.

v2: return correct total length in GetDebugMessageLog
v3: rebase (drop _mesa_shader_debug hunk).

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:21:19 +00:00
Emil Velikov
622186fbdf mesa: errors: validate the length of null terminated string
We're about to rework the meaning of gl_debug_message::length to only
store the user provided data. Thus we should add an explicit validation
for null terminated strings.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:21:08 +00:00
Emil Velikov
66fea8bd96 mesa: accept TYPE_PUSH/POP_GROUP with glDebugMessageInsert
These new (relative to ARB_debug_output) tokens, have been explicitly
separated from the existing ones in the spec text. With the reference
to glDebugMessageInsert was dropped.

At the same time, further down the spec says:
   "The value of <type> must be one of the values from Table 5.4"

... and these two are listed in Table 5.4.

The GL 4.3 and GLES 3.2 do not give any hints on the former
'definition', plus CTS requires that the tokens are valid values for
glDebugMessageInsert.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:21:08 +00:00
Emil Velikov
53be28107b mesa: add SEVERITY_NOTIFICATION to default state
As per the spec quote:

    "All messages are initially enabled unless their assigned severity
    is DEBUG_SEVERITY_LOW"

We already had MEDIUM and HIGH set, let's toggle NOTIFICATION as well.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:21:07 +00:00
Emil Velikov
078dd6a0b4 mesa: return the correct value for GroupStackDepth
We already have one group (the default) as specified in the spec. So
lets return its size, rather than the index of the current group.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:20:58 +00:00
Emil Velikov
f39954bf7c mesa: rename GroupStackDepth to CurrentGroup
The variable is used as the actual index, rather than the size of the
group stack - rename it to reflect that.

Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:17:48 +00:00
Emil Velikov
1ca735701b mesa: do not enable KHR_debug for ES 1.0
The extension requires (cough implements) GetPointervKHR (alias of
GetPointerv) which in itself is available for ES 1.1 enabled mesa.

Anyone willing to fish around and implement it for ES 1.0 is more than
welcome to revert this commit. Until then lets restrict things.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:17:48 +00:00
Emil Velikov
f53f9eb8d4 glapi: add GetPointervKHR to the ES dispatch
The KHR_debug extension implements this.

Strictly speaking it could be used with ES 1.0, although as the original
function is available on ES 1.1, I'm inclined to lift the KHR_debug
requirement to ES 1.1.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 19:17:48 +00:00
Nanley Chery
808e752796 mesa/version: Update gl_extensions::Version during version override
Commit a16ffb743c, which introduced
gl_extensions::Version, updates the field when the context version
is computed and when entering/exiting meta. Update this field when
the version is overridden as well.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-12-03 10:20:34 -08:00
Brian Paul
a0f1bc18e5 mesa: print enum names rather than hexadecimal values in error messages
Trivial.
2015-12-03 09:40:43 -07:00
Brian Paul
72a913ceb8 st/wgl: add new stw_ext_rendertexture.c file
This should have been included in the previous commit.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-12-03 09:33:55 -07:00
Brian Paul
e832b5b7fa st/wgl: add support for WGL_ARB_render_texture
There are a few legacy OpenGL apps on Windows which need this extension.
We basically use glCopyTex[Sub]Image to implement wglBindTexImageARB (see
the implementation notes for details).

v2: refactor code to use st_copy_framebuffer_to_texture() helper function.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-12-03 09:12:20 -07:00
Brian Paul
47b9ef872b st/mesa: add new st_copy_framebuffer_to_texture() function
This helper is used by the WGL state tracker to implement the
wglBindTexImageARB() function.

This is basically a new "meta" function.  However, we're not putting
it in the src/mesa/drivers/common/ directory because that code is not
linked with gallium-based drivers.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-12-03 08:34:24 -07:00
Juha-Pekka Heikkila
d6d90750f1 glsl: remove useless null checks and make match_explicit_outputs_to_inputs() static
match_explicit_outputs_to_inputs() cannot get null inputs and if it ever did
triggering first null check would later in the function cause segfault.

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
CC: timothy.arceri@collabora.com
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-03 10:56:35 +02:00
Tapani Pälli
231db5869c i965: use _Shader to get fragment program when updating surface state
Atomic counters and Images were using ctx::Shader that does not take in
to account program pipeline changes, ctx::_Shader must be used for SSO to
work. Commit c0347705 already changed ubo's to use this.

Fixes failures seen with following Piglit test:
	arb_separate_shader_object-atomic-counter

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-03 08:08:07 +02:00
Ilia Mirkin
6c6f28c35e nv50/ir: fix moves to/from flags
Noticed this when looking at a trace that caused flags to spill to/from
registers. The flags source/destination wasn't encoded correctly
according to both envydis and nvdisasm.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
101e315cc1 nv50/ir: don't forget to mark flagsDef on cvt in txb lowering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
06055121e6 nv50/ir: fix instruction permutation logic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:38 -05:00
Ilia Mirkin
11fcf46590 nv50/ir: the mad source might not have a defining instruction
For example if it's $r63 (aka 0), there won't be a definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 20:41:37 -05:00
Ilia Mirkin
52b68375ae nv50/ir: make sure entire graph is reachable
The algorithm expects the entire CFG to be reachable, so make sure that
we hit every node. Otherwise we will end up with uninitialized data,
memory corruption, etc.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
adcc547bfb nv50/ir: deal with loops with no breaks
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
ff61ac4838 nvc0/ir: fold postfactor into immediate
SM20-SM50 can't emit a post-factor in the presence of a long immediate.
Make sure to fold it in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-02 18:51:15 -05:00
Ilia Mirkin
52a800a687 nv50/ir: allow immediate 0 to be loaded anywhere
There's a post-RA fixup to replace 0's with $r63 (or $r127 if too many
regs are used), so just as nvc0, let an immediate 0 be loaded anywhere.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 18:51:15 -05:00
Kenneth Graunke
043d427538 i965: Add INTEL_DEBUG=perf information for GS recompiles.
Surprisingly, this didn't exist at all.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-02 15:23:01 -08:00
Kenneth Graunke
b6d4f051a5 i965: De-duplicate key_debug() function.
This appeared in brw_vs.c and brw_wm.c, should have appeared in
brw_gs.c, and was soon going to have to be in brw_tcs.c and brw_tes.c as
well.

So, instead, move it to a central location (which has to know about both
struct brw_context and perf_debug()).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-12-02 15:22:58 -08:00
Samuel Pitoiset
8482763d35 nv50/ir/gk110: add memory barriers support for GK110
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 22:44:53 +01:00
Samuel Pitoiset
c672bf3b04 nv50/ir: do not call textureMask() for surface ops
That texture mask thing doesn't seem to be needed for surface ops, so
just as nve4+, let do that only for texture ops.

This fixes a segfault with 'test_surface_st' from
gallium/tests/trivial/compute.c on Fermi because this test uses sustp.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-02 22:10:44 +01:00
Jose Fonseca
9e6af56666 appveyor: Initial integration.
AppVeyor doesn't require an appveyor.yml in the repos (in fact it has
some limitations as noted in comments below), but doing so has two great
advantages over the web UI:

- appveyor.yml can be revisioned together with the code, so instructions
  should always be in synch with the code

- appveyor.yml can be reused for people's private repositories (be on
  fdo or GitHub, etc.)

Acked-by: Roland Scheidegger <sroland@vmware.com>
2015-12-02 19:40:53 +00:00
Jose Fonseca
4a3d388834 util/blitter: Fix "SO C90 forbids mixed declarations and code".
Trivial.
2015-12-02 17:49:20 +00:00
Brian Paul
d31065cbf6 mesa: print enum string in compressed_subtexture_error_check() error msg
Trivial.
2015-12-02 10:28:15 -07:00
Edward O'Callaghan
772f429f0a gallium/util: Fix util_blitter_clear_depth_stencil() for num_layers>1
Previously util_blitter_clear_depth_stencil() could not clear more
than the first layer. We need to generalise this as we did for
util_blitter_clear_render_target().

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-02 18:23:43 +01:00
Edward O'Callaghan
8f2c5e281d gallium/util: Fix util_blitter_clear_render_target() for num_layers>1
Previously util_blitter_clear_render_target() could not clear more
than the first layer. We need to generalise this so that
ARB_clear_texture can pass the 3d piglit test.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-12-02 18:23:43 +01:00
Roland Scheidegger
09f74e6ef4 mesa: fix VIEWPORT_INDEX_PROVOKING_VERTEX and LAYER_PROVOKING_VERTEX queries
These are implementation-dependent queries, but so far we just returned the
value of whatever the current provoking vertex convention was set to, which
was clearly wrong.
Just make this a variable in the context constants like for other things
which are implementation dependent (I assume all drivers will want to set
this to the same value for both queries), and set it to GL_UNDEFINED_VERTEX
which is correct for everybody (and drivers can override it).

Reviewed-by: Brian Paul <brianp@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
2015-12-02 18:20:57 +01:00
Jose Fonseca
56aff6bb4e Remove Sun CC specific code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2015-12-02 07:51:04 +00:00
Jose Fonseca
51564f04b7 configure.ac: Refuse to build with Sun C compiler.
https://bugs.freedesktop.org/show_bug.cgi?id=93189

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2015-12-02 07:51:04 +00:00
Eric Anholt
18f8da7865 travis: Add a test build with scons.
Since I just broke the scons build, I figured I'd make Travis test that I
don't break it again in the future.  The script runs the builds in
parallel across VMs, so it still takes just 5 minutes to turn around
results.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-12-01 15:09:56 -08:00
Kenneth Graunke
975b1299dd i965: Increase BRW_MAX_UBO to 14.
The NVIDIA binary driver and Intel's closed source driver both expose
14 here, rather than the GL minimum of 12.  Let's follow suit.

Without this, Shadow of Mordor fails to render correctly and triggers
OpenGL errors:

Mesa: User error: GL_INVALID_VALUE in glBindBufferBase(index=68)
Mesa: User error: GL_INVALID_VALUE in glUniformBlockBinding(block binding 68 >= 60)

There are 5 stages (VS, TCS, TES, GS, FS), and 12 * 5 = 60 is too small.
14 * 5 = 70 will work just fine.

Tapani believes this will also help Alien Isolation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-12-01 14:55:33 -08:00
Matt Turner
7e6a6f3e61 i965: Do dead-code elimination in a single pass.
The first pass marked dead instructions as opcode = NOP, and a second
pass deleted those instructions so that the live ranges used in the
first pass wouldn't change.

But since we're walking the instructions in reverse order, we can just
do everything in one pass. The only thing we have to do is walk the
blocks in reverse as well.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-01 14:48:55 -08:00
Matt Turner
5a6f0bf5b8 glsl: Rename safe_reverse -> reverse_safe.
To match existing foreach_in_list_reverse_safe.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-12-01 14:48:55 -08:00
Matt Turner
48b4e88d3d i965: Don't mark dead instructions' sources live.
Removes dead code from glsl-mat-from-int-ctor-03.shader_test.

Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-01 14:48:55 -08:00
Dave Airlie
0e49151dcf r600: set mega fetch count to 16 for gs copy shader
Seems like MFC should be set for this shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
4ebcf5194d r600: increment ring index after emit vertex not before.
The docs say we should send the emit after the ring writes,
so lets do that and not have an ALU in between.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
13b134a443 r600: add alu + cf nop to copy shader on r600
SB suggests we do this for r600, so lets do it,
for the copy shader.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:13 +10:00
Dave Airlie
af4013d26b r600: SMX returns CONTEXT_DONE early workaround
streamout, gs rings bug on certain r600s, requires a wait idle
before each surface sync.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:25:00 +10:00
Dave Airlie
b63944e8b9 r600: do SQ flush ES ring rolling workaround
Need to insert a SQ_NON_EVENT when ever geometry
shaders are enabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-02 08:24:32 +10:00
Samuel Pitoiset
ea33920f7e nv50,nvc0: allow to create resources other than buffers
For the compute support, we might stick buffers as surfaces. This fixes
an assertion when executing src/gallium/tests/trivial/compute.

To avoid using these "restricted" surfaces as render targets, these
assertions have been moved. Note that it's already handled for the
framebuffer thing on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-01 22:55:14 +01:00
Brian Paul
f391b95105 glapi: work-around MSVC 65K string length limitation for enums.c
String literals cannot exceed 65535 characters for MSVC.  Instead of
emiting a string, emit an array of characters.

v2: fix indentation and add comment in the gl_enums.py file about this
ugliness.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 14:28:45 -07:00
Eric Anholt
148c2f5b17 mapi: Fix enums.c build with other build systems.
Tested with scons (by both myself and Mark Janes), Android is just copy
and paste.
2015-12-01 12:19:02 -08:00
Eric Anholt
1c0ac1976a travis: Initial import of travis instructions.
This just builds/installs our dependencies, and runs "make check".  I'm
interested in integrating more tests into it, but this seems like a pretty
easy first start.

If your personal branches of Mesa are on github, you can enable it on your
account and the repository (see
https://docs.travis-ci.com/user/for-beginners), then any pushes you do
will get their HEAD commit tested, and any pull requests to your tree will
get their merge commits tested.
2015-12-01 11:08:57 -08:00
Eric Anholt
4922a3ae52 mesa: Drop the blacklisting of new GL enums.
Now when people need new extensions, they can skip the entire
enum-definition process, and we can stop reviewing new extension XML for
its enum content.

This also brings in a new enum that I wanted to use in enum_strings.cpp
for testing the code generator.

v2: Drop comment about disabled GL_1PASS_EXT test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:42 -08:00
Eric Anholt
b65e44f55d mesa: Use a 32-bit offset for the enums.c string offset table.
With GLES 3.1, GL 4.5, and many new vendor extensions about to get their
enums added, we jump up to 85k of table.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:41 -08:00
Eric Anholt
c75cfe1c8a mesa: Prefer newer names to older ones among names present in core.
Sometimes GL likes to rename an old enum when it grows a more general
purpose, and we should prefer the new name.  Changes from this:

GL_POINT/LINE_SIZE_* (1.1) -> GL_SMOOTH_POINT/LINE_SIZE_* (1.2)
GL_FOG_COORDINATE_* (1.4) -> GL_FOG_COORD_* (1.5)
GL_SOURCE[012]_RGB/ALPHA (1.3) -> GL_SRC0_RGB (1.5)
GL_COPY_READ/WRITE_BUFFER (3.1) -> GL_COPY_READ_BUFFER_BINDING (4.2)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:38 -08:00
Eric Anholt
710762b64a mesa: Drop bitfield "enums" from the enum-to-string table.
Asking the table for bitfield names doesn't make any sense.  For 0x10, do
you want GL_GLYPH_HORIZONTAL_BEARING_ADVANCE_BIT_NV or
GL_COLOR_BUFFER_BIT4_QCOM or GL_POLYGON_STIPPLE_BIT or
GL_SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV?  Giving a useful answer would
depend on a whole lot of context.

This also fixes a bad enum table entry, where we chose GL_HINT_BIT instead
of GL_ABGR_EXT for 0x8000, so we can now fix its entry in the enum_strings
test.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:36 -08:00
Eric Anholt
cbabf5f9dc mesa: Switch to using the Khronos registry for generating enums.
I've used a bunch of python code to cut out new enums so that the two
generated files can be diffed.  I'll remove all that hardcoding in the
following commits.  All remaining differences between the generated code:

- GL_TEXTURE_BUFFER_FORMAT didn't appear in GL3 when TBOs got merged to
  core, so it now gets an _ARB suffix instead.

- Blacklisting can't keep EXT_sso's GL_ACTIVE_PROGRAM_EXT from becoming
  GL_ACTIVE_PROGRAM -- in our hash table, GL_ACTIVE_PROGRAM_EXT points at
  the GLES2 enum's value (aka GL_CURRENT_PROGRAM).  By not blacklisting
  the core name, we get both enums translated.

- GL_DRAW_FRAMEBUFFER_BINDING and GL_FRAMEBUFFER_BINDING both appeared in
  GL3 as synonyms, and the new code happens to choose
  GL_FRAMEBUFFER_BINDING instead.

- GL_TEXTURE_COMPONENTS and GL_TEXTURE_INTERNAL_FORMAT both appear in 1.1,
  and the new code chooses GL_TEXTURE_INTERNAL_FORMAT instead (which seems
  better, to me)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:34 -08:00
Eric Anholt
f72923aaea mesa: Remove the python mode bits from gl_enums.py.
emacs whines at me every time I open the file about these unsafe
variables, and the file was reformatted from 8 space to 4 space long ago.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:31 -08:00
Eric Anholt
741f642a6f mesa: Drop apparently typoed GL_ALL_CLIENT_ATTRIB_BITS.
GL_ALL_ATTRIB_BITS is a thing, and GL_CLIENT_ALL_ATTRIB_BITS, but I don't
see GL_ALL_CLIENT_ATTRIB_BITS in my grepping of khronos XML, GL extension
specs, GL 1.1, GL 2.2, and GL 4.4.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-12-01 10:24:22 -08:00
Eric Anholt
5cb9dc45c7 mesa: Drop enums that had been removed in later revs of specs.
Mesa hasn't been using these enums and the finalized specs don't reference
them, so losing them from our generated enum-to-string code should be
fine.  Reduces diffs to generating from Khronos XML, which has these enums
noted defined but commented out from any consumers.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:18 -08:00
Eric Anholt
5a7e5d8bb6 mesa: Fix a typo in AMD_performance_monitor enum.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:16 -08:00
Eric Anholt
bfc64b9688 mesa: Fix enum definition of CULL_VERTEX_EYE/OBJECT_POSITION
In converting to using the Khronos XML, I found that our XML had these two
swapped, and the text spec agreed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:15 -08:00
Eric Anholt
76ec0b9038 mesa: Add a copy of the Khronos gl.xml (SVN #31705).
The intention here is to keep a pristine copy of the upstream gl.xml that
can be updated at any time with a new version, and use that to generate
Mesa code from instead of our private XML.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:13 -08:00
Eric Anholt
edc8850436 mesa: Cut enum_strings.cpp test down to a few hand-chosen enums.
The previous contents appeared to be the output of some form of code
generation for all enums, with a few entries hand-edited to deal with
oddness.  The downside to this was that when an enum gets promoted from
vendor to _EXT or _EXT to _ARB or _ARB to core, make check starts failing
even when the commiter has done nothing wrong.  Instead of black-box
testing the code generation, pick a few enums that intentionally poke the
interesting cases of code generation.

People editing the code generator should be diffing the generated code
anyway.  This should catch when they fail to do so, without throwing false
negatives when people update the GL XML.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-01 10:24:02 -08:00
Tom Stellard
9adbb9e713 clover: Handle NULL devices returned by pipe_loader_probe() v2
When probing for devices, clover will call pipe_loader_probe() twice.
The first time to retrieve the number of devices, and then second time
to retrieve the device structures.

We currently assume that the return value of both calls will be the
same, but this will not be the case if a device happens to disappear
between the two calls.

When a device disappears, the pipe_loader_probe() will add a NULL
device to the device list, so we need to handle this.

v2:
  - Keep range for loop

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: <mesa-stable@lists.freedesktop.org>
2015-12-01 16:00:54 +00:00
Jonathan Gray
99cd600835 automake: fix some occurrences of hardcoded -ldl and -lpthread
Correct some occurrences of -ldl and -lpthread to use
$(DLOPEN_LIBS) and $(PTHREAD_LIBS) respectively.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-01 16:53:40 +00:00
Iago Toral Quiroga
241f15ac80 glsl/lower_ubo_reference: split struct copies into element copies
Improves register pressure, since otherwise we end up emitting
loads for all the elements in the RHS and them emitting
stores for all elements in the LHS.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-01 13:30:42 +01:00
Iago Toral Quiroga
867c436ca8 glsl/lower_ubo_reference: split array copies into element copies
Improves register pressure, since otherwise we end up emitting
loads for all the elements in the RHS and them emitting
stores for all elements in the LHS.

v2:
  - Mark progress properly. This also fixes some instances where the added
    nodes with individual element copies where not being lowered, which is
    expected behavior as explained in the documentation for
    visit_list_elements.
  - Only need to do this if the RHS is a buffer-backed variable.
  - We can also have arrays inside structs. A later patch will make it so
    we also split struct copies and end up with multiple
    ir_dereference_record assignments, so make sure that if any of these
    is an array copy, we also split it.

Fixes the following piglit tests:
tests/spec/arb_shader_storage_buffer_object/execution/large-field-copy.shader_test
tests/spec/arb_shader_storage_buffer_object/linker/copy-large-array.shader_test

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-12-01 13:29:57 +01:00
Julien Isorce
e483cba9f5 st/va: also retrieve reference frames info for h264
Other hardwares than AMD require to parse:
VAPictureParameterBufferH264.ReferenceFrames[16]

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-12-01 08:21:37 +00:00
Julien Isorce
b4fb6d7616 st/va: delay decoder creation until max_references is known
In general max_references cannot be based on num_render_targets.

This patch allows to allocate buffers with an accurate size.
I.e. no more than necessary. For other codecs it is a fixed
value 2.

This is similar behaviour as vaapi/vdpau-driver.

For now HEVC case defaults to num_render_targets as before.
But it could also benefits this change by setting a more
accurate max_references number in handlePictureParameterBuffer.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-12-01 08:21:20 +00:00
Iago Toral Quiroga
750393ff7d glsl/dead_builin_varyings: Fix gl_FragData array lowering
The current implementation looks for array dereferences on gl_FragData and
immediately proceeds to lower them, however this is not enough because we
can have array access on vector variables too, like in this code:

out vec4 color;
void main()
{
   int i;
   for (i = 0; i < 4; i++)
      color[i] = 1.0;
}

Fix it by making sure that the actual variable being dereferenced is an array.

Fixes a crash in:
spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-ldexp-dvec4.shader_test

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-01 08:30:52 +01:00
Dave Airlie
4f34722575 r600: workaround empty geom shader.
We need to emit at least one cut/emit in every
geometry shader, the easiest workaround it to
stick a single CUT at the top of each geom shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-01 12:58:43 +10:00
Dave Airlie
04efcc6c7a r600: rv670 use at least 16es/gs threads
This is specified in the docs for rv670 to work properly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-01 12:58:34 +10:00
Dave Airlie
8168dfdd4e r600: geometry shader gsvs itemsize workaround
On some chips the GSVS itemsize needs to be aligned to a cacheline size.

This only applies to some of the r600 family chips.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-01 12:57:55 +10:00
Gregory Hainaut
2ab9cd0c4d glsl: don't sort varying in separate shader mode
This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.

It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.

V2: (by Timothy Arceri)
* reworked comment slightly

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-01 12:46:37 +11:00
Gregory Hainaut
8117f46f49 glsl: don't dead code remove SSO varyings marked as active
GL_ARB_separate_shader_objects allow matching by name variable or block
interface. Input varyings can't be removed because it is will impact the
location assignment.

This fixes the bug 79783 and likely any application that uses
GL_ARB_separate_shader_objects extension.

V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=79783
2015-12-01 12:46:32 +11:00
Gregory Hainaut
618612f867 glsl: add always_active_io attribute to ir_variable
The value will be set in separate-shader program when an input/output
must remains active. e.g. when deadcode removal isn't allowed because
it will create interface location/name-matching mismatch.

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file

v4:
* Fix variable name in assert

v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-01 12:46:26 +11:00
Timothy Arceri
76c09c1792 glsl: copy how_declared when lowering interface blocks
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-01 12:45:07 +11:00
Timothy Arceri
12ba6cfba7 glsl: optimise inputs/outputs with explicit locations
This change allows used defined inputs/outputs with explicit locations
to be removed if they are detected to not be used between shaders
at link time.

To enable this we change the is_unmatched_generic_inout field to be
flagged when we have a user defined varying. Previously
explicit_location was assumed to be set only in builtins however SSO
allows the user to set an explicit location.

We then add a function to match explicit locations between shaders.

V2: call match_explicit_outputs_to_inputs() after
is_unmatched_generic_inout has been initialised.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-12-01 12:45:03 +11:00
Jason Ekstrand
4ab9391fbb vk/0.210.0: Rework dynamic states 2015-11-30 14:19:41 -08:00
Dave Airlie
4d64459a92 r600/shader: split address get out to a function.
This will be used in the tess shaders.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-01 08:10:21 +10:00
Jason Ekstrand
73ef7d47d2 vk/0.210.0: Rework color blending enums 2015-11-30 13:49:28 -08:00
Jason Ekstrand
2c77b0cd01 gen7/8/cmd_buffer: Inline vk_to_gen_swizzle
It's currently unused on IVB so we get compiler warnings.
2015-11-30 13:29:51 -08:00
Jason Ekstrand
9b1cb8fdbc vk/0.210.0: Rework a few raster/input enums 2015-11-30 13:28:17 -08:00
Jason Ekstrand
a53f23d93f vk/0.210.0: Rework texture view component mapping 2015-11-30 13:06:12 -08:00
Jason Ekstrand
f1a7c7841f vk/0.210.0: Switch to the new VKAPI function decorations
While we're at it, we do a bunch of the VkResult -> void updates
2015-11-30 12:46:30 -08:00
Jason Ekstrand
a89a485e79 vk/0.210.0: Rename CmdBuffer to CommandBuffer 2015-11-30 11:48:08 -08:00
Jason Ekstrand
6a8a542610 vk/0.210.0: A pile of minor enum updates 2015-11-30 11:12:44 -08:00
Jason Ekstrand
3db43e8f3e vk/0.210.0: Switch to the new-style handle declarations 2015-11-30 10:58:02 -08:00
Jason Ekstrand
5cb57806b2 vk: Add connonical 0.170.2 and 0.210.0 headers
This is in preparation for the API update
2015-11-30 10:24:35 -08:00
Marta Lofstedt
44944a66ce doc: Set GL_OES_geometry_shader as started
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-30 10:47:21 +01:00
Marta Lofstedt
1d5b88e33b gles2: Update gl2ext.h to revision: 32120
This is needed to be able to implement the accepted OES
extensions.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-30 10:46:15 +01:00
Julien Isorce
10c14919c8 vl/buffers: fixes vl_video_buffer_formats for RGBX
Fixes: 42a5e143a8 "vl/buffers: add RGBX and BGRX to the supported formats"
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-30 09:02:29 +00:00
Samuel Iglesias Gonsálvez
a348fe89af i965/fs: remove unused fs_reg offset
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-11-30 10:00:40 +01:00
Kenneth Graunke
83dedb6354 i965: Add src/dst interference for certain instructions with hazards.
When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:

   mov(8) g7<1>UD      0x00000000UD    { align1 WE_all 1Q compacted };
   mov(1) g7.5<1>UD    0x00000100UD    { align1 WE_all };
   mov(1) g7<1>UD      g0<0,1,0>UD     { align1 WE_all compacted };
   mov(1) g7.3<1>UD    g8<0,1,0>UD     { align1 WE_all };

This is done in the generator since the vec4 backend can't handle align1
regioning.  From the visitor's point of view, this is a single opcode:

   hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD

Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination.  However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.

In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.

It occured to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride.  The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.

Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that.  It might also be the end of control flow.  This might
keep values alive too long.  What we really want is to say "my source
and destinations interfere".

This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard.  For
my vec4 case, we can determine this by switching on opcodes.  For the
SIMD16 case, we just move the existing code there.

I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.

v2: Rebased by mattst88.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-30 00:34:07 -08:00
Kenneth Graunke
1ac1581f38 i965: Fix JIP to properly skip over unrelated control flow.
We've apparently always been botching JIP for sequences such as:

   do
       cmp.f0.0 ...
       (+f0.0) break
       ...
       if
          ...
       else
          ...
       endif
       ...
   while

Normally, UIP is supposed to point to the final destination of the jump,
while in nested control flow, JIP is supposed to point to the end of the
current nesting level.  It essentially bounces out of the current nested
control flow, to an instruction that has a JIP which bounces out another
level, and so on.

In the above example, when setting JIP for the BREAK, we call
brw_find_next_block_end(), which begins a search after the BREAK for the
next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
setting JIP there.

This makes no sense at all.  The break is supposed to skip over the
whole if/else/endif block entirely.  They have a sibling relationship,
not a nesting relationship.

This patch fixes brw_find_next_block_end() to track depth as it does
its search, and ignore anything not at depth 0.  So when it sees the
IF, it ignores everything until after the ENDIF.  That way, it finds
the end of the right block.

I noticed this while reading some assembly code.  We believe jumping
earlier is harmless, but makes the EU walk through a bunch of disabled
instructions for no reason.  I noticed that GLBenchmark Manhattan had
a shader that contained a BREAK with a bogus JIP, but didn't measure
any performance improvement (it's likely miniscule, if there is any).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-30 00:27:16 -08:00
Dave Airlie
d72299c531 r600: move per-type settings into a switch statement
This will allow adding tess stuff much cleaner later.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 11:08:00 +10:00
Dave Airlie
58e0122d86 r600: split out common alu_writes pattern.
This just splits out a common pattern into an inline function
to make things cleaner to read.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 11:07:18 +10:00
Dave Airlie
26332ef797 r600/llvm: fix r600/llvm build
Reported on irc by gryffus

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 11:05:42 +10:00
Dave Airlie
9eff9f6134 r600: fixes for register definitions.
Forgot to add these.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 09:35:37 +10:00
Dave Airlie
c2e701c7ca r600: add missing register to initial state
We really should initialise HS/LS_2 and SQ_LDS_ALLOC exists
on all evergreen not just cayman, so we should initialise
it as well.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 09:14:16 +10:00
Dave Airlie
bcdc748fe2 r600: define registers required for tessellation
This adds the defines for a bunch of registers and shader
values that are required to implement tessellation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 09:14:16 +10:00
Dave Airlie
b502bae610 r600: consolidate clip state updates
Move some common code into one place, tess will also need
to use this function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-30 09:14:16 +10:00
Samuel Pitoiset
b8c524ff88 nv50/ir: always display the opcode number for unknown instructions
This helps in debugging unknown instructions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-29 16:40:12 +01:00
Emil Velikov
d37ebed470 mesa: remove len argument from _mesa_shader_debug()
There was only a single user which was using strlen(buf).
As this function is not user facing (i.e. we don't need to feed back
original length via a callback), we can simplify things.

Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-29 14:41:40 +00:00
Emil Velikov
e714c971ae drivers/x11: scons: partially revert b9b40ef9b7
As glsl_types.{cpp,h} were  moved out of the sconscript (commit
b23a4859f4 "scons: Build nir/glsl_types.cpp once.") remove the dangling
includes.

Cc: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-29 14:41:39 +00:00
Emil Velikov
31ed3fc57d nir: remove recursive inclusion in builtin_type_macros.h
The header is already included by glsl_types.{cpp,h}.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-29 14:41:39 +00:00
Emil Velikov
fc16942cf7 nir: remove unneeded include
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-29 14:41:39 +00:00
Emil Velikov
b92ecdcc79 mesa/program: remove dead function declarations
Dead since

 5e9aa9926b (2011) - _mesa_ir_compile_shader
 69e07bdeb4 (2009) - _mesa_get_program_register

Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-29 14:41:39 +00:00
Emil Velikov
5d294d9fa3 auxiliary/vl/dri: fd management cleanups
Analogous to previous commit, minus the extra dup. We are the one
opening the device thus we can directly use the fd.

Spotted by Coverity (CID 1339867, 1339877)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-29 14:41:00 +00:00
Emil Velikov
151290c154 auxiliary/vl/drm: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339868)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-29 14:40:26 +00:00
Emil Velikov
fe71059388 st/xa: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339866)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-29 14:39:51 +00:00
Emil Velikov
d90ba57c08 st/dri: fd management cleanups
Add some checks if the original/dup'd fd is valid and ensure that we
don't leak it on error. The former is implicitly handled within the
pipe_loader, although let's make things explicit and check beforehand.

Spotted by Coverity (CID 1339865)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-29 14:39:03 +00:00
Emil Velikov
5f92906b87 pipe-loader: check if winsys.name is non-null prior to strcmp
In theory this wouldn't be an issue, as we'll find the correct name and
break out of the loop before we hit the sentinel.

Let's fix this and avoid issues in the future.

Spotted by Coverity (CID 1339869, 1339870, 1339871)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-29 14:38:22 +00:00
Emil Velikov
866a1f7fdd st/va: add missing break statement
Earlier commit factored out the mpeg4 IQ matrix handling into separate
function, although it forgot to add a break in its case statement.
Thus the data ended up partially overwritten as the mpeg4 and h265
structs are members of the desc union.

Spotted by Coverity (CID 1341052)

Fixes: 64761a841d "st/va: move MPEG4 functions into separate file"
Cc: Julien Isorce <j.isorce@samsung.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-29 14:31:14 +00:00
Ilia Mirkin
0396eaaf80 mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93126
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-11-28 17:24:34 -05:00
Nicolai Hähnle
9e5e702cfb radeon: only suspend queries on flush if they haven't been suspended yet
Non-timer queries are suspended during blits. When the blits end, the queries
are resumed, but this resume operation itself might run out of CS space and
trigger a flush. When this happens, we must prevent a duplicate suspend during
preflush suspend, and we must also prevent a duplicate resume when the CS flush
returns back to the original resume operation.

This fixes a regression that was introduced by:

commit 8a125afa6e
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date:   Wed Nov 18 18:40:22 2015 +0100

    radeon: ensure that timing/profiling queries are suspended on flush

    The queries_suspended_for_flush flag is redundant because suspended queries
    are not removed from their respective linked list.

    Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reported-by: Axel Davy <axel.davy@ens.fr>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-28 11:08:49 +01:00
Jose Fonseca
ea3f394e4a scons: Use LD version script for libgl-xlib.
Trivial.
2015-11-27 14:14:25 +00:00
Jose Fonseca
a11955b9f9 svga: Don't return value from void function.
Addresses MSVC warning C4098: 'svga_destroy_query' : 'void' function
returning a value.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-27 14:14:25 +00:00
Jose Fonseca
c127e6a3ea gallium: Make pipe_query_result::batch array length non-zero.
Zero length arrays are non standard:

   warning C4200: nonstandard extension used : zero-sized array in struct/union
   Cannot generate copy-ctor or copy-assignment operator when UDT contains a zero-sized array

And all code does `N * sizeof query_result->batch[0]`, so it should work
exactly the same.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-27 14:14:25 +00:00
Neil Roberts
bc2470d5d3 util: Tiny optimisation for the linear→srgb conversion
When converting 0.0 it would be nice if it didn't do any arithmetic.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-11-27 10:55:22 +01:00
Eduardo Lima Mitev
27a88a947c docs: Update GL3.txt to add ARB_internalformat_query2
Added to OpenGL 4.3 section, tagged as 'in progress (elima)'. See
https://bugs.freedesktop.org/show_bug.cgi?id=92687.

Thanks to Thomas H.P. Andersen for remainding me about this.

v1: - Update the already existing entry in section 4.3
      instead (Ilia Mirkin).
    - Added my BZ nickname as contact person (Felix Schwarz).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-26 23:53:16 +01:00
Timothy Arceri
c3ec12ec3c glsl: don't generate extra errors in ValidateProgramPipeline
From Section 11.1.3.11 (Validation) of the GLES 3.1 spec:

   "An INVALID_OPERATION error is generated by any command that trans-
   fers vertices to the GL or launches compute work if the current set
   of active program objects cannot be executed, for reasons including:"

It then goes on to list the rules we validate in the
_mesa_validate_program_pipeline() function.

For ValidateProgramPipeline the only mention of generating an error is:

   "An INVALID_OPERATION error is generated if pipeline is not a name re-
   turned from a previous call to GenProgramPipelines or if such a name has
   since been deleted by DeleteProgramPipelines,"

Which we handle separately.

This fixes:
ES31-CTS.sepshaderobjs.PipelineApi

No regressions on the eEQP 3.1 tests.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-27 08:44:37 +11:00
Kristian Høgsberg Kristensen
d6d82f1ab3 vk: Fix 3DSTATE_WM_DEPTH_STENCIL for gen8
This packet is a different size on gen8 and we hit an assertion when we
try to merge a gen9 size dword array from the pipeline with the gen8
sized array we create from dynamic state.

Use a static assert in the merge macro and fix this issue by using different
wm_depth_stencil arrays on gen8 and gen9.
2015-11-26 10:11:52 -08:00
Rob Clark
57fc0dd8d5 freedreno/ir3: assign varying locations later
Rather than assigning inloc up front, when we don't yet know if it will
be unused, assign it last thing before the legalize pass.

Also, realize when inputs are unused (since for frag shader's we can't
rely on them being removed from ir->inputs[]).  This doesn't make sense
if we don't also dynamically assign the inloc's, since we could end up
telling the hw the wrong # of varyings (since we currently assume that
the # of varyings and max-inloc are related..)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-26 12:35:10 -05:00
Rob Clark
2181f2cd58 freedreno/ir3: use instr flag to mark unused instructions
Rather than magic depth value, which won't be available in later stages.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-26 12:35:10 -05:00
Rob Clark
2fbe4e7d2f freedreno/a4xx: rework vinterp/vpsrepl
Same as previous commit, for a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-26 12:35:10 -05:00
Rob Clark
5adf4a5cda freedreno/a3xx: rework vinterp/vpsrepl
Make the interpolation / point-sprite replacement mode setup deal with
varying packing.

In a later commit, we switch to packing just the varying components that
are actually used by the frag shader, so we won't be able to assume
everything is vec4's aligned to vec4.  Which would highly confuse the
previous vinterp/vpsrepl logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-26 12:35:10 -05:00
Serge Martin
b7c958b7b7 clover: fix tgsi compiler crash with invalid src
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-26 15:30:25 +02:00
Francisco Jerez
55ffa64daf i965/gen9+: Switch thread scratch space to non-coherent stateless access.
The thread scratch space is thread-local so using the full IA-coherent
stateless surface index (255 since Gen8) is unnecessary and
potentially expensive.  On Gen8 and early steppings of Gen9 this is
not a functional change because the kernel already sets bit 4 of
HDC_CHICKEN0 which overrides all HDC memory access to be non-coherent
in order to workaround a hardware bug.

This happens to fix a full system hang when running any spilling code
on a pre-production SKL GT4e machine I have on my desk (forcing all
HDC access to non-coherent from the kernel up to stepping F0 might be
a good idea though regardless of this patch), and improves performance
of the OglPSBump2 SynMark benchmark run with INTEL_DEBUG=spill_fs by
33% (11 runs, 5% significance) on a production SKL GT2 (on which HDC
IA-coherency is apparently functional so it wouldn't make sense to
disable globally).

Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-11-26 14:07:58 +02:00
Francisco Jerez
bc8182808a i965/fs: Don't use Gen7-style scratch block reads on Gen9+.
Unfortunately Gen7 scratch block reads and writes seem to be hardwired
to BTI 255 even on Gen9+ where that index causes the dataport to do an
IA-coherent read or write.  This change is required for the next patch
to be correct, since otherwise we would be writing to the scratch
space using non-coherent access and then reading it back using
IA-coherent reads, which wouldn't be guaranteed to return the value
previously written to the same location without introducing an
additional HDC flush in between.

Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-11-26 14:07:58 +02:00
Francisco Jerez
3e6d0d2ca4 i965: Add symbolic defines for some magic dataport surface indices.
Reviewed-by: Kristian Høgsberg  <krh@bitplanet.net>
2015-11-26 14:07:58 +02:00
Nicolai Hähnle
6b5268d202 radeon: use PIPE_DRIVER_QUERY_FLAG_DONT_LIST for perfcounters
Since the query names are not very enlightening, and there are thousands
of them, GALLIUM_HUD=help should only show the first and last query name
for each hardware block.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-26 10:57:44 +01:00
Nicolai Hähnle
f36d9857cd gallium: add PIPE_DRIVER_QUERY_FLAG_DONT_LIST
This allows the driver to give a hint to the HUD so that GALLIUM_HUD=help is
less spammy.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-26 10:57:43 +01:00
Nicolai Hähnle
80a16dece6 radeon: delay the generation of driver query names until first use
This shaves a bit more time off the startup of programs that don't
actually use performance counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-26 10:57:43 +01:00
Julien Isorce
ca976e6900 st/va: add missing profiles in PipeToProfile's switch.
Otherwise assert is raised from vlVaQueryConfigProfiles's for loop.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-26 08:21:45 +00:00
Marta Lofstedt
63b49e1711 mesa: remove ARB_geometry_shader4
No drivers currently implement ARB_geometry_shader4, nor are there
any plans to implement it.  We only support the version of geometry
shaders that was incorporated into OpenGL 3.2 / GLSL 1.50.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-26 08:40:46 +01:00
Kristian Høgsberg Kristensen
cd4721c062 vk: Add SKL support
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-11-25 22:34:10 -08:00
Tapani Pälli
c2e146f487 mesa: error out in indirect draw when vertex bindings mismatch
Patch adds additional mask for tracking which vertex arrays have
associated vertex buffer binding set. This array can be directly
compared to which vertex arrays are enabled and should match when
drawing.

Fixes following CTS tests:

   ES31-CTS.draw_indirect.negative-noVBO-arrays
   ES31-CTS.draw_indirect.negative-noVBO-elements

v2: update mask in vertex_array_attrib_binding
v3: rename mask and make it track _BoundArrays which matches what
    was actually originally wanted (Fredrik Höglund)
v4: code cleanup, check for GLES 3.1 (Fredrik Höglund)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2015-11-26 08:01:31 +02:00
Kristian Høgsberg Kristensen
c445fa2f77 vk: Make entrypoint generator output gen9 entry points
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-11-25 20:58:25 -08:00
Kristian Høgsberg Kristensen
0e02a88ad4 vk: Add GEN9 pack header 2015-11-25 20:56:41 -08:00
Michel Dänzer
22d2dda03b targets/xvmc: use the non-inline sw helpers
This was missed in commit 59cfb21d ("targets: use the non-inline sw
helpers").

Fixes build failure:

  CXXLD    libXvMCgallium.la
../../../../src/gallium/auxiliary/pipe-loader/.libs/libpipe_loader_static.a(libpipe_loader_static_la-pipe_loader_sw.o):(.data.rel.ro+0x0): undefined reference to `sw_screen_create'
collect2: error: ld returned 1 exit status
Makefile:756: recipe for target 'libXvMCgallium.la' failed
make[3]: *** [libXvMCgallium.la] Error 1

Trivial.
2015-11-26 12:14:28 +09:00
Kristian Høgsberg Kristensen
0c59cb42b5 vk: Move all gen8 files to gen8 lib 2015-11-25 14:13:53 -08:00
Emil Velikov
72c33f0dd5 targets/nine: remove freedreno target
Analogous to previous commit. As we no longer have anyone who uses NIR
we can drop the link.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
2015-11-25 20:29:44 +00:00
Emil Velikov
aa335bb01b targets/nine: remove vc4 target
There are no users for it.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-11-25 20:28:38 +00:00
Emil Velikov
b78259c4b5 gallium: remove unused function declarations
Unused as of commit 23fb11455b "{st,targets}/dri: use static/dynamic
pipe-loader"

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-25 20:26:52 +00:00
Emil Velikov
59cfb21d46 targets: use the non-inline sw helpers
Previously (with the inline ones) things were embedded into the
pipe-loader, which means that we cannot control/select what we want in
each target.

That also meant that at runtime we ended up with the empty
sw_screen_create() as the GALLIUM_SOFTPIPE/LLVMPIPE were not set.

v2: Cover all the targets, not just dri.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
2015-11-25 20:25:29 +00:00
Emil Velikov
fbc6447c3d target-hepers: add non inline sw helpers
Feeling rather dirty copying the inline ones, yet we need the inline
ones for swrast only targets like libgl-xlib, osmesa.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
2015-11-25 20:25:14 +00:00
Emil Velikov
f623517188 pipe-loader: fix off-by one error
With earlier commit we've dropped the manual iteration over the fixed
size array and prepemtively set the variable storing the size, that is
to be returned. Yet we forgot to adjust the comparison, as before we
were comparing the index, now we're comparing the size.

Fixes: ff9cd8a67c "pipe-loader: directly use
pipe_loader_sw_probe_null() at probe time"
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93091
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2015-11-25 20:22:35 +00:00
Emil Velikov
0572e5fea5 nir: include what we want/need
Swap core.h with macros.h, as the latter provides the required MAX2
macro.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-25 20:19:47 +00:00
Kenneth Graunke
3810c15614 i965: Fix scalar vertex shader struct outputs.
While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.

Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.

Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-25 11:47:47 -08:00
Kenneth Graunke
3e9003e9cf i965: Fix fragment shader struct inputs.
Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.

Normally, varying packing splits up structures for us, so we don't see
them in the backend.  However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.

This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs/arrays/and so on.
(It's easier to read with diff -b, as indentation changes.)

When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive.  (The scalar VS backend requires additional
bug fixes in the next patch.)

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-25 11:47:47 -08:00
Tom Stellard
89851a2965 radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values
The compiler has more information and is able to optimize the bits
it sets in these registers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

CC: <mesa-stable@lists.freedesktop.org>
2015-11-25 11:03:05 -05:00
Tom Stellard
95e0510916 radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}
In the future, these will be used by other shaders types.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 11:03:05 -05:00
Samuel Iglesias Gonsálvez
98ceb60177 docs: minimum required python mako version is 0.3.4
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-25 16:50:53 +01:00
Nicolai Hähnle
07bddff460 docs: update relnotes with AMD_performance_monitor for radeonsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:52:09 +01:00
Nicolai Hähnle
ad22006892 radeonsi: implement AMD_performance_monitor for CIK+
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.

The counters can also be shown using the Gallium HUD. If one is interested to
see how work is distributed across multiple shader engines, one can set the
environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.

Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:52:09 +01:00
Nicolai Hähnle
b9fc01aee7 radeon: scale query buffer size to result size
Performance monitor queries can become very big, especially considering that
instances of a block in different shader engines are queried separately.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:28:09 +01:00
Nicolai Hähnle
592928065c radeonsi/sid: add performance counter registers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:28:06 +01:00
Nicolai Hähnle
9823048e0b radeonsi/sid: add hardware constants for COPY_DATA packet
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:28:03 +01:00
Nicolai Hähnle
1aa3b48c12 radeon: extend CIK_UCONFIG_REG_END for performance counters
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:28:00 +01:00
Nicolai Hähnle
b589e18a98 radeon: add perfcounter-related EVENT_TYPEs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:27:56 +01:00
Nicolai Hähnle
30462b1826 radeon: additional constants for WAIT_REG_MEM and EVENT_WRITE_EOP
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-25 15:27:34 +01:00
Nicolai Hähnle
bfddd005ea st/mesa: remove outdated comment
The enable of AMD_performance_monitor is no longer related to whether
queries are run by the GPU since the commit mentioned below.

Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

commit ddf27a3dd0
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date:   Tue Nov 10 13:35:01 2015 +0100

    gallium: remove pipe_driver_query_group_info field type
2015-11-25 15:27:34 +01:00
Nicolai Hähnle
babf655ab2 st/mesa: delay initialization of performance counters
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-25 15:27:33 +01:00
Nicolai Hähnle
27a06e0bbe mesa/main: allow delayed initialization of performance monitors
Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.

A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver function
which will be called the first time information about the performance monitor
groups is actually used.

The init_groups helper is called for API functions that can be called before
a monitor object exists. Functions that require an existing monitor object
can rely on init_groups having been called before.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-25 15:27:33 +01:00
Tapani Pälli
315c4c315e glsl: handle case where index is array deref in optimize_split_arrays
Previously pass did not traverse to those array dereferences which were
used as indices to arrays. This fixes Synmark2 Gl42CSCloth application
issues.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-25 11:25:57 +02:00
Julien Isorce
63c344d179 nouveau: move interlaced assert down in nouveau_vp3_video_buffer_create
templat->interlaced is 0 if not NV12 which is the case currently
when using VPP.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-25 08:17:39 +00:00
Iago Toral Quiroga
2bba2152e4 i965: remove trailing spaces in various files
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-25 08:12:08 +01:00
Iago Toral Quiroga
1af0d9d939 glsl: remove trailing spaces in various files
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-25 08:09:17 +01:00
Matt Turner
f1b7fefd4e i965: Pass brw_context pointer, not gl_context pointer.
Fixes a warning introduced by commit dcadd855.
2015-11-24 21:27:57 -08:00
Timothy Arceri
7436d7c33b glsl: only call dead code pass when new inputs/outputs demoted
This will help avoid eliminating inputs/outputs needed by SSOs.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-25 09:50:13 +11:00
Timothy Arceri
404ac4bf9e glsl: move and reused code to find first and last shaders
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-25 09:49:48 +11:00
Matt Turner
0ce370a84b mesa: Use unreachable() instead of a default case.
(And add an unreachable() in one place that didn't have a default case)
2015-11-24 13:27:20 -08:00
Ian Romanick
47b3a0d235 meta: Don't save or restore the active client texture
This setting is only used by glTexCoordPointer and related glEnable
calls.  Since the preceeding commits removed all of those, it is not
necessary to save, reset to default, or restore this state.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
c63f9c735d meta: Don't save or restore the VBO binding
Nothing left in meta does anything with the VBO binding, so we don't
need to save or restore it.  The VAO binding is still modified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
58aa56d40b meta/TexSubImage: Don't pollute the buffer object namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
76cfe2bc44 meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
a222d4cbc3 meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
b8a7369fb7 meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:30 -08:00
Ian Romanick
d5225ee5d9 meta: Partially convert _mesa_meta_DrawTex to DSA
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
37d11b13ce meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
b1b73a42c8 meta: Use internal functions for buffer object and VAO access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
52921f8e08 meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects
The fixed-function attribute paths don't get the DSA treatment because
there are no DSA entry-points for fixed-function attributes.  These
could have been added, but this is a temporary patch intended to make
later patches easier to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
1035e00a81 meta: Track VBO using gl_buffer_object instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
3b5a7d450d meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects
Meta currently does this, but future changes will make this impossible.
Explicitly do it as a step in the patch series now to catch any possible
kinks.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
ed0bd6573b i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
7f2f300071 meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
89a61afdd7 meta: Use DSA functions for PBO in create_texture_for_pbo
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
4e6b9c11fc i965: Don't pollute the buffer object namespace in brw_meta_fast_clear
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
e62799bd4e i965: Use internal functions for buffer object access
Instead of going through the GL API implementation functions, use the
lower-level functions.  This means that we have to keep track of a
pointer to the gl_buffer_object and the gl_vertex_array_object.

This has two advantages.  First, it avoids a bunch of CPU overhead in
looking up objects and validing API parameters.  Second, and much more
importantly, it will allow us to stop calling _mesa_GenBuffers /
_mesa_CreateBuffers and pollute the buffer namespace (next patch).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
1c5423d3a0 i965: Use DSA functions for VBOs in brw_meta_fast_clear
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
dcadd855f1 i965: Pass brw_context instead of gl_context to brw_draw_rectlist
Future patches will use the brw_context instead.  Keeping this
non-functional change separate should make the function changes easier
to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
4a644f1caa mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib
Pulls the parts of enable_vertex_array_attrib that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

_mesa_enable_vertex_array_attrib can also be used to enable
fixed-function arrays.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:29 -08:00
Ian Romanick
a336fcd36a mesa: Refactor update_array_format to make _mesa_update_array_format_public
Pulls the parts of update_array_format that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:28 -08:00
Ian Romanick
8fae494df2 mesa: Make bind_vertex_buffer avilable outside varray.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-24 11:31:28 -08:00
Kenneth Graunke
03d6949630 Revert "i965: Combine assembly annotations if possible."
This reverts commit a280e83d71.

It breaks INTEL_DEBUG=fs output.  For example,
glsl-fs-discard-01.shader_test has 11 instructions but only prints 5.

Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-24 10:21:37 -08:00
Matt Turner
5369efe311 glsl: Pass ast_type_qualifier by const reference.
Coverity noticed that we were passing this by value, and it's 152 bytes.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-24 10:05:33 -08:00
Matt Turner
f36993b469 i965: Clean up #includes in the compiler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
1eb11e64b3 i965: Move brw_new_shader and brw_link_shader prototypes from brw_wm.h.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
6ba700c3c3 i965: Compile brw_cs_fill_local_id_payload() as C.
It's only called from C, it compiles as C, so just compile it as C.

Notice the missing extern "C" on the definition of the function, which
would screw things up if the prototype wasn't parsed before the
definition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
6b525d9f2b i965: Move MRF macros from brw_inst.h to brw_eu.h.
brw_inst.h is only for the brw_inst/brw_compact_inst functions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
76732932ec i965: Drop #include of main/glheader.h.
It's never used.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
ecac1aab53 i965: Push down inclusion of brw_program.h.
We were including it in headers, which then caused it to be included in
tons of places it wasn't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
64cc7572c1 i965: Mark functions called from C as extern "C".
These functions' prototypes are marked with extern "C", which apparently
overrides a lack of extern "C" at the definition site if the prototype
has been seen first.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
fb86f0e75a i965: Push down inclusion of vbo/vbo.h.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
6fe9ea78fa i965: Remove duplicate #includes.
Added in commits 36fd65381 and 337dad8ce even though the existing
include was in view.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:33 -08:00
Matt Turner
c06f3d5d54 i965: Remove unneeded forward declarations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:32 -08:00
Matt Turner
e768c498bf i965: Mark count_trailing_one_bits() static.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:32 -08:00
Matt Turner
836aaa4394 i965: Remove useless gen6_blorp.h/gen7_blorp.h headers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-24 10:05:32 -08:00
Matt Turner
d956335a0b util: Include assert.h in macros.h. 2015-11-24 10:05:32 -08:00
Matt Turner
fafbf994cf util: Include <stdbool.h> in debug.h. 2015-11-24 10:05:32 -08:00
Matt Turner
2d8c529903 i965: Prevent implicit upcasts to brw_reg.
Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-24 09:58:33 -08:00
Matt Turner
799f924073 i965: Use scope operator to ensure brw_reg is interpreted as a type.
In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:

brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
                      ^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
                     ^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
       ^

Avoid this by marking brw_reg with the scope resolution operator.
2015-11-24 09:58:33 -08:00
Matt Turner
f093c842e6 i965: Use implicit backend_reg copy-constructor.
In order to do this, we have to change the signature of the
backend_reg(brw_reg) constructor to take a reference to a brw_reg in
order to avoid unresolvable ambiguity about which constructor is
actually being called in the other modifications in this patch.

As far as I understand it, the rule in C++ is that if multiple
constructors are available for parent classes, the one closest to you in
the class heirarchy is closen, but if one of them didn't take a
reference, that screws things up.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-24 09:58:33 -08:00
Matt Turner
309a44d63c i965: Add and use backend_reg::equals().
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-24 09:58:33 -08:00
Roland Scheidegger
6c6a439e98 softpipe/llvmpipe: don't advertize support for ASTC
3333977556 added support for ASTC textures to
gallium. They don't have any helpers hooked up for software decoding, however,
so cannot support them in drivers relying on util code for decoding.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-24 18:26:11 +01:00
Roland Scheidegger
97eed8dcb9 llvmpipe: don't test for unsupported formats in lp_test_format
Removing the fake format helpers (1c7d0a6aa4)
caused this to fail. These formats were never supported, but previously
they would have asserted in the generated jit functions (which, due to lack
of test cases for these formats, were never called) whereas we now assert when
trying to build the jit function. So, skip them completely.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=93092

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-24 18:26:11 +01:00
Ian Romanick
9b41489cb5 docs: add missed i965 feature to relnotes
Trivial.  GL_ARB_fragment_layer_viewport support was added in 8c902a58
by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-11-24 09:03:39 -08:00
Rob Clark
d278e31459 util: move brw_env_var_as_boolean() to util
Kind of a handy function.  And I'll want it available outside of i965
for common nir-pass helpers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nhaehnle@gmail.com>
2015-11-24 10:02:55 -05:00
Christian König
d3e2c48dfa st/va: fix indentation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:48 +01:00
Christian König
64761a841d st/va: move MPEG4 functions into separate file
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:45 +01:00
Christian König
9fe7924328 st/va: move VC-1 functions into separate file
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:41 +01:00
Christian König
da173344a6 st/va: move H264 functions into separate file
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:38 +01:00
Christian König
c9cb22392b st/va: move MPEG12 functions into separate file
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:35 +01:00
Christian König
ec6ef1cbfe st/va: move post processing function into own file
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:31 +01:00
Christian König
3d6386fdc5 st/va: fix post process dirty area handling
The dirty area in this call isn't related to the screen at all.

v2: set clear dirty area to false as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-24 15:31:11 +01:00
Timothy Arceri
2571a768d6 glsl: implement recent spec update to SSO validation
Enables 200+ dEQP SSO tests to proceed past validation,
and fixes a ES31-CTS.sepshaderobjs.PipelineApi subtest.

V2: split out change that reverts a previous patch into its own commit,
move variable declaration to top of function, and fix some formatting
all suggested by Ian.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-11-24 20:59:48 +11:00
Timothy Arceri
3c4aa7aff2 Revert "mesa: return initial value for VALIDATE_STATUS if pipe not bound"
This reverts commit ba02f7a3b6.

The commit checked whether the pipeline was currently bound instead
of checking whether it had ever been bound.  The previous setting
of Validated during object creation makes this unnecessary.  The
real problem was that Validated was not properly set to false
elsewhere in the code.  This is fixed by a later patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-11-24 20:59:44 +11:00
Michel Dänzer
d094631936 radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now
llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-24 18:07:48 +09:00
Boyuan Zhang
f55f134a03 radeon/uvd: uv pitch separation for stoney
v2: set the behaviour default for future ASICs.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-23 17:34:43 -05:00
Jason Ekstrand
179fc4aae8 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in nir cloning and some much-needed upstream refactors.
2015-11-23 14:03:47 -08:00
Dave Airlie
237bcdbab5 texgetimage: consolidate 1D array handling code.
This should fix the getteximage-depth test that currently asserts.

I was hitting problem with virgl as well in this area.

This moves the 1D array handling code to a single place.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-24 06:43:21 +10:00
Jason Ekstrand
d9b8fde963 i965: Use NIR for lowering texture swizzle
Now that nir_lower_tex can do texture swizzle lowering, we can use that
instead of repeating more-or-less the same code in both backends.  This
both allows us to share code and means that things like the tg4
work-arounds are somewhat simpler because they don't have to take the
swizzle into account.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:07:32 -08:00
Jason Ekstrand
8537b4ab76 nir/lower_tex: Add support for lowering texture swizzle
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
6921b17107 nir: Add a tex_instr_is_query helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
7e83fd85aa nir: Add a ssa_def_rewrite_uses_after helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
384396a69b nir: Use instr/if_rewrite in nir_ssa_def_rewrite_uses
nir_ssa_def_rewrite_uses is one of the older helpers in NIR and predated
both of those.  Now it can be substantially simplified.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
03c9ad900e nir/validate: Validated dests after sources
Previously, if someone accidentally made an instruction that refers to its
own SSA destination, the validator wouldn't catch it.  The reason for this
is that it validated the destination too early and, by the time it got to
the source, the destination SSA value was already added to the set of seen
SSA values so it would assume that it came from some previous instruction.
By moving destination validation to be after source validation, the SSA
value is not in the list of seen values and the validator will catch
self-referential instructions.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
6c8ba59cff i965: Use nir_lower_tex for texture coordinate lowering
Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates.  Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us.  This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-23 11:04:49 -08:00
Jason Ekstrand
d065a93a3f i965/fs: Stomp the texture return type to UINT32 for resinfo messages
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-23 11:02:15 -08:00
Jason Ekstrand
042fa75e48 nir/lower_tex: Set the dest_type for txs instructions
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-23 11:02:15 -08:00
Jason Ekstrand
1417f6a216 nir/lower_tex: Report progress
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-23 11:02:15 -08:00
Jason Ekstrand
ce767bbdff i965: Move postprocess_nir to codegen time
This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen.  In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-23 11:02:15 -08:00
Jason Ekstrand
9cf108193b i965/nir: Split shader optimization and lowering into three stages
At the moment, brw_create_nir just calls the three stages in sequence so
there's not much difference.  Soon, however, we will want to start doing
variants in NIR at which point the postprocessing step will have to move
from shader create time to codegen time.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-23 11:02:15 -08:00
Jason Ekstrand
9d703de85a i965: Use ull immediates in brw_inst_bits
This fixes a regression introduced in b1a83b5d1 that caused basically all
shaders to fail to compile on 32-bit platforms.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 10:55:38 -08:00
Ilia Mirkin
e4c1221d36 docs: add missed freedreno features to relnotes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2015-11-23 12:32:54 -05:00
Ilia Mirkin
33dc9aac07 docs: update relnotes with new freedreno/a4xx support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 12:32:54 -05:00
Jose Fonseca
c9651f0264 svga: Add ASTC formats to format table.
Fixes build.  Otherwise untested.

Trivial.
2015-11-23 16:45:28 +00:00
Ilia Mirkin
754b26e76d freedreno/ir3: add support for a few gs5 ops
Tested on a4xx. This is part of the builtins added by ARB_gpu_shader5
and GLSL ES 3.10.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:16 -05:00
Ilia Mirkin
cca8dd4e93 ttn: fix UMSB conversion
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:16 -05:00
Ilia Mirkin
190acb34ca freedreno/a4xx: add ARB_texture_query_lod support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
f0e670bdd7 ttn: add LODQ support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
9761d5146f freedreno/a4xx: re-emit program on dirty framebuffer
The program emit depends on certain fb details. Make sure those get
updated when the fb changes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
81b16350fa freedreno/a4xx: use a factor of 32767 for snorm8 blending
It appears that the hardware wants the integer to be scaled the same way
that the hardware representation is. snorm16 uses one of the float
factors, so this is only relevant for snorm8.

This fixes a number of subcases of
  bin/fbo-blending-formats GL_EXT_texture_snorm

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-23 11:17:15 -05:00
Ilia Mirkin
6f17f19b17 freedreno/a4xx: only compute texture offset once for the view
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
f10bb0ac9e freedreno/a4xx: add ARB_texture_view support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
1b9992b803 freedreno/a4xx: add formats for ARB_texture_buffer_object_rgb32 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
f9549d0a0f freedreno/a4xx: add ARB_texture_rgb10_a2ui support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
93905a8df1 freedreno/a4xx: add astc formats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
6b21d3c92e st/mesa: add astc support
This doesn't account for the ldr/hdr distinction... that will probably
have to be exposed via a separate cap. When relevant hardware appears,
this can be worked out.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
3333977556 gallium: add ASTC formats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-23 11:17:15 -05:00
Ilia Mirkin
1c7d0a6aa4 gallium/util: remove the fake format helpers for bptc and etc2
This was a silly hack that kept growing and growing. Instead, just write
NULLs for those functions. No need to have helpers that just assert(0)
when you call them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-23 11:17:14 -05:00
Ilia Mirkin
c65bc2e805 freedreno/a4xx: support 16384 texels in buffer texture
Looks like the width field's bitmask was off-by-one.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:14 -05:00
Ilia Mirkin
99f12a3f1a freedreno/a4xx: add ARB_texture_buffer_range support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:14 -05:00
Ilia Mirkin
d4c40f99ab freedreno/a4xx: add polygon mode support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-23 11:17:14 -05:00
Emil Velikov
b89d1b2ccf configure.ac: default to disabled dri3 when --disable-dri is set
Not too long ago, the dri3 code was living in src/glx, which in itself
was guarded by HAVE_DRI_GLX. As the name suggests we didn't dive into
the folder when dri was disabled, thus we missed that dri3 does not
consider/honour --enable-dri.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 6bd9ba7d07 "loader: Add dri3 helper"
Cc: Pali Rohár <pali.rohar@gmail.com>
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-23 12:08:04 +00:00
Emil Velikov
b9b0a1f58e loader: unconditionally add AM_CPPFLAGS to libloader_la_CPPFLAGS
It seems that due to the conditional autotools is getting confused and
forgetting to add AM_CPPFLAGS when building libloader (when
HAVE_DRICOMMON is not set).

Cc: mesa-stable@lists.freedesktop.org
Fixes: 5a79e0a8e3 "automake: loader: rework the CPPFLAGS"
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 12:07:50 +00:00
Emil Velikov
8a6d476588 pipe-loader: link against libloader regardless of libdrm presence
Whether or not the loader has libdrm support is up-to it. Anyone using
the loader should just include it whenever they depend on it.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 0f39f9cb7a "pipe-loader: add a dummy 'static' pipe-loader"
Reported-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-23 12:07:09 +00:00
Neil Roberts
2010de4015 i965: Handle lum, intensity and missing components in the fast clear
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.

Fixes the ext_framebuffer_multisample-fast-clear Piglit test.

v2: Handle luminance and intensity formats
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-11-23 10:44:01 +01:00
Jason Ekstrand
f58813842b nir: s/nir_type_unsigned/nir_type_uint
v2: do the same in tgsi_to_nir (Samuel)

v3: added missing cases after rebase (Iago)

v4: Add a blank space after '#' in one of the comments (Matt)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 08:36:12 +01:00
Connor Abbott
fb93dd7aa8 nir/builder: only read meaningful channels in nir_swizzle()
This way the caller doesn't have to initialize all 4 channels when they
aren't using them.

v2: Fix signed/unsigned comparison warning (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 08:36:12 +01:00
Connor Abbott
d982922b18 i965/fs: add stride restrictions for copy propagation
There are various restrictions on what the hstride can be that depend on
the Gen, and now that we're using hstride == 2 for packing/unpacking
doubles, we're going to run into these restrictions a lot more often.
Pull them out into a separate function, and move the one restriction we
checked previously into it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 08:30:30 +01:00
Connor Abbott
95ac3b1dae i965/fs: don't propagate cmod when the exec sizes differ
This can happen when the source of the compare was split by the SIMD
lowering pass. Potentially, we could allow the case where the exec size
of scan_inst is larger, and scan_inst has the right quarter selected,
but doing that seems a little more risky.

v2: Merge the bail condition into the the previous if/break block (Matt)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 08:30:30 +01:00
Connor Abbott
70171a9c89 i965/fs: respect force_sechalf/force_writemask_all in CSE
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-23 08:30:30 +01:00
Connor Abbott
b1a83b5d1b i965: fix 64-bit immediates in brw_inst(_set)_bits
If we tried to get/set something that was exactly 64 bits, we would
try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
1's like we want.

v2 (Iago)
 - Replace ~0 by ~0ull
 - Removed unnecessary parenthesis

v3 (Kristian)
 - Avoid the conditional

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-23 08:30:30 +01:00
Connor Abbott
718b9f52dd i965/fs: print non-1 strides when dumping instructions
v2:
  - Simplify code (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-23 08:30:30 +01:00
Ilia Mirkin
4deb118d06 nv50/ir: fix (un)spilling of 3-wide results
There is no 96-bit load/store operations, so we have to split it up
into a 32-bit parts, with a split/merge around it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90348
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-11-22 23:27:22 -05:00
Timothy Arceri
6463d36394 glsl: fix max binding validation for uniform blocks
Regression as of 64710db664

We can't use the type returned by get_interface_type() as
the interface type has arrays removed.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-23 13:47:19 +11:00
Ilia Mirkin
ad5f6b03e7 nv50,nvc0: properly handle buffer storage invalidation on dsa buffer
In case that the buffer has no bind at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-11-22 21:08:16 -05:00
Ilia Mirkin
079f713754 nouveau: use the buffer usage to determine placement when no binding
With ARB_direct_state_access, buffers can be created without any binding
hints at all. We still need to allocate these buffers to VRAM or GART,
as we don't have logic down the line to place them into GPU-mappable
space. Ideally we'd be able to shift these things around based on usage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-11-22 20:58:56 -05:00
Eric Anholt
1b62a4e885 vc4: Take precedence over ilo when in simulator mode.
They're exclusive at build time, but the ilo entry is always present, so
we'd try to use it and fail out.

v2: Add comment in the code, from Emil.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-22 13:15:43 -08:00
Eric Anholt
a39eac80fd vc4: Just put USE_VC4_SIMULATOR in DEFINES.
In the pipe-loader reworks, it was missed in one of the new directories it
was used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-22 13:15:40 -08:00
Nanley Chery
d1212abf50 mesa/teximage: Fix S3TC regression due to ASTC interaction
A prior, literal reading of the ASTC spec led to the prohibition
of some compressed formats being used against the targets:
TEXTURE_CUBE_MAP_ARRAY and TEXTURE_3D. Since the spec does not specify
interactions with other extensions for specific compressed textures,
remove such interactions.

Fixes the following Piglit tests on Gen9:
piglit.spec.arb_direct_state_access.getcompressedtextureimage
piglit.spec.arb_get_texture_sub_image.arb_get_texture_sub_image-getcompressed
piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array s3tc_dxt1
piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc

v2. Don't interact with other specific compressed formats (Ian).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91927
Suggested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-22 12:29:09 -08:00
Nanley Chery
21d43fe51a mesa/extensions: Enable overriding permanently enabled extensions
Provide the ability to prevent any permanently enabled extension
from appearing in the string returned by glGetString[i]().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-11-22 12:19:45 -08:00
Igor Gnatenko
05eed0eca7 virgl: pipe_virgl_create_screen is not static
Cc: mesa-stable@lists.freedesktop.org
Fixes: 17d3a5f857 "target-helpers: add a non-inline drm_helper.h"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93063
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-22 11:17:17 +00:00
Kenneth Graunke
86fc97da06 i965: Fix num_uniforms count for scalar GS.
I noticed that brw_vs.c does this.

I believe the point is that nir->num_uniforms is either counted in
scalar components (in scalar mode), or vec4 slots (in vector mode).
But we want param_count to be in scalar components regardless, so
we have to scale up in vector mode.

We don't have to scale up in scalar mode, though.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-22 00:03:21 -08:00
Eric Anholt
4cff16bc3a vc4: Use nir_channel() to simplify all of our nir_swizzle() cases. 2015-11-21 18:55:31 -08:00
Eric Anholt
81544f231a vc4: Fix point size lookup.
I think I may have regressed this in the NIR conversion.  TGSI-to-NIR is
putting the PSIZ in the .x channel, not .w, so we were grabbing some
garbage for point size, which ended up meaning just not drawing points.

Fixes glean pointAtten and pointsprite.
2015-11-21 18:55:31 -08:00
Jose Fonseca
4befd82a64 pipe-loader: Fix PATH_MAX define on MSVC. 2015-11-21 23:03:20 +00:00
Jose Fonseca
02afbd2476 scons: Conditionally use DRM module on pipe-loader.
Fixes non Linux builds.

Trivial.
2015-11-21 21:20:12 +00:00
Jason Ekstrand
e14b2c76b4 anv/meta_clear: Don't trash state if no clears are needed 2015-11-21 11:39:12 -08:00
Ilia Mirkin
22aeb0c568 freedreno/a4xx: disable blending and alphatest for integer rt0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-21 09:08:16 -05:00
Ilia Mirkin
4c170d9e1d freedreno/a4xx: fix independent blend
This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-21 09:08:16 -05:00
Ilia Mirkin
801b55c2ee freedreno/a4xx: enable ARB_base_instance support
We already pass in start_instance in fd4_draw. Expose the extension.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-21 09:08:16 -05:00
Ilia Mirkin
f54c89f13e freedreno/a4xx: set fetchsize in mem2gmem texture restore
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-21 09:08:16 -05:00
Ilia Mirkin
7426d9581a freedreno/a4xx: add 11_11_10_float vertex type support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-21 09:08:16 -05:00
Ilia Mirkin
740eb63aa7 freedreno/a4xx: fix 3d texture setup
Same fix as on a3xx - set the second (tiny) layer size bitfield to the
smallest level's size so that the hw knows not to minify beyond that.

This fixes texelFetch sampler3D piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-21 09:08:16 -05:00
Ilia Mirkin
ecb0dcd34c freedreno/a4xx: only align slices in non-layer_first textures
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.

This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-21 09:08:16 -05:00
Emil Velikov
428146522b docs: add 11.2.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 14:10:08 +00:00
Jason Ekstrand
83c305f8ef anv/meta_clear: Don't try to clear depth-stencil without LOAD_OP_CLEAR 2015-11-21 00:05:18 -08:00
Jason Ekstrand
438eaa3ae7 anv/meta: Add initial support for multi-slice array and 3-D copies
We still need to fix up a few bits once we have real CPP values, but this
should get us a long ways.
2015-11-20 18:25:06 -08:00
Jason Ekstrand
d6a7c659c7 anv/meta: Use array textures for 2D
This a total of 1 extra instruction in the shader and gives us a lot more
flexibility in how we do blits.
2015-11-20 16:00:34 -08:00
Jason Ekstrand
e3ec964e44 anv/meta: Keep z coordinate flat while blitting 2015-11-20 15:48:03 -08:00
Jason Ekstrand
1157b0360d nir/spirv: Rework decoration iteration
The old code didn't work correctly if you had member decorations after
non-member decorations.  Since glslang never gave us any of those, it
wasn't properly tested.
2015-11-20 15:15:40 -08:00
Jason Ekstrand
cff74d6fb8 nir/spirv: Handle OpNop 2015-11-20 15:02:45 -08:00
Jason Ekstrand
1d42f773d3 gen8_state: Clamp sampler values to HW limitations 2015-11-20 14:45:44 -08:00
Jason Ekstrand
48228c114e nir/spirv: Add support for runtime arrays 2015-11-20 12:49:20 -08:00
Jason Ekstrand
55d16c090e gen8/pipeline: Properly handle MIN/MAX blend ops 2015-11-20 11:53:10 -08:00
Jason Ekstrand
b43ce6768d gen8/pipeline: Set IndependentAlphaBlendEnable properly 2015-11-20 11:52:54 -08:00
Jason Ekstrand
e69db9159b gen8/pipeline: Minor blending fixes
This makes various fields match upstream mesa
2015-11-20 11:52:30 -08:00
Jason Ekstrand
fa8db0dfcc anv: Put all of the descriptor set stuff together in one file
The stuff to take descriptor sets and turn them into binding tables and
sampler tables is still in anv_cmd_buffer.c.  We may want to consider
putting it in anv_descriptor_set.c eventually.
2015-11-18 14:58:43 -08:00
Jason Ekstrand
828b1a6eb6 anv/device: Update the right sampler in UpdateDescriptorSets 2015-11-18 14:48:28 -08:00
Jason Ekstrand
6f613abc2b anv/cmd_buffer: Add a new genX_cmd_buffer file for shared code
This file contains code that can be shared across gens modulo recompiling.
In particular, we can share STATE_BASE_ADDRESS setup and handling of the
vkPipelineBarrier call.  Not sharing STATE_BASE_ADDRESS setup has already
been a source of bugs and the gen7 and gen8 implementations of
PipelineBarrier were line-for-line identical.

Incidentally, this should fix MOCS settings for dynamic and surface state
on Haswell.
2015-11-18 12:26:57 -08:00
Jason Ekstrand
fb8b2f5f9e anv/gen7: A bunch of depth-stencil fixes
There are various bits which move around between Haswell and Ivy Bridge
that we weren't taking into account.  This also makes us actually set the
StencilWriteEnable in a sane way.
2015-11-18 11:43:52 -08:00
Jason Ekstrand
e9d634f4ad gen7/pipeline: Re-arrange stencil parameters to match gen8 2015-11-17 19:10:31 -08:00
Jason Ekstrand
9e39bdabad anv/gen7: Implement CmdPipelineBarrier 2015-11-17 17:09:27 -08:00
Jason Ekstrand
b707e90b6e anv/gen7: Don't use the upper bound on dynamic state base address
It doesn't do much for us and, if we have to resize the dynamic state block
pool for any reason, it becomes out-of-date.
2015-11-17 17:08:44 -08:00
Jason Ekstrand
f0390bcad6 anv: Add initial Haswell support 2015-11-17 12:14:24 -08:00
Jason Ekstrand
45320f677b anv: Add macros for doing per-gen compilation 2015-11-17 08:27:51 -08:00
Jason Ekstrand
92d164b1c3 anv/entrypoints: Add dispatch support for haswell 2015-11-17 08:27:51 -08:00
Jason Ekstrand
aa3002bd42 anv/entrypoints: Use devinfo instead of a gen number 2015-11-17 08:27:51 -08:00
Jason Ekstrand
0508046dc8 anv/cmd_buffer: Pack the 3DSTATE_VF packet on-demand 2015-11-17 08:27:51 -08:00
Jason Ekstrand
34d55d69cf anv/formats: Don't advertise stencil texture/blit prior to Broadwell 2015-11-17 08:23:29 -08:00
Jason Ekstrand
de54b4b18f anv: Only include the pack headers where needed
Previously, we were including gen7_pack.h, gen75_pack.h, and gen8_pack.h
in anv_private.h.  As we add more gens, this is going to become untenable.
This commit moves things around so that we only use the pack headers when
and if we need them.
2015-11-16 12:29:09 -08:00
Jason Ekstrand
cb9e2305f8 anv/cmd_buffer: Move gen-specific stuff into the appropreate files 2015-11-16 12:10:11 -08:00
Jason Ekstrand
22d024e031 nir/spirv: Add support for separate samplers and textures
This gets tricky in a few places because we have to pass vtn_sampled_image
values through OpAccessChain, but it works ok.  At some point, it probably
needs to be cleaned up but it doesn't occur to me exactly how to do that at
the moment.  We'll see how this approach goes.
2015-11-14 22:32:54 -08:00
Jason Ekstrand
002db3ee15 anv/cmd_buffer: Add a default descriptor type case
This silences a bunch of compiler warnings.
2015-11-14 09:16:55 -08:00
Jason Ekstrand
e9dba80430 anv/apply_pipeline_layout: Handle separate samplers and textures 2015-11-14 09:00:35 -08:00
Jason Ekstrand
b5d4027c35 Merge branch 'wip/i965-separate-sampler-tex' into vulkan 2015-11-14 08:23:27 -08:00
Jason Ekstrand
c7d504ad93 i965/vec4: Plumb separate surfaces and samplers through from NIR 2015-11-14 08:05:31 -08:00
Jason Ekstrand
3dd84822df i965/vec4: Separate the sampler from the surface in generate_tex 2015-11-14 08:05:31 -08:00
Jason Ekstrand
c09e140b65 i965/fs: Plumb separate surfaces and samplers through from NIR 2015-11-14 08:04:47 -08:00
Jason Ekstrand
c2a373ec85 i965/fs: Separate the sampler from the surface in generate_tex 2015-11-14 08:01:50 -08:00
Jason Ekstrand
b169bb902a nir: Separate texture from sampler in nir_tex_instr
This commit adds the capability to NIR to support separate textures and
samplers.  As it currently stands, glsl_to_nir only sets the sampler and
leaves the texture alone as it did before and nir_lower_samplers assumes
this.  However, backends can, if they wish, assume that they are separate
because nir_lower_samplers sets both texture and sampler index (they are
the same in this case).
2015-11-14 07:57:31 -08:00
Jason Ekstrand
1469ccb746 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in Matt's big compiler refactor.
2015-11-14 07:56:10 -08:00
Jason Ekstrand
e8f51fe4de anv/gen8: Subtract 1 from num_elements when setting up buffer surface state 2015-11-13 22:50:54 -08:00
Jason Ekstrand
91bc4e7cec anv/pipeline: Don't free blend states that don't exist
Compute pipelines don't need a blend state so we shouldn't be
unconditionally freeing it.
2015-11-13 21:49:41 -08:00
Jason Ekstrand
c1733886a6 nir/spirv: Add support for SSBO stores
This only handles vector stores, not component-of-a-vector stores.
2015-11-13 21:41:52 -08:00
Jason Ekstrand
c68e28d766 nir/spirv: Refactor vtn_block_load
We pull the offset calculations out into their own function so we can
re-use it for stores.
2015-11-13 21:32:00 -08:00
Jason Ekstrand
99494b96f0 nir/spirv: Add support for image_load_store 2015-11-13 17:54:43 -08:00
Jason Ekstrand
164b3ca164 nir/builder: Add a nir_ssa_undef helper 2015-11-13 17:54:43 -08:00
Jason Ekstrand
ffbc31d13b nir/spirv: Add support for creating image variables 2015-11-13 17:54:43 -08:00
Jason Ekstrand
453239f6a5 nir/spirv: Add support for image types 2015-11-13 17:54:43 -08:00
Jason Ekstrand
0572444a0e nir/types: Add image type helpers 2015-11-13 17:54:43 -08:00
Jason Ekstrand
d5ba7a26d9 glsl/types: Add a get_image_instance helper 2015-11-13 17:54:43 -08:00
Chad Versace
738eaa8acf isl: Embed brw_device_info in isl_device
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 11:14:03 -08:00
Chad Versace
ba467467f4 anv: Use enum isl_tiling everywhere
In anv_surface and anv_image_create_info, replace member
'uint8_t tile_mode' with 'enum isl_tiling'.

As a nice side-effect, this patch also reduces bug potential because the
hardware enum values for tile modes are unstable across hardware
generations.
2015-11-13 10:44:09 -08:00
Chad Versace
af392916ff anv/device: Embed isl_device
Embed struct isl_device into anv_physical_device and anv_device.  It
will later be used for surface layout calculations.
2015-11-13 10:44:09 -08:00
Chad Versace
a4a2ea3f79 isl: Add enum isl_tiling and a query func
The query func is isl_tiling_get_extent.
2015-11-13 10:44:07 -08:00
Chad Versace
652727b029 isl: Add structs isl_extent2d, isl_extent3d
They are nowhere used yet.
2015-11-13 10:31:49 -08:00
Chad Versace
b1bb270590 isl: Add struct isl_device
The struct is incomplete (it contains only the gen). And it's nowhere
used yet.

It will be used later for surface layout calculations.
2015-11-13 10:31:37 -08:00
Chad Versace
477383e9ac anv: Strip trailing whitespace from anv_device.c 2015-11-13 10:27:40 -08:00
Chad Versace
c6493dff79 anv: Strip trailing space in anv_private.h 2015-11-12 12:24:01 -08:00
Chad Versace
addc2a9d02 anv: Remove redundant fields anv_format::bs,bw,bh,bd
Instead, use the equivalent fields in anv_format::isl_layout.
2015-11-12 12:23:49 -08:00
Chad Versace
cbc31f453d anv/formats: Re-indent the fmt() macro
Use one line per struct member.
2015-11-12 12:21:46 -08:00
Chad Versace
1bea1669c5 anv: Use enum isl_format in anv_format
This patch begins using isl.h in Anvil. More refactors will follow.

Change type of anv_format::surface_format from uint16_t -> enum
isl_format.
2015-11-12 12:21:46 -08:00
Chad Versace
bfb022a235 isl: Generate isl_format_layout.c
Generate an array of struct isl_format_layout, using
isl_format_layout.csv as input.

Each entry follows the patten:

   [ISL_FORMAT_R32G32B32A32_FLOAT] = {
      ISL_FORMAT_R32G32B32A32_FLOAT,
      .bs = 16, .bpb = 128,
      .bw = 1, .bh = 1, .bd = 1,
      .channels = {
          .r = { ISL_SFLOAT, 32 },
          .g = { ISL_SFLOAT, 32 },
          .b = { ISL_SFLOAT, 32 },
          .a = { ISL_SFLOAT, 32 },
          .l = {},
          .i = {},
          .p = {},
      },
      .colorspace = ISL_COLORSPACE_LINEAR,
      .txc = ISL_TXC_NONE,
   },
2015-11-12 12:21:46 -08:00
Chad Versace
7986efc644 isl: Add CSV of format layouts
Add file isl_format_layout.csv, which describes the block layout,
channel layout, and colorspace of all hardware surface formats.
2015-11-12 11:56:16 -08:00
Chad Versace
67362698a9 isl: Add enum isl_format 2015-11-12 11:34:45 -08:00
Jason Ekstrand
3a3d79b38e anv/gen7: Implement the VS state depth-stall workaround 2015-11-10 16:42:34 -08:00
Jason Ekstrand
750b8f9e98 anv/gen7: Properly handle a GS with zero invocations 2015-11-10 16:41:23 -08:00
Jason Ekstrand
9d18555c8d anv/gen7: Add push constant support 2015-11-10 15:14:11 -08:00
Jason Ekstrand
427978d933 anv/device: Use an actual int64_t in WaitForFences 2015-11-10 15:02:52 -08:00
Jason Ekstrand
d9079648d0 anv/meta: Create a sampler in meta_emit_blit 2015-11-10 14:43:18 -08:00
Jason Ekstrand
b461744c52 anv/gen7: Properly handle VS with VertexID but no vertices 2015-11-10 11:31:31 -08:00
Jason Ekstrand
aafc87402d anv/device: Work around the i915 kernel driver timeout bug
There is a bug in some versions of the i915 kernel driver where it will
return immediately if the timeout is negative (it's supposed to wait
indefinitely).  We've worked around this in mesa for a few months but never
implemented the work-around in the Vulkan driver.

I rediscovered this bug again while working on Ivy Bridge becasuse the
drive in my Ivy Bridge currently has Fedora 21 installed which has one of
the offending kernels.
2015-11-10 11:24:11 -08:00
Jason Ekstrand
06f466a770 anv/nir: Fix codegen in lower_push_constants 2015-11-09 16:29:05 -08:00
Jason Ekstrand
abede04314 anv/gen7: Fix the length of 3DSTATE_SF 2015-11-09 16:04:07 -08:00
Jason Ekstrand
e8c2a52a70 anv/gen7: Properly handle missing color-blend state 2015-11-09 16:04:06 -08:00
Jason Ekstrand
862da6a891 anv/device: Add a newline to the end of a comment 2015-11-09 16:04:06 -08:00
Nanley Chery
9c2b37a9c3 anv/formats: Define ETC2 formats
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Nanley Chery
41cf35d1d8 anv/image: Determine the alignment units for compressed formats
Alignment units, i and j, match the compressed format block
width and height respectively.

v2: Don't assert against HALIGN* and VALIGN* enums (Chad)

Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Nanley Chery
381f602c6b anv/image: Handle compressed format qpitch and padding
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Nanley Chery
300f7c2be3 anv/image: Handle compressed format stride and size
These formulas did not take compressed formats into account.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Nanley Chery
7b4244dea0 anv/formats: Add fields for block dimensions
A non-compressed texture is a 1x1x1 block. Compressed
textures could have values which vary in different
dimensions WxHxD.

Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Nanley Chery
a6c7d1e016 anv/formats: Add surface_format initializer
v2: Rename __brw_fmt to __hw_fmt (Chad)

Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace chad.versace@intel.com
2015-11-09 15:41:41 -08:00
Nanley Chery
3ee923f1c2 anv: Rename cpp variable to "bs"
cpp (chars-per-pixel) is an integer that fails to give useful data
about most compressed formats. Instead, rename it to "bs" which
stands for block size (in bytes).

v2: Rename vk_format_for_bs to vk_format_for_size (Chad)
    Use "block size" instead of "bs" in error message (Chad)

Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-09 15:41:41 -08:00
Jason Ekstrand
17fa3d3572 nir/spirv: Give both block and buffer_block types an interface type 2015-11-07 08:03:25 -08:00
Jason Ekstrand
a10d59c09a nir/spirv: Increment num_ubos/ssbos when creating variables 2015-11-06 16:53:27 -08:00
Jason Ekstrand
046563167c anv/apply_dynamic_offsets: Use the right sized immediate zero 2015-11-06 16:49:24 -08:00
Jason Ekstrand
104525c33b anv/pipeline: Set the right SSBO binding table start index for FS 2015-11-06 15:57:51 -08:00
Jason Ekstrand
399d5314f6 anv/cmd_buffer: Rework the way we emit UBO surface state
The new mechanism should be able to handle SSBOs as well as properly handle
emitting surface state on gen7 where we need different strides depending on
shader stage.
2015-11-06 15:14:12 -08:00
Jason Ekstrand
1b5c7e7ecd anv/pipeline: Expose is_scalar_shader_stage 2015-11-06 15:12:33 -08:00
Jason Ekstrand
5ba281e794 nir/spirv: Add a helper for determining if a block is externally visable 2015-11-06 15:09:57 -08:00
Jason Ekstrand
220261a0c9 anv: Use VkDescriptorType instead of anv_descriptor_type 2015-11-06 14:09:52 -08:00
Jason Ekstrand
612e35b2c6 anv: Do range-checking in the shader for dynamic buffers 2015-11-06 13:32:52 -08:00
Jason Ekstrand
f8052351ac anv/device: Increase the block size for instructions 2015-11-06 13:29:47 -08:00
Jason Ekstrand
d7cc9929bb anv: Remove all support for BufferViews
We never *actually* supported them, we just used them for binding UBOs.
Now that we have BufferInfo and we aren't supporting texture buffers yet,
we should get rid of them until we can do them properly.
2015-11-06 13:16:18 -08:00
Jason Ekstrand
0360c3608b anv/device: Only support binding UBOs through BufferInfo 2015-11-06 12:52:12 -08:00
Jason Ekstrand
3aa2fc82dd anv: Rework UpdateDescriptorSets
Previously, UpdateDescriptorSets was wrong because it assumed that the
binding was the offset into the descriptor set.
2015-11-06 12:28:03 -08:00
Jason Ekstrand
45b1bbe801 anv: Add a descriptor_index to anv_descriptor_set_binding_layout 2015-11-06 12:16:54 -08:00
Jason Ekstrand
f029e0ce13 anv: Add a layout to anv_descriptor_set 2015-11-06 12:16:54 -08:00
Chad Versace
16119ad884 anv/meta: Finish load clears for stencil attachments
Tested by Crucible "func.depthstencil.stencil_triangles.*" in

  commit c194292d5eadb84e9d7489fc01ce0b653cdd4ca5 (HEAD -> master)
  Author: Chad Versace <chad.versace@intel.com>
  Date:   Wed Nov 4 16:19:24 2015 -0800
  Subject: func.depthstencil: Remove stencil clear workaround for Mesa
2015-11-05 15:45:43 -08:00
Jason Ekstrand
a40f682c71 anv/cmd_buffer: Fix SURFACE_STATE for non-view buffer bindings
We were treating it as if it's a BufferView and weren't taking the offset
into account properly.
2015-11-04 19:56:18 -08:00
Jason Ekstrand
1b68120760 anv/cmd_buffer: Don't use an anv_state pointer in emit_binding_table
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer.  Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
2015-11-04 19:56:16 -08:00
Chad Versace
d259af3fbb anv: Remove unused anv_render_pass members
Remove members
  num_color_clear_attachments
  has_depth_clear_attachment
  has_stencil_clear_attachment

The new clear code in anv_meta_clear.c does not use them.
2015-11-04 15:54:38 -08:00
Chad Versace
a9a3071fc4 anv/meta: Rewrite clear code
Fixes Crucible test "func.clear.load-clear.attachments-8".

The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.

    - It assumed that a VkRenderPass contained at most one depthstencil
       attachment.

    - It tried to clear all attachments (color and the sole
      depthstencil) with a single instanced draw call, using the VUE
      header's RenderTargetArrayIndex to specify the instance's target
      color attachment. But the RenderTargetArrayIndex does not select
      entries in the binding table; it only selects an array index of
      a singled layered surface.

    - If at least one attachment of VkRenderPass had
      VK_ATTACHMENT_LOAD_OP_CLEAR,
      then the old code cleared *all* attachments. This was
      a consequence of using a single draw call and single pipeline for
      the clear.

The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.

The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
2015-11-04 15:20:52 -08:00
Chad Versace
49c96a14c5 anv/meta: Clear color attribute is always flat
No behavioral change. This patch just removes an unneeded function
parameter.
2015-11-04 15:15:19 -08:00
Chad Versace
7f82cc718f anv/meta: Use consistent naming for dynamic state mask
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.

  anv_meta_saved_state::dynamic_flags -> dynamic_mask
  anv_meta_save(dynamic_state)        -> dynamic_mask
2015-11-04 15:15:19 -08:00
Chad Versace
2bdb9e2ed9 anv/meta: Rename anv_cmd_buffer_save/restore
As the functions are now exposed in anv_meta.h, let's rename them
to clarify that they are meta functions.

    anv_cmd_buffer_save -> anv_meta_save
    anv_cmd_buffer_restore -> anv_meta_restore
2015-11-04 15:15:19 -08:00
Chad Versace
16b2a489db anv: Move meta clear code to new file anv_meta_clear.c
anv_meta.c currently handles blits, copies, clears, and resolves.  The
clear code is about to grow, and anv_meta.c is already busting at the
seams.
2015-11-04 15:15:19 -08:00
Chad Versace
c56727037a anv: Move struct anv_vue_header to anv_private.h
Move it from anv_meta.c to the common header anv_private.h. This allows
us to split the meta blit and meta clear code into separate files.
2015-11-04 15:15:19 -08:00
Jason Ekstrand
b00e3f221b Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-11-03 15:45:04 -08:00
Jason Ekstrand
a1e7b8701a nir: remove sampler_set from nir_tex_instr
Now that descriptor sets are handled in a lowering pass, this is no longer
needed.
2015-11-03 14:58:20 -08:00
Chad Versace
4d1c76485b anv: Drop stale comment in anv_cmd_buffer_emit_binding_table()
When emitting the binding table for the fragment shader stage, we no
longer "walk all of the attachments, [inserting only] the color
attachments into the binding table". Instead, we iterate only over the
subpass's color attachments, which is the minimal possible iteration.

While killing the comment, also rename the variable 'attachments' to
'color_count', as it's no longer a count of all framebuffer attachments
but only the subpass's color attachment count.
2015-11-03 13:46:40 -08:00
Jason Ekstrand
584f9d4442 anv: Report 0 physical devices when not on Broadwell or Ivy Bridge
Right now, Broadweel and Ivy Bridge are the only supported platforms.
Hopefully, this reduces the chances that someone will try the driver on
unsupported hardware and be confused that it doesn't work.
2015-11-02 12:14:37 -08:00
Jason Ekstrand
3883728730 anv: Add better push constant support
What we had before was kind of a hack where we made certain untrue
assumptions about the incoming data.  This new support, while it still
doesn't support indirects properly (that will come), at least pulls the
offsets and strides from SPIR-V like it's supposed to.
2015-10-29 22:26:36 -07:00
Jason Ekstrand
1f2624e6dd nir/spirv: Add support for push constants 2015-10-29 22:26:00 -07:00
Jason Ekstrand
a2283508b0 nir/intrinsics: Add a load_push_constant intrinsic 2015-10-29 22:26:00 -07:00
Jason Ekstrand
f2a8c9db24 nir/spirv: Rework the way we handle interface types 2015-10-29 22:26:00 -07:00
Chad Versace
4073219cf1 anv/pass: Remove redundant assert
Trivial fix.
2015-10-29 11:47:39 -07:00
Chad Versace
1e98177439 anv/pass: Move VkRenderPass code to new file
Move it from anv_device.c to new file anv_pass.c. Because it will soon
grow bigger.
2015-10-29 11:10:03 -07:00
Chad Versace
c284c39b13 anv: Fix parsing of load ops in VkAttachmentDescription
My original understanding of VkAttachmentDescription::loadOp,
stencilLoadOp was incorrect. Below are all possible combinations:

  VkFormat       | loadOp=clear   stencilLoadOp=clear
  ---------------+---------------------------
  color          | clear-color    ignored
  depth-only     | clear-depth    ignored
  stencil-only   | ignored        clear-stencil
  depth-stencil  | clear-depth    clear-stencil
2015-10-29 10:59:55 -07:00
Jason Ekstrand
8bcba083db anv: Update the README
Adds a note that we support SPIR-V revision 32.  Also, we now support
geometry shaders.
2015-10-28 12:30:34 -07:00
Jason Ekstrand
12feda0c09 Revert "nir/intrinsic: Allow up to four indices"
This reverts commit 5eccd0b4b9.

This was only needed for the store_ssbo_vk_indirect intrinsic
2015-10-27 13:44:14 -07:00
Jason Ekstrand
423e7a55cc Revert "nir/intrinsics: Add new Vulkan load/store intrinsics"
This reverts commit 24bcc89c8f.

Now that we have the new vulkan_resource_index intrinsic, these variants of
the classic UBO/SSBO instrinsics aren't needed.
2015-10-27 13:43:25 -07:00
Jason Ekstrand
a6be53223e anv/nir: Work with the new vulkan_resource_index intrinsic 2015-10-27 13:42:51 -07:00
Jason Ekstrand
3d44b3aaa6 nir/spirv: Use the new vulkan_resource_index intrinsic
This is instead of using the _vk versions of UBO/SSBO load/store intrinsics
2015-10-27 13:41:59 -07:00
Jason Ekstrand
800a9706f0 nir: Add a vulkan_resource_index intrinsic 2015-10-27 13:41:08 -07:00
Jason Ekstrand
37b6afb3d9 Add a todo comment about intput_slots_valid in the FS shader key 2015-10-26 16:25:02 -07:00
Jason Ekstrand
ab6ed2e1ac anv/gen8_pipeline: Emit a real 3DSTATE_SBE_SWIZ packet 2015-10-26 16:25:02 -07:00
Jason Ekstrand
9006e555ce anv/pipeline: Bump the size of the pipeline batch to accomodate GS
The 1k batch size wasn't big enough for a full pipeline setup including
geometry shaders.  Some day we should make it dynamic.
2015-10-23 16:50:31 -07:00
Jason Ekstrand
4c59ee808f anv/gen8_pipeline: Various 3DSTATE_GS fixes 2015-10-23 16:49:26 -07:00
Jason Ekstrand
8aba8cf513 anv/pipeline: Use separate-shader 2015-10-23 10:53:00 -07:00
Jason Ekstrand
760c4b894d anv/pipeline: Pull separate_shader from NIR for vue map setup 2015-10-23 10:48:52 -07:00
Jason Ekstrand
ee8c67abe8 nir/spirv: Add support for builtins in arrays 2015-10-22 17:58:20 -07:00
Jason Ekstrand
9fe907ec79 nir/spirv: Make the builtins array distinguish between in and out 2015-10-22 17:54:24 -07:00
Jason Ekstrand
d11ea76168 nir/spirv: Make vtn_get_builtin_location smarter
Instead of just stomping on the mode, it now validates asserts that the
previously set mode is correct and only changes it if needed.  We need to
do this because, in geometry shaders, there are some builtins that can be
either an input or an output depending on context.  We can get that
information from the SPIR-V source but we can't throw it away.
2015-10-22 17:45:41 -07:00
Jason Ekstrand
9abef3e817 nir/spirv: Make get_builtin_variable take a nir_variable_mode
We'll want this in a moment for validation but, for now, it just gets
stompped by get_builtin_variable.
2015-10-22 17:28:25 -07:00
Jason Ekstrand
2ce6636c75 nir/spirv: Remove the vtn_type argument from _vtn_variable_load/store
Now that builtins are handled in deref chains, we don't really need this
anymore.
2015-10-22 16:56:42 -07:00
Jason Ekstrand
f23d951083 nir/validate: Add better validation of load/store types 2015-10-22 16:53:01 -07:00
Jason Ekstrand
82c579e314 anv/gen8: Set the correct maximum number of GS threads
This equation was pulled from mesa gen8_gs_state.c
2015-10-21 21:51:18 -07:00
Jason Ekstrand
d0e8c78407 anv/pipeline: set the gs_vertex_count in compile_gs
This was missed in the initial enabling commit.
2015-10-21 21:50:47 -07:00
Jason Ekstrand
8af2a09956 anv/pipeline: Make the has_push_constants computation more accurate
The computation used to only look for uniforms that weren't samplers.  Now
it also filters out arrays of samplers.
2015-10-21 21:50:16 -07:00
Jason Ekstrand
0329a252bd nir/spirv: Add defaults for GS input/output primitive types
These are supposed to be specified in the SPIR-V source as SpvExecutionMode
enums but glslang isn't giving them to us.  A bug has been filed:

https://github.com/KhronosGroup/glslang/issues/84
2015-10-21 21:46:22 -07:00
Jason Ekstrand
4032549885 i965/vec4: Handle returns at the end of functions 2015-10-21 20:42:23 -07:00
Jason Ekstrand
5f29dacda2 i965: Move get_hw_prim_for_gl_prim to brw_util.c 2015-10-21 20:40:28 -07:00
Jason Ekstrand
ea23cb3543 nir/spirv: Add capabilities and decorations for basic geometry shaders 2015-10-21 20:36:25 -07:00
Jason Ekstrand
d538fe849d anv/pipeline: Add back basic geometry shader support
Now that we've done the refactoring upstream, it's much easier to to get
hooked up.  We haven't tested things well enough to know that we're setting
up the GPU state correctly for them yet but at least we can compile them now.
2015-10-21 18:45:48 -07:00
Jason Ekstrand
164abff0c0 nir/spirv: Add support for more CS system values 2015-10-21 18:39:06 -07:00
Jason Ekstrand
5790ee2bbb nir/spirv: Add support for various barrier type instructions 2015-10-21 18:17:11 -07:00
Jason Ekstrand
3d35e4361f Fix a couple of dereferences 2015-10-21 18:16:50 -07:00
Jason Ekstrand
55a7ee730c spirv/nir: Add more stage asserts 2015-10-21 18:00:05 -07:00
Jason Ekstrand
27393c8630 nir/spirv: Add support for GS metadata 2015-10-21 17:58:34 -07:00
Jason Ekstrand
a8ffd6e72c nir/gather_info: Add more info for geometry shaders 2015-10-21 17:42:47 -07:00
Jason Ekstrand
fed60e3c73 Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-10-21 17:40:13 -07:00
Chad Versace
0ab926dfbf anv: Don't teardown uninitialized anv_physical_device
If the user called vkDestroyDevice but never called
vkEnumeratePhysicalDevices, then the driver tried to ralloc_free() an
unitialized anv_physical_device.

Fixes test 'dEQP-VK.api.device_init.create_instance_name_version'.
2015-10-21 11:55:37 -07:00
Jason Ekstrand
c8572d0f9c anv/pipeline: Remove a redundant line
We set compute_sample_id based on multisample state two lines below.
2015-10-20 16:02:03 -07:00
Jason Ekstrand
72d99f8a40 anv/pipeline: Update a comment 2015-10-20 16:00:55 -07:00
Jason Ekstrand
27d868500a anv/pipeline: Set key->render_to_fbo to false for fragment shaaders
Vulkan uses the upper-left convention.  This is the same as DX one and what
our hardware does.  We had it flipped around.
2015-10-20 15:37:16 -07:00
Jason Ekstrand
59bae36ffb nir/spirv: Fix a typo 2015-10-20 15:35:13 -07:00
Jason Ekstrand
44b22ca441 nir/spirv: Handle SpvExecutionMode 2015-10-20 15:23:56 -07:00
Jason Ekstrand
a71e614d33 anv: Completely rework shader compilation
Now that we have a decent interface in upstream mesa, we can get rid of all
our hacks.  As of this commit, we no longer use any fake GL state objects
and all of shader compilation is moved into anv_pipeline.c.  This should
make way for actually implementing a shader cache one of these days.

As a nice side-benifit, this commit also gains us an extra 300 passing CTS
tests because we're actually filling out the texture swizzle information
for vertex shaders.
2015-10-20 13:02:03 -07:00
Jason Ekstrand
2d9e899e35 nir: Add a pass to gather info from the shader
This pass fills out a bunch of the fields in nir_shader_info by inspecting
the shader.
2015-10-20 13:02:03 -07:00
Jason Ekstrand
6fb4469588 anv: Move the brw_compiler from anv_compiler to physical_device 2015-10-20 13:02:03 -07:00
Jason Ekstrand
9e3615cc7d i965: Move brw_compiler_create to brw_compiler.h 2015-10-20 13:02:03 -07:00
Jason Ekstrand
bf6407079b i965: Split process_nir into two haves; pre- and post- 2015-10-20 13:02:03 -07:00
Jason Ekstrand
611ace6861 anv/compiler: Remove more pre-SNB shader key setup 2015-10-20 13:02:03 -07:00
Jason Ekstrand
b3a344db30 anv/compiler: Get rid of GS support.
The geometry shader support is currently completely untested.  As I go
through and re-factor the compiler, I'd rather not refactor dead code that
I don't have a way to know if I broke.  Let's just remove it for now.  We
can put it back in easily enough later and then we'll do it properly.
2015-10-20 13:02:03 -07:00
Jason Ekstrand
5f5224f256 anv/meta: Use the actual render pass for creating blit pipelines 2015-10-20 13:02:02 -07:00
Chad Versace
4d4e559b6a vk: Use consistent names for anv_cmd_state dirty bits
Prefix all anv_cmd_state dirty bit tokens with ANV_CMD_DIRTY. For
example:

    old                           -> new
    ANV_DYNAMIC_VIEWPORT_DIRTY    -> ANV_CMD_DIRTY_DYNAMIC_VIEWPORT
    ANV_CMD_BUFFER_PIPELINE_DIRTY -> ANV_CMD_DIRTY_PIPELINE

Change type of anv_cmd_state::dirty and ::compute_dirty from uint32_t to
the self-documenting type anv_cmd_dirty_mask_t.
2015-10-20 11:40:24 -07:00
Chad Versace
2484d1a01f anv/pipeline: Fix requirement for depthstencil state
The Vulkan spec allows VkGraphicsPipelineCreateInfo::pDepthStencilState
to be NULL when the pipeline's subpass contains no depthstencil
attachment (see spec quote below). anv_pipeline_init_dynamic_state()
required it unconditionally.

This path fixes anv_pipeline_init_dynamic_state() to access
pDepthStencilState only when there is a depthstencil attachment.

From the Vulkan spec (20 Oct 2015, git-aa308cb)

   pDepthStencilState [...] may only be NULL if renderPass and subpass
   specify a subpass that has no depth/stencil attachment.
2015-10-20 11:29:16 -07:00
Chad Versace
b51468b519 anv/pipeline: Validate VkGraphicsPipelineCreateInfo
The Vulkan spec (20 Oct 2015, git-aa308cb) states that some fields of
VkGraphicsPipelineCreateInfo are required under certain conditions.
Add a new function, anv_pipeline_validate_create_info() that asserts the
requirements hold.

The assertions helped me discover bugs in Crucible and anv_meta.c.
2015-10-20 10:55:54 -07:00
Chad Versace
855180b3d9 anv: Define anv_validate macro
If a block of code is annotated with anv_validate, then the block runs
only in debug builds.
2015-10-20 10:55:54 -07:00
Chad Versace
81f8b82fc8 vk/meta: Add required renderpass to pipeline
The Vulkan spec (20 Oct 2015, git-aa308cb) requires that
VkGraphicsPipelineCreateInfo::renderPass be a valid handle. To satisfy
that, define a static dummy render pass used for all meta operations.
2015-10-20 10:48:26 -07:00
Chad Versace
0d84a0d58b vk/meta: Add required multisample state to pipeline
The Vulkan spec (20 Oct 2015, git-aa308cb) requires that
VkGraphicsPipelineCreateInfo::pMultisampleState not be NULL.
2015-10-20 10:48:09 -07:00
Jason Ekstrand
60e8439237 anv/compiler: Remove irrelevant wm key setup
Most of this applies to Iron Lake and prior only.  While we're at it, we
get rid of the legacy GL shading model code.
2015-10-19 17:00:26 -07:00
Jason Ekstrand
27ca9ca4e1 anv/compiler: Get rid of legacy shader key setup
Most of the shader key setup we did was for pre-Sandybridge and the stuff
for SNB+ wasn't in the key setup.  That stuff still isn't there but at
least we've left ourselves notes for now.
2015-10-19 16:45:11 -07:00
Jason Ekstrand
661d0db077 anv/compiler: Delete legacy clipping code
This is a Vulkan driver.  We don't need legacy clipping stuff and, even if
we did, we don't plan on supporting pre-Sandybridge anyway.
2015-10-19 16:26:16 -07:00
Jason Ekstrand
fba55b711e anv/compiler: Remove unneeded wm prog data setup
As of upstream mesa changes, brw_compile_fs does this for us so there's no
need to have the code in the Vulkan driver anymore.
2015-10-19 16:17:41 -07:00
Jason Ekstrand
12c30c9498 nir/spirv: Use the new nir_variable helpers 2015-10-19 16:08:23 -07:00
Jason Ekstrand
7e6959402d nir/spirv: Handle builtins in OpAccessChain
Previously, we were trying to handle them later when loading.  However, at
that point, you've already lost information and it's harder to handle
certain corner-cases.  In particular, if you have a shader that does

gl_PerVertex.gl_Position.x = foo

we have trouble because we see the .x and we don't know that we're in
gl_Position.  If we, instead, handle it in OpAccessChain, we have all the
information we need and we can silently re-direct it to the appropreate
variable.  This also lets us delete some code which is a nice side-effect.
2015-10-19 15:50:45 -07:00
Jason Ekstrand
958fc04dc5 Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-10-19 14:14:21 -07:00
Jason Ekstrand
995d9c4ac7 anv/pipeline: Remove the ViewportState finishme
We should be doing everything we need to with the viewport state
2015-10-17 10:35:29 -07:00
Jason Ekstrand
3e47e34036 anv: Add support for immutable descriptors 2015-10-17 08:17:00 -07:00
Jason Ekstrand
7010fe61c8 anv: Add facilities for dumping an image to a file
The ability to dump an arbitrary miplevel or array slice of an anv_image to
a file is very useful for debugging.  Nothing inside of the driver calls
this right now, but it's very useful to call from GDB.
2015-10-16 20:03:06 -07:00
Jason Ekstrand
368e703a01 anv/pipeline: Rework dynamic state handling
Aparently, we had the dynamic state array in the pipeline backwards.
Instead of enabling the bits in the pipeline, it disables them and marks
them as "dynamic".
2015-10-16 16:30:02 -07:00
Jason Ekstrand
8ed23654c9 nir/spirv: Fix handling of vector component selects via OpAccessChain
When we get to the end of the _vtn_load/store_varaible recursion, we may
have one link left in the deref chain if there is a vector component select
on the end.  In this case, we need to truncate the deref chain early so
that, when we make the copy for the load, we don't get the extra deref.
The final deref will be handled by the vector extract/insert that comes
later.
2015-10-15 21:18:57 -07:00
Jason Ekstrand
2552df41a1 anv/cmd_buffer: Reset the command buffer in BeginCommandBuffer 2015-10-15 18:28:00 -07:00
Jason Ekstrand
298d031642 anv/batch_chain: Add some sanity-check asserts for relocations 2015-10-15 17:24:32 -07:00
Jason Ekstrand
3130851add anv/x11: Only advertise VK_FORMAT_B8R8G8A8_UNORM
The others don't work at the moment so we shouldn't be advertising them.
2015-10-15 16:16:17 -07:00
Jason Ekstrand
f5eec407ea anv/x11: Treat the pPlatformWindow as a xcb_window_t* instead of xcb_window_t 2015-10-15 15:38:20 -07:00
Jason Ekstrand
03952b1513 anv/device: Add support for combined image and sampler descriptors 2015-10-15 15:17:27 -07:00
Jason Ekstrand
b459b3d82c anv/device: Remove some unneeded anv_finishmes 2015-10-15 15:17:07 -07:00
Jason Ekstrand
ba20569626 anv/device: Make the CreateSemaphore stub return success 2015-10-15 14:34:07 -07:00
Jason Ekstrand
bed7d1e03c anv: Add support for BufferInfo in descriptor sets 2015-10-15 13:45:53 -07:00
Jason Ekstrand
6dc4cad994 anv/cmd_buffer: Add an alloc_surface_state helper 2015-10-15 13:45:07 -07:00
Jason Ekstrand
896c1c65d6 anv: Get rid of the descriptor_set_binding struct
We no longer need it as we have a better way to deal with dynamic offsets.
2015-10-14 19:02:29 -07:00
Jason Ekstrand
42683e3757 anv: Get rid of backend compiler hacks for descriptor sets
Now that we have anv_nir_apply_pipeline_layout, we can hand the backend
compiler intrinsics and texture instructions that use a flat buffer index
just like it wants.  There's no longer any reason for any of these hacks.
2015-10-14 18:38:33 -07:00
Jason Ekstrand
da994f4b7e anv/nir: Rewrite apply_dynamic_offsets to handle the new vk intrinsics 2015-10-14 18:38:33 -07:00
Jason Ekstrand
9c9b7d79c8 anv/nir: Add a pass for applying a applying a pipeline layout to a shader
This new pass lowers the _vk intrinsics which take a (set, binding, index)
tripple to the single-index non-vk intrinsics based on the pipeline layout.
2015-10-14 18:38:33 -07:00
Jason Ekstrand
de608153fb nir/spirv: Use the Vulkan ubo intrinsics 2015-10-14 18:38:33 -07:00
Jason Ekstrand
24bcc89c8f nir/intrinsics: Add new Vulkan load/store intrinsics 2015-10-14 18:38:33 -07:00
Jason Ekstrand
5eccd0b4b9 nir/intrinsic: Allow up to four indices 2015-10-14 18:38:33 -07:00
Jason Ekstrand
b37c38c1ca anv: Completely rework descriptor set layouts
This patch reworks a bunch of stuff in the way we do descriptor set
layouts.  Our previous approach had a couple of problems.  First, it was
based on a misunderstanding of arrays in descriptor sets.  Second, it
didn't properly handle descriptor sets where some bindings were missing
stages.  The new apporach should be correct and also makes some operations,
particularly those on the hot-path, a bit easier.

We use the descriptor set layout for four things:

 1) To determine the map from bindings to the actual flattened descriptor
    set in vkUpdateDescriptorSets().

 2) To determine the descriptor <-> binding table entry mapping to use in
    anv_cmd_buffer_flush_descriptor_sets().

 3) To determine the mappings of dynamic indices.

 4) To determine the (set, binding, array index) -> binding table entry
    mapping inside of shaders.

The new approach is directly taylored towards these operations.
2015-10-14 18:38:33 -07:00
Chad Versace
7965fe7da6 vk: Add README
Requested by developers outside Intel.

During the driver's pre-release development, let's make the README easy
to find for external experimenters. Keep it at the top of the source
tree.
2015-10-14 13:58:29 -07:00
Jason Ekstrand
d2d8945eb8 nir/spirv: Fix a bug in indirect OpAccessChain handling 2015-10-13 20:00:18 -07:00
Jason Ekstrand
db5a5fcd18 anv/image: Add a basic implementation of GetImageSubresourceLayout 2015-10-13 20:00:17 -07:00
Jason Ekstrand
28ed02588a anv/formats: Use the surface_format_info struct from brw_surface_formats.h
The surface_format_info struct changed in mesa but the copied-and-pasted
version didn't get updated on the last mesa master merge.  This both fixes
the bug and should prevent this in the future.
2015-10-13 15:23:24 -07:00
Jason Ekstrand
accbf178eb i965/surface_formats: Pull the surface_format_info struct into a header 2015-10-13 15:23:24 -07:00
Jason Ekstrand
fd2ec1c8ad anv/x11: Do something sensible if get_geometry fails in GetSurfaceProperties 2015-10-13 15:10:40 -07:00
Jason Ekstrand
c31f926726 anv/wsi: Add the GetSurfacePresentModesKHR stub
Support has existed in the X11 and Wayland backends for a while but,
somehow, the entrypoint got missed in the API shuffle.
2015-10-13 11:47:03 -07:00
Jason Ekstrand
e21ecb841c anv: Declare/validate the correct API version 2015-10-12 18:25:19 -07:00
Jason Ekstrand
0689a0f0f3 anv/device: Return VK_SUCCESS after setting pCount in QueueFamilyProperties 2015-10-10 15:25:08 -07:00
Kristian Høgsberg Kristensen
fc2a66cfcd Merge ../mesa into vulkan 2015-10-08 17:20:24 -07:00
Jason Ekstrand
48a87f4ba0 anv/queue: Get rid of the serial
This was a remnant of the object tagging implementation we had at one
point.  We haven't used it for a long time so there's no good reason to
keep it around.
2015-10-08 12:16:00 -07:00
Jason Ekstrand
8984559892 vk/0.170.2: Update to the new VK_EXT_KHR_swapchain extensions 2015-10-08 12:11:18 -07:00
Chad Versace
2228ec0112 Merge branch 'vulkan-0.170.2' into vulkan
This updates the API from 0.138.2 to 0.170.2,
and updates SPIR-V to v32.
2015-10-07 11:49:07 -07:00
Chad Versace
7fa98ab182 vk: Remove temporary vulkan headers
Remove vulkan-0.138.2.h and vulkan-0.170.2.h. Their purpose was to aid
the header update to 0.170.2.
2015-10-07 11:45:48 -07:00
Chad Versace
2f1ca71360 vk/0.170.2: Bump header version
The header is now fully updated.
2015-10-07 11:44:44 -07:00
Chad Versace
c2f94e3a0d vk/0.170.2: Update C++ errata and typedefs 2015-10-07 11:44:33 -07:00
Chad Versace
0ca3c8480d vk/0.170.2: Update remaining enums 2015-10-07 11:39:49 -07:00
Chad Versace
f9c948ed00 vk/0.170.2: Update VkResult
Version 0.170.2 removes most of the error enums. In many cases, I had to
replace an error with a less accurate (or even incorrect) one.
In other cases, the error path is replaced with an assertion.
2015-10-07 11:36:51 -07:00
Chad Versace
8dee32e71f vk/0.170: Update VkDescriptorInfo
Ignore the new bufferInfo field with a anv_finishme.
2015-10-07 10:58:55 -07:00
Chad Versace
92e7bd3610 vk/0.170.2: Update vkCreateDescriptorPool
Nothing to do. In Mesa the pool is a stub.
2015-10-07 10:47:55 -07:00
Chad Versace
a3bc07c23b vk/0.170.2: Update VkAttachmentDescription 2015-10-07 10:44:40 -07:00
Chad Versace
82259f88dd vk/0.170.2: Update VkImageViewCreateInfo 2015-10-07 10:43:44 -07:00
Chad Versace
f4295b3cca vk/0.170.2: Update VkImageCreateInfo 2015-10-07 10:43:17 -07:00
Chad Versace
d48e71ce55 vk/0.170.2: Update VkPhysicalDeviceProperties 2015-10-07 10:36:46 -07:00
Chad Versace
81e1dcc42c vk/0.170.2: Update VkImageFormatProperties 2015-10-07 10:28:30 -07:00
Chad Versace
98c2bb6917 vk/0.170.2: Update VkFormatProperties 2015-10-07 10:15:59 -07:00
Chad Versace
545f5cc6e1 vk/0.170.2: Update VkPhysicalDeviceFeatures 2015-10-07 10:09:39 -07:00
Chad Versace
033a37f591 vk/0.170.2: Update VkPhysicalDeviceLimits 2015-10-07 10:09:31 -07:00
Jason Ekstrand
982466aeff anv/device: Remove some #ifdef'd out code
This was a left-over from the dynamic state update.
2015-10-07 09:45:49 -07:00
Jason Ekstrand
010c6efd65 vk/0.170.2: Make vkUpdateDescriptorSets return void 2015-10-07 09:44:53 -07:00
Jason Ekstrand
1a52bc3039 anv/pipeline: Add support for dynamic state in pipelines 2015-10-07 09:40:49 -07:00
Jason Ekstrand
daf68a9465 vk/0.170.2: Switch to the new dynamic state model 2015-10-07 09:40:49 -07:00
Jason Ekstrand
55fcca306b anv: Add a dynamic state data structure and basic helpers 2015-10-07 09:36:27 -07:00
Jason Ekstrand
941a105954 anv/private: Add a typed_memcpy macro
This is amazingly helpful when copying arrays of things around.
2015-10-07 09:36:27 -07:00
Chad Versace
b1c024a932 vk/meta: Fix -Wstrict-prototypes
In C, functions with no arguments require a void argument.
build_nir_clear_fragment_shader() lacked that.

Fixes:
  anv_meta.c:70:1: warning: function declaration isn't a prototype
  [-Wstrict-prototypes]
2015-10-07 09:10:25 -07:00
Chad Versace
6dea1a9ba1 vk/0.170.2: Merge VkAttachmentView into VkImageView 2015-10-07 09:10:25 -07:00
Chad Versace
03dd72279f vk/image: Fix retrieval of anv_surface for depthstencil aspect
If anv_image_get_surface_for_aspect_mask() is given a combined
depthstencil aspect mask, and the image has a stencil surface but no
depth surface, then return the stencil surface.

Hacks on hacks.
2015-10-07 09:10:25 -07:00
Chad Versace
85ff3cfde3 vk: Drop -Wextra
Eliminates lots of warnings due to anv_meta.c's inclusion of nir.h.

I like the extra warnings, and they should probably get fixed. However,
git-grep reveals that no other Mesa directory uses -Wextra. Building
Vulkan produces a lot of compiler warnings from core Mesa headers that
no other Mesa developer sees, and hence no other Mesa developer will
fix.
2015-10-07 07:28:46 -07:00
Chad Versace
24de3d49ea vk: Embed two surface states in anv_image_view
This prepares for merging VkAttachmentView into VkImageView.  The two
surface states are:

   anv_image_view::color_rt_surface_state:
       RENDER_SURFACE_STATE when using image as a color render target.

   anv_image_view::nonrt_surface_state;
       RENDER_SURFACE_STATE when using image as a non render target.

No Crucible regressions.
2015-10-06 21:22:18 -07:00
Chad Versace
37bf120930 vk/pipeline: Emit MSAA finishme only if samples > 1
If samples == 1, then there's nothing for Mesa to do, and the finishme
message is only noise.
2015-10-06 21:22:18 -07:00
Chad Versace
3fc2b1f325 vk: Remove stale finishme for stencil image views
They don't work completely. But they work well enough to satisfy
Crucible.
2015-10-06 21:22:18 -07:00
Chad Versace
44143a1f46 vk: Add anv_image::usage
It's a copy of VkImageCreateInfo::usage. Will be used for the
VkAttachmentView/VkImageView merge.
2015-10-06 21:22:18 -07:00
Chad Versace
cf603714cb vk/meta: Fix usage flags for image-wrapped-buffers
In make_image_for_buffer(), use VK_IMAGE_USAGE_SAMPLED_BIT when
transferring from the buffer and use VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT
when transferring to the buffer.
2015-10-06 21:22:18 -07:00
Chad Versace
d00718104f vk/image: Remove stale anv_asserts for depthstencil attachments
We don't fully handle mipmapped, array depthstencil attachments. But we
handle the well enough for Crucible's miptree tests.
2015-10-06 21:22:18 -07:00
Kristian Høgsberg Kristensen
1d7ef82f4b i965: Delete brw_cs.cpp which was deleted in master 2015-10-06 15:20:19 -07:00
Jason Ekstrand
c272bb58f5 nir/spirv: Better texture handling 2015-10-06 15:10:45 -07:00
Jason Ekstrand
ccea9cc332 nir/spirv: Update to SPIR-V Rev. 32 2015-10-06 14:52:35 -07:00
Jason Ekstrand
89eebd889c vk/0.170.2: Fairly trivial enum shuffling 2015-10-06 14:08:08 -07:00
Jason Ekstrand
1e4263b7d2 vk/0.170.2: s/baseArraySlice/baseArrayLayer/ 2015-10-06 14:08:08 -07:00
Chad Versace
d4446a7e58 vk: Merge anv_attachment_view into anv_image_view
This prepares for merging VkAttachmentView into VkImageView.
2015-10-06 12:13:03 -07:00
Chad Versace
6b5ce5daf5 vk: Update comments for anv_image_view
- Document the extent member. It's the extent of the view's base level.
- s/VkAttachmentView/VkImageView/
2015-10-06 12:12:52 -07:00
Jason Ekstrand
19018c9f13 vk/0.170.2: Add a stage field to ShaderCreateInfo 2015-10-06 10:20:10 -07:00
Jason Ekstrand
cc389b1482 vk/0.170.2: Rename cs to stage in ComputePipelineCreateInfo 2015-10-06 10:11:50 -07:00
Jason Ekstrand
588d40e97a vk/0.170.2: Use ImageSubresourceCopy in ImageResolve 2015-10-06 10:09:47 -07:00
Jason Ekstrand
bd4cde708a vk/0.170.2: Rename fields in VkClearColorValue 2015-10-06 10:07:47 -07:00
Jason Ekstrand
81c7fa8772 vk/0.170.2: Rework blits to use ImageSubresourceCopy 2015-10-06 10:04:04 -07:00
Jason Ekstrand
ba2254aa79 vulkan.h: Move stuff around
This has no functional change but substantially decreases the diff with the
0.170.2 header.
2015-10-06 09:50:04 -07:00
Jason Ekstrand
d1908d2c33 vk/0.170.2: Rework parameters to CmdClearDepthStencil functions 2015-10-06 09:40:39 -07:00
Jason Ekstrand
02a9be31d6 vk/0.170.2: Add the flags parameter to GetPhysicalDeviceImageFormatProperties 2015-10-06 09:37:21 -07:00
Jason Ekstrand
a145acd812 vk/0.170.2: Remove the pCount parameter from AllocDescriptorSets 2015-10-06 09:32:01 -07:00
Jason Ekstrand
8ba684cbad vk/0.170.2: Rename extension and layer query functions 2015-10-06 09:25:03 -07:00
Jason Ekstrand
a6eba403e2 vk/0.170.2: Update to the new queue family properties query 2015-10-05 21:17:12 -07:00
Jason Ekstrand
65964cd49b vk/0.170.2: Re-arrange parameters of vkCmdDraw[Indexed] 2015-10-05 21:10:20 -07:00
Jason Ekstrand
05a26a60c8 vk/0.170.2: Make destructors return void 2015-10-05 20:50:51 -07:00
Jason Ekstrand
460676122f vk/0.170.2: Rename VkClearValue.ds to depthStencil 2015-10-05 20:35:08 -07:00
Jason Ekstrand
8e1ef639b6 vk/0.170.2: Add the subpass field to VkCmdBufferBeginInfo 2015-10-05 20:30:53 -07:00
Jason Ekstrand
757166592e vk/0.170.2: Rename pointer parameters of VkSubpassDescription 2015-10-05 20:26:21 -07:00
Jason Ekstrand
57f500324b vk/0.170.2: Add unnormalizedCoordinates to VkSamplerCreateInfo 2015-10-05 20:17:24 -07:00
Jason Ekstrand
f7c3519aaf vk/0.170.2: Rename VkTexAddress to VkTexAddressMode 2015-10-05 20:15:06 -07:00
Jason Ekstrand
39a19e88a3 vulkan.h: Various cosmetic changes
These don't affect the driver in any way.
2015-10-05 20:06:30 -07:00
Chad Versace
9357062348 vk: Merge anv_*_attachment_view into anv_attachment_view
Remove anv_color_attachment_view and anv_depth_stencil_view, merging
them into anv_attachment_view. This prepares for merging
VkAttachmentView into VkImageView.
2015-10-05 17:46:04 -07:00
Chad Versace
ae30535602 vk: Drop anv_attachment_view::extent
It's duplicated by anv_attachment_view::image_view::extent.
2015-10-05 17:46:04 -07:00
Chad Versace
f0f4dfa9cc vk: Drop anv_surface_view
Push the members of struct anv_surface_view into anv_image_view and
anv_buffer_view, then remove struct anv_surface_view. Observe that
anv_surface_view::range is not needed for anv_image_view, and so was
dropped there.

This prepares for the merge of VkAttachmentView into VkImageView. Remove
the common parent of anv_buffer_view and anv_image_view (that is,
anv_surface_view) will make the merge easier.
2015-10-05 17:46:04 -07:00
Chad Versace
74193a880f vk: Use consistent names for anv_*_view variables
Rename all anv_*_view variables to follow this convention:
    - sview -> anv_surface_view
    - bview -> anv_buffer_view
    - iview -> anv_image_view
    - aview -> anv_attachment_view
    - cview -> anv_color_attachment_view
    - ds_view -> anv_depth_stencil_attachment_view

This clarifies existing code. And it will reduce noise in the upcoming
commits that merge VkAttachmentView into VkImageView.
2015-10-05 17:46:04 -07:00
Chad Versace
ffd051830d vk: Unionize anv_desciptor
For a given struct anv_descriptor, all members are NULL (in which case
the descriptor is empty) or exactly one member is non-NULL.
To make struct anv_descriptor better reflect its set of valid states,
convert the struct into a tagged union.
2015-10-05 17:46:04 -07:00
Chad Versace
63439953d7 vk: Drop dependency on no longer extant header
anv_meta no longer uses GLSL shaders, and the build system no longer
converts them to SPIR-V. So remove anv_meta_spirv_autogen.h from
Makefile.am.

(cherry picked from commit 2fc8122f66)
2015-10-05 17:06:19 -07:00
Chad Versace
2fc8122f66 vk: Drop dependency on no longer extant header
anv_meta no longer uses GLSL shaders, and the build system no longer
converts them to SPIR-V. So remove anv_meta_spirv_autogen.h from
Makefile.am.
2015-10-05 17:04:18 -07:00
Chad Versace
8bf021cf3d vk: Return anv_image_view_info by value
The struct is only 2 bytes. Returning it on the stack is better than
returning a reference into the ELF .data segment.
2015-10-05 13:22:44 -07:00
Chad Versace
4ffb4549e0 vk/image: Document a Vulkan spec requirement for depthstencil
The Vulkan spec (git a511ba2) requires support for some combined depth
stencil formats.
2015-10-05 13:18:44 -07:00
Chad Versace
3530224063 vk: Annotate anv_cmd_state::gen7::index_type
It's the value of 3DSTATE_INDEX_BUFFER.IndexFormat.
2015-10-05 08:58:35 -07:00
Chad Versace
9c93aa9141 vk: Better types for VkShaderStage, VkShaderStageFlags vars
In most places, the variable type was the uninformative uint32_t.
2015-10-05 08:55:09 -07:00
Chad Versace
6317c3144d vk/0.170.2: Drop VK_BUFFER_USAGE_GENERAL 2015-10-05 08:12:59 -07:00
Chad Versace
4744f60e79 vk/0.170.2: Drop enum VkBufferViewType 2015-10-05 08:12:58 -07:00
Chad Versace
7a089bd1a6 vk/0.170.2: Update VkImageSubresourceRange
Replace 'aspect' with 'aspectMask'.
2015-10-05 08:10:57 -07:00
Chad Versace
568654d606 vk/0.170.2: Drop VK_IMAGE_USAGE_GENERAL 2015-10-05 08:09:33 -07:00
Chad Versace
6a40af1b08 vk/0.170.2: Update VkPipelineMultisampleStateCreateInfo 2015-10-04 10:00:25 -07:00
Chad Versace
dd04be491d vk/0.170.2: Update Vk VkPipelineDepthStencilStateCreateInfo
Rename member depthBoundsEnable -> depthBoundsTestEnable.
2015-10-04 09:41:46 -07:00
Chad Versace
8cb2e27c62 vk/0.170.2: Update VkRenderPassBeginInfo
Rename members:
    attachmentCount -> clearValueCount
    pAttachmentClearValues -> pClearValues
2015-10-04 09:26:25 -07:00
Chad Versace
3694518be5 vk/0.170.2: Drop VkBufferViewCreateInfo::viewType 2015-10-04 09:14:57 -07:00
Chad Versace
216d9f248d vk: Copy current header to vulkan-0.138.2.h
While upgrading Mesa to the new 0.170.2 API, it's convenient to have all
three headers available in the tree:
    - vulkan-0.138.2.h, the old one
    - vulkan-0.170.2.h, the new one
    - vulkan.h, the one in transition
2015-10-04 09:09:35 -07:00
Chad Versace
7f18ed4b9f vk: Import header 0.170.2 header LunarG SDK
From the LunarG SDK at tag sdk-0.9.1, import vulkan.h as
vulkan-0.170.2.h. This header is the first provisional header with the
addition of minor fixes.
2015-10-04 09:09:31 -07:00
Jason Ekstrand
09ba0a7c05 Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-10-03 11:32:29 -07:00
Jason Ekstrand
ef56cf7738 Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-10-02 16:52:47 -07:00
Jason Ekstrand
10f97718c3 anv/allocator: Add a sanity assertion in state stream finish.
We assert that the block offset we got while walking the list of blocks is
actually a multiple of the block size.  If something goes wrong and the GPU
decides to stomp on the surface state buffer we can end up getting
corruptions in our list of blocks.  This assertion makes such corruptions a
crash with a meaningful message rather than an infinite loop.
2015-10-02 16:24:42 -07:00
Jason Ekstrand
002e7b0cc3 anv: Remove the GLSL -> SPIR-V scraper/converter
This was very useful to get us up-and-going.  However, now that we can use
NIR directly for meta shaders, we don't need this anymore and we might as
well drop the glslc dependency.
2015-10-02 16:20:04 -07:00
Jason Ekstrand
f5ffb0e0cb anv/meta: Use NIR directly for blit shaders 2015-10-02 16:18:44 -07:00
Jason Ekstrand
7851a4392a anv/meta: Use NIR directly for clear shaders 2015-10-02 16:18:32 -07:00
Jason Ekstrand
add99c4beb anv: Add a back-door for passing NIR shaders directly into the pipeline
This will allow us to use NIR directly for meta operations rather than
having to go through SPIR-V.
2015-10-02 16:16:57 -07:00
Jason Ekstrand
b68805f83c anv: Add some NIR builder helpers
These should all eventually be up-streamed.  However, since they currently
have no upstream users, they would just bitrot there.  We'll keep them
local for the time being.
2015-10-02 16:15:53 -07:00
Jason Ekstrand
c1553653a2 vk/wsi/x11: Send OUT_OF_DATE if the X drawable goes away 2015-10-02 13:44:53 -07:00
Kristian Høgsberg Kristensen
005c8e0106 Merge branch 'master' of ../mesa into vulkan 2015-10-01 14:24:29 -07:00
Jason Ekstrand
337caee910 anv/wsi_x11: Properly report BadDrawable errors to the client 2015-09-28 20:18:41 -07:00
Jason Ekstrand
f06bc45b0c anv/batch_chain: Use the surface state pool for binding tables 2015-09-28 16:01:14 -07:00
Jason Ekstrand
d93f6385a7 anv/batch_chain: Add helpers for fixing up block_pool relocations 2015-09-28 16:01:14 -07:00
Jason Ekstrand
8c00f9ab56 anv/gen8: Do a render cache flush prior to changing state base address 2015-09-28 16:01:14 -07:00
Jason Ekstrand
0e94446b25 anv/device: Use a 4K block size for surface state blocks
We want to start using the surface state block pool for binding tables and
binding tables.  In order to do this, we need to be able to set surface
state base address to the address of a block and surface state base address
has a 4K alignment requriement.
2015-09-28 16:01:01 -07:00
Jason Ekstrand
737e89bc8d anv/meta: Use the dynamic state stream for temporary buffers 2015-09-28 16:01:01 -07:00
Jason Ekstrand
219a1929f7 anv/util: Add helpers for getting the first and last elements of a vector 2015-09-28 16:01:01 -07:00
Jason Ekstrand
95487668df anv/batch_chain: Add a _alloc_binding_table function 2015-09-28 16:01:01 -07:00
Jason Ekstrand
d517de6126 anv: Make anv_state.offset an int32_t
Binding tables will have a negative offset and we need a way to express
that.  Besides, the chances of a state offset being larger than 2 GB is so
remote it's not worth thinking about.
2015-09-28 16:01:01 -07:00
Jason Ekstrand
9ac3dde3a0 anv/wsi_wayland: Fix FIFO mode
Previously, there were a number of things we were doing wrong:

 1) We weren't flushing the wl_display so dead-looping clients weren't
    guaranteed to work.
 2) We were sending the frame event after calling wl_surface.commit() so it
    wasn't getting assigned to the correct frame
 3) We weren't actually setting fifo_ready to false.

Unfortunately, we never noticed because (3) was hiding the other two.  This
commit fixes all three and clients that use FIFO mode are now properly
refresh-rate limited.
2015-09-28 15:58:34 -07:00
Chad Versace
ddcedb979a vk: Implement vkGetPhysicalDeviceImageFormatProperties()
The implementation is incomplete because we lie about
VkImageFormatProperties::maxResourceSize, hardcoding it to UINT32_MAX
for all supported cases.
2015-09-28 11:53:39 -07:00
Chad Versace
9f3122db0e vk: Refactor anv_GetPhysicalDeviceFormatProperties()
Move the bulk of the function body to a new function
anv_physical_device_get_format_properties(). This allows us to reuse the
function when implementing anv_GetPhysicalDeviceImageFormatProperties()
without calling into the public entry point.
2015-09-28 11:53:39 -07:00
Chad Versace
c15ce5c834 vk: Advertise that depthstencil formats support sampling
Let vkGetPhysicalDeviceFormatProperties() set
VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT for tiled depthstencil images.
2015-09-28 11:53:39 -07:00
Jason Ekstrand
4e48f94469 anv/device: Wrap a couple valgrind calls in the VG macro
This fixes the build for systems that don't have valgrind devel packages
installed.
2015-09-28 11:18:52 -07:00
Chad Versace
97636345da vk: Fix vkGetPhysicalDeviceSparseImageFormatProperties()
The driver does not yet support sparse images, so return zero properties for
all formats.
2015-09-28 10:17:48 -07:00
Kristian Høgsberg Kristensen
164f08c255 vk: Add anv_icd.json to .gitignore 2015-09-25 15:16:56 -07:00
Kristian Høgsberg Kristensen
850cfcad3e vk: Also define vk_errorf in non-debug builds 2015-09-25 15:15:37 -07:00
Kristian Høgsberg Kristensen
cf24211d55 vk: Roll back GLSL parser support for vulkan
In the interest of reducing our delta to mesa master, let's undo these
changes now that we only support SPIR-V.
2015-09-25 10:42:07 -07:00
Jason Ekstrand
e9dff5bb99 vk: Add an ICD declaration file 2015-09-24 14:45:58 -07:00
Jason Ekstrand
39cd3783a4 anv: Add support for the ICD loader 2015-09-24 14:45:58 -07:00
Jason Ekstrand
a95f51c1d7 anv: Add a global dispatch table for use in meta operations 2015-09-24 14:45:58 -07:00
Jason Ekstrand
00d18a661f anv/entrypoints: Expose the anv_resolve_entrypoint function 2015-09-24 14:45:58 -07:00
Jason Ekstrand
f5e72695e0 anv/entrypoints: Rename anv_layer to anv_dispatch_table 2015-09-24 14:45:58 -07:00
Jason Ekstrand
913a9b76f7 anv/batch_chain: Remove the current_surface_bo helper
It's no longer used outside anv_batch_chain so we certainly don't need to
be exporting.  Inside anv_batch_chain, it's only used twice and it can be
replaced by a single line so there's really no point.
2015-09-24 08:46:41 -07:00
Jason Ekstrand
bc17f9c9d7 anv/cmd_buffer: Add a helper for getting the surface state base address 2015-09-24 08:42:38 -07:00
Jason Ekstrand
e1a7c721d3 anv/allocator: Don't ever call mremap
This has always been a bit sketchy and neither Kristian nor I have ever
really liked it.
2015-09-24 08:42:14 -07:00
Jason Ekstrand
99e62f5ce8 anv/allocator: Delete the unused center_fd_offset from anv_block_pool 2015-09-24 08:41:56 -07:00
Jason Ekstrand
429665823d anv/allocator: Do a better job of centering bi-directional block pools 2015-09-24 08:41:47 -07:00
Jason Ekstrand
76be58efce anv/batch_chain: Clean up the reloc list swapping code 2015-09-24 08:41:38 -07:00
Jason Ekstrand
041f5ea089 anv/meta: Add location specifiers to meta shaders 2015-09-21 16:21:56 -07:00
Jason Ekstrand
f406b708a5 Merge branch 'nir-spirv' into vulkan 2015-09-17 20:03:40 -07:00
Jason Ekstrand
616db92b01 nir/spirv: Add better location handling
Previously, our location handling was focussed on either no location
(usually implicit 0) or a builting.  Unfortunately, if you gave it a
location, it would blow it away and just not care.  This worked fine with
crucible and our meta shaders but didn't work with the CTS.  The new code
uses the "data.explicit_location" field to denote that it has a "final"
location (usually from a builtin) and, otherwise, the location is
considered to be relative to the base for that shader stage.
2015-09-17 20:02:46 -07:00
Jason Ekstrand
a788e7c659 anv/device: Move mutex initialization to befor block pools 2015-09-17 18:23:21 -07:00
Jason Ekstrand
595e6cacf1 meta: Initial support for packing parameters
Probably incomplete but it should do for now
2015-09-17 18:21:05 -07:00
Jason Ekstrand
d616493953 anv/meta: Pass the depth through the clear vertex shader
It shouldn't matter since we shut off the VS but it's at least clearer.
2015-09-17 18:09:21 -07:00
Jason Ekstrand
3b8aa26b8e anv/formats: Properly report depth-stencil formats 2015-09-17 17:44:20 -07:00
Jason Ekstrand
b5f6889648 vk/device: Don't allow device or instance creation with invalid extensions 2015-09-17 17:44:20 -07:00
Jason Ekstrand
dcf424c98c anv/tests: Add some asserts for data integrity in block_pool_no_free 2015-09-17 17:44:20 -07:00
Jason Ekstrand
5f57ff7e18 anv/allocator: Make the block pool double-ended
This allows us to allocate from either side of the block pool in a
consistent way.  If you use the previous block_pool_alloc function, you
will get offsets from the start of the pool as normal.  If you use the new
block_pool_alloc_back function, you will get a negative index that
corresponds to something in the "back" of the pool.
2015-09-17 17:44:20 -07:00
Jason Ekstrand
15624fcf55 anv/tests: Refactor the block_pool_no_free test
This simply breaks the monotonicity check out into its own function
2015-09-17 17:44:20 -07:00
Jason Ekstrand
55daed947d vk/allocator: Split block_pool_alloc into two functions 2015-09-17 17:44:20 -07:00
Jason Ekstrand
c55fa89251 anv/allocator: Use a signed 32-bit offset for the free list
This has the unfortunate side-effect of making it so that we can't have a
block pool bigger than 1GB.  However, that's unlikely to happen and, for
the sake of bi-directional block pools, we need to negative offsets.
2015-09-17 17:44:20 -07:00
Jason Ekstrand
8c6bc1e85d anv/allocator: Create 2GB memfd up-front for the block pool 2015-09-17 17:44:20 -07:00
Jason Ekstrand
74bf7aa07c anv/allocator: Take the device mutex when growing a block pool
We don't have any locking issues yet because we use the pool size itself as
a mutex in block_pool_alloc to guarantee that only one thread is resizing
at a time.  However, we are about to add support for growing the block pool
at both ends.  This introduces two potential races:

 1) You could have two block_pool_alloc() calls that both try to grow the
    block pool, one from each end.

 2) The relocation handling code will now have to think about not only the
    bo that we use for the block pool but also the offset from the start of
    that bo to the center of the block pool.  It's possible that the block
    pool growing code could race with the relocation handling code and get
    a bo and offset out of sync.

Grabbing the device mutex solves both of these problems.  Thanks to (2), we
can't really do anything more granular.
2015-09-17 17:44:20 -07:00
Jason Ekstrand
222ddac810 anv: Document the index and offset parameters of anv_bo 2015-09-17 17:44:20 -07:00
Chad Versace
85520aa070 vk/image: Remove stale FINISHME for non-2D image views
gen8_image_view_init() now supports 1D, 2D, and 3D image views.
2015-09-14 15:16:57 -07:00
Chad Versace
622a317e4c vk/image: Teach vkCreateImage about layout of 1D surfaces
Calling vkCreateImage() with VK_IMAGE_TYPE_1D now succeeds and computes
the surface layout correctly.
2015-09-14 15:15:12 -07:00
Chad Versace
6221593ff8 vk/meta: Partially implement vkCmdCopy*, vkCmdBlit* for 3D images
Partially implement the below functions for 3D images:
    vkCmdCopyBufferToImage
    vkCmdCopyImageToBuffer
    vkCmdCopyImage
    vkCmdBlitImage

Not all features work, and there is much for performance improvement.
Beware that vkCmdCopyImage and vkCmdBlitImage are untested.  Crucible
proves that vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer works,
though.

Supported:
    - copy regions with z offset

Unsupported:
    - copy regions with extent.depth > 1

Crucible test results on master@d452d2b are:
    pass: func.miptree.r8g8b8a8-unorm.*.view-3d.*
    pass: func.miptree.d32-sfloat.*.view-3d.*
    fail: func.miptree.s8-uint.*.view-3d.*
2015-09-14 14:27:34 -07:00
Chad Versace
0ecafe0285 vk/meta: Rename meta_emit_blit() params
Rename src -> src_view and dest -> dest_view. This reduces noise in the next
patch's diff, which adds new params to the function.
2015-09-14 12:29:51 -07:00
Chad Versace
b659a066e9 vk/gen8: Set RENDER_SURFACE_STATE::RenderTargetViewExtent 2015-09-14 12:29:49 -07:00
Chad Versace
ffa61e1572 vk/gen8: Refactor setting of SURFACE_STATE::Depth
The field's meaning depends on SURFACE_STATE::SurfaceType.
Make that correlation explicit by switching on VkImageType.
For good measure, add some PRM quotes too.
2015-09-14 12:27:05 -07:00
Chad Versace
eed74e3a02 vk: Teach vkCreateImage about layout of 3D surfaces
Calling vkCreateImage() with VK_IMAGE_TYPE_3D now succeeds and computes
the surface layout correctly. However, 3D images do not yet work for
many other Vulkan entrypoints.
2015-09-14 11:04:08 -07:00
Chad Versace
e01d5a0471 vk: Refactor anv_image_make_surface()
Move the code that calculates the layout of 2D surfaces into a switch case.
2015-09-14 11:00:18 -07:00
Jason Ekstrand
8c8ad6dddf vk: Use push constants for dynamic buffers 2015-09-11 15:56:19 -07:00
Jason Ekstrand
2b4a2eb592 vk/compiler: Rework create_params_array 2015-09-11 15:55:54 -07:00
Jason Ekstrand
c3086c54a8 vk/compiler: Add a NIR pass for pushing dynamic buffer offset
This commit just adds the NIR pass but does none of the uniform setup
2015-09-11 15:53:56 -07:00
Jason Ekstrand
7487371056 vk/pipeline_layout: Add dynamic_offset_start and has_dynamic_offsets fields 2015-09-11 15:52:43 -07:00
Jason Ekstrand
de5220c7ce vk/pipeline_layout: Move surface/sampler start from SoA to AoS
This makes more sense to me and it's more consistent with
anv_descriptor_set_layout.
2015-09-11 10:43:55 -07:00
Jason Ekstrand
b908c67816 vk: Rework the push constants data structure
Previously, we simply had a big blob of stuff for "driver constants".  Now,
we have a very specific data structure that contains the driver constants
that we care about.
2015-09-11 10:25:23 -07:00
Jason Ekstrand
fd21f0681a Add the wayland protocol files to .gitignire 2015-09-11 09:29:40 -07:00
Jason Ekstrand
8040dc4ca5 vk/error: Handle ERROR_OUT_OF_DATE_WSI 2015-09-08 12:13:07 -07:00
Jason Ekstrand
060720f0c9 vk/wsi/x11: Actually block on X so we don't re-use busy buffers 2015-09-08 11:51:47 -07:00
Jason Ekstrand
1bee19e023 vk: Add the WSI header files 2015-09-08 10:33:46 -07:00
Jason Ekstrand
2f3de6260d Merge branch 'nir-spirv' into vulkan 2015-09-05 14:12:59 -07:00
Jason Ekstrand
4d73ca3c58 nir/spirv.h: Remove some cruft missed while merging
There were merge conflicts in spirv.h that got missed because they were in
a comment and so it still compiled.  This gets rid of them and we should be
on-par with upstream spirv->nir.
2015-09-05 14:11:40 -07:00
Jason Ekstrand
612b13aeae nir/spirv: Add support for most of the rest of texturing
Assuming this all works, about the only thing left should be some
corner-cases for tg4
2015-09-05 14:10:05 -07:00
Jason Ekstrand
fe786ff67d Merge branch 'nir-spirv' into vulkan 2015-09-05 13:17:53 -07:00
Jason Ekstrand
35fcd37fcf nir/spirv: Handle decorations after assigning variable locations 2015-09-05 13:17:21 -07:00
Jason Ekstrand
87d02f515b Merge branch 'nir-spirv' into vulkan 2015-09-05 09:48:33 -07:00
Jason Ekstrand
9be43ef99c nir/spirv: Handle the MatrixStride member decoration 2015-09-05 09:47:45 -07:00
Jason Ekstrand
01924a03d4 vk: Actually link in wayland libraries
Turns out this was why I had accidentally broken the universe.  Oops...
2015-09-04 20:02:38 -07:00
Jason Ekstrand
2c4ae00db6 vk: Conditionally compile Wayland support
Pulling in libwayland causes undefined symbols in applications that are
linked against vulkan alone.  Ideally, we would like to dlopen a platform
support library or something like that.  For now, this works and should get
crucible running again.
2015-09-04 19:18:52 -07:00
Jason Ekstrand
b3c037f329 vk: Fix size return value handling in a couple plces 2015-09-04 19:05:51 -07:00
Jason Ekstrand
9a95d08ed6 Merge branch 'nir-spirv' into vulkan 2015-09-04 18:54:15 -07:00
Jason Ekstrand
6d5dafd779 nir/spirv/glsl450: Use the correct write mask 2015-09-04 18:50:14 -07:00
Jason Ekstrand
7174d155e9 nir: Add a lower_fdiv option and use it in i965 2015-09-04 18:50:14 -07:00
Jason Ekstrand
f32d16a9f0 nir/spirv: Use the actual GLSL 450 extension header from Khronos 2015-09-04 18:50:14 -07:00
Jason Ekstrand
9e2c13350e nir/spirv: Add support for SpvDecorationColMajor 2015-09-04 18:50:14 -07:00
Jason Ekstrand
f3bdb93a8e nir/types: Allow single-column matrices
This can sometimes be a convenient way to build vectors.
2015-09-04 18:50:14 -07:00
Jason Ekstrand
48e87c0163 vk/wsi: Add Wayland WSI support 2015-09-04 17:55:42 -07:00
Jason Ekstrand
348cb29a20 vk/wsi: Move to a clallback system for the entire WSI implementation
We do this for two reasons: First, because it allows us to simplify WSI and
compiling in/out support for a particular platform is as simple as calling
or not calling the platform-specific init function.  Second, the
implementation gives us a place for a given chunk of the WSI to stash
stuff in the instance.
2015-09-04 17:55:42 -07:00
Jason Ekstrand
06d8fd5881 vk/instance: Expose anv_instance_alloc/free 2015-09-04 17:55:42 -07:00
Jason Ekstrand
c0b97577e8 vk/WSI: Use a callback mechanism instead of explicit switching 2015-09-04 17:55:42 -07:00
Jason Ekstrand
ca3cfbf6f1 vk: Add an initial implementation of the actual Khronos WSI extension
Unfortunately, this is a very large commit and removes the old LunarG WSI
extension.  This is because there are a couple of entrypoints that have the
same name between the two extensions so implementing them both is
impractiacl.

Support is still incomplete, but this is enough to get vkcube up and going
again.
2015-09-04 17:55:42 -07:00
Jason Ekstrand
3d9fbb6575 vk: Add initial support for VK_WSI_swapchain 2015-09-04 17:55:42 -07:00
Jason Ekstrand
beb466ff5b vk: Move anv_x11.c to anv_wsi_x11.c 2015-09-04 17:55:42 -07:00
Jason Ekstrand
9a7600c9b5 vk/device: Use an array for device extensions 2015-09-04 17:55:42 -07:00
Kristian Høgsberg Kristensen
8af3624651 vk: Further reduce diff to master
Now that we don't compile GLSL, we can roll back a few more hacks and
unexport some things from the backend compiler.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-04 16:17:01 -07:00
Kristian Høgsberg Kristensen
7c1d20dc48 vk: Drop GLSL code from anv_compiler.cpp
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 14:02:11 -07:00
Kristian Høgsberg Kristensen
316c8ac53b vk: Assert that the SPIR-V module has the magic number
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 12:27:28 -07:00
Kristian Høgsberg Kristensen
6e35a1f166 vk: Remove various hacks/scaffolding code
Since we switched away from calling brwCreateContext() there's a bit of
hacky support we can now delete.  This reduces our diff to upstream master.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 12:17:13 -07:00
Kristian Høgsberg Kristensen
1d787781ff vk: Fall back to previous gens in entry point resolver
We used to always just do a one-level fallback from genX_* to anv_*
entry points. That worked for gen7 and gen8 where all entry points were
either different or could be made anv_* entry points (eg
anv_CreateDynamicViewportState). We're about to add gen9 and now need to
be able to fall back to gen8 entry points for most things.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:09 -07:00
Kristian Høgsberg Kristensen
c4dbff58d8 vk: Drop redundant gen7_CreateGraphicsPipelines
This is handled by anv_CreateGraphicsPipelines().

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:09 -07:00
Kristian Høgsberg Kristensen
b5e90f3f48 vk: Use vk* entrypoints in meta, not driver_layer pointers
We'll change the dispatch mechanism again in a later commit. Stop using
the driver_layer function pointers and just use the public entry points.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:09 -07:00
Kristian Høgsberg Kristensen
82396a5514 vk: Drop check for I915_PARAM_HAS_EXEC_CONSTANTS
We don't use this kernel feature.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:08 -07:00
Kristian Høgsberg Kristensen
c4b30e7885 vk: Add new vk_errorf that takes a format string
This allows us to annotate error cases in debug builds.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:08 -07:00
Kristian Høgsberg Kristensen
2e346c882d vk: Make vk_error a little more helpful
Print out file and line number and translate the error code to the
symbolic name.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-09-03 11:53:08 -07:00
Chad Versace
0cb26523d3 vk/image: Add PRM reference for QPitch equation
Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-03 11:04:38 -07:00
Chad Versace
28503191f1 vk/meta: Partially fix vkCmdCopyBufferToImage for S8_UINT
Create R8_UINT VkAttachmentView and VkImageView for the stencil data.

This fixes a crash, but the pixels in the destination image are still
incorrect. They are not properly tiled.

Fixes crashes in Crucible tests func.miptree.s8-uint.aspect-stencil.* as
of crucible-7471449. Test results improve 'lost' -> 'fail'.
2015-09-02 11:08:36 -07:00
Jason Ekstrand
be0a4da6a5 vk/meta: Use SPIR-V for shaders
We are also now using glslc for compiling the Vulkan driver like we do in
curcible.
2015-09-01 15:16:06 -07:00
Jason Ekstrand
362ab2d788 vk/compiler: Handle interpolation qualifiers for SPIR-V shaders 2015-09-01 15:15:04 -07:00
Jason Ekstrand
126ade0023 vk/extensions: count needs to be <= number of extensions 2015-09-01 12:28:50 -07:00
Jason Ekstrand
0c2d476935 vk/compiler: Properly reference/delete programs when using SPIR-V 2015-09-01 12:28:50 -07:00
Jason Ekstrand
16ebe883a4 vk/meta: Add a helper for making an image from a buffer 2015-08-31 21:54:38 -07:00
Jason Ekstrand
86c3476668 nir/spirv: Use VERTEX_ID_ZERO_BASE for VertexId
In Vulkan, VertexId and InstanceId will be zero-based and new intrinsics,
VertexIndex and InstanceIndex, will be added for non-zer-based.  See also,
Khronos bug #14255
2015-08-31 17:16:49 -07:00
Jason Ekstrand
6350c97412 Merge remote-tracking branch 'fdo-personal/nir-spirv' into vulkan
From now on, the majority of SPIR-V improvements should happen on the spirv
branch which will also be public.  It will be frequently merged into the
vulkan driver.
2015-08-31 17:14:47 -07:00
Jason Ekstrand
22fdb2f855 nir/spirv: Update to the latest revision 2015-08-31 17:05:23 -07:00
Jason Ekstrand
ce70cae756 nir/builder: Use nir_after_instr to advance the cursor
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node.  Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node.  By just resetting to after_instr, we
avoid all these problems.
2015-08-31 17:05:23 -07:00
Jason Ekstrand
24b0c53231 nir/intrinsics: Move to a two-dimensional binding model for UBO's 2015-08-31 17:05:23 -07:00
Jason Ekstrand
f4608bc530 nir/nir_variable: Add a descriptor set field
We need this for SPIR-V
2015-08-31 17:05:23 -07:00
Jason Ekstrand
85cf2385c5 mesa: Move gl_vert_attrib from mtypes.h to shader_enums.h
It is a shader enum after all...
2015-08-31 17:05:23 -07:00
Jason Ekstrand
de4f379a70 nir/cursor: Add a helper for getting the current block 2015-08-31 17:05:23 -07:00
Connor Abbott
024c49e95e nir/builder: add a nir_fdot() convenience function 2015-08-31 17:05:23 -07:00
Jason Ekstrand
f6a0eff1ba nir: Add a pass to lower outputs to temporary variables
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.
2015-08-31 17:05:23 -07:00
Jason Ekstrand
4956bbaa33 nir/cursor: Add a constructor for the end of a block but before the jump 2015-08-31 16:58:20 -07:00
Connor Abbott
c62be38286 nir/types: add more nir_type_is_xxx() wrappers 2015-08-31 16:58:20 -07:00
Connor Abbott
a1e136711b nir/types: add a helper to transpose a matrix type 2015-08-31 16:58:20 -07:00
Jason Ekstrand
756b00389c nir/spirv: Don't assert that the current block is empty
It's possible that someone will give us SPIR-V code in which someone
needlessly branches to new blocks.  We should handle that ok now.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
fe220ebd37 nir/spirv: Add initial support for samplers 2015-08-31 16:58:20 -07:00
Jason Ekstrand
a992909aae nir/spirv: Move Exp and Log to the list of currently unhandled ALU ops
NIR doesn't have the native opcodes for them anymore
2015-08-31 16:58:20 -07:00
Jason Ekstrand
45963c9c64 nir/types: Add support for sampler types 2015-08-31 16:58:20 -07:00
Jason Ekstrand
2887e68f36 nir/spirv: Make the global constants in spirv.h static
I've been promissed in a bug that this will be fixed in a future version of
the header.  However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
62b094a81c nir/spirv: Handle jump-to-loop in a more general way 2015-08-31 16:58:20 -07:00
Jason Ekstrand
ca51d926fd nir/spirv: Handle boolean uniforms correctly 2015-08-31 16:58:20 -07:00
Jason Ekstrand
b6562bbc30 nir/spirv: Handle control-flow with loops 2015-08-31 16:58:20 -07:00
Jason Ekstrand
4a63761e1d nir/spirv: Set a name on temporary variables 2015-08-31 16:58:20 -07:00
Jason Ekstrand
6fc7911d15 nir/spirv: Use the correct length for copying string literals 2015-08-31 16:58:20 -07:00
Jason Ekstrand
9da6d808be nir/spirv: Make vtn_ssa_value handle constants as well as ssa values 2015-08-31 16:58:20 -07:00
Jason Ekstrand
1feeee9cf4 nir/spirv: Add initial support for GLSL 4.50 builtins 2015-08-31 16:58:20 -07:00
Jason Ekstrand
577c09fdad nir/spirv: Split the core datastructures into a header file 2015-08-31 16:58:20 -07:00
Jason Ekstrand
66fc7f252f nir/spirv: Use the builder for all instructions
We don't actually use it to create all the instructions but we do use it
for insertion always.  This should make things far more consistent for
implementing extended instructions.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
9e03b6724c nir/spirv: Add support for a bunch of ALU operations 2015-08-31 16:58:20 -07:00
Jason Ekstrand
91b3b46d8b nir/spirv: Add support for indirect array accesses 2015-08-31 16:58:20 -07:00
Jason Ekstrand
9197e3b9fc nir/spirv: Explicitly type constants and SSA values 2015-08-31 16:58:20 -07:00
Jason Ekstrand
b7904b8281 nir/spirv: Handle OpBranchConditional
We do control-flow handling as a two-step process.  The first step is to
walk the instructions list and record various information about blocks and
functions.  This is where the acutal nir_function_overload objects get
created.  We also record the start/stop instruction for each block.  Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
d216dcee94 nir/spirv: Add a helper for getting a value as an SSA value 2015-08-31 16:58:20 -07:00
Jason Ekstrand
f36fabb736 nir/spirv: Split instruction handling into preamble and body sections 2015-08-31 16:58:20 -07:00
Jason Ekstrand
7bf4b53f1c nir/spirv: Implement load/store instructiosn 2015-08-31 16:58:20 -07:00
Jason Ekstrand
7d64741a5e nir: Add a helper for getting the tail of a deref chain 2015-08-31 16:58:20 -07:00
Jason Ekstrand
112c607216 nir/spirv: Actaully add variables to the funciton or shader 2015-08-31 16:58:20 -07:00
Jason Ekstrand
4fa1366392 nir/spirv: Add a vtn_untyped_value helper 2015-08-31 16:58:20 -07:00
Jason Ekstrand
e709a4ebb8 nir/spirv: Use vtn_value in the types code and fix a off-by-one error 2015-08-31 16:58:20 -07:00
Jason Ekstrand
67af6c59f2 nir/types: Add an is_vector_or_scalar helper 2015-08-31 16:58:20 -07:00
Jason Ekstrand
5e6c5e3c8e nir/spirv: Add support for deref chains 2015-08-31 16:58:20 -07:00
Jason Ekstrand
366366c7f7 nir/types: Add a scalar type constructor 2015-08-31 16:58:20 -07:00
Jason Ekstrand
befecb3c55 nir/spirv: Add support for OpLabel 2015-08-31 16:58:20 -07:00
Jason Ekstrand
399e962d25 nir/spirv: Add support for declaring functions 2015-08-31 16:58:20 -07:00
Jason Ekstrand
ac4d459aa2 nir/types: Add accessors for function parameter/return types 2015-08-31 16:58:20 -07:00
Jason Ekstrand
3a266a18ae nir/spirv: Add support for declaring variables
Deref chains and variable load/store operations are still missing.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
2494055631 nir/spirv: Add support for constants 2015-08-31 16:58:20 -07:00
Jason Ekstrand
2a023f30a6 nir/spirv: Add basic support for types 2015-08-31 16:58:20 -07:00
Jason Ekstrand
5bb94c9b12 nir/types: Add more helpers for creating types 2015-08-31 16:58:20 -07:00
Jason Ekstrand
53bff3e445 glsl/types: Expose the function_param and struct_field structs to C
Previously, they were hidden behind a #ifdef __cplusplus so C wouldn't find
them.  This commit simpliy moves the #ifdef and adds #ifdef's around
constructors.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
0db3e4dd72 glsl/types: Add support for function types 2015-08-31 16:58:20 -07:00
Jason Ekstrand
1169fcdb05 glsl: Add GLSL_TYPE_FUNCTION to the base types enums 2015-08-31 16:58:20 -07:00
Jason Ekstrand
b79916dacc nir/spirv: Rework the way values are added
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value.  The
caller is then responsible for setting the various fields.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
ac60aba351 nir/spirv: Add stub support for extension instructions 2015-08-31 16:58:20 -07:00
Jason Ekstrand
78eabc6153 REVERT: Add a simple helper program for testing SPIR-V -> NIR translation 2015-08-31 16:58:20 -07:00
Jason Ekstrand
2c585a722d glsl/compiler: Move the error_no_memory stub to standalone_scaffolding.cpp 2015-08-31 16:58:20 -07:00
Jason Ekstrand
b20d9f5643 nir: Add the start of a SPIR-V to NIR translator
At the moment, it can handle the very basics of strings and can ignore
debug instructions.  It also has basic support for decorations.
2015-08-31 16:58:20 -07:00
Jason Ekstrand
9d92b4fd0e nir: Import the revision 30 SPIR-V header from Khronos 2015-08-31 16:58:20 -07:00
Jason Ekstrand
0af4bf4d4b Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-08-31 16:30:07 -07:00
Jason Ekstrand
9f9628e9dd vk/SPIR-V: Pull num_uniform_components out of the NIR shader 2015-08-28 22:31:03 -07:00
Jason Ekstrand
44e6ea74b0 spirv: lower outputs to temporaries 2015-08-28 17:38:41 -07:00
Jason Ekstrand
9cebdd78d8 nir: Add a pass to lower outputs to temporary variables
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.
2015-08-28 17:38:41 -07:00
Jason Ekstrand
5e7c7b2a4e spirv: Only do a block load if you're actually loading a uniform 2015-08-28 16:17:45 -07:00
Jason Ekstrand
98abed2441 spirv: Use VERTEX_ID_ZERO_BASE for vertex id 2015-08-28 16:08:29 -07:00
Jason Ekstrand
dbc3eb5bb4 vk/compiler: Pass the correct is_scalar value to brw_process_nir 2015-08-28 12:13:17 -07:00
Jason Ekstrand
ea56d0cb1d glsl/types: Fix up function type hash table insertion 2015-08-28 12:00:25 -07:00
Chad Versace
a2d15ee698 vk/meta: Support stencil in vkCmdCopyImageToBuffer
At Crucible commit 12e64a4, fixes the
func.depthstencil.stencil-triangles.* tests on Broadwell.
2015-08-28 08:41:21 -07:00
Chad Versace
84cfc08c10 vk/pipeline: Fix crash when the pipeline has no attributes
If there are no attributes, don't emit 3DSTATE_VERTEX_ELEMENTS.
That packet does not allow 0 attributes.
2015-08-28 08:07:15 -07:00
Chad Versace
053d32d2a5 vk/image: Linear stencil buffers are illegal
The hardware requires that stencil buffer memory be W-tiled.

From the Sandybridge PRM:
   This buffer is supported only in Tile W memory.
2015-08-28 08:04:59 -07:00
Chad Versace
14e1d58fb7 vk: Fix stride of stencil buffers
Stencil buffers have strange pitch. The PRM says:

  The pitch must be set to 2x the value computed based on width,
  as the stencil buffer is stored with two rows interleaved.
2015-08-28 08:03:46 -07:00
Chad Versace
31af126229 vk: Program stencil ops in 3DSTATE_WM_DEPTH_STENCIL
The driver ignored the Vulkan stencil, always programming the hardware
stencil op to 0 (STENCILOP_KEEP).
2015-08-28 08:00:56 -07:00
Chad Versace
bff2879abe vk/image: Don't abort when creating stencil image views
When creating a stencil image view, log a FINISHME but don't abort.
We're sooooo close to having this working.
2015-08-28 07:59:59 -07:00
Chad Versace
4f852c76dc vk/meta: Save/restore VkDynamicDepthStencilState 2015-08-28 07:59:29 -07:00
Chad Versace
104c4e5ddf vk/meta: Don't skip clearing when clearing only depth attachment
anv_cmd_buffer_clear_attachments() skipped the clear renderpass if no
color attachments needed to be cleared, even if a depth attachment
needed to be cleared.
2015-08-28 07:58:51 -07:00
Chad Versace
aacb7bb9b6 vk: Add func anv_cmd_buffer_get_depth_stencil_view()
This function removes some duplicated code from
genN_cmd_buffer_emit_depth_stencil().
2015-08-28 07:57:34 -07:00
Chad Versace
641c25dd55 vk: Declare some local variables as const
In anv_cmd_buffer_emit_depth_stencil(), declare 'subpass' and 'fb' as
const.
2015-08-28 07:53:24 -07:00
Chad Versace
c6f19b4248 vk: Don't duplicate anv_depth_stencil_view's surface data
In anv_depth_stencil_view, replace the members
    bo
    depth_offset
    depth_stride
    depth_format
    depth_qpitch
    stencil_offset
    stencil_stride
    stencil_qpitch
with the single member
    const struct anv_image *image

The removed members duplicated data in anv_image::depth_surface and
anv_image::stencil_surface.
2015-08-28 07:52:19 -07:00
Chad Versace
35b0262a2d vk/gen7: Add func gen7_cmd_buffer_emit_depth_stencil()
This patch moves all the GEN7_3DSTATE_DEPTH_BUFFER code from
gen7_cmd_buffer_begin_subpass() into a new function
gen7_cmd_buffer_emit_depth_stencil().
2015-08-28 07:46:16 -07:00
Chad Versace
b2ee317e24 vk: Fix format of anv_depth_stencil_view
The format of the view itself and of the view's image may differ.
Moreover, if the view's format has no depth aspect but the image's
format does, we must not program the depth buffer. Ditto for stencil.
2015-08-28 07:44:32 -07:00
Chad Versace
798acb2464 vk/gen7: Fix gen of emitted packet in gen7_batch_lri()
Emit GEN7_MI_LOAD_REGISTER_IMM, not the GEN8 version.
2015-08-28 07:36:35 -07:00
Chad Versace
4461392343 vk: Remove dummy anv_depth_stencil_view 2015-08-28 07:35:39 -07:00
Chad Versace
941b48e992 vk/image: Let anv_image have one anv_surface per aspect
Split anv_image::primary_surface into two: anv_image::color_surface and
depth_surface.
2015-08-28 07:17:54 -07:00
Jason Ekstrand
c313a989b4 spirv: Bump to the public revision 31 2015-08-27 15:24:04 -07:00
Jason Ekstrand
2a8d1ac958 vk: Update to API version 0.138.2 2015-08-27 11:41:04 -07:00
Jason Ekstrand
4e3ee043c0 vk/gen8: Add support for push constants 2015-08-27 10:25:58 -07:00
Jason Ekstrand
375a65d5de vk/private.h: Handle a NULL bo but valid offset in __gen_combine_address 2015-08-27 10:25:58 -07:00
Jason Ekstrand
c8365c55f5 vk/cmd_buffer: Set the CONSTANTS_REL_GENERAL flag on execbuf
This tells the kernel that the push constant buffers are relative to the
dynamic state base address.
2015-08-27 10:25:58 -07:00
Jason Ekstrand
efc2cce01f HACK: Don't call nir_setup_uniforms
We're doing our own uniform setup and we don't need to call into the entire
GL stack to mess with things.
2015-08-27 10:25:58 -07:00
Jason Ekstrand
33cabeab01 vk/compiler: Add a helper for setting up prog_data->param
This new helper sets it up the way we'll want for handling push constants.
2015-08-27 10:25:16 -07:00
Jason Ekstrand
5446bf352e vk: Add initial API support for setting push constants
This doesn't add support for actually uploading them, it just ensures that
we have and update the shadow copy.
2015-08-26 17:59:15 -07:00
Jason Ekstrand
36134e1050 Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-08-26 11:04:30 -07:00
Jason Ekstrand
74e076bba8 vk/meta: Destroy vertex shaders when setting up clearing 2015-08-25 18:51:26 -07:00
Jason Ekstrand
4bb9915755 vk/gen8: Don't duplicate generic pipeline setup
gen8_graphics_pipeline_create had a bunch of stuff in it that's already set
up by anv_pipeline_init.  The duplication was causing double-initialization
of a state stream and made valgrind very angry.
2015-08-25 18:41:25 -07:00
Jason Ekstrand
9b387b5d3f Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-08-25 18:41:21 -07:00
Kristian Høgsberg Kristensen
5360edcb30 vk/vec4: Use the right constant for offset into a UBO
We were using constant 0, which is the set.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-25 16:14:59 -07:00
Kristian Høgsberg Kristensen
647a60226d vk: Use true/false for RenderCacheReadWriteMode
This field in surface state is a bool, WriteOnlyCache is an enum from
GEN8.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-25 15:58:21 -07:00
Kristian Høgsberg Kristensen
7e5afa75b5 vk: Support descriptor sets and bindings in vec4 ubo loads
Still incomplete, but at least we get the simplest case working.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-25 15:57:12 -07:00
Kristian Høgsberg Kristensen
00e7799c69 vk/gen7: Enable L3 caching for GEN7 MOCS
Do what GL does here.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-25 15:55:56 -07:00
Kristian Høgsberg Kristensen
6a1098b2c2 vk/gen7: Use TILEWALK_XMAJOR for linear surfaces
You wouldn't think the TileWalk mode matters when TiledSurface is
false. However, it has to be TILEWALK_XMAJOR. Make it so.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-25 10:54:13 -07:00
Kristian Høgsberg Kristensen
f1455ffac7 vk: Add gen7 support
With all the previous commits in place, we can now drop in support for
multiple platforms. First up is gen7 (Ivybridge).

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
891995e55b vk: Move 3DSTATE_SBE setup to just before 3DSTATE_PS
This is a more logical place for it, between geometry front end state
and pixel backend state.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
9c752b5b38 vk: Move generic pipeline init to anv_pipeline.c
This logic will be shared between multiple gens.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
3800573fb5 vk: Move gen8 specific state into gen8 sub-structs
This commit moves all occurances of gen8 specific state into a gen8
substruct. This clearly identifies the state as gen8 specific and
prepares for adding gen7 state structs. In the process we also rename
the field names to exactly match the command or state packet name,
without the 3DSTATE prefix, eg:

  3DSTATE_VF -> gen8.vf
  3DSTATE_WM_DEPTH_STENCIL -> gen8.wm_depth_stencil

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
615da3795a vk: Always use a placeholder vertex shader in meta
The clear pipeline didn't have a vertex shader and relied on the clear
shader being hardcoded by the compiler to accept one attribute. This
necessitated a few special cases in the 3DSTATE_VS setup. Instead,
always provide a vertex shader, even if we disable VS dispatch.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
ac738ada7a vk: Trim out irrelevant 0-initialized surface state fields
Many of of these fields aren't used for buffer surfaces, so leave them
out for brevity.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
963a1e35e7 vk: Update generated headers
This adds VALIGN_2 and VALIGN_4 defines for IVB and HSW
RENDER_SURFACE_STATE.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:41 -07:00
Kristian Høgsberg Kristensen
f5275f7eb3 vk: Move anv_color_attachment_view_init() to gen8_state.c
I'd prefer to move anv_CreateAttachmentView() as well, but it's a little
too much generic code to just duplicate for each gen.  For now, we'll
add a anv_color_attachment_view_init() to dispatch to the gen specific
implementation, which we then call from anv_CreateAttachmentView().

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
988341a73c vk: Move anv_CreateImageView to gen8_state.c
We'll probably want to move some code back into a shared init function,
but this gets one GEN8 surface state initialization out of anv_image.c.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
bc568ee992 vk: Make anv_cmd_buffer_begin_subpass() switch on gen
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
8fe74ec45c vk: Add generic wrapper for filling out buffer surface state
We need this for generating surface state on the fly for dynamic buffer
views.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
a2b822185e vk: Add helper for adding surface state reloc
We're going to have to do this differently for earlier gens, so lets do
it in place only.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
e43fc871be vk: Make batch chain code gen-agnostic
Since the extra dword in MI_BATCH_BUFFER_START added in gen8 is at the
end of the struct, we can emit the gen8 packet on all gens as long as we
set the instruction length correctly.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
25ab43ee8c vk: Move vkCmdPipelineBarrier to gen8_cmd_buffer.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
b4ef2302a9 vk: Use helper function for emitting MI_BATCH_BUFFER_START
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
97360ffc6c vk: Use anv_batch_emit() for chaining back to primary batch
We used to use a manual GEN8_MI_BATCH_BUFFER_START_pack() call, but this
refactors the code to use anv_batch_emit();

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
cff717c649 vk: Downgrade state packet to gen7 where they're common
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
64045eebfb vk: Reorder gen8 specific code into three new files
We'll organize gen specific code in three files per gen: pipeline,
cmd_buffer and state, eg:

    gen8_cmd_buffer.c
    gen8_pipeline.c
    gen8_state.c

where gen8_cmd_buffer.c holds all vkCmd* entry points, gne8_pipeline.c
all gen specific code related to pipeline building and remaining state
code (sampler, surface state, dynamic state) in gen8_state.c.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
9f0bb5977b vk: Move gen8_CmdBindIndexBuffer() to anv_gen8.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
a7649b2869 vk: Move gen8_cmd_buffer_emit_state_base_address() to anv_gen8.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
130db30771 vk: Move gen8 specific parts of queries to anv_gen8.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
98126c021f vk: Move dynamic depth stenctil to anv_gen8.c 2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
0bcf85d79f vk: Move pipeline creation to anv_gen8.c 2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
ef0ab62486 vk: Move anv_CreateSampler to anv_gen8.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
fb428727e0 vk: Move anv_CreateBufferView to anv_gen8.c
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
74556b076a vk: Add new anv_gen8.c and move CreateDynamicRasterState there
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-24 13:45:40 -07:00
Kristian Høgsberg Kristensen
ee9788973f vk: Implement multi-gen dispatch mechanism 2015-08-24 13:45:39 -07:00
Chad Versace
c4e7ed9163 vk/meta: Implement depth clears
Fixes Crucible test
func.depthstencil.basic-depth.clear-1.0.op-greater.
2015-08-20 10:25:05 -07:00
Chad Versace
0db3d67a14 vk: Cache each render pass's number of clear ops
During vkCreateRenderPass, count the number of clear ops and store them
in new members of anv_render_pass:

    uint32_t num_color_clear_attachments
    bool has_depth_clear_attachment
    bool has_stencil_clear_attachment

Cacheing these 8 bytes (including padding) reduces the number of times
that anv_cmd_buffer_clear_attachments needs to loop over the pass's
attachments.
2015-08-20 10:25:04 -07:00
Chad Versace
2387219101 vk: Use temp var in vkCreateRenderPass's attachment loop
Store the attachment in a temporary variable and
s/pass->attachments[i]/att/ .
2015-08-20 10:25:04 -07:00
Chad Versace
1c24a191cd vk: Improve memory locality of anv_render_pass
Allocate the pass's array of attachments, anv_render_pass::attachments,
in the same allocation as the pass itself.
2015-08-20 09:31:58 -07:00
Chad Versace
4eaf90effb vk: Unharcode an argument to sizeof
s/struct anv_subpass/pass->subpasses[0])/
2015-08-20 09:31:58 -07:00
Chad Versace
44ef4484c8 vk/meta: Add Z coord to clear vertices
For now, the Z coordinate is always 0.0. Will later be used for depth
clears.
2015-08-20 09:31:12 -07:00
Chad Versace
4aef5c62cd vk/meta: Restore all saved state in anv_cmd_buffer_restore()
anv_cmd_buffer_restore() did not restore the old
VkDynamicColorBlendState.
2015-08-20 09:30:34 -07:00
Chad Versace
9f908fcbde vk/meta: Use consistent names and types in anv_saved_state
In struct anv_saved_state, each member's type was a pointer to an Anvil
struct and each member's name was prefixed with "old" except cb_state,
which was a Vulkan handle whose name lacked "old".
2015-08-20 09:29:41 -07:00
Neil Roberts
49d9e89d00 Add mesa.icd to the .gitignore
Since 4d7e0fa8c7 this file is generated by the configure script.
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

(cherry picked from commit 885762e182)
2015-08-19 14:12:31 -07:00
Chad Versace
bd0aab9a58 vk/meta: Fix dest format of vkCmdCopyImage
The source image's format was incorrectly used for both the source view
and destination view. For vkCmdCopyImage to correctly translate formats,
the destination view's format must be that of the destination image's.
2015-08-18 12:44:06 -07:00
Chad Versace
b0875aa911 vk: Assert that swap chain format is a color format 2015-08-18 12:43:57 -07:00
Chad Versace
d52822541e vk/image: Don't set anv_surface_view::offset twice
It was set twice a few lines apart, and the second setting always
overrode the first.
2015-08-18 11:48:50 -07:00
Chad Versace
e7d3a5df5a vk/meta: Use anv_format_is_color()
That is, replace !anv_format_is_depth_or_stencil() with
anv_format_is_color(). That conveys the meaning better.
2015-08-18 11:48:48 -07:00
Chad Versace
50f7bf70da vk: Add anv_format_is_color() 2015-08-18 11:48:46 -07:00
Chad Versace
6ff95bba8a vk: Add anv_format reference to anv_render_pass_attachment
Change type of anv_render_pass_attachment::format from VkFormat to const
struct anv_format*.  This elimiates the repetitive lookups into the
VkFormat -> anv_format table when looping over attachments during
anv_cmd_buffer_clear_attachments().
2015-08-17 14:08:55 -07:00
Chad Versace
5a6b2e6df0 vk/image: Simplify stencil case for anv_image_create()
Stop creating a temporary VkImageCreateInfo with overriden
format=VK_FORMAT_S8_UINT. Instead, just pass the format override
directly to anv_image_make_surface().
2015-08-17 14:08:55 -07:00
Chad Versace
a9c36daa83 vk/formats: Add global pointer to anv_format for S8_UINT
Stencil formats are often a special case. To reduce the number of lookups
into the VkFormat-to-anv_format translation table when working with
stencil, expose the table's entry for VK_FORMAT_S8_UINT as global
variable anv_format_s8_uint.
2015-08-17 14:08:55 -07:00
Chad Versace
60c4ac57f2 vk: Add anv_format reference t anv_surface_view
Change type of anv_surface_view::format from VkFormat to const struct
anv_format*. This reduces the number of lookups in the VkFormat ->
anv_format table.
2015-08-17 14:08:55 -07:00
Chad Versace
c11094ec9a vk: Pass anv_format to anv_fill_buffer_surface_state()
This moves the translation of VkFormat to anv_format from
anv_fill_buffer_surface_state() to its caller.

A prep commit to reduce more VkFormat -> anv_format translations.
2015-08-17 14:08:55 -07:00
Chad Versace
ded736f16a vk: Add anv_format reference to anv_image
Change type of anv_image::format from VkFormat to const struct
anv_format*. This reduces the number of lookups in the VkFormat ->
anv_format table.
2015-08-17 14:08:55 -07:00
Chad Versace
4ae42c83ec vk: Store the original VkFormat in anv_format
Store the original VkFormat as anv_format::vk_format. This will be used
to reduce format indirection, such as lookups into the VkFormat ->
anv_format translation table.
2015-08-17 14:07:44 -07:00
Jason Ekstrand
e39e1f4d24 vk: Update .gitignore for the autogenerated spirv changes 2015-08-17 11:47:25 -07:00
Kristian Høgsberg Kristensen
aac6f7c3bb vk: Drop aub dumper and PCI ID override feature
These are now available in intel_aubdump from intel-gpu-tools.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-17 11:41:19 -07:00
Kristian Høgsberg Kristensen
6d09d0644b vk: Use anv_image_create() for creating dmabuf VkImage
We need to make sure we use the VkImage infrastructure for creating
dmabuf images.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-17 11:41:19 -07:00
Jason Ekstrand
0deae66eb1 vk: Add an _autogen suffix autogenerated spirv file names
This prevents make from stomping on nir_spirv.h
2015-08-17 11:40:16 -07:00
Jason Ekstrand
6a7ca4ef2c Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-08-17 11:25:03 -07:00
Jason Ekstrand
b4c02253c4 vk: Add four unit tests for our lock-free data-structures 2015-08-14 17:04:39 -07:00
Jason Ekstrand
16c5b9f4ed vk: Build a version of the driver for linking into unit tests 2015-08-14 17:04:39 -07:00
Kristian Høgsberg Kristensen
30d82136bb vk: Update generated headers
This update brings usable IVB/HSW RENDER_SURFACE_STATE structs
and adds more float fields that we previously failed to
recognize.
2015-08-12 21:05:32 -07:00
Kristian Høgsberg Kristensen
9564dd37a0 vk: Query aperture size up front in anv_physical_device_init()
We already query the device in various ways here and we can just also
get the aperture size.  This avoids keeping an extra drm fd open
during the life time of the driver.

Also, we need to use explicit 64 bit types for the aperture size, not
size_t.
2015-08-10 17:18:55 -07:00
Kristian Høgsberg Kristensen
8605ee60e0 vk: Share upload logic and add size assert
This lets us hit an assert if we exceed the block pool size instead of
GPU hanging.

Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
2015-08-10 17:17:45 -07:00
Jason Ekstrand
6757e2f75c vk/cmd_buffer: Allow for null VkCmdPool's 2015-08-04 14:01:08 -07:00
Kristian Høgsberg Kristensen
4b097d73e6 vk: Call anv_batch_emit_dwords() up front in anv_batch_emit()
This avoids putting a memory barrier between the template struct and
the pack function, which generates much better code.
2015-08-03 15:38:14 -07:00
Kristian Høgsberg Kristensen
fbb119061e vk: Update generated headers
This adds zeroing of reserved blocks of dwords and removes an
instruction definition.
2015-08-03 15:21:27 -07:00
Jason Ekstrand
facf587dea vk/allocator: Solve a data race in anv_block_pool
The anv_block_pool data structure suffered from the exact same race as the
state pool.  Namely, that the uniqueness of the blocks handed out depends
on the next_block value increasing monotonically.  However, this invariant
did not hold thanks to our block "return" concept.
2015-08-03 01:19:34 -07:00
Jason Ekstrand
5e5a783530 vk: Add and use an anv_block_pool_size() helper 2015-08-03 01:18:09 -07:00
Jason Ekstrand
56ce219493 vk/allocator: Make block_pool_grow take and return a size
It takes the old size as an argument and returns the new size as the return
value.  On error, it returns a size of 0.
2015-08-03 01:06:45 -07:00
Jason Ekstrand
fd64598462 vk/allocator: Fix a data race in the state pool
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything.  In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus.  If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:

A: Get an offset (next == end)
B: Get an offset (next > end)
A: Resize the pool (now next < end by a lot)
C: Get an offset (next < end)
B: Return the over-allocated offset
D: Get an offset

in which case D will get the same offset as C.  The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly and threads that over-allocate don't return anything and
just futex-wait.  Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.
2015-08-03 00:38:48 -07:00
Jason Ekstrand
481122f4ac vk/allocator: Make a few things more consistant 2015-08-03 00:35:19 -07:00
Jason Ekstrand
e65953146c vk/allocator: Use memory pools rather than (MALLOC|FREE)LIKE
We have pools, so we should be using them.  Also, I think this will help
keep valgrind from getting confused when we have to end up fighting with
system allocations such as those from malloc/free and mmap/munmap.
2015-07-31 10:38:28 -07:00
Jason Ekstrand
1920ef9675 vk/allocator: Add an anv_state_pool_finish function
Currently this is a no-op but it gives us a place to put finalization
things in the future.
2015-07-31 10:38:28 -07:00
Jason Ekstrand
930598ad56 vk/instance: valgrind-guard client-provided allocations 2015-07-31 10:38:23 -07:00
Jason Ekstrand
e40bdcef1f vk/device: Add anv_instance_alloc/free helpers
This way we can more consistently alloc/free the device and it will provide
us a better place to put valgrind hooks in the next patch
2015-07-31 10:14:17 -07:00
Jason Ekstrand
0f050aaa15 vk/device: Mark newly allocated memory as undefined for valgrind
This way valgrind still works even if the client gives us memory that has
been initialized or re-uses memory for some reason.
2015-07-31 09:44:42 -07:00
Jason Ekstrand
1f49a7d9fc vk/batch_chain: Decrement num_relocs instead of incrementing it 2015-07-31 09:11:47 -07:00
Jason Ekstrand
220a01d525 vk/batch_chain: Compute secondary exec mode after finishing the bo
Figuring out whether or not to do a copy requires knowing the length of the
final batch_bo.  This gets set by anv_batch_bo_finish so we have to do it
afterwards.  Not sure how this was even working before.
2015-07-31 08:52:30 -07:00
Jason Ekstrand
26ba0ad54d vk: Re-name command buffer implementation files
Previously, the command buffer implementation was split between
anv_cmd_buffer.c and anv_cmd_emit.c.  However, this naming convention was
confusing because none of the Vulkan entrypoints for anv_cmd_buffer were
actually in anv_cmd_buffer.c.  This changes it so that anv_cmd_buffer.c is
what you think it is and the internals are in anv_batch_chain.c.
2015-07-30 15:00:42 -07:00
Jason Ekstrand
e379cd9a0e vk/cmd_buffer: Add a simple command pool implementation 2015-07-30 14:55:49 -07:00
Jason Ekstrand
4c2a182a36 vk/cmd_buffer: Add support for zero-copy batch chaining 2015-07-30 14:22:17 -07:00
Jason Ekstrand
21004f23bf vk: Add initial support for secondary command buffers 2015-07-30 11:36:48 -07:00
Jason Ekstrand
5aee803b97 vk/cmd_buffer: Split batch chaining into a helper function 2015-07-30 11:34:58 -07:00
Jason Ekstrand
0c4a2dab7e vk/device: Make BATCH_SIZE a global #define 2015-07-30 11:34:09 -07:00
Jason Ekstrand
ace093031d vk/cmd_buffer: Add functions for cloning a list of anv_batch_bo's
We'll need this to implement secondary command buffers.
2015-07-30 11:32:27 -07:00
Jason Ekstrand
7af67e085f vk/reloc_list: Actually set the new length in reloc_list_grow 2015-07-30 11:29:55 -07:00
Jason Ekstrand
f15be18c92 util/list: Add list splicing functions
This adds functions for splicing one list into another.  These have
more-or-less the same API as the kernel list splicing functions.
2015-07-30 11:28:22 -07:00
Jason Ekstrand
e39d0b635c CLONE 2015-07-30 08:24:02 -07:00
Jason Ekstrand
82548a3aca vk/cmd_buffer: Invalidate texture cache in emit_state_base_address
Previously, the caller of emit_state_base_address was doing this.  However,
putting it directly in emit_state_base_address means that we'll never
forget the flush at the cost of one PIPE_CONTROL at the top every batch
(that should do nothing since the kernel just flushed for us).
2015-07-30 08:24:02 -07:00
Jason Ekstrand
56ce896d73 vk/cmd_buffer: Rename emit_batch_buffer_end to end_batch_buffer
This is more generic and doesn't imply that it emits MI_BATCH_BUFFER_END.
While we're at it, we'll move NOOP adding from bo_finish to
end_batch_buffer.
2015-07-30 08:24:02 -07:00
Jason Ekstrand
3ed9cea84d vk/cmd_buffer: Use an array to track all know anv_batch_bo objects
Instead of walking the list of batch and surface buffers, we simply keep
track of all known batch and surface buffers as we build the command
buffer.  Then we use this new list to construct the validate list.
2015-07-29 15:30:15 -07:00
Jason Ekstrand
0f31c580bf vk/cmd_buffer: Rework validate list creation
The algorighm we used previously required us to call add_bo in a particular
order in order to guarantee that we get the initial batch buffer as the
last element in the validate list.  The new algorighm does a recursive walk
over the buffers and then re-orders the list.  This should be much more
robust as we start to add circular dependancies in the relocations.
2015-07-29 15:16:54 -07:00
Jason Ekstrand
4fc7510a7c vk/cmd_buffer: Move emit_batch_buffer_end higher in the file 2015-07-29 12:01:08 -07:00
Jason Ekstrand
8208f01a35 vk/cmd_buffer: Store the relocation list in the anv_batch_bo struct
Before, we were doing this thing where we had one big relocation list for
the whole command buffer and each subbuffer took a chunk out of it.  Now,
we store the actual relocation list in the anv_batch_bo.  This comes at the
cost of more small allocations but makes a lot of things simpler.
2015-07-29 12:01:08 -07:00
Jason Ekstrand
7d50734240 vk/batch: Make relocs a pointer to a relocation list
Previously anv_batch.relocs was an actual relocation list.  However, this
is limiting if the implementation of the batch wants to change the
relocation list as the batch progresses.
2015-07-29 12:01:08 -07:00
Kristian Høgsberg Kristensen
fcea3e2d23 vk/headers: Update to new generated gen headers
This update fixes cases where a 48-bit address field was split into
two parts:

    __gen_address_type                           MemoryAddress;
    uint32_t                                     MemoryAddressHigh;

which cases this pack code to be generated:

   dw[1] =
       __gen_combine_address(data, &dw[1], values->MemoryAddress, dw1);

   dw[2] =
      __gen_field(values->MemoryAddressHigh, 0, 15) |
      0;

which breaks for addresses above 4G.

This update also fixes arrays of structs in commands and structs, for
example, we now have:

   struct GEN8_BLEND_STATE_ENTRY                Entry[8];

and the pack functions now write all dwords in the packet, making
valgrind happy.

Finally, we would try to pack 64 bits of blend state into a uint32_t -
that's also fixed now.
2015-07-29 11:02:33 -07:00
Jason Ekstrand
65f3d00cd6 vk/cmd_buffer: Update a comment 2015-07-29 08:33:56 -07:00
Jason Ekstrand
86a53d2880 vk/cmd_buffer: Use a doubly-linked list for batch and surface buffers
This is probably better than hand-rolling the list of buffers.
2015-07-28 17:47:59 -07:00
Jason Ekstrand
6aba52381a vk/aub: Use the data directly from the execbuf2
Previously, we were crawling through the anv_cmd_buffer datastructure to
pull out batch buffers and things.  This meant that every time something in
anv_cmd_buffer changed, we broke aub dumping.  However, aub dumping should
just dump the stuff the kernel knows about so we really don't need to be
crawling driver internals.
2015-07-28 16:53:45 -07:00
Jason Ekstrand
3c2743dcd1 vk/cmd_buffer: Pull the execbuf stuff into a substruct 2015-07-27 16:37:09 -07:00
Jason Ekstrand
4ced8650d4 vk/cmd_buffer: Move the remaining entrypoints into cmd_emit.c 2015-07-27 15:14:31 -07:00
Jason Ekstrand
d4c249364d vk/cmd_buffer: Move the re-emission of STATE_BASE_ADDRESS to the flushing code
This used to happen magically in cmd_buffer_new_surface_state_bo.  However,
according to Ken, STATE_BASE_ADDRESS is very gen-specific so we really
shouldn't have it in the generic data-structure code.
2015-07-27 15:05:06 -07:00
Jason Ekstrand
117d74b4e2 vk/cmd_buffer: Factor the guts of CmdBufferEnd into two helpers 2015-07-27 14:52:16 -07:00
Jason Ekstrand
8fb6405718 vk/cmd_buffer: Factor the guts of (Create|Reset|Destroy)CmdBuffer into helpers 2015-07-27 14:23:56 -07:00
Jason Ekstrand
80ad578c4e vk/private.h: Re-arrange and better comment anv_cmd_buffer 2015-07-27 12:40:43 -07:00
Jason Ekstrand
50e86b5777 vk: Actually advertise 0.138.1 at runtime 2015-07-23 10:44:27 -07:00
Jason Ekstrand
f884b500d0 vk/vulkan.h: Bump to the version 0.138.1 header
This doesn't actually require any implementation changes but it does change
an enum so it is ABI-incompatable with 0.138.0.
2015-07-23 10:38:22 -07:00
Jason Ekstrand
e99773badd vk: Add two more valgrind checks 2015-07-23 08:57:54 -07:00
Jason Ekstrand
b1fcc30ff0 vk/meta: Destroy shader modules 2015-07-22 17:51:26 -07:00
Jason Ekstrand
3460e6cb2f vk/device: Finish the scratch block pool on device destruction 2015-07-22 17:51:14 -07:00
Jason Ekstrand
867f6cb90c vk: Add a FreeDescriptorSets function 2015-07-22 17:33:09 -07:00
Jason Ekstrand
c9dc1f4098 vk/pipeline: Be more sloppy about shader entrypoint names
The CTS passes in NULL names right now.  It's not too hard to support that
as just "main".  With this, and a patch to vulkancts, we now pass all 6
tests.
2015-07-22 15:26:56 -07:00
Chad Versace
2c2233e328 vk: Prefix most filenames with anv
Jason started the task by creating anv_cmd_buffer.c and anv_cmd_emit.c.
This patch finishes the task by renaming all other files except
gen*_pack.h and glsl_scraper.py.
2015-07-17 20:25:38 -07:00
Chad Versace
f70d079854 vk/image: Remove unneeded data from anv_buffer_view
This completes the FINISHME to trim unneeded data from anv_buffer_view.

A VkExtent3D doesn't make sense for a VkBufferView.
So remove the member anv_surface_view::extent, and push it up to the two
objects that actually need it, anv_image_view and anv_attachment_view.
2015-07-17 14:48:23 -07:00
Chad Versace
194b77d426 vk: Document members of anv_surface_view 2015-07-17 14:39:05 -07:00
Chad Versace
169251bff0 vk: Remove more raw casts
This removes nearly all the remaining raw Anvil<->Vulkan casts from the
C source files.  (File compiler.cpp still contains many raw casts, and
I plan on ignoring that).

As far as I can tell, the only remaining raw casts are:
    anv_attachment_view -> anv_depth_stencil_view
    anv_attachment_view -> anv_color_attachment_view
2015-07-17 14:32:22 -07:00
Chad Versace
fc3838376b vk/image: Add braces around multi-line ifs 2015-07-17 13:38:09 -07:00
Connor Abbott
b2cfd85060 nir/spirv: don't declare builtin blocks
They aren't used, and the backend was barfing on them. Also, remove a
hack in in compiler.cpp now that they're gone.
2015-07-16 11:04:22 -07:00
Connor Abbott
b599735be4 nir/spirv: add support for loading UBO's
We directly emit ubo load intrinsics based off of the offset information
handed to us from SPIR-V.
2015-07-16 10:54:09 -07:00
Connor Abbott
513ee7fa48 nir/types: add more nir_type_is_xxx() wrappers 2015-07-15 21:58:32 -07:00
Connor Abbott
9fa0989ff2 nir: move to two-level binding model for UBO's
The GLSL layer above is still hacky, so we're really just moving the
hack into GLSL-to-NIR. I'd rather not go all the way and make GLSL
support the Vulkan binding model too, since presumably we'll be
switching to SPIR-V exclusively, and so working on proper GLSL support
will be a waste of time. For now, doing this keeps it working as we add
SPIR-V->NIR support though.
2015-07-15 17:18:48 -07:00
Chad Versace
5520221118 vk: Remove unneeded vulkan-138.h 2015-07-15 17:16:07 -07:00
Chad Versace
73a8f9543a vk: Bump vulkan.h version to 0.138 2015-07-15 17:16:07 -07:00
Chad Versace
55781f8d02 vk/0.138: Update VkResult values 2015-07-15 17:16:07 -07:00
Chad Versace
756d8064c1 vk/0.132: Do type-safety 2015-07-15 17:16:07 -07:00
Jason Ekstrand
927f54de68 vk/cmd_buffer: Move batch buffer padding to anv_batch_bo_finish() 2015-07-15 17:11:04 -07:00
Jason Ekstrand
9c0db9d349 vk/cmd_buffer: Rename bo_count to exec2_bo_count 2015-07-15 16:56:29 -07:00
Jason Ekstrand
6037b5d610 vk/cmd_buffer: Add a helper for allocating dynamic state
This matches what we do for surface state and makes the dynamic state pool
more opaque to things that need to get dynamic state.
2015-07-15 16:56:29 -07:00
Jason Ekstrand
7ccc8dd24a vk/private.h: Move cmd_buffer functions to near the cmd_buffer struct 2015-07-15 16:56:29 -07:00
Jason Ekstrand
d22d5f25fc vk: Split command buffer state into its own structure
Everything else in anv_cmd_buffer is the actual guts of the datastructure.
2015-07-15 16:56:29 -07:00
Jason Ekstrand
da4d9f6c7c vk: Move most of the anv_Cmd related stuff to its own file 2015-07-15 16:56:28 -07:00
Jason Ekstrand
d862099198 vk: Pull the guts of anv_cmd_buffer into its own file 2015-07-15 16:56:28 -07:00
Chad Versace
498ae009d3 vk/glsl: Replace raw casts
Needed for upcoming type-safety changes.
2015-07-15 15:51:37 -07:00
Chad Versace
6f140e8af1 vk/meta: Remove raw casts
Needed for upcoming type-safety changes.
2015-07-15 15:51:37 -07:00
Chad Versace
badbf0c94a vk/x11: Remove raw casts
The raw casts in the WSI functions will break the build when the
type-safety changes arrive.
2015-07-15 15:49:10 -07:00
Chad Versace
61a4bfe253 vk: Delete vkDbgSetObjectTag()
Because VkObject is going away.
2015-07-15 15:34:20 -07:00
Jason Ekstrand
e1c78ebe53 vk/device: Remove unneeded checks for NULL 2015-07-15 15:22:32 -07:00
Jason Ekstrand
f4748bff59 vk/device: Provide proper NULL handling in anv_device_free
The Vulkan spec does not specify that the free function provided to
CreateInstance must handle NULL properly so we do it in the wrapper.  If
this ever changes in the spec, we can delete the extra 2 lines.
2015-07-15 15:22:32 -07:00
Chad Versace
4c8e1e5888 vk: Stop internally calling anv_DestroyObject()
Replace each anv_DestroyObject() with anv_DestroyFoo().

Let vkDestroyObject() live for a while longer for Crucible's sake.
2015-07-15 15:11:16 -07:00
Chad Versace
f5ad06eb78 vk: Fix vkDestroyObject dispatch for VkRenderPass
It called anv_device_free() instead of anv_DestroyRenderPass().
2015-07-15 15:07:41 -07:00
Chad Versace
188f2328de vk: Fix vkCreate/DestroyRenderPass
While updating vkDestroyObject, I discovered that vkDestroyPass reliably
crashes. That hasn't been an issue yet, though, because it is never
called.

In vkCreateRenderPass:
    - Don't allocate empty attachment arrays.
    - Ensure that pointers to empty attachment arrays are NULL.
    - Store VkRenderPassCreateInfo::subpassCount as
      anv_render_pass::subpass_count.

In vkDestroyRenderPass:
    - Fix loop bounds: s/attachment_count/subpass_count/
    - Don't call anv_device_free on null pointers.
2015-07-15 15:07:41 -07:00
Chad Versace
c6270e8044 vk: Refactor create/destroy code for anv_descriptor_set
Define two new functions:
    anv_descriptor_set_create
    anv_descriptor_set_destroy
2015-07-15 14:31:22 -07:00
Chad Versace
365d80a91e vk: Replace some raw casts with safe casts
That is, replace some instances of
    (VkFoo) foo
with
    anv_foo_to_handle(foo)
2015-07-15 14:00:21 -07:00
Chad Versace
7529e7ce86 vk: Correct anv_CreateShaderModule's prototype
s/VkShader/VkShaderModule/

:sigh: I look forward to type-safety.
2015-07-15 13:59:47 -07:00
Chad Versace
8213be790e vk: Define struct anv_image_view, anv_buffer_view
Follow the pattern of anv_attachment_view. We need these structs to
implement the type-safety that arrived in the 0.132 header.
2015-07-15 12:19:29 -07:00
Chad Versace
43241a24bc vk/meta: Fix declared type of a shader module
s/VkShader/VkShaderModule/

I'm looking forward to a type-safe vulkan.h ;)
2015-07-15 11:49:37 -07:00
Chad Versace
94e473c993 vk: Remove struct anv_object
Trivial removal because vkDestroyObject() no longer uses it.
2015-07-15 11:29:43 -07:00
Jason Ekstrand
e375f722a6 vk/device: More documentation on surface state flushing 2015-07-15 11:09:02 -07:00
Connor Abbott
9aabe69028 vk/device: explain why a flush is necessary
Jason found this from experimenting, but the docs give a reasonable
explanation of why it's necessary.
2015-07-14 23:03:19 -07:00
Chad Versace
5f46c4608f vk: Fix indentation of anv_dynamic_cb_state 2015-07-14 18:19:10 -07:00
Chad Versace
0eeba6b80c vk: Add finishmes for VkDescriptorPool
VkDescriptorPool is a stub object. As a consequence, it's impossible to
free descriptor set memory.
2015-07-14 18:19:00 -07:00
Jason Ekstrand
2b5a4dc5f3 vk: Add vulkan-138 and remove vulkan-0.132
Now, 138 is the target and not 132.  Once object destruction is finished,
we can delete 138 as it will be identical to vulkan.h
2015-07-14 17:54:13 -07:00
Jason Ekstrand
1f658bed70 vk/device: Add stub support for command pools
Real support isn't really that far away.  We just need a data structure
with a linked list and a few tests.
2015-07-14 17:40:00 -07:00
Jason Ekstrand
ca7243b54e vk/vulkan.h: Add the stuff for cross-queue resource sharing
We only have one queue, so this is currently a no-op on our implementation.
2015-07-14 17:20:50 -07:00
Jason Ekstrand
553b4434ca vk/vulkan.h: Add a couple of size fields for specialization constants 2015-07-14 17:12:39 -07:00
Jason Ekstrand
e5db209d54 vk/vulkan.h: Move around buffer image granularities 2015-07-14 17:10:37 -07:00
Jason Ekstrand
c7fcfebd5b vk: Add stubs for all the sparse resource stuff 2015-07-14 17:06:11 -07:00
Jason Ekstrand
2a9136feb4 vk/image: Add a stub for the new ImageFormatProperties function
This lets the client query about things like multisample.  We don't do
multisample right now, so I'll let Chad deal with that when he gets to it.
2015-07-14 17:05:30 -07:00
Jason Ekstrand
2c4dc92f40 vk/vulkan.h: Rename FormatInfo to FormatProperties 2015-07-14 17:04:46 -07:00
Jason Ekstrand
d7f44852be vk/vulkan.h: Re-order some #define's 2015-07-14 16:41:39 -07:00
Jason Ekstrand
1fd3bc818a vk/vulkan.h: Rename a function parameter 2015-07-14 16:39:01 -07:00
Jason Ekstrand
2e2f48f840 vk: Remove abreviations 2015-07-14 16:34:31 -07:00
Jason Ekstrand
02db21ae11 vk: Add the new extension/layer enumeration entrypoints 2015-07-14 16:11:21 -07:00
Jason Ekstrand
a463eacb8f vk/vulkan.h: Change maxAnisotropy to a float 2015-07-14 15:04:11 -07:00
Jason Ekstrand
98957b18d2 vk/vulkan.h: Add the VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT flag 2015-07-14 15:03:39 -07:00
Jason Ekstrand
a35811d086 vk/vulkan.h: Rename a couple of function parameters
No functional change.
2015-07-14 15:03:01 -07:00
Jason Ekstrand
55723e97f1 vk: Split the memory requirements/binding functions 2015-07-14 14:59:39 -07:00
Jason Ekstrand
ccb2e5cd62 vk: Make barriers more precise (rev. 133) 2015-07-14 14:50:35 -07:00
Jason Ekstrand
30445f8f7a vk: Split the dynamic state binding function into one per state 2015-07-14 14:26:10 -07:00
Jason Ekstrand
d2c0870ff3 vk/vulkan.h: Rename a function parameter to match 132 2015-07-14 14:11:04 -07:00
Jason Ekstrand
8478350992 vk: Implement Multipass 2015-07-14 11:37:14 -07:00
Jason Ekstrand
68768c40be vk/vulkan.h: Re-arrange some enums and definitions in preparation for 131 2015-07-14 11:32:15 -07:00
Chad Versace
66cbb7f76d vk/0.132: Add vkDestroyRenderPass() 2015-07-14 11:21:31 -07:00
Chad Versace
6d0ed38db5 vk/0.132: Add vkDestroy*View()
vkDestroyColorAttachmentView
vkDestroyDepthStencilView

These functions are not in the 0.132 header, but adding them will help
us attain the type-safety API updates more quickly.
2015-07-14 11:19:22 -07:00
Chad Versace
1ca611cbad vk/0.132: Add vkDestroyCommandBuffer() 2015-07-14 11:11:41 -07:00
Chad Versace
6eec0b186c vk/0.132: Add vkDestroyImageView()
Just declare it in vulkan.h. Jason defined the function earlier
in image.c.
2015-07-14 11:09:14 -07:00
Chad Versace
4b2c5a98f0 vk/0.132: Add vkDestroyBufferView()
Just declare it in vulkan.h. Jason already defined the function
earlier in vulkan.c.
2015-07-14 11:06:57 -07:00
Chad Versace
08f7731f67 vk/0.132: Add vkDestroyFramebuffer() 2015-07-14 10:59:30 -07:00
Chad Versace
0c8456ef1e vk/0.132: Add vkDestroyDynamicDepthStencilState() 2015-07-14 10:54:51 -07:00
Chad Versace
b29c929e8e vk/0.132: Add vkDestroyDynamicColorBlendState() 2015-07-14 10:52:45 -07:00
Chad Versace
5e1737c42f vk/0.132: Add vkDestroyDynamicRasterState() 2015-07-14 10:51:08 -07:00
Chad Versace
d80fea1af6 vk/0.132: Add vkDestroyDynamicViewportState() 2015-07-14 10:42:45 -07:00
Chad Versace
9250e1e9e5 vk/0.132: Add vkDestroyDescriptorPool() 2015-07-14 10:38:22 -07:00
Chad Versace
f925ea31e7 vk/0.132: Add vkDestroyDescriptorSetLayout() 2015-07-14 10:36:49 -07:00
Chad Versace
ec5e2f4992 vk/0.132: Add vkDestroySampler() 2015-07-14 10:34:00 -07:00
Chad Versace
a684198935 vk/0.132: Add vkDestroyPipelineLayout() 2015-07-14 10:29:47 -07:00
Chad Versace
6e5ab5cf1b vk/0.132: Add vkDestroyPipeline() 2015-07-14 10:26:17 -07:00
Chad Versace
114015321e vk/0.132: Add vkDestroyPipelineCache() 2015-07-14 10:19:27 -07:00
Chad Versace
cb57bff36c vk/0.132: Add vkDestroyShader() 2015-07-14 10:16:22 -07:00
Chad Versace
8ae8e14ba7 vk/0.132: Add vkDestroyShaderModule() 2015-07-14 10:13:09 -07:00
Chad Versace
dd67c134ad vk/0.132: Add vkDestroyImage()
We only need to add it to vulkan.h because Jason defined the function
earlier in image.c.
2015-07-14 10:13:00 -07:00
Chad Versace
e18377f435 vk/0.132: Dispatch vkDestroyObject to new destructors
Oops. My recent commits added new destructors, but forgot to teach
vkDestroyObject about them. They are:
  vkDestroyFence
  vkDestroyEvent
  vkDestroySemaphore
  vkDestroyQueryPool
  vkDestroyBuffer
2015-07-14 09:58:22 -07:00
Chad Versace
e93b6d8eb1 vk/0.132: Add vkDestroyBuffer() 2015-07-14 09:47:45 -07:00
Chad Versace
584cb7a16f vk/0.132: Add vkDestroyQueryPool() 2015-07-14 09:44:58 -07:00
Chad Versace
68c7ef502d vk/0.132: Add vkDestroyEvent() 2015-07-14 09:33:47 -07:00
Chad Versace
549070b18c vk/0.132: Add vkDestroySemaphore() 2015-07-14 09:31:34 -07:00
Chad Versace
ebb191f145 vk/0.132: Add vkDestroyFence() 2015-07-14 09:29:35 -07:00
Chad Versace
435ccf4056 vk/0.132: Rename VkDynamic*State types
sed -i -e 's/VkDynamicVpState/VkDynamicViewportState/g' \
       -e 's/VkDynamicRsState/VkDynamicRasterState/g' \
       -e 's/VkDynamicCbState/VkDynamicColorBlendState/g' \
       -e 's/VkDynamicDsState/VkDynamicDepthStencilState/g' \
       $(git ls-files include/vulkan src/vulkan)
2015-07-13 16:19:28 -07:00
Connor Abbott
ffb51fd112 nir/spirv: update to SPIR-V revision 31
This means that now the internal version of glslangValidator is
required. This includes some changes due to the sampler/texture rework,
but doesn't actually enable anything more yet. We also don't yet handle
UBO's correctly, and don't handle matrix stride and row major/column
major yet.
2015-07-13 15:01:01 -07:00
Chad Versace
45f8723f44 vk/0.132: Move VkQueryControlFlags 2015-07-13 13:09:32 -07:00
Chad Versace
180c07ee50 vk/0.132: Move VkImageAspectFlags 2015-07-13 13:08:56 -07:00
Chad Versace
4b05a8cd31 vk/0.132: Move VkCmdBufferOptimizeFlags 2015-07-13 13:08:07 -07:00
Chad Versace
f1cf55fae6 vk/0.132: Move VkWaitEvent 2015-07-13 13:06:53 -07:00
Chad Versace
3112098776 vk/0.132: Move VkCmdBufferLevel 2015-07-13 13:06:33 -07:00
Chad Versace
c633ab5822 vk/0.132: Drop VK_ATTACHMENT_STORE_OP_RESOLVE_MSAA 2015-07-13 13:05:24 -07:00
Chad Versace
8f3b2187e1 vk/0.132: Rename bool32_t -> VkBool32
sed -i 's/bool32_t/VkBool32/g' \
  $(git ls-files src/vulkan include/vulkan)
2015-07-13 13:03:36 -07:00
Chad Versace
77dcfe3c70 vk/0.132: Remove stray typedef 2015-07-13 12:58:17 -07:00
Chad Versace
601d0891a6 vk/0.132: Move VKImageUsageFlags 2015-07-13 12:48:44 -07:00
Chad Versace
829810fa27 vk/0.132: Move VkImageType and VkImageTiling 2015-07-13 11:49:56 -07:00
Chad Versace
17c8232ecf vk/0.132: Import the 0.132 header
Import it as vulkan-0.132.h.
2015-07-13 11:47:12 -07:00
Chad Versace
a158ff55f0 vk/vulkan.h: Remove headers for old API versions
Remove the temporary headers for 0.90 and 0.130.
2015-07-13 11:46:30 -07:00
Chad Versace
1c4238a8e5 vk/0.130: Bump header version to 0.130
All APIs have been updated. This eliminates the diff between the
work-in-progress header and the 0.130 header.
2015-07-10 20:06:09 -07:00
Chad Versace
f43a304dc6 vk/0.130: Update vkAllocMemory to use VkMemoryType 2015-07-10 17:35:52 -07:00
Chad Versace
df2a013881 vk/0.130: Implement vkGetPhysicalDeviceMemoryProperties() 2015-07-10 17:35:52 -07:00
Chad Versace
c7f512721c vk/gem: Change signature of anv_gem_get_aperture()
Replace the anv_device parameter with anv_physical_device, because this needs
querying before vkCreateDevice.
2015-07-10 17:35:52 -07:00
Chad Versace
8cda3e9b1b vk/device: Add member anv_physical_device::fd
During anv_physical_device_init(), we opend the DRM device to do some
queries, then promptly closed it. Now we keep it open for the lifetime
of the anv_physical_device so that we can query it some more during
vkGetPhysicalDevice*Properties() [which will happen in follow-up
commits].
2015-07-10 17:35:52 -07:00
Chad Versace
4422bd4cf6 vk/device: Add func anv_physical_device_finish()
Because in a follow-up patch I need to do some non-trival teardown on
anv_physical_device. Currently, however, anv_physical_device_finish() is
currently a no-op that's just called in the right place.

Also, rename function fill_physical_device -> anv_physical_device_init
for symmetry.
2015-07-10 17:35:52 -07:00
Jason Ekstrand
7552e026da vk/device: Add an explicit destructor for RenderPass 2015-07-10 12:33:04 -07:00
Jason Ekstrand
8b342b39a3 vk/image: Add an explicit DestroyImage function 2015-07-10 12:30:58 -07:00
Jason Ekstrand
b94b8dfad5 vk/image: Add explicit constructors for buffer/image view types 2015-07-10 12:26:31 -07:00
Jason Ekstrand
18340883e3 nir: Add C++ versions of NIR_(SRC|DEST)_INIT 2015-07-10 11:57:33 -07:00
Chad Versace
9e64a2a8e4 mesa: Fix generation of git_sha1.h.tmp for gitlinks
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].

[1] A "gitlink" is a text file that specifies the real location of
    the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.

Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 75784243df)
2015-07-10 11:24:25 -07:00
Jason Ekstrand
19f0a9b582 vk/query.c: Use the casting functions 2015-07-09 20:32:44 -07:00
Jason Ekstrand
6eb221c884 vk/pipeline.c: Use the casting functions 2015-07-09 20:28:08 -07:00
Jason Ekstrand
fb4e2195ec vk/formats.c: Use the casting functions 2015-07-09 20:24:17 -07:00
Jason Ekstrand
a52e208203 vk/image.c: Use the casting functions 2015-07-09 20:24:07 -07:00
Jason Ekstrand
b1de1d4f6e vk/device.c: One more use of a casting function 2015-07-09 20:23:46 -07:00
Jason Ekstrand
8739e8fbe2 vk/meta.c: Use the casting functions 2015-07-09 20:16:13 -07:00
Jason Ekstrand
92556c77f4 vk: Fix the build 2015-07-09 18:59:08 -07:00
Jason Ekstrand
098209eedf device.c: Use the cast helpers a bunch of places 2015-07-09 18:49:43 -07:00
Jason Ekstrand
73f9187e33 device.c: Use the cast helpers 2015-07-09 18:41:27 -07:00
Jason Ekstrand
7d24fab4ef vk/private.h: Add a bunch of static inline casting functions
We will need these as soon as we turn on type saftey.  We might as well
define and start using them now rather than later.
2015-07-09 18:40:54 -07:00
Jason Ekstrand
5c49730164 vk/device.c: Fix whitespace issues 2015-07-09 18:20:28 -07:00
Jason Ekstrand
c95f9b61f2 vk/device.c: Use ANV_FROM_HANDLE a bunch of places 2015-07-09 18:20:10 -07:00
Jason Ekstrand
335e88c8ee vk/vulkan.h: Add the pEnabledFeatures field to DeviceCreateInfo 2015-07-09 16:21:31 -07:00
Jason Ekstrand
34871cf7f3 vk/vulkan.h: Change the MsCreateInfo structure to the 130 version
We do nothing with it at the moment, so this is a no-op.
2015-07-09 16:19:54 -07:00
Jason Ekstrand
8c2c37fae7 vk: Remove the old GetPhysicalDeviceInfo call 2015-07-09 16:14:37 -07:00
Jason Ekstrand
1f907011a3 vk: Add the new PhysicalDeviceQueue queries 2015-07-09 16:14:37 -07:00
Jason Ekstrand
977a469bce vk: Support GetPhysicalDeviceProperties 2015-07-09 16:14:37 -07:00
Jason Ekstrand
65e0b304b6 vk: Add support for GetPhysicalDeviceLimits 2015-07-09 16:14:37 -07:00
Jason Ekstrand
f6d51f3fd3 vk: Add GetPhysicalDeviceFeatures 2015-07-09 16:14:37 -07:00
Chad Versace
5b75dffd04 vk/device: Fix vkEnumeratePhysicalDevices()
The Vulkan spec says that pPhysicalDeviceCount is an out parameter if
pPhysicalDevices is NULL; otherwise it's an inout parameter.

Mesa incorrectly treated it unconditionally as an inout parameter, which
could have lead to reading unitialized data.
2015-07-09 15:53:21 -07:00
Chad Versace
fa915b661d vk/device: Move device enumeration to vkEnumeratePhysicalDevices()
Don't enumerate devices in vkCreateInstance(). That's where global,
device-independent initialization should happen. Move device enumeration
to the more logical location, vkEnumeratePhysicalDevices().
2015-07-09 15:41:17 -07:00
Chad Versace
c34d314db3 vk/device: Be consistent about path to DRM device
Function fill_physical_device() has a 'path' parameter, and struct
anv_physical_device has a 'path' member. Sometimes these are used;
sometimes hardcoded "/dev/dri/renderD128" is used instead.

Be consistent. Hardcode "/dev/dri/renderD128" in exactly one location,
during initialization of the physical device.
2015-07-09 15:27:26 -07:00
Connor Abbott
cff06bbe7d vk/compiler: create an empty parameters list
Prevents problems when initializing the sanity_param_count.
2015-07-09 14:29:23 -04:00
Connor Abbott
3318a86d12 nir/spirv: fix wrong writemask for ALU operations 2015-07-09 14:28:39 -04:00
Connor Abbott
b8fedc19f5 nir/spirv: fix memory context for builtin variable
Fixes valgrind errors with func.depthstencil.basic.
2015-07-08 22:03:30 -04:00
Connor Abbott
e4292ac039 nir/spirv: zero out value array
Before values are pushed or annotated with a name, decoration, etc.,
they need to have an invalid type, NULL name, NULL decoration, etc.
ralloc zero's everything by accident, so this wasn't an issue in
practice, but we should be explicitly zero'ing it.
2015-07-08 22:03:30 -04:00
Connor Abbott
997831868f vk/compiler: create the right kind of program struct
This fixes Valgrind errors and gets all the tests to pass with
--use-spir-v.
2015-07-08 22:03:30 -04:00
Connor Abbott
a841e2c747 vk/compiler: mark inputs/outputs as read/written
This doesn't handle inputs and outputs larger than a vec4, but we plan
to add a varyiing splitting/packing pass to handle those anyways.
2015-07-08 22:03:30 -04:00
Jason Ekstrand
8640dc12dc vk/vulkan.h: Copy the VkStructureType enum from version 130
We now have the exact same structs which require pType.
2015-07-08 17:45:52 -07:00
Jason Ekstrand
5a4ebf6bc1 vk: Move to the new pipeline creation API's 2015-07-08 17:30:18 -07:00
Chad Versace
4fcb32a17d vk/0.130: Remove VkImageViewCreateInfo::minLod
It's now set solely through VkSampler.
2015-07-08 14:48:22 -07:00
Jason Ekstrand
367b9ba78f vk/vulkan.h: Move renderPassContinue from GraphicsBeginInfo to BeginInfo 2015-07-08 14:37:30 -07:00
Jason Ekstrand
d29ec8fa36 vk/vulkan.h: Update to the new UpdateDescriptorSets api 2015-07-08 14:24:56 -07:00
Jason Ekstrand
c8577b5f52 vk: Add a macro for creating anv variables from vulkan handles
This is very helpful for doing the mass bunch of casts at the top of a
function.  It will also be invaluable when we get type saftey in the API.
2015-07-08 14:24:14 -07:00
Chad Versace
ccb27a002c vk/0.130 Update VkObjectType values
Don't import any new enum tokens from the 0.130 header. Just update the
values of existing enums. This reduces the diff by about 16 lines.
2015-07-08 12:53:49 -07:00
Chad Versace
8985dd15a1 vk/0.130: Remove VkDescriptorUpdateMode
Nowhere used.
2015-07-08 12:51:46 -07:00
Chad Versace
e02dfa309a vk/0.130: Remove VK_DEVICE_CREATE_MULTI_DEVICE_IQ_MATCH_BIT 2015-07-08 12:49:48 -07:00
Chad Versace
e9034ed875 vk/0.130: Update vkCmdBlitImage signature
Add VkTexFilter param. Ignored for now.
2015-07-08 12:47:48 -07:00
Jason Ekstrand
aae45ab583 vk/vulkan.h: Add packing parameters to BufferImageCopy 2015-07-08 11:51:34 -07:00
Chad Versace
b4ef7f354b vk/0.130: Remove msaa members of VkDepthStencilViewCreateInfo 2015-07-08 11:50:51 -07:00
Jason Ekstrand
522ab835d6 vk/vulkan.h: Move over to the new border color enums 2015-07-08 11:44:52 -07:00
Jason Ekstrand
7598329774 vk/vulkan.h: Move VkFormatProperties 2015-07-08 11:16:45 -07:00
Jason Ekstrand
52940e8fcf vk/vulkan.h: Add RenderPassBeginContents 2015-07-08 10:57:13 -07:00
Jason Ekstrand
e19d6be2a9 vk/vulkan.h: Add command buffer levels 2015-07-08 10:53:32 -07:00
Jason Ekstrand
c84f2d3b8c vk/vulkan.h: Import the VkPipeEvent enum from 130
Now, VkPipeEventFlags is back in sync with VkPipeEvent
2015-07-08 10:49:46 -07:00
Jason Ekstrand
b20cc72603 vk/vulkan.h: Remove VkFormatInfoType 2015-07-08 10:39:31 -07:00
Jason Ekstrand
8e05bbeee9 vk/vulkan.h: Update extension handling to rev 130 2015-07-08 10:38:07 -07:00
Jason Ekstrand
cc29a5f4be vk/vulkan.h: Move format quering to the physical device 2015-07-08 09:34:47 -07:00
Jason Ekstrand
719fa8ac74 vk/vulkan.h: Remove some peer opening structs and STRUCTURE_TYPE enums 2015-07-08 09:25:13 -07:00
Jason Ekstrand
fc6dcc6227 vk: Add a copy of the v90 header. 2015-07-08 09:23:29 -07:00
Jason Ekstrand
12119282e6 vk/vulkan.h: Remove an unneeded comment 2015-07-08 09:18:09 -07:00
Jason Ekstrand
3c65a1ac14 vk/vulkan.h: Remove the MemoryRange stubs and add sparse stubs 2015-07-08 09:16:48 -07:00
Jason Ekstrand
bb6567f5d1 vk/vulkan.h: Switch BindObjectMemory to a device function and remove the index 2015-07-08 09:04:16 -07:00
Jason Ekstrand
e7acdda184 vk/vulkan.h: Switch to the split ProcAddr functions in 130 2015-07-07 18:51:53 -07:00
Jason Ekstrand
db24afee2f vk/vulkan.h: Switch from GetImageSubresourceInfo to GetImageSubresourceLayout 2015-07-07 18:20:18 -07:00
Jason Ekstrand
ef8980e256 vk/vulkan.h: Switch from GetObjectInfo to GetMemoryRequirements 2015-07-07 18:16:42 -07:00
Jason Ekstrand
d9c2caea6a vk: Update memory flushing functions to 130
This involves updating the prototype for FlushMappedMemory, adding
InvalidateMappedMemoryRanges, and removing PinSystemMemory.
2015-07-07 17:22:31 -07:00
Jason Ekstrand
d5349b1b18 vk/vulkan.h: Constify the pFences parameter to ResetFences 2015-07-07 17:18:00 -07:00
Jason Ekstrand
6aa1b89457 vk/vulkan.h: Move the definitions of Create(Framebuffer|RenderPass)
This better matches the 130 header.
2015-07-07 17:13:10 -07:00
Jason Ekstrand
0ff06540ae vk: Implement the GetRenderAreaGranularity function
At the moment, we're just going to scissor clears so a granularity of 1x1
is all we need.
2015-07-07 17:11:37 -07:00
Jason Ekstrand
435b062b26 vk/vulkan.h: Add a PipelineLayout parameter to BindDescriptorSets 2015-07-07 17:06:10 -07:00
Jason Ekstrand
518ca9e254 vk/vulkan.h: Add a compareEnable parameter to SamplerCreateInfo
Our hardware doesn't actually need this, so adding it is a no-op.
2015-07-07 16:49:04 -07:00
Jason Ekstrand
672590710b vk/vulkan.h: Remove initialCount from SemaphoreCreateInfo 2015-07-07 16:42:42 -07:00
Jason Ekstrand
80046a7d54 vk/vulkan.h: Update clear color handling to 130 2015-07-07 16:37:43 -07:00
Jason Ekstrand
3e4b00d283 meta: Use the VkClearColorValue structure for the color attribute 2015-07-07 16:27:06 -07:00
Jason Ekstrand
a35fef1ab2 vk/vulkan.h: Remove the pass argument from EndRenderPass 2015-07-07 16:22:23 -07:00
Jason Ekstrand
d2ca7e24b4 vk/vulkan.h: Rename VertexInputStateInfo to VertexInputStateCreateInfo 2015-07-07 16:15:55 -07:00
Jason Ekstrand
abbb776bbe vk/vulkan.h: Remove programPointSize
Instead, we auto-detect whether or not your shader writes gl_PointSize.  If
it does, we use 1.0, otherwise we take it from the shader.
2015-07-07 16:00:46 -07:00
Chad Versace
e7ddfe03ab vk/0.130: Stub vkCmdClear*Attachment() funcs
vkCmdClearColorAttachment
vkCmdClearDepthStencilAttachment
2015-07-07 15:57:37 -07:00
Chad Versace
f89e2e6304 vk/0.130: Define enum VkImageAspectFlagBits 2015-07-07 15:57:37 -07:00
Chad Versace
55ab1737d3 vk/0.130: Define VkRect3D 2015-07-07 15:55:53 -07:00
Chad Versace
11901a9100 vk/0.130: Update name of vkCmdClearDepthStencilImage() 2015-07-07 15:53:35 -07:00
Chad Versace
dff32238c7 vk/0.130: Stub vkCmdExecuteCommands() 2015-07-07 15:51:55 -07:00
Chad Versace
85c0d69be9 vk/0.130: Update vkCmdWaitEvents() signature 2015-07-07 15:49:57 -07:00
Chad Versace
0ecb789b71 vk: Remove unused 'v' param from stub() macro 2015-07-07 15:47:24 -07:00
Chad Versace
f78d684772 vk: Stub vkCmdPushConstants() from 0.130 header 2015-07-07 15:46:19 -07:00
Chad Versace
18ee32ef9d vk: Update vkCmdPipelineBarrier to 0.130 header 2015-07-07 15:43:41 -07:00
Chad Versace
4af79ab076 vk: Add func anv_clear_mask()
A little helper func for inspecting and clearing bitmasks.
2015-07-07 15:43:41 -07:00
Jason Ekstrand
788a8352b9 vk/vulkan.h: Remove some unused fields.
In particular, the following are removed:

 - disableVertexReuse
 - clipOrigin
 - depthMode
 - pointOrigin
 - provokingVertex
2015-07-07 15:33:00 -07:00
Jason Ekstrand
7fbed521bb vk/vulkan.h: Remove the explicit primitive restart index
Unfortunately, this requires some non-trivial changes to the driver.  Now
that the primitive restart index isn't given explicitly by the client, we
always use ~0 for everything like D3D does.  Unfortunately, our hardware is
awesome and a 32-bit version of ~0 doesn't match any 16-bit values.  This
means, we have to set it to either UINT16_MAX or UINT32_MAX depending on
the size of the index type.  Since we get the index type from
CmdBindIndexBuffer and the rest of the VF packet from the pipeline, we need
to lazy-emit the VF packet.
2015-07-07 15:33:00 -07:00
Chad Versace
d6b840beff vk: Delete some comments not present in 0.130 header
Deleting the comments reduces diff noise.
2015-07-07 15:16:13 -07:00
Chad Versace
84a5bc25e3 vk: Pull in remaining 0.130 handle types
This pulls in the definition of VkShaderModule and VkPipelineCache,
which nowhere used yet.
2015-07-07 15:13:01 -07:00
Chad Versace
f2899b1af2 vk: Pull in #defines from 0.130 header
Despite not being used yet, pulling in the macros does diminish the
header diff.
2015-07-07 15:11:30 -07:00
Jason Ekstrand
962d6932fa vk/vulkan.h: Rename (min|max)Depth to (min|max)DepthBounds 2015-07-07 12:37:54 -07:00
Jason Ekstrand
1fb859e4b2 vk/vulkan.h: Remove client-settable pointSize from DynamicRsState 2015-07-07 12:35:32 -07:00
Jason Ekstrand
245583075c vk/vulkan.h: Remove UINT8 index buffers 2015-07-07 11:26:49 -07:00
Jason Ekstrand
0a42332904 vk/vulkan.h: Re-order the object declarations 2015-07-07 11:26:49 -07:00
Kristian Høgsberg Kristensen
a1eea996d4 vk: Emit 3DSTATE_SAMPLE_MASK
This was missing and was causing the driver to not work with
execlists. Presumably we get a different initial hw context with
execlists enabled, that has sample mask 0 initially.

Set this to 0xffff for now.  When we add MS support, we need to take the
value from VkPipelineMsStateCreateInfo::sampleMask.
2015-07-06 23:54:12 -07:00
Kristian Høgsberg Kristensen
c325bb24b5 vk: Pull in new generated headers
The new headers use stdbool for enable/disable fields which
implicitly converts expressions like (flags & 8) to 0 or 1.
Also handles MBO (must-be-one) fields by setting them to one,
corrects a bspec typo (_3DPRIM_LISTSTRIP_ADJ -> LINESTRIP) and
makes a few enum values less clashy.
2015-07-06 22:12:26 -07:00
Chad Versace
23075bccb3 vk/image: Validate vkCreateImageView more
Exhaustively validate the function input.  If it's not validated and
doesn't have an anv_finishme(), then I overlooked it.
2015-07-06 18:28:26 -07:00
Chad Versace
69e11adecc vk/image: Add more info to VkImageViewType table
Convert the table from the direct mapping
  VkImageViewType -> SurfaceType

into a mapping to an info struct
  VkImageViewType -> struct anv_image_view_info
2015-07-06 18:28:26 -07:00
Chad Versace
b844f542e0 vk: Update VkImageViewType to 0.130.0
This splits 1D and 1D_ARRAY, 2D and 2D_ARRAY, CUBE and CUBE_ARRAY.

The new tokens are unused. This is just a header update.
2015-07-06 18:28:26 -07:00
Chad Versace
5b04db71ff vk/image: Move validation for vkCreateImageView
Move the validation from anv_CreateImageView() and anv_image_view_init()
to anv_validate_CreateImageView(). No new validation is added.
2015-07-06 18:27:14 -07:00
Jason Ekstrand
1f1b26bceb vk/vulkan.h: Rename VkRect to VkRect2D 2015-07-06 17:47:18 -07:00
Jason Ekstrand
63c1190e47 vk/vulkan.h: Rename count to arraySize in VkDescriptorSetLayoutBinding 2015-07-06 17:43:58 -07:00
Jason Ekstrand
d84f3155b1 vk/vulkan.h: Remove the Vk(Memory|Semaphor|Image)OpenInfo structs
We already deleted the functions that need them.  The structs are just
dangling uselessly.
2015-07-06 17:37:13 -07:00
Jason Ekstrand
65f9ccb4e7 vk/vulkan.h: Remove VK_MEMORY_PROPERTY_PREFER_HOST_LOCAL_BIT
We weren't doing anything with it, so this is a no-op
2015-07-06 17:33:45 -07:00
Jason Ekstrand
68fa750f2e vk/vulkan.h: Replace DEVICE_COHERENT_BIT with DEVICE_NON_COHERENT_BIT 2015-07-06 17:32:28 -07:00
Jason Ekstrand
d5b5bd67f6 vk/vulkan.h: Use the query result bits from revision 130
None of the important bits or names actually changed.  It just
added/removed some no-op names.

No functional change.
2015-07-06 17:27:11 -07:00
Jason Ekstrand
d843418c2e vk/vulkan.h: One more quick enum refactor clean-up 2015-07-06 17:26:29 -07:00
Jason Ekstrand
2b37fc28d1 vk/vulkan.h: Get rid of VERTEX_INPUT_STEP_RATE_DRAW
We never supported it, so no functional change.
2015-07-06 17:24:26 -07:00
Jason Ekstrand
a75967b1bb vk/vulkan.h: Remove the CLEAR_OPTIMAL image layout 2015-07-06 17:21:19 -07:00
Jason Ekstrand
2b404e5d00 vk: Rename CPU_READ/WRITE_BIT to HOST_READ/WRITE_BIT 2015-07-06 17:18:25 -07:00
Jason Ekstrand
c57ca3f16f vk/vulkan.h: Remove VK_IMAGE_CREATE_CLONEABLE_BIT 2015-07-06 17:14:30 -07:00
Jason Ekstrand
2de388c49c vk: Remove SHAREABLE bits
They were removed from the Vulkan API and we don't really use them because
there are no multi-GPU i965 systems.
2015-07-06 17:12:51 -07:00
Jason Ekstrand
1b0c47bba6 vk/vulkan.h: Re-order the logic op enums 2015-07-06 17:08:11 -07:00
Jason Ekstrand
c7cef662d0 vk/vulkan.h: Reformat a bunch of enums to match revision 130
In theory, no functional change.
2015-07-06 17:06:02 -07:00
Jason Ekstrand
8c5e48f307 vk: Rename NUM_SHADER_STAGE to SHADER_STAGE_NUM
This is a refactor of more than just the header but it lets us finish
reformating the shader stage enum.
2015-07-06 16:43:28 -07:00
Jason Ekstrand
d9176f2ec7 vk: Reformat a bunch of enums
This accounts for a number differences between the generated headers and
the hand-written header.  Not all reformatting is done in this commit but
it does make the headers much more diffable.

In theory, no functional change.
2015-07-06 16:41:31 -07:00
Jason Ekstrand
e95bf93e5a vk: Pull the VkResult enum from revision 130 2015-07-06 16:15:12 -07:00
Jason Ekstrand
1b7b580756 vk: re-arrange enums to match the order in revision 130 2015-07-06 16:11:05 -07:00
Jason Ekstrand
2fb524b369 vk: Rename a parameter in CmdBindDynamicStateObject 2015-07-06 15:37:17 -07:00
Jason Ekstrand
c5ffcc9958 vk: Remove multi-device stuff 2015-07-06 15:34:55 -07:00
Jason Ekstrand
c5ab5925df vk: Remove ClearDescriptorSets 2015-07-06 15:32:40 -07:00
Jason Ekstrand
ea5fbe1957 vk: Remove begin/end descriptor pool update 2015-07-06 15:32:27 -07:00
Jason Ekstrand
9a798fa946 vk: Remove stub for CloneImageData 2015-07-06 15:30:05 -07:00
Jason Ekstrand
78a0d23d4e vk: Remove the stub support for memory priorities 2015-07-06 15:28:10 -07:00
Jason Ekstrand
11cf214578 vk: Remove the stub support for explicit memory references 2015-07-06 15:27:58 -07:00
Jason Ekstrand
0dc7d4ac8a vk/vulkan.h: Reformat structs to match revision 130
Structs in the old version were specified as

typedef struct VkSomeThing_
{
   type                                        field; // comment
} VkSomeThing;

However, in the generated headers, you have

typedef struct {
   type                                        field;
} VkSomeThing;

This commit also removes some unneeded whitespaces.
2015-07-06 15:19:12 -07:00
Jason Ekstrand
19aabb5730 vk/vulkah.h: Re-arrange structures to match the order in 130 2015-07-06 15:09:30 -07:00
Connor Abbott
f9dbc34a18 nir/spirv: fix some bugs 2015-07-06 15:00:37 -07:00
Connor Abbott
f3ea3b6e58 nir/spirv: add support for builtins inside structures
We may be able to revert this depending on the outcome of bug 14190, but
for now it gets vertex shaders working with SPIR-V.
2015-07-06 15:00:37 -07:00
Connor Abbott
15047514c9 nir/spirv: fix a bug with structure creation
We were creating 2 extra bogus fields.
2015-07-06 15:00:37 -07:00
Connor Abbott
73351c6a18 nir/spirv: fix a bad assertion in the decoration handling
We should be asserting that the parent decoration didn't hand us
a member if the child decoration did, but different child decorations
may obviously have different members.
2015-07-06 15:00:37 -07:00
Connor Abbott
70d2336e7e nir/spirv: pull out logic for getting builtin locations
Also add support for more builtins.
2015-07-06 15:00:37 -07:00
Connor Abbott
aca5fc6af1 nir/spirv: plumb through the type of dereferences
We need this to know if a deref is of a builtin.
2015-07-06 15:00:37 -07:00
Connor Abbott
66375e2852 nir/spirv: handle structure member builtin decorations 2015-07-06 15:00:37 -07:00
Connor Abbott
23c179be75 nir/spirv: add a vtn_type struct
This will handle decorations that aren't in the glsl_type.
2015-07-06 15:00:37 -07:00
Connor Abbott
f9bb95ad4a nir/spirv: move 'type' into the union
Since SSA values now have their own types, it's more convenient to make
'type' only used when we want to look up an actual SPIR-V type, since
we're going to change its type soon to support various decorations that
are handled at the SPIR-V -> NIR level.
2015-07-06 15:00:37 -07:00
Jason Ekstrand
d5dccc1e7a vk: Move CreateFramebuffer and CreateRenderPass higher in the header
This matches where they are in the 130 header.
2015-07-06 14:41:43 -07:00
Jason Ekstrand
4a42f45514 vk: Remove atomic counters stubs 2015-07-06 14:38:45 -07:00
Jason Ekstrand
630b19a1c8 vk: Make vulkan.h look more like vulkan-130.h
Most of these changes are insubstantial.  The only potentially substantial
cyhange is that we added a few new #defines for API maximums.
2015-07-06 14:32:52 -07:00
Jason Ekstrand
2f9180b1b2 vk: Add a revision 130 header along-side the current header 2015-07-06 14:16:51 -07:00
Jason Ekstrand
1f1465f077 vk/meta: Add an initial implementation of ClearColorImage 2015-07-02 18:15:06 -07:00
Jason Ekstrand
8a6c8177e0 vk/meta: Factor the guts out of cmd_buffer_clear 2015-07-02 18:13:59 -07:00
Jason Ekstrand
beb0e25327 vk: Roll back to API v90
This is what version 0.1 of the Vulkan SDK is built against.
2015-07-01 16:44:12 -07:00
Jason Ekstrand
fa663c27f5 nir/spirv: Add initial structure member decoration support 2015-07-01 15:38:26 -07:00
Jason Ekstrand
e3d60d479b nir/spirv: Make vtn_handle_type match the other handler functions
Previously, the caller of vtn_handle_type had to handle actually inserting
the type.  However, this didn't really work if the type was decorated in
any way.
2015-07-01 15:34:10 -07:00
Jason Ekstrand
7a749aa4ba nir/spirv: Add basic support for Op[Group]MemberDecorate 2015-07-01 14:18:07 -07:00
Jason Ekstrand
682eb9489d vk/x11: Allow for the client querying the size of the format properties 2015-07-01 14:18:07 -07:00
Chad Versace
bba767a9af vk/formats: Fix entry for S8_UINT
I forgot to update this when fixing the depth formats.
2015-06-30 09:41:44 -07:00
Chad Versace
6720b47717 vk/formats: Document new meaning of anv_format::cpp
The way the code currently works is that anv_format::cpp is the cpp of
anv_format::surface_format.

Me and Kristian disagree about how the code *should* work. Despite that,
I think it's in our discussion's best interest to document how the code
*currently* works. That should eliminate confusion.

If and when the code begins to work differently, then we'll update the
anv_format comments.
2015-06-30 09:41:41 -07:00
Chad Versace
709fa463ec vk/depth: Add a FIXME
3DSTATE_DEPTH_BUFFER.Width,Height are wrong.
2015-06-26 22:15:03 -07:00
Chad Versace
5b3a1ceb83 vk/image: Enable 2d single-sample color miptrees
What's been tested, for both image views and color attachment views:

    - VK_FORMAT_R8G8B8A8_UNORM
    - VK_IMAGE_VIEW_TYPE_2D
    - mipLevels: 1, 2
    - baseMipLevel: 0, 1
    - arraySize: 1, 2
    - baseArraySlice: 0, 1

What's known to be broken:

    - Depth and stencil miptrees. To fix this, anv_depth_stencil_view
      needs major rework.
    - VkImageViewType != 2D
    - MSAA

Fixes Crucible tests:

  func.miptree.view-2d.levels02.array01.*
  func.miptree.view-2d.levels01.array02.*
  func.miptree.view-2d.levels02.array02.*
2015-06-26 22:11:15 -07:00
Chad Versace
c6e76aed9d vk/image: Define anv_surface, refactor anv_image
This prepares for upcoming miptree support.

anv_surface is a proxy for color surfaces, depth surfaces, and stencil
surfaces.  Embed two instances of anv_surface into anv_image: the
primary surface (color or depth), and an optional stencil surface.
2015-06-26 21:45:53 -07:00
Chad Versace
127cb3f6c5 vk/image: Reformat function signatures
Reformat them to match Mesa code-style.
2015-06-26 20:12:42 -07:00
Chad Versace
fdcd71f71d vk/image: Embed VkImageCreateInfo* into anv_image_create_info
All function signatures that matched this pattern,
  old: f(const VkImageCreateInfo *, const struct anv_image_create_info *)

were rewritten as
  new: f(const struct anv_image_create_info *)
2015-06-26 20:06:08 -07:00
Chad Versace
ca6cef3302 vk/image: Drop some tmp vars in anv_image_view_init()
Variables 'tile_mode' and 'format' are unneeded.
2015-06-26 19:50:04 -07:00
Chad Versace
9c46ba9ca2 vk/image: Abort on stencil image views
The code doesn't work. Not even close.

Replace the broken code with a FINISHME and abort.
2015-06-26 19:23:21 -07:00
Chad Versace
667529fbaa vk: Reindent struct anv_image 2015-06-26 15:27:20 -07:00
Chad Versace
74e3eb304f vk: Define MIN(a, b) macro 2015-06-26 15:09:07 -07:00
Chad Versace
55752fe94a vk: Rename functions ALIGN_*32 -> align_*32
ALIGN_U32 and ALIGN_I32 are functions, not macros. So stop using
allcaps.
2015-06-26 15:07:59 -07:00
Connor Abbott
6ee082718f Merge branch 'wip/nir-vtn' into vulkan
Adds composites and matrix multiplication, plus some control flow fixes.
2015-06-26 12:14:05 -07:00
Chad Versace
37d6e04ba1 vk/formats: Remove the cpp=0 stencil hack
The format table defined cpp = 0 for stencil-only formats. The real cpp
is 1.

When code begins to lie, especially about stencil buffers, code becomes
increasingly fragile as time progresses, and the damage becomes
increasingly hard to undo. (For precedent, see the painful history of
stencil buffer cpp in the git log for gen6 and gen7 in the i965 driver).
Let's undo the stencil buffer cpp lie now to avoid future pain.

In the format table, set cpp = 1 for VK_FORMAT_S8; replace checks for
cpp == 0; and delete all comments about the hack.
2015-06-26 09:58:22 -07:00
Chad Versace
67a7659d69 vk/image: Refactor anv_image_create()
From my experience with intel_mipmap_tree.c, I learned that for struct's
like anv_image and intel_mipmap_tree, which have sprawling
multi-function construction codepaths, it's easy to mistakenly use
unitialized struct members during construction.

Let's eliminate the risk of using unitialized anv_image members during
construction.  Fill the struct at the function bottom instead of
piecemeal throughout the constructor.
2015-06-26 09:32:59 -07:00
Chad Versace
5d7103ee15 vk/image: Group some assertions closer together
In anv_image_create(), group together the assertions on
VkImageCreateInfo.
2015-06-26 09:05:46 -07:00
Chad Versace
0349e8d607 vk/formats: #undef fmt at end of format table 2015-06-26 07:38:02 -07:00
Chad Versace
068b8a41e2 vk: Fix comment for anv_depth_stencil_view::stencil_qpitch
s/DEPTH/STENCIL/
2015-06-26 07:31:57 -07:00
Chad Versace
7ea707a42a vk/image: Add qpitch fields to anv_depth_stencil_view
For now, hard-code them to 0.
2015-06-25 20:10:16 -07:00
Chad Versace
b91a76de98 vk: Reindent and document struct anv_depth_stencil_view 2015-06-25 20:10:16 -07:00
Chad Versace
ebe1e768b8 vk/formats: Fix incorrect depth formats
anv_format::surface_format was incorrect for Vulkan depth formats.
For example, the format table mapped

    VK_FORMAT_D24_UNORM -> .surface_format = D24_UNORM_X8_UINT
    VK_FORMAT_D32_FLOAT -> .surface_format = D32_FLOAT

but should have mapped

    VK_FORMAT_D24_UNORM -> .surface_format = R24_UNORM_X8_TYPELESS
    VK_FORMAT_D32_FLOAT -> .surface_format = R32_FLOAT

The Crucible test func.depthstencil.basic passed despite the bug, but
only because it did not attempt to texture from the depth surface.

The core problem is that RENDER_SURFACE_STATE.SurfaceFormat and
3DSTATE_DEPTH_BUFFER.SurfaceFormat are distinct types. Considering them
as enum spaces, the two enum spaces have incompatible collisions.

Fix this by adding a new field 'depth_format' to struct anv_format.

Refer to brw_surface_formats.c:translate_tex_format() for precedent.
2015-06-25 20:10:16 -07:00
Chad Versace
45b804a049 vk/image: Rename local variable in anv_image_create()
This function has many local variables for info structs. Having one
named simply 'info' is confusing.  Rename it to 'format_info'.
2015-06-25 20:10:16 -07:00
Chad Versace
528071f004 vk/formats: Fix table entry for R8G8B8_SNORM
Now that anv_formats[] is formatted like a table, buggy entries are
easier to see.
2015-06-25 20:10:16 -07:00
Chad Versace
4c8146313f vk/formats: Rename anv_format::format -> surface_format
I misinterpreted anv_format::format as a VkFormat. Instead, it is
a hardware surface format (RENDER_SURFACE_STATE.SurfaceFormat). Rename
the field to 'surface_format' to make it unambiguous.
2015-06-25 20:10:16 -07:00
Chad Versace
4b8b451a1d vk/formats: Rename anv_format::channels -> num_channels
I misinterpreted anv_format::channels as a bitmask of channels.
Renaming it to 'num_channels' makes it unambiguous.
2015-06-25 20:10:16 -07:00
Chad Versace
af0ade0d6c vk: Reindent struct anv_format 2015-06-25 20:10:16 -07:00
Chad Versace
ae29fd1b55 vk/formats: Don't abbreviate tokens in the format table
Abbreviating the VK_FORMAT_* tokens doesn't help much. To the contrary,
it means grep and ctags can't find them.
2015-06-25 20:10:16 -07:00
Jason Ekstrand
d5e41a3a99 vk/compiler: Add the initial hacks to get SPIR-V up and going 2015-06-25 17:36:35 -07:00
Jason Ekstrand
c4c1d96a01 HACK: Get rid of sanity_param_count for FS 2015-06-25 17:36:34 -07:00
Jason Ekstrand
4f5ef945e0 i965: Don't print the GLSL IR if it doesn't exist 2015-06-25 17:36:34 -07:00
Jason Ekstrand
588acdb431 nir/spirv: Set the right location for shader input/outputs
We need to add FRAG_RESULT_DATA0 etc. to the input/output location.
2015-06-25 17:36:34 -07:00
Jason Ekstrand
333b8ddd6b nir/spirv: Set the interface type on uniform blocks 2015-06-25 17:36:34 -07:00
Jason Ekstrand
7e1792b1b7 nir/spirv: Set the system value mode on builtins 2015-06-25 17:36:34 -07:00
Jason Ekstrand
b72936fdad nir/spirv: Actually put variables on the right linked list 2015-06-25 17:36:34 -07:00
Jason Ekstrand
ee0a8f23e4 glsl: Move vert_attrib varying_slot and frag_result enums to shader_enums.h 2015-06-25 17:36:34 -07:00
Chad Versace
fa352969a2 vk/image: Check extent does not exceed surface type limits 2015-06-25 16:53:24 -07:00
Chad Versace
99031aa0f3 vk/image: Stop hardcoding SurfaceType of VkImageView
Instead, translate VkImageViewType to a gen SurfaceType.
2015-06-25 16:53:22 -07:00
Chad Versace
7ea121687c vk/image: Add anv_image::surf_type
This the gen SurfaceType, such as SURFTYPE_2D.
2015-06-25 16:52:16 -07:00
Chad Versace
cb30acaced vk/image: Add tables for gen SurfaceType
Tables for mapping VkImageType and VkImageViewType to gen SurfaceType.
Tables are unused.
2015-06-25 16:52:16 -07:00
Chad Versace
1132080d5d vk/util: Add anv_loge() for logging error messages 2015-06-25 16:52:16 -07:00
Chad Versace
5f2d469e37 vk: Add func anv_is_aligned() 2015-06-25 16:52:16 -07:00
Chad Versace
f7fb7575ef vk: Add anv_minify() 2015-06-25 16:52:05 -07:00
Chad Versace
7cec6c5dfd vk: Define MAX(a, b) macro 2015-06-25 16:29:42 -07:00
Jason Ekstrand
d178e15567 nir/spirv: Fix up some dererf ralloc parenting 2015-06-24 21:39:07 -07:00
Jason Ekstrand
845002e163 i965/nir: Handle returns as long as they're at the end of a function 2015-06-24 21:38:49 -07:00
Jason Ekstrand
2ecac045a4 i965/nir: Split NIR shader handling into two functions
The brw_create_nir function takes a GLSL or ARB shader and turns it into a
NIR shader.  The guts of the optimization and lowering code is now split
into a new brw_process_shader function.
2015-06-24 21:22:07 -07:00
Jason Ekstrand
e369a0eb41 nir/spirv: Use vtn_ssa_value for texture coordinates 2015-06-24 20:39:37 -07:00
Jason Ekstrand
d0bd2bc604 nir/spirv: Add support for the Uniform storage class
This is kida sketchy.  I'm not really sure this is the way it's supposed to
be used.
2015-06-24 20:32:05 -07:00
Jason Ekstrand
ba0d9d33d4 nir/spirv: Add support for some more decorations including built-in 2015-06-24 20:30:32 -07:00
Jason Ekstrand
1bc0a1ad98 nir/spirv: Make the header file C++ safe 2015-06-24 19:01:10 -07:00
Jason Ekstrand
88d02a1b27 vk: Build xmlconfig stuff into libi965_compiler 2015-06-24 15:59:09 -07:00
Kristian Høgsberg Kristensen
24dff4f8fa vk/headers: Handle MBO fields
These must be set to one.
2015-06-24 09:37:50 -07:00
Jason Ekstrand
a62edcce4e Merge remote-tracking branch 'mesa-public/master' into vulkan 2015-06-23 18:05:25 -07:00
Connor Abbott
dee4a94e69 nir/vtn: add support for phi nodes 2015-06-23 10:34:55 -07:00
Connor Abbott
fe1269cf28 nir/builder: add support for inserting before/after blocks 2015-06-23 10:34:22 -07:00
Connor Abbott
9a3dda101e nir/vtn: fix emitting code after loops
When we're done emitting the code for a loop, we need to visit the new
break block, which is the merge block of the current loop, rather than
the old merge block, which is the merge block of the loop containing the
one we just emitted code for.
2015-06-22 13:53:08 -07:00
Connor Abbott
e9c21d0ca0 unbreak things 2015-06-22 11:59:55 -07:00
Kristian Høgsberg Kristensen
9b9f973ca6 vk: Implement scratch buffers to make spilling work 2015-06-19 15:42:15 -07:00
Kristian Høgsberg Kristensen
9e59003fb1 vk: Undo relocs for scratch bos 2015-06-19 15:42:15 -07:00
Kristian Høgsberg Kristensen
b20794cfa8 vk/allocator: Get rid of non-memfd path
We can just use modern valgrind now.
2015-06-19 15:42:15 -07:00
Kristian Høgsberg Kristensen
aba75d0546 vk/headers: Make General State offsets relocations 2015-06-19 15:42:15 -07:00
Connor Abbott
841aab6f50 matrices matrices matrices 2015-06-18 18:52:44 -07:00
Connor Abbott
d0fc04aacf nir/types: be less strict about constructing matrix types 2015-06-18 18:51:51 -07:00
Connor Abbott
22854a60ef nir/builder: add a nir_fdot() convenience function 2015-06-18 17:34:55 -07:00
Connor Abbott
0e86ab7c0a nir/types: add a helper to transpose a matrix type 2015-06-18 17:34:12 -07:00
Connor Abbott
de4c31a085 fix glsl450 for composites 2015-06-18 17:33:08 -07:00
Kristian Høgsberg Kristensen
aedd3c9579 vk: Add missing gen7 RENDER_SURFACE_STATE struct 2015-06-17 21:42:29 -07:00
Connor Abbott
bf5a615659 composites composites composites 2015-06-17 16:25:38 -07:00
Kristian Høgsberg Kristensen
fa8a07748d vk: Compute CS exec mask and thread width max in pipeline
We compute the right mask and thread width max parameters as part of
pipeline creation and set them accordingly at vkCmdDispatch() and
vkCmdDispatchIndirect() time. These parameters depend only on the local
group size and the dispatch width of the program so we can figure this
out at pipeline create time.
2015-06-12 18:21:50 -07:00
Kristian Høgsberg Kristensen
c103c4990c vk: Set binding table layout for CS
We weren't setting the binding table layout for the backend compiler.
2015-06-12 18:21:49 -07:00
Kristian Høgsberg Kristensen
2fdd17d259 vk: Generate CS prog_data into the pipeline instance
We were generating the prog_data into a local variable and never
initializing the pipeline->cs_prog_data one.
2015-06-12 18:21:49 -07:00
Kristian Høgsberg Kristensen
00494c6cb7 vk: Document how depth/stencil formats work in anv_image_create()
This reverts commits

  e17ed04 * vk/image: Don't double-allocate stencil buffers
  1ee2d1c * vk/image: Teach anv_image_choose_tile_mode about WMAJOR

and instead adds a comment to describe the subtlety of how we create
images for stencil only formats.
2015-06-11 22:07:16 -07:00
Kristian Høgsberg Kristensen
fbc9fe3c92 vk: Use compute pipeline layout when binding compute sets 2015-06-11 21:57:43 -07:00
Kristian Høgsberg Kristensen
765175f5d1 vk: Implement basic compute shader support 2015-06-11 15:31:42 -07:00
Kristian Høgsberg Kristensen
7637b02aaa vk: Emit PIPELINE_SELECT on demand 2015-06-11 15:21:49 -07:00
Kristian Høgsberg Kristensen
405697eb3d vk: Stop asserting we have a fragment shader
Even for graphics, this is not a requirement, we can have a depth-only output pipeline.
2015-06-11 15:07:38 -07:00
Kristian Høgsberg Kristensen
e7edde60ba vk: Defer setting viewport dynamic state
We can't emit this until we've done a 3D pipeline select.
2015-06-11 15:04:09 -07:00
Kristian Høgsberg Kristensen
f7fe06cf0a vk: Disable shader stages in the graphics pipeline batch
We need to move this into the graphics pipeline batch so we don't  emit it
for compute pipelines.
2015-06-11 14:58:31 -07:00
Kristian Høgsberg Kristensen
9aae480cc4 vk: Don't emit STATE_SIP
We don't have a SIP kernel and don't enable exceptions.
2015-06-11 14:56:29 -07:00
Kristian Høgsberg Kristensen
923e923bbc vk: Compile fragment shader after VS and GS
Just moving code around to do shader stages in the natual order.
2015-06-11 14:55:50 -07:00
Jason Ekstrand
1dd63fcbed vk/entrypoints: Don't print every single function call 2015-06-11 10:10:13 -07:00
Kristian Høgsberg Kristensen
b581e924b6 vk: Remove left-over trp call 2015-06-11 09:26:49 -07:00
Kristian Høgsberg Kristensen
d76ea7644a vk: Set maximum point size range
We set both minimum and maximum point size to 0 in 3DSTATE_CLIP, which
will clip away all points.
2015-06-11 09:25:04 -07:00
Kristian Høgsberg Kristensen
a5b49d2799 vk: Use generated headers with fixed point support
The generated headers now convert float in the template struct to the
correct fixed point format.
2015-06-11 09:25:04 -07:00
Kristian Høgsberg Kristensen
ea7ef46cf9 vk: Regenerate headers with __gen_validate_value() 2015-06-11 09:25:03 -07:00
Jason Ekstrand
a566b1e08a vk/formats: Refactor format properties code
Along with the refactor, we now do the right thing when we hit an
unsupported format: Set the flags to 0 and return VK_SUCCESS.
2015-06-11 09:11:16 -07:00
Jason Ekstrand
2a3c29698c vk/image: Add a bunch of asserts 2015-06-10 21:04:51 -07:00
Jason Ekstrand
c8b62d109b vk: Add a couple vk_error calls 2015-06-10 21:04:13 -07:00
Jason Ekstrand
7153b56abc vk/private: Add a non-fatal assert 2015-06-10 21:03:50 -07:00
Jason Ekstrand
29d2bbb2b5 vk/cmd: Add an initial implementation of PipelineBarrier
We may want to do something more inteligent here later such as actually
handling image layout transitions.  However, this should do for now.
2015-06-10 16:37:33 -07:00
Jason Ekstrand
047ed02723 vk/emit: Use valgrind to validate every packed field 2015-06-10 12:43:02 -07:00
Jason Ekstrand
9cae3d18ac vk: Add valgrind checks in various emit functions
The check in batch_bo_finish should catch any undefined values in the batch
but isn't that great for debugging.  The checks in the various emit
functions will help get better granularity.
2015-06-09 21:51:37 -07:00
Jason Ekstrand
d5ad24e39b vk: Move the valgrind include and VG() macro to private.h 2015-06-09 21:51:37 -07:00
Chad Versace
e17ed04b03 vk/image: Don't double-allocate stencil buffers
If the main surface has format S8_UINT, then don't allocate the
auxiliary stencil surface.
2015-06-09 16:39:28 -07:00
Chad Versace
1ee2d1c3fc vk/image: Teach anv_image_choose_tile_mode about WMAJOR 2015-06-09 16:38:55 -07:00
Chad Versace
2d2e148952 vk/util: Add anv_abortf(), anv_abortfv()
Convenience functions to print an error message then abort.
2015-06-09 16:38:50 -07:00
Chad Versace
ffb1ee5d20 vk: Define anv_noreturn macro 2015-06-09 16:38:46 -07:00
Chad Versace
f1db3b3869 vk/image: Factor tile mode selection into separate function
Because it will eventually need to get smarter.
2015-06-09 16:38:42 -07:00
Jason Ekstrand
11e941900a vk/device: Actually allow destruction 2015-06-09 16:28:46 -07:00
Jason Ekstrand
5d4b6a01af vk/cmd_buffer: Properly initialize/reset dynamic states 2015-06-09 16:27:55 -07:00
Jason Ekstrand
634a6150b9 vk/pipeline: Zero out the depth-stencil state when not in use 2015-06-09 16:26:55 -07:00
Jason Ekstrand
919e7b7551 vk/device: Use anv_CreateDynamicViewportState instead of the vk one 2015-06-09 16:01:56 -07:00
Jason Ekstrand
0599d39dd9 vk/device: Dedent the vkCreateDynamicViewportState call 2015-06-09 15:53:26 -07:00
Chad Versace
d57c4cf999 vk/util: Annotate anv_finishme() as printflike 2015-06-09 14:46:49 -07:00
Chad Versace
822cb16abe vk: Define anv_printflike() macro 2015-06-09 14:46:45 -07:00
Chad Versace
081f617b5a vk/image: Stop hardcoding alignment of stencil surfaces
Look up the alignment from anv_tile_info_table.
2015-06-09 14:16:56 -07:00
Chad Versace
e6bd568f36 vk/image: Rewrite tile info table
- Reduce the number of table lookups in anv_image_create from 4 to 1.
- Add field for surface alignment.
- Shorten field names tile_width, tile_height -> width, height.
2015-06-09 14:16:45 -07:00
Chad Versace
5b777e2bcf vk/image: Delete an old comment 2015-06-09 14:14:29 -07:00
Jason Ekstrand
d842a6965f vk/compiler: Free the GL errors data 2015-06-09 12:36:23 -07:00
Jason Ekstrand
9f292219bf vk/compiler: Free more of prog_data when tearing down a pipeline 2015-06-09 12:36:23 -07:00
Jason Ekstrand
66b00d5e5a vk/queue: Embed the queue in and allocate it with the device 2015-06-09 12:36:23 -07:00
Jason Ekstrand
38f5eef59d vk/device: Free border color states when we have valgrind 2015-06-09 12:36:23 -07:00
Jason Ekstrand
999b56c507 vk/device: Destroy all batch buffers
Due to a copy+paste error, we were destroying all but the first batch or
surface state buffer.  Now we destroy them all.
2015-06-09 12:36:23 -07:00
Jason Ekstrand
3a38b0db5f vk/meta: Clean up temporary objects 2015-06-09 12:36:23 -07:00
Jason Ekstrand
9d6f55dedf vk/surface_view: Add a destructor 2015-06-09 12:36:23 -07:00
Chad Versace
e6162c2fef vk/image: Add anv_image::h_align,v_align
Use the new fields to compute RENDER_SURFACE_STATE.Surface*Alignment.
We still hardcode them to 4, though.
2015-06-09 12:19:24 -07:00
Jason Ekstrand
58afc24e57 vk/allocator: Remove the concept of a slave block pool
This reverts commit d24f8245db.
2015-06-08 17:46:32 -07:00
Jason Ekstrand
b6363c3f12 vk/device: Remove the binding table pools/streams 2015-06-08 17:45:57 -07:00
Jason Ekstrand
531549d9fc vk/pipeline: Move freeing the program stream to pipeline.c
It's created in pipeline.c so we should free it there.
2015-06-08 14:27:04 -07:00
Jason Ekstrand
66a4dab89a vk/pipeline: Don't destroy the program stream
It's freed in compiler.cpp and we don't want to free it twice.
2015-06-08 13:53:19 -07:00
Jason Ekstrand
920fb771d4 vk/allocator: Make the use of NULL_BLOCK in state_stream_finish explicit 2015-06-08 13:53:19 -07:00
Kristian Høgsberg Kristensen
52637c0996 vk: Quiet a few warnings 2015-06-08 08:51:40 -07:00
Kristian Høgsberg Kristensen
9eab70e54f vk: Create a minimal context for the compiler
This avoids the full brw context initialization and just sets up context
constants, initializes extensions and sets a few driver vfuncs for the
front-end GLSL compiler.
2015-06-08 08:51:40 -07:00
Jason Ekstrand
ce00233c13 vk/cmd_buffer: Use the dynamic state stream in emit_dynamic and merge_dynamic 2015-06-05 17:26:41 -07:00
Jason Ekstrand
e69588b764 vk/device: Use a 64-byte alignment for CC state 2015-06-05 17:26:26 -07:00
Jason Ekstrand
c2eeab305b vk/pipeline: Actually free the program stream and dynamic pool 2015-06-05 17:26:26 -07:00
Jason Ekstrand
ed2ca020f8 vk/allocator: Avoid double-free in the bo pool 2015-06-05 17:12:28 -07:00
Jason Ekstrand
aa523d3c62 vk/gem: Call VALGRIND_FREELIKE_BLOCK before unmapping 2015-06-05 16:41:49 -07:00
Chad Versace
87d98e1935 vk: Fix 2 incorrect typecasts
The compiler didn't find the cast errors because all Vulkan types are
just integers.
2015-06-04 14:32:22 -07:00
Chad Versace
b981379bcf vk: Make make clean remove generated spirv headers 2015-06-04 14:26:46 -07:00
Jason Ekstrand
8d930da35d vk/allocator: Remove an unneeded VG() wrapper 2015-06-04 09:14:33 -07:00
Jason Ekstrand
7f90e56e42 vk/device: Dissalow device destruction 2015-06-04 09:14:33 -07:00
Chad Versace
9cd42b3dea vk: Fix build
Commit 1286bd, which deleted vk.c, broke the build. Update the Makefile
to fix it.
2015-06-04 09:01:30 -07:00
Jason Ekstrand
251aea80b0 vk/DS: Mask stencil masks to 8 bits 2015-06-03 16:59:13 -07:00
Connor Abbott
47bd462b0c awesome control flow bugfixes/clarifications 2015-06-03 14:10:28 -04:00
Kristian Høgsberg Kristensen
a37d122e88 vk: Set color/blend state in meta clear if not set yet 2015-06-02 23:08:05 -07:00
Kristian Høgsberg Kristensen
1286bd3160 vk: Delete vk.c test case
We now have crucible up and running and all vk sub-cases have been moved
over. Delete this crufty old hack of a test case.
2015-06-02 22:57:42 -07:00
Kristian Høgsberg Kristensen
2f6aa424e9 vk: Update generated headers with support for 64 bit fields 2015-06-02 22:57:42 -07:00
Kristian Høgsberg Kristensen
5744d1763c vk: Set cb_state to NULL at cmd buffer create time
Dynamic color/blend state can be NULL in case we're not rendering to
color targets (only output to depth and/or stencil). Initialize
cmd_buffer->cb_state to NULL so we can reliably detect whether it's been
set or not.
2015-06-02 22:57:42 -07:00
Kristian Høgsberg Kristensen
c8f078537e vk: Implement vertexOffset parameter of vkCmdDrawIndexed()
As exposed by the func.draw_indexed test, we were ignoring the argument
and hardcoding 0.
2015-06-02 22:57:42 -07:00
Jason Ekstrand
e702197e3f vk/formats: Add a name to the metadata and better logging 2015-06-02 11:30:39 -07:00
Jason Ekstrand
fbafc946c6 vk/formats: Rework the formats table 2015-06-02 11:30:39 -07:00
Kristian Høgsberg Kristensen
f98c89ef31 vk: Move query related functionality to new file query.c 2015-06-01 21:52:45 -07:00
Jason Ekstrand
08748e3a0c i965: Use NIR by default for vertex shaders on GEN8+
GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:

   total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
   instructions in affected programs:     1514770 -> 1454047 (-4.01%)
   helped:                                5813
   HURT:                                  1120

The gained programs are ARB vertext programs that were previously going
through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
programs can go through the scalar backend so they show up as "gained" in
the shader-db results.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-06-01 12:25:58 -07:00
Jason Ekstrand
d4cbf6a728 vk/compiler: Add an index_count to the bind map and check for OOB 2015-06-01 12:25:58 -07:00
Jason Ekstrand
510b5c3bed vk/HACK: Plumb real descriptor set/index into textures 2015-06-01 12:25:58 -07:00
Jason Ekstrand
aded32bf04 NIR: Add a helper for doing sampler lowering for vulkan 2015-06-01 12:25:58 -07:00
Kristian Høgsberg Kristensen
5caa408579 vk: Indent tables to align '=' at column 48 2015-05-31 22:36:26 -07:00
Kristian Høgsberg Kristensen
76bb658518 vk: Add support for anisotropic bits 2015-05-31 22:15:34 -07:00
Kristian Høgsberg Kristensen
dc56e4f7b8 vk: Implement support for sampler border colors
This supports the three Vulkan border color types for float color
formats. The support for integer formats is a little trickier, as we
don't know the format of the texture at this time.
2015-05-31 17:20:48 -07:00
Jason Ekstrand
e497ac2c62 vk/device: Only flush the texture cache when setting state base address
After further examination, it appears that the other flushes and stalls
weren't actually needed.
2015-05-30 18:04:50 -07:00
Jason Ekstrand
2251305e1a vk/cmd_buffer: Track descriptor set dirtying per-stage 2015-05-30 10:07:29 -07:00
Jason Ekstrand
33cccbbb73 vk/device: Emit PIPE_CONTROL flushes surrounding new STATE_BASE_ADDRESS
According to the bspec, you're supposed to emit a PIPE_CONTROL with a CS
stall and a render target flush prior to chainging STATE_BASE_ADDRESS.  A
little experimentation, however, shows that this is not enough.  It also
appears as if you have to flush the texture cache after chainging base
address or things won't propagate properly.
2015-05-30 08:08:07 -07:00
Jason Ekstrand
b2b9fc9fad vk/allocator: Don't call VALGRIND_MALLOCLIKE_BLOCK on fresh gem_mmap's 2015-05-29 21:15:47 -07:00
Jason Ekstrand
03ffa9ca31 vk: Don't crash on partial descriptor sets 2015-05-29 20:43:10 -07:00
Jason Ekstrand
4ffbab5ae0 vk/device: Allow for starting a new surface state buffer
This commit allows for us to create a whole new surface state buffer when
the old one runs out of room.  We simply re-emit the state base address for
the new state, re-emit binding tables, and keep going.
2015-05-29 17:49:41 -07:00
Jason Ekstrand
c4bd5f87a0 vk/device: Do lazy surface state emission for binding tables
Before, we were emitting surface states up-front when binding tables were
updated.  Now, we wait to emit the surface states until we emit the binding
table.  This makes meta simpler and should make it easier to deal with
swapping out the surface state buffer.
2015-05-29 16:51:11 -07:00
Kristian Høgsberg Kristensen
4aecec0bd6 vk: Store dynamic slot index with struct anv_descriptor_slot
We need to make sure we use the right index into dynamic offset
array. Dynamic descriptors can be present or not in different stages and
to get the right offset, we need to compute the index at
vkCreateDescriptorSetLayout time.
2015-05-29 11:32:53 -07:00
Kristian Høgsberg Kristensen
fad418ff47 vk: Implement dynamic buffer offsets
We do this by creating a surface state on the fly that incorporates the
dynamic offset. This patch also refactor the descriptor set layout
constructor a bit to be less clever with switch statement fall
through. Instead of duplicating the subtle code to update the sampler
and surface slot map, we just use two switch statements.
2015-05-28 22:41:20 -07:00
Jason Ekstrand
9ffc1bed15 vk/device: Split state base address emit into its own function 2015-05-28 15:34:08 -07:00
Jason Ekstrand
468c89a351 vk/device: Use anv_batch_emit for MI_BATCH_BUFFER_START 2015-05-28 15:25:02 -07:00
Jason Ekstrand
2dc0f7fe5b vk/device: Actually destroy batch buffers 2015-05-28 13:08:21 -07:00
Jason Ekstrand
8cf932fd25 vk/query: Don't emit a CS stall by itself
Both the bspec and the simulator don't like this.  I'm not sure if stalling
at the scoreboard is right but it at least shuts up the simulator.
2015-05-28 10:27:53 -07:00
Jason Ekstrand
730ca0efb1 vk/device: Fixups for batch buffer chaining
Some how these didn't get merged with the other batch buffer chaining
stuff.  Oh well, it's here now.
2015-05-28 10:26:11 -07:00
Jason Ekstrand
de221a672d meta: Add a default ds_state and use it when no ds state is set 2015-05-28 10:06:45 -07:00
Jason Ekstrand
6eefeb1f84 vk/meta: Share the dummy RS and CB state between clear and blit 2015-05-28 10:00:38 -07:00
Kristian Høgsberg Kristensen
5a317ef4cb vk: Initialize dynamic state binding points to NULL
We rely on these being initialized to NULL so meta can reliably detect
whether or not they've been set. ds_state is also allowed to not be
present so we need a well-defined value for that.
2015-05-27 22:13:48 -07:00
Chad Versace
1435bf4bc4 .gitignore: Ignore spirv2nir binary 2015-05-27 17:01:09 -07:00
Chad Versace
f559fe9134 .gitignore: Scope Vulkan's generated source files
Don't ignore any file named entrypoints.{c,h}. Ignore it only if it's in
src/vulkan.
2015-05-27 16:59:53 -07:00
Chad Versace
ca385dcf2a vk: gitignore generated source files 2015-05-27 16:57:31 -07:00
Chad Versace
466f61e9f6 vk/glsl_scraper: Replace adhoc arg parsing with argparse 2015-05-27 16:56:02 -07:00
Chad Versace
fab9011c44 vk/image: Assert that VkImageTiling is valid 2015-05-27 16:21:04 -07:00
Chad Versace
c0739043b3 vk/image: Remove trailing whitespace 2015-05-27 16:15:47 -07:00
Chad Versace
4514e63893 vk/glsl: Reject invalid options
The script incorrectly interpreted --blah as the input filename.
2015-05-27 16:14:26 -07:00
Chad Versace
fd8b5e0df2 vk/glsl_scraper: Indent large text blocks
Indent them to the same level as if the text was code.

No changes in entrypoints.{c,h} after a clean build.
2015-05-27 16:09:31 -07:00
Chad Versace
df4b02f4ed vk/glsl_scraper: Fix code style for imports
Python style is one module imported per line, and imports are at the top
of the file.
2015-05-27 16:04:12 -07:00
Jason Ekstrand
b23885857f vk/meta: Actually create the CB state for blits 2015-05-27 12:06:30 -07:00
Jason Ekstrand
da8f148203 vk: Rework anv_batch and use chaining batch buffers
This mega-commit primarily does two things.  First, is to turn anv_batch
into a better abstraction of a batch.  Instead of actually having a BO, it
now has a few pointers to some piece of memory that are used to add data to
the "batch".  If it gets to the end, there is a function pointer that it
can call to attempt to grow the batch.

The second change is to start using chained batch buffers.  When the end of
the current batch BO is reached, it automatically creates a new one and
ineserts an MI_BATCH_BUFFER_START command to chain to it.  In this way, our
batch buffers are effectively infinite in length.
2015-05-27 11:48:28 -07:00
Jason Ekstrand
59def43fc8 Fixup for growable reloc lists 2015-05-27 11:48:28 -07:00
Jason Ekstrand
1c63575de8 vk/cmd_buffer: Allocate the surface_bo from device->batch_bo_pool 2015-05-27 11:48:28 -07:00
Jason Ekstrand
403266be05 vk/device: Make reloc lists growable 2015-05-27 11:48:28 -07:00
Jason Ekstrand
5ef81f0a05 vk/device: Use a bo pool for batch buffers 2015-05-27 11:48:28 -07:00
Jason Ekstrand
6f3e3c715a vk/allocator: Add a BO pool 2015-05-27 11:48:28 -07:00
Jason Ekstrand
59328bac10 vk/allocator: Add a free list that acts on pointers instead of offsets 2015-05-27 11:48:28 -07:00
Kristian Høgsberg
a1d30f867d vk: Add support for dynamic and pipeline color blend state 2015-05-26 17:12:37 -07:00
Kristian Høgsberg
2514ac5547 vk/test: Create and use color/blend dynamic and pipeline state 2015-05-26 17:12:37 -07:00
Kristian Høgsberg
1cd8437b9d vk/meta: Allocate and set color/blend state
For color blend, we have to set our own state to avoid inheriting bogus
blend state.
2015-05-26 17:12:37 -07:00
Kristian Høgsberg
610e6291da vk: Allocate samplers from dynamic stream 2015-05-26 11:50:34 -07:00
Kristian Høgsberg
b29f44218d vk: Emit color calc state
This involves pulling stencil ref values out of DS dynamic state and the
blend constant out of CB dynamic state.
2015-05-26 11:27:31 -07:00
Kristian Høgsberg
5e637c5d5a vk/pack: Generate length macros for structs 2015-05-26 11:27:31 -07:00
Kristian Høgsberg
998837764f vk: Program depth bias
This makes 3DSTATE_RASTER a split state command.
2015-05-26 11:27:31 -07:00
Kristian Høgsberg
0dbed616af vk: Add support for texture component swizzle
This also drops the share create_surface_state helper and moves filling
out SURFACE_STATE directly into anv_image_view_init() and
anv_color_attachment_view_init().
2015-05-26 11:27:29 -07:00
Kristian Høgsberg
cbe7ed416e vk: Implement dynamic and pipeline ds state 2015-05-25 20:20:31 -07:00
Kristian Høgsberg
37743f90bc vk: Set up depth and stencil buffers 2015-05-25 20:20:31 -07:00
Kristian Høgsberg
7c0d0021eb vk/test: Add new depth-stencil test
Not yet a depth stencil test, but will become one.
2015-05-25 20:20:31 -07:00
Kristian Høgsberg
0997a7b2e3 vk: Add basic MOCS settings
This matches what we do for GL.
2015-05-25 20:20:31 -07:00
Kristian Høgsberg
c03314bdd3 vk: Update to header files with nested struct support
This will let us do MOCS settings right.
2015-05-25 20:20:31 -07:00
Jason Ekstrand
ae8c93e023 vk/cmd_buffer: Initialize the pipeline pointer to NULL
If a meta operation is called before the pipeline is set, this can cause
uses of undefined values.  They *should* be harmless, but we might as well
shut up valgrind on this one too.
2015-05-25 17:14:49 -07:00
Jason Ekstrand
912944e59d vk/device: Use the correct number of viewports when creating default VP state
Fixes valgrind uninitialized value errors
2015-05-25 17:14:49 -07:00
Jason Ekstrand
1b211feb6c vk/compiler: Zero out the vs_prog_data struct when VS is disabled
Prevents uninitialized value errors
2015-05-25 17:14:49 -07:00
Jason Ekstrand
903bd4b056 vk/compiler: Fix up the binding hack and make it work in NIR 2015-05-25 12:57:32 -07:00
Jason Ekstrand
57153da2d5 vk: Actually implement some sort of destructor for all object types 2015-05-22 15:15:08 -07:00
Jason Ekstrand
0f0b5aecb8 vk/pipeline: Track VB's that are actually used by the pipeline
Previously, we just blasted out whatever VB's we had marked as "dirty"
regardless of which ones were used by the pipeline.  Given that the stride
of the VB is embedded in the pipeline this can cause problems.  One problem
is if the pipeline doesn't use the given VB binding we emit a bogus stride.
Another problem is that we weren't properly resetting the dirty bits when
the pipeline changed.
2015-05-21 16:58:53 -07:00
Jason Ekstrand
0a54751910 vk/device: Memset descriptor sets to 0 and handle descriptor set holes 2015-05-21 16:33:04 -07:00
Jason Ekstrand
519fe765e2 vk: Do relocations in surface states when they are created
Previously, we waited until later and did a pass through the used surfaces
and did the relocations then.  This lead to doing double-relocations which
was causing us to get bogus surface offsets.
2015-05-21 15:55:29 -07:00
Jason Ekstrand
ccf2bf9b99 vk/test: Use the glsl_scraper for building shaders 2015-05-21 12:24:02 -07:00
Jason Ekstrand
f3d70e4165 vk/glsl_scraper: Use the LunarG back-door for GLSL source 2015-05-21 12:22:44 -07:00
Jason Ekstrand
cb56372eeb vk/glsl_scraper: Use a fake GLSL version that glslang will accept 2015-05-21 12:21:02 -07:00
Jason Ekstrand
0e441cde71 vk: Bake the GLSL_VK_SHADER macro into the scraper output file 2015-05-21 12:21:00 -07:00
Jason Ekstrand
f17e835c26 vk/meta: Use glsl_scraper for our GLSL source
We are not yet using SPIR-V for meta but this is a first step.
2015-05-21 11:39:54 -07:00
Jason Ekstrand
b13c0f469b vk: More out-of-tree build fixes 2015-05-21 11:32:59 -07:00
Jason Ekstrand
f294154e42 vk: Fix for out-of-tree builds 2015-05-21 10:23:18 -07:00
Kristian Høgsberg
f9e66ea621 vk: Remove render pass stub call
This isn't really a stub.
2015-05-20 20:34:52 -07:00
Kristian Høgsberg
a29df71dd2 vk: Add WSI implementation 2015-05-20 20:34:52 -07:00
Kristian Høgsberg
f886647b75 vk: Add debug stubs 2015-05-20 20:34:52 -07:00
Kristian Høgsberg
63da974529 vk: Mark remaining unsupported formats as such 2015-05-20 20:34:52 -07:00
Kristian Høgsberg
387a1bb58f vk: Mark VK_FORMAT_UNDEFINED as 1 cpp, 1 channel 2015-05-20 20:34:52 -07:00
Kristian Høgsberg
a1bd426393 vk: Stream surface state instead of using the surface pool
Since the binding table pointer is only 16 bits, we can only have 64kb
of binding table state allocated at any given time. With a block size of
1kb, that amounts to just 64 command buffers, which is not enough.
2015-05-20 20:34:52 -07:00
Kristian Høgsberg
01504057f5 vk: Use surface_format_info from dri driver for vkGetFormatInfo 2015-05-20 20:34:52 -07:00
Chad Versace
a61f307996 vk: Fix result of vkCreateInstance
When fill_physical_device() fails, don't return VK_SUCCESS.
2015-05-20 19:51:10 -07:00
Jason Ekstrand
14929046ba vk/compiler: Add shader language detection
This commit adds support for the LunarG GLSL back-door as well as detecting
regular GLSL and SPIR-V.  The SPIR-V path doesn't exist yet, so that will
cause an assert-fail.
2015-05-20 17:05:41 -07:00
Jason Ekstrand
47c1cf5ce6 vk/test: Add a test for testing buffer copies 2015-05-20 16:20:04 -07:00
Jason Ekstrand
bea66ac5ad vk/meta: Add support for copying arbitrary size buffers 2015-05-20 16:20:04 -07:00
Jason Ekstrand
9557b85e3d vk/meta: Use the biggest format possible for buffer copies
This should substantially improve throughput of buffer copies.
2015-05-20 16:20:04 -07:00
Jason Ekstrand
13719e9225 vk/meta: Fix buffer copy extents 2015-05-20 16:20:04 -07:00
Jason Ekstrand
d7044a19b1 vk/meta: Use texture() instead of texture2D() 2015-05-19 12:44:35 -07:00
Jason Ekstrand
edff076188 vk: Use binding instead of index in uniform layout qualifiers
This more closely matches what the Vulkan docs say to do.
2015-05-19 12:44:22 -07:00
Jason Ekstrand
e37a89136f vk/glsl_scraper: Add a --glsl-only option 2015-05-19 11:29:07 -07:00
Jason Ekstrand
4bcf58a192 vk/glsl_scraper: Use the line number from the end of the macro
We used to use the line number from the start of the macro but this doesn't
seem to match the c preprocessor
2015-05-19 11:29:07 -07:00
Jason Ekstrand
1573913194 vk/glsl_scraper: Don't open files until needed
This prevents us from writing an empty file when the compile failed.
2015-05-19 11:29:07 -07:00
Kristian Høgsberg
e4c11f50b5 vk: Call finish for binding table state stream 2015-05-18 21:12:13 -07:00
Jason Ekstrand
851495d344 vk/meta: Use the new *view_init functions and stack-allocated views
This should save us a good deal of the leakage that meta currently has.
2015-05-18 20:57:43 -07:00
Jason Ekstrand
4668bbb161 vk/image: Factor view creation out into separate *_init functions
The *_init functions work basically the same as the Vulkan entrypoints
except that they act on an already-created view and take an optional
command buffer option.  If a command buffer is given, the surface state is
allocated out of the command buffer's state stream.
2015-05-18 20:57:43 -07:00
Jason Ekstrand
7c9f209427 Revert "vk/allocator: Don't use memfd when valgrind is detected"
This reverts commit b6ab076d6b.

It turns out setting USE_MEMFD to 0 is really bad because it means we can't
resize the pool.  Besides, valgrind SVN handles memfd so we really don't
need this fallback for valgrind anymore.
2015-05-18 20:57:43 -07:00
Jason Ekstrand
923691c70d vk: Use a separate block pool and state stream for binding tables
The binding table pointers packet only allows for a 16-bit binding table
address so all binding tables have to be in the first 64 KB of the surface
state BO.  We solve this by adding a slave block pool that pulls off the
first 64 KB worth of blocks and reserves them for binding tables.
2015-05-18 20:57:43 -07:00
Jason Ekstrand
d24f8245db vk/allocator: Add a concept of a slave block pool
We probably need a better name but this will do for now.
2015-05-18 20:57:43 -07:00
Kristian Høgsberg
997596e4c4 vk/test: Add test that prints format features 2015-05-18 20:52:44 -07:00
Kristian Høgsberg
241b59cba0 vk/test: Test timestamps and occlusion queries 2015-05-18 20:52:44 -07:00
Kristian Høgsberg
ae9ac47c74 vk: Make timestamp command work correctly
This was using the wrong timestamp register and needs to write a 64 bit
value.
2015-05-18 20:52:43 -07:00
Kristian Høgsberg
82ddab4b18 vk: Make occlusion query work, both copy and get functions 2015-05-18 20:52:43 -07:00
Kristian Høgsberg
1d40e6ade8 vk: Update generated header files
This fixes a problem where register addresses where incorrectly shifted.
2015-05-18 20:52:43 -07:00
Kristian Høgsberg
f330bad545 vk: Only fill render targets for meta clear
Clear inherits the render targets from the current render pass. This
means we need to fill out the binding table after switching to meta
bindings. However, meta copies etc happen outside a render pass and
break when we try to fill in the render targets. This change fills the
render targets only for meta clear.
2015-05-18 20:52:43 -07:00
Jason Ekstrand
b6c7d8c911 vk/pipeline: Use a state_stream for storing programs
Previously, we were effectively using a state_stream, it was just
hand-rolled based on a block pool.  Now we actually use the data structure.
2015-05-18 15:58:20 -07:00
Jason Ekstrand
4063b7deb8 vk/allocator: Add support for valgrind tracking of state pools and streams
We leave the block pool untracked so that reads/writes to freed blocks
will get caught and do the tracking at the state pool/stream level.  We
have to do a few extra gymnastics for streams because valgrind works in
terms of poitners and we work in terms of separate map and offset.
Fortunately, the users of the state pool and stream should always be using
the map pointer provided in the anv_state structure.  We just have to
track, per block, the map that was used when we initially got the block.
Then we can make sure we always use that map and valgrind should stay
happy.
2015-05-18 15:58:20 -07:00
Jason Ekstrand
b6ab076d6b vk/allocator: Don't use memfd when valgrind is detected 2015-05-18 15:58:20 -07:00
Jason Ekstrand
682d11a6e8 vk/allocator: Assert that block_pool_grow succeeds 2015-05-18 15:48:19 -07:00
Jason Ekstrand
28804fb9e4 vk/gem: VG_CLEAR the padding for the gem_mmap struct 2015-05-18 12:05:17 -07:00
Jason Ekstrand
8440b13f55 vk/meta: Rework the indentation style
No functional change.
2015-05-18 10:43:51 -07:00
Kristian Høgsberg
5286ef7849 vk: Provide more realistic values for device info 2015-05-18 10:27:08 -07:00
Kristian Høgsberg
69fd473321 vk: Use a temporary buffer for formatting in finishme
This is more likely to avoid breaking up the message when racing with
other threads.
2015-05-18 10:27:08 -07:00
Jason Ekstrand
cd7ab6ba4e vk/meta: Add an initial implementation of vkCmdCopyBuffer
Compile-tested only
2015-05-18 10:27:08 -07:00
Jason Ekstrand
c25ce55fd3 vk/meta: Add an initial implementation of vkCmdCopyBufferToImage
Compile-tested only
2015-05-18 10:27:08 -07:00
Jason Ekstrand
08bd554cda vk/meta: Add an initial implementation of vkCmdBlitImage
Compile-tested only
2015-05-18 10:27:08 -07:00
Jason Ekstrand
fb27d80781 vk/meta: Add an initial implementation of vkCmdCopyImage
Compile-tested only
2015-05-18 10:27:08 -07:00
Jason Ekstrand
c15f3834e3 vk/gem: Set the gem_mmap.flags parameter to 0 if it exists 2015-05-18 10:27:08 -07:00
Jason Ekstrand
f7b0f922be vk/gem: Only VK_CLEAR the addr_ptr in gen_mmap 2015-05-18 10:27:07 -07:00
Kristian Høgsberg
ca7e62d421 vk: Add a logger wrapper for the generated entrypoint 2015-05-18 10:27:07 -07:00
Kristian Høgsberg
eb92745b2e vk/gem: Just return -1 from anv_gem_wait() on error
We were returning -errno, unlike all the other gem functions.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
05754549e8 vk: Fix vkGetOjectInfo return values
We weren't properly returning the allocation count.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
6afb26452b vk: Implement fences
This basic implementation uses a throw-away bo for synchronization.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
e26a7ffbd9 vk/meta: Use anv_* internal entrypoints 2015-05-18 10:27:07 -07:00
Kristian Høgsberg
b7fac7a7d1 vk: Implement allocation count query 2015-05-18 10:27:07 -07:00
Kristian Høgsberg
783e6217fc vk: Change pData/pDataSize semantics
We now always copy the entire struct unless pData is NULL and
unconditionally write back the struct size. It's not clear this is
useful if the structs may grow over time, but it seems to be the
expected behaviour for now.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
b4b3bd1c51 vk: Return VK_SUCCESS from vkAllocDescriptorSets
This should've been returning VK_SUCCESS all along.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
a9f2115486 vk: Return VK_SUCCESS for all descriptor pool entry points 2015-05-18 10:27:07 -07:00
Kristian Høgsberg
60ebcbed54 vk: Start Implementing vkGetFormatInfo()
We move the format table and vkGetFormatInfo to their own file in the
process.
2015-05-18 10:27:07 -07:00
Kristian Høgsberg
454345da1e vk: Add script for generating ifunc entry points
This lets us generate a hash table for vkGetProcAddress and lets us call
public functions internally without the public entrypoint overhead.
2015-05-18 10:27:02 -07:00
Kristian Høgsberg
333bcc2072 vk: Fix vulkan header inconsistency
The function pointer typedef and the function prototype for
vkCmdClearColorImage() didn't agree. Fix the typedef to match the
prototype.
2015-05-17 21:08:31 -07:00
Kristian Høgsberg
b9eb56a404 vk: Add function pointer typedef for intel extension
Also guard function prototype by VK_PROTOTYPES.
2015-05-17 21:08:30 -07:00
Kristian Høgsberg
75cb85c56a vk: Add missing VKAPI for vkQueueRemoveMemReferences 2015-05-17 21:08:30 -07:00
Jason Ekstrand
a924ea0c75 Merge remote-tracking branch 'fdo-personal/wip/nir-vtn' into vulkan
This adds the SPIR-V -> NIR translator.
2015-05-16 12:43:16 -07:00
Jason Ekstrand
a63952510d nir/spirv: Don't assert that the current block is empty
It's possible that someone will give us SPIR-V code in which someone
needlessly branches to new blocks.  We should handle that ok now.
2015-05-16 12:34:34 -07:00
Jason Ekstrand
4e44dcc312 nir/spirv: Add initial support for samplers 2015-05-16 12:34:15 -07:00
Jason Ekstrand
d6f52dfb3e nir/spirv: Move Exp and Log to the list of currently unhandled ALU ops
NIR doesn't have the native opcodes for them anymore
2015-05-16 12:33:32 -07:00
Jason Ekstrand
a53e795524 nir/types: Add support for sampler types 2015-05-16 12:32:58 -07:00
Jason Ekstrand
0fa9211d7f nir/spirv: Make the global constants in spirv.h static
I've been promissed in a bug that this will be fixed in a future version of
the header.  However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
2015-05-16 11:16:34 -07:00
Jason Ekstrand
036a4b1855 nir/spirv: Handle jump-to-loop in a more general way 2015-05-16 11:16:34 -07:00
Jason Ekstrand
56f533b3a0 nir/spirv: Handle boolean uniforms correctly 2015-05-16 11:16:34 -07:00
Jason Ekstrand
64bc58a88e nir/spirv: Handle control-flow with loops 2015-05-16 11:16:34 -07:00
Jason Ekstrand
3a2db9207d nir/spirv: Set a name on temporary variables 2015-05-16 11:16:34 -07:00
Jason Ekstrand
a28f8ad9f1 nir/spirv: Use the correct length for copying string literals 2015-05-16 11:16:34 -07:00
Jason Ekstrand
7b9c29e440 nir/spirv: Make vtn_ssa_value handle constants as well as ssa values 2015-05-16 11:16:33 -07:00
Jason Ekstrand
b0d1854efc nir/spirv: Add initial support for GLSL 4.50 builtins 2015-05-16 11:16:33 -07:00
Jason Ekstrand
1da9876486 nir/spirv: Split the core datastructures into a header file 2015-05-16 11:16:33 -07:00
Jason Ekstrand
98d78856f6 nir/spirv: Use the builder for all instructions
We don't actually use it to create all the instructions but we do use it
for insertion always.  This should make things far more consistent for
implementing extended instructions.
2015-05-16 11:16:33 -07:00
Jason Ekstrand
ff828749ea nir/spirv: Add support for a bunch of ALU operations 2015-05-16 11:16:33 -07:00
Jason Ekstrand
d2a7972557 nir/spirv: Add support for indirect array accesses 2015-05-16 11:16:33 -07:00
Jason Ekstrand
683c99908a nir/spirv: Explicitly type constants and SSA values 2015-05-16 11:16:33 -07:00
Jason Ekstrand
c5650148a9 nir/spirv: Handle OpBranchConditional
We do control-flow handling as a two-step process.  The first step is to
walk the instructions list and record various information about blocks and
functions.  This is where the acutal nir_function_overload objects get
created.  We also record the start/stop instruction for each block.  Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
2015-05-16 11:16:33 -07:00
Jason Ekstrand
ebc152e4c9 nir/spirv: Add a helper for getting a value as an SSA value 2015-05-16 11:16:33 -07:00
Jason Ekstrand
f23afc549b nir/spirv: Split instruction handling into preamble and body sections 2015-05-16 11:16:33 -07:00
Jason Ekstrand
ae6d32c635 nir/spirv: Implement load/store instructiosn 2015-05-16 11:16:33 -07:00
Jason Ekstrand
88f6fbc897 nir: Add a helper for getting the tail of a deref chain 2015-05-16 11:16:33 -07:00
Jason Ekstrand
06acd174f3 nir/spirv: Actaully add variables to the funciton or shader 2015-05-16 11:16:33 -07:00
Jason Ekstrand
5045efa4aa nir/spirv: Add a vtn_untyped_value helper 2015-05-16 11:16:33 -07:00
Jason Ekstrand
01f3aa9c51 nir/spirv: Use vtn_value in the types code and fix a off-by-one error 2015-05-16 11:16:33 -07:00
Jason Ekstrand
6ff0830d64 nir/types: Add an is_vector_or_scalar helper 2015-05-16 11:16:33 -07:00
Jason Ekstrand
5acd472271 nir/spirv: Add support for deref chains 2015-05-16 11:16:33 -07:00
Jason Ekstrand
7182597e50 nir/types: Add a scalar type constructor 2015-05-16 11:16:32 -07:00
Jason Ekstrand
eccd798cc2 nir/spirv: Add support for OpLabel 2015-05-16 11:16:32 -07:00
Jason Ekstrand
a6cb9d9222 nir/spirv: Add support for declaring functions 2015-05-16 11:16:32 -07:00
Jason Ekstrand
8ee23dab04 nir/types: Add accessors for function parameter/return types 2015-05-16 11:16:32 -07:00
Jason Ekstrand
707b706d18 nir/spirv: Add support for declaring variables
Deref chains and variable load/store operations are still missing.
2015-05-16 11:16:32 -07:00
Jason Ekstrand
b2db85d8e4 nir/spirv: Add support for constants 2015-05-16 11:16:32 -07:00
Jason Ekstrand
3f83579664 nir/spirv: Add basic support for types 2015-05-16 11:16:32 -07:00
Jason Ekstrand
e9d3b1e694 nir/types: Add more helpers for creating types 2015-05-16 11:16:32 -07:00
Jason Ekstrand
fe550f0738 glsl/types: Expose the function_param and struct_field structs to C
Previously, they were hidden behind a #ifdef __cplusplus so C wouldn't find
them.  This commit simpliy moves the ifdef.
2015-05-16 11:16:32 -07:00
Jason Ekstrand
053778c493 glsl/types: Add support for function types 2015-05-16 11:16:32 -07:00
Jason Ekstrand
7b63b3de93 glsl: Add GLSL_TYPE_FUNCTION to the base types enums 2015-05-16 11:16:32 -07:00
Jason Ekstrand
2b570a49a9 nir/spirv: Rework the way values are added
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value.  The
caller is then responsible for setting the various fields.
2015-05-16 11:16:32 -07:00
Jason Ekstrand
f9a31ba044 nir/spirv: Add stub support for extension instructions 2015-05-16 11:16:32 -07:00
Jason Ekstrand
4763a13b07 REVERT: Add a simple helper program for testing SPIR-V -> NIR translation 2015-05-16 11:16:32 -07:00
Jason Ekstrand
cae8db6b7e glsl/compiler: Move the error_no_memory stub to standalone_scaffolding.cpp 2015-05-16 11:16:32 -07:00
Jason Ekstrand
98452cd8ae nir: Add the start of a SPIR-V to NIR translator
At the moment, it can handle the very basics of strings and can ignore
debug instructions.  It also has basic support for decorations.
2015-05-16 11:16:32 -07:00
Jason Ekstrand
573ca4a4a7 nir: Import the revision 30 SPIR-V header from Khronos 2015-05-16 11:16:31 -07:00
Jason Ekstrand
057bef8a84 vk/device: Use bias rather than layers for computing binding table size
Because we statically use the first 8 binding table entries for render
targets, we need to create a table of size 8 + surfaces.
2015-05-16 10:42:53 -07:00
Jason Ekstrand
22e61c9da4 vk/meta: Make clear a no-op if no layers need clearing
Among other things, this prevents recursive meta.
2015-05-16 10:30:05 -07:00
Jason Ekstrand
120394ac92 vk/meta: Save and restore the old bindings pointer
If we don't do this then recursive meta is completely broken.  What happens
is that the outer meta call may change the bindings pointer and the inner
meta call will change it again and, when it exits set it back to the
default.  However, the outer meta call may be relying on it being left
alone so it uses the non-meta descriptor sets instead of its own.
2015-05-16 10:28:04 -07:00
Jason Ekstrand
4223de769e vk/device: Simplify surface_count calculation 2015-05-16 10:23:09 -07:00
Jason Ekstrand
eb1952592e vk/glsl_helpers: Fix GLSL_VK_SHADER with respect to commas
Previously, the GLSL_VK_SHADER macro didn't work if the shader contained
commas outside of parentheses due to the way the C preprocessor works.
This commit fixes this by making it variadic again and doing it correctly
this time.
2015-05-15 22:17:07 -07:00
Kristian Høgsberg
3b9f32e893 vk: Make cmd_buffer->bindings a pointer
This lets us save and restore efficiently by just moving the pointer to
a temporary bindings struct for meta.
2015-05-15 18:12:07 -07:00
Kristian Høgsberg
9540130c41 vk: Move vertex buffers into struct anv_bindings 2015-05-15 16:34:31 -07:00
Kristian Høgsberg
0cfc493775 vk: Fix GLSL_VK_SHADER macro
Stringify doesn't work with __ARGV__. The last macro argument swallows
up excess arguments and as such we can just stringify that.
2015-05-15 16:15:04 -07:00
Kristian Høgsberg
af45f4a558 vk: Fix warning from missing initializer
Struct initializers need to be { 0, } to zero out the variable they're
initializing.
2015-05-15 16:07:17 -07:00
Kristian Høgsberg
bf096c9ec3 vk: Build binding tables at bind descriptor time
This changes the way descriptor sets and layouts work so that we fill
out binding table contents at the time we bind descriptor sets. We
manipulate the binding table contents and sampler state in a shadow-copy
in anv_cmd_buffer. At draw time, we allocate the actual binding table
and sampler state and flush the anv_cmd_buffer copies.
2015-05-15 16:05:31 -07:00
Kristian Høgsberg
1f6c220b45 vk: Update the bind map length to reflect MAX_SETS 2015-05-15 15:22:29 -07:00
Kristian Høgsberg
b806e80e66 vk: Flip back to using memfd for the allocators 2015-05-15 15:22:29 -07:00
Kristian Høgsberg
0a775e1eab vk: Rename dyn_state_pool to dynamic_state_pool
Given that we already tolerate surface_state_pool and the even longer
instruction_state_pool, there's no reason to arbitrarily abbreviate
dynamic.
2015-05-15 15:22:29 -07:00
Kristian Høgsberg
f5b0f1351f vk: Consolidate image, buffer and color attachment views
These are all just surface state, offset and a bo.
2015-05-15 15:22:29 -07:00
Jason Ekstrand
41db8db0f2 vk: Add a GLSL scraper utility
This new utility, glsl_scraper.py scrapes C files for instances of the
GLSL_VK_SHADER macro, pulls out the shader source, and compiles it to
SPIR-V.  The compilation is done using glslValidator.  The result is then
placed into another C file as arrays of dwords that can be easiliy handed
to a Vulkan driver.
2015-05-14 19:18:57 -07:00
Jason Ekstrand
79ace6def6 vk/meta: Add a magic GLSL shader source macro 2015-05-14 19:07:34 -07:00
Jason Ekstrand
018a0c1741 vk/meta: Add a better comment about the VS for blits 2015-05-14 11:39:32 -07:00
Jason Ekstrand
8c92701a69 vk/test: Use VK_IMAGE_TILING_OPTIMAL for the render target 2015-05-13 22:27:38 -07:00
Jason Ekstrand
4fb8bddc58 vk/test: Do a copy of the RT into a linear buffer and write that to a PNG 2015-05-13 22:23:30 -07:00
Jason Ekstrand
bd5b76d6d0 vk/meta: Add the start of a blit implementation
Currently, we only implement CopyImageToBuffer
2015-05-13 22:23:30 -07:00
Jason Ekstrand
94b8c0b810 vk/pipeline: Default to a SamplerCount of 1 for PS 2015-05-13 22:23:30 -07:00
Jason Ekstrand
d3d4776202 vk/pipeline: Add an extra flag for force-disabling the vertex shader
This way we can pass in a vertex shader and yet have the pipeline emit an
empty 3DSTATE_VS packet.  We need this for meta because we need to trick
the compiler into not deleting our inputs but at the same time disable the
VS so that we can use a rectlist.  This should go away once we actually get
SPIR-V.
2015-05-13 22:23:30 -07:00
Jason Ekstrand
a1309c5255 vk/pass: Emit a flushing pipe control at the end of the pass
This is rather crude but it at least makes sure that all the render targets
get flushed at the end of the pass.  We probably actually want to do
somthing based on image layout traansitions, but this will work for now.
2015-05-13 22:23:30 -07:00
Jason Ekstrand
07943656a7 vk/compiler: Set the binding table texture_start
This is by no means a complete solution to the binding table problems.
However, it does make texturing actually work.  Before, we were texturing
from the render target since they were both starting at 0.
2015-05-13 22:23:30 -07:00
Jason Ekstrand
cd197181f2 vk/compiler: Zero the prog data
We use prog_data[stage] != NULL to determine whether or not we need to
clean up that stage.  Make sure it default to NULL.
2015-05-13 22:22:59 -07:00
Jason Ekstrand
1f7dcf9d75 vk/image: Stash more information in images and views 2015-05-13 22:22:59 -07:00
Jason Ekstrand
43126388cd vk/meta: Save/restore more stuff in cmd_buffer_restore 2015-05-13 22:22:59 -07:00
Chad Versace
50806e8dec vk: Install headers
I need this for building a testsuite.
2015-05-13 17:49:26 -07:00
Kristian Høgsberg
83c7e1f1db vk: Add support for sampler descriptors 2015-05-13 14:47:11 -07:00
Kristian Høgsberg
4f9eaf77a5 vk: Use a typesafe anv_descriptor struct 2015-05-13 14:47:11 -07:00
Kristian Høgsberg
5c9d77600b vk: Create and bind a sampler in vk.c 2015-05-13 14:47:11 -07:00
Kristian Høgsberg
18acfa7301 vk: Fix copy-n-paste sType in vkCreateSampler 2015-05-13 14:47:11 -07:00
Kristian Høgsberg
a1ec789b0b vk: Add a dynamic state stream to anv_cmd_buffer
We'll need this for sampler state.
2015-05-13 14:47:11 -07:00
Kristian Høgsberg
3f52c016fa vk: Move struct anv_sampler to private.h 2015-05-13 14:47:11 -07:00
Kristian Høgsberg
a77229c979 vk: Allocate layout->count number of descriptors
layout->count is the number of descriptors the application
requested. layout->total is the number of entries we need across all
stages.
2015-05-13 14:47:11 -07:00
Kristian Høgsberg
a3fd136509 vk: Fill out sampler state from API values 2015-05-13 14:47:11 -07:00
Chad Versace
828817b88f vk: Ignore vk executable 2015-05-13 12:05:38 -07:00
Kristian Høgsberg
2b7a060178 vk: Fix stale error handling in vkQueueSubmit 2015-05-12 14:38:58 -07:00
Kristian Høgsberg
cb986ef597 vk: Submit all cmd buffers passed to vkQueueSubmit 2015-05-12 14:38:12 -07:00
Kristian Høgsberg
9905481552 vk: Add generated header for HSW and IVB (GEN75 and GEN7) 2015-05-12 14:29:04 -07:00
Jason Ekstrand
ffe9f60358 vk: Add stub() and stub_return() macros and mark piles of functions as stubs 2015-05-12 13:45:02 -07:00
Jason Ekstrand
d3b374ce59 vk/util: Add a anv_finishme function/macro 2015-05-12 13:43:36 -07:00
Jason Ekstrand
7727720585 vk/meta: Break setting up meta clear state into it's own functin 2015-05-12 13:03:50 -07:00
Jason Ekstrand
4336a1bc00 vk/pipeline: Add support for disabling the scissor in "extra" 2015-05-12 12:53:01 -07:00
Kristian Høgsberg
d77c34d1d2 vk: Add clear load-op for render passes 2015-05-11 23:25:29 -07:00
Kristian Høgsberg
b734e0bcc5 vk: Add support for driver-internal custom pipelines
This lets us disable the viewport, use rect lists and repclear.
2015-05-11 23:25:29 -07:00
Kristian Høgsberg
ad132bbe48 vk: Fix 3DSTATE_VERTEX_BUFFER emission
Set VertexBufferIndex to the attribute binding, not the location.
2015-05-11 23:25:29 -07:00
Kristian Høgsberg
6a895c6681 vk: Add 32 bpc signed and unsigned integer formats 2015-05-11 23:25:29 -07:00
Kristian Høgsberg
55b9b703ea vk: Add anv_batch_emit_merge() helper macro
This lets us emit a state packet by merging to half-backed versions,
typically one from the pipeline object and one from a dynamic state
objects.
2015-05-11 23:25:28 -07:00
Kristian Høgsberg
099faa1a2b vk: Store bo pointer in anv_image and anv_buffer
We don't need to point back to the memory object the bo came from.
Pointing directly to a bo lets us bind images and buffers to other
bos - like our allocator bos.
2015-05-11 23:25:28 -07:00
Kristian Høgsberg
4f25f5d86c vk: Support not having a vertex shader
This lets us bypass the vertex shader and pass data straight into
the rasterizer part of the pipeline.
2015-05-11 23:25:28 -07:00
Kristian Høgsberg
20ad071190 vk: Allow NULL as a valid pipeline layout
Vertex buffers and render targets aren't part of the layout so having
an empty layout is pretty common.
2015-05-11 22:12:56 -07:00
Kristian Høgsberg
769785c497 Add vulkan driver for BDW 2015-05-09 11:38:32 -07:00
2601 changed files with 298042 additions and 65421 deletions

View File

@@ -5,6 +5,7 @@
(c-file-style . "stroustrup")
(fill-column . 78)
(eval . (progn
(c-set-offset 'case-label '0)
(c-set-offset 'innamespace '0)
(c-set-offset 'inline-open '0)))
)

3
.gitignore vendored
View File

@@ -34,6 +34,7 @@ aclocal.m4
config.log
config.status
cscope*
tags
.scon*
config.py
build
@@ -46,3 +47,5 @@ manifest.txt
Makefile
Makefile.in
.install-mesa-links
.install-gallium-links
/src/git_sha1.h

460
.mailmap Normal file
View File

@@ -0,0 +1,460 @@
Aapo Tahkola <aet@rasterburn.org> <aapo@aapo-desktop.(none)>
Adam Jackson <ajax@redhat.com> <ajax@benzedrine.nwnk.net>
Adam Jackson <ajax@redhat.com> <ajax@freedesktop.org>
Adrian Marius Negreanu <adrian.m.negreanu@intel.com> Adrian Negreanu <adrian.m.negreanu@intel.com>
Adrian Marius Negreanu <adrian.m.negreanu@intel.com> Negreanu Marius Adrian <adrian.m.negreanu@intel.com>
Dave Airlie <airlied@redhat.com> <airliedfreedesktop.org>
Dave Airlie <airlied@redhat.com> airlied <airlied@unused-12-215.bne.redhat.com>
Dave Airlie <airlied@redhat.com> <airlied@dhcp-1-203.bne.redhat.com>
Dave Airlie <airlied@redhat.com> <airlied@gmail.com>
Dave Airlie <airlied@redhat.com> <airlied@itt42.(none)>
Dave Airlie <airlied@redhat.com> <airlied@linux.ie>
Dave Airlie <airlied@redhat.com> <airlied@nx6125b.(none)>
Dave Airlie <airlied@redhat.com> <airlied@panoply-rh.(none)>
Dave Airlie <airlied@redhat.com> <airlied@ppcg5.localdomain>
Alan Coopersmith <alan.coopersmith@oracle.com> <alan.coopersmith@sun.com>
Alan Hourihane <alanh@vmware.com> <alanh@tungstengraphics.com>
Alan Hourihane <alanh@vmware.com> <alanh@fairlite.demon.co.uk>
Alan Hourihane <alanh@vmware.com> <alanh@jetpack.(none)>
Alexander Monakov <amonakov@gmail.com> <amonakov@ispras.ru>
Alexander von Gluck IV <kallisti5@unixzen.com> Alexander von Gluck <kallisti5@unixzen.com>
Alex Corscadden <alexc@vmware.com> <alexc@alexc-dev1.prom.eng.vmware.com>
Alex Corscadden <alexc@vmware.com> <alexc@alexc-dev1.vmware.com>
Alex Deucher <alexdeucher@gmail.com> <alexander.deucher@amd.com>
Alex Deucher <alexdeucher@gmail.com> <agd5f@yahoo.com>
Alex Deucher <alexdeucher@gmail.com> <alex@botch2.com>
Alex Deucher <alexdeucher@gmail.com> <alex@botch2.(none)>
Alex Deucher <alexdeucher@gmail.com> <alex@cube.(none)>
Alex Deucher <alexdeucher@gmail.com> <alex@samba.(none)>
Andreas Fänger <a.faenger@e-sign.com> <a.faenger@e-sign.com>
Andreas Hartmetz <ahartmetz@gmail.com> <andreas.hartmetz@kdab.com>
Andre Heider <a.heider@gmail.com>
Andreas Heider <andreas@heider.io>
Andreas Pokorny <andreas.pokorny@canonical.com> <andreas.pokorny@elektrobit.com>
Andrew Randrianasulu <randrianasulu@gmail.com> <randrik_a@yahoo.com>
Andrew Randrianasulu <randrianasulu@gmail.com> <randrik@mail.ru>
Arthur Huillet <arthur.huillet@free.fr> Arthur HUILLET <arthur.huillet@free.fr>
Benjamin Franzke <benjaminfranzke@googlemail.com> ben <benjaminfranzke@googlemail.com>
Ben Skeggs <bskeggs@redhat.com> <darktama@beleth.(none)>
Ben Skeggs <bskeggs@redhat.com> <darktama@iinet.net.au>
Ben Skeggs <bskeggs@redhat.com> <darktama@nisroch.keine.ath.cx>
Ben Skeggs <bskeggs@redhat.com> <skeggsb-at-gmail.com>
Ben Skeggs <bskeggs@redhat.com> <skeggsb@gmail.com>
Ben Skeggs <bskeggs@redhat.com> <skeggsb@localhost.localdomain>
Ben Skeggs <bskeggs@redhat.com> <skeggsb@nisroch.keine.ath.cx>
Ben Widawsky <benjamin.widawsky@intel.com> Ben Widawsky <ben@bwidawsk.net>
Blair Sadewitz <blair.sadewitz@gmail.com> Blair Sadewitz <blair.sadewitz.gmail.com>
Boris Peterbarg <reist@users.sourceforge.net> reist <reist>
Brian Paul <brianp@vmware.com> Brian <brian.paul@tungstengraphics.com>
Brian Paul <brianp@vmware.com> <brian.paul@tungstengraphics.com>
Brian Paul <brianp@vmware.com> <brian.e.paul@gmail.com>
Brian Paul <brianp@vmware.com> <brianp@kemper.freedesktop.org>
Brian Paul <brianp@vmware.com> brian <brian@cvp965.(none)>
Brian Paul <brianp@vmware.com> Brian <brian@i915.localnet.net>
Brian Paul <brianp@vmware.com> Brian <brian@nostromo.localnet.net>
Brian Paul <brianp@vmware.com> Brian <brian@poulsbo.localnet.net>
Brian Paul <brianp@vmware.com> Brian <brian@ps3.localnet.net>
Brian Paul <brianp@vmware.com> Brian <brianp@vmware.com>
Brian Paul <brianp@vmware.com> Brian <brian@yutani.localnet.net>
Brian Paul <brianp@vmware.com> root <brian.paul@tungstengraphics.com>
Brian Paul <brianp@vmware.com> root <root@i915.localnet.net>
Brian Paul <brianp@vmware.com> root <root@nostromo.localnet.net>
Brian Paul <brianp@vmware.com> root <root@i965.localnet.net>
Bruce Merry <bmerry@users.sourceforge.net> <bmerry@gmail.com>
Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <s3734770@mail.zih.tu-dresden.de>
Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <carli@carli-laptop.(none)>
Carl-Philip Hänsch <cphaensch@googlemail.com> Carl-Philip Haensch <Carl-Philip.Haensch@mailbox.tu-dresden.de>
Chad Versace <chad.versace@intel.com> <chad@chad-versace.us>
Chad Versace <chad.versace@intel.com> <Chad Versace chad@chad-versace.us>
Chad Versace <chad.versace@intel.com> <chad.versace@linux.intel.com>
Chia-I Wu <olvaffe@gmail.com> <olv@lunarg.com>
Chia-I Wu <olvaffe@gmail.com> Chia-Wu <olvaffe@gmail.com>
Chih-Wei Huang <cwhuang@linux.org.tw> Chih-Wei Huang <cwhuang@android-x86.org>
Christian König <christian.koenig@amd.com> Christian Koenig <christian.koenig@amd.com>
Christian König <christian.koenig@amd.com> Christian König <christian.koenig at amd.com>
Christian König <christian.koenig@amd.com> Christian König <deathsimple@vodafone.de>
Christoph Brill <egore911@egore911.de> Christoph Bill <egore@gmx.de>
Christoph Brill <egore911@egore911.de> <egore@gmx.de>
Christoph Bumiller <christoph.bumiller@speed.at> <e0425955@student.tuwien.ac.at>
Christopher James Halse Rogers <christopher.halse.rogers@canonical.com> Christopher James Halse Rogers <raof@ubuntu.com>
Claudio Ciccani <klan@directfb.org> <klan@users.sf.net>
Claudio Ciccani <klan@directfb.org> <klan@users.sourceforge.net>
Connor Abbott <cwabbott0@gmail.com> <connor.w.abbott@intel.com>
Connor Abbott <cwabbott0@gmail.com> <connor.abbott@intel.com>
Corbin Simpson <MostAwesomeDude@gmail.com> <mostawesomed...@gmail.com>
Corbin Simpson <MostAwesomeDude@gmail.com> <mostawesomedude@gmail.com>
Courtney Goeltzenleuchter <courtney@lunarg.com> <courtney@LunarG.com>
Daniel Skinner <sio@users.sourceforge.net> sio <sio>
Daniel Stone <daniels@collabora.com> <daniel@fooishbar.org>
David Miller <davem@davemloft.net> David S. Miller <davem@davemloft.net>
David Miller <davem@davemloft.net> Dave Miller <davem@davemloft.net>
David Miller <davem@davemloft.net> davem69 <davem69>
David Heidelberger <david.heidelberger@ixit.cz> David Heidelberg <david@ixit.cz>
David Heidelberger <david.heidelberger@ixit.cz> <d.okias@gmail.com>
David Reveman <reveman@chromium.org> <c99drn@cs.umu.se>
Dieter Nützel <Dieter@nuetzel-hh.de> Dieter Nützel <dieter@nuetzel-hh.de>
Dmitry Cherkassov <dcherkassov@gmail.com> Dmitry Cherkasov <dcherkassov@gmail.com>
Dylan Baker <dylanx.c.baker@intel.com> <baker.dylan.c@gmail.com>
Emeric Grange <emeric.grange@gmail.com> Emeric <emeric.grange@gmail.com>
Emil Velikov <emil.l.velikov@gmail.com> <emil.velikov@collabora.com>
Eric Anholt <eric@anholt.net> Eric Anholt <anholt@FreeBSD.org>
Eugeni Dodonov <eugeni.dodonov@intel.com> <eugeni@mandriva.com>
Fabian Bieler <der.fabe@gmx.net> <fabianbieler@fastmail.fm>
Fabian Bieler <der.fabe@gmx.net> <&lt;der.fabe@gmx.net&gt>
Feng, Haitao <haitao.feng@intel.com> Haitao Feng <haitao.feng@intel.com>
Frank Henigman <fjhenigman@google.com> <fjhenigman@chromium.org>
George Sapountzis <gsapountzis@gmail.com> George Sapountzis <gsap7@yahoo.gr>
Gwenole Beauchesne <gwenole.beauchesne@intel.com> <gb.devel@gmail.com>
Hamish Marson <hmarson@users.sourceforge.net> hmarson <hmarson>
Hans de Goede <hdegoede@redhat.com> Hans de Goede <j.w..r..degoede@hhs.nl>
Homer Hsing <dongsheng.xing@intel.com> <homer.hsing@gmail.com>
Hui Qi Tay <hqtay@vmware.com> <tayhuiqithq@gmail.com>
Ian Romanick <ian.d.romanick@intel.com> <idr@freedesktop.org>
Ian Romanick <ian.d.romanick@intel.com> <idr@us.ibm.com>
Jakob Bornecrantz <wallbraker@gmail.com> <jakob@vmware.com>
Jakob Bornecrantz <wallbraker@gmail.com> <jakob@aurora.(none)>
Jakob Bornecrantz <wallbraker@gmail.com> <jakob@aurora.walkyrie.se>
Jakob Bornecrantz <wallbraker@gmail.com> <jakob@tungstengraphics.com>
Jakob Bornecrantz <wallbraker@gmail.com> <wallbraker 'at' gmail 'dot' com>
Jakub Bogusz <qboosh@pld-linux.org> <gboosh@pld-linux.org>
James Legg <jlegg@feralinteractive.com> <lankyleggy@gmail.com>
Jan Vesely <jano.vesely@gmail.com> Jan Vesely <jan.vesely@rutgers.edu>
Jason Ekstrand <jason@jlekstrand.net> <jason.ekstrand@intel.com>
Jeremy Huddleston <jeremyhu@apple.com> <jeremyhu@freedesktop.org>
Jeremy Huddleston <jeremyhu@apple.com> <jeremy@tifa.local>
Jeremy Huddleston <jeremyhu@apple.com> <jeremy@vincent.local>
Jeremy Huddleston <jeremyhu@apple.com> <jeremy@yuffie.local>
Jeremy Huddleston <jeremyhu@apple.com> Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Jeremy Kolb <jkolb@freedesktop.org> <jkolb@brandeis.edu>
Jerome Glisse <jglisse@redhat.com> <glisse@freedesktop.org>
Jerome Glisse <jglisse@redhat.com> <glisse@kemper.freedesktop.org>
Jerome Glisse <jglisse@redhat.com> John Doe <glisse@barney.(none)>
Jerome Glisse <jglisse@redhat.com> John Doe <glisse@localhost.localdomain>
Jesse Barnes <jesse.barnes@intel.com> <jbarnes@hobbes.lan>
Jesse Barnes <jesse.barnes@intel.com> <jbarnes@hobbes.(none)>
Jesse Barnes <jesse.barnes@intel.com> <jbarnes@jbarnes-desktop.localdomain>
Jesse Barnes <jesse.barnes@intel.com> <jbarnes@jbarnes-t61.(none)>
Jesse Barnes <jesse.barnes@intel.com> <jbarnes@virtuousgeek.org>
Joakim Sindholt <bacn@zhasha.com> <opensource@zhasha.com>
Joakim Sindholt <bacn@zhasha.com> <zhasha@gallium-dev.(none)>
Jochen Gerlach <jtg@users.sourceforge.net> jtg <jtg>
Joel Bosveld <joel.bosveld@gmail.com> <Joel.Bosveld@gmail.com>
Jonathan Adamczewski <jadamcze@utas.edu.au> <jadamcze@utas.edu.a>
Jon Turney <jon.turney@dronecode.org.uk> Jon TURNEY <jon.turney@dronecode.org.uk>
José Fonseca <jfonseca@vmware.com> Jose Fonseca <jfonseca@vmware.com>
José Fonseca <jfonseca@vmware.com> Jose Fonseca <jrfonseca@tungstengraphics.com>
José Fonseca <jfonseca@vmware.com> <jfonseca@pegasus.(none)>
José Fonseca <jfonseca@vmware.com> <jfonseca@titan.(none)>
José Fonseca <jfonseca@vmware.com> <jose.r.fonseca@gmail.com>
José Fonseca <jfonseca@vmware.com> <jrfonseca@tungstengraphics.com>
José Fonseca <jfonseca@vmware.com> <j_r_fonseca@yahoo.co.uk>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> Jouk Jansen <jouk@hrem.nano.tudelft.nl>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> Jouk Jansen <joukj@hrem.stm.tudelft.nl>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> joukj <joukj@tarantella.(none)>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> Jouk <joukj@tarantella.nano.tudelft.nl>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> Jouk <joukj@tarantella.(none)>
Jouk Jansen <joukj@hrem.nano.tudelft.nl> J.Jansen <joukj@tarantella.nano.tudelft.nl>
Juan Zhao <juan.j.zhao@intel.com> <juan.j.zhao@linux.intel.com>
Julien Cristau <jcristau@debian.org> <julien.cristau@logilab.fr>
Julien Isorce <j.isorce@samsung.com> <julien.isorce@gmail.com>
Kalyan Kondapally <kalyan.kondapally@intel.com> <kondapallykalyancontribute@gmail.com>
Karl Schultz <karl.w.schultz@gmail.com> Karl Schultze <k.w.schultz@comcast.net>
Karl Schultz <karl.w.schultz@gmail.com> unknown <kwschult@.na.qualcomm.com>
Karl Schultz <karl.w.schultz@gmail.com> <k.w.schultz@comcast.net>
Karl Schultz <karl.w.schultz@gmail.com> <Karl.W.Schultz@gmail.com>
Karl Schultz <karl.w.schultz@gmail.com> <kschultz@freedesktop.org>
Keith Harrison <sio2@users.sourceforge.net> sio2 <sio2>
Keith Packard <keithp@keithp.com> <keithp@koto.keithp.com>
Keith Packard <keithp@keithp.com> <keithp@neko.keithp.com>
Keith Whitwell <keithw@vmware.com> <keith@tungstengraphics.com>
Keith Whitwell <keithw@vmware.com> keithw <keithw@keithw-laptop.(none)>
Kristian Høgsberg <krh@bitplanet.net> <krh@redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <krh@hinata.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <krh@sasori.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <krh@temari.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <kristian.h.kristensen@intel.com>
Krzesimir Nowak <qdlacz@gmail.com> <krzesimir@kinvolk.io>
Li Peng <peng.li@intel.com> <peng.li@linux.intel.com>
Lucas Stach <dev@lynxeye.de> <l.stach@pengutronix.de>
Maarten Lankhorst <maarten.lankhorst@ubuntu.com> <dev@mblankhorst.nl>
Maarten Lankhorst <maarten.lankhorst@ubuntu.com> <m.b.lankhorst@gmail.com>
Maarten Lankhorst <maarten.lankhorst@ubuntu.com> <maarten.lankhorst@canonical.com>
Maciej Cencora <m.cencora@gmail.com> <maciej@osiris.(none)>
Marc-André Lureau <marcandre.lureau@gmail.com> Marc-Andre Lureau <marcandre.lureau@gmail.com>
Marc Dietrich <marvin24@gmx.de> Marc <marvin24@gmx.de>
Marc Dietrich <marvin24@gmx.de> marvin24 <marvin24@gmx.de>
Marcin Ślusarz <marcin.slusarz@gmail.com> Marcin Slusarz <marcin.slusarz@gmail.com>
Marek Olšák <marek.olsak@amd.com> <maraeo@gmail.com>
Mario Kleiner <mario.kleiner.de@gmail.com> kleinerm <mario.kleiner@tuebingen.mpg.de>
Mario Kleiner <mario.kleiner.de@gmail.com> <mario.kleiner@tuebingen.mpg.de>
Mark Mueller <markkmueller@gmail.com> <MarkKMueller@gmail.com>
Marta Lofstedt <marta.lofstedt@intel.com> <marta.lofstedt@linux.intel.com>
Martin Peres <martin.peres@linux.intel.com> <martin.peres@labri.fr>
Mathias Fröhlich <mathias.froehlich@gmx.net> Mathias Froehlich <Mathias.Froehlich@gmx.net>
Mathias Fröhlich <mathias.froehlich@gmx.net> Mathias Froehlich <Mathias.Froehlich@web.de>
Mathias Fröhlich <mathias.froehlich@gmx.net> Mathias Frohlich <M.Froehlich@science-computing.de>
Mathias Fröhlich <mathias.froehlich@gmx.net> <frohlich8@users.sourceforge.net>
Mathias Fröhlich <mathias.froehlich@gmx.net> <Mathias.Froehlich@gmx.net>
Mathias Fröhlich <mathias.froehlich@gmx.net> <Mathias.Froehlich@web.de>
Mathias Fröhlich <mathias.froehlich@gmx.net> M.Froehlich@science-computing.de <M.Froehlich@science-computing.de>
Matthew W. S. Bell <matthew@bells23.org.uk> Matthew Bell <matthew@bells23.org.uk>
Maxence Le Doré <maxence.ledore@gmail.com> Maxence Le Dore <maxence.ledore@gmail.com>
Micah Fedke <micah.fedke@collabora.co.uk> <M.Fedke@Astronautics.com>
Michal Krol <michal@vmware.com> <michal@tungstengraphics.com>
Michal Krol <michal@vmware.com> Michal Krol <michal@ubuntu-vbox.(none)>
Michal Krol <michal@vmware.com> Michal Krol <mjkrol@gmail.org>
Michal Krol <michal@vmware.com> michal <michal@capacitor.(none)>
Michal Krol <michal@vmware.com> michal <michal@michal-laptop.(none)>
Michal Krol <michal@vmware.com> michal <michal@quad.(none)>
Michal Krol <michal@vmware.com> michal <michal@transistor.(none)>
Michal Krol <michal@vmware.com> Michal <michal@tungstengraphics.com>
Michal Krol <michal@vmware.com> michal <michal@wmvare.com>
Michel Dänzer <michel@daenzer.net> <michel.daenzer@amd.com>
Michel Dänzer <michel@daenzer.net> <daenzer@vmware.com>
Michel Dänzer <michel@daenzer.net> <michel@tungstengraphics.com>
Michel Dänzer <michel@daenzer.net> Michel Daenzer <michel.daenzer@amd.com>
Michel Dänzer <michel@daenzer.net> Michel Daenzer <daenzer@localhost.(none)>
Mike Kaplinskiy <mike.kaplinskiy@gmail.com> Mike Kaplinksiy <mike.kaplinskiy@gmail.com>
Mike Kaplinskiy <mike.kaplinskiy@gmail.com> <mike.kaplinskiy@gmai.com>
Mike Stroyan <mike@lunarg.com> <mike@LunarG.com>
Nian Wu <nian.wu@intel.com> <nian@graphics.(none)>
Nian Wu <nian.wu@intel.com> <nian@tinderbox.sh.intel.com>
Nick Bowler <nbowler@draconx.ca>
Nick Sarnie <commendsarnex@gmail.com>
Nicolai Hähnle <nicolai.haehnle@amd.com> <nhaehnle@gmail.com>
Nicolai Hähnle <nicolai.haehnle@amd.com> Nicolai Haehnle <nhaehnle@gmail.com>
Nicolai Hähnle <nicolai.haehnle@amd.com> Nicolai Haehnle <prefect_@gmx.net>
Nicolai Hähnle <nicolai.haehnle@amd.com> Nicolai Haehnle <prefect@upb.de>
Nigel Stewart <nigels@users.sourceforge.net> <nigels@sourceforge.net>
Nigel Stewart <nigels@users.sourceforge.net> <nstewart@nvidia.com>
nobled <nobled@dreamwidth.org> <nobled2@nobled2-karmic.(none)>
Oliver McFadden <oliver.mcfadden@linux.intel.com> <z3ro.geek@gmail.com>
Owain Ainsworth <zerooa@googlemail.com> Owain G. Ainsworth <oga@openbsd.org>
Owen W. Taylor <otaylor@fishsoup.net> Owen Taylor <otaylor@snell.localdomain>
Patrice Mandin <patmandin@gmail.com> <patrice@manoir.racoon.city>
Patrice Mandin <patmandin@gmail.com> <pmandin@caramail.com>
Patrice Mandin <patmandin@gmail.com> <pmandin@freedesktop.org>
Pauli Nieminen <pauli.nieminen@linux.intel.com> <suokkos@gmail.com>
Paulo Zanoni <paulo.r.zanoni@intel.com> Paulo Zanoni <pzanoni@mandriva.com>
Paul Seidler <sepek@exherbo.org> Paul Seidler <pl.seidler@googlemail.com>
Pekka Paalanen <pekka.paalanen@collabora.co.uk> <ppaalanen@gmail.com>
Pekka Paalanen <pekka.paalanen@collabora.co.uk> <pq@iki.fi>
Peter Hutterer <peter.hutterer@who-t.net> <peter@cs.unisa.edu.au>
Pierre-Eric Pelloux-Prayer <pelloux@gmail.com> pepp <pelloux@gmail.com>
Pierre Willenbrock <pierre@pirsoft.de> Pierre Willenbrok <pierre@pirsoft.de>
Quentin Glidic <sardemff7+git@sardemff7.net> <sardemff7@sardemff7.net>
RALOVICH, Kristóf <tade60@freemail.hu> <kristof.ralovich@gmail.com>
Richard Li <richardradeon@gmail.com> <RichardZ.Li@amd.com>
# The next ones are not 100% sure
Richard Li <richardradeon@gmail.com> richard <richard@richard-desktop3.(none)>
Richard Li <richardradeon@gmail.com> richard <richard@richard-desktop.(none)>
Richard Li <richardradeon@gmail.com> root <root@richard-desktop.(none)>
Richard Sandiford <rsandifo@linux.vnet.ibm.com> <r.sandiford@uk.ibm.com>
Rob Clark <robclark@freedesktop.org> <Rob Clark robdclark@freedesktop.org>
Rob Clark <robclark@freedesktop.org> <robdclark@gmail.com>
Robert Bragg <robert@sixbynine.org> <robert@linux.intel.com>
Robert Ellison <papillo@vmware.com> <papillo@i965-laptop.(none)>
Robert Ellison <papillo@vmware.com> <papillo@tungstengraphics.com>
Robert Hooker <sarvatt@ubuntu.com> <robert.hooker@canonical.com>
Roland Scheidegger <sroland@vmware.com> <rscheidegger@gmx.ch>
Roland Scheidegger <sroland@vmware.com> <sroland@tungstengraphics.com>
Roy Spliet <rspliet@eclipso.eu> <r.spliet@student.tudelft.nl>
Rune Petersen <rune@megahurts.dk> Rune Peterson <rune@megahurts.dk>
Ryan Houdek <sonicadvance1@gmail.com> <Sonicadvance1@gmail.com>
Sam Hocevar <sam@hocevar.net> Sam Hocevar <sam@zoy.org>
Samuel Iglesias Gonsálvez <siglesias@igalia.com> Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Sean D'Epagnier <sean@depagnier.com> <geckosenator@freedesktop.org>
Serge Martin <edb+mesa@sigluy.net> Serge Martin (EdB) <edb+mesa@sigluy.net>
Serge Martin <edb+mesa@sigluy.net> EdB <edb+mesa@sigluy.net>
Sinclair Yeh <syeh@vmware.com> <sinclair.yeh@intel.com>
Stefan Brüns <stefan.bruens@rwth-aachen.de> <Stefan.Bruens@rwth-aachen.de>
Stéphane Marchesin <marcheu@chromium.org> Stephane Marchesin <marchesin@icps.u-strasbg.fr>
Stéphane Marchesin <marcheu@chromium.org> Stephane Marchesin <stephane.marchesin@gmail.com>
Sven M. Hallberg <pesco@users.sourceforge.net> pesco <pesco>
Tapani Pälli <tapani.palli@intel.com> <tapani.palli@gmail.com>
Tapani Pälli <tapani.palli@intel.com> Tapani <tapani.palli@intel.com>
Thierry Reding <treding@nvidia.com> <thierry@gilfi.de>
Thierry Reding <treding@nvidia.com> <thierry.reding@avionic-design.de>
Thierry Vignaud <thierry.vignaud@gmail.com> <tvignaud@mandriva.com>
Thomas Balling Sørensen <tball@io.dk> <tball@tball-laptop.(none)>
Thomas Hellstrom <thellstrom@vmware.com> Thomas <thellstrom@vmware.com>
Thomas Hellstrom <thellstrom@vmware.com> Thomas Hellstrom <thellstrom-at-vmware-dot-com>
Thomas Hellstrom <thellstrom@vmware.com> Thomas Hellstrom <thomas-at-tungstengraphics-dot-com>
Thomas Hellstrom <thellstrom@vmware.com> Thomas Hellstrom <thomas@tungstengraphics.com>
Thomas Hellstrom <thellstrom@vmware.com> Thomas Hellström <thomas@tungstengraphics.com>
Thomas Tanner <tanner@gmx.net> tanner <tanner>
Tilman Sauerbeck <tilman@code-monkey.de> <tilman@freedesktop.org>
Timothy Arceri <timothy.arceri@collabora.com> <t_arceri@yahoo.com.au>
Timothy Arceri <timothy.arceri@collabora.com> Timothy <t_arceri@yahoo.com.au>
Tom Fogal <tfogal@alumni.unh.edu> <tfogal@sci.utah.edu>
Tom Stellard <thomas.stellard@amd.com> <tstellar@gmail.com>
Tom Stellard <thomas.stellard@amd.com> Thomas Stellard <tom.stellard@amd.com>
Tormod Volden <debian.tormod@gmail.com> <lists.tormod@gmail.com>
Török Edwin <edwin+mesa@etorok.net> Török Edvin <edwintorok@gmail.com>
Török Edwin <edwin+mesa@etorok.net> <edwintorok@gmail.com>
Ville Syrjälä <ville.syrjala@linux.intel.com> Ville Syrjala <syrjala@freedesktop.org>
Ville Syrjälä <ville.syrjala@linux.intel.com> Ville Syrjala <syrjala@sci.fi>
Vincent Lejeune <vljn@ovi.com> <peluche.canard@gmail.com>
Vinson Lee <vlee@freedesktop.org> <vlee@vmware.com>
Zhenyu Wang <zhenyuw@linux.intel.com> Wang Zhenyu <zhenyu.z.wang@intel.com>
Zack Rusin <zackr@vmware.com> <zack@kde.org>
Zack Rusin <zackr@vmware.com> <zack@pixel.(none)>
Zack Rusin <zackr@vmware.com> <zack@tungstengraphics.com>
Zhang <zxpmyth@yahoo.com.cn> zhang <zxpmyth@yahoo.com.cn>

101
.travis.yml Normal file
View File

@@ -0,0 +1,101 @@
language: c
sudo: false
cache:
directories:
- $HOME/.ccache
addons:
apt:
packages:
- libdrm-dev
- libudev-dev
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libxcb-dri2-0-dev
- libx11-xcb-dev
- llvm-3.4-dev
- scons
env:
global:
- XORG_RELEASES=http://xorg.freedesktop.org/releases/individual
- XCB_RELEASES=http://xcb.freedesktop.org/dist
- XORGMACROS_VERSION=util-macros-1.19.0
- GLPROTO_VERSION=glproto-1.4.17
- DRI2PROTO_VERSION=dri2proto-2.8
- DRI3PROTO_VERSION=dri3proto-1.0
- PRESENTPROTO_VERSION=presentproto-1.0
- LIBPCIACCESS_VERSION=libpciaccess-0.13.4
- LIBDRM_VERSION=libdrm-2.4.65
- XCBPROTO_VERSION=xcb-proto-1.11
- LIBXCB_VERSION=libxcb-1.11
- LIBXSHMFENCE_VERSION=libxshmfence-1.2
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig
matrix:
- BUILD=make
- BUILD=scons
install:
- export PATH="/usr/lib/ccache:$PATH"
- pip install --user mako
# Install dependencies where we require specific versions (or where
# disallowed by Travis CI's package whitelisting).
- wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2
- tar -jxvf $XORGMACROS_VERSION.tar.bz2
- (cd $XORGMACROS_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2
- tar -jxvf $GLPROTO_VERSION.tar.bz2
- (cd $GLPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2
- tar -jxvf $DRI2PROTO_VERSION.tar.bz2
- (cd $DRI2PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$DRI3PROTO_VERSION.tar.bz2
- tar -jxvf $DRI3PROTO_VERSION.tar.bz2
- (cd $DRI3PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$PRESENTPROTO_VERSION.tar.bz2
- tar -jxvf $PRESENTPROTO_VERSION.tar.bz2
- (cd $PRESENTPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2
- tar -jxvf $XCBPROTO_VERSION.tar.bz2
- (cd $XCBPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2
- tar -jxvf $LIBXCB_VERSION.tar.bz2
- (cd $LIBXCB_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2
- tar -jxvf $LIBPCIACCESS_VERSION.tar.bz2
- (cd $LIBPCIACCESS_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget http://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
- tar -jxvf $LIBDRM_VERSION.tar.bz2
- (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
- tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
- (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Disabled LLVM (and therefore r300 and r600) because the build fails
# with "undefined reference to `clock_gettime'" and "undefined
# reference to `setupterm'" in llvmpipe.
script:
- if test "x$BUILD" = xmake; then
./autogen.sh --enable-debug
--disable-gallium-llvm
--with-egl-platforms=x11,drm
--with-dri-drivers=i915,i965,radeon,r200,swrast,nouveau
--with-gallium-drivers=svga,swrast,vc4,virgl
;
make && make check;
elif test x$BUILD = xscons; then
scons;
fi

View File

@@ -21,13 +21,8 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# use c99 compiler by default
ifeq ($(LOCAL_CC),)
ifeq ($(LOCAL_IS_HOST_MODULE),true)
LOCAL_CC := $(HOST_CC) -std=c99 -D_GNU_SOURCE
else
LOCAL_CC := $(TARGET_CC) -std=c99
endif
LOCAL_CFLAGS += -D_GNU_SOURCE
endif
LOCAL_C_INCLUDES += \
@@ -37,6 +32,8 @@ LOCAL_C_INCLUDES += \
MESA_VERSION := $(shell cat $(MESA_TOP)/VERSION)
# define ANDROID_VERSION (e.g., 4.0.x => 0x0400)
LOCAL_CFLAGS += \
-Wno-unused-parameter \
-Wno-date-time \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
@@ -57,14 +54,18 @@ LOCAL_CFLAGS += \
-DHAVE___BUILTIN_CLZLL \
-DHAVE___BUILTIN_UNREACHABLE \
-DHAVE_PTHREAD=1 \
-DHAVE_DLOPEN \
-fvisibility=hidden \
-Wno-sign-compare
# mesa requires at least c99 compiler
LOCAL_CONLYFLAGS += \
-std=c99
ifeq ($(strip $(MESA_ENABLE_ASM)),true)
ifeq ($(TARGET_ARCH),x86)
LOCAL_CFLAGS += \
-DUSE_X86_ASM \
-DHAVE_DLOPEN \
endif
endif
@@ -82,9 +83,19 @@ LOCAL_CPPFLAGS += \
-Wno-error=non-virtual-dtor \
-Wno-non-virtual-dtor
ifeq ($(MESA_LOLLIPOP_BUILD),true)
LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\"
else
LOCAL_CFLAGS += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
endif
# uncomment to keep the debug symbols
#LOCAL_STRIP_MODULE := false
ifeq ($(strip $(LOCAL_MODULE_TAGS)),)
LOCAL_MODULE_TAGS := optional
endif
# Quiet down the build system and remove any .h files from the sources
LOCAL_SRC_FILES := $(patsubst %.h, , $(LOCAL_SRC_FILES))

View File

@@ -24,7 +24,7 @@
# BOARD_GPU_DRIVERS should be defined. The valid values are
#
# classic drivers: i915 i965
# gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi vc4 vmwgfx
# gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi vc4 virgl vmwgfx
#
# The main target is libGLES_mesa. For each classic driver enabled, a DRI
# module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -42,11 +42,15 @@ $(call local-intermediates-dir)
endef
endif
MESA_DRI_MODULE_REL_PATH := dri
MESA_DRI_MODULE_PATH := $(TARGET_OUT_SHARED_LIBRARIES)/$(MESA_DRI_MODULE_REL_PATH)
MESA_DRI_MODULE_UNSTRIPPED_PATH := $(TARGET_OUT_SHARED_LIBRARIES_UNSTRIPPED)/$(MESA_DRI_MODULE_REL_PATH)
MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
MESA_PYTHON2 := python
classic_drivers := i915 i965
gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi vmwgfx vc4
gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi vmwgfx vc4 virgl
MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))
@@ -84,18 +88,21 @@ MESA_ENABLE_LLVM := $(if $(filter radeonsi,$(MESA_GPU_DRIVERS)),true,false)
ifneq ($(strip $(MESA_GPU_DRIVERS)),)
SUBDIRS := \
src/gbm \
src/loader \
src/mapi \
src/glsl \
src/compiler \
src/mesa \
src/util \
src/egl \
src/mesa/drivers/dri
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))
ifeq ($(strip $(MESA_BUILD_GALLIUM)),true)
SUBDIRS += src/gallium
INC_DIRS += $(call all-named-subdir-makefiles,src/gallium)
endif
include $(call all-named-subdir-makefiles,$(SUBDIRS))
include $(INC_DIRS)
endif

View File

@@ -22,20 +22,29 @@
SUBDIRS = src
AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-dri \
--enable-dri3 \
--enable-egl \
--enable-gallium-tests \
--enable-gallium-osmesa \
--enable-gallium-llvm \
--enable-gbm \
--enable-gles1 \
--enable-gles2 \
--enable-glx \
--enable-glx-tls \
--enable-nine \
--enable-opencl \
--enable-opengl \
--enable-va \
--enable-vdpau \
--enable-xa \
--enable-xvmc \
--disable-llvm-shared-libs \
--with-egl-platforms=x11,wayland,drm \
--with-egl-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl \
--with-vulkan-drivers=intel
ACLOCAL_AMFLAGS = -I m4
@@ -51,7 +60,6 @@ noinst_HEADERS = \
include/c99_alloca.h \
include/c99_compat.h \
include/c99_math.h \
include/c99 \
include/c11 \
include/D3D9 \
include/HaikuGL \

106
REVIEWERS Normal file
View File

@@ -0,0 +1,106 @@
Overview:
This file is similar in syntax (or more precisly a subset) of what is
used by the MAINTAINERS file in the linux kernel. Some fields do not
apply, for example, in all cases, send patches to:
mesa-dev@lists.freedesktop.org
and in all cases the patchwork instance is:
https://patchwork.freedesktop.org/project/mesa/
The purpose is not exactly the same the MAINTAINERS file in the linux
kernel, as there are not official/formal maintainers of different
subsystems in mesa, but is meant to give an idea of who to CC for
various patches for review, and to allow the use of
scripts/get_reviewer.pl as git --cc-cmd.
Usage:
When sending patches:
git send-email --cc-cmd ./scripts/get_reviewer.pl ...
Or to configure as default:
git config sendemail.cccmd ./scripts/get_reviewer.pl
Descriptions of section entries:
R: Designated reviewer: FullName <address@domain>
These reviewers should be CCed on patches.
F: Files and directories with wildcard patterns.
A trailing slash includes all files and subdirectory files.
F: drivers/net/ all files in and below drivers/net
F: drivers/net/* all files in drivers/net, but not below
F: */net/* all files in "any top level directory"/net
One pattern per line. Multiple F: lines acceptable.
N: Files and directories with regex patterns.
N: [^a-z]tegra all files whose path contains the word tegra
One pattern per line. Multiple N: lines acceptable.
scripts/get_maintainer.pl has different behavior for files that
match F: pattern and matches of N: patterns. By default,
get_maintainer will not look at git log history when an F: pattern
match occurs. When an N: match occurs, git log history is used
to also notify the people that have git commit signatures.
Maintainers List (try to look for most precise areas first)
Note: this is an opt-in system, I have not tried to add anyone who hasn't
either asked me or sent a patch to add themselves.
-----------------------------------
NIR
R: Jason Ekstrand <jason@jlekstrand.net>
F: src/compiler/nir/
DOCUMENTATION
R: Emil Velikov <emil.l.velikov@gmail.com>
F: docs/
F: doxygen/
COMPATIBILITY HEADERS
R: Emil Velikov <emil.l.velikov@gmail.com>
F: include/c99*
DRI LOADER
R: Emil Velikov <emil.l.velikov@gmail.com>
F: src/loader/
GALLIUM LOADER
R: Emil Velikov <emil.l.velikov@gmail.com>
F: src/gallium/auxiliary/pipe-loader/
F: src/gallium/auxiliary/target-helpers/
GALLIUM TARGETS
R: Emil Velikov <emil.l.velikov@gmail.com>
F: src/gallium/targets/
AUTOCONF BUILD
R: Emil Velikov <emil.l.velikov@gmail.com>
F: configure.ac
F: */Automake.inc
F: */Makefile.*am
F: */Makefile.sources
SCONS BUILD
F: scons/
F: */SConscript*
F: */Makefile.sources
ANDROID BUILD
R: Emil Velikov <emil.l.velikov@gmail.com>
F: CleanSpec.mk
F: */Android.*mk
F: */Makefile.sources
WAYLAND EGL SUPPORT
R: Daniel Stone <daniels@collabora.com>
F: src/egl/wayland/*
F: src/egl/drivers/dri2/platform_wayland.c
FREEDRENO
R: Rob Clark <robclark@freedesktop.org>
F: src/gallium/drivers/freedreno/

View File

@@ -1,7 +1,7 @@
#######################################################################
# Top-level SConstruct
#
# For example, invoke scons as
# For example, invoke scons as
#
# scons build=debug llvm=yes machine=x86
#
@@ -12,13 +12,13 @@
# build='debug'
# llvm=True
# machine='x86'
#
#
# Invoke
#
# scons -h
#
# to get the full list of options. See scons manpage for more info.
#
#
import os
import os.path
@@ -36,7 +36,7 @@ common.AddOptions(opts)
env = Environment(
options = opts,
tools = ['gallium'],
toolpath = ['#scons'],
toolpath = ['#scons'],
ENV = os.environ,
)
@@ -53,7 +53,7 @@ else:
print 'scons: warning: targets option is deprecated; pass the targets on their own such as'
print
print ' scons %s' % ' '.join(targets)
print
print
COMMAND_LINE_TARGETS.append(targets)
@@ -84,9 +84,14 @@ env.Append(CPPPATH = [
#print env.Dump()
# Add a check target for running tests
check = env.Alias('check')
env.AlwaysBuild(check)
#######################################################################
# Invoke host SConscripts
#
# Invoke host SConscripts
#
# For things that are meant to be run on the native host build machine, instead
# of the target machine.
#

View File

@@ -1 +1 @@
11.1.1
11.3.0-devel

76
appveyor.yml Normal file
View File

@@ -0,0 +1,76 @@
# http://www.appveyor.com/docs/appveyor-yml
#
# To setup AppVeyor for your own personal repositories do the following:
# - Sign up
# - Add a new project
# - Select Git and fill in the Git clone URL
# - Setup a Git hook as explained in
# https://github.com/appveyor/webhooks#installing-git-hook
# - Check 'Settings > General > Skip branches without appveyor.yml'
# - Check 'Settings > General > Rolling builds'
# - Setup the global or project notifications to your liking
#
# Note that kicking (or restarting) a build via the web UI will not work, as it
# will fail to find appveyor.yml . The Git hook is the most practical way to
# kick a build.
#
# See also:
# - http://help.appveyor.com/discussions/problems/2209-node-grunt-build-specify-a-project-or-solution-file-the-directory-does-not-contain-a-project-or-solution-file
# - http://help.appveyor.com/discussions/questions/1184-build-config-vs-appveyoryaml
version: '{build}'
branches:
except:
- /^travis.*$/
# Don't download the full Mesa history to speed up cloning. However the clone
# depth must not be too small, otherwise builds might fail when lots of patches
# are committed in succession, because the desired commit is not found on the
# truncated history.
#
# See also:
# - https://www.appveyor.com/blog/2014/06/04/shallow-clone-for-git-repositories
clone_depth: 100
cache:
- win_flex_bison-2.4.5.zip
- llvm-3.3.1-msvc2013-mtd.7z
environment:
WINFLEXBISON_ARCHIVE: win_flex_bison-2.4.5.zip
LLVM_ARCHIVE: llvm-3.3.1-msvc2013-mtd.7z
install:
# Check pip
- python --version
- python -m pip --version
# Install Mako
- python -m pip install --egg Mako
# Install SCons
- python -m pip install --egg scons==2.4.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "http://downloads.sourceforge.net/project/winflexbison/%WINFLEXBISON_ARCHIVE%"
- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul
- set Path=%CD%\winflexbison;%Path%
- win_flex --version
- win_bison --version
# Download and extract LLVM
- if not exist "%LLVM_ARCHIVE%" appveyor DownloadFile "https://people.freedesktop.org/~jrfonseca/llvm/%LLVM_ARCHIVE%"
- 7z x -y "%LLVM_ARCHIVE%" > nul
- mkdir llvm\bin
- set LLVM=%CD%\llvm
build_script:
- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=12.0 llvm=1
after_build:
- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=12.0 llvm=1 check
# It's possible to setup notification here, as described in
# http://www.appveyor.com/docs/notifications#appveyor-yml-configuration , but
# doing so would cause the notification settings to be replicated across all
# repos, which is most likely undesired. So it's better to rely on the
# Appveyor global/project notification settings.

View File

@@ -1,5 +0,0 @@
# As per Marek http://lists.freedesktop.org/archives/mesa-stable/2015-December/003600.html
37208c4fd7b1ec679d10992b42a2811cab8245a5 Revert "radeonsi: disable DCC on Stoney"
# causes regression in xwayland, kde/plasma, mpv, steam ... fdo#92759
839793680f99b8387bee9489733d5071c10f3ace i965: Use MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals

35
bin/get-extra-pick-list.sh Executable file
View File

@@ -0,0 +1,35 @@
#!/bin/sh
# Script for generating a list of candidates which fix commits that have been
# previously cherry-picked to a stable branch.
#
# Usage examples:
#
# $ bin/get-extra-pick-list.sh
# $ bin/get-extra-pick-list.sh > picklist
# $ bin/get-extra-pick-list.sh | tee picklist
# Use the last branchpoint as our limit for the search
# XXX: there should be a better way for this
latest_branchpoint=`git branch | grep \* | cut -c 3-`-branchpoint
# Grep for commits with "cherry picked from commit" in the commit message.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' |\
cut -c -8 |\
while read sha
do
# Check if the original commit is referenced in master
git log -n1 --pretty=oneline --grep=$sha $latest_branchpoint..origin/master |\
cut -c -8 |\
while read candidate
do
# Check if the potential fix, hasn't landed in branch yet.
found=`git log -n1 --pretty=oneline --reverse --grep=$candidate $latest_branchpoint..HEAD |wc -l`
if test $found = 0
then
echo Commit $candidate might need to be picked, as it references $sha
fi
done
done

View File

@@ -97,6 +97,7 @@ def AddOptions(opts):
opts.Add(BoolOption('embedded', 'embedded build', 'no'))
opts.Add(BoolOption('analyze',
'enable static code analysis where available', 'no'))
opts.Add(BoolOption('asan', 'enable Address Sanitizer', 'no'))
opts.Add('toolchain', 'compiler toolchain', default_toolchain)
opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support',
'no'))

View File

@@ -68,13 +68,13 @@ OPENCL_VERSION=1
AC_SUBST([OPENCL_VERSION])
dnl Versions for external dependencies
LIBDRM_REQUIRED=2.4.60
LIBDRM_REQUIRED=2.4.66
LIBDRM_RADEON_REQUIRED=2.4.56
LIBDRM_AMDGPU_REQUIRED=2.4.63
LIBDRM_INTEL_REQUIRED=2.4.61
LIBDRM_NVVIEUX_REQUIRED=2.4.33
LIBDRM_NOUVEAU_REQUIRED=2.4.62
LIBDRM_FREEDRENO_REQUIRED=2.4.65
LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66
LIBDRM_FREEDRENO_REQUIRED=2.4.67
DRI2PROTO_REQUIRED=2.6
DRI3PROTO_REQUIRED=1.0
PRESENTPROTO_REQUIRED=1.0
@@ -99,6 +99,7 @@ AM_PROG_CC_C_O
AM_PROG_AS
AX_CHECK_GNU_MAKE
AC_CHECK_PROGS([PYTHON2], [python2.7 python2 python])
AC_CHECK_PROGS([PYTHON3], [python3.5 python3.4 python3])
AC_PROG_SED
AC_PROG_MKDIR_P
@@ -110,10 +111,10 @@ LT_INIT([disable-static])
AC_CHECK_PROG(RM, rm, [rm -f])
AX_PROG_BISON([],
AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.c"],
AS_IF([test ! -f "$srcdir/src/compiler/glsl/glcpp/glcpp-parse.c"],
[AC_MSG_ERROR([bison not found - unable to compile glcpp-parse.y])]))
AX_PROG_FLEX([],
AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-lex.c"],
AS_IF([test ! -f "$srcdir/src/compiler/glsl/glcpp/glcpp-lex.c"],
[AC_MSG_ERROR([flex not found - unable to compile glcpp-lex.l])]))
AC_CHECK_PROG(INDENT, indent, indent, cat)
@@ -141,6 +142,12 @@ else
fi
fi
if test -z "$PYTHON3"; then
if test ! -f "$srcdir/src/intel/genxml/gen9_pack.h"; then
AC_MSG_ERROR([Python3 not found - unable to generate sources])
fi
fi
AC_PROG_INSTALL
dnl We need a POSIX shell for parts of the build. Assume we have one
@@ -197,6 +204,13 @@ if test "x$GCC" = xyes -a "x$acv_mesa_CLANG" = xno; then
fi
fi
dnl We don't support building Mesa with Sun C compiler
dnl https://bugs.freedesktop.org/show_bug.cgi?id=93189
AC_CHECK_DECL([__SUNPRO_C], [SUNCC=yes], [SUNCC=no])
if test "x$SUNCC" = xyes; then
AC_MSG_ERROR([Building with Sun C compiler is not supported, use GCC instead.])
fi
dnl Check for compiler builtins
AX_GCC_BUILTIN([__builtin_bswap32])
AX_GCC_BUILTIN([__builtin_bswap64])
@@ -216,8 +230,10 @@ AX_GCC_FUNC_ATTRIBUTE([format])
AX_GCC_FUNC_ATTRIBUTE([malloc])
AX_GCC_FUNC_ATTRIBUTE([packed])
AX_GCC_FUNC_ATTRIBUTE([pure])
AX_GCC_FUNC_ATTRIBUTE([returns_nonnull])
AX_GCC_FUNC_ATTRIBUTE([unused])
AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
AX_GCC_FUNC_ATTRIBUTE([weak])
AM_CONDITIONAL([GEN_ASM_OFFSETS], test "x$GEN_ASM_OFFSETS" = xyes)
@@ -238,9 +254,13 @@ _SAVE_LDFLAGS="$LDFLAGS"
_SAVE_CPPFLAGS="$CPPFLAGS"
dnl Compiler macros
DEFINES="-D__STDC_LIMIT_MACROS"
DEFINES="-D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS"
AC_SUBST([DEFINES])
android=no
case "$host_os" in
*-android)
android=yes
;;
linux*|*-gnu*|gnu*)
DEFINES="$DEFINES -D_GNU_SOURCE"
;;
@@ -252,6 +272,8 @@ cygwin*)
;;
esac
AM_CONDITIONAL(HAVE_ANDROID, test "x$android" = xyes)
dnl Add flags for gcc and g++
if test "x$GCC" = xyes; then
CFLAGS="$CFLAGS -Wall"
@@ -298,8 +320,7 @@ if test "x$GCC" = xyes; then
# Flags to help ensure that certain portions of the code -- and only those
# portions -- can be built with MSVC:
# - src/util, src/gallium/auxiliary, and src/gallium/drivers/llvmpipe needs
# to build with Windows SDK 7.0.7600, which bundles MSVC 2008
# - src/util, src/gallium/auxiliary, rc/gallium/drivers/llvmpipe, and
# - non-Linux/Posix OpenGL portions needs to build on MSVC 2013 (which
# supports most of C99)
# - the rest has no compiler compiler restrictions
@@ -316,9 +337,6 @@ if test "x$GCC" = xyes; then
AC_MSG_RESULT([yes])],
AC_MSG_RESULT([no]));
CFLAGS="$save_CFLAGS"
MSVC2008_COMPAT_CFLAGS="$MSVC2013_COMPAT_CFLAGS -Werror=declaration-after-statement"
MSVC2008_COMPAT_CXXFLAGS="$MSVC2013_COMPAT_CXXFLAGS"
fi
if test "x$GXX" = xyes; then
CXXFLAGS="$CXXFLAGS -Wall"
@@ -346,8 +364,6 @@ fi
AC_SUBST([MSVC2013_COMPAT_CFLAGS])
AC_SUBST([MSVC2013_COMPAT_CXXFLAGS])
AC_SUBST([MSVC2008_COMPAT_CFLAGS])
AC_SUBST([MSVC2008_COMPAT_CXXFLAGS])
dnl even if the compiler appears to support it, using visibility attributes isn't
dnl going to do anything useful currently on cygwin apart from emit lots of warnings
@@ -389,6 +405,61 @@ fi
AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
AC_SUBST([SSE41_CFLAGS], $SSE41_CFLAGS)
dnl Check for Endianness
AC_C_BIGENDIAN(
little_endian=no,
little_endian=yes,
little_endian=no,
little_endian=no
)
dnl Check for POWER8 Architecture
PWR8_CFLAGS="-mpower8-vector"
have_pwr8_intrinsics=no
AC_MSG_CHECKING(whether gcc supports -mpower8-vector)
save_CFLAGS=$CFLAGS
CFLAGS="$PWR8_CFLAGS $CFLAGS"
AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8))
#error "Need GCC >= 4.8 for sane POWER8 support"
#endif
#include <altivec.h>
int main () {
vector unsigned char r;
vector unsigned int v = vec_splat_u32 (1);
r = __builtin_vec_vgbbd ((vector unsigned char) v);
return 0;
}]])], have_pwr8_intrinsics=yes)
CFLAGS=$save_CFLAGS
AC_ARG_ENABLE(pwr8,
[AC_HELP_STRING([--disable-pwr8-inst],
[disable POWER8-specific instructions])],
[enable_pwr8=$enableval], [enable_pwr8=auto])
if test "x$enable_pwr8" = xno ; then
have_pwr8_intrinsics=disabled
fi
if test $have_pwr8_intrinsics = yes && test $little_endian = yes ; then
DEFINES="$DEFINES -D_ARCH_PWR8"
CXXFLAGS="$CXXFLAGS $PWR8_CFLAGS"
CFLAGS="$CFLAGS $PWR8_CFLAGS"
else
PWR8_CFLAGS=
fi
AC_MSG_RESULT($have_pwr8_intrinsics)
if test "x$enable_pwr8" = xyes && test $have_pwr8_intrinsics = no ; then
AC_MSG_ERROR([POWER8 compiler support not detected])
fi
if test $have_pwr8_intrinsics = yes && test $little_endian = no ; then
AC_MSG_WARN([POWER8 optimization is enabled only on POWER8 Little-Endian])
fi
AC_SUBST([PWR8_CFLAGS], $PWR8_CFLAGS)
dnl Can't have static and shared libraries, default to static if user
dnl explicitly requested. If both disabled, set to static since shared
dnl was explicitly requested.
@@ -414,8 +485,29 @@ AC_ARG_ENABLE([debug],
[enable_debug="$enableval"],
[enable_debug=no]
)
AC_ARG_ENABLE([profile],
[AS_HELP_STRING([--enable-profile],
[enable profiling of code @<:@default=disabled@:>@])],
[enable_profile="$enableval"],
[enable_profile=no]
)
if test "x$enable_profile" = xyes; then
DEFINES="$DEFINES -DPROFILE"
if test "x$GCC" = xyes; then
CFLAGS="$CFLAGS -fno-omit-frame-pointer"
fi
if test "x$GXX" = xyes; then
CXXFLAGS="$CXXFLAGS -fno-omit-frame-pointer"
fi
fi
if test "x$enable_debug" = xyes; then
DEFINES="$DEFINES -DDEBUG"
if test "x$enable_profile" = xyes; then
AC_MSG_WARN([Debug and Profile are enabled at the same time])
fi
if test "x$GCC" = xyes; then
if ! echo "$CFLAGS" | grep -q -e '-g'; then
CFLAGS="$CFLAGS -g"
@@ -436,6 +528,8 @@ else
DEFINES="$DEFINES -DNDEBUG"
fi
DEFAULT_GL_LIB_NAME=GL
dnl
dnl Check if linker supports -Bsymbolic
dnl
@@ -533,6 +627,23 @@ esac
AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
DEFAULT_GL_LIB_NAME=GL
dnl
dnl Libglvnd configuration
dnl
AC_ARG_ENABLE([libglvnd],
[AS_HELP_STRING([--enable-libglvnd],
[Build for libglvnd @<:@default=disabled@:>@])],
[enable_libglvnd="$enableval"],
[enable_libglvnd=no])
AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
#AM_COND_IF([USE_LIBGLVND_GLX], [DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"])
if test "x$enable_libglvnd" = xyes ; then
DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
DEFAULT_GL_LIB_NAME=GLX_mesa
fi
dnl
dnl library names
dnl
@@ -570,13 +681,13 @@ AC_ARG_WITH([gl-lib-name],
[AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
[specify GL library name @<:@default=GL@:>@])],
[GL_LIB=$withval],
[GL_LIB=GL])
[GL_LIB="$DEFAULT_GL_LIB_NAME"])
AC_ARG_WITH([osmesa-lib-name],
[AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
[specify OSMesa library name @<:@default=OSMesa@:>@])],
[OSMESA_LIB=$withval],
[OSMESA_LIB=OSMesa])
AS_IF([test "x$GL_LIB" = xyes], [GL_LIB=GL])
AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
dnl
@@ -627,8 +738,10 @@ test "x$enable_asm" = xno && AC_MSG_RESULT([no])
if test "x$enable_asm" = xyes -a "x$cross_compiling" = xyes; then
case "$host_cpu" in
i?86 | x86_64 | amd64)
enable_asm=no
AC_MSG_RESULT([no, cross compiling])
if test "x$host_cpu" != "x$target_cpu"; then
enable_asm=no
AC_MSG_RESULT([no, cross compiling])
fi
;;
esac
fi
@@ -719,6 +832,10 @@ dnl to -pthread, which causes problems if we need -lpthread to appear in
dnl pkgconfig files.
test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread"
PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs)
AC_SUBST(PTHREADSTUBS_CFLAGS)
AC_SUBST(PTHREADSTUBS_LIBS)
dnl SELinux awareness.
AC_ARG_ENABLE([selinux],
[AS_HELP_STRING([--enable-selinux],
@@ -779,8 +896,8 @@ AC_ARG_ENABLE([dri3],
[enable_dri3="$enableval"],
[enable_dri3="$dri3_default"])
AC_ARG_ENABLE([glx],
[AS_HELP_STRING([--enable-glx],
[enable GLX library @<:@default=enabled@:>@])],
[AS_HELP_STRING([--enable-glx@<:@=dri|xlib|gallium-xlib@:>@],
[enable the GLX library and choose an implementation @<:@default=auto@:>@])],
[enable_glx="$enableval"],
[enable_glx=yes])
AC_ARG_ENABLE([osmesa],
@@ -846,17 +963,6 @@ AC_ARG_ENABLE([opencl_icd],
@<:@default=disabled@:>@])],
[enable_opencl_icd="$enableval"],
[enable_opencl_icd=no])
AC_ARG_ENABLE([xlib-glx],
[AS_HELP_STRING([--enable-xlib-glx],
[make GLX library Xlib-based instead of DRI-based @<:@default=disabled@:>@])],
[enable_xlib_glx="$enableval"],
[enable_xlib_glx=no])
AC_ARG_ENABLE([r600-llvm-compiler],
[AS_HELP_STRING([--enable-r600-llvm-compiler],
[Enable experimental LLVM backend for graphics shaders @<:@default=disabled@:>@])],
[enable_r600_llvm="$enableval"],
[enable_r600_llvm=no])
AC_ARG_ENABLE([gallium-tests],
[AS_HELP_STRING([--enable-gallium-tests],
@@ -915,36 +1021,85 @@ AM_CONDITIONAL(NEED_OPENGL_COMMON, test "x$enable_opengl" = xyes -o \
"x$enable_gles1" = xyes -o \
"x$enable_gles2" = xyes)
if test "x$enable_glx" = xno; then
AC_MSG_WARN([GLX disabled, disabling Xlib-GLX])
enable_xlib_glx=no
# Validate GLX options
if test "x$enable_glx" = xyes; then
if test "x$enable_dri" = xyes; then
enable_glx=dri
elif test -n "$with_gallium_drivers"; then
enable_glx=gallium-xlib
else
enable_glx=xlib
fi
fi
case "x$enable_glx" in
xdri | xxlib | xgallium-xlib)
# GLX requires OpenGL
if test "x$enable_opengl" = xno; then
AC_MSG_ERROR([GLX cannot be built without OpenGL])
fi
if test "x$enable_dri$enable_xlib_glx" = xyesyes; then
AC_MSG_ERROR([DRI and Xlib-GLX cannot be built together])
# Check individual dependencies
case "x$enable_glx" in
xdri)
if test "x$enable_dri" = xno; then
AC_MSG_ERROR([DRI-based GLX requires DRI to be enabled])
fi
;;
xxlib)
if test "x$enable_dri" = xyes; then
AC_MSG_ERROR([Xlib-based GLX cannot be built with DRI enabled])
fi
;;
xgallium-xlib )
if test "x$enable_dri" = xyes; then
AC_MSG_ERROR([Xlib-based (Gallium) GLX cannot be built with DRI enabled])
fi
if test -z "$with_gallium_drivers"; then
AC_MSG_ERROR([Xlib-based (Gallium) GLX cannot be built without Gallium enabled])
fi
;;
esac
;;
xno)
;;
*)
AC_MSG_ERROR([Illegal value for --enable-glx: $enable_glx])
;;
esac
AM_CONDITIONAL(HAVE_DRI_GLX, test "x$enable_glx" = xdri)
AM_CONDITIONAL(HAVE_XLIB_GLX, test "x$enable_glx" = xxlib)
AM_CONDITIONAL(HAVE_GALLIUM_XLIB_GLX, test "x$enable_glx" = xgallium-xlib)
dnl
dnl Libglvnd configuration
dnl
AC_ARG_ENABLE([libglvnd],
[AS_HELP_STRING([--enable-libglvnd],
[Build for libglvnd @<:@default=disabled@:>@])],
[enable_libglvnd="$enableval"],
[enable_libglvnd=no])
AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
if test "x$enable_libglvnd" = xyes ; then
dnl XXX: update once we can handle more than libGL/glx.
dnl Namely: we should error out if neither of the glvnd enabled libraries
dnl are built
case "x$enable_glx" in
xno)
AC_MSG_ERROR([cannot build libglvnd without GLX])
;;
xxlib | xgallium-xlib )
AC_MSG_ERROR([cannot build libgvnd when Xlib-GLX or Gallium-Xlib-GLX is enabled])
;;
xdri)
;;
esac
PKG_CHECK_MODULES([GLVND], libglvnd >= 0.1.0)
DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
DEFAULT_GL_LIB_NAME=GLX_mesa
fi
if test "x$enable_opengl$enable_xlib_glx" = xnoyes; then
AC_MSG_ERROR([Xlib-GLX cannot be built without OpenGL])
fi
# Disable GLX if OpenGL is not enabled
if test "x$enable_glx$enable_opengl" = xyesno; then
AC_MSG_WARN([OpenGL not enabled, disabling GLX])
enable_glx=no
fi
# Disable GLX if DRI and Xlib-GLX are not enabled
if test "x$enable_glx" = xyes -a \
"x$enable_dri" = xno -a \
"x$enable_xlib_glx" = xno; then
AC_MSG_WARN([Neither DRI nor Xlib-GLX enabled, disabling GLX])
enable_glx=no
fi
AM_CONDITIONAL(HAVE_DRI_GLX, test "x$enable_glx" = xyes -a \
"x$enable_dri" = xyes)
# Check for libdrm
PKG_CHECK_MODULES([LIBDRM], [libdrm >= $LIBDRM_REQUIRED],
[have_libdrm=yes], [have_libdrm=no])
@@ -999,10 +1154,6 @@ dnl
dnl Driver specific build directories
dnl
if test -n "$with_gallium_drivers" -a "x$enable_glx$enable_xlib_glx" = xyesyes; then
NEED_WINSYS_XLIB="yes"
fi
if test "x$enable_gallium_osmesa" = xyes; then
if ! echo "$with_gallium_drivers" | grep -q 'swrast'; then
AC_MSG_ERROR([gallium_osmesa requires the gallium swrast driver])
@@ -1195,8 +1346,8 @@ AC_ARG_ENABLE([driglx-direct],
dnl
dnl libGL configuration per driver
dnl
case "x$enable_glx$enable_xlib_glx" in
xyesyes)
case "x$enable_glx" in
xxlib | xgallium-xlib)
# Xlib-based GLX
dri_modules="x11 xext xcb"
PKG_CHECK_MODULES([XLIBGL], [$dri_modules])
@@ -1206,7 +1357,7 @@ xyesyes)
GL_LIB_DEPS="$GL_LIB_DEPS $SELINUX_LIBS -lm $PTHREAD_LIBS $DLOPEN_LIBS"
GL_PC_LIB_PRIV="$GL_PC_LIB_PRIV $SELINUX_LIBS -lm $PTHREAD_LIBS"
;;
xyesno)
xdri)
# DRI-based GLX
PKG_CHECK_MODULES([GLPROTO], [glproto >= $GLPROTO_REQUIRED])
@@ -1233,7 +1384,7 @@ xyesno)
if test x"$enable_dri3" = xyes; then
PKG_CHECK_EXISTS([xcb >= $XCB_REQUIRED], [], AC_MSG_ERROR([DRI3 requires xcb >= $XCB_REQUIRED]))
dri3_modules="xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
dri3_modules="xcb xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
PKG_CHECK_MODULES([XCB_DRI3], [$dri3_modules])
fi
fi
@@ -1295,11 +1446,11 @@ AC_SUBST([HAVE_XF86VIDMODE])
dnl
dnl More GLX setup
dnl
case "x$enable_glx$enable_xlib_glx" in
xyesyes)
case "x$enable_glx" in
xxlib | xgallium-xlib)
DEFINES="$DEFINES -DUSE_XSHM"
;;
xyesno)
xdri)
DEFINES="$DEFINES -DGLX_INDIRECT_RENDERING"
if test "x$driglx_direct" = xyes; then
DEFINES="$DEFINES -DGLX_DIRECT_RENDERING"
@@ -1472,8 +1623,58 @@ if test -n "$with_dri_drivers"; then
DRI_DIRS=`echo $DRI_DIRS|tr " " "\n"|sort -u|tr "\n" " "`
fi
#
# Vulkan driver configuration
#
AC_ARG_WITH([vulkan-drivers],
[AS_HELP_STRING([--with-vulkan-drivers@<:@=DIRS...@:>@],
[comma delimited Vulkan drivers list, e.g.
"intel"
@<:@default=no@:>@])],
[with_vulkan_drivers="$withval"],
[with_vulkan_drivers="no"])
# Doing '--without-vulkan-drivers' will set this variable to 'no'. Clear it
# here so that the script doesn't choke on an unknown driver name later.
case "x$with_vulkan_drivers" in
xyes) with_vulkan_drivers="$VULKAN_DRIVERS_DEFAULT" ;;
xno) with_vulkan_drivers='' ;;
esac
AC_ARG_WITH([vulkan-icddir],
[AS_HELP_STRING([--with-vulkan-icddir=DIR],
[directory for the Vulkan driver icd files @<:@${sysconfdir}/vulkan/icd.d@:>@])],
[VULKAN_ICD_INSTALL_DIR="$withval"],
[VULKAN_ICD_INSTALL_DIR='${sysconfdir}/vulkan/icd.d'])
AC_SUBST([VULKAN_ICD_INSTALL_DIR])
if test -n "$with_vulkan_drivers"; then
VULKAN_DRIVERS=`IFS=', '; echo $with_vulkan_drivers`
for driver in $VULKAN_DRIVERS; do
case "x$driver" in
xintel)
if test "x$HAVE_I965_DRI" != xyes; then
AC_MSG_ERROR([Intel Vulkan driver requires the i965 dri driver])
fi
if test "x$with_sha1" == "x"; then
AC_MSG_ERROR([Intel Vulkan driver requires SHA1])
fi
HAVE_INTEL_VULKAN=yes;
;;
*)
AC_MSG_ERROR([Vulkan driver '$driver' does not exist])
;;
esac
done
VULKAN_DRIVERS=`echo $VULKAN_DRIVERS|tr " " "\n"|sort -u|tr "\n" " "`
fi
AM_CONDITIONAL(NEED_MEGADRIVER, test -n "$DRI_DIRS")
AM_CONDITIONAL(NEED_LIBMESA, test "x$enable_xlib_glx" = xyes -o \
AM_CONDITIONAL(NEED_LIBMESA, test "x$enable_glx" = xxlib -o \
"x$enable_osmesa" = xyes -o \
-n "$DRI_DIRS")
@@ -1488,7 +1689,7 @@ AC_ARG_WITH([osmesa-bits],
[osmesa_bits="$withval"],
[osmesa_bits=8])
if test "x$osmesa_bits" != x8; then
if test "x$enable_dri" = xyes -o "x$enable_glx" = xyes; then
if test "x$enable_dri" = xyes -o "x$enable_glx" != xno; then
AC_MSG_WARN([Ignoring OSMesa channel bits because of non-OSMesa driver])
osmesa_bits=8
fi
@@ -1644,7 +1845,12 @@ if test "x$enable_xvmc" = xyes -o \
"x$enable_vdpau" = xyes -o \
"x$enable_omx" = xyes -o \
"x$enable_va" = xyes; then
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
if test x"$enable_dri3" = xyes; then
PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED
x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
else
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
fi
need_gallium_vl_winsys=yes
fi
AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
@@ -1658,6 +1864,7 @@ AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes)
if test "x$enable_vdpau" = xyes; then
PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED])
gallium_st="$gallium_st vdpau"
DEFINES="$DEFINES -DHAVE_ST_VDPAU"
fi
AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes)
@@ -1834,6 +2041,9 @@ for plat in $egl_platforms; do
AC_MSG_ERROR([EGL platform surfaceless requires libdrm >= $LIBDRM_REQUIRED])
;;
android)
;;
*)
AC_MSG_ERROR([EGL platform '$plat' does not exist])
;;
@@ -1854,11 +2064,11 @@ else
EGL_NATIVE_PLATFORM="_EGL_INVALID_PLATFORM"
fi
AM_CONDITIONAL(HAVE_EGL_PLATFORM_X11, echo "$egl_platforms" | grep -q 'x11')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_WAYLAND, echo "$egl_platforms" | grep -q 'wayland')
AM_CONDITIONAL(HAVE_PLATFORM_X11, echo "$egl_platforms" | grep -q 'x11')
AM_CONDITIONAL(HAVE_PLATFORM_WAYLAND, echo "$egl_platforms" | grep -q 'wayland')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_DRM, echo "$egl_platforms" | grep -q 'drm')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_SURFACELESS, echo "$egl_platforms" | grep -q 'surfaceless')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_NULL, echo "$egl_platforms" | grep -q 'null')
AM_CONDITIONAL(HAVE_EGL_PLATFORM_ANDROID, echo "$egl_platforms" | grep -q 'android')
AM_CONDITIONAL(HAVE_EGL_DRIVER_DRI2, test "x$HAVE_EGL_DRIVER_DRI2" != "x")
@@ -2076,7 +2286,12 @@ gallium_require_drm_loader() {
fi
}
dnl This is for Glamor. Skip this if OpenGL is disabled.
require_egl_drm() {
if test "x$enable_opengl" = xno; then
return 0
fi
case "$with_egl_platforms" in
*drm*)
;;
@@ -2098,7 +2313,7 @@ radeon_llvm_check() {
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
llvm_check_version_for "3" "5" "0" $1
llvm_check_version_for "3" "6" "0" $1
if test true && $LLVM_CONFIG --targets-built | grep -iqvw $amdgpu_llvm_target_name ; then
AC_MSG_ERROR([LLVM $amdgpu_llvm_target_name not enabled in your LLVM build.])
fi
@@ -2109,6 +2324,16 @@ radeon_llvm_check() {
fi
}
swr_llvm_check() {
gallium_require_llvm $1
if test ${LLVM_VERSION_INT} -lt 306; then
AC_MSG_ERROR([LLVM version 3.6 or later required when building $1])
fi
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
}
dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
if test -n "$with_gallium_drivers"; then
gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2143,14 +2368,8 @@ if test -n "$with_gallium_drivers"; then
PKG_CHECK_MODULES([RADEON], [libdrm_radeon >= $LIBDRM_RADEON_REQUIRED])
gallium_require_drm "Gallium R600"
gallium_require_drm_loader
if test "x$enable_r600_llvm" = xyes -o "x$enable_opencl" = xyes; then
radeon_llvm_check "r600g"
LLVM_COMPONENTS="${LLVM_COMPONENTS} bitreader asmparser"
fi
if test "x$enable_r600_llvm" = xyes; then
USE_R600_LLVM_COMPILER=yes;
fi
if test "x$enable_opencl" = xyes; then
radeon_llvm_check "r600g"
LLVM_COMPONENTS="${LLVM_COMPONENTS} bitreader asmparser"
fi
;;
@@ -2181,6 +2400,35 @@ if test -n "$with_gallium_drivers"; then
HAVE_GALLIUM_LLVMPIPE=yes
fi
;;
xswr)
swr_llvm_check "swr"
AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
AVX_CXXFLAGS="-march=core-avx-i"
AVX2_CXXFLAGS="-march=core-avx2"
AC_LANG_PUSH([C++])
save_CXXFLAGS="$CXXFLAGS"
CXXFLAGS="-std=c++11 $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([c++11 compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
save_CXXFLAGS="$CXXFLAGS"
CXXFLAGS="$AVX_CXXFLAGS $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([AVX compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
save_CFLAGS="$CXXFLAGS"
CXXFLAGS="$AVX2_CXXFLAGS $CXXFLAGS"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
[AC_MSG_ERROR([AVX2 compiler support not detected])])
CXXFLAGS="$save_CXXFLAGS"
AC_LANG_POP([C++])
HAVE_GALLIUM_SWR=yes
;;
xvc4)
HAVE_GALLIUM_VC4=yes
gallium_require_drm "vc4"
@@ -2213,6 +2461,9 @@ dnl in LLVM_LIBS.
if test "x$MESA_LLVM" != x0; then
if ! $LLVM_CONFIG --libs ${LLVM_COMPONENTS} >/dev/null; then
AC_MSG_ERROR([Calling ${LLVM_CONFIG} failed])
fi
LLVM_LIBS="`$LLVM_CONFIG --libs ${LLVM_COMPONENTS}`"
dnl llvm-config may not give the right answer when llvm is a built as a
@@ -2267,6 +2518,10 @@ AM_CONDITIONAL(HAVE_GALLIUM_NOUVEAU, test "x$HAVE_GALLIUM_NOUVEAU" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test "x$HAVE_GALLIUM_FREEDRENO" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SWR, test "x$HAVE_GALLIUM_SWR" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SWRAST, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes -o \
"x$HAVE_GALLIUM_LLVMPIPE" = xyes -o \
"x$HAVE_GALLIUM_SWR" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VIRGL, test "x$HAVE_GALLIUM_VIRGL" = xyes)
@@ -2288,12 +2543,16 @@ AM_CONDITIONAL(HAVE_R200_DRI, test x$HAVE_R200_DRI = xyes)
AM_CONDITIONAL(HAVE_RADEON_DRI, test x$HAVE_RADEON_DRI = xyes)
AM_CONDITIONAL(HAVE_SWRAST_DRI, test x$HAVE_SWRAST_DRI = xyes)
AM_CONDITIONAL(HAVE_INTEL_VULKAN, test "x$HAVE_INTEL_VULKAN" = xyes)
AM_CONDITIONAL(HAVE_INTEL_DRIVERS, test "x$HAVE_INTEL_VULKAN" = xyes -o \
"x$HAVE_I965_DRI" = xyes)
AM_CONDITIONAL(NEED_RADEON_DRM_WINSYS, test "x$HAVE_GALLIUM_R300" = xyes -o \
"x$HAVE_GALLIUM_R600" = xyes -o \
"x$HAVE_GALLIUM_RADEONSI" = xyes)
AM_CONDITIONAL(NEED_WINSYS_XLIB, test "x$NEED_WINSYS_XLIB" = xyes)
AM_CONDITIONAL(NEED_WINSYS_XLIB, test "x$enable_glx" = xgallium-xlib)
AM_CONDITIONAL(NEED_RADEON_LLVM, test x$NEED_RADEON_LLVM = xyes)
AM_CONDITIONAL(USE_R600_LLVM_COMPILER, test x$USE_R600_LLVM_COMPILER = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
@@ -2302,7 +2561,6 @@ if test "x$USE_VC4_SIMULATOR" = xyes -a "x$HAVE_GALLIUM_ILO" = xyes; then
fi
AM_CONDITIONAL(HAVE_LIBDRM, test "x$have_libdrm" = xyes)
AM_CONDITIONAL(HAVE_X11_DRIVER, test "x$enable_xlib_glx" = xyes)
AM_CONDITIONAL(HAVE_OSMESA, test "x$enable_osmesa" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_OSMESA, test "x$enable_gallium_osmesa" = xyes)
@@ -2336,6 +2594,27 @@ AC_SUBST([XA_MINOR], $XA_MINOR)
AC_SUBST([XA_TINY], $XA_TINY)
AC_SUBST([XA_VERSION], "$XA_MAJOR.$XA_MINOR.$XA_TINY")
AC_ARG_ENABLE(valgrind,
[AS_HELP_STRING([--enable-valgrind],
[Build mesa with valgrind support (default: auto)])],
[VALGRIND=$enableval], [VALGRIND=auto])
if test "x$VALGRIND" != xno; then
PKG_CHECK_MODULES(VALGRIND, [valgrind], [have_valgrind=yes], [have_valgrind=no])
fi
AC_MSG_CHECKING([whether to enable Valgrind support])
if test "x$VALGRIND" = xauto; then
VALGRIND="$have_valgrind"
fi
if test "x$VALGRIND" = "xyes"; then
if ! test "x$have_valgrind" = xyes; then
AC_MSG_ERROR([Valgrind support required but not present])
fi
AC_DEFINE([HAVE_VALGRIND], 1, [Use valgrind intrinsics to suppress false warnings])
fi
AC_MSG_RESULT([$VALGRIND])
dnl Restore LDFLAGS and CPPFLAGS
LDFLAGS="$_SAVE_LDFLAGS"
CPPFLAGS="$_SAVE_CPPFLAGS"
@@ -2353,6 +2632,7 @@ CXXFLAGS="$CXXFLAGS $USER_CXXFLAGS"
dnl Substitute the config
AC_CONFIG_FILES([Makefile
src/Makefile
src/compiler/Makefile
src/egl/Makefile
src/egl/main/egl.pc
src/egl/wayland/wayland-drm/Makefile
@@ -2375,6 +2655,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/rbug/Makefile
src/gallium/drivers/softpipe/Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/swr/Makefile
src/gallium/drivers/trace/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/virgl/Makefile
@@ -2422,11 +2703,14 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glsl/Makefile
src/glx/Makefile
src/glx/apple/Makefile
src/glx/tests/Makefile
src/gtest/Makefile
src/intel/Makefile
src/intel/genxml/Makefile
src/intel/isl/Makefile
src/intel/vulkan/Makefile
src/loader/Makefile
src/mapi/Makefile
src/mapi/es1api/glesv1_cm.pc
@@ -2453,6 +2737,14 @@ AC_CONFIG_FILES([Makefile
AC_OUTPUT
# Fix up dependencies in *.Plo files, where we changed the extension of a
# source file
$SED -i -e 's/brw_blorp.cpp/brw_blorp.c/' src/mesa/drivers/dri/i965/.deps/brw_blorp.Plo
$SED -i -e 's/gen6_blorp.cpp/gen6_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen6_blorp.Plo
$SED -i -e 's/gen7_blorp.cpp/gen7_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen7_blorp.Plo
$SED -i -e 's/gen8_blorp.cpp/gen8_blorp.c/' src/mesa/drivers/dri/i965/.deps/gen8_blorp.Plo
dnl
dnl Output some configuration info for the user
dnl
@@ -2491,12 +2783,15 @@ if test "x$enable_dri" != xno; then
echo " DRI driver dir: $DRI_DRIVER_INSTALL_DIR"
fi
case "x$enable_glx$enable_xlib_glx" in
xyesyes)
case "x$enable_glx" in
xdri)
echo " GLX: DRI-based"
;;
xxlib)
echo " GLX: Xlib-based"
;;
xyesno)
echo " GLX: DRI-based"
xgallium-xlib)
echo " GLX: Xlib-based (Gallium)"
;;
*)
echo " GLX: $enable_glx"
@@ -2520,6 +2815,15 @@ if test "$enable_egl" = yes; then
echo " EGL drivers: $egl_drivers"
fi
# Vulkan
echo ""
if test "x$VULKAN_DRIVERS" != x; then
echo " Vulkan drivers: $VULKAN_DRIVERS"
echo " Vulkan ICD dir: $VULKAN_ICD_INSTALL_DIR"
else
echo " Vulkan drivers: no"
fi
echo ""
if test "x$MESA_LLVM" = x1; then
echo " llvm: yes"
@@ -2570,6 +2874,7 @@ if test "x$MESA_LLVM" = x1; then
echo ""
fi
echo " PYTHON2: $PYTHON2"
echo " PYTHON3: $PYTHON3"
echo ""
echo " Run '${MAKE-make}' to build Mesa"

View File

@@ -1,490 +0,0 @@
Some parts of Mesa are copyrighted under the GNU LGPL. See the
Mesa/docs/COPYRIGHT file for details.
The following is the standard GNU copyright file.
----------------------------------------------------------------------
GNU LIBRARY GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the library GPL. It is
numbered 2 because it goes with version 2 of the ordinary GPL.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Library General Public License, applies to some
specially designated Free Software Foundation software, and to any
other libraries whose authors decide to use it. You can use it for
your libraries, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if
you distribute copies of the library, or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link a program with the library, you must provide
complete object files to the recipients so that they can relink them
with the library, after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
Our method of protecting your rights has two steps: (1) copyright
the library, and (2) offer you this license which gives you legal
permission to copy, distribute and/or modify the library.
Also, for each distributor's protection, we want to make certain
that everyone understands that there is no warranty for this free
library. If the library is modified by someone else and passed on, we
want its recipients to know that what they have is not the original
version, so that any problems introduced by others will not reflect on
the original authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that companies distributing free
software will individually obtain patent licenses, thus in effect
transforming the program into proprietary software. To prevent this,
we have made it clear that any patent must be licensed for everyone's
free use or not licensed at all.
Most GNU software, including some libraries, is covered by the ordinary
GNU General Public License, which was designed for utility programs. This
license, the GNU Library General Public License, applies to certain
designated libraries. This license is quite different from the ordinary
one; be sure to read it in full, and don't assume that anything in it is
the same as in the ordinary license.
The reason we have a separate public license for some libraries is that
they blur the distinction we usually make between modifying or adding to a
program and simply using it. Linking a program with a library, without
changing the library, is in some sense simply using the library, and is
analogous to running a utility program or application program. However, in
a textual and legal sense, the linked executable is a combined work, a
derivative of the original library, and the ordinary General Public License
treats it as such.
Because of this blurred distinction, using the ordinary General
Public License for libraries did not effectively promote software
sharing, because most developers did not use the libraries. We
concluded that weaker conditions might promote sharing better.
However, unrestricted linking of non-free programs would deprive the
users of those programs of all benefit from the free status of the
libraries themselves. This Library General Public License is intended to
permit developers of non-free programs to use free libraries, while
preserving your freedom as a user of such programs to change the free
libraries that are incorporated in them. (We have not seen how to achieve
this as regards changes in header files, but we have achieved it as regards
changes in the actual functions of the Library.) The hope is that this
will lead to faster development of free libraries.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, while the latter only
works together with the library.
Note that it is possible for a library to be covered by the ordinary
General Public License rather than by this special one.
GNU LIBRARY GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library which
contains a notice placed by the copyright holder or other authorized
party saying it may be distributed under the terms of this Library
General Public License (also called "this License"). Each licensee is
addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
6. As an exception to the Sections above, you may also compile or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
c) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
d) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the source code distributed need not include anything that is normally
distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Library General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).
To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!

View File

@@ -1,13 +1,28 @@
# Status of OpenGL extensions in Mesa
Status of OpenGL 3.x features in Mesa
Here's how to read this file:
all DONE: <driver>, ...
All the extensions are done for the given list of drivers.
Note: when an item is marked as "DONE" it means all the core Mesa
infrastructure is complete but it may be the case that few (if any) drivers
implement the features.
DONE
The extension is done for Mesa and no implementation is necessary on the
driver-side.
DONE ()
The extension is done for Mesa and all the drivers in the "all DONE" list.
OpenGL Core and Compatibility context support
DONE (<driver>, ...)
The extension is done for Mesa, all the drivers in the "all DONE" list, and
all the drivers in the brackets.
in progress
The extension is started but not finished yet.
not started
The extension isn't started yet.
# OpenGL Core and Compatibility context support
OpenGL 3.1 and later versions are only supported with the Core profile.
There are no plans to support GL_ARB_compatibility. The last supported OpenGL
@@ -15,249 +30,248 @@ version with all deprecated features is 3.0. Some of the later GL features
are exposed in the 3.0 context as extensions.
Feature Status
----------------------------------------------------- ------------------------
Feature Status
------------------------------------------------------- ------------------------
GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
glBindFragDataLocation, glGetFragDataLocation DONE
Conditional rendering (GL_NV_conditional_render) DONE ()
Map buffer subranges (GL_ARB_map_buffer_range) DONE ()
Clamping controls (GL_ARB_color_buffer_float) DONE ()
Float textures, renderbuffers (GL_ARB_texture_float) DONE ()
GL_NV_conditional_render (Conditional rendering) DONE ()
GL_ARB_map_buffer_range (Map buffer subranges) DONE ()
GL_ARB_color_buffer_float (Clamping controls) DONE ()
GL_ARB_texture_float (Float textures, renderbuffers) DONE ()
GL_EXT_packed_float DONE ()
GL_EXT_texture_shared_exponent DONE ()
Float depth buffers (GL_ARB_depth_buffer_float) DONE ()
Framebuffer objects (GL_ARB_framebuffer_object) DONE ()
GL_ARB_depth_buffer_float (Float depth buffers) DONE ()
GL_ARB_framebuffer_object (Framebuffer objects) DONE ()
GL_ARB_half_float_pixel DONE (all drivers)
GL_ARB_half_float_vertex DONE ()
GL_EXT_texture_integer DONE ()
GL_EXT_texture_array DONE ()
Per-buffer blend and masks (GL_EXT_draw_buffers2) DONE ()
GL_EXT_draw_buffers2 (Per-buffer blend and masks) DONE ()
GL_EXT_texture_compression_rgtc DONE ()
GL_ARB_texture_rg DONE ()
Transform feedback (GL_EXT_transform_feedback) DONE ()
Vertex array objects (GL_ARB_vertex_array_object) DONE ()
sRGB framebuffer format (GL_EXT_framebuffer_sRGB) DONE ()
GL_EXT_transform_feedback (Transform feedback) DONE ()
GL_ARB_vertex_array_object (Vertex array objects) DONE ()
GL_EXT_framebuffer_sRGB (sRGB framebuffer format) DONE ()
glClearBuffer commands DONE
glGetStringi command DONE
glTexParameterI, glGetTexParameterI commands DONE
glVertexAttribI commands DONE
Depth format cube textures DONE ()
GLX_ARB_create_context (GLX 1.4 is required) DONE
Multisample anti-aliasing DONE (llvmpipe (*), softpipe (*))
Multisample anti-aliasing DONE (llvmpipe (*), softpipe (*), swr (*))
(*) llvmpipe and softpipe have fake Multisample anti-aliasing support
(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
Forward compatible context support/deprecations DONE ()
Instanced drawing (GL_ARB_draw_instanced) DONE ()
Buffer copying (GL_ARB_copy_buffer) DONE ()
Primitive restart (GL_NV_primitive_restart) DONE ()
GL_ARB_draw_instanced (Instanced drawing) DONE ()
GL_ARB_copy_buffer (Buffer copying) DONE ()
GL_NV_primitive_restart (Primitive restart) DONE ()
16 vertex texture image units DONE ()
Texture buffer objs (GL_ARB_texture_buffer_object) DONE for OpenGL 3.1 contexts ()
Rectangular textures (GL_ARB_texture_rectangle) DONE ()
Uniform buffer objs (GL_ARB_uniform_buffer_object) DONE ()
Signed normalized textures (GL_EXT_texture_snorm) DONE ()
GL_ARB_texture_buffer_object (Texture buffer objs) DONE (for OpenGL 3.1 contexts)
GL_ARB_texture_rectangle (Rectangular textures) DONE ()
GL_ARB_uniform_buffer_object (Uniform buffer objs) DONE ()
GL_EXT_texture_snorm (Signed normalized textures) DONE ()
GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
Core/compatibility profiles DONE
Geometry shaders DONE ()
BGRA vertex order (GL_ARB_vertex_array_bgra) DONE ()
Base vertex offset(GL_ARB_draw_elements_base_vertex) DONE ()
Frag shader coord (GL_ARB_fragment_coord_conventions) DONE ()
Provoking vertex (GL_ARB_provoking_vertex) DONE ()
Seamless cubemaps (GL_ARB_seamless_cube_map) DONE ()
Multisample textures (GL_ARB_texture_multisample) DONE ()
Frag depth clamp (GL_ARB_depth_clamp) DONE ()
Fence objects (GL_ARB_sync) DONE ()
GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (swr)
GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr)
GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr)
GL_ARB_provoking_vertex (Provoking vertex) DONE (swr)
GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (swr)
GL_ARB_texture_multisample (Multisample textures) DONE (swr)
GL_ARB_depth_clamp (Frag depth clamp) DONE (swr)
GL_ARB_sync (Fence objects) DONE (swr)
GLX_ARB_create_context_profile DONE
GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL_ARB_blend_func_extended DONE ()
GL_ARB_blend_func_extended DONE (swr)
GL_ARB_explicit_attrib_location DONE (all drivers that support GLSL)
GL_ARB_occlusion_query2 DONE ()
GL_ARB_occlusion_query2 DONE (swr)
GL_ARB_sampler_objects DONE (all drivers)
GL_ARB_shader_bit_encoding DONE ()
GL_ARB_texture_rgb10_a2ui DONE ()
GL_ARB_texture_swizzle DONE ()
GL_ARB_timer_query DONE ()
GL_ARB_instanced_arrays DONE ()
GL_ARB_vertex_type_2_10_10_10_rev DONE ()
GL_ARB_shader_bit_encoding DONE (swr)
GL_ARB_texture_rgb10_a2ui DONE (swr)
GL_ARB_texture_swizzle DONE (swr)
GL_ARB_timer_query DONE (swr)
GL_ARB_instanced_arrays DONE (swr)
GL_ARB_vertex_type_2_10_10_10_rev DONE (swr)
GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
GL 4.0, GLSL 4.00 --- all DONE: nvc0, r600, radeonsi
GL_ARB_draw_buffers_blend DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_draw_indirect DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_gpu_shader5 DONE (i965, r600)
- 'precise' qualifier DONE
- Dynamically uniform sampler array indices DONE (softpipe)
- Dynamically uniform UBO array indices DONE ()
- Implicit signed -> unsigned conversions DONE
- Fused multiply-add DONE ()
- Packing/bitfield/conversion functions DONE (softpipe)
- Enhanced textureGather DONE (softpipe)
- Geometry shader instancing DONE (llvmpipe, softpipe)
- Geometry shader multiple streams DONE ()
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (r600, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50, r600)
GL_ARB_shader_subroutine DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_tessellation_shader DONE ()
GL_ARB_texture_buffer_object_rgb32 DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_texture_cube_map_array DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_query_lod DONE (i965, nv50, r600, softpipe)
GL_ARB_transform_feedback2 DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_transform_feedback3 DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_draw_buffers_blend DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (i965, llvmpipe, softpipe, swr)
GL_ARB_gpu_shader5 DONE (i965)
- 'precise' qualifier DONE
- Dynamically uniform sampler array indices DONE (softpipe)
- Dynamically uniform UBO array indices DONE ()
- Implicit signed -> unsigned conversions DONE
- Fused multiply-add DONE ()
- Packing/bitfield/conversion functions DONE (softpipe)
- Enhanced textureGather DONE (softpipe)
- Geometry shader instancing DONE (llvmpipe, softpipe)
- Geometry shader multiple streams DONE ()
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (i965/gen8+, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50)
GL_ARB_shader_subroutine DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_tessellation_shader DONE (i965)
GL_ARB_texture_buffer_object_rgb32 DONE (i965, llvmpipe, softpipe, swr)
GL_ARB_texture_cube_map_array DONE (i965, nv50, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_query_lod DONE (i965, nv50, softpipe)
GL_ARB_transform_feedback2 DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965, nv50, llvmpipe, softpipe, swr)
GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
GL 4.1, GLSL 4.10 --- all DONE: nvc0, r600, radeonsi
GL_ARB_ES2_compatibility DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (r600, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, r600, llvmpipe)
GL_ARB_ES2_compatibility DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (i965/gen8+, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe)
GL 4.2, GLSL 4.20:
GL 4.2, GLSL 4.20 -- all DONE: radeonsi
GL_ARB_texture_compression_bptc DONE (i965, nvc0, r600, radeonsi)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_base_instance DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_shader_image_load_store DONE (i965)
GL_ARB_conservative_depth DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_420pack DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_internalformat_query DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_map_buffer_alignment DONE (all drivers)
GL_ARB_texture_compression_bptc DONE (i965, nvc0, r600, radeonsi)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_conservative_depth DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_420pack DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_internalformat_query DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_map_buffer_alignment DONE (all drivers)
GL 4.3, GLSL 4.30:
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_copy_image DONE (i965, nv50, nvc0, radeonsi)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965)
GL_ARB_internalformat_query2 not started
GL_ARB_invalidate_subdata DONE (all drivers)
GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior not started
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_copy_image DONE (i965, nv50, nvc0, r600, radeonsi)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_internalformat_query2 DONE (all drivers)
GL_ARB_invalidate_subdata DONE (all drivers)
GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior DONE (i965, nvc0, radeonsi)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GL 4.4, GLSL 4.40:
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965, nv50, nvc0)
GL_ARB_enhanced_layouts in progress (Timothy)
- compile-time constant expressions DONE
- explicit byte offsets for blocks in progress
- forced alignment within blocks in progress
- specified vec4-slot component numbers in progress
- specified transform/feedback layout in progress
- input/output block locations in progress
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object not started
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_stencil8 DONE (nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965, nv50, nvc0)
GL_ARB_enhanced_layouts in progress (Timothy)
- compile-time constant expressions DONE
- explicit byte offsets for blocks DONE
- forced alignment within blocks DONE
- specified vec4-slot component numbers in progress
- specified transform/feedback layout DONE
- input/output block locations DONE
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object DONE (i965/hsw+, nvc0)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_stencil8 DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL 4.5, GLSL 4.50:
GL_ARB_ES3_1_compatibility not started
GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_cull_distance in progress (Tobias)
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_texture_barrier DONE (i965, nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robust_buffer_access_behavior not started
GL_KHR_robustness 90% done (the ARB variant)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
GL_ARB_ES3_1_compatibility DONE (nvc0, radeonsi)
GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_cull_distance DONE (i965, nv50, nvc0, llvmpipe, softpipe)
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_texture_barrier DONE (i965, nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robustness DONE (i965)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
These are the extensions cherry-picked to make GLES 3.1
GLES3.1, GLSL ES 3.1
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (i965)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965)
GL_ARB_shader_image_load_store DONE (i965)
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
Multisample textures (GL_ARB_texture_multisample) DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GS5 Enhanced textureGather DONE (i965, nvc0, r600, radeonsi)
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GS5 Enhanced textureGather DONE (i965, nvc0, r600, radeonsi)
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functionality not covered above:
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glGetBooleani_v - restrict to GLES enums
gl_HelperInvocation support
gl_HelperInvocation support DONE (i965, nvc0, r600, radeonsi)
GLES3.2, GLSL ES 3.2
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced not started
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness 90% done (the ARB variant)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image not started (based on GL_ARB_copy_image, which is done for some drivers)
GL_OES_draw_buffers_indexed not started
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced not started
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness DONE (i965)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image DONE (i965)
GL_OES_draw_buffers_indexed DONE (all drivers that support GL_ARB_draw_buffers_blend)
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader started (idr)
GL_OES_gpu_shader5 DONE (all drivers that support GL_ARB_gpu_shader5)
GL_OES_primitive_bounding_box not started
GL_OES_sample_shading DONE (i965, nvc0, r600, radeonsi)
GL_OES_sample_variables DONE (i965, nvc0, r600, radeonsi)
GL_OES_shader_image_atomic DONE (all drivers that support GL_ARB_shader_image_load_store)
GL_OES_shader_io_blocks DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_shader_multisample_interpolation DONE (i965, nvc0, r600, radeonsi)
GL_OES_tessellation_shader started (Ken)
GL_OES_texture_border_clamp DONE (all drivers)
GL_OES_texture_buffer DONE (i965, nvc0, radeonsi)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 DONE (all drivers that support GL_ARB_texture_stencil8)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
http://dri.freedesktop.org/wiki/MissingFunctionality

View File

@@ -90,14 +90,14 @@
<li><a href="http://www.opengl.org" target="_parent">OpenGL website</a>
<li><a href="http://dri.freedesktop.org" target="_parent">DRI website</a>
<li><a href="http://www.freedesktop.org" target="_parent">freedesktop.org</a>
<li><a href="http://planet.freedesktop.org" target="_parent">Developer blogs</a>
</ul>
<b>Hosted by:</b>
<br>
<blockquote>
<a href="http://sourceforge.net"
target="_parent"><img src="http://sourceforge.net/sflogo.php?group_id=3&amp;type=1"
width="88" height="31" align="bottom" alt="Sourceforge.net" border="0"></a>
target="_parent">sourceforge.net</a>
</blockquote>
</body>

View File

@@ -18,7 +18,9 @@
<p>
Primary Mesa download site:
<a href="ftp://ftp.freedesktop.org/pub/mesa/">freedesktop.org</a> (FTP)
<a href="ftp://ftp.freedesktop.org/pub/mesa/">ftp.freedesktop.org</a> (FTP)
or <a href="https://mesa.freedesktop.org/archive/">mesa.freedesktop.org</a>
(HTTP).
</p>
<p>

View File

@@ -89,9 +89,11 @@ types such as <code>EGLNativeDisplayType</code> or
<p>The available platforms are <code>x11</code>, <code>drm</code>,
<code>wayland</code>, <code>surfaceless</code>, <code>android</code>,
and <code>haiku</code>. The <code>android</code> platform
can only be built as a system component, part of AOSP, while the
<code>haiku</code> platform can only be built with SCons.
and <code>haiku</code>.
The <code>android</code> platform can either be built as a system
component, part of AOSP, using <code>Android.mk</code> files, or
cross-compiled using appropriate <code>configure</code> options.
The <code>haiku</code> platform can only be built with SCons.
Unless for special needs, the build system should
select the right platforms automatically.</p>

View File

@@ -91,11 +91,20 @@ This is only valid for versions &gt;= 3.0.
<li> Mesa may not really implement all the features of the given version.
(for developers only)
</ul>
<li>MESA_GLES_VERSION_OVERRIDE - changes the value returned by
glGetString(GL_VERSION) for OpenGL ES.
<ul>
<li> The format should be MAJOR.MINOR
<li> Examples: 2.0, 3.0, 3.1
<li> Mesa may not really implement all the features of the given version.
(for developers only)
</ul>
<li>MESA_GLSL_VERSION_OVERRIDE - changes the value returned by
glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are integers, such as
"130". Mesa will not really implement all the features of the given language version
if it's higher than what's normally reported. (for developers only)
<li>MESA_GLSL - <a href="shading.html#envvars">shading language compiler options</a>
<li>MESA_NO_MINMAX_CACHE - when set, the minmax index cache is globally disabled.
</ul>
@@ -154,6 +163,9 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>vec4 - force vec4 mode in vertex shader</li>
<li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>
<li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>
</ul>
</ul>
@@ -223,7 +235,7 @@ See src/mesa/state_tracker/st_debug.c for other options.
<li>LP_PERF - a comma-separated list of options to selectively no-op various
parts of the driver. See the source code for details.
<li>LP_NUM_THREADS - an integer indicating how many threads to use for rendering.
Zero turns of threading completely. The default value is the number of CPU
Zero turns off threading completely. The default value is the number of CPU
cores present.
</ul>
@@ -244,6 +256,25 @@ for details.
</ul>
<h3>VC4 driver environment variables</h3>
<ul>
<li>VC4_DEBUG - a comma-separated list of named flags, which do various things:
<ul>
<li>cl - dump command list during creation</li>
<li>qpu - dump generated QPU instructions</li>
<li>qir - dump QPU IR during program compile</li>
<li>nir - dump NIR during program compile</li>
<li>tgsi - dump TGSI during program compile</li>
<li>shaderdb - dump program compile information for shader-db analysis</li>
<li>perf - print during performance-related events</li>
<li>norast - skip actual hardware execution of commands</li>
<li>always_flush - flush after each draw call</li>
<li>always_sync - wait for finish after each flush</li>
<li>dump - write a GPU command stream trace file (VC4 simulator only)</li>
</ul>
</ul>
<p>
Other Gallium drivers have their own environment variables. These may change
frequently so the source code should be consulted for details.

View File

@@ -16,6 +16,79 @@
<h1>News</h1>
<h2>May 9, 2016</h2>
<p>
<a href="relnotes/11.1.4.html">Mesa 11.1.4</a> and
<a href="relnotes/11.2.2.html">Mesa 11.2.2</a> are released.
These are bug-fix releases from the 11.1 and 11.2 branches, respectively.
<br>
NOTE: It is anticipated that 11.1.4 will be the final release in the 11.1.4
series. Users of 11.1 are encouraged to migrate to the 11.2 series in order
to obtain future fixes.
</p>
<h2>April 17, 2016</h2>
<p>
<a href="relnotes/11.1.3.html">Mesa 11.1.3</a> and
<a href="relnotes/11.2.1.html">Mesa 11.2.1</a> are released.
These are bug-fix releases from the 11.1 and 11.2 branches, respectively.
</p>
<h2>April 4, 2016</h2>
<p>
<a href="relnotes/11.2.0.html">Mesa 11.2.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>February 10, 2016</h2>
<p>
<a href="relnotes/11.1.2.html">Mesa 11.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>January 22, 2016</h2>
<p>
<a href="relnotes/11.0.9.html">Mesa 11.0.9</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 11.0.9 will be the final release in the 11.0
series. Users of 11.0 are encouraged to migrate to the 11.1 series in order
to obtain future fixes.
</p>
<h2>January 13, 2016</h2>
<p>
<a href="relnotes/11.1.1.html">Mesa 11.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>December 21, 2015</h2>
<p>
<a href="relnotes/11.0.8.html">Mesa 11.0.8</a> is released.
This is a bug-fix release.
</p>
<h2>December 15, 2015</h2>
<p>
<a href="relnotes/11.1.0.html">Mesa 11.1.0</a> is released. This is a new
development release. See the release notes for more information about
the release.
</p>
<h2>December 9, 2015</h2>
<p>
<a href="relnotes/11.0.7.html">Mesa 11.0.7</a> is released.
This is a bug-fix release.
</p>
<p>
Mesa demos 8.3.0 is also released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/">ftp.freedesktop.org/pub/mesa/demos/8.3.0/</a>.
</p>
<h2>November 21, 2015</h2>
<p>
<a href="relnotes/11.0.6.html">Mesa 11.0.6</a> is released.

View File

@@ -39,7 +39,7 @@ Version 2.6.4 or later should work.
</li>
<br>
<li><a href="http://www.makotemplates.org/">Python Mako module</a> -
Python Mako module is required. Version 0.7.3 or later should work.
Python Mako module is required. Version 0.3.4 or later should work.
</li>
</br>
<li><a href="http://www.scons.org/">SCons</a> is required for building on
@@ -58,6 +58,9 @@ On Windows with MinGW, install flex and bison with:
For MSVC on Windows, install
<a href="http://winflexbison.sourceforge.net/">Win flex-bison</a>.
</li>
<br>
<li>For building on Windows, Microsoft Visual Studio 2013 or later is required.
</li>
</ul>
@@ -70,8 +73,7 @@ The following are required for DRI-based hardware acceleration with Mesa:
<ul>
<li><a href="http://xorg.freedesktop.org/releases/individual/proto/">
dri2proto</a> version 2.6 or later
<li><a href="http://dri.freedesktop.org/libdrm/">libDRM</a>
version 2.4.33 or later
<li><a href="http://dri.freedesktop.org/libdrm/">libDRM</a> latest version
<li>Xorg server version 1.5 or later
<li>Linux 2.6.28 or later
</ul>

View File

@@ -46,10 +46,10 @@ library</em>. <br>
<p>
The Mesa distribution consists of several components. Different copyrights
and licenses apply to different components. For example, some demo programs
are copyrighted by SGI, some of the Mesa device drivers are copyrighted by
their authors. See below for a list of Mesa's main components and the license
for each.
and licenses apply to different components.
For example, the GLX client code uses the SGI Free Software License B, and
some of the Mesa device drivers are copyrighted by their authors.
See below for a list of Mesa's main components and the license for each.
</p>
<p>
The core Mesa library is licensed according to the terms of the MIT license.
@@ -97,13 +97,17 @@ and their respective licenses.
<pre>
Component Location License
------------------------------------------------------------------
Main Mesa code src/mesa/ Mesa (MIT)
Main Mesa code src/mesa/ MIT
Device drivers src/mesa/drivers/* MIT, generally
Gallium code src/gallium/ MIT
Ext headers include/GL/glext.h Khronos
include/GL/glxext.h
GLX client code src/glx/ SGI Free Software License B
C11 thread include/c11/threads*.h Boost (permissive)
emulation
</pre>

View File

@@ -21,6 +21,17 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/11.2.2.html">11.2.2 release notes</a>
<li><a href="relnotes/11.1.4.html">11.1.4 release notes</a>
<li><a href="relnotes/11.2.1.html">11.2.1 release notes</a>
<li><a href="relnotes/11.1.3.html">11.1.3 release notes</a>
<li><a href="relnotes/11.2.0.html">11.2.0 release notes</a>
<li><a href="relnotes/11.1.2.html">11.1.2 release notes</a>
<li><a href="relnotes/11.0.9.html">11.0.9 release notes</a>
<li><a href="relnotes/11.1.1.html">11.1.1 release notes</a>
<li><a href="relnotes/11.0.8.html">11.0.8 release notes</a>
<li><a href="relnotes/11.1.0.html">11.1.0 release notes</a>
<li><a href="relnotes/11.0.7.html">11.0.7 release notes</a>
<li><a href="relnotes/11.0.6.html">11.0.6 release notes</a>
<li><a href="relnotes/11.0.5.html">11.0.5 release notes</a>
<li><a href="relnotes/11.0.4.html">11.0.4 release notes</a>

View File

@@ -45,8 +45,6 @@ because compatibility contexts are not supported.
<ul>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>

154
docs/relnotes/11.0.7.html Normal file
View File

@@ -0,0 +1,154 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.7 Release Notes / December 9, 2015</h1>
<p>
Mesa 11.0.7 is a bug fix release which fixes bugs found since the 11.0.6 release.
</p>
<p>
Mesa 11.0.7 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
07c27004ff68b288097d17b2faa7bdf15ec73c96b7e6c9835266e544adf0a62f mesa-11.0.7.tar.gz
e7e90a332ede6c8fd08eff90786a3fd1605a4e62ebf3a9b514047838194538cb mesa-11.0.7.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93110">Bug 93110</a> - [NVE4] textureSize() and textureQueryLevels() uses a texture bound during the previous draw call</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
</ul>
<h2>Changes</h2>
<p>Chris Wilson (1):</p>
<ul>
<li>meta: Compute correct buffer size with SkipRows/SkipPixels</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>egl/wayland: Ignore rects from SwapBuffersWithDamage</li>
</ul>
<p>Dave Airlie (4):</p>
<ul>
<li>texgetimage: consolidate 1D array handling code.</li>
<li>r600: geometry shader gsvs itemsize workaround</li>
<li>r600: rv670 use at least 16es/gs threads</li>
<li>r600: workaround empty geom shader.</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.6</li>
<li>get-pick-list.sh: Require explicit "11.0" for nominating stable patches</li>
<li>mesa; add get-extra-pick-list.sh script into bin/</li>
<li>Update version to 11.0.7</li>
</ul>
<p>François Tigeot (1):</p>
<ul>
<li>xmlconfig: Add support for DragonFly</li>
</ul>
<p>Ian Romanick (22):</p>
<ul>
<li>mesa: Make bind_vertex_buffer avilable outside varray.c</li>
<li>mesa: Refactor update_array_format to make _mesa_update_array_format_public</li>
<li>mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib</li>
<li>i965: Pass brw_context instead of gl_context to brw_draw_rectlist</li>
<li>i965: Use DSA functions for VBOs in brw_meta_fast_clear</li>
<li>i965: Use internal functions for buffer object access</li>
<li>i965: Don't pollute the buffer object namespace in brw_meta_fast_clear</li>
<li>meta: Use DSA functions for PBO in create_texture_for_pbo</li>
<li>meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects</li>
<li>i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects</li>
<li>meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects</li>
<li>meta: Track VBO using gl_buffer_object instead of GL API object handle</li>
<li>meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects</li>
<li>meta: Use internal functions for buffer object and VAO access</li>
<li>meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects</li>
<li>meta: Partially convert _mesa_meta_DrawTex to DSA</li>
<li>meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex</li>
<li>meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex</li>
<li>meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex</li>
<li>meta/TexSubImage: Don't pollute the buffer object namespace</li>
<li>meta/generate_mipmap: Don't leak the framebuffer object</li>
<li>glsl: Fix off-by-one error in array size check assertion</li>
</ul>
<p>Ilia Mirkin (7):</p>
<ul>
<li>nvc0/ir: actually emit AFETCH on kepler</li>
<li>nir: fix typo in idiv lowering, causing large-udiv-udiv failures</li>
<li>nouveau: use the buffer usage to determine placement when no binding</li>
<li>nv50,nvc0: properly handle buffer storage invalidation on dsa buffer</li>
<li>nv50/ir: fix (un)spilling of 3-wide results</li>
<li>mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists</li>
<li>nvc0/ir: start offset at texBindBase for txq, like regular texturing</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>automake: fix some occurrences of hardcoded -ldl and -lpthread</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: disable Stoney VCE for 11.0</li>
</ul>
<p>Marta Lofstedt (1):</p>
<ul>
<li>gles2: Update gl2ext.h to revision: 32120</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: disable VSX in ppc due to LLVM PPC bug</li>
</ul>
</div>
</body>
</html>

200
docs/relnotes/11.0.8.html Normal file
View File

@@ -0,0 +1,200 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.8 Release Notes / December 9, 2015</h1>
<p>
Mesa 11.0.8 is a bug fix release which fixes bugs found since the 11.0.7 release.
</p>
<p>
Mesa 11.0.8 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
ab9db87b54d7525e4b611b82577ea9a9eae55927558df57b190059d5ecd9406f mesa-11.0.8.tar.gz
5696e4730518b6805d2ed5def393c4293f425a2c2c01bd5ed4bdd7ad62f7ad75 mesa-11.0.8.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91806">Bug 91806</a> - configure does not test whether assembler supports sse4.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/uvd: uv pitch separation for stoney</li>
</ul>
<p>Dave Airlie (9):</p>
<ul>
<li>r600: do SQ flush ES ring rolling workaround</li>
<li>r600: SMX returns CONTEXT_DONE early workaround</li>
<li>r600/shader: split address get out to a function.</li>
<li>r600/shader: add utility functions to do single slot arithmatic</li>
<li>r600g: fix geom shader input indirect indexing.</li>
<li>r600: handle geometry dynamic input array index</li>
<li>radeonsi: handle doubles in lds load path.</li>
<li>mesa/varray: set double arrays to non-normalised.</li>
<li>mesa/shader: return correct attribute location for double matrix arrays</li>
</ul>
<p>Emil Velikov (8):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.7</li>
<li>cherry-ignore: don't pick a specific i965 formats patch</li>
<li>Revert "i965/nir: Remove unused indirect handling"</li>
<li>Revert "i965/state: Get rid of dword_pitch arguments to buffer functions"</li>
<li>Revert "i965/vec4: Use a stride of 1 and byte offsets for UBOs"</li>
<li>Revert "i965/fs: Use a stride of 1 and byte offsets for UBOs"</li>
<li>Revert "i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge"</li>
<li>Update version to 11.0.8</li>
</ul>
<p>Francisco Jerez (1):</p>
<ul>
<li>i965: Resolve color and flush for all active shader images in intel_update_state().</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER</li>
</ul>
<p>Ilia Mirkin (17):</p>
<ul>
<li>freedreno/a4xx: support lod_bias</li>
<li>freedreno/a4xx: fix 5_5_5_1 texture sampler format</li>
<li>freedreno/a4xx: point regid to "red" even for alpha-only rb formats</li>
<li>nvc0/ir: fold postfactor into immediate</li>
<li>nv50/ir: deal with loops with no breaks</li>
<li>nv50/ir: the mad source might not have a defining instruction</li>
<li>nv50/ir: fix instruction permutation logic</li>
<li>nv50/ir: don't forget to mark flagsDef on cvt in txb lowering</li>
<li>nv50/ir: fix DCE to not generate 96-bit loads</li>
<li>nv50/ir: avoid looking at uninitialized srcMods entries</li>
<li>gk110/ir: fix imul hi emission with limm arg</li>
<li>gk104/ir: sampler doesn't matter for txf</li>
<li>gk110/ir: fix imad sat/hi flag emission for immediate args</li>
<li>nv50/ir: fix cutoff for using r63 vs r127 when replacing zero</li>
<li>nv50/ir: can't have predication and immediates</li>
<li>glsl: assign varying locations to tess shaders when doing SSO</li>
<li>ttn: add TEX2 support</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge</li>
<li>i965/fs: Use a stride of 1 and byte offsets for UBOs</li>
<li>i965/vec4: Use a stride of 1 and byte offsets for UBOs</li>
<li>i965/state: Get rid of dword_pitch arguments to buffer functions</li>
<li>i965/nir: Remove unused indirect handling</li>
</ul>
<p>Jonathan Gray (2):</p>
<ul>
<li>configure.ac: use pkg-config for libelf</li>
<li>configure: check for python2.7 for PYTHON2</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Fix fragment shader struct inputs.</li>
<li>i965: Fix scalar vertex shader struct outputs.</li>
</ul>
<p>Marek Olšák (8):</p>
<ul>
<li>radeonsi: fix occlusion queries on Fiji</li>
<li>radeonsi: fix a hang due to uninitialized border color registers</li>
<li>radeonsi: fix Fiji for LLVM &lt;= 3.7</li>
<li>radeonsi: don't call of u_prims_for_vertices for patches and rectangles</li>
<li>radeonsi: apply the streamout workaround to Fiji as well</li>
<li>gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 correctly</li>
<li>tgsi/scan: add flag colors_written</li>
<li>r600g: write all MRTs only if there is exactly one output (fixes a hang)</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>glsl: Allow binding of image variables with 420pack.</li>
</ul>
<p>Neil Roberts (2):</p>
<ul>
<li>i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format</li>
<li>i965: Add B8G8R8X8_SRGB to the alpha format override</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>configura.ac: fix test for SSE4.1 assembler support</li>
</ul>
<p>Patrick Rudolph (2):</p>
<ul>
<li>nv50,nvc0: fix use-after-free when vertex buffers are unbound</li>
<li>gallium/util: return correct number of bound vertex buffers</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>nvc0: free memory allocated by the prog which reads MP perf counters</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>i965: use _Shader to get fragment program when updating surface state</li>
</ul>
<p>Tom Stellard (2):</p>
<ul>
<li>radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}</li>
<li>radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values</li>
</ul>
</div>
</body>
</html>

127
docs/relnotes/11.0.9.html Normal file
View File

@@ -0,0 +1,127 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.9 Release Notes / January 22, 2016</h1>
<p>
Mesa 11.0.9 is a bug fix release which fixes bugs found since the 11.0.8 release.
</p>
<p>
Mesa 11.0.9 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
1597c2e983f476f98efdd6cd58b5298896d18479ff542bdeff28b98b129ede05 mesa-11.0.9.tar.gz
a1262ff1c66a16ccf341186cf0e57b306b8589eb2cc5ce92ffb6788ab01d2b01 mesa-11.0.9.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92229">Bug 92229</a> - [APITRACE] SOMA have serious graphical errors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93257">Bug 93257</a> - [SKL, bisected] ASTC dEQP tests segfault</li>
</ul>
<h2>Changes</h2>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.8</li>
<li>cherry-ignore: add patch already in branch</li>
<li>cherry-ignore: add the dri3 glx null check patch</li>
<li>i915: correctly parse/set the context flags</li>
<li>egl/dri2: expose srgb configs when KHR_gl_colorspace is available</li>
<li>Update version to 11.0.9</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>r600: fix constant buffer size programming</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion</li>
<li>nv50/ir: float(s32 &amp; 0xff) = float(u8), not s8</li>
<li>nv50,nvc0: make sure there's pushbuf space and that we ref the bo early</li>
<li>nv50,nvc0: fix crash when increasing bsp bo size for h264</li>
<li>nvc0: scale up inter_bo size so that it's 16M for a 4K video</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>ralloc: Fix ralloc_adopt() to the old context's last child's parent.</li>
<li>nvc0: Set winding order regardless of domain.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: don't miss changes to SPI_TMPRING_SIZE</li>
</ul>
<p>Miklós Máté (1):</p>
<ul>
<li>mesa: Don't leak ATIfs instructions in DeleteFragmentShader</li>
</ul>
<p>Neil Roberts (1):</p>
<ul>
<li>i965: Fix crash when calling glViewport with no surface bound</li>
</ul>
<p>Nicolai Hähnle (6):</p>
<ul>
<li>gallium/radeon: only dispose locally created target machine in radeon_llvm_compile</li>
<li>mesa/bufferobj: make _mesa_delete_buffer_object externally accessible</li>
<li>st/mesa: use _mesa_delete_buffer_object</li>
<li>radeon: use _mesa_delete_buffer_object</li>
<li>i915: use _mesa_delete_buffer_object</li>
<li>i965: use _mesa_delete_buffer_object</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: use vpkswss when dst is signed</li>
</ul>
<p>Rob Herring (1):</p>
<ul>
<li>freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled</li>
</ul>
</div>
</body>
</html>

View File

@@ -31,7 +31,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD
b15089817540ba0bffd0aad323ecf3a8ff6779568451827c7274890b4a269d58 mesa-11.1.1.tar.gz
64db074fc514136b5fb3890111f0d50604db52f0b1e94ba3fcb0fe8668a7fd20 mesa-11.1.1.tar.xz
</pre>

182
docs/relnotes/11.1.2.html Normal file
View File

@@ -0,0 +1,182 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.2 Release Notes / February 10, 2016</h1>
<p>
Mesa 11.1.2 is a bug fix release which fixes bugs found since the 11.1.1 release.
</p>
<p>
Mesa 11.1.2 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
ba0e7462b2936b86e6684c26fbb55519f8d9ad31d13a1c1e1afbe41e73466eea mesa-11.1.2.tar.gz
8f72aead896b340ba0f7a4a474bfaf71681f5d675592aec1cb7ba698e319148b mesa-11.1.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93628">Bug 93628</a> - Exception: attempt to use unavailable module DRM when building MesaGL 11.1.0 on windows</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93648">Bug 93648</a> - Random lines being rendered when playing Dolphin (geometry shaders related, w/ apitrace)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93650">Bug 93650</a> - GL_ARB_separate_shader_objects is buggy (PCSX2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93717">Bug 93717</a> - Meta mipmap generation can corrupt texture state</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93722">Bug 93722</a> - Segfault when compiling shader with a subroutine that takes a parameter</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93731">Bug 93731</a> - glUniformSubroutinesuiv segfaults when subroutine uniform is bound to a specific location</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93761">Bug 93761</a> - A conditional discard in a fragment shader causes no depth writing at all</li>
</ul>
<h2>Changes</h2>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/bxt: Fix conservative wm thread counts.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>glsl: fix subroutine lowering reusing actual parmaters</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: add sha256 checksums for 11.1.1</li>
<li>cherry-ignore: drop the i965/kbl .num_slices patch</li>
<li>i915: correctly parse/set the context flags</li>
<li>targets/dri: android: use WHOLE static libraries</li>
<li>egl/dri2: expose srgb configs when KHR_gl_colorspace is available</li>
<li>Update version to 11.1.2</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>vc4: Don't record the seqno of a failed job submit.</li>
<li>vc4: Throttle outstanding rendering after submission.</li>
</ul>
<p>François Tigeot (1):</p>
<ul>
<li>gallium: Add DragonFly support</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>r600g: don't leak driver const buffers</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>meta/blit: Restore GL_DEPTH_STENCIL_TEXTURE_MODE state for GL_TEXTURE_RECTANGLE</li>
<li>meta: Use internal functions to set texture parameters</li>
</ul>
<p>Ilia Mirkin (6):</p>
<ul>
<li>st/mesa: use surface format to generate mipmaps when available</li>
<li>glsl: always compute proper varying type, irrespective of varying packing</li>
<li>nvc0: avoid crashing when there are holes in vertex array bindings</li>
<li>nv50,nvc0: fix buffer clearing to respect engine alignment requirements</li>
<li>nv50/ir: fix false global CSE on instructions with multiple defs</li>
<li>st/mesa: treat a write as a read for range purposes</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>i965/vec4: Use UW type for multiply into accumulator on GEN8+</li>
<li>i965/fs/generator: Take an actual shader stage rather than a string</li>
<li>i965/fs: Always set channel 2 of texture headers in some stages</li>
</ul>
<p>Jose Fonseca (2):</p>
<ul>
<li>scons: Conditionally use DRM module on pipe-loader.</li>
<li>pipe-loader: Fix PATH_MAX define on MSVC.</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nv50/ir: fix memory corruption when spilling and redoing RA</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>glsl: Make bitfield_insert/extract and bfi/bfm non-vectorizable.</li>
<li>glsl: Allow implicit int -&gt; uint conversions for bitwise operators (&amp;, ^, |).</li>
</ul>
<p>Leo Liu (2):</p>
<ul>
<li>vl: add zig zag scan for list 4x4</li>
<li>st/omx/dec/h264: fix corruption when scaling matrix present flag set</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: don't miss changes to SPI_TMPRING_SIZE</li>
</ul>
<p>Nicolai Hähnle (11):</p>
<ul>
<li>mesa/bufferobj: make _mesa_delete_buffer_object externally accessible</li>
<li>st/mesa: use _mesa_delete_buffer_object</li>
<li>radeon: use _mesa_delete_buffer_object</li>
<li>i915: use _mesa_delete_buffer_object</li>
<li>i965: use _mesa_delete_buffer_object</li>
<li>util/u_pstipple.c: copy immediates during transformation</li>
<li>radeonsi: extract the VGT_GS_MODE calculation into its own function</li>
<li>radeonsi: ensure that VGT_GS_MODE is sent when necessary</li>
<li>radeonsi: add DCC buffer for sampler views on new CS</li>
<li>st/mesa: use the correct address generation functions in st_TexSubImage blit</li>
<li>radeonsi: fix discard-only fragment shaders (11.1 version)</li>
</ul>
<p>Timothy Arceri (4):</p>
<ul>
<li>glsl: fix segfault linking subroutine uniform with explicit location</li>
<li>mesa: fix segfault in glUniformSubroutinesuiv()</li>
<li>glsl: fix interface block error message</li>
<li>glsl: create helper to remove outer vertex index array used by some stages</li>
</ul>
</div>
</body>
</html>

319
docs/relnotes/11.1.3.html Normal file
View File

@@ -0,0 +1,319 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.3 Release Notes / April 17, 2016</h1>
<p>
Mesa 11.1.3 is a bug fix release which fixes bugs found since the 11.1.2 release.
</p>
<p>
Mesa 11.1.3 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
9e86c72b6b2e8adb53c1c4a0002ab267b45094d753eb9404b1db34f81ce94ccf mesa-11.1.3.tar.gz
51f6658a214d75e4d9f05207586d7ed56ebba75c6b10841176fb6675efa310ac mesa-11.1.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=27512">Bug 27512</a> - Illegal instruction _mesa_x86_64_transform_points4_general</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93358">Bug 93358</a> - [HSW] Unreal Elemental demo - assertion error in copy_image_with_blitter</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93418">Bug 93418</a> - Geometry Shaders output wrong vertices on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93524">Bug 93524</a> - Clover doesn't build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93667">Bug 93667</a> - Crash in eglCreateImageKHR with huge texture size</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93813">Bug 93813</a> - Incorrect viewport range when GL_CLIP_ORIGIN is GL_UPPER_LEFT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94050">Bug 94050</a> - test_vec4_register_coalesce regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94073">Bug 94073</a> - Miscompilation of abs_vec3_vert_xvary_ref.vert in WebGL conformance</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94088">Bug 94088</a> - [llvmpipe] SIGFPE pthread_barrier_destroy.c:40</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94195">Bug 94195</a> - [llvmpipe] Does not build with LLVM 3.7.x on Windows</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94388">Bug 94388</a> - r600_blit.c:281: r600_decompress_depth_textures: Assertion `tex-&gt;is_depth &amp;&amp; !tex-&gt;is_flushing_texture' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94412">Bug 94412</a> - Trine 3 misrender</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&amp;swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94954">Bug 94954</a> - test_vec4_copy_propagation fails in `make check`</li>
</ul>
<h2>Changes</h2>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Fix assert conditions for src/dst x/y offsets</li>
</ul>
<p>Ben Widawsky (2):</p>
<ul>
<li>i965: Make sure we blit a full compressed block</li>
<li>i965/skl: Add two missing device IDs</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix incorrect viewport position when GL_CLIP_ORIGIN = GL_LOWER_LEFT</li>
</ul>
<p>Chris Forbes (1):</p>
<ul>
<li>i965/blorp: Fix hiz ops on MSAA surfaces</li>
</ul>
<p>Christian König (1):</p>
<ul>
<li>radeon/uvd: disable MPEG1</li>
</ul>
<p>Christian Schmidbauer (1):</p>
<ul>
<li>st/nine: specify WINAPI only for i386 and amd64</li>
</ul>
<p>Daniel Czarnowski (3):</p>
<ul>
<li>egl_dri2: NULL check for xcb_dri2_get_buffers_reply()</li>
<li>egl_dri2: set correct error code if swapbuffers fails</li>
<li>egl: support EGL_LARGEST_PBUFFER in eglCreatePbufferSurface(...)</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/fbobject: propogate Layered when reusing attachments.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/wayland: Try to use wl_surface.damage_buffer for SwapBuffersWithDamage</li>
</ul>
<p>Dongwon Kim (1):</p>
<ul>
<li>egl: move Null check to eglGetSyncAttribKHR to prevent Segfault</li>
</ul>
<p>Emil Velikov (10):</p>
<ul>
<li>docs: add sha256 checksums for 11.1.2</li>
<li>get-pick-list.sh: Require explicit "11.1" for nominating stable patches</li>
<li>cherry-ignore: do not pick nv50/ir commit</li>
<li>automake: add nine to make distcheck</li>
<li>install-gallium-links: port changes from install-lib-links</li>
<li>automake: add more missing options for make distcheck</li>
<li>mesa; add get-extra-pick-list.sh script into bin/</li>
<li>egl/x11: check the return value of xcb_dri2_get_buffers_reply()</li>
<li>nvc/ir: remove duplicate variable declaration</li>
<li>Update version to 11.1.3</li>
</ul>
<p>Francisco Jerez (4):</p>
<ul>
<li>i965: Reupload push and pull constants when we get new shader image unit state.</li>
<li>i965/fs: Add missing analysis invalidation in opt_sampler_eot().</li>
<li>i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().</li>
<li>i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.</li>
</ul>
<p>Ilia Mirkin (21):</p>
<ul>
<li>nvc0/ir: fix converting between predicate and gpr</li>
<li>nvc0: add some missing PUSH_SPACE's</li>
<li>nvc0: avoid negatives in PUSH_SPACE argument</li>
<li>glsl: make sure builtins are initialized before getting the shader</li>
<li>glsl: return cloned signature, not the builtin one</li>
<li>nv50/ir: fix quadop emission in the presence of predication</li>
<li>st/mesa: fix up result_src.type when doing i2u/u2i conversions</li>
<li>meta/copy_image: use precomputed dst_internal_format to avoid segfault</li>
<li>st/mesa: force depth mode to GL_RED for sized depth/stencil formats</li>
<li>glx: update to updated version of EXT_create_context_es2_profile</li>
<li>nv50,nvc0: bump minimum texture buffer offset alignment</li>
<li>nvc0: reset TFB bufctx when we no longer hold a reference to the buffers</li>
<li>glsl: avoid stack smashing when there are too many attributes</li>
<li>nvc0: fix blit triangle size to fully cover FB's &gt; 8192x8192</li>
<li>nv50: reset TFB bufctx when we no longer hold a reference to the buffers</li>
<li>nv50/ir: force-enable derivatives on TXD ops</li>
<li>st/mesa: only minify depth for 3d targets</li>
<li>nv50/ir: fix indirect texturing for non-array textures on nvc0</li>
<li>nvc0/ir: fix picking of coordinates from tex instruction for textureGrad</li>
<li>nvc0: disable primitive restart and index bias during blits</li>
<li>nv50/ir: we can't load local memory directly into an output</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>nir/lower_vec_to_movs: Better report channels handled by insert_mov</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>mesa: Make glGet queries initialize ctx-&gt;Debug when necessary.</li>
<li>mesa: Allow Get*() of several forgotten IsEnabled() pnames.</li>
<li>i965: Only magnify depth for 3D textures, not array textures.</li>
</ul>
<p>Koop Mast (1):</p>
<ul>
<li>st/clover: Add libelf cflags to the build</li>
</ul>
<p>Marc-André Lureau (1):</p>
<ul>
<li>virtio_gpu: Add virtio 1.0 PCI ID to driver map</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>radeonsi: fix Hyper-Z on Stoney</li>
<li>gallium/radeon: don't use temporary buffers for persistent mappings</li>
<li>radeonsi: fix Hyper-Z hangs on P2 configs</li>
</ul>
<p>Matt Turner (3):</p>
<ul>
<li>i965/vec4: don't copy ATTR into 3src instructions with complex swizzles</li>
<li>i965/fs: Don't CSE negated multiplies with saturation.</li>
<li>i965/vec4: Update vec4 unit tests for commit 01dacc83ff.</li>
</ul>
<p>Nanley Chery (2):</p>
<ul>
<li>mesa/image: Make _mesa_clip_readpixels() work with renderbuffers</li>
<li>mesa/readpix: Clip ReadPixels() area to the ReadBuffer's</li>
</ul>
<p>Nicolai Hähnle (2):</p>
<ul>
<li>r600g: clear compressed_depthtex/colortex_mask when binding buffer texture</li>
<li>st/mesa: use the texture view's format for render-to-texture</li>
</ul>
<p>Nishanth Peethambaran (2):</p>
<ul>
<li>st/omx: Remove trailing spaces</li>
<li>st/omx/dec: Correct the timestamping</li>
</ul>
<p>Oded Gabbay (8):</p>
<ul>
<li>gallium/radeon: Correctly translate colorswaps for big endian</li>
<li>llvmpipe: use vpkswss when dst is signed</li>
<li>gallium/radeon: return correct values for BE in r600_translate_colorswap</li>
<li>gallium/radeon: remove separate BE path in r600_translate_colorswap</li>
<li>gallium/r600: Don't let h/w do endian swap for colorformat</li>
<li>gallium/radeon: disable evergreen_do_fast_color_clear for BE</li>
<li>r600g: Do colorformat endian swap for PIPE_USAGE_STAGING</li>
<li>radeonsi: Do colorformat endian swap for PIPE_USAGE_STAGING</li>
</ul>
<p>Olivier Pena (1):</p>
<ul>
<li>scons: support for LLVM 3.7.</li>
</ul>
<p>Patrick Baggett (1):</p>
<ul>
<li>mesa: Use SSE prefetch instructions rather than 3DNow instructions</li>
</ul>
<p>Rob Herring (10):</p>
<ul>
<li>Android: remove dependence on .SECONDEXPANSION</li>
<li>Android: glsl: fix dependence on YACC_HEADER_SUFFIX from build system</li>
<li>Android: add -Wno-date-time flag for clang</li>
<li>Android: remove headers from LOCAL_SRC_FILES</li>
<li>Android: clean-up and fix DRI module path handling</li>
<li>freedreno: drop unnecessary -Wno-packed-bitfield-compat</li>
<li>gallium/radeon: Add space between string literal and identifier</li>
<li>r600: Make enum alu_op_flags unsigned</li>
<li>virtio_gpu: Add PCI ID to driver map</li>
<li>Android: fix x86 gallium builds</li>
</ul>
<p>Roland Scheidegger (2):</p>
<ul>
<li>softpipe: fix anisotropic filtering crash</li>
<li>draw: fix line stippling</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>nvc0: make sure to delete samplers used by compute shaders</li>
</ul>
<p>Steinar H. Gunderson (1):</p>
<ul>
<li>mesa: Fix locking of GLsync objects.</li>
</ul>
<p>Tamil velan (1):</p>
<ul>
<li>radeon/uvd: increase max height to 4096 for VI and newer</li>
</ul>
<p>Thomas Hellstrom (2):</p>
<ul>
<li>winsys/svga: Fix an uninitialized return value</li>
<li>winsys/svga: Increase the fence timeout</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>llvmpipe: Do not use barriers if not using threads.</li>
</ul>
<p>xavier (1):</p>
<ul>
<li>r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.</li>
</ul>
</div>
</body>
</html>

182
docs/relnotes/11.1.4.html Normal file
View File

@@ -0,0 +1,182 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.4 Release Notes / May 9, 2016</h1>
<p>
Mesa 11.1.4 is a bug fix release which fixes bugs found since the 11.1.3 release.
</p>
<p>
Mesa 11.1.4 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
034231fffb22621dadb8e4a968cb44752b8b68db7a2417568d63c275b3490cea mesa-11.1.4.tar.gz
0f781e9072655305f576efd4204d183bf99ac8cb8d9e0dd9fc2b4093230a0eba mesa-11.1.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92850">Bug 92850</a> - Segfault loading War Thunder</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93962">Bug 93962</a> - [HSW, regression, bisected, CTS] ES2-CTS.gtf.GL2FixedTests.scissor.scissor - segfault/asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94955">Bug 94955</a> - Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94994">Bug 94994</a> - OSMesaGetProcAdress always fails on mangled OSMesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95026">Bug 95026</a> - Alien Isolation segfault after initial loading screen/video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95133">Bug 95133</a> - X-COM Enemy Within crashes when entering tactical mission with Bonaire</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>gallium/util: initialize pipe_framebuffer_state to zeros</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>dri: Fix robust context creation via EGL attribute</li>
</ul>
<p>Egbert Eich (1):</p>
<ul>
<li>dri2: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 11.1.3</li>
<li>cherry-ignore: add non-applicable "fix of a fix"</li>
<li>cherry-ignore: ignore st_DrawAtlasBitmaps mem leak fix</li>
<li>cherry-ignore: add CodeEmitterGK110::emitATOM() fix</li>
<li>Update version to 11.1.4</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>vc4: Fix subimage accesses to LT textures.</li>
<li>vc4: Add support for rendering to cube map surfaces.</li>
<li>vc4: Fix tests for format supported with nr_samples == 1.</li>
<li>vc4: Make sure we recompile when sample_mask changes.</li>
</ul>
<p>Frederic Devernay (1):</p>
<ul>
<li>glapi: fix _glapi_get_proc_address() for mangled function names</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functions</li>
<li>i965/tiled_memcpy: Rework the RGBA -&gt; BGRA mem_copy functions</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>egl/x11: authenticate before doing chipset id ioctls</li>
</ul>
<p>Jose Fonseca (1):</p>
<ul>
<li>winsys/sw/xlib: use correct free function for xlib_dt-&gt;data</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/uvd: fix tonga feedback buffer size</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>drirc: add a workaround for blackness in Warsow</li>
<li>st/mesa: fix blit-based GetTexImage for non-finalized textures</li>
</ul>
<p>Nicolai Hähnle (5):</p>
<ul>
<li>radeonsi: fix bounds check in si_create_vertex_elements</li>
<li>gallium/radeon: handle failure when mapping staging buffer</li>
<li>st/glsl_to_tgsi: reduce stack explosion in recursive expression visitor</li>
<li>gallium/radeon: fix crash in r600_set_streamout_targets</li>
<li>radeonsi: correct NULL-pointer check in si_upload_const_buffer</li>
</ul>
<p>Oded Gabbay (4):</p>
<ul>
<li>r600g/radeonsi: send endian info to format translation functions</li>
<li>r600g: set endianess of 16/32-bit buffers according to do_endian_swap</li>
<li>r600g: use do_endian_swap in color swapping functions</li>
<li>r600g: use do_endian_swap in texture swapping function</li>
</ul>
<p>Roland Scheidegger (3):</p>
<ul>
<li>llvmpipe: (trivial) initialize src1_alpha var to NULL</li>
<li>gallivm: fix bogus argument order to lp_build_sample_mipmap function</li>
<li>gallivm: make sampling more robust against bogus coordinates</li>
</ul>
<p>Samuel Pitoiset (5):</p>
<ul>
<li>gk110/ir: make use of IMUL32I for all immediates</li>
<li>nvc0/ir: fix wrong emission of (a OP b) OP c</li>
<li>gk110/ir: add emission for (a OP b) OP c</li>
<li>nvc0: reduce GL_MAX_3D_TEXTURE_SIZE to 2048 on Kepler+</li>
<li>st/glsl_to_tgsi: fix potential crash when allocating temporaries</li>
</ul>
<p>Stefan Dirsch (1):</p>
<ul>
<li>dri3: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Thomas Hindoe Paaboel Andersen (1):</p>
<ul>
<li>st/va: avoid dereference after free in vlVaDestroyImage</li>
</ul>
<p>WuZhen (3):</p>
<ul>
<li>tgsi: initialize stack allocated struct</li>
<li>winsys/sw/dri: use correct free function for dri_sw_dt-&gt;data</li>
<li>android: enable dlopen() on all architectures</li>
</ul>
</div>
</body>
</html>

296
docs/relnotes/11.2.0.html Normal file
View File

@@ -0,0 +1,296 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.2.0 Release Notes / 4 April 2016</h1>
<p>
Mesa 11.2.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.2.1.
</p>
<p>
Mesa 11.2.0 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
dea3d8143929aad5c24ef0993ddb05807b30c284b488fc62903adfcc1c127887 mesa-11.2.0.tar.gz
1c1fed2674abf3f16ed2623e9a5694d6752c293194e18462ebc644a19cfaafb2 mesa-11.2.0.tar.xz
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_arrays_of_arrays on all gallium drivers that provide GLSL 1.30</li>
<li>GL_ARB_base_instance on freedreno/a4xx</li>
<li>GL_ARB_compute_shader on i965</li>
<li>GL_ARB_copy_image on r600</li>
<li>GL_ARB_indirect_parameters on nvc0</li>
<li>GL_ARB_query_buffer_object on nvc0</li>
<li>GL_ARB_shader_atomic_counters on nvc0</li>
<li>GL_ARB_shader_draw_parameters on i965, nvc0</li>
<li>GL_ARB_shader_storage_buffer_object on nvc0</li>
<li>GL_ARB_tessellation_shader on i965 and r600 (evergreen/cayman only)</li>
<li>GL_ARB_texture_buffer_object_rgb32 on freedreno/a4xx</li>
<li>GL_ARB_texture_buffer_range on freedreno/a4xx</li>
<li>GL_ARB_texture_query_lod on freedreno/a4xx</li>
<li>GL_ARB_texture_rgb10_a2ui on freedreno/a4xx</li>
<li>GL_ARB_texture_view on freedreno/a4xx</li>
<li>GL_ARB_vertex_type_10f_11f_11f_rev on freedreno/a4xx</li>
<li>GL_KHR_texture_compression_astc_ldr on freedreno/a4xx</li>
<li>GL_AMD_performance_monitor on radeonsi (CIK+ only)</li>
<li>GL_ATI_meminfo on r600, radeonsi</li>
<li>GL_NVX_gpu_memory_info on r600, radeonsi</li>
<li>New OSMesaCreateContextAttribs() function (for creating core profile
contexts)</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=27512">Bug 27512</a> - Illegal instruction _mesa_x86_64_transform_points4_general</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=75165">Bug 75165</a> - compute.c:464:49: error: function definition is not allowed here</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors &quot;work&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89330">Bug 89330</a> - piglit glsl-1.50 invariant-qualifier-in-out-block-01 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89969">Bug 89969</a> - nouveau: add support for chunk decoding in order to support vaapi (st/va)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91806">Bug 91806</a> - configure does not test whether assembler supports sse4.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail with kernel upgrade</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92229">Bug 92229</a> - [APITRACE] SOMA have serious graphical errors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92233">Bug 92233</a> - Unigine Heaven 4.0 silhuette run</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92589">Bug 92589</a> - [BDW BSW SKL CTS] ES31-CTS.texture_gather.* GPU_HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92595">Bug 92595</a> - [HSW,BDW,SKL][GLES 3.1 CTS] Big difference in the results for the ES31-CTS.shader_bitfield_operation.* tests</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92609">Bug 92609</a> - [BDW, BSW] piglit sampling-2d-array-as-2d-layer fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92687">Bug 92687</a> - Add support for ARB_internalformat_query2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92706">Bug 92706</a> - glBlitFramebuffer refuses to blit RGBA to RGB with MSAA</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - &quot;LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main&quot; when starting race in stuntrally</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92743">Bug 92743</a> - Centroid shouldn't have to match between the FS and the VS</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92759">Bug 92759</a> - [Regression, bisected] Visuals without alpha bits are not sRGB-capable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of pipe_virgl_create_screen follows non-static declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93092">Bug 93092</a> - lp_test_format regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93189">Bug 93189</a> - &quot;./util/u_inlines.h&quot;, line 83: operands have incompatible types: void &quot;:&quot; int</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93257">Bug 93257</a> - [SKL, bisected] ASTC dEQP tests segfault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93264">Bug 93264</a> - Tonga VM Faults since llvm ScheduleDAGInstrs: Rework schedule graph builder.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93300">Bug 93300</a> - Two Worlds 2 renders water incorrectly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93312">Bug 93312</a> - [SKL][GLES 3.1 CTS] ES31-CTS.layout_binding* GPU_HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93320">Bug 93320</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93322">Bug 93322</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.resource-ubo fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93323">Bug 93323</a> - [HSW,BDW,SKL][GLES 3.1 CTS]ES31-CTS.shader_image_load_store.basic-allTargets-store-fs fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93325">Bug 93325</a> - [HSW,BDW,SKL]ES31-CTS.explicit_uniform_location.uniform-loc-* 2 tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93339">Bug 93339</a> - glLinkProgram() should fail when a varying is never written to in a previous stage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93348">Bug 93348</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.* segfault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93358">Bug 93358</a> - [HSW] Unreal Elemental demo - assertion error in copy_image_with_blitter</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93387">Bug 93387</a> - inverse() shouldnt be exposed in GLSL 1.20 and 1.30</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93388">Bug 93388</a> - [i965, regression, bisection] MESA_FORMAT_B8G8R8X8_SRGB changes break kwin</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93407">Bug 93407</a> - [SKL][GLES 3.1 CTS]ES31-CTS.compute_shader.resources-texture fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93410">Bug 93410</a> - [BDW,SKL][GLES 3.1 CTS]ES31-CTS.shader_image_load_store.negative-linkErrors fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93418">Bug 93418</a> - Geometry Shaders output wrong vertices on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93426">Bug 93426</a> - [SKL,BDW,BSW,BXT] CTS regression: es2-cts.gtf.gl2fixedtests.buffer_objects.buffer_object,s</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93524">Bug 93524</a> - Clover doesn't build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93526">Bug 93526</a> - GfxBench 4 tessellation demos misrender</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93532">Bug 93532</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.*. Regression, bisected.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93540">Bug 93540</a> - [BISECTED, HSW] Rendering issue in Heaven (and other benchmarks)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93560">Bug 93560</a> - opt_combine_constants failing fabsf(reg-&gt;f) == table.imm[i].val assertion</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93599">Bug 93599</a> - Strange green flashes with &quot;Metro: Last Light Redux&quot; + &quot;Metro 2033 Redux&quot; with Intel Mesa driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93648">Bug 93648</a> - Random lines being rendered when playing Dolphin (geometry shaders related, w/ apitrace)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93650">Bug 93650</a> - GL_ARB_separate_shader_objects is buggy (PCSX2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93667">Bug 93667</a> - Crash in eglCreateImageKHR with huge texture size</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93696">Bug 93696</a> - [HSW,BDW;SKL][GLES 3.1 CTS]ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93700">Bug 93700</a> - [SKL, regression] deqp-gles2.functional.texture.completeness</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93717">Bug 93717</a> - Meta mipmap generation can corrupt texture state</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93722">Bug 93722</a> - Segfault when compiling shader with a subroutine that takes a parameter</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93725">Bug 93725</a> - [HSW, regression, bisected] ES31-CTS.texture_gather.*depth*</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93731">Bug 93731</a> - glUniformSubroutinesuiv segfaults when subroutine uniform is bound to a specific location</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93761">Bug 93761</a> - A conditional discard in a fragment shader causes no depth writing at all</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93790">Bug 93790</a> - [HSW] Use after free with compute programs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93792">Bug 93792</a> - [HSW] intel_mipmap_tree.c:1325: intel_miptree_copy_slice: Assertion `src_mt-&gt;format == dst_mt-&gt;format</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93813">Bug 93813</a> - Incorrect viewport range when GL_CLIP_ORIGIN is GL_UPPER_LEFT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93840">Bug 93840</a> - [i965] Alien: Isolation fails with GL_ARB_compute_shader enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93862">Bug 93862</a> - [Bisected] &quot;drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2&quot; is bad</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93878">Bug 93878</a> - [llvmpipe][softpipe] piglit arb_gpu_shader_fp64-double-gettransformfeedbackvarying regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93957">Bug 93957</a> - [HSW] Mishandling of sample count when using an attachment-less framebuffer (assertion error)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93961">Bug 93961</a> - virgl build failure after 2016-02-01 changes - no previous prototype for 'virgl_drm_winsys_create'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93962">Bug 93962</a> - [HSW, regression, bisected, CTS] ES2-CTS.gtf.GL2FixedTests.scissor.scissor - segfault/asserts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93989">Bug 93989</a> - build: flex-2.5.39 seems to be failing for glsl_lexer.ll</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94016">Bug 94016</a> - make check MesaExtensionsTest.AlphabeticallySorted regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94019">Bug 94019</a> - [bisected] 3D acceleration broken with gallium/radeon: just get num_tile_pipes from the winsys</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94050">Bug 94050</a> - test_vec4_register_coalesce regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94073">Bug 94073</a> - Miscompilation of abs_vec3_vert_xvary_ref.vert in WebGL conformance</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94081">Bug 94081</a> - [HSW] compute shader shared var + atomic op = fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94088">Bug 94088</a> - [llvmpipe] SIGFPE pthread_barrier_destroy.c:40</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94091">Bug 94091</a> - Tonga unreal elemental segfault since radeonsi: put image, fmask, and sampler descriptors into one array</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94100">Bug 94100</a> - [HSW] compute indirect dispatch with 0 work groups causes gpu hang</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94134">Bug 94134</a> - [regression] piglit.spec.arb_texture_view.sampling-2d-array-as-2d-layer assertion</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94139">Bug 94139</a> - [regression, HSW, IVB] piglit.spec.arb_compute_shader.minmax</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94150">Bug 94150</a> - UE4 Suntemple rendering errors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94186">Bug 94186</a> - Crash when launching glxinfo and World of Warcraft with RV790</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94188">Bug 94188</a> - define (or undef) defined behaves stupidly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94199">Bug 94199</a> - Shader abort/crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94253">Bug 94253</a> - [llvmpipe] piglit gl-1.0-swapbuffers-behavior regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94254">Bug 94254</a> - [llvmpipe] [softpipe] piglit read-front regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94257">Bug 94257</a> - [softpipe] piglit glx-copy-sub-buffer regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94274">Bug 94274</a> - [swrast] piglit arb_occlusion_query2-render regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94284">Bug 94284</a> - [radeonsi] outlast segfault on start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94388">Bug 94388</a> - r600_blit.c:281: r600_decompress_depth_textures: Assertion `tex-&gt;is_depth &amp;&amp; !tex-&gt;is_flushing_texture' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94412">Bug 94412</a> - Trine 3 misrender</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94524">Bug 94524</a> - Wrong gl_TessLevelOuter interpretation for isolines</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&amp;swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
</ul>
<h2>Changes</h2>
Microsoft Visual Studio 2013 or later is now required for building
on Windows.
Previously, Visual Studio 2008 and later were supported.
</div>
</body>
</html>

119
docs/relnotes/11.2.1.html Normal file
View File

@@ -0,0 +1,119 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.2.1 Release Notes / April 17, 2016</h1>
<p>
Mesa 11.2.1 is a bug fix release which fixes bugs found since the 11.2.0 release.
</p>
<p>
Mesa 11.2.1 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
cc2a024204564a71acc95cf262bf618fe49b1d77d351e5755eea705cadac5167 mesa-11.2.1.tar.gz
a65207e9ae5c5f1c29f863c6a2cc98a7ab99762a24b82a248337f0ea9cfce01b mesa-11.2.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93962">Bug 93962</a> - [HSW, regression, bisected, CTS] ES2-CTS.gtf.GL2FixedTests.scissor.scissor - segfault/asserts</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (2):</p>
<ul>
<li>st/mesa: fix glReadBuffer() assertion failure</li>
<li>st/mesa: fix memleak in glDrawPixels cache code</li>
</ul>
<p>Christian Schmidbauer (1):</p>
<ul>
<li>st/nine: specify WINAPI only for i386 and amd64</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 11.2.0</li>
<li>configure.ac: update the path of the generated files</li>
<li>Update version to 11.2.1</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>glsl: allow usage of the keyword buffer before GLSL 430 / ESSL 310</li>
</ul>
<p>Iurie Salomov (1):</p>
<ul>
<li>va: check null context in vlVaDestroyContext</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functions</li>
<li>i965/tiled_memcpy: Rework the RGBA -&gt; BGRA mem_copy functions</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>i965: Fix textureSize() depth value for 1 layer surfaces on Gen4-6.</li>
<li>i965: Use brw-&gt;urb.min_vs_urb_entries instead of 32 for BLORP.</li>
<li>glsl: Lower variable indexing of system value arrays unconditionally.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>drirc: add a workaround for blackness in Warsow</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: fix bounds check in si_create_vertex_elements</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>nv50/ir: do not try to attach JOIN ops to ATOM</li>
</ul>
<p>Thomas Hindoe Paaboel Andersen (1):</p>
<ul>
<li>st/va: avoid dereference after free in vlVaDestroyImage</li>
</ul>
</div>
</body>
</html>

210
docs/relnotes/11.2.2.html Normal file
View File

@@ -0,0 +1,210 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.2.2 Release Notes / May 9, 2016</h1>
<p>
Mesa 11.2.2 is a bug fix release which fixes bugs found since the 11.2.1 release.
</p>
<p>
Mesa 11.2.2 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
e2453014cd2cc5337a5180cdeffe8cf24fffbb83e20a96888e2b01df868eaae6 mesa-11.2.2.tar.gz
40e148812388ec7c6d7b6657d5a16e2e8dabba8b97ddfceea5197947647bdfb4 mesa-11.2.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92850">Bug 92850</a> - Segfault loading War Thunder</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93767">Bug 93767</a> - Glitches with soft shadows and MSAA in Knights of the Old Republic 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94955">Bug 94955</a> - Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94994">Bug 94994</a> - OSMesaGetProcAdress always fails on mangled OSMesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95026">Bug 95026</a> - Alien Isolation segfault after initial loading screen/video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95133">Bug 95133</a> - X-COM Enemy Within crashes when entering tactical mission with Bonaire</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95164">Bug 95164</a> - GLSL compiler (linker I think) emits assertion upon call to glAttachShader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95251">Bug 95251</a> - vdpau decoder capabilities: not supported</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/uvd: alignment fix for decode message buffer</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>st/mesa: fix sampler view leak in st_DrawAtlasBitmaps()</li>
<li>gallium/util: initialize pipe_framebuffer_state to zeros</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>dri: Fix robust context creation via EGL attribute</li>
</ul>
<p>Egbert Eich (1):</p>
<ul>
<li>dri2: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 11.2.1</li>
<li>docs: update the sha256 checksums for 11.2.1</li>
<li>cherry-ignore: remove duplicate commit</li>
<li>cherry-ignore: ignore the GetSamplerParameterIuiv{EXT,OES} fixups</li>
<li>Update version to 11.2.2</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>vc4: Fix subimage accesses to LT textures.</li>
<li>vc4: Add support for rendering to cube map surfaces.</li>
<li>vc4: Fix tests for format supported with nr_samples == 1.</li>
<li>vc4: Make sure we recompile when sample_mask changes.</li>
</ul>
<p>Frederic Devernay (1):</p>
<ul>
<li>glapi: fix _glapi_get_proc_address() for mangled function names</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nvc0: fix retrieving query results into buffer for timestamps</li>
<li>nouveau/video: properly detect the decoder class for availability checks</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Properly report regs_written from SAMPLEINFO</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>egl/x11: authenticate before doing chipset id ioctls</li>
</ul>
<p>Jose Fonseca (1):</p>
<ul>
<li>winsys/sw/xlib: use correct free function for xlib_dt-&gt;data</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>i965: Fix clear code for ignoring colormask for XRGB formats on Gen9+.</li>
<li>glsl: Convert lower_vec_index_to_swizzle to a rvalue visitor.</li>
<li>glsl: Lower vector_extracts to swizzles after lower_vector_derefs.</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/uvd: fix tonga feedback buffer size</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: fix blit-based GetTexImage for non-finalized textures</li>
</ul>
<p>Nicolai Hähnle (5):</p>
<ul>
<li>gallium/radeon: handle failure when mapping staging buffer</li>
<li>st/glsl_to_tgsi: reduce stack explosion in recursive expression visitor</li>
<li>gallium/radeon: fix crash in r600_set_streamout_targets</li>
<li>radeonsi: correct NULL-pointer check in si_upload_const_buffer</li>
<li>radeonsi: work around an MSAA fast stencil clear problem</li>
</ul>
<p>Oded Gabbay (4):</p>
<ul>
<li>r600g/radeonsi: send endian info to format translation functions</li>
<li>r600g: set endianess of 16/32-bit buffers according to do_endian_swap</li>
<li>r600g: use do_endian_swap in color swapping functions</li>
<li>r600g: use do_endian_swap in texture swapping function</li>
</ul>
<p>Patrick Rudolph (1):</p>
<ul>
<li>r600g: fix and optimize tgsi_cmp when using ABS and NEG modifier</li>
</ul>
<p>Roland Scheidegger (3):</p>
<ul>
<li>llvmpipe: (trivial) initialize src1_alpha var to NULL</li>
<li>gallivm: fix bogus argument order to lp_build_sample_mipmap function</li>
<li>gallivm: make sampling more robust against bogus coordinates</li>
</ul>
<p>Samuel Pitoiset (6):</p>
<ul>
<li>gk110/ir: do not overwrite def value with zero for EXCH ops</li>
<li>gk110/ir: make use of IMUL32I for all immediates</li>
<li>nvc0/ir: fix wrong emission of (a OP b) OP c</li>
<li>gk110/ir: add emission for (a OP b) OP c</li>
<li>nvc0: reduce GL_MAX_3D_TEXTURE_SIZE to 2048 on Kepler+</li>
<li>st/glsl_to_tgsi: fix potential crash when allocating temporaries</li>
</ul>
<p>Stefan Dirsch (1):</p>
<ul>
<li>dri3: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Topi Pohjolainen (2):</p>
<ul>
<li>i965/blorp/gen7: Prepare re-using for gen8</li>
<li>i965/blorp: Use 8k chunk size for urb allocation</li>
</ul>
<p>WuZhen (3):</p>
<ul>
<li>tgsi: initialize stack allocated struct</li>
<li>winsys/sw/dri: use correct free function for dri_sw_dt-&gt;data</li>
<li>android: enable dlopen() on all architectures</li>
</ul>
</div>
</body>
</html>

89
docs/relnotes/11.3.0.html Normal file
View File

@@ -0,0 +1,89 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.3.0 Release Notes / TBD</h1>
<p>
Mesa 11.3.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.3.1.
</p>
<p>
Mesa 11.3.0 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 4.3 on nvc0, radeonsi, i965 (Gen8+)</li>
<li>OpenGL ES 3.1 on nvc0, radeonsi</li>
<li>GL_ARB_ES3_1_compatibility on nvc0, radeonsi</li>
<li>GL_ARB_compute_shader on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_cull_distance on i965/gen6+, nv50, nvc0, llvmpipe, softpipe</li>
<li>GL_ARB_framebuffer_no_attachments on nvc0, r600, radeonsi, softpipe</li>
<li>GL_ARB_internalformat_query2 on all drivers</li>
<li>GL_ARB_query_buffer_object on i965/hsw+</li>
<li>GL_ARB_robust_buffer_access_behavior on i965, nvc0, radeonsi</li>
<li>GL_ARB_shader_atomic_counters on radeonsi, softpipe</li>
<li>GL_ARB_shader_atomic_counter_ops on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_load_store on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_image_size on nvc0, radeonsi, softpipe</li>
<li>GL_ARB_shader_storage_buffer_objects on radeonsi, softpipe</li>
<li>GL_ATI_fragment_shader on all Gallium drivers</li>
<li>GL_EXT_base_instance on all drivers that support GL_ARB_base_instance</li>
<li>GL_EXT_clip_cull_distance on all drivers that support GL_ARB_cull_distance</li>
<li>GL_KHR_robustness on i965</li>
<li>GL_OES_copy_image on i965 (Baytrail and Gen8+)</li>
<li>GL_OES_draw_buffers_indexed and GL_EXT_draw_buffers_indexed on all drivers that support GL_ARB_draw_buffers_blend</li>
<li>GL_OES_gpu_shader5 and GL_EXT_gpu_shader5 on all drivers that support GL_ARB_gpu_shader5</li>
<li>GL_OES_sample_shading on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_sample_variables on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_shader_image_atomic on all drivers that support GL_ARB_shader_image_load_store</li>
<li>GL_OES_shader_io_blocks on i965, nvc0, radeonsi</li>
<li>GL_OES_shader_multisample_interpolation on i965, nvc0, r600, radeonsi</li>
<li>GL_OES_texture_border_clamp and GL_EXT_texture_border_clamp on all drivers that support GL_ARB_texture_border_clamp</li>
<li>GL_OES_texture_buffer and GL_EXT_texture_buffer on i965, nvc0, radeonsi</li>
<li>EGL_KHR_reusable_sync on all drivers</li>
<li>GL_ARB_stencil_texture8 and GL_OES_stencil_texture8 on i965/gen8+</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

View File

@@ -68,21 +68,39 @@ To get the Mesa sources anonymously (read-only):
<h2 id="developer">Developer git Access</h2>
<p>
Mesa developers need to first have an account on
<a href="http://www.freedesktop.org">freedesktop.org</a>.
To get an account, please ask Brian or the other Mesa developers for
permission.
Then, if there are no objections, follow this
<a href="http://www.freedesktop.org/wiki/AccountRequests">
procedure</a>.
If you wish to become a Mesa developer with git-write privilege, please
follow this procedure:
</p>
<ol>
<li>Subscribe to the
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev</a>
mailing list.
<li>Start contributing to the project by posting patches / review requests to
the mesa-dev list. Specifically,
<ul>
<li>Use <code>git send-mail</code> to post your patches to mesa-dev.
<li>Wait for someone to review the code and give you a <code>Reviewed-by</code>
statement.
<li>You'll have to rely on another Mesa developer to push your initial patches
after they've been reviewed.
</ul>
<li>After you've demonstrated the ability to write good code and have had
a dozen or so patches accepted you can apply for an account.
<li>Occasionally, but rarely, someone may be given a git account sooner, but
only if they're being supervised by another Mesa developer at the same
organization and planning to work in a limited area of the code or on a
separate branch.
<li>To apply for an account, follow
<a href="http://www.freedesktop.org/wiki/AccountRequests">these directions</a>.
It's also appreciated if you briefly describe what you intend to do (work
on a particular driver, add a new extension, etc.) in the bugzilla record.
</ol>
<p>
Once your account is established:
</p>
<ol>
<li>Install the git software on your computer if needed.<br><br>
<li>Get an initial, local copy of the repository with:
<pre>
git clone git+ssh://username@git.freedesktop.org/git/mesa/mesa

View File

@@ -209,51 +209,6 @@ The final vertex and fragment programs may be interpreted in software
(see drivers/dri/i915/i915_fragprog.c for example).
</p>
<h3>Code Generation Options</h3>
<p>
Internally, there are several options that control the compiler's code
generation and instruction selection.
These options are seen in the gl_shader_state struct and may be set
by the device driver to indicate its preferences:
<pre>
struct gl_shader_state
{
...
/** Driver-selectable options: */
GLboolean EmitHighLevelInstructions;
GLboolean EmitCondCodes;
GLboolean EmitComments;
};
</pre>
<dl>
<dt>EmitHighLevelInstructions</dt>
<dd>
This option controls instruction selection for loops and conditionals.
If the option is set high-level IF/ELSE/ENDIF, LOOP/ENDLOOP, CONT/BRK
instructions will be emitted.
Otherwise, those constructs will be implemented with BRA instructions.
</dd>
<dt>EmitCondCodes</dt>
<dd>
If set, condition codes (ala GL_NV_fragment_program) will be used for
branching and looping.
Otherwise, ordinary registers will be used (the IF instruction will
examine the first operand's X component and do the if-part if non-zero).
This option is only relevant if EmitHighLevelInstructions is set.
</dd>
<dt>EmitComments</dt>
<dd>
If set, instructions will be annotated with comments to help with debugging.
Extra NOP instructions will also be inserted.
</dd>
</dl>
<h2 id="validation">Compiler Validation</h2>
<p>

View File

@@ -34,8 +34,7 @@ Hardware drivers include:
</p>
<ul>
<li>Intel i965, i945, i915.
See <a href="http://intellinuxgraphics.org/index.html">
Intel's website</a></li>
See <a href="https://01.org/linuxgraphics">Intel's website</a></li>
<li>AMD Radeon series.
See <a href="http://www.x.org/wiki/RadeonFeature">RadeonFeature</a></li>
<li>NVIDIA GPUs.

View File

@@ -42,9 +42,7 @@ Tungsten Graphics, Inc. have supported the ongoing development of Mesa.
<li>The
<a href="http://www.mesa3d.org">Mesa</a>
website is hosted by
<a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=3&amp;type=1"
width="88" height="31" align="bottom" alt="Sourceforge.net" border="0"></a>
<a href="http://sourceforge.net">sourceforge.net</a>.
<br>
<br>

View File

@@ -31,7 +31,7 @@
<dd>is a very useful tool for tracking down
memory-related problems in your code.</dd>
<dt><a href="http:scan.coverity.com/projects/mesa">Coverity</a><dt>
<dt><a href="http://scan.coverity.com/projects/mesa">Coverity</a><dt>
<dd>provides static code analysis of Mesa. If you create an account
you can see the results and try to fix outstanding issues.</dd>
</dl>

8
doxygen/.gitignore vendored
View File

@@ -1,9 +1,6 @@
*.db
*.tag
*.tmp
agpgart
array_cache
core
core_subset
gallium
gbm
@@ -13,11 +10,8 @@ i965
main
math
math_subset
miniglx
radeondrm
radeonfb
nir
radeon_subset
shader
swrast
swrast_setup
tnl

View File

@@ -12,20 +12,21 @@ FULL = \
vbo.doxy \
glapi.doxy \
glsl.doxy \
shader.doxy \
swrast.doxy \
swrast_setup.doxy \
tnl.doxy \
tnl_dd.doxy \
gbm.doxy \
i965.doxy
i965.doxy \
nir.doxy
full: $(FULL:.doxy=.tag)
$(foreach FILE,$(FULL),doxygen $(FILE);)
SUBSET = \
main.doxy \
math.doxy
math.doxy \
gallium.doxy
subset: $(SUBSET:.doxy=.tag)
$(foreach FILE,$(SUBSET),doxygen $(FILE);)

View File

@@ -53,16 +53,6 @@ CREATE_SUBDIRS = NO
OUTPUT_LANGUAGE = English
# This tag can be used to specify the encoding used in the generated output.
# The encoding is not always determined by the language that is chosen,
# but also whether or not the output is meant for Windows or non-Windows users.
# In case there is a difference, setting the USE_WINDOWS_ENCODING tag to YES
# forces the Windows encoding (this is the default for the Windows binary),
# whereas setting the tag to NO uses a Unix-style encoding (the default for
# all platforms other than Windows).
USE_WINDOWS_ENCODING = NO
# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will
# include brief member descriptions after the members that are listed in
# the file and class documentation (similar to JavaDoc).
@@ -147,13 +137,6 @@ JAVADOC_AUTOBRIEF = YES
MULTILINE_CPP_IS_BRIEF = NO
# If the DETAILS_AT_TOP tag is set to YES then Doxygen
# will output the detailed description near the top, like JavaDoc.
# If set to NO, the detailed description appears after the member
# documentation.
DETAILS_AT_TOP = YES
# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented
# member inherits the documentation from any documented member that it
# re-implements.
@@ -607,12 +590,6 @@ HTML_FOOTER =
HTML_STYLESHEET =
# If the HTML_ALIGN_MEMBERS tag is set to YES, the members of classes,
# files or namespaces will be aligned in HTML using tables. If set to
# NO a bullet list will be used.
HTML_ALIGN_MEMBERS = YES
# If the GENERATE_HTMLHELP tag is set to YES, additional index files
# will be generated that can be used as input for tools like the
# Microsoft HTML help workshop to generate a compressed HTML help file (.chm)
@@ -839,18 +816,6 @@ GENERATE_XML = NO
XML_OUTPUT = xml
# The XML_SCHEMA tag can be used to specify an XML schema,
# which can be used by a validating XML parser to check the
# syntax of the XML files.
XML_SCHEMA =
# The XML_DTD tag can be used to specify an XML DTD,
# which can be used by a validating XML parser to check the
# syntax of the XML files.
XML_DTD =
# If the XML_PROGRAMLISTING tag is set to YES Doxygen will
# dump the program listings (including syntax highlighting
# and cross-referencing information) to the XML output. Note that
@@ -1104,22 +1069,6 @@ DOT_PATH =
DOTFILE_DIRS =
# The MAX_DOT_GRAPH_WIDTH tag can be used to set the maximum allowed width
# (in pixels) of the graphs generated by dot. If a graph becomes larger than
# this value, doxygen will try to truncate the graph, so that it fits within
# the specified constraint. Beware that most browsers cannot cope with very
# large images.
MAX_DOT_GRAPH_WIDTH = 1024
# The MAX_DOT_GRAPH_HEIGHT tag can be used to set the maximum allows height
# (in pixels) of the graphs generated by dot. If a graph becomes larger than
# this value, doxygen will try to truncate the graph, so that it fits within
# the specified constraint. Beware that most browsers cannot cope with very
# large images.
MAX_DOT_GRAPH_HEIGHT = 1024
# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the
# graphs generated by dot. A depth value of 3 means that only nodes reachable
# from the root by following a path via at most 3 edges will be shown. Nodes that

View File

@@ -190,8 +190,7 @@ SKIP_FUNCTION_MACROS = YES
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = \
math_subset.tag=../math_subset \
miniglx.tag=../miniglx
math_subset.tag=../math_subset
GENERATE_TAGFILE = core_subset.tag
ALLEXTERNALS = NO
PERL_PATH =

View File

@@ -6,7 +6,9 @@ doxygen swrast_setup.doxy
doxygen tnl.doxy
doxygen core.doxy
doxygen glapi.doxy
doxygen shader.doxy
doxygen glsl.doxy
doxygen nir.doxy
doxygen i965.doxy
echo Building again, to resolve tags
doxygen tnl_dd.doxy
@@ -15,5 +17,8 @@ doxygen math.doxy
doxygen swrast.doxy
doxygen swrast_setup.doxy
doxygen tnl.doxy
doxygen core.doxy
doxygen glapi.doxy
doxygen shader.doxy
doxygen glsl.doxy
doxygen nir.doxy
doxygen i965.doxy

View File

@@ -39,10 +39,10 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = main.tag=../main \
math.tag=../math \
tnl_dd.tag=../tnl_dd \
swrast_setup.tag=../gbm_setup \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
vbo.tag=vbo
vbo.tag=../vbo
GENERATE_TAGFILE = gbm.tag

View File

@@ -9,7 +9,7 @@ PROJECT_NAME = "Mesa GL API dispatcher"
#---------------------------------------------------------------------------
# configuration options related to the input files
#---------------------------------------------------------------------------
INPUT = ../src/mesa/glapi/
INPUT = ../src/mapi/glapi/
FILE_PATTERNS = *.c *.h
RECURSIVE = NO
EXCLUDE =
@@ -39,11 +39,11 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = main.tag=../main \
math.tag=../math \
tnl_dd.tag=../tnl_dd \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
vbo.tag=vbo
GENERATE_TAGFILE = swrast.tag
vbo.tag=../vbo
GENERATE_TAGFILE = glapi.tag

View File

@@ -9,11 +9,12 @@ PROJECT_NAME = "Mesa GLSL module"
#---------------------------------------------------------------------------
# configuration options related to the input files
#---------------------------------------------------------------------------
INPUT = ../src/glsl/
INPUT = ../src/compiler/glsl/
FILE_PATTERNS = *.c *.cpp *.h
RECURSIVE = NO
EXCLUDE = ../src/glsl/glsl_lexer.cpp \
../src/glsl/glsl_parser.cpp \
../src/glsl/glsl_parser.h
EXCLUDE = ../src/compiler/glsl/glsl_lexer.cpp \
../src/compiler/glsl/glsl_parser.cpp \
../src/compiler/glsl/glsl_parser.h
EXCLUDE_PATTERNS =
#---------------------------------------------------------------------------
# configuration options related to the HTML output

View File

@@ -8,9 +8,9 @@
<a class="qindex" href="../main/index.html">core</a> |
<a class="qindex" href="../glapi/index.html">glapi</a> |
<a class="qindex" href="../glsl/index.html">glsl</a> |
<a class="qindex" href="../nir/index.html">nir</a> |
<a class="qindex" href="../vbo/index.html">vbo</a> |
<a class="qindex" href="../math/index.html">math</a> |
<a class="qindex" href="../shader/index.html">shader</a> |
<a class="qindex" href="../swrast/index.html">swrast</a> |
<a class="qindex" href="../swrast_setup/index.html">swrast_setup</a> |
<a class="qindex" href="../tnl/index.html">tnl</a> |

View File

@@ -6,6 +6,5 @@
<div class="qindex">
<a class="qindex" href="../core_subset/index.html">Mesa Core</a> |
<a class="qindex" href="../math_subset/index.html">math</a> |
<a class="qindex" href="../miniglx/index.html">MiniGLX</a> |
<a class="qindex" href="../radeon_subset/index.html">radeon_subset</a>
</div>

View File

@@ -46,5 +46,5 @@ TAGFILES = glsl.tag=../glsl \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
tnl_dd.tag=../tnl_dd \
vbo.tag=vbo
vbo.tag=../vbo
GENERATE_TAGFILE = i965.tag

View File

@@ -43,7 +43,6 @@ TAGFILES = tnl_dd.tag=../tnl_dd \
vbo.tag=../vbo \
glapi.tag=../glapi \
math.tag=../math \
shader.tag=../shader \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl

View File

@@ -41,7 +41,7 @@ SKIP_FUNCTION_MACROS = YES
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = tnl_dd.tag=../tnl_dd \
main.tag=../core \
main.tag=../main \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \

View File

@@ -5,45 +5,46 @@
#---------------------------------------------------------------------------
# General configuration options
#---------------------------------------------------------------------------
PROJECT_NAME = "Mesa Vertex and Fragment Program code"
PROJECT_NAME = "Mesa NIR module"
#---------------------------------------------------------------------------
# configuration options related to the input files
# Configuration options related to the input files
#---------------------------------------------------------------------------
INPUT = ../src/mesa/shader/
FILE_PATTERNS = *.c *.h
INPUT = ../src/compiler/nir
FILE_PATTERNS = *.c *.cpp *.h
RECURSIVE = NO
EXCLUDE =
EXCLUDE_PATTERNS =
EXAMPLE_PATH =
EXAMPLE_PATTERNS =
EXCLUDE =
EXCLUDE_PATTERNS =
EXAMPLE_PATH =
EXAMPLE_PATTERNS =
EXAMPLE_RECURSIVE = NO
IMAGE_PATH =
INPUT_FILTER =
IMAGE_PATH =
INPUT_FILTER =
FILTER_SOURCE_FILES = NO
#---------------------------------------------------------------------------
# configuration options related to the HTML output
# Configuration options related to the HTML output
#---------------------------------------------------------------------------
HTML_OUTPUT = shader
HTML_OUTPUT = nir
#---------------------------------------------------------------------------
# Configuration options related to the preprocessor
# Configuration options related to the preprocessor
#---------------------------------------------------------------------------
ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = NO
EXPAND_ONLY_PREDEF = NO
SEARCH_INCLUDES = YES
INCLUDE_PATH = ../include/
INCLUDE_FILE_PATTERNS =
PREDEFINED =
EXPAND_AS_DEFINED =
INCLUDE_FILE_PATTERNS =
PREDEFINED =
EXPAND_AS_DEFINED =
SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
# Configuration::additions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = glsl.tag=../glsl \
main.tag=../main \
math.tag=../math \
tnl_dd.tag=../tnl_dd \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
vbo.tag=vbo
GENERATE_TAGFILE = swrast.tag
tnl_dd.tag=../tnl_dd \
vbo.tag=../vbo
GENERATE_TAGFILE = nir.tag

View File

@@ -168,8 +168,7 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
TAGFILES = \
core_subset.tag=../core_subset \
math_subset.tag=../math_subset \
miniglx.tag=../miniglx
math_subset.tag=../math_subset
GENERATE_TAGFILE = radeon_subset.tag
ALLEXTERNALS = NO
PERL_PATH =

View File

@@ -39,10 +39,10 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = main.tag=../main \
math.tag=../math \
tnl_dd.tag=../tnl_dd \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
vbo.tag=vbo
vbo.tag=../vbo
GENERATE_TAGFILE = swrast.tag

View File

@@ -41,7 +41,7 @@ SKIP_FUNCTION_MACROS = YES
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = tnl_dd.tag=../tnl_dd \
main.tag=../core \
main.tag=../main \
math.tag=../math \
swrast.tag=../swrast \
tnl.tag=../tnl \

View File

@@ -40,11 +40,10 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = tnl_dd.tag=../tnl \
main.tag=../core \
TAGFILES = tnl_dd.tag=../tnl_dd \
main.tag=../main \
math.tag=../math \
shader.tag=../shader \
swrast.tag=../swrast \
swrast_setup.tag=swrast_setup \
vbo.tag=vbo
swrast_setup.tag=../swrast_setup \
vbo.tag=../vbo
GENERATE_TAGFILE = tnl.tag

View File

@@ -39,11 +39,10 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = main.tag=../main \
math.tag=../math \
shader.tag=../shader \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \
vbo.tag=vbo
vbo.tag=../vbo
GENERATE_TAGFILE = tnl_dd.tag

View File

@@ -40,9 +40,8 @@ SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::addtions related to external references
#---------------------------------------------------------------------------
TAGFILES = main.tag=../core \
TAGFILES = main.tag=../main \
math.tag=../math \
shader.tag=../shader \
swrast.tag=../swrast \
swrast_setup.tag=../swrast_setup \
tnl.tag=../tnl \

View File

@@ -260,7 +260,7 @@ struct IDirect3DDevice9 : public IUnknown
virtual HRESULT WINAPI SetStreamSourceFreq(UINT StreamNumber, UINT Setting) = 0;
virtual HRESULT WINAPI GetStreamSourceFreq(UINT StreamNumber, UINT *pSetting) = 0;
virtual HRESULT WINAPI SetIndices(IDirect3DIndexBuffer9 *pIndexData) = 0;
virtual HRESULT WINAPI GetIndices(IDirect3DIndexBuffer9 **ppIndexData, UINT *pBaseVertexIndex) = 0;
virtual HRESULT WINAPI GetIndices(IDirect3DIndexBuffer9 **ppIndexData) = 0;
virtual HRESULT WINAPI CreatePixelShader(const DWORD *pFunction, IDirect3DPixelShader9 **ppShader) = 0;
virtual HRESULT WINAPI SetPixelShader(IDirect3DPixelShader9 *pShader) = 0;
virtual HRESULT WINAPI GetPixelShader(IDirect3DPixelShader9 **ppShader) = 0;
@@ -848,7 +848,7 @@ typedef struct IDirect3DDevice9Vtbl
HRESULT (WINAPI *SetStreamSourceFreq)(IDirect3DDevice9 *This, UINT StreamNumber, UINT Setting);
HRESULT (WINAPI *GetStreamSourceFreq)(IDirect3DDevice9 *This, UINT StreamNumber, UINT *pSetting);
HRESULT (WINAPI *SetIndices)(IDirect3DDevice9 *This, IDirect3DIndexBuffer9 *pIndexData);
HRESULT (WINAPI *GetIndices)(IDirect3DDevice9 *This, IDirect3DIndexBuffer9 **ppIndexData, UINT *pBaseVertexIndex);
HRESULT (WINAPI *GetIndices)(IDirect3DDevice9 *This, IDirect3DIndexBuffer9 **ppIndexData);
HRESULT (WINAPI *CreatePixelShader)(IDirect3DDevice9 *This, const DWORD *pFunction, IDirect3DPixelShader9 **ppShader);
HRESULT (WINAPI *SetPixelShader)(IDirect3DDevice9 *This, IDirect3DPixelShader9 *pShader);
HRESULT (WINAPI *GetPixelShader)(IDirect3DDevice9 *This, IDirect3DPixelShader9 **ppShader);
@@ -975,7 +975,7 @@ struct IDirect3DDevice9
#define IDirect3DDevice9_SetStreamSourceFreq(p,a,b) (p)->lpVtbl->SetStreamSourceFreq(p,a,b)
#define IDirect3DDevice9_GetStreamSourceFreq(p,a,b) (p)->lpVtbl->GetStreamSourceFreq(p,a,b)
#define IDirect3DDevice9_SetIndices(p,a) (p)->lpVtbl->SetIndices(p,a)
#define IDirect3DDevice9_GetIndices(p,a,b) (p)->lpVtbl->GetIndices(p,a,b)
#define IDirect3DDevice9_GetIndices(p,a) (p)->lpVtbl->GetIndices(p,a)
#define IDirect3DDevice9_CreatePixelShader(p,a,b) (p)->lpVtbl->CreatePixelShader(p,a,b)
#define IDirect3DDevice9_SetPixelShader(p,a) (p)->lpVtbl->SetPixelShader(p,a)
#define IDirect3DDevice9_GetPixelShader(p,a) (p)->lpVtbl->GetPixelShader(p,a)
@@ -1099,7 +1099,7 @@ typedef struct IDirect3DDevice9ExVtbl
HRESULT (WINAPI *SetStreamSourceFreq)(IDirect3DDevice9Ex *This, UINT StreamNumber, UINT Setting);
HRESULT (WINAPI *GetStreamSourceFreq)(IDirect3DDevice9Ex *This, UINT StreamNumber, UINT *pSetting);
HRESULT (WINAPI *SetIndices)(IDirect3DDevice9Ex *This, IDirect3DIndexBuffer9 *pIndexData);
HRESULT (WINAPI *GetIndices)(IDirect3DDevice9Ex *This, IDirect3DIndexBuffer9 **ppIndexData, UINT *pBaseVertexIndex);
HRESULT (WINAPI *GetIndices)(IDirect3DDevice9Ex *This, IDirect3DIndexBuffer9 **ppIndexData);
HRESULT (WINAPI *CreatePixelShader)(IDirect3DDevice9Ex *This, const DWORD *pFunction, IDirect3DPixelShader9 **ppShader);
HRESULT (WINAPI *SetPixelShader)(IDirect3DDevice9Ex *This, IDirect3DPixelShader9 *pShader);
HRESULT (WINAPI *GetPixelShader)(IDirect3DDevice9Ex *This, IDirect3DPixelShader9 **ppShader);
@@ -1242,7 +1242,7 @@ struct IDirect3DDevice9Ex
#define IDirect3DDevice9Ex_SetStreamSourceFreq(p,a,b) (p)->lpVtbl->SetStreamSourceFreq(p,a,b)
#define IDirect3DDevice9Ex_GetStreamSourceFreq(p,a,b) (p)->lpVtbl->GetStreamSourceFreq(p,a,b)
#define IDirect3DDevice9Ex_SetIndices(p,a) (p)->lpVtbl->SetIndices(p,a)
#define IDirect3DDevice9Ex_GetIndices(p,a,b) (p)->lpVtbl->GetIndices(p,a,b)
#define IDirect3DDevice9Ex_GetIndices(p,a) (p)->lpVtbl->GetIndices(p,a)
#define IDirect3DDevice9Ex_CreatePixelShader(p,a,b) (p)->lpVtbl->CreatePixelShader(p,a,b)
#define IDirect3DDevice9Ex_SetPixelShader(p,a) (p)->lpVtbl->SetPixelShader(p,a)
#define IDirect3DDevice9Ex_GetPixelShader(p,a) (p)->lpVtbl->GetPixelShader(p,a)

View File

@@ -173,16 +173,16 @@ typedef struct _RGNDATA {
#define D3DPRESENTFLAG_RESTRICTED_CONTENT 0x00000400
#define D3DPRESENTFLAG_RESTRICT_SHARED_RESOURCE_DRIVER 0x00000800
#ifdef WINAPI
#undef WINAPI
#endif /* WINAPI*/
#if defined(__x86_64__) || defined(_M_X64)
#define WINAPI __attribute__((ms_abi))
#else /* x86_64 */
#define WINAPI __attribute__((__stdcall__))
#endif /* x86_64 */
/* Windows calling convention */
#ifndef WINAPI
#if defined(__x86_64__) && !defined(__ILP32__)
#define WINAPI __attribute__((ms_abi))
#elif defined(__i386__)
#define WINAPI __attribute__((__stdcall__))
#else /* neither amd64 nor i386 */
#define WINAPI
#endif
#endif /* WINAPI */
/* Implementation caps */
#define D3DPRESENT_BACK_BUFFERS_MAX 3
@@ -227,6 +227,7 @@ typedef struct _RGNDATA {
#define D3DERR_DRIVERINVALIDCALL MAKE_D3DHRESULT(2157)
#define D3DERR_DEVICEREMOVED MAKE_D3DHRESULT(2160)
#define D3DERR_DEVICEHUNG MAKE_D3DHRESULT(2164)
#define S_PRESENT_OCCLUDED MAKE_D3DSTATUS(2168)
/********************************************************
* Bitmasks *

View File

@@ -34,17 +34,6 @@ extern "C" {
#include <EGL/eglplatform.h>
#ifndef EGL_MESA_drm_display
#define EGL_MESA_drm_display 1
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLDisplay EGLAPIENTRY eglGetDRMDisplayMESA(int fd);
#endif /* EGL_EGLEXT_PROTOTYPES */
typedef EGLDisplay (EGLAPIENTRYP PFNEGLGETDRMDISPLAYMESA) (int fd);
#endif /* EGL_MESA_drm_display */
#ifdef EGL_MESA_drm_image
/* Mesa's extension to EGL_MESA_drm_image... */
#ifndef EGL_DRM_BUFFER_USE_CURSOR_MESA

View File

@@ -79,6 +79,7 @@ typedef struct __DRIdri2LoaderExtensionRec __DRIdri2LoaderExtension;
typedef struct __DRI2flushExtensionRec __DRI2flushExtension;
typedef struct __DRI2throttleExtensionRec __DRI2throttleExtension;
typedef struct __DRI2fenceExtensionRec __DRI2fenceExtension;
typedef struct __DRI2interopExtensionRec __DRI2interopExtension;
typedef struct __DRIimageLoaderExtensionRec __DRIimageLoaderExtension;
@@ -392,6 +393,31 @@ struct __DRI2fenceExtensionRec {
};
/**
* Extension for API interop.
* See GL/mesa_glinterop.h.
*/
#define __DRI2_INTEROP "DRI2_Interop"
#define __DRI2_INTEROP_VERSION 1
struct mesa_glinterop_device_info;
struct mesa_glinterop_export_in;
struct mesa_glinterop_export_out;
struct __DRI2interopExtensionRec {
__DRIextension base;
/** Same as MesaGLInterop*QueryDeviceInfo. */
int (*query_device_info)(__DRIcontext *ctx,
struct mesa_glinterop_device_info *out);
/** Same as MesaGLInterop*ExportObject. */
int (*export_object)(__DRIcontext *ctx,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
};
/*@}*/
/**
@@ -1068,7 +1094,7 @@ struct __DRIdri2ExtensionRec {
* extensions.
*/
#define __DRI_IMAGE "DRI_IMAGE"
#define __DRI_IMAGE_VERSION 11
#define __DRI_IMAGE_VERSION 12
/**
* These formats correspond to the similarly named MESA_FORMAT_*
@@ -1100,8 +1126,18 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_USE_SCANOUT 0x0002
#define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */
#define __DRI_IMAGE_USE_LINEAR 0x0008
/* The buffer will only be read by an external process after SwapBuffers,
* in contrary to gbm buffers, front buffers and fake front buffers, which
* could be read after a flush."
*/
#define __DRI_IMAGE_USE_BACKBUFFER 0x0010
#define __DRI_IMAGE_TRANSFER_READ 0x1
#define __DRI_IMAGE_TRANSFER_WRITE 0x2
#define __DRI_IMAGE_TRANSFER_READ_WRITE \
(__DRI_IMAGE_TRANSFER_READ | __DRI_IMAGE_TRANSFER_WRITE)
/**
* Four CC formats that matches with WL_DRM_FORMAT_* from wayland_drm.h,
* GBM_FORMAT_* from gbm.h, and DRM_FORMAT_* from drm_fourcc.h. Used with
@@ -1127,6 +1163,11 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FOURCC_NV16 0x3631564e
#define __DRI_IMAGE_FOURCC_YUYV 0x56595559
#define __DRI_IMAGE_FOURCC_YVU410 0x39555659
#define __DRI_IMAGE_FOURCC_YVU411 0x31315659
#define __DRI_IMAGE_FOURCC_YVU420 0x32315659
#define __DRI_IMAGE_FOURCC_YVU422 0x36315659
#define __DRI_IMAGE_FOURCC_YVU444 0x34325659
/**
* Queryable on images created by createImageFromNames.
@@ -1350,6 +1391,33 @@ struct __DRIimageExtensionRec {
* \since 10
*/
int (*getCapabilities)(__DRIscreen *screen);
/**
* Returns a map of the specified region of a __DRIimage for the specified usage.
*
* flags may include __DRI_IMAGE_TRANSFER_READ, which will populate the
* mapping with the current buffer content. If __DRI_IMAGE_TRANSFER_READ
* is not included in the flags, the buffer content at map time is
* undefined. Users wanting to modify the mapping must include
* __DRI_IMAGE_TRANSFER_WRITE; if __DRI_IMAGE_TRANSFER_WRITE is not
* included, behaviour when writing the mapping is undefined.
*
* Returns the byte stride in *stride, and an opaque pointer to data
* tracking the mapping in **data, which must be passed to unmapImage().
*
* \since 12
*/
void *(*mapImage)(__DRIcontext *context, __DRIimage *image,
int x0, int y0, int width, int height,
unsigned int flags, int *stride, void **data);
/**
* Unmap a previously mapped __DRIimage
*
* \since 12
*/
void (*unmapImage)(__DRIcontext *context, __DRIimage *image, void *data);
};

304
include/GL/mesa_glinterop.h Normal file
View File

@@ -0,0 +1,304 @@
/*
* Mesa 3-D graphics library
*
* Copyright 2016 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/* Mesa OpenGL inter-driver interoperability interface designed for but not
* limited to OpenCL.
*
* This is a driver-agnostic, backward-compatible interface. The structures
* are only allowed to grow. They can never shrink and their members can
* never be removed, renamed, or redefined.
*
* The interface doesn't return a lot of static texture parameters like
* width, height, etc. It mainly returns mutable buffer and texture view
* parameters that can't be part of the texture allocation (because they are
* mutable). If drivers want to return more data or want to return static
* allocation parameters, they can do it in one of these two ways:
* - attaching the data to the DMABUF handle in a driver-specific way
* - passing the data via "out_driver_data" in the "in" structure.
*
* Mesa is expected to do a lot of error checking on behalf of OpenCL, such
* as checking the target, miplevel, and texture completeness.
*
* OpenCL, on the other hand, needs to check if the display+context combo
* is compatible with the OpenCL driver by querying the device information.
* It also needs to check if the texture internal format and channel ordering
* (returned in a driver-specific way) is supported by OpenCL, among other
* things.
*/
#ifndef MESA_GLINTEROP_H
#define MESA_GLINTEROP_H
#include <stddef.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
/* Forward declarations to avoid inclusion of GL/glx.h */
typedef struct _XDisplay Display;
typedef struct __GLXcontextRec *GLXContext;
/* Forward declarations to avoid inclusion of EGL/egl.h */
typedef void *EGLDisplay;
typedef void *EGLContext;
/** Returned error codes. */
enum {
MESA_GLINTEROP_SUCCESS = 0,
MESA_GLINTEROP_OUT_OF_RESOURCES,
MESA_GLINTEROP_OUT_OF_HOST_MEMORY,
MESA_GLINTEROP_INVALID_OPERATION,
MESA_GLINTEROP_INVALID_VERSION,
MESA_GLINTEROP_INVALID_DISPLAY,
MESA_GLINTEROP_INVALID_CONTEXT,
MESA_GLINTEROP_INVALID_TARGET,
MESA_GLINTEROP_INVALID_OBJECT,
MESA_GLINTEROP_INVALID_MIP_LEVEL,
MESA_GLINTEROP_UNSUPPORTED
};
/** Access flags. */
enum {
MESA_GLINTEROP_ACCESS_READ_WRITE = 0,
MESA_GLINTEROP_ACCESS_READ_ONLY,
MESA_GLINTEROP_ACCESS_WRITE_ONLY
};
#define MESA_GLINTEROP_DEVICE_INFO_VERSION 1
/**
* Device information returned by Mesa.
*/
struct mesa_glinterop_device_info {
/* The caller should set this to the version of the struct they support */
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_DEVICE_INFO_VERSION macro */
uint32_t version;
/* PCI location */
uint32_t pci_segment_group;
uint32_t pci_bus;
uint32_t pci_device;
uint32_t pci_function;
/* Device identification */
uint32_t vendor_id;
uint32_t device_id;
/* Structure version 1 ends here. */
};
#define MESA_GLINTEROP_EXPORT_IN_VERSION 1
/**
* Input parameters to Mesa interop export functions.
*/
struct mesa_glinterop_export_in {
/* The caller should set this to the version of the struct they support */
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_EXPORT_IN_VERSION macro */
uint32_t version;
/* One of the following:
* - GL_TEXTURE_BUFFER
* - GL_TEXTURE_1D
* - GL_TEXTURE_2D
* - GL_TEXTURE_3D
* - GL_TEXTURE_RECTANGLE
* - GL_TEXTURE_1D_ARRAY
* - GL_TEXTURE_2D_ARRAY
* - GL_TEXTURE_CUBE_MAP_ARRAY
* - GL_TEXTURE_CUBE_MAP
* - GL_TEXTURE_CUBE_MAP_POSITIVE_X
* - GL_TEXTURE_CUBE_MAP_NEGATIVE_X
* - GL_TEXTURE_CUBE_MAP_POSITIVE_Y
* - GL_TEXTURE_CUBE_MAP_NEGATIVE_Y
* - GL_TEXTURE_CUBE_MAP_POSITIVE_Z
* - GL_TEXTURE_CUBE_MAP_NEGATIVE_Z
* - GL_TEXTURE_2D_MULTISAMPLE
* - GL_TEXTURE_2D_MULTISAMPLE_ARRAY
* - GL_TEXTURE_EXTERNAL_OES
* - GL_RENDERBUFFER
* - GL_ARRAY_BUFFER
*/
unsigned target;
/* If target is GL_ARRAY_BUFFER, it's a buffer object.
* If target is GL_RENDERBUFFER, it's a renderbuffer object.
* If target is GL_TEXTURE_*, it's a texture object.
*/
unsigned obj;
/* Mipmap level. Ignored for non-texture objects. */
unsigned miplevel;
/* One of MESA_GLINTEROP_ACCESS_* flags. This describes how the exported
* object is going to be used.
*/
uint32_t access;
/* Size of memory pointed to by out_driver_data. */
uint32_t out_driver_data_size;
/* If the caller wants to query driver-specific data about the OpenGL
* object, this should point to the memory where that data will be stored.
* This is expected to be a temporary staging memory. The pointer is not
* allowed to be saved for later use by Mesa.
*/
void *out_driver_data;
/* Structure version 1 ends here. */
};
#define MESA_GLINTEROP_EXPORT_OUT_VERSION 1
/**
* Outputs of Mesa interop export functions.
*/
struct mesa_glinterop_export_out {
/* The caller should set this to the version of the struct they support */
/* The callee will overwrite it if it supports a lower version.
*
* The caller should check the value and access up-to the version supported
* by the the callee.
*/
/* NOTE: Do not use the MESA_GLINTEROP_EXPORT_OUT_VERSION macro */
uint32_t version;
/* The DMABUF handle. It must be closed by the caller using the POSIX
* close() function when it's not needed anymore. Mesa is not responsible
* for closing the handle.
*
* Not closing the handle by the caller will lead to a resource leak,
* will prevent releasing the GPU buffer, and may prevent creating new
* DMABUF handles within the process.
*/
int dmabuf_fd;
/* The mutable OpenGL internal format specified by glTextureView or
* glTexBuffer. If the object is not one of those, the original internal
* format specified by glTexStorage, glTexImage, or glRenderbufferStorage
* will be returned.
*/
unsigned internal_format;
/* Buffer offset and size for GL_ARRAY_BUFFER and GL_TEXTURE_BUFFER.
* This allows interop with suballocations (a buffer allocated within
* a larger buffer).
*
* Parameters specified by glTexBufferRange for GL_TEXTURE_BUFFER are
* applied to these and can shrink the range further.
*/
ptrdiff_t buf_offset;
ptrdiff_t buf_size;
/* Parameters specified by glTextureView. If the object is not a texture
* view, default parameters covering the whole texture will be returned.
*/
unsigned view_minlevel;
unsigned view_numlevels;
unsigned view_minlayer;
unsigned view_numlayers;
/* The number of bytes written to out_driver_data. */
uint32_t out_driver_data_written;
/* Structure version 1 ends here. */
};
/**
* Query device information.
*
* \param dpy GLX display
* \param context GLX context
* \param out where to return the information
*
* \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
*/
int
MesaGLInteropGLXQueryDeviceInfo(Display *dpy, GLXContext context,
struct mesa_glinterop_device_info *out);
/**
* Same as MesaGLInteropGLXQueryDeviceInfo except that it accepts EGLDisplay
* and EGLContext.
*/
int
MesaGLInteropEGLQueryDeviceInfo(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_device_info *out);
/**
* Create and return a DMABUF handle corresponding to the given OpenGL
* object, and return other parameters about the OpenGL object.
*
* \param dpy GLX display
* \param context GLX context
* \param in input parameters
* \param out return values
*
* \return MESA_GLINTEROP_SUCCESS or MESA_GLINTEROP_* != 0 on error
*/
int
MesaGLInteropGLXExportObject(Display *dpy, GLXContext context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
/**
* Same as MesaGLInteropGLXExportObject except that it accepts
* EGLDisplay and EGLContext.
*/
int
MesaGLInteropEGLExportObject(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
typedef int (PFNMESAGLINTEROPGLXQUERYDEVICEINFOPROC)(Display *dpy, GLXContext context,
struct mesa_glinterop_device_info *out);
typedef int (PFNMESAGLINTEROPEGLQUERYDEVICEINFOPROC)(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_device_info *out);
typedef int (PFNMESAGLINTEROPGLXEXPORTOBJECTPROC)(Display *dpy, GLXContext context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
typedef int (PFNMESAGLINTEROPEGLEXPORTOBJECTPROC)(EGLDisplay dpy, EGLContext context,
struct mesa_glinterop_export_in *in,
struct mesa_glinterop_export_out *out);
#ifdef __cplusplus
}
#endif
#endif /* MESA_GLINTEROP_H */

View File

@@ -58,8 +58,8 @@ extern "C" {
#include <GL/gl.h>
#define OSMESA_MAJOR_VERSION 10
#define OSMESA_MINOR_VERSION 0
#define OSMESA_MAJOR_VERSION 11
#define OSMESA_MINOR_VERSION 2
#define OSMESA_PATCH_VERSION 0
@@ -95,6 +95,18 @@ extern "C" {
#define OSMESA_MAX_WIDTH 0x24 /* new in 4.0 */
#define OSMESA_MAX_HEIGHT 0x25 /* new in 4.0 */
/*
* Accepted in OSMesaCreateContextAttrib's attribute list.
*/
#define OSMESA_DEPTH_BITS 0x30
#define OSMESA_STENCIL_BITS 0x31
#define OSMESA_ACCUM_BITS 0x32
#define OSMESA_PROFILE 0x33
#define OSMESA_CORE_PROFILE 0x34
#define OSMESA_COMPAT_PROFILE 0x35
#define OSMESA_CONTEXT_MAJOR_VERSION 0x36
#define OSMESA_CONTEXT_MINOR_VERSION 0x37
typedef struct osmesa_context *OSMesaContext;
@@ -127,6 +139,35 @@ OSMesaCreateContextExt( GLenum format, GLint depthBits, GLint stencilBits,
GLint accumBits, OSMesaContext sharelist);
/*
* Create an Off-Screen Mesa rendering context with attribute list.
* The list is composed of (attribute, value) pairs and terminated with
* attribute==0. Supported Attributes:
*
* Attributes Values
* --------------------------------------------------------------------------
* OSMESA_FORMAT OSMESA_RGBA*, OSMESA_BGRA, OSMESA_ARGB, etc.
* OSMESA_DEPTH_BITS 0*, 16, 24, 32
* OSMESA_STENCIL_BITS 0*, 8
* OSMESA_ACCUM_BITS 0*, 16
* OSMESA_PROFILE OSMESA_COMPAT_PROFILE*, OSMESA_CORE_PROFILE
* OSMESA_CONTEXT_MAJOR_VERSION 1*, 2, 3
* OSMESA_CONTEXT_MINOR_VERSION 0+
*
* Note: * = default value
*
* We return a context version >= what's specified by OSMESA_CONTEXT_MAJOR/
* MINOR_VERSION for the given profile. For example, if you request a GL 1.4
* compat profile, you might get a GL 3.0 compat profile.
* Otherwise, null is returned if the version/profile is not supported.
*
* New in Mesa 11.2
*/
GLAPI OSMesaContext GLAPIENTRY
OSMesaCreateContextAttribs( const int *attribList, OSMesaContext sharelist );
/*
* Destroy an Off-Screen Mesa rendering context.
*

View File

@@ -169,6 +169,32 @@ mtx_destroy(mtx_t *mtx)
pthread_mutex_destroy(mtx);
}
/*
* XXX: Workaround when building with -O0 and without pthreads link.
*
* In such cases constant folding and dead code elimination won't be
* available, thus the compiler will always add the pthread_mutexattr*
* functions into the binary. As we try to link, we'll fail as the
* symbols are unresolved.
*
* Ideally we'll enable the optimisations locally, yet that does not
* seem to work.
*
* So the alternative workaround is to annotate the symbols as weak.
* Thus the linker will be happy and things don't clash when building
* with -O1 or greater.
*/
#ifdef HAVE_FUNC_ATTRIBUTE_WEAK
__attribute__((weak))
int pthread_mutexattr_init(pthread_mutexattr_t *attr);
__attribute__((weak))
int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int type);
__attribute__((weak))
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr);
#endif
// 7.25.4.2
static inline int
mtx_init(mtx_t *mtx, int type)
@@ -180,9 +206,14 @@ mtx_init(mtx_t *mtx, int type)
&& type != (mtx_timed|mtx_recursive)
&& type != (mtx_try|mtx_recursive))
return thrd_error;
if ((type & mtx_recursive) == 0) {
pthread_mutex_init(mtx, NULL);
return thrd_success;
}
pthread_mutexattr_init(&attr);
if ((type & mtx_recursive) != 0)
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(mtx, &attr);
pthread_mutexattr_destroy(&attr);
return thrd_success;

View File

@@ -1,305 +0,0 @@
// ISO C9x compliant inttypes.h for Microsoft Visual Studio
// Based on ISO/IEC 9899:TC2 Committee draft (May 6, 2005) WG14/N1124
//
// Copyright (c) 2006 Alexander Chemeris
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. The name of the author may be used to endorse or promote products
// derived from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
// WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
// EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
// OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
// OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
///////////////////////////////////////////////////////////////////////////////
#ifndef _MSC_VER // [
#error "Use this header only with Microsoft Visual C++ compilers!"
#endif // _MSC_VER ]
#ifndef _MSC_INTTYPES_H_ // [
#define _MSC_INTTYPES_H_
#if _MSC_VER > 1000
#pragma once
#endif
#include "stdint.h"
// 7.8 Format conversion of integer types
typedef struct {
intmax_t quot;
intmax_t rem;
} imaxdiv_t;
// 7.8.1 Macros for format specifiers
#if !defined(__cplusplus) || defined(__STDC_FORMAT_MACROS) // [ See footnote 185 at page 198
// The fprintf macros for signed integers are:
#define PRId8 "d"
#define PRIi8 "i"
#define PRIdLEAST8 "d"
#define PRIiLEAST8 "i"
#define PRIdFAST8 "d"
#define PRIiFAST8 "i"
#define PRId16 "hd"
#define PRIi16 "hi"
#define PRIdLEAST16 "hd"
#define PRIiLEAST16 "hi"
#define PRIdFAST16 "hd"
#define PRIiFAST16 "hi"
#define PRId32 "I32d"
#define PRIi32 "I32i"
#define PRIdLEAST32 "I32d"
#define PRIiLEAST32 "I32i"
#define PRIdFAST32 "I32d"
#define PRIiFAST32 "I32i"
#define PRId64 "I64d"
#define PRIi64 "I64i"
#define PRIdLEAST64 "I64d"
#define PRIiLEAST64 "I64i"
#define PRIdFAST64 "I64d"
#define PRIiFAST64 "I64i"
#define PRIdMAX "I64d"
#define PRIiMAX "I64i"
#define PRIdPTR "Id"
#define PRIiPTR "Ii"
// The fprintf macros for unsigned integers are:
#define PRIo8 "o"
#define PRIu8 "u"
#define PRIx8 "x"
#define PRIX8 "X"
#define PRIoLEAST8 "o"
#define PRIuLEAST8 "u"
#define PRIxLEAST8 "x"
#define PRIXLEAST8 "X"
#define PRIoFAST8 "o"
#define PRIuFAST8 "u"
#define PRIxFAST8 "x"
#define PRIXFAST8 "X"
#define PRIo16 "ho"
#define PRIu16 "hu"
#define PRIx16 "hx"
#define PRIX16 "hX"
#define PRIoLEAST16 "ho"
#define PRIuLEAST16 "hu"
#define PRIxLEAST16 "hx"
#define PRIXLEAST16 "hX"
#define PRIoFAST16 "ho"
#define PRIuFAST16 "hu"
#define PRIxFAST16 "hx"
#define PRIXFAST16 "hX"
#define PRIo32 "I32o"
#define PRIu32 "I32u"
#define PRIx32 "I32x"
#define PRIX32 "I32X"
#define PRIoLEAST32 "I32o"
#define PRIuLEAST32 "I32u"
#define PRIxLEAST32 "I32x"
#define PRIXLEAST32 "I32X"
#define PRIoFAST32 "I32o"
#define PRIuFAST32 "I32u"
#define PRIxFAST32 "I32x"
#define PRIXFAST32 "I32X"
#define PRIo64 "I64o"
#define PRIu64 "I64u"
#define PRIx64 "I64x"
#define PRIX64 "I64X"
#define PRIoLEAST64 "I64o"
#define PRIuLEAST64 "I64u"
#define PRIxLEAST64 "I64x"
#define PRIXLEAST64 "I64X"
#define PRIoFAST64 "I64o"
#define PRIuFAST64 "I64u"
#define PRIxFAST64 "I64x"
#define PRIXFAST64 "I64X"
#define PRIoMAX "I64o"
#define PRIuMAX "I64u"
#define PRIxMAX "I64x"
#define PRIXMAX "I64X"
#define PRIoPTR "Io"
#define PRIuPTR "Iu"
#define PRIxPTR "Ix"
#define PRIXPTR "IX"
// The fscanf macros for signed integers are:
#define SCNd8 "d"
#define SCNi8 "i"
#define SCNdLEAST8 "d"
#define SCNiLEAST8 "i"
#define SCNdFAST8 "d"
#define SCNiFAST8 "i"
#define SCNd16 "hd"
#define SCNi16 "hi"
#define SCNdLEAST16 "hd"
#define SCNiLEAST16 "hi"
#define SCNdFAST16 "hd"
#define SCNiFAST16 "hi"
#define SCNd32 "ld"
#define SCNi32 "li"
#define SCNdLEAST32 "ld"
#define SCNiLEAST32 "li"
#define SCNdFAST32 "ld"
#define SCNiFAST32 "li"
#define SCNd64 "I64d"
#define SCNi64 "I64i"
#define SCNdLEAST64 "I64d"
#define SCNiLEAST64 "I64i"
#define SCNdFAST64 "I64d"
#define SCNiFAST64 "I64i"
#define SCNdMAX "I64d"
#define SCNiMAX "I64i"
#ifdef _WIN64 // [
# define SCNdPTR "I64d"
# define SCNiPTR "I64i"
#else // _WIN64 ][
# define SCNdPTR "ld"
# define SCNiPTR "li"
#endif // _WIN64 ]
// The fscanf macros for unsigned integers are:
#define SCNo8 "o"
#define SCNu8 "u"
#define SCNx8 "x"
#define SCNX8 "X"
#define SCNoLEAST8 "o"
#define SCNuLEAST8 "u"
#define SCNxLEAST8 "x"
#define SCNXLEAST8 "X"
#define SCNoFAST8 "o"
#define SCNuFAST8 "u"
#define SCNxFAST8 "x"
#define SCNXFAST8 "X"
#define SCNo16 "ho"
#define SCNu16 "hu"
#define SCNx16 "hx"
#define SCNX16 "hX"
#define SCNoLEAST16 "ho"
#define SCNuLEAST16 "hu"
#define SCNxLEAST16 "hx"
#define SCNXLEAST16 "hX"
#define SCNoFAST16 "ho"
#define SCNuFAST16 "hu"
#define SCNxFAST16 "hx"
#define SCNXFAST16 "hX"
#define SCNo32 "lo"
#define SCNu32 "lu"
#define SCNx32 "lx"
#define SCNX32 "lX"
#define SCNoLEAST32 "lo"
#define SCNuLEAST32 "lu"
#define SCNxLEAST32 "lx"
#define SCNXLEAST32 "lX"
#define SCNoFAST32 "lo"
#define SCNuFAST32 "lu"
#define SCNxFAST32 "lx"
#define SCNXFAST32 "lX"
#define SCNo64 "I64o"
#define SCNu64 "I64u"
#define SCNx64 "I64x"
#define SCNX64 "I64X"
#define SCNoLEAST64 "I64o"
#define SCNuLEAST64 "I64u"
#define SCNxLEAST64 "I64x"
#define SCNXLEAST64 "I64X"
#define SCNoFAST64 "I64o"
#define SCNuFAST64 "I64u"
#define SCNxFAST64 "I64x"
#define SCNXFAST64 "I64X"
#define SCNoMAX "I64o"
#define SCNuMAX "I64u"
#define SCNxMAX "I64x"
#define SCNXMAX "I64X"
#ifdef _WIN64 // [
# define SCNoPTR "I64o"
# define SCNuPTR "I64u"
# define SCNxPTR "I64x"
# define SCNXPTR "I64X"
#else // _WIN64 ][
# define SCNoPTR "lo"
# define SCNuPTR "lu"
# define SCNxPTR "lx"
# define SCNXPTR "lX"
#endif // _WIN64 ]
#endif // __STDC_FORMAT_MACROS ]
// 7.8.2 Functions for greatest-width integer types
// 7.8.2.1 The imaxabs function
#define imaxabs _abs64
// 7.8.2.2 The imaxdiv function
// This is modified version of div() function from Microsoft's div.c found
// in %MSVC.NET%\crt\src\div.c
#ifdef STATIC_IMAXDIV // [
static
#else // STATIC_IMAXDIV ][
_inline
#endif // STATIC_IMAXDIV ]
imaxdiv_t __cdecl imaxdiv(intmax_t numer, intmax_t denom)
{
imaxdiv_t result;
result.quot = numer / denom;
result.rem = numer % denom;
if (numer < 0 && result.rem > 0) {
// did division wrong; must fix up
++result.quot;
result.rem -= denom;
}
return result;
}
// 7.8.2.3 The strtoimax and strtoumax functions
#define strtoimax _strtoi64
#define strtoumax _strtoui64
// 7.8.2.4 The wcstoimax and wcstoumax functions
#define wcstoimax _wcstoi64
#define wcstoumax _wcstoui64
#endif // _MSC_INTTYPES_H_ ]

View File

@@ -1,247 +0,0 @@
// ISO C9x compliant stdint.h for Microsoft Visual Studio
// Based on ISO/IEC 9899:TC2 Committee draft (May 6, 2005) WG14/N1124
//
// Copyright (c) 2006-2008 Alexander Chemeris
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. The name of the author may be used to endorse or promote products
// derived from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
// WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
// EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
// OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
// OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
///////////////////////////////////////////////////////////////////////////////
#ifndef _MSC_VER // [
#error "Use this header only with Microsoft Visual C++ compilers!"
#endif // _MSC_VER ]
#ifndef _MSC_STDINT_H_ // [
#define _MSC_STDINT_H_
#if _MSC_VER > 1000
#pragma once
#endif
#include <limits.h>
// For Visual Studio 6 in C++ mode and for many Visual Studio versions when
// compiling for ARM we should wrap <wchar.h> include with 'extern "C++" {}'
// or compiler give many errors like this:
// error C2733: second C linkage of overloaded function 'wmemchr' not allowed
#ifdef __cplusplus
extern "C" {
#endif
# include <wchar.h>
#ifdef __cplusplus
}
#endif
// Define _W64 macros to mark types changing their size, like intptr_t.
#ifndef _W64
# if !defined(__midl) && (defined(_X86_) || defined(_M_IX86)) && _MSC_VER >= 1300
# define _W64 __w64
# else
# define _W64
# endif
#endif
// 7.18.1 Integer types
// 7.18.1.1 Exact-width integer types
// Visual Studio 6 and Embedded Visual C++ 4 doesn't
// realize that, e.g. char has the same size as __int8
// so we give up on __intX for them.
#if (_MSC_VER < 1300)
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
#else
typedef signed __int8 int8_t;
typedef signed __int16 int16_t;
typedef signed __int32 int32_t;
typedef unsigned __int8 uint8_t;
typedef unsigned __int16 uint16_t;
typedef unsigned __int32 uint32_t;
#endif
typedef signed __int64 int64_t;
typedef unsigned __int64 uint64_t;
// 7.18.1.2 Minimum-width integer types
typedef int8_t int_least8_t;
typedef int16_t int_least16_t;
typedef int32_t int_least32_t;
typedef int64_t int_least64_t;
typedef uint8_t uint_least8_t;
typedef uint16_t uint_least16_t;
typedef uint32_t uint_least32_t;
typedef uint64_t uint_least64_t;
// 7.18.1.3 Fastest minimum-width integer types
typedef int8_t int_fast8_t;
typedef int16_t int_fast16_t;
typedef int32_t int_fast32_t;
typedef int64_t int_fast64_t;
typedef uint8_t uint_fast8_t;
typedef uint16_t uint_fast16_t;
typedef uint32_t uint_fast32_t;
typedef uint64_t uint_fast64_t;
// 7.18.1.4 Integer types capable of holding object pointers
#ifdef _WIN64 // [
typedef signed __int64 intptr_t;
typedef unsigned __int64 uintptr_t;
#else // _WIN64 ][
typedef _W64 signed int intptr_t;
typedef _W64 unsigned int uintptr_t;
#endif // _WIN64 ]
// 7.18.1.5 Greatest-width integer types
typedef int64_t intmax_t;
typedef uint64_t uintmax_t;
// 7.18.2 Limits of specified-width integer types
#if !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS) // [ See footnote 220 at page 257 and footnote 221 at page 259
// 7.18.2.1 Limits of exact-width integer types
#define INT8_MIN ((int8_t)_I8_MIN)
#define INT8_MAX _I8_MAX
#define INT16_MIN ((int16_t)_I16_MIN)
#define INT16_MAX _I16_MAX
#define INT32_MIN ((int32_t)_I32_MIN)
#define INT32_MAX _I32_MAX
#define INT64_MIN ((int64_t)_I64_MIN)
#define INT64_MAX _I64_MAX
#define UINT8_MAX _UI8_MAX
#define UINT16_MAX _UI16_MAX
#define UINT32_MAX _UI32_MAX
#define UINT64_MAX _UI64_MAX
// 7.18.2.2 Limits of minimum-width integer types
#define INT_LEAST8_MIN INT8_MIN
#define INT_LEAST8_MAX INT8_MAX
#define INT_LEAST16_MIN INT16_MIN
#define INT_LEAST16_MAX INT16_MAX
#define INT_LEAST32_MIN INT32_MIN
#define INT_LEAST32_MAX INT32_MAX
#define INT_LEAST64_MIN INT64_MIN
#define INT_LEAST64_MAX INT64_MAX
#define UINT_LEAST8_MAX UINT8_MAX
#define UINT_LEAST16_MAX UINT16_MAX
#define UINT_LEAST32_MAX UINT32_MAX
#define UINT_LEAST64_MAX UINT64_MAX
// 7.18.2.3 Limits of fastest minimum-width integer types
#define INT_FAST8_MIN INT8_MIN
#define INT_FAST8_MAX INT8_MAX
#define INT_FAST16_MIN INT16_MIN
#define INT_FAST16_MAX INT16_MAX
#define INT_FAST32_MIN INT32_MIN
#define INT_FAST32_MAX INT32_MAX
#define INT_FAST64_MIN INT64_MIN
#define INT_FAST64_MAX INT64_MAX
#define UINT_FAST8_MAX UINT8_MAX
#define UINT_FAST16_MAX UINT16_MAX
#define UINT_FAST32_MAX UINT32_MAX
#define UINT_FAST64_MAX UINT64_MAX
// 7.18.2.4 Limits of integer types capable of holding object pointers
#ifdef _WIN64 // [
# define INTPTR_MIN INT64_MIN
# define INTPTR_MAX INT64_MAX
# define UINTPTR_MAX UINT64_MAX
#else // _WIN64 ][
# define INTPTR_MIN INT32_MIN
# define INTPTR_MAX INT32_MAX
# define UINTPTR_MAX UINT32_MAX
#endif // _WIN64 ]
// 7.18.2.5 Limits of greatest-width integer types
#define INTMAX_MIN INT64_MIN
#define INTMAX_MAX INT64_MAX
#define UINTMAX_MAX UINT64_MAX
// 7.18.3 Limits of other integer types
#ifdef _WIN64 // [
# define PTRDIFF_MIN _I64_MIN
# define PTRDIFF_MAX _I64_MAX
#else // _WIN64 ][
# define PTRDIFF_MIN _I32_MIN
# define PTRDIFF_MAX _I32_MAX
#endif // _WIN64 ]
#define SIG_ATOMIC_MIN INT_MIN
#define SIG_ATOMIC_MAX INT_MAX
#ifndef SIZE_MAX // [
# ifdef _WIN64 // [
# define SIZE_MAX _UI64_MAX
# else // _WIN64 ][
# define SIZE_MAX _UI32_MAX
# endif // _WIN64 ]
#endif // SIZE_MAX ]
// WCHAR_MIN and WCHAR_MAX are also defined in <wchar.h>
#ifndef WCHAR_MIN // [
# define WCHAR_MIN 0
#endif // WCHAR_MIN ]
#ifndef WCHAR_MAX // [
# define WCHAR_MAX _UI16_MAX
#endif // WCHAR_MAX ]
#define WINT_MIN 0
#define WINT_MAX _UI16_MAX
#endif // __STDC_LIMIT_MACROS ]
// 7.18.4 Limits of other integer types
#if !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS) // [ See footnote 224 at page 260
// 7.18.4.1 Macros for minimum-width integer constants
#define INT8_C(val) val##i8
#define INT16_C(val) val##i16
#define INT32_C(val) val##i32
#define INT64_C(val) val##i64
#define UINT8_C(val) val##ui8
#define UINT16_C(val) val##ui16
#define UINT32_C(val) val##ui32
#define UINT64_C(val) val##ui64
// 7.18.4.2 Macros for greatest-width integer constants
#define INTMAX_C INT64_C
#define UINTMAX_C UINT64_C
#endif // __STDC_CONSTANT_MACROS ]
#endif // _MSC_STDINT_H_ ]

View File

@@ -36,17 +36,17 @@
*/
#if defined(_MSC_VER)
# if _MSC_VER < 1500
# error "Microsoft Visual Studio 2008 or higher required"
# if _MSC_VER < 1800
# error "Microsoft Visual Studio 2013 or higher required"
# endif
/*
* Visual Studio 2012 will complain if we define the `inline` keyword, but
* Visual Studio will complain if we define the `inline` keyword, but
* actually it only supports the keyword on C++.
*
* To avoid this the _ALLOW_KEYWORD_MACROS must be set.
*/
# if (_MSC_VER >= 1700) && !defined(_ALLOW_KEYWORD_MACROS)
# if !defined(_ALLOW_KEYWORD_MACROS)
# define _ALLOW_KEYWORD_MACROS
# endif
@@ -81,8 +81,6 @@
/* Intel compiler supports inline keyword */
# elif defined(__WATCOMC__) && (__WATCOMC__ >= 1100)
# define inline __inline
# elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
/* C99 supports inline keyword */
# elif (__STDC_VERSION__ >= 199901L)
/* C99 supports inline keyword */
# else
@@ -100,8 +98,6 @@
#ifndef restrict
# if (__STDC_VERSION__ >= 199901L)
/* C99 */
# elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
/* C99 */
# elif defined(__GNUC__)
# define restrict __restrict__
# elif defined(_MSC_VER)
@@ -118,8 +114,6 @@
#ifndef __func__
# if (__STDC_VERSION__ >= 199901L)
/* C99 */
# elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
/* C99 */
# elif defined(__GNUC__)
# define __func__ __FUNCTION__
# elif defined(_MSC_VER)
@@ -141,4 +135,48 @@ test_c99_compat_h(const void * restrict a,
#endif
/* Fallback definitions, for build systems other than autoconfig which don't
* auto-detect these things. */
#ifdef HAVE_NO_AUTOCONF
# ifndef _WIN32
# define HAVE_PTHREAD
# define HAVE_POSIX_MEMALIGN
# endif
# ifdef __GNUC__
# if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 2)
# error "GCC version 4.2 or higher required"
# endif
/* https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Other-Builtins.html */
# define HAVE___BUILTIN_CLZ 1
# define HAVE___BUILTIN_CLZLL 1
# define HAVE___BUILTIN_CTZ 1
# define HAVE___BUILTIN_EXPECT 1
# define HAVE___BUILTIN_FFS 1
# define HAVE___BUILTIN_FFSLL 1
# define HAVE___BUILTIN_POPCOUNT 1
# define HAVE___BUILTIN_POPCOUNTLL 1
/* https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Function-Attributes.html */
# define HAVE_FUNC_ATTRIBUTE_FLATTEN 1
# define HAVE_FUNC_ATTRIBUTE_UNUSED 1
# define HAVE_FUNC_ATTRIBUTE_FORMAT 1
# define HAVE_FUNC_ATTRIBUTE_PACKED 1
# if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
/* https://gcc.gnu.org/onlinedocs/gcc-4.3.6/gcc/Other-Builtins.html */
# define HAVE___BUILTIN_BSWAP32 1
# define HAVE___BUILTIN_BSWAP64 1
# endif
# if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)
# define HAVE___BUILTIN_UNREACHABLE 1
# endif
# endif /* __GNUC__ */
#endif /* !HAVE_AUTOCONF */
#endif /* _C99_COMPAT_H_ */

View File

@@ -38,55 +38,16 @@
#include "c99_compat.h"
#if defined(_MSC_VER)
/* This is to ensure that we get M_PI, etc. definitions */
#if !defined(_USE_MATH_DEFINES)
#if defined(_MSC_VER) && !defined(_USE_MATH_DEFINES)
#error _USE_MATH_DEFINES define required when building with MSVC
#endif
#if _MSC_VER < 1800
#define isfinite(x) _finite((double)(x))
#define isnan(x) _isnan((double)(x))
#endif /* _MSC_VER < 1800 */
#if _MSC_VER < 1800
static inline double log2( double x )
{
const double invln2 = 1.442695041;
return log( x ) * invln2;
}
static inline double
round(double x)
{
return x >= 0.0 ? floor(x + 0.5) : ceil(x - 0.5);
}
static inline float
roundf(float x)
{
return x >= 0.0f ? floorf(x + 0.5f) : ceilf(x - 0.5f);
}
#endif
#ifndef INFINITY
#include <float.h> // DBL_MAX
#define INFINITY (DBL_MAX + DBL_MAX)
#endif
#ifndef NAN
#define NAN (INFINITY - INFINITY)
#endif
#endif /* _MSC_VER */
#if (defined(_MSC_VER) && _MSC_VER < 1800) || \
(!defined(_MSC_VER) && \
__STDC_VERSION__ < 199901L && \
(!defined(_XOPEN_SOURCE) || _XOPEN_SOURCE < 600) && \
!defined(__cplusplus))
#if !defined(_MSC_VER) && \
__STDC_VERSION__ < 199901L && \
(!defined(_XOPEN_SOURCE) || _XOPEN_SOURCE < 600) && \
!defined(__cplusplus)
static inline long int
lrint(double d)
@@ -224,4 +185,27 @@ fpclassify(double x)
#endif
/* Since C++11, the following functions are part of the std namespace. Their C
* counteparts should still exist in the global namespace, however cmath
* undefines those functions, which in glibc 2.23, are defined as macros rather
* than functions as in glibc 2.22.
*/
#if __cplusplus >= 201103L && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 23))
#include <cmath>
using std::fpclassify;
using std::isfinite;
using std::isinf;
using std::isnan;
using std::isnormal;
using std::signbit;
using std::isgreater;
using std::isgreaterequal;
using std::isless;
using std::islessequal;
using std::islessgreater;
using std::isunordered;
#endif
#endif /* #define _C99_MATH_H_ */

View File

@@ -29,7 +29,11 @@
#define D3DADAPTER9DRM_NAME "drm"
/* current version */
#define D3DADAPTER9DRM_MAJOR 0
#define D3DADAPTER9DRM_MINOR 0
#define D3DADAPTER9DRM_MINOR 1
/* version 0.0: Initial release
* 0.1: All IDirect3D objects can be assumed to have a pointer to the
* internal vtable in second position of the structure */
struct D3DAdapter9DRM
{

View File

@@ -69,6 +69,12 @@ typedef struct ID3DPresentVtbl
HRESULT (WINAPI *SetCursor)(ID3DPresent *This, void *pBitmap, POINT *pHotspot, BOOL bShow);
HRESULT (WINAPI *SetGammaRamp)(ID3DPresent *This, const D3DGAMMARAMP *pRamp, HWND hWndOverride);
HRESULT (WINAPI *GetWindowInfo)(ID3DPresent *This, HWND hWnd, int *width, int *height, int *depth);
/* Available since version 1.1 */
BOOL (WINAPI *GetWindowOccluded)(ID3DPresent *This);
/* Available since version 1.2 */
BOOL (WINAPI *ResolutionMismatch)(ID3DPresent *This);
HANDLE (WINAPI *CreateThread)(ID3DPresent *This, void *pThreadfunc, void *pParam);
BOOL (WINAPI *WaitForThread)(ID3DPresent *This, HANDLE thread);
} ID3DPresentVtbl;
struct ID3DPresent
@@ -96,6 +102,10 @@ struct ID3DPresent
#define ID3DPresent_SetCursor(p,a,b,c) (p)->lpVtbl->SetCursor(p,a,b,c)
#define ID3DPresent_SetGammaRamp(p,a,b) (p)->lpVtbl->SetGammaRamp(p,a,b)
#define ID3DPresent_GetWindowInfo(p,a,b,c,d) (p)->lpVtbl->GetWindowSize(p,a,b,c,d)
#define ID3DPresent_GetWindowOccluded(p) (p)->lpVtbl->GetWindowOccluded(p)
#define ID3DPresent_ResolutionMismatch(p) (p)->lpVtbl->ResolutionMismatch(p)
#define ID3DPresent_CreateThread(p,a,b) (p)->lpVtbl->CreateThread(p,a,b)
#define ID3DPresent_WaitForThread(p,a) (p)->lpVtbl->WaitForThread(p,a)
typedef struct ID3DPresentGroupVtbl
{

View File

@@ -112,6 +112,7 @@ CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x190B, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x1912, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x1913, skl_gt2, "Intel(R) Skylake GT2f")
@@ -122,16 +123,17 @@ CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x191B, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x191D, skl_gt2, "Intel(R) HD Graphics P530 (Skylake GT2)")
CHIPSET(0x191E, skl_gt2, "Intel(R) HD Graphics 515 (Skylake GT2)")
CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x1923, skl_gt3, "Intel(R) Iris Graphics 540 (Skylake GT3e)")
CHIPSET(0x1926, skl_gt3, "Intel(R) HD Graphics 535 (Skylake GT3)")
CHIPSET(0x1921, skl_gt2, "Intel(R) HD Graphics 520 (Skylake GT2)")
CHIPSET(0x1923, skl_gt3, "Intel(R) Skylake GT3e")
CHIPSET(0x1926, skl_gt3, "Intel(R) Iris Graphics 540 (Skylake GT3e)")
CHIPSET(0x1927, skl_gt3, "Intel(R) Iris Graphics 550 (Skylake GT3e)")
CHIPSET(0x192A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x192B, skl_gt3, "Intel(R) Iris Graphics (Skylake GT3fe)")
CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x192B, skl_gt3, "Intel(R) Iris Graphics 555 (Skylake GT3e)")
CHIPSET(0x192D, skl_gt3, "Intel(R) Iris Graphics P555 (Skylake GT3e)")
CHIPSET(0x1932, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4e)")
CHIPSET(0x193A, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4e)")
CHIPSET(0x193B, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4e)")
CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4e)")
CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
@@ -154,10 +156,12 @@ CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593A, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593D, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x0A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")

View File

@@ -182,4 +182,26 @@ CHIPSET(0x9877, CARRIZO_, CARRIZO)
CHIPSET(0x7300, FIJI_, FIJI)
CHIPSET(0x67E0, POLARIS11_, POLARIS11)
CHIPSET(0x67E1, POLARIS11_, POLARIS11)
CHIPSET(0x67E3, POLARIS11_, POLARIS11)
CHIPSET(0x67E7, POLARIS11_, POLARIS11)
CHIPSET(0x67E8, POLARIS11_, POLARIS11)
CHIPSET(0x67E9, POLARIS11_, POLARIS11)
CHIPSET(0x67EB, POLARIS11_, POLARIS11)
CHIPSET(0x67EF, POLARIS11_, POLARIS11)
CHIPSET(0x67FF, POLARIS11_, POLARIS11)
CHIPSET(0x67C0, POLARIS10_, POLARIS10)
CHIPSET(0x67C1, POLARIS10_, POLARIS10)
CHIPSET(0x67C2, POLARIS10_, POLARIS10)
CHIPSET(0x67C4, POLARIS10_, POLARIS10)
CHIPSET(0x67C7, POLARIS10_, POLARIS10)
CHIPSET(0x67C8, POLARIS10_, POLARIS10)
CHIPSET(0x67C9, POLARIS10_, POLARIS10)
CHIPSET(0x67CA, POLARIS10_, POLARIS10)
CHIPSET(0x67CC, POLARIS10_, POLARIS10)
CHIPSET(0x67CF, POLARIS10_, POLARIS10)
CHIPSET(0x67DF, POLARIS10_, POLARIS10)
CHIPSET(0x98E4, STONEY_, STONEY)

View File

@@ -0,0 +1,2 @@
CHIPSET(0x0010, VIRTGL, VIRTGL)
CHIPSET(0x1050, VIRTGL, VIRTGL)

85
include/vulkan/vk_icd.h Normal file
View File

@@ -0,0 +1,85 @@
#ifndef VKICD_H
#define VKICD_H
#include "vk_platform.h"
/*
* The ICD must reserve space for a pointer for the loader's dispatch
* table, at the start of <each object>.
* The ICD must initialize this variable using the SET_LOADER_MAGIC_VALUE macro.
*/
#define ICD_LOADER_MAGIC 0x01CDC0DE
typedef union _VK_LOADER_DATA {
uintptr_t loaderMagic;
void *loaderData;
} VK_LOADER_DATA;
static inline void set_loader_magic_value(void* pNewObject) {
VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *) pNewObject;
loader_info->loaderMagic = ICD_LOADER_MAGIC;
}
static inline bool valid_loader_magic_value(void* pNewObject) {
const VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *) pNewObject;
return (loader_info->loaderMagic & 0xffffffff) == ICD_LOADER_MAGIC;
}
/*
* Windows and Linux ICDs will treat VkSurfaceKHR as a pointer to a struct that
* contains the platform-specific connection and surface information.
*/
typedef enum _VkIcdWsiPlatform {
VK_ICD_WSI_PLATFORM_MIR,
VK_ICD_WSI_PLATFORM_WAYLAND,
VK_ICD_WSI_PLATFORM_WIN32,
VK_ICD_WSI_PLATFORM_XCB,
VK_ICD_WSI_PLATFORM_XLIB,
} VkIcdWsiPlatform;
typedef struct _VkIcdSurfaceBase {
VkIcdWsiPlatform platform;
} VkIcdSurfaceBase;
#ifdef VK_USE_PLATFORM_MIR_KHR
typedef struct _VkIcdSurfaceMir {
VkIcdSurfaceBase base;
MirConnection* connection;
MirSurface* mirSurface;
} VkIcdSurfaceMir;
#endif // VK_USE_PLATFORM_MIR_KHR
#ifdef VK_USE_PLATFORM_WAYLAND_KHR
typedef struct _VkIcdSurfaceWayland {
VkIcdSurfaceBase base;
struct wl_display* display;
struct wl_surface* surface;
} VkIcdSurfaceWayland;
#endif // VK_USE_PLATFORM_WAYLAND_KHR
#ifdef VK_USE_PLATFORM_WIN32_KHR
typedef struct _VkIcdSurfaceWin32 {
VkIcdSurfaceBase base;
HINSTANCE hinstance;
HWND hwnd;
} VkIcdSurfaceWin32;
#endif // VK_USE_PLATFORM_WIN32_KHR
#ifdef VK_USE_PLATFORM_XCB_KHR
typedef struct _VkIcdSurfaceXcb {
VkIcdSurfaceBase base;
xcb_connection_t* connection;
xcb_window_t window;
} VkIcdSurfaceXcb;
#endif // VK_USE_PLATFORM_XCB_KHR
#ifdef VK_USE_PLATFORM_XLIB_KHR
typedef struct _VkIcdSurfaceXlib {
VkIcdSurfaceBase base;
Display* dpy;
Window window;
} VkIcdSurfaceXlib;
#endif // VK_USE_PLATFORM_XLIB_KHR
#endif // VKICD_H

View File

@@ -0,0 +1,127 @@
//
// File: vk_platform.h
//
/*
** Copyright (c) 2014-2015 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*/
#ifndef VK_PLATFORM_H_
#define VK_PLATFORM_H_
#ifdef __cplusplus
extern "C"
{
#endif // __cplusplus
/*
***************************************************************************************************
* Platform-specific directives and type declarations
***************************************************************************************************
*/
/* Platform-specific calling convention macros.
*
* Platforms should define these so that Vulkan clients call Vulkan commands
* with the same calling conventions that the Vulkan implementation expects.
*
* VKAPI_ATTR - Placed before the return type in function declarations.
* Useful for C++11 and GCC/Clang-style function attribute syntax.
* VKAPI_CALL - Placed after the return type in function declarations.
* Useful for MSVC-style calling convention syntax.
* VKAPI_PTR - Placed between the '(' and '*' in function pointer types.
*
* Function declaration: VKAPI_ATTR void VKAPI_CALL vkCommand(void);
* Function pointer type: typedef void (VKAPI_PTR *PFN_vkCommand)(void);
*/
#if defined(_WIN32)
// On Windows, Vulkan commands use the stdcall convention
#define VKAPI_ATTR
#define VKAPI_CALL __stdcall
#define VKAPI_PTR VKAPI_CALL
#elif defined(__ANDROID__) && defined(__ARM_EABI__) && !defined(__ARM_ARCH_7A__)
// Android does not support Vulkan in native code using the "armeabi" ABI.
#error "Vulkan requires the 'armeabi-v7a' or 'armeabi-v7a-hard' ABI on 32-bit ARM CPUs"
#elif defined(__ANDROID__) && defined(__ARM_ARCH_7A__)
// On Android/ARMv7a, Vulkan functions use the armeabi-v7a-hard calling
// convention, even if the application's native code is compiled with the
// armeabi-v7a calling convention.
#define VKAPI_ATTR __attribute__((pcs("aapcs-vfp")))
#define VKAPI_CALL
#define VKAPI_PTR VKAPI_ATTR
#else
// On other platforms, use the default calling convention
#define VKAPI_ATTR
#define VKAPI_CALL
#define VKAPI_PTR
#endif
#include <stddef.h>
#if !defined(VK_NO_STDINT_H)
#if defined(_MSC_VER) && (_MSC_VER < 1600)
typedef signed __int8 int8_t;
typedef unsigned __int8 uint8_t;
typedef signed __int16 int16_t;
typedef unsigned __int16 uint16_t;
typedef signed __int32 int32_t;
typedef unsigned __int32 uint32_t;
typedef signed __int64 int64_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif
#endif // !defined(VK_NO_STDINT_H)
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
// Platform-specific headers required by platform window system extensions.
// These are enabled prior to #including "vulkan.h". The same enable then
// controls inclusion of the extension interfaces in vulkan.h.
#ifdef VK_USE_PLATFORM_ANDROID_KHR
#include <android/native_window.h>
#endif
#ifdef VK_USE_PLATFORM_MIR_KHR
#include <mir_toolkit/client_types.h>
#endif
#ifdef VK_USE_PLATFORM_WAYLAND_KHR
#include <wayland-client.h>
#endif
#ifdef VK_USE_PLATFORM_WIN32_KHR
#include <windows.h>
#endif
#ifdef VK_USE_PLATFORM_XLIB_KHR
#include <X11/Xlib.h>
#endif
#ifdef VK_USE_PLATFORM_XCB_KHR
#include <xcb/xcb.h>
#endif
#endif

3800
include/vulkan/vulkan.h Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,62 @@
/*
* Copyright © 2015 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#ifndef __VULKAN_INTEL_H__
#define __VULKAN_INTEL_H__
#include "vulkan.h"
#ifdef __cplusplus
extern "C"
{
#endif // __cplusplus
#define VK_STRUCTURE_TYPE_DMA_BUF_IMAGE_CREATE_INFO_INTEL 1024
typedef struct VkDmaBufImageCreateInfo_
{
VkStructureType sType; // Must be VK_STRUCTURE_TYPE_DMA_BUF_IMAGE_CREATE_INFO_INTEL
const void* pNext; // Pointer to next structure.
int fd;
VkFormat format;
VkExtent3D extent; // Depth must be 1
uint32_t strideInBytes;
} VkDmaBufImageCreateInfo;
typedef VkResult (VKAPI_PTR *PFN_vkCreateDmaBufImageINTEL)(VkDevice device, const VkDmaBufImageCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMem, VkImage* pImage);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkCreateDmaBufImageINTEL(
VkDevice _device,
const VkDmaBufImageCreateInfo* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
VkDeviceMemory* pMem,
VkImage* pImage);
#endif
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // __VULKAN_INTEL_H__

View File

@@ -3,18 +3,18 @@
if BUILD_SHARED
if HAVE_COMPAT_SYMLINKS
all-local : .libs/install-gallium-links
all-local : .install-gallium-links
.libs/install-gallium-links : $(dri_LTLIBRARIES) $(egl_LTLIBRARIES) $(lib_LTLIBRARIES)
.install-gallium-links : $(dri_LTLIBRARIES) $(egl_LTLIBRARIES) $(lib_LTLIBRARIES)
$(AM_V_GEN)$(MKDIR_P) $(top_builddir)/$(LIB_DIR); \
link_dir=$(top_builddir)/$(LIB_DIR)/gallium; \
if test x$(egl_LTLIBRARIES) != x; then \
link_dir=$(top_builddir)/$(LIB_DIR)/egl; \
fi; \
$(MKDIR_P) $$link_dir; \
file_list=$(dri_LTLIBRARIES:%.la=.libs/%.so); \
file_list+=$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
file_list+=$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
file_list="$(dri_LTLIBRARIES:%.la=.libs/%.so)"; \
file_list+="$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
file_list+="$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)"; \
for f in $$file_list; do \
if test -h .libs/$$f; then \
cp -d $$f $$link_dir; \
@@ -23,4 +23,15 @@ all-local : .libs/install-gallium-links
fi; \
done && touch $@
endif
clean-local:
for f in $(notdir $(dri_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)) \
$(notdir $(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)) \
$(notdir $(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)); do \
echo $$f; \
$(RM) $(top_builddir)/$(LIB_DIR)/gallium/$$f; \
done;
rmdir $(top_builddir)/$(LIB_DIR)/gallium || true
$(RM) .install-gallium-links
endif

View File

@@ -53,6 +53,7 @@
# optimize
# packed
# pure
# returns_nonnull
# unused
# used
# visibility
@@ -76,6 +77,9 @@
#serial 2
# mattst88:
# Added support for returns_nonnull attribute
AC_DEFUN([AX_GCC_FUNC_ATTRIBUTE], [
AS_VAR_PUSHDEF([ac_var], [ax_cv_have_func_attribute_$1])
@@ -175,6 +179,9 @@ AC_DEFUN([AX_GCC_FUNC_ATTRIBUTE], [
[pure], [
int foo( void ) __attribute__(($1));
],
[returns_nonnull], [
int *foo( void ) __attribute__(($1));
],
[unused], [
int foo( void ) __attribute__(($1));
],

View File

@@ -30,11 +30,10 @@ Custom builders and methods.
#
import os
import os.path
import re
import sys
import subprocess
import modulefinder
import SCons.Action
import SCons.Builder
@@ -44,6 +43,13 @@ import fixes
import source_list
# the get_implicit_deps() method changed between 2.4 and 2.5: now it expects
# a callable that takes a scanner as argument and returns a path, rather than
# a path directly. We want to support both, so we need to detect the SCons version,
# for which no API is provided by SCons 8-P
scons_version = tuple(map(int, SCons.__version__.split('.')))
def quietCommandLines(env):
# Quiet command lines
# See also http://www.scons.org/wiki/HidingCommandLinesInOutput
@@ -93,27 +99,19 @@ def createConvenienceLibBuilder(env):
return convenience_lib
# TODO: handle import statements with multiple modules
# TODO: handle from import statements
import_re = re.compile(r'^\s*import\s+(\S+)\s*$', re.M)
def python_scan(node, env, path):
# http://www.scons.org/doc/0.98.5/HTML/scons-user/c2781.html#AEN2789
# https://docs.python.org/2/library/modulefinder.html
contents = node.get_contents()
source_dir = node.get_dir()
imports = import_re.findall(contents)
finder = modulefinder.ModuleFinder()
finder.run_script(node.abspath)
results = []
for imp in imports:
for dir in path:
file = os.path.join(str(dir), imp.replace('.', os.sep) + '.py')
if os.path.exists(file):
results.append(env.File(file))
break
file = os.path.join(str(dir), imp.replace('.', os.sep), '__init__.py')
if os.path.exists(file):
results.append(env.File(file))
break
#print node, map(str, results)
for name, mod in finder.modules.iteritems():
if mod.__file__ is None:
continue
assert os.path.exists(mod.__file__)
results.append(env.File(mod.__file__))
return results
python_scanner = SCons.Scanner.Scanner(function = python_scan, skeys = ['.py'])
@@ -138,7 +136,7 @@ def code_generate(env, script, target, source, command):
# Explicitly mark that the generated code depends on the generator,
# and on implicitly imported python modules
path = (script_src.get_dir(),)
path = (script_src.get_dir(),) if scons_version < (2, 5, 0) else lambda x: script_src
deps = [script_src]
deps += script_src.get_implicit_deps(env, python_scanner, path)
env.Depends(code, deps)

View File

@@ -82,11 +82,6 @@ def install_shared_library(env, sources, version = ()):
return targets
def createInstallMethods(env):
env.AddMethod(install_program, 'InstallProgram')
env.AddMethod(install_shared_library, 'InstallSharedLibrary')
def msvc2013_compat(env):
if env['gcc']:
env.Append(CCFLAGS = [
@@ -94,16 +89,20 @@ def msvc2013_compat(env):
'-Werror=pointer-arith',
])
def msvc2008_compat(env):
msvc2013_compat(env)
if env['gcc']:
env.Append(CFLAGS = [
'-Werror=declaration-after-statement',
])
def createMSVCCompatMethods(env):
env.AddMethod(msvc2013_compat, 'MSVC2013Compat')
env.AddMethod(msvc2008_compat, 'MSVC2008Compat')
def unit_test(env, test_name, program_target, args=None):
env.InstallProgram(program_target)
cmd = [program_target[0].abspath]
if args is not None:
cmd += args
cmd = ' '.join(cmd)
# http://www.scons.org/wiki/UnitTests
action = SCons.Action.Action(cmd, " Running $SOURCE ...")
alias = env.Alias(test_name, program_target, action)
env.AlwaysBuild(alias)
env.Depends('check', alias)
def num_jobs():
@@ -172,16 +171,6 @@ def generate(env):
# Allow override compiler and specify additional flags from environment
if os.environ.has_key('CC'):
env['CC'] = os.environ['CC']
# Update CCVERSION to match
pipe = SCons.Action._subproc(env, [env['CC'], '--version'],
stdin = 'devnull',
stderr = 'devnull',
stdout = subprocess.PIPE)
if pipe.wait() == 0:
line = pipe.stdout.readline()
match = re.search(r'[0-9]+(\.[0-9]+)+', line)
if match:
env['CCVERSION'] = match.group(0)
if os.environ.has_key('CFLAGS'):
env['CCFLAGS'] += SCons.Util.CLVar(os.environ['CFLAGS'])
if os.environ.has_key('CXX'):
@@ -194,14 +183,15 @@ def generate(env):
# Detect gcc/clang not by executable name, but through pre-defined macros
# as autoconf does, to avoid drawing wrong conclusions when using tools
# that overrice CC/CXX like scan-build.
env['gcc'] = 0
env['gcc_compat'] = 0
env['clang'] = 0
env['msvc'] = 0
if host_platform.system() == 'Windows':
env['msvc'] = check_cc(env, 'MSVC', 'defined(_MSC_VER)', '/E')
if not env['msvc']:
env['gcc'] = check_cc(env, 'GCC', 'defined(__GNUC__) && !defined(__clang__)')
env['clang'] = check_cc(env, 'Clang', '__clang__')
env['gcc_compat'] = check_cc(env, 'GCC', 'defined(__GNUC__)')
env['clang'] = check_cc(env, 'Clang', '__clang__')
env['gcc'] = env['gcc_compat'] and not env['clang']
env['suncc'] = env['platform'] == 'sunos' and os.path.basename(env['CC']) == 'cc'
env['icc'] = 'icc' == os.path.basename(env['CC'])
@@ -214,7 +204,7 @@ def generate(env):
platform = env['platform']
x86 = env['machine'] == 'x86'
ppc = env['machine'] == 'ppc'
gcc_compat = env['gcc'] or env['clang']
gcc_compat = env['gcc_compat']
msvc = env['msvc']
suncc = env['suncc']
icc = env['icc']
@@ -300,7 +290,11 @@ def generate(env):
# C preprocessor options
cppdefines = []
cppdefines += ['__STDC_LIMIT_MACROS']
cppdefines += [
'__STDC_LIMIT_MACROS',
'__STDC_CONSTANT_MACROS',
'HAVE_NO_AUTOCONF',
]
if env['build'] in ('debug', 'checked'):
cppdefines += ['DEBUG']
else:
@@ -315,8 +309,6 @@ def generate(env):
'_BSD_SOURCE',
'_GNU_SOURCE',
'_DEFAULT_SOURCE',
'HAVE_PTHREAD',
'HAVE_POSIX_MEMALIGN',
]
if env['platform'] == 'darwin':
cppdefines += [
@@ -337,11 +329,6 @@ def generate(env):
if env['platform'] in ('linux', 'darwin'):
cppdefines += ['HAVE_XLOCALE_H']
if env['platform'] == 'haiku':
cppdefines += [
'HAVE_PTHREAD',
'HAVE_POSIX_MEMALIGN'
]
if platform == 'windows':
cppdefines += [
'WIN32',
@@ -375,26 +362,6 @@ def generate(env):
print 'warning: Floating-point textures enabled.'
print 'warning: Please consult docs/patents.txt with your lawyer before building Mesa.'
cppdefines += ['TEXTURE_FLOAT_ENABLED']
if gcc_compat:
ccversion = env['CCVERSION']
cppdefines += [
'HAVE___BUILTIN_EXPECT',
'HAVE___BUILTIN_FFS',
'HAVE___BUILTIN_FFSLL',
'HAVE_FUNC_ATTRIBUTE_FLATTEN',
'HAVE_FUNC_ATTRIBUTE_UNUSED',
# GCC 3.0
'HAVE_FUNC_ATTRIBUTE_FORMAT',
'HAVE_FUNC_ATTRIBUTE_PACKED',
# GCC 3.4
'HAVE___BUILTIN_CTZ',
'HAVE___BUILTIN_POPCOUNT',
'HAVE___BUILTIN_POPCOUNTLL',
'HAVE___BUILTIN_CLZ',
'HAVE___BUILTIN_CLZLL',
]
if distutils.version.LooseVersion(ccversion) >= distutils.version.LooseVersion('4.5'):
cppdefines += ['HAVE___BUILTIN_UNREACHABLE']
env.Append(CPPDEFINES = cppdefines)
# C compiler options
@@ -402,13 +369,8 @@ def generate(env):
cxxflags = [] # C++
ccflags = [] # C & C++
if gcc_compat:
ccversion = env['CCVERSION']
if env['build'] == 'debug':
ccflags += ['-O0']
elif env['gcc'] and ccversion.startswith('4.2.'):
# gcc 4.2.x optimizer is broken
print "warning: gcc 4.2.x optimizer is broken -- disabling optimizations"
ccflags += ['-O0']
else:
ccflags += ['-O3']
if env['gcc']:
@@ -418,7 +380,7 @@ def generate(env):
# Work around aliasing bugs - developers should comment this out
ccflags += ['-fno-strict-aliasing']
ccflags += ['-g']
if env['build'] in ('checked', 'profile'):
if env['build'] in ('checked', 'profile') or env['asan']:
# See http://code.google.com/p/jrfonseca/wiki/Gprof2Dot#Which_options_should_I_pass_to_gcc_when_compiling_for_profiling?
ccflags += [
'-fno-omit-frame-pointer',
@@ -479,31 +441,23 @@ def generate(env):
# See also:
# - http://msdn.microsoft.com/en-us/library/19z1t1wy.aspx
# - cl /?
if 'MSVC_VERSION' not in env or distutils.version.LooseVersion(env['MSVC_VERSION']) < distutils.version.LooseVersion('12.0'):
# Use bundled stdbool.h and stdint.h headers for older MSVC
# versions. stdint.h was introduced in MSVC 2010, but stdbool.h
# was only introduced in MSVC 2013.
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
env.Append(CPPPATH = [os.path.join(top_dir, 'include/c99')])
if env['build'] == 'debug':
ccflags += [
'/Od', # disable optimizations
'/Oi', # enable intrinsic functions
]
else:
if 'MSVC_VERSION' in env and distutils.version.LooseVersion(env['MSVC_VERSION']) < distutils.version.LooseVersion('11.0'):
print 'scons: warning: Visual Studio versions prior to 2012 are known to produce incorrect code when optimizations are enabled ( https://bugs.freedesktop.org/show_bug.cgi?id=58718 )'
ccflags += [
'/O2', # optimize for speed
]
if env['build'] == 'release':
ccflags += [
'/GL', # enable whole program optimization
]
if not env['clang']:
ccflags += [
'/GL', # enable whole program optimization
]
else:
ccflags += [
'/Oy-', # disable frame pointer omission
'/GL-', # disable whole program optimization
]
ccflags += [
'/W3', # warning level
@@ -517,6 +471,10 @@ def generate(env):
'/wd4800', # forcing value to bool 'true' or 'false' (performance warning)
'/wd4996', # disable deprecated POSIX name warnings
]
if env['clang']:
ccflags += [
'-Wno-microsoft-enum-value', # enumerator value is not representable in underlying type 'int'
]
if env['machine'] == 'x86':
ccflags += [
'/arch:SSE2', # use the SSE2 instructions (default since MSVC 2012)
@@ -556,6 +514,16 @@ def generate(env):
# scan-build will produce more comprehensive output
env.Append(CCFLAGS = ['--analyze'])
# https://github.com/google/sanitizers/wiki/AddressSanitizer
if env['asan']:
if gcc_compat:
env.Append(CCFLAGS = [
'-fsanitize=address',
])
env.Append(LINKFLAGS = [
'-fsanitize=address',
])
# Assembler options
if gcc_compat:
if env['machine'] == 'x86':
@@ -593,7 +561,7 @@ def generate(env):
shlinkflags += ['-Wl,--enable-stdcall-fixup']
#shlinkflags += ['-Wl,--kill-at']
if msvc:
if env['build'] == 'release':
if env['build'] == 'release' and not env['clang']:
# enable Link-time Code Generation
linkflags += ['/LTCG']
env.Append(ARFLAGS = ['/LTCG'])
@@ -673,8 +641,10 @@ def generate(env):
# Custom builders and methods
env.Tool('custom')
createInstallMethods(env)
createMSVCCompatMethods(env)
env.AddMethod(install_program, 'InstallProgram')
env.AddMethod(install_shared_library, 'InstallSharedLibrary')
env.AddMethod(msvc2013_compat, 'MSVC2013Compat')
env.AddMethod(unit_test, 'UnitTest')
env.PkgCheckModules('X11', ['x11', 'xext', 'xdamage', 'xfixes', 'glproto >= 1.4.13'])
env.PkgCheckModules('XCB', ['x11-xcb', 'xcb-glx >= 1.8.1', 'xcb-dri2 >= 1.8'])

View File

@@ -106,7 +106,19 @@ def generate(env):
])
env.Prepend(LIBPATH = [os.path.join(llvm_dir, 'lib')])
# LIBS should match the output of `llvm-config --libs engine mcjit bitwriter x86asmprinter`
if llvm_version >= distutils.version.LooseVersion('3.6'):
if llvm_version >= distutils.version.LooseVersion('3.7'):
env.Prepend(LIBS = [
'LLVMBitWriter', 'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
'LLVMCodeGen', 'LLVMScalarOpts', 'LLVMProfileData',
'LLVMInstCombine', 'LLVMInstrumentation', 'LLVMTransformUtils', 'LLVMipa',
'LLVMAnalysis', 'LLVMX86Desc', 'LLVMMCDisassembler',
'LLVMX86Info', 'LLVMX86AsmPrinter', 'LLVMX86Utils',
'LLVMMCJIT', 'LLVMTarget', 'LLVMExecutionEngine',
'LLVMRuntimeDyld', 'LLVMObject', 'LLVMMCParser',
'LLVMBitReader', 'LLVMMC', 'LLVMCore', 'LLVMSupport'
])
elif llvm_version >= distutils.version.LooseVersion('3.6'):
env.Prepend(LIBS = [
'LLVMBitWriter', 'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',

2301
scripts/get_reviewer.pl Executable file

File diff suppressed because it is too large Load Diff

View File

@@ -19,10 +19,28 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
git_sha1.h:
@if test -e $(top_srcdir)/.git; then \
if which git > /dev/null; then \
git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
> git_sha1.h ; \
fi \
fi
BUILT_SOURCES = git_sha1.h
SUBDIRS = . gtest util mapi/glapi/gen mapi
# include only conditionally ?
SUBDIRS += compiler
if HAVE_INTEL_DRIVERS
SUBDIRS += intel
endif
if NEED_OPENGL_COMMON
SUBDIRS += glsl mesa
SUBDIRS += mesa
endif
SUBDIRS += loader
@@ -31,24 +49,36 @@ if HAVE_DRI_GLX
SUBDIRS += glx
endif
if HAVE_EGL_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-egl egl/wayland/wayland-drm
## Optionally required by GBM and EGL
if HAVE_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-drm
endif
## Optionally required by EGL (aka PLATFORM_GBM)
if HAVE_GBM
SUBDIRS += gbm
endif
## Optionally required by EGL
if HAVE_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-egl
endif
if HAVE_EGL
SUBDIRS += egl
endif
## Requires the i965 compiler (part of mesa) and wayland-drm
if HAVE_INTEL_VULKAN
SUBDIRS += intel/vulkan
endif
if HAVE_GALLIUM
SUBDIRS += gallium
endif
EXTRA_DIST = \
getopt hgl SConscript
getopt hgl SConscript git_sha1.h
AM_CFLAGS = $(VISIBILITY_CFLAGS)
AM_CXXFLAGS = $(VISIBILITY_CXXFLAGS)

View File

@@ -5,7 +5,7 @@ if env['platform'] == 'windows':
SConscript('getopt/SConscript')
SConscript('util/SConscript')
SConscript('glsl/SConscript')
SConscript('compiler/SConscript')
if env['hostonly']:
# We are just compiling the things necessary on the host for cross

5
src/compiler/.gitignore vendored Normal file
View File

@@ -0,0 +1,5 @@
glsl_compiler
subtest-cr
subtest-cr-lf
subtest-lf
subtest-lf-cr

View File

@@ -0,0 +1,80 @@
# Mesa 3-D graphics library
#
# Copyright (C) 2010-2011 Chia-I Wu <olvaffe@gmail.com>
# Copyright (C) 2010-2011 LunarG Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# included by glsl Android.mk for source generation
ifeq ($(LOCAL_MODULE_CLASS),)
LOCAL_MODULE_CLASS := STATIC_LIBRARIES
endif
intermediates := $(call local-generated-sources-dir)
LOCAL_SRC_FILES := $(LOCAL_SRC_FILES)
LOCAL_C_INCLUDES += \
$(intermediates)/glsl \
$(intermediates)/glsl/glcpp \
$(LOCAL_PATH)/glsl \
$(LOCAL_PATH)/glsl/glcpp \
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \
$(LIBGLCPP_GENERATED_FILES) \
$(LIBGLSL_GENERATED_CXX_FILES))
define local-l-or-ll-to-c-or-cpp
@mkdir -p $(dir $@)
@echo "Mesa Lex: $(PRIVATE_MODULE) <= $<"
$(hide) $(LEX) --nounistd -o$@ $<
endef
define glsl_local-y-to-c-and-h
@mkdir -p $(dir $@)
@echo "Mesa Yacc: $(PRIVATE_MODULE) <= $<"
$(hide) $(YACC) -o $@ -p "glcpp_parser_" $<
endef
YACC_HEADER_SUFFIX := .hpp
define local-yy-to-cpp-and-h
@mkdir -p $(dir $@)
@echo "Mesa Yacc: $(PRIVATE_MODULE) <= $<"
$(hide) $(YACC) -p "_mesa_glsl_" -o $@ $<
touch $(@:$1=$(YACC_HEADER_SUFFIX))
echo '#ifndef '$(@F:$1=_h) > $(@:$1=.h)
echo '#define '$(@F:$1=_h) >> $(@:$1=.h)
cat $(@:$1=$(YACC_HEADER_SUFFIX)) >> $(@:$1=.h)
echo '#endif' >> $(@:$1=.h)
rm -f $(@:$1=$(YACC_HEADER_SUFFIX))
endef
$(intermediates)/glsl/glsl_lexer.cpp: $(LOCAL_PATH)/glsl/glsl_lexer.ll
$(call local-l-or-ll-to-c-or-cpp)
$(intermediates)/glsl/glsl_parser.cpp: $(LOCAL_PATH)/glsl/glsl_parser.yy
$(call local-yy-to-cpp-and-h,.cpp)
$(intermediates)/glsl/glcpp/glcpp-lex.c: $(LOCAL_PATH)/glsl/glcpp/glcpp-lex.l
$(call local-l-or-ll-to-c-or-cpp)
$(intermediates)/glsl/glcpp/glcpp-parse.c: $(LOCAL_PATH)/glsl/glcpp/glcpp-parse.y
$(call glsl_local-y-to-c-and-h)

View File

@@ -36,39 +36,19 @@ include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
$(LIBGLCPP_FILES) \
$(LIBGLSL_FILES) \
$(NIR_FILES)
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/compiler/nir \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/mesa \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary
LOCAL_STATIC_LIBRARIES := libmesa_compiler
LOCAL_MODULE := libmesa_glsl
include $(LOCAL_PATH)/Android.gen.mk
include $(LOCAL_PATH)/Android.glsl.gen.mk
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)
# ---------------------------------------
# Build glsl_compiler
# ---------------------------------------
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
$(GLSL_COMPILER_CXX_FILES)
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/mesa \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary
LOCAL_STATIC_LIBRARIES := libmesa_glsl libmesa_glsl_utils libmesa_util
LOCAL_MODULE_TAGS := eng
LOCAL_MODULE := glsl_compiler
include $(MESA_COMMON_MK)
include $(BUILD_EXECUTABLE)

48
src/compiler/Android.mk Normal file
View File

@@ -0,0 +1,48 @@
# Mesa 3-D graphics library
#
# Copyright (C) 2015 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
LOCAL_PATH := $(call my-dir)
include $(LOCAL_PATH)/Makefile.sources
# ---------------------------------------
# Build libmesa_compiler
# ---------------------------------------
include $(CLEAR_VARS)
LOCAL_SRC_FILES := $(LIBCOMPILER_FILES)
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/mesa \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary
LOCAL_MODULE := libmesa_compiler
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)
include $(LOCAL_PATH)/Android.glsl.mk
include $(LOCAL_PATH)/Android.nir.mk

View File

@@ -32,55 +32,20 @@ intermediates := $(call local-generated-sources-dir)
LOCAL_SRC_FILES := $(LOCAL_SRC_FILES)
LOCAL_C_INCLUDES += \
$(intermediates)/glcpp \
$(intermediates)/nir \
$(MESA_TOP)/src/glsl/glcpp \
$(MESA_TOP)/src/glsl/nir
$(MESA_TOP)/src/compiler/nir
LOCAL_EXPORT_C_INCLUDE_DIRS += \
$(intermediates)/nir \
$(MESA_TOP)/src/glsl/nir
$(MESA_TOP)/src/compiler/nir
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \
$(LIBGLCPP_GENERATED_FILES) \
$(NIR_GENERATED_FILES) \
$(LIBGLSL_GENERATED_CXX_FILES))
$(NIR_GENERATED_FILES))
define local-l-or-ll-to-c-or-cpp
@mkdir -p $(dir $@)
@echo "Mesa Lex: $(PRIVATE_MODULE) <= $<"
$(hide) $(LEX) --nounistd -o$@ $<
endef
define glsl_local-y-to-c-and-h
@mkdir -p $(dir $@)
@echo "Mesa Yacc: $(PRIVATE_MODULE) <= $<"
$(hide) $(YACC) -o $@ -p "glcpp_parser_" $<
endef
define local-yy-to-cpp-and-h
@mkdir -p $(dir $@)
@echo "Mesa Yacc: $(PRIVATE_MODULE) <= $<"
$(hide) $(YACC) -p "_mesa_glsl_" -o $@ $<
touch $(@:$1=$(YACC_HEADER_SUFFIX))
echo '#ifndef '$(@F:$1=_h) > $(@:$1=.h)
echo '#define '$(@F:$1=_h) >> $(@:$1=.h)
cat $(@:$1=$(YACC_HEADER_SUFFIX)) >> $(@:$1=.h)
echo '#endif' >> $(@:$1=.h)
rm -f $(@:$1=$(YACC_HEADER_SUFFIX))
endef
$(intermediates)/glsl_lexer.cpp: $(LOCAL_PATH)/glsl_lexer.ll
$(call local-l-or-ll-to-c-or-cpp)
$(intermediates)/glsl_parser.cpp: $(LOCAL_PATH)/glsl_parser.yy
$(call local-yy-to-cpp-and-h,.cpp)
$(intermediates)/glcpp/glcpp-lex.c: $(LOCAL_PATH)/glcpp/glcpp-lex.l
$(call local-l-or-ll-to-c-or-cpp)
$(intermediates)/glcpp/glcpp-parse.c: $(LOCAL_PATH)/glcpp/glcpp-parse.y
$(call glsl_local-y-to-c-and-h)
# Modules using libmesa_nir must set LOCAL_GENERATED_SOURCES to this
MESA_GEN_NIR_H := $(addprefix $(call local-generated-sources-dir)/, \
nir/nir_opcodes.h \
nir/nir_builder_opcodes.h)
nir_builder_opcodes_gen := $(LOCAL_PATH)/nir/nir_builder_opcodes_h.py
nir_builder_opcodes_deps := \

View File

@@ -0,0 +1,49 @@
# Mesa 3-D graphics library
#
# Copyright (C) 2015 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
LOCAL_PATH := $(call my-dir)
include $(LOCAL_PATH)/Makefile.sources
# ---------------------------------------
# Build libmesa_nir
# ---------------------------------------
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
$(NIR_FILES)
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/mesa \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary
LOCAL_STATIC_LIBRARIES := libmesa_compiler
LOCAL_MODULE := libmesa_nir
include $(LOCAL_PATH)/Android.nir.gen.mk
include $(MESA_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

63
src/compiler/Makefile.am Normal file
View File

@@ -0,0 +1,63 @@
#
# Copyright © 2012 Jon TURNEY
# Copyright (C) 2015 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
include Makefile.sources
AM_CPPFLAGS = \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src \
-I$(top_srcdir)/src/mapi \
-I$(top_srcdir)/src/mesa/ \
-I$(top_builddir)/src/compiler/glsl\
-I$(top_srcdir)/src/compiler/glsl\
-I$(top_srcdir)/src/compiler/glsl/glcpp\
-I$(top_builddir)/src/compiler/nir \
-I$(top_srcdir)/src/compiler/nir \
-I$(top_srcdir)/src/gallium/include \
-I$(top_srcdir)/src/gallium/auxiliary \
-I$(top_srcdir)/src/gtest/include \
$(DEFINES)
AM_CFLAGS = \
$(VISIBILITY_CFLAGS) \
$(MSVC2013_COMPAT_CFLAGS)
AM_CXXFLAGS = \
$(VISIBILITY_CXXFLAGS) \
$(MSVC2013_COMPAT_CXXFLAGS)
noinst_LTLIBRARIES = libcompiler.la
libcompiler_la_SOURCES = $(LIBCOMPILER_FILES)
check_PROGRAMS =
TESTS =
BUILT_SOURCES =
CLEANFILES =
EXTRA_DIST = SConscript
MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
include Makefile.glsl.am
include Makefile.nir.am

View File

@@ -0,0 +1,224 @@
#
# Copyright © 2012 Jon TURNEY
# Copyright (C) 2015 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
EXTRA_DIST += glsl/tests glsl/glcpp/tests glsl/README \
glsl/TODO glsl/glcpp/README \
glsl/glsl_lexer.ll \
glsl/glsl_parser.yy \
glsl/glcpp/glcpp-lex.l \
glsl/glcpp/glcpp-parse.y \
SConscript.glsl
TESTS += glsl/glcpp/tests/glcpp-test \
glsl/glcpp/tests/glcpp-test-cr-lf \
glsl/tests/blob-test \
glsl/tests/general-ir-test \
glsl/tests/optimization-test \
glsl/tests/sampler-types-test \
glsl/tests/uniform-initializer-test \
glsl/tests/warnings-test
TESTS_ENVIRONMENT= \
export PYTHON2=$(PYTHON2); \
export PYTHON_FLAGS=$(PYTHON_FLAGS);
check_PROGRAMS += \
glsl/glcpp/glcpp \
glsl/glsl_test \
glsl/tests/blob-test \
glsl/tests/general-ir-test \
glsl/tests/sampler-types-test \
glsl/tests/uniform-initializer-test
noinst_PROGRAMS = glsl_compiler
glsl_tests_blob_test_SOURCES = \
glsl/tests/blob_test.c
glsl_tests_blob_test_LDADD = \
glsl/libglsl.la
glsl_tests_general_ir_test_SOURCES = \
glsl/tests/builtin_variable_test.cpp \
glsl/tests/invalidate_locations_test.cpp \
glsl/tests/general_ir_test.cpp \
glsl/tests/varyings_test.cpp
glsl_tests_general_ir_test_CFLAGS = \
$(PTHREAD_CFLAGS)
glsl_tests_general_ir_test_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
glsl/libglsl.la \
glsl/libstandalone.la \
$(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
glsl_tests_uniform_initializer_test_SOURCES = \
glsl/tests/copy_constant_to_storage_tests.cpp \
glsl/tests/set_uniform_initializer_tests.cpp \
glsl/tests/uniform_initializer_utils.cpp \
glsl/tests/uniform_initializer_utils.h
glsl_tests_uniform_initializer_test_CFLAGS = \
$(PTHREAD_CFLAGS)
glsl_tests_uniform_initializer_test_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
glsl/libglsl.la \
$(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
glsl_tests_sampler_types_test_SOURCES = \
glsl/tests/sampler_types_test.cpp
glsl_tests_sampler_types_test_CFLAGS = \
$(PTHREAD_CFLAGS)
glsl_tests_sampler_types_test_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
glsl/libglsl.la \
$(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
noinst_LTLIBRARIES += glsl/libglsl.la glsl/libglcpp.la glsl/libstandalone.la
glsl_libglcpp_la_LIBADD = \
$(top_builddir)/src/util/libmesautil.la
glsl_libglcpp_la_SOURCES = \
glsl/glcpp/glcpp-lex.c \
glsl/glcpp/glcpp-parse.c \
glsl/glcpp/glcpp-parse.h \
$(LIBGLCPP_FILES)
glsl_glcpp_glcpp_SOURCES = \
glsl/glcpp/glcpp.c
glsl_glcpp_glcpp_LDADD = \
glsl/libglcpp.la \
$(top_builddir)/src/libglsl_util.la \
-lm
glsl_libglsl_la_LIBADD = \
nir/libnir.la \
glsl/libglcpp.la
glsl_libglsl_la_SOURCES = \
glsl/glsl_lexer.cpp \
glsl/glsl_parser.cpp \
glsl/glsl_parser.h \
$(LIBGLSL_FILES)
glsl_libstandalone_la_SOURCES = \
$(GLSL_COMPILER_CXX_FILES)
glsl_libstandalone_la_LIBADD = \
glsl/libglsl.la \
$(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/util/libmesautil.la \
$(PTHREAD_LIBS)
glsl_compiler_SOURCES = \
glsl/main.cpp
glsl_compiler_LDADD = \
glsl/libstandalone.la
glsl_glsl_test_SOURCES = \
glsl/test.cpp \
glsl/test_optpass.cpp \
glsl/test_optpass.h
glsl_glsl_test_LDADD = \
glsl/libglsl.la \
glsl/libstandalone.la \
$(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
# We write our own rules for yacc and lex below. We'd rather use automake,
# but automake makes it especially difficult for a number of reasons:
#
# * < automake-1.12 generates .h files from .yy and .ypp files, but
# >=automake-1.12 generates .hh and .hpp files respectively. There's no
# good way of making a project that uses C++ yacc files compatible with
# both versions of automake. Strong work automake developers.
#
# * Since we're generating code from .l/.y files in a subdirectory (glcpp/)
# we'd like the resulting generated code to also go in glcpp/ for purposes
# of distribution. Automake gives no way to do this.
#
# * Since we're building multiple yacc parsers into one library (and via one
# Makefile) we have to use per-target YFLAGS. Using per-target YFLAGS causes
# automake to name the resulting generated code as <library-name>_filename.c.
# Frankly, that's ugly and we don't want a libglcpp_glcpp_parser.h file.
# In order to make build output print "LEX" and "YACC", we reproduce the
# automake variables below.
AM_V_LEX = $(am__v_LEX_$(V))
am__v_LEX_ = $(am__v_LEX_$(AM_DEFAULT_VERBOSITY))
am__v_LEX_0 = @echo " LEX " $@;
am__v_LEX_1 =
AM_V_YACC = $(am__v_YACC_$(V))
am__v_YACC_ = $(am__v_YACC_$(AM_DEFAULT_VERBOSITY))
am__v_YACC_0 = @echo " YACC " $@;
am__v_YACC_1 =
YACC_GEN = $(AM_V_YACC)$(YACC) $(YFLAGS)
LEX_GEN = $(AM_V_LEX)$(LEX) $(LFLAGS)
glsl/glsl_parser.cpp glsl/glsl_parser.h: glsl/glsl_parser.yy
$(MKDIR_GEN)
$(YACC_GEN) -o $@ -p "_mesa_glsl_" --defines=$(builddir)/glsl/glsl_parser.h $(srcdir)/glsl/glsl_parser.yy
glsl/glsl_lexer.cpp: glsl/glsl_lexer.ll
$(MKDIR_GEN)
$(LEX_GEN) -o $@ $(srcdir)/glsl/glsl_lexer.ll
glsl/glcpp/glcpp-parse.c glsl/glcpp/glcpp-parse.h: glsl/glcpp/glcpp-parse.y
$(MKDIR_GEN)
$(YACC_GEN) -o $@ -p "glcpp_parser_" --defines=$(builddir)/glsl/glcpp/glcpp-parse.h $(srcdir)/glsl/glcpp/glcpp-parse.y
glsl/glcpp/glcpp-lex.c: glsl/glcpp/glcpp-lex.l
$(MKDIR_GEN)
$(LEX_GEN) -o $@ $(srcdir)/glsl/glcpp/glcpp-lex.l
# Only the parsers (specifically the header files generated at the same time)
# need to be in BUILT_SOURCES. Though if we list the parser headers YACC is
# called for the .c/.cpp file and the .h files. By listing the .c/.cpp files
# YACC is only executed once for each parser. The rest of the generated code
# will be created at the appropriate times according to standard automake
# dependency rules.
BUILT_SOURCES += \
glsl/glsl_parser.cpp \
glsl/glsl_lexer.cpp \
glsl/glcpp/glcpp-parse.c \
glsl/glcpp/glcpp-lex.c
CLEANFILES += \
glsl/glcpp/glcpp-parse.h \
glsl/glsl_parser.h \
glsl/glsl_parser.cpp \
glsl/glsl_lexer.cpp \
glsl/glcpp/glcpp-parse.c \
glsl/glcpp/glcpp-lex.c
clean-local:
$(RM) -r subtest-cr subtest-cr-lf subtest-lf subtest-lf-cr
dist-hook:
$(RM) glsl/glcpp/tests/*.out
$(RM) glsl/glcpp/tests/subtest*/*.out

View File

@@ -0,0 +1,89 @@
#
# Copyright © 2012 Jon TURNEY
# Copyright (C) 2015 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
noinst_LTLIBRARIES += nir/libnir.la
nir_libnir_la_LIBADD = \
libcompiler.la
nir_libnir_la_SOURCES = \
$(NIR_FILES) \
$(SPIRV_FILES) \
$(NIR_GENERATED_FILES)
PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
nir/nir_builder_opcodes.h: nir/nir_opcodes.py nir/nir_builder_opcodes_h.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_builder_opcodes_h.py > $@ || ($(RM) $@; false)
nir/nir_constant_expressions.c: nir/nir_opcodes.py nir/nir_constant_expressions.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_constant_expressions.py > $@ || ($(RM) $@; false)
nir/nir_opcodes.h: nir/nir_opcodes.py nir/nir_opcodes_h.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_opcodes_h.py > $@ || ($(RM) $@; false)
nir/nir_opcodes.c: nir/nir_opcodes.py nir/nir_opcodes_c.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_opcodes_c.py > $@ || ($(RM) $@; false)
nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@ || ($(RM) $@; false)
check_PROGRAMS += nir/tests/control_flow_tests
nir_tests_control_flow_tests_CPPFLAGS = \
$(AM_CPPFLAGS) \
-I$(top_builddir)/src/compiler/nir \
-I$(top_srcdir)/src/compiler/nir
nir_tests_control_flow_tests_SOURCES = \
nir/tests/control_flow_tests.cpp
nir_tests_control_flow_tests_CFLAGS = \
$(PTHREAD_CFLAGS)
nir_tests_control_flow_tests_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
nir/libnir.la \
$(top_builddir)/src/util/libmesautil.la \
$(PTHREAD_LIBS)
TESTS += nir/tests/control_flow_tests
BUILT_SOURCES += $(NIR_GENERATED_FILES)
CLEANFILES += $(NIR_GENERATED_FILES)
EXTRA_DIST += \
nir/nir_algebraic.py \
nir/nir_builder_opcodes_h.py \
nir/nir_constant_expressions.py \
nir/nir_opcodes.py \
nir/nir_opcodes_c.py \
nir/nir_opcodes_h.py \
nir/nir_opt_algebraic.py \
nir/tests

View File

@@ -0,0 +1,255 @@
LIBCOMPILER_FILES = \
builtin_type_macros.h \
glsl_types.cpp \
glsl_types.h \
nir_types.cpp \
nir_types.h \
shader_enums.c \
shader_enums.h
# libglsl
LIBGLSL_FILES = \
glsl/ast.h \
glsl/ast_array_index.cpp \
glsl/ast_expr.cpp \
glsl/ast_function.cpp \
glsl/ast_to_hir.cpp \
glsl/ast_type.cpp \
glsl/blob.c \
glsl/blob.h \
glsl/builtin_functions.cpp \
glsl/builtin_types.cpp \
glsl/builtin_variables.cpp \
glsl/glsl_parser_extras.cpp \
glsl/glsl_parser_extras.h \
glsl/glsl_symbol_table.cpp \
glsl/glsl_symbol_table.h \
glsl/glsl_to_nir.cpp \
glsl/glsl_to_nir.h \
glsl/hir_field_selection.cpp \
glsl/ir_basic_block.cpp \
glsl/ir_basic_block.h \
glsl/ir_builder.cpp \
glsl/ir_builder.h \
glsl/ir_clone.cpp \
glsl/ir_constant_expression.cpp \
glsl/ir.cpp \
glsl/ir.h \
glsl/ir_equals.cpp \
glsl/ir_expression_flattening.cpp \
glsl/ir_expression_flattening.h \
glsl/ir_function_can_inline.cpp \
glsl/ir_function_detect_recursion.cpp \
glsl/ir_function_inlining.h \
glsl/ir_function.cpp \
glsl/ir_hierarchical_visitor.cpp \
glsl/ir_hierarchical_visitor.h \
glsl/ir_hv_accept.cpp \
glsl/ir_import_prototypes.cpp \
glsl/ir_optimization.h \
glsl/ir_print_visitor.cpp \
glsl/ir_print_visitor.h \
glsl/ir_reader.cpp \
glsl/ir_reader.h \
glsl/ir_rvalue_visitor.cpp \
glsl/ir_rvalue_visitor.h \
glsl/ir_set_program_inouts.cpp \
glsl/ir_uniform.h \
glsl/ir_validate.cpp \
glsl/ir_variable_refcount.cpp \
glsl/ir_variable_refcount.h \
glsl/ir_visitor.h \
glsl/linker.cpp \
glsl/linker.h \
glsl/link_atomics.cpp \
glsl/link_functions.cpp \
glsl/link_interface_blocks.cpp \
glsl/link_uniforms.cpp \
glsl/link_uniform_initializers.cpp \
glsl/link_uniform_block_active_visitor.cpp \
glsl/link_uniform_block_active_visitor.h \
glsl/link_uniform_blocks.cpp \
glsl/link_varyings.cpp \
glsl/link_varyings.h \
glsl/list.h \
glsl/loop_analysis.cpp \
glsl/loop_analysis.h \
glsl/loop_controls.cpp \
glsl/loop_unroll.cpp \
glsl/lower_buffer_access.cpp \
glsl/lower_buffer_access.h \
glsl/lower_const_arrays_to_uniforms.cpp \
glsl/lower_discard.cpp \
glsl/lower_discard_flow.cpp \
glsl/lower_distance.cpp \
glsl/lower_if_to_cond_assign.cpp \
glsl/lower_instructions.cpp \
glsl/lower_jumps.cpp \
glsl/lower_mat_op_to_vec.cpp \
glsl/lower_noise.cpp \
glsl/lower_offset_array.cpp \
glsl/lower_packed_varyings.cpp \
glsl/lower_named_interface_blocks.cpp \
glsl/lower_packing_builtins.cpp \
glsl/lower_subroutine.cpp \
glsl/lower_tess_level.cpp \
glsl/lower_texture_projection.cpp \
glsl/lower_variable_index_to_cond_assign.cpp \
glsl/lower_vec_index_to_cond_assign.cpp \
glsl/lower_vec_index_to_swizzle.cpp \
glsl/lower_vector.cpp \
glsl/lower_vector_derefs.cpp \
glsl/lower_vector_insert.cpp \
glsl/lower_vertex_id.cpp \
glsl/lower_output_reads.cpp \
glsl/lower_shared_reference.cpp \
glsl/lower_ubo_reference.cpp \
glsl/opt_algebraic.cpp \
glsl/opt_array_splitting.cpp \
glsl/opt_conditional_discard.cpp \
glsl/opt_constant_folding.cpp \
glsl/opt_constant_propagation.cpp \
glsl/opt_constant_variable.cpp \
glsl/opt_copy_propagation.cpp \
glsl/opt_copy_propagation_elements.cpp \
glsl/opt_dead_builtin_variables.cpp \
glsl/opt_dead_builtin_varyings.cpp \
glsl/opt_dead_code.cpp \
glsl/opt_dead_code_local.cpp \
glsl/opt_dead_functions.cpp \
glsl/opt_flatten_nested_if_blocks.cpp \
glsl/opt_flip_matrices.cpp \
glsl/opt_function_inlining.cpp \
glsl/opt_if_simplification.cpp \
glsl/opt_minmax.cpp \
glsl/opt_noop_swizzle.cpp \
glsl/opt_rebalance_tree.cpp \
glsl/opt_redundant_jumps.cpp \
glsl/opt_structure_splitting.cpp \
glsl/opt_swizzle_swizzle.cpp \
glsl/opt_tree_grafting.cpp \
glsl/opt_vectorize.cpp \
glsl/program.h \
glsl/propagate_invariance.cpp \
glsl/s_expression.cpp \
glsl/s_expression.h
# glsl_compiler
GLSL_COMPILER_CXX_FILES = \
glsl/standalone_scaffolding.cpp \
glsl/standalone_scaffolding.h \
glsl/standalone.cpp \
glsl/standalone.h
# libglsl generated sources
LIBGLSL_GENERATED_CXX_FILES = \
glsl/glsl_lexer.cpp \
glsl/glsl_parser.cpp
# libglcpp
LIBGLCPP_FILES = \
glsl/glcpp/glcpp.h \
glsl/glcpp/pp.c
LIBGLCPP_GENERATED_FILES = \
glsl/glcpp/glcpp-lex.c \
glsl/glcpp/glcpp-parse.c
NIR_GENERATED_FILES = \
nir/nir_builder_opcodes.h \
nir/nir_constant_expressions.c \
nir/nir_opcodes.c \
nir/nir_opcodes.h \
nir/nir_opt_algebraic.c
NIR_FILES = \
nir/nir.c \
nir/nir.h \
nir/nir_array.h \
nir/nir_builder.h \
nir/nir_clone.c \
nir/nir_constant_expressions.h \
nir/nir_control_flow.c \
nir/nir_control_flow.h \
nir/nir_control_flow_private.h \
nir/nir_dominance.c \
nir/nir_from_ssa.c \
nir/nir_gather_info.c \
nir/nir_gs_count_vertices.c \
nir/nir_inline_functions.c \
nir/nir_instr_set.c \
nir/nir_instr_set.h \
nir/nir_intrinsics.c \
nir/nir_intrinsics.h \
nir/nir_liveness.c \
nir/nir_lower_alu_to_scalar.c \
nir/nir_lower_atomics.c \
nir/nir_lower_bitmap.c \
nir/nir_lower_clamp_color_outputs.c \
nir/nir_lower_clip.c \
nir/nir_lower_double_ops.c \
nir/nir_lower_double_packing.c \
nir/nir_lower_drawpixels.c \
nir/nir_lower_global_vars_to_local.c \
nir/nir_lower_gs_intrinsics.c \
nir/nir_lower_load_const_to_scalar.c \
nir/nir_lower_locals_to_regs.c \
nir/nir_lower_idiv.c \
nir/nir_lower_indirect_derefs.c \
nir/nir_lower_io.c \
nir/nir_lower_io_to_temporaries.c \
nir/nir_lower_io_types.c \
nir/nir_lower_passthrough_edgeflags.c \
nir/nir_lower_phis_to_scalar.c \
nir/nir_lower_returns.c \
nir/nir_lower_samplers.c \
nir/nir_lower_system_values.c \
nir/nir_lower_tex.c \
nir/nir_lower_to_source_mods.c \
nir/nir_lower_two_sided_color.c \
nir/nir_lower_vars_to_ssa.c \
nir/nir_lower_var_copies.c \
nir/nir_lower_vec_to_movs.c \
nir/nir_lower_wpos_center.c \
nir/nir_lower_wpos_ytransform.c \
nir/nir_metadata.c \
nir/nir_move_vec_src_uses_to_dest.c \
nir/nir_normalize_cubemap_coords.c \
nir/nir_opt_constant_folding.c \
nir/nir_opt_copy_propagate.c \
nir/nir_opt_cse.c \
nir/nir_opt_dce.c \
nir/nir_opt_dead_cf.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_peephole_select.c \
nir/nir_opt_remove_phis.c \
nir/nir_opt_undef.c \
nir/nir_phi_builder.c \
nir/nir_phi_builder.h \
nir/nir_print.c \
nir/nir_remove_dead_variables.c \
nir/nir_repair_ssa.c \
nir/nir_search.c \
nir/nir_search.h \
nir/nir_split_var_copies.c \
nir/nir_sweep.c \
nir/nir_to_ssa.c \
nir/nir_validate.c \
nir/nir_vla.h \
nir/nir_worklist.c \
nir/nir_worklist.h
SPIRV_FILES = \
spirv/GLSL.std.450.h \
spirv/nir_spirv.h \
spirv/spirv.h \
spirv/spirv_to_nir.c \
spirv/vtn_alu.c \
spirv/vtn_cfg.c \
spirv/vtn_glsl450.c \
spirv/vtn_private.h \
spirv/vtn_variables.c

Some files were not shown because too many files have changed in this diff Show More