Compare commits

382 Commits

Author SHA1 Message Date
Timothy Arceri
b010fa8567 glsl: make sure UBO arrays are sized in ES
This check was removed in 5b2675093e; add it back in.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
https://bugs.freedesktop.org/show_bug.cgi?id=96349
2016-06-14 11:33:24 +10:00
Vedran Miletić
4825264f75 clover: Update OpenCL version string to match OpenGL
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.

v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-06-13 15:55:59 -07:00
Francisco Jerez
bd9f972651 i965/fs: Fix regs_written for SIMD-lowered instructions some more.
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end.  Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly.  Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):

 spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
 spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-13 15:55:59 -07:00
Francisco Jerez
a84b5d43e2 i965: Fix cross-primitive scratch corruption when changing the per-thread allocation.
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based on a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.

I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original.  Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.

Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9.  The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
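The overlap described in a84b5d43e2 can be illustrated with a small sketch. It assumes, hypothetically, that each thread's scratch offset is its thread index multiplied by the per-thread slot size; the commit message only says the offsets are based on a partitioning of the scratch space. The names `scratch_range`, `thread_scratch` and `ranges_overlap` are illustrative and do not come from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) inside the scratch buffer. */
struct scratch_range {
    uint32_t start;
    uint32_t end;
};

/* Assumed model: thread N owns the N-th slot of the given size. */
struct scratch_range thread_scratch(uint32_t thread_id, uint32_t slot_size)
{
    struct scratch_range r = { thread_id * slot_size,
                               (thread_id + 1) * slot_size };
    return r;
}

/* Two half-open ranges overlap if each starts before the other ends. */
bool ranges_overlap(struct scratch_range a, struct scratch_range b)
{
    return a.start < b.end && b.start < a.end;
}
```

Under this model, thread 1 with 2 kB slots covers bytes [2048, 4096) while thread 2 with 1 kB slots covers [2048, 3072), so two different threads share scratch memory as soon as they disagree on the slot size.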
Francisco Jerez
d960284e44 i965: Keep track of the per-thread scratch allocation in brw_stage_state.
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.

v2: Handle BO allocation manually instead of relying on
    brw_get_scratch_bo (Ken).

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
Francisco Jerez
013ae4a70a i965: Fix scratch overallocation if the original slot size was already a power of two.
The bitwise arithmetic trick used in brw_get_scratch_size() to clamp
the scratch allocation to 1KB has the unintended side effect that it
will cause us to allocate 2x the required amount of scratch space if
the original per-thread scratch size happened to be already a power of
two.  Instead use the obvious MAX2 idiom to clamp the scratch
allocation to the expected range.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 15:55:58 -07:00
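A minimal sketch of the overallocation fixed in 013ae4a70a. The exact expression used in brw_get_scratch_size() is not quoted in the commit message, so the OR-with-1023 form below is an assumption about what the "bitwise arithmetic trick" looked like; `next_power_of_two`, `scratch_size_bitwise` and `scratch_size_max2` are illustrative names.

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Round x up to the next power of two (x itself if it already is one). */
static uint32_t next_power_of_two(uint32_t x)
{
    if (x <= 1)
        return 1;
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Assumed shape of the old trick: OR-ing in 1023 forces a 1 kB minimum,
 * but it also pushes sizes that are already powers of two up to the
 * next power of two. */
uint32_t scratch_size_bitwise(uint32_t per_thread_size)
{
    return next_power_of_two(per_thread_size | 1023);
}

/* Clamp with MAX2 instead, as the commit message describes. */
uint32_t scratch_size_max2(uint32_t per_thread_size)
{
    return MAX2(next_power_of_two(per_thread_size), 1024);
}
```

For a 2048-byte request the bitwise variant returns 4096 while the MAX2 variant returns 2048; both return 1024 for a 100-byte request.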
Kenneth Graunke
2df8f4a253 mesa: Make TexSubImage check negative dimensions sooner.
Two dEQP tests expect INVALID_VALUE errors for negative width/height
parameters, but get INVALID_OPERATION because they haven't actually
created a destination image.  This is arguably not a bug in Mesa, as
there's no specified ordering of error conditions.

However, it's also really easy to make the tests pass, and there's
no real harm in doing these checks earlier.

Fixes:
dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_width_height
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texsubimage3d_neg_width_height

v2: Drop redundant check (caught by Anuj Phogat).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-06-13 15:38:47 -07:00
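The reordering in 2df8f4a253 amounts to checking the dimensions before checking for a destination image. The sketch below is not Mesa's validation code: `validate_subimage` and its `has_dest_image` parameter are hypothetical, and only the numeric GL error codes are taken from the OpenGL headers.

```c
#include <stdbool.h>

/* GL error codes as defined by the OpenGL headers. */
#define GL_NO_ERROR          0
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502

/* Hypothetical validator. has_dest_image stands in for "a destination
 * texture image exists". Checking the dimensions first means a negative
 * size reports INVALID_VALUE even when no image has been created. */
unsigned validate_subimage(bool has_dest_image, int width, int height, int depth)
{
    if (width < 0 || height < 0 || depth < 0)
        return GL_INVALID_VALUE;
    if (!has_dest_image)
        return GL_INVALID_OPERATION;
    return GL_NO_ERROR;
}
```

With the two checks in the opposite order, a call with a negative width and no destination image would report INVALID_OPERATION, which is what the two dEQP tests rejected.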
Brian Paul
cf9bb9acac util: update some assertions in util_resource_copy_region()
To cope with copies of compressed images which are not multiples of
the block size.  Suggested by Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-13 13:30:19 -06:00
Kenneth Graunke
5a0d294d38 i965: Fix encode_slm_size() to take a generation, not a device info.
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct.  I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accommodate this.  But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly.  This led to crashes.

Brown paper bag fix for commit 87d062a940.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 12:23:11 -07:00
Kenneth Graunke
667e5cec76 i965: Don't leak scratch BOs for TCS/TES.
These need to be freed too.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-13 12:22:06 -07:00
Nanley Chery
a4a5917248 anv/pipeline: Don't dereference NULL dynamic state pointers
Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
of pCreateInfo members are moved to the earliest points at which they
should not be NULL.

This fixes a segfault seen in the McNopper demo, VKTS_Example09.

v3 (Jason Ekstrand):
   - Fix disabled rasterization check
   - Revert opaque detection of color attachment usage

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 11:35:45 -07:00
Nanley Chery
a0d84a9ef9 anv: Document and rename anv_pipeline_init_dynamic_state()
To reduce confusion, clarify that the state being copied is not dynamic.

This agrees with the Vulkan spec's usage of the term. Various sections
specify that the various pieces of pipeline state which have VkDynamicState enums
(e.g. viewport, scissor, etc.) may or may not be dynamic.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 11:35:45 -07:00
Samuel Pitoiset
7f257abc1b nvc0/ir: clamp the UBO index for compute on Kepler
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-13 20:12:48 +02:00
Marek Olšák
6e1b12c788 radeonsi: enable scratch coalescing
This makes one particular compute shader 8x faster.

Latest LLVM git is required.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-13 18:13:51 +02:00
Jimmy Berry
0c0f841e5d st/va: hardlink driver instances to gallium_drv_video.so
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.

Note: some versions of libva can detect the gallium name and use the
backend, although that behaviour seems inconsistent since it only works
for some platforms/backends.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Jan Vesely
1fb4179f92 vl: Fix trivial sign compare warnings
v2: add whitespace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
[Emil Velikov: squash a few more whitespace issues]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Rob Herring
112e988329 Android: move libdrm settings to top-level Android.common.mk
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:

external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;

HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Rob Herring
54e550ab8a Android: disable some noisy warnings
Turn off warnings for -Wpointer-arith, -Wno-missing-field-initializers,
-Wno-initializer-overrides, and -Wno-mismatched-tags. These are all either
deemed pointless, triggered on purpose, or not planned to be fixed.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Emil Velikov
db8790c0da st/mesa: inline _mesa_create_context() into its only caller
Inline the function into its only caller. This way it's more obvious
how the classic and gallium drivers (st/mesa) use _mesa_initialize_context.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:29 +01:00
Emil Velikov
a4fa8bf819 st/mesa: remove unneeded break from st_api_create_context()
We have return on the previous line, thus the break will never be
reached.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
6406bc1592 st/mesa: use c99 initializer for st_gl_api
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
15bc7856bf gallium: remove st_api::get_proc_address hook
It has been unused for a long time, plus makes the gallium dri modules
require an extra glapi symbol relative to their classic counterparts.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
23a7fca6aa mesa: remove _mesa_init_get_hash()
The actual code of the function print_table_stats() is guarded
by an #ifdef GET_DEBUG, which has not been defined in years.

The last fix in 2013 (7db6b5aa91) indicates that it's rarely
used/tested, since the issue had gone unnoticed for a whole year
(broken with 2ad4a47547).

Let's remove it for now. We can always revive it at a later stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
b81685eb32 mesa: kill off _mesa_do_init_remap_table()
... and inline its contents in _mesa_init_remap_table().

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
bfbf286f7d mesa: use native types when possible
All of the functions and related data are internal, so there's no point
in using the GL types.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
3f80c95f35 mesa: make _mesa_map_function_spec() static
Used only locally.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
390678f27d mesa: remove unused _mesa_get_function_spec() and gl_function_remap
Final user was killed with last commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
5b700059a8 mesa: remove unused _mesa_map_function_array()
Unused as of commit 5a175127f3 ("dri: Remove all extension enabling
utility functions") and the patch before the previous patch.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
5378ee8187 glapi: remap_helper.py: remove MESA_alt_functions
The final user was nuked with last commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
b5dd8e0cf8 mesa: remove unused function _mesa_map_static_functions()
Unused as of commit 5a175127f3 ("dri: Remove all extension enabling
utility functions")

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-13 15:31:28 +01:00
Emil Velikov
07ae8c7df7 dri/common: remove unused libdri_test_stubs.la
... and associated file(s).

No longer needed since commit 057259655e ("i965: Don't link libmesa or
libdri_test_stubs into tests")

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-13 15:31:27 +01:00
Emil Velikov
fcb5a75a66 swr: automake: add missing -I flag
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.

Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:24 +01:00
Emil Velikov
f4d26856df automake: add SWR to `make distcheck' gallium drivers
This will allow us to catch missing files and build issues before getting
the tarball out for general consumption.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:24:44 +01:00
Emil Velikov
bab5ab6940 configure.ac: strip out the llvm-config -march/mtune flags
Otherwise drivers such as SWR that depend on providing their own values
will fail to build.

v2: Add -mcpu for good measure (Chuck)

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chuck Atkins <chuck.atkins@kitware.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
2016-06-13 15:24:44 +01:00
Chuck Atkins
c86fcaca72 swr: Add missing headers for package inclusion
CC: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:24:44 +01:00
Emil Velikov
8229fe68b5 automake: get in-tree `make distclean' working again.
With an earlier commit we handled `make distclean' for out-of-tree
builds, yet we failed to account for in-tree builds, where the test
condition returns 1. Thus the target is effectively considered
as "failed".

Fixes: b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building
OOT")
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-13 15:24:44 +01:00
Jan Vesely
ace70aedcf gallivm: Fix trivial sign warnings
v2: include whitespace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-13 09:23:09 -04:00
Julien Isorce
a04804746f st/va: use proper temp pipe_video_buffer template
Instead of changing the format on the existing template
which makes error handling not nice and confuses coverity.

CoverityID: 1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-13 09:14:32 +01:00
Julien Isorce
6c43e0016e st/va: it is valid to release the VABuffer of an exported resource
pipe_resource_reference(&res, NULL) will decrement reference counting,
i.e. p_atomic_dec(res->count). But the va surface still has the initial
reference since it has created the resource. So calling vaDestroyImage
on a derived image calls VaDestroyBuffer but the decrementation won't
reach 0. It is just wrong for vlVaDestroyBuffer to rely on the
export_refcount flag. Finally the vaapi intel driver has the same logic.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-13 09:14:32 +01:00
Timothy Arceri
30df78236c glsl: fix component overlap validation for doubles
This change makes sure to remove arrays when checking if type
is a double.

The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were not reserving the last component.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-12 21:56:32 +10:00
Timothy Arceri
ad3def919e glsl: fix max varyings count for ARB_enhanced_layouts
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.

Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations instead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-12 21:56:28 +10:00
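The counting change in ad3def919e can be sketched as follows. This is not the GLSL linker code: `reserve_slots` and `count_reserved_slots` are hypothetical helpers, and a single 64-bit mask is assumed to be wide enough for all locations.

```c
#include <stdint.h>

/* Mark `count` consecutive slots starting at `location` as reserved.
 * Assumes location + count <= 64. */
uint64_t reserve_slots(uint64_t reserved, unsigned location, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        reserved |= UINT64_C(1) << (location + i);
    return reserved;
}

/* Count the distinct reserved slots (population count). */
unsigned count_reserved_slots(uint64_t reserved)
{
    unsigned n = 0;
    while (reserved) {
        reserved &= reserved - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Two varyings packed into location 3 plus one two-slot varying at location 4 set only three bits, so the count is 3, whereas summing per-varying slot counts would give 4.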
Kenneth Graunke
0fb85ac08d i965: Use the correct number of threads for compute shaders.
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.

Thanks to Curro and Jordan for helping track this down!

On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.

On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
  0.255654% (n=25).

On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
  by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
  0.432771% (n=64).

On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
  by roughly 1.03x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:15 -07:00
Kenneth Graunke
1db37ebecf i965: Assert that the scratch spaces are in range.
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.

We should really fix this properly.  However, the pitfall has existed
for ages, so for now we should at least detect it.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:15 -07:00
Kenneth Graunke
a42a93dc12 i965: Fix CS scratch size calculations on Ivybridge and Baytrail.
These are linear, not powers of two, and much more limited.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:14 -07:00
Kenneth Graunke
147a90d82a i965: Fix Haswell CS per-thread scratch space encoding.
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB.  But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.

This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.

Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now.  A subsequent commit will take care of that.

Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:40:14 -07:00
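The two encodings described in 147a90d82a differ only in which size maps to field value 0. A minimal sketch, assuming the field is a plain log2 relative to that base size; `encode_scratch_size` and its `haswell_cs` flag are illustrative names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode a per-thread scratch size (a power of two, in bytes).
 * The size that encodes as 0 is 1 kB for most stages and 2 kB for
 * Haswell compute shaders, per the commit message. */
uint32_t encode_scratch_size(uint32_t bytes, bool haswell_cs)
{
    uint32_t base = (haswell_cs ? 2u : 1u) * 1024u;

    /* The size must be a power of two no smaller than the base. */
    assert(bytes >= base && (bytes & (bytes - 1)) == 0);

    uint32_t field = 0;
    while ((base << field) < bytes)
        field++;
    return field;
}
```

A 2 kB allocation encodes as 1 in the generic scheme but as 0 for Haswell compute. Writing the generic value 1 into the Haswell compute field therefore requests 4 kB per thread, which is the doubling the commit message describes.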
Kenneth Graunke
a7d029d3df i965: Account for poor address calculations in Haswell CS scratch size.
Curro figured this out by investigating the simulator.  Apparently
there's also a workaround in the Windows driver.  I'm not sure it's
actually documented anywhere.

We were underallocating the scratch buffer by a factor of 128/70.

v2: Rename threads_per_subslice to scratch_ids_per_subslice
    (suggested by Jordan Justen).

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:39:45 -07:00
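The sizing arithmetic behind a7d029d3df can be sketched as a single product. Reading the 128/70 factor as 128 scratch IDs versus 70 threads per subslice is an assumption made here for illustration; `scratch_bo_size` is a hypothetical helper, while `scratch_ids_per_subslice` and `subslice_total` follow the names mentioned in the commit messages.

```c
#include <stdint.h>

/* Total scratch buffer size: one slot per scratch ID on every subslice. */
uint64_t scratch_bo_size(uint32_t per_thread_size,
                         uint32_t scratch_ids_per_subslice,
                         uint32_t subslice_total)
{
    return (uint64_t)per_thread_size * scratch_ids_per_subslice * subslice_total;
}
```

With 1 kB slots on one subslice, sizing by 70 gives 71680 bytes and sizing by 128 gives 131072 bytes, a ratio of exactly 128/70.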
Kenneth Graunke
2213ffdb4b i965: Allocate scratch space for the maximum number of compute threads.
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.

Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing.  We need to allocate enough
for every possible thread that could run.

Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:50 -07:00
Kenneth Graunke
9cd8f95809 i965: Set subslice_total on Gen7/7.5 platforms.
We'll use this for compute shader thread counts and scratch space
calculations shortly.

Note that subslices are referred to as "half slices" on Ivybridge.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:47 -07:00
Kenneth Graunke
87d062a940 i965: Fix shared local memory size for Gen9+.
Skylake changes the representation of shared local memory size:

 Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
 -------------------------------------------------------------------
 Gen7-8 |    0 | none | none |    1 |    2 |     4 |     8 |    16 |
 -------------------------------------------------------------------
 Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |

The old formula would substantially underallocate the amount of space.
This fixes GPU hangs on Skylake when running with full thread counts.

v2: Fix the Vulkan driver too, use a helper function, and fix the table
    in the comments and commit message.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-12 00:38:26 -07:00
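The table in 87d062a940 can be turned into a small encoder. This is a sketch derived from the table alone, not the helper added by the commit; `slm_size_encoding` is a hypothetical name, and rounding small requests up to the smallest encodable size (4 kB on Gen7-8, 1 kB on Gen9+) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next power of two (x itself if it already is one). */
static uint32_t next_power_of_two(uint32_t x)
{
    if (x <= 1)
        return 1;
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Encode a shared local memory size in bytes for a hardware generation. */
uint32_t slm_size_encoding(unsigned gen, uint32_t bytes)
{
    assert(bytes <= 64 * 1024);
    if (bytes == 0)
        return 0;

    uint32_t min_size = gen >= 9 ? 1024 : 4096;
    uint32_t size = next_power_of_two(bytes);
    if (size < min_size)
        size = min_size;

    if (gen >= 9) {
        /* Gen9+: 1 kB -> 1, 2 kB -> 2, 4 kB -> 3, ..., 64 kB -> 7. */
        uint32_t kb = size / 1024;
        uint32_t field = 1;
        while (kb > 1) {
            kb >>= 1;
            field++;
        }
        return field;
    }

    /* Gen7-8: the field counts 4 kB units (4 kB -> 1, ..., 64 kB -> 16). */
    return size / 4096;
}
```

For 64 kB this yields 16 on Gen8 and 7 on Gen9. Writing the Gen7-8 value for a smaller size into the Gen9+ field selects a much smaller allocation, for example 4 kB encodes as 1, which Gen9+ reads as 1 kB.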
Ilia Mirkin
3f48548a6f nv50: reinstate dedicated constbuf push path
This was disabled due to occasionally incorrect behavior when trying to
upload data. It later became apparent that nvc0 also had a similar but
slightly different issue, which was resolved in commit e50c01d5. This
takes the same logic as nvc0 and applies it to nv50 (which has somewhat
different interfaces).

Unfortunately I did not note down precisely what was broken with UBOs
when removing the support from nv50, but I've tested a bunch of local
traces, and none of them appear to regress. This should hopefully
improve performance when UBOs are used, but this was not directly
verified.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-11 12:18:43 -04:00
Ilia Mirkin
f47845596b nv50: enable indirect addressing of fragment shader inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-11 11:50:42 -04:00
Ilia Mirkin
7d7e015381 mesa: add drawbuffer argument to ClearNamedFramebufferfi
This was fixed in revision 47 of the ARB_dsa spec on Oct 22, 2015. Since
it's horrible to have differing APIs across library versions, we should
attempt to minimize the impact by backporting it as far as possible and
hope no one notices.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:32:03 -04:00
Ilia Mirkin
92351a71a8 GL: update glcorearb.h to svn 32433
This brings in the fixed glClearNamedFramebufferfi definition, as well
as a lot of GLsizei -> GLsizeiptr changes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:31:53 -04:00
Ilia Mirkin
f81374fd3e GL: update glext to svn 32957
This brings in defines from GL_EXT_window_rectangles and fixes the
glClearNamedFramebufferfi definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 20:24:53 -04:00
Brian Paul
5cfc91624c docs: GL_ARB_copy_image done for softpipe, llvmpipe
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-06-10 15:50:55 -06:00
Brian Paul
e9b86bb92c llvmpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
2db747cf26 llvmpipe: don't use 3-component formats, except 32-bit x 3 formats
This basically disallows all 8-bit x 3 and 16-bit x 3 formats for
textures and render targets.  Some 3-component formats were already
disallowed before.  This avoids problems with GL_ARB_copy_image.

v2: the previous version of this patch disallowed all 3-component formats

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
672e92a146 softpipe: turn on pipe cap for GL_ARB_copy_image support
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
d8fe6332d8 softpipe: don't use 3-component formats
Mesa and gallium don't have a complete set of matching 3-component
texture formats.  For example, 8-bit sRGB unorm.  To fully support
the GL_ARB_copy_image extension we need to have support for all of
these formats: RGB8_UNORM, RGB8_SNORM, RGB8_SRGB, RGB8_UINT, and
RGB8_SINT using the same component order.  Since we don't have that,
disable the 3-component formats for now.

v2: Simplify 3-component format check, per Marek.
Also check that target != PIPE_BUFFER.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
e295b4e800 st/mesa: tweak surface format mapping table
1. Try to choose R8G8B8A8 unorm/srgb formats before others in an
effort to try to match component ordering for UINT/SINT/etc.

2. If we can't get a format such as PIPE_FORMAT_A16_UNORM, try
PIPE_FORMAT_R16G16B16A16_UNORM before shallower formats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Brian Paul
dd4be2e19a util: update util_resource_copy_region() for GL_ARB_copy_image
This primarily means added support for copying between compressed
and uncompressed formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Anuj Phogat
466b320163 gallium: Fix region overlap conditions for rectangles with a shared edge
From the OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY0) and (srcX1, srcY1)
to the destination rectangle bounded by the locations (dstX0, dstY0)
and (dstX1, dstY1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."

So, the rectangles sharing just an edge shouldn't overlap.
 -----------
|           |
 ------- ---
|       |   |
|       |   |
 ------- ---

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 14:35:21 -07:00
Anuj Phogat
f8679badd4 mesa: Fix region overlap conditions for rectangles with a shared edge
From the OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
 rectangle bounded by the locations (srcX0, srcY0) and (srcX1, srcY1)
 to the destination rectangle bounded by the locations (dstX0, dstY0)
 and (dstX1, dstY1). The lower bounds of the rectangle are inclusive,
 while the upper bounds are exclusive."

So, the rectangles sharing just an edge shouldn't overlap.
     -----------
    |           |
     ------- ---
    |       |   |
    |       |   |
     ------- ---

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 14:35:21 -07:00
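The shared-edge rule quoted in 466b320163 and f8679badd4 follows from treating each rectangle as half-open. A minimal sketch, not the Mesa or gallium implementation; `regions_overlap_sketch` is a hypothetical name, and accepting flipped coordinates (X0 > X1) is an assumption.

```c
#include <stdbool.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Lower bounds are inclusive and upper bounds exclusive, so a rectangle
 * that ends exactly where the other begins does not overlap it. */
bool regions_overlap_sketch(int srcX0, int srcY0, int srcX1, int srcY1,
                            int dstX0, int dstY0, int dstX1, int dstY1)
{
    /* Source entirely left of, or just touching, the destination. */
    if (MAX2(srcX0, srcX1) <= MIN2(dstX0, dstX1))
        return false;
    /* Destination entirely left of, or just touching, the source. */
    if (MAX2(dstX0, dstX1) <= MIN2(srcX0, srcX1))
        return false;
    /* Same two tests along the Y axis. */
    if (MAX2(srcY0, srcY1) <= MIN2(dstY0, dstY1))
        return false;
    if (MAX2(dstY0, dstY1) <= MIN2(srcY0, srcY1))
        return false;
    return true;
}
```

The comparisons use `<=` rather than `<`, which is what makes rectangles that share only an edge count as non-overlapping.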
Dave Airlie
1584918996 gallivm: more 64-bit integer prep work.
This converts one other place to using the new helper.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:30 +10:00
Dave Airlie
f550b6d296 radeonsi: convert to 64-bitness checks instead of doubles.
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:21 +10:00
Dave Airlie
e5c57824ec gallivm: make non-float return code bitcast consistent.
This just uses the same form across the fetches.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:17 +10:00
Dave Airlie
3b97e50b9a gallium/gallivm: use 64-bit test instead of doubles.
This just makes some generic code that currently emits double
suitable for emitting 64-bit values.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:13 +10:00
Dave Airlie
213ab8db87 gallium/tgsi: add 64-bitness type check function.
Currently this just checks for doubles, but we'll convert users to it,
which makes adding 64-bit integers easier.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:43:45 +10:00
Jason Ekstrand
8d37556ec9 anv/entrypoints: Rework #if guards
This reworks the #if guards a bit.  When Emil originally wrote them, he
just guarded everything.  However, part of what anv_entrypoints_gen.py
generates is a hash table for looking up entrypoints based on their name.
This table *cannot* get out of sync between C and python regardless of
preprocessor flags.  In order to prevent this, this commit makes us use
void pointers in the dispatch table for those entrypoints which aren't
available.  This means that the dispatch table size and entry order is
constant and it should never get out-of-sync with the python.
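
A minimal sketch of the idea, under stated assumptions: the type names,
field names and the HYPOTHETICAL_PLATFORM_ENABLED macro are illustrative
only and do not come from the generated anv code. An unavailable entrypoint
keeps its slot as a void pointer, so the layout is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function pointer types, for illustration only. */
typedef void (*pfn_create_instance)(void);
typedef void (*pfn_create_platform_surface)(void);
typedef void (*pfn_destroy_instance)(void);

/* The placeholder trick relies on both pointer kinds having the same size. */
static_assert(sizeof(void *) == sizeof(pfn_create_instance),
              "data and function pointers must have the same size");

struct dispatch_table {
   pfn_create_instance create_instance;
#ifdef HYPOTHETICAL_PLATFORM_ENABLED
   pfn_create_platform_surface create_platform_surface;
#else
   /* Placeholder keeps the slot, so later entries do not shift. */
   void *create_platform_surface;
#endif
   pfn_destroy_instance destroy_instance;
};
```

Because the slot always exists, an index computed by a generator script
stays valid regardless of which preprocessor flags are set.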

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 13:21:07 -07:00
Jason Ekstrand
9ed0d9dd06 anv/entrypoints: Use the function pointer types provided by vulkan.h
This is a bit cleaner than generating the types ourselves when making the
table.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 13:21:07 -07:00
Nicolai Hähnle
42624ea837 st/mesa: use base level size as "guess" when available
When an application specifies mip levels _before_ setting a mipmap texture
filter, we will initially guess a single texture level. When the second level
image is created, we try to allocate the full texture -- however, we get the
base level size guess wrong if that size is odd. This leads to yet another
re-allocation of the texture later during st_finalize_texture.

Even worse, this re-allocation breaks a (reasonable) assumption made by
st_generate_mipmaps, because the re-allocation in the finalization call will
again allocate a single-level pipe texture (based on the non-mipmap texture
filter!). As a result, mipmap generation fails in interesting ways.

All of this can be avoided by just using the fact that we already know the
size of the base level.
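
A minimal sketch of why guessing from a smaller level goes wrong for odd
sizes. The helper names are hypothetical; the halving rule (shift right,
clamp to 1) is the usual mipmap size rule, not code taken from st/mesa.

```c
#include <assert.h>

/* Size of the mip level that is `levels` steps smaller than `size`. */
static unsigned
minify(unsigned size, unsigned levels)
{
   assert(levels < 32);
   unsigned v = size >> levels;
   return v > 0 ? v : 1;
}

/* Naive guess of the level-0 size from a smaller level: the bits lost
 * by the shift in minify() cannot be recovered. */
static unsigned
guess_base_size(unsigned level_size, unsigned level)
{
   assert(level < 32);
   return level_size << level;
}
```

For a base level of width 5, level 1 has width 2, and guessing back from
level 1 gives 4 instead of 5. Using the known base level size avoids the
guess entirely.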

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95529
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-10 20:20:39 +02:00
Jason Ekstrand
a1e69930e4 anv: Remove the PhysicalDeviceLimits FINISHME
At this point, the limits are probably more-or-less correct.  If there is
an invalid limit, that's a bug, not a FINISHME.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:45 -07:00
Jason Ekstrand
4f5bbf804b anv/pipeline_cache: Allow for a zero-sized cache
This gets ANV_ENABLE_PIPELINE_CACHE=false working again.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:10 -07:00
Jason Ekstrand
a1a25db699 anv/pipeline: Store the (set, binding, index) triple in the bind map
This way the bind map (which we're caching) is mostly independent of
the pipeline layout.  The only coupling remaining is that we pull the array
size of a binding out of the layout.  However, that size is also specified
in the shader and should always match so it's not really coupled.  This
fixes rendering issues in Dota 2.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:07 -07:00
Jason Ekstrand
c13c5ac561 anv/descriptor_set: Ensure that bindings are always in increasing order
Since applications are allowed to specify some set of bindings which need
not be dense, they also need not be in order.  For most things, this doesn't
matter, but it could result in getting the wrong dynamic offsets.  This adds
a quick-and-dirty sort to ensure that everything is always in increasing
order of binding index.
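
A minimal sketch of such a sort, assuming a simplified, hypothetical
binding_info struct. It uses qsort for brevity and is not the sort used
in the anv code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a descriptor set layout binding. */
struct binding_info {
   uint32_t binding;
   uint32_t dynamic_offset_count;
};

/* Three-way comparison on the binding index without integer overflow. */
static int
compare_binding(const void *a, const void *b)
{
   const struct binding_info *ba = a;
   const struct binding_info *bb = b;
   return (ba->binding > bb->binding) - (ba->binding < bb->binding);
}

/* Sort bindings in place by increasing binding index. */
static void
sort_bindings(struct binding_info *bindings, size_t count)
{
   assert(bindings != NULL || count == 0);
   if (count > 1)
      qsort(bindings, count, sizeof(*bindings), compare_binding);
}
```

Once the array is ordered, walking it front to back visits dynamic
offsets in binding order, whatever order the application supplied.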

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:43:03 -07:00
Jason Ekstrand
e2265926f2 anv/descriptor_set: Add a type field in debug builds
This allows for some extra validation and makes it easier to see what's
going on when poking around in gdb.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:42:59 -07:00
Jason Ekstrand
cd21015abd anv/descriptor_set: Set array_size to zero for non-existent descriptors
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 09:42:45 -07:00
Leo Liu
2ad443e4cc vl/dri3: support receiving new pixmap for front buffer
With the glx backend of gstreamer-vaapi, the temporary pixmap for the front
buffer gets renewed in each frame, so when we receive a new pixmap, we should
get a new front buffer for it.

This also fixes Totem player playback corruption.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:24 -04:00
Leo Liu
0ef8500aab vl/dri3: get Makefile properly
In the original commit, the macro "if HAVE_DRI3" was in Makefile.sources,
which is shared with SCons. SCons is not able to parse this macro, so
the SCons build failed. Jose quickly proposed two approaches and pushed a
quick fix using his second approach; thanks, Jose, for the solutions and fixes.

This patch is Jose's first approach, and it's more proper, because the
dri3 C file should not be included in the build when DRI3 is not enabled.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:19 -04:00
Jose Fonseca
2b4cee0571 gallivm: Never emit llvm.fmuladd on LLVM 3.3.
Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't
handle llvm.fmuladd and instead falls back to a C function, which in
turn causes a segfault on Windows.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 16:17:04 +01:00
Jose Fonseca
320d1191c6 gallivm: Use llvm.fmuladd.*.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Jose Fonseca
9e8edfa190 util,gallivm: Explicitly enable/disable fma attribute.
As suggested by Roland Scheidegger.

Use the same logic as f16c, since fma requires VEX encoding.

But disable FMA on LLVM 3.3 without MCJIT.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Bas Nieuwenhuizen
54f755fa0f radeonsi: Reinitialize all descriptors in CE preamble.
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.

To illustrate, suppose we have two graphics IBs, 1 and 2, which are submitted
in that order. Furthermore, suppose IB 1 does not use CE RAM, but IB 2 does,
and we have a context switch at the start of IB 1, but not between IB 1 and
IB 2.

The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not loaded into CE RAM.

Fix this by always restoring the entire CE RAM.

v2: - Just load all descriptor set buffers instead of loading and storing the
      entire CE RAM.
    - Leave the ce_ram_dirty tracking in place for the non-preamble case.

v3: - Fixed parameter alignment.
    - Rebased to master (Nicolai's descriptor series).

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 12:18:29 +02:00
Jose Fonseca
f93c22109e mesa: Wrap extensions.h declarations with extern "C".
This should fix the MSVC linker failures that arose with commit
5e2d25894b.

Trivial.
2016-06-10 11:00:42 +01:00
Ilia Mirkin
f48f344700 st/mesa: fix type confusion with reladdrs
The reality is that this doesn't matter, because we manually emit the
ARL to the sampler reladdr, and those arguments don't get an extra load
later, so it's effectively just a boolean. However having the types be
wrong is confusing and could trigger very odd bugs should usage change
down the line.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-09 21:01:53 -04:00
Dave Airlie
f140ed6d95 glsl/ir: remove TABs in ir_constant_expression.cpp
Adding 64-bit integer support was going to make this file worse, so
just remove the tabs from it now.

Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-10 10:30:18 +10:00
Anuj Phogat
73a54e4892 i965/gen9: Don't change halign and valign to fit in fast copy blit
An update in graphics specs has deleted the halign and valign fields
from XY_FAST_COPY_BLT command. See mesa commit 97f0f91.

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-06-09 15:50:07 -07:00
Anuj Phogat
46c8967813 mesa: Add a helper function for shared code in get_tex_rgba_{un}compressed
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-09 15:50:07 -07:00
Samuel Pitoiset
5e2d25894b mesa: Let compute shaders work in compatibility profiles
The extension is already advertised in compatibility profile, but
the _mesa_has_compute_shaders only returns true in core profile.
If we advertise it, we should allow it to work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-06-09 21:03:28 +02:00
Tim Rowley
2c85128e01 swr: implement clipPlanes/clipVertex/clipDistance/cullDistance
v2: only load the clip vertex once

v3: fix clip enable logic, add cullDistance

v4: remove duplicate fields in vs jit key, fix test of clip fixup needed

v5: fix clipdistance linkage for slot!=0,4

v6: support clip+cull; passes most piglit clip (failures understood)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-09 13:28:35 -05:00
Daniel Czarnowski
cf804b4455 glx: fix crash with bad fbconfig
GLX documentation states:
	glXCreateNewContext can generate the following errors: (...)
	GLXBadFBConfig if config is not a valid GLXFBConfig

The function now checks whether the given config is a valid config and sets
the proper error code.

Fixes currently crashing glx-fbconfig-bad Piglit test.

v2: coding style cleanups (Emil, Topi)
    use DefaultScreen macro (Emil)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
2016-06-09 17:55:44 +03:00
Nayan Deshmukh
2d140ae70a st/vdpau: implement luma keying
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-09 14:23:24 +02:00
Nayan Deshmukh
f24eb5a178 vl: Apply luma key filter before CSC conversion
Apply the luma key filter to the YCbCr values during the CSC conversion
in the video buffer shader. The initial values of max and min luma are set
to opposite values to disable the filter initially and will be set when
enabling it.

Add extra parameters min and max luma for the luma key filter in
vl_compositor_set_csc_matrix in va, xvmc. Setting them
to opposite values 1.f and 0.f respectively won't affect the CSC
conversion.

v2: - Squash 1, 2 and 3 into one patch to avoid breaking the build of
      other components. (Christian)
    - Use ureg_swizzle. (Christian)
    - Change the names of the variables. (Christian)

v3: - Squash all patches into one to avoid breaking the build. (Emil)
    - Wrap functions properly. (Emil)
    - Use 0.0f and 1.0f instead of 0.f and 1.f respectively. (Emil)

v4: - Divide it into two patches: one which introduces the functionality
      and assigns dummy values to the changed functions, and a second which
      implements the luma key filter. (Christian)
    - Use ureg_scalar instead of ureg_swizzle. (Christian)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-09 14:23:07 +02:00
Jason Ekstrand
037ce5d734 i965: Emit surface states for extra planes prior to gen8
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier.  It all works, we just need to emit the states
for the extra planes.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-08 21:57:57 -07:00
Marc-André Lureau
dc81b3ad43 virgl: fix checking fences
When calling virgl_fence_wait() with timeout=0,
virgl_{drm,vtest}_resource_is_busy() is called. However, it returns TRUE
for a busy resource, whereas virgl_fence_wait() should return TRUE for
a completed (non-busy) resource.

This fixes running supertuxkart in a VM (I could not reproduce locally
with vtest though there is a similar fix)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 14:07:53 +10:00
Dave Airlie
15896a470b glsl/types: rename is_dual_slot_double to is_dual_slot_64bit.
In the future int64 support will have the same requirements.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 09:17:24 +10:00
Dave Airlie
45c901f7a3 st/glsl_to_tgsi: move to checking 64-bitness instead of double
This uses the new types interfaces to check for 64-bit types,
as futureproofing against int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:49 +10:00
Dave Airlie
bbbc45b8e1 st/glsl_to_tgsi: use enum glsl_base_type instead of unsigned
This is just some better type safety that I noticed while working
on 64-bit integer support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:49 +10:00
Dave Airlie
152f5eea62 mesa: use new 64-bit checks instead of explicit double checks.
This just moves to the new interfaces in advance of int64.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:47 +10:00
Dave Airlie
2df46519e4 glsl/link_varyings: switch to 64bit check instead of double.
This is prep work for int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:43 +10:00
Dave Airlie
35616a9e0e glsl: use new interfaces for 64-bit checks.
This is just prep work for int64 support, changing
places where 64-bitness matters, not doubles specifically.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:19 +10:00
Dave Airlie
a82b8e8b36 compiler: use 64bit check for sizing instead of double check.
This just moves code to the new check in advance of int64 support.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:15 +10:00
Dave Airlie
246518154e compiler/types: add 64-bitness queries.
This adds an inline and type query for if a type is 64-bit.

For now this is equivalent to double, but int64 will change
this.
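
A minimal sketch of such a query, assuming a reduced, hypothetical set of
base types. The enum and function names are illustrative and are not the
ones in the compiler's type code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of base types. */
enum base_type {
   TYPE_UINT,
   TYPE_INT,
   TYPE_FLOAT,
   TYPE_DOUBLE,
   TYPE_COUNT
};

/* Callers ask "is this 64-bit?" instead of "is this a double?". */
static bool
base_type_is_64bit(enum base_type type)
{
   assert(type < TYPE_COUNT);
   /* Only doubles today; an int64 type would be added to this test. */
   return type == TYPE_DOUBLE;
}
```

Callers that size or slot values by width then need no changes when a
second 64-bit type is introduced.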

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09 07:37:04 +10:00
Adam Jackson
a1c5cd426c glapi/glx: Add overflow checks to the client-side indirect code
Coverity complains that the computed sizes can lead to negative lengths
passed to memcpy. If that happens we've been handed invalid arguments
anyway, so just bomb out.

The funky "0%s" is because the size string for the variable-length part
of the request is of the form "+ safe_pad() ...", and a unary + would
coerce the result to always be positive, defeating the overflow check.
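
A minimal sketch of the kind of overflow-checked size computation the
message refers to. The helper names are hypothetical and do not reproduce
the generated indirect code; negative results stand for "invalid".

```c
#include <assert.h>
#include <limits.h>

/* Round a byte count up to a multiple of 4.
 * Returns -1 for negative input or if the rounding would overflow. */
static int
padded_size_or_error(int n)
{
   if (n < 0 || n > INT_MAX - 3)
      return -1;

   int padded = (n + 3) & ~3;
   assert(padded >= n);
   return padded;
}

/* Sum of two sizes; -1 if either is invalid or the sum overflows. */
static int
checked_add(int a, int b)
{
   if (a < 0 || b < 0 || a > INT_MAX - b)
      return -1;
   return a + b;
}
```

A caller can then bail out before memcpy whenever the computed length is
negative, instead of passing a bogus size along.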

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-08 14:39:46 -04:00
Marek Olšák
26b69ad250 radeonsi: improve the computation and comment of scratch_waves
2% isn't much. If you think the number should be decreased, please speak up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
1d9c1d9386 radeonsi: print the number of spilled VGPRs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:28:25 +02:00
Marek Olšák
2b18d67a1e gallium/radeon: remove dead code creating LLVMTargetMachine
This was for some old unsupported LLVM version.
Only si_create_context creates the target machine now.
r600g doesn't use this function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:42 +02:00
Marek Olšák
a343ab55f7 radeonsi: don't enable scratch just for SGPR spills
Diff from shader-db:
  Scratch: 3221504 -> 17408 (-99.46 %) bytes per wave

v2: add "break;"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 19:23:41 +02:00
Marek Olšák
55b097d004 st/mesa: try not to compile compute shader on the first use
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-08 19:23:41 +02:00
Marek Olšák
95288277d5 Revert "radeonsi: allow direct hw MSAA resolve for scanout surfaces"
This reverts commit ffd54d1936.

No, it doesn't work. The test case is "glxgears -samples 2".
2016-06-08 19:21:55 +02:00
Nicolai Hähnle
bd5c41fe5f st/mesa: directly compute level=0 texture size in st_finalize_texture
The width0/height0/depth0 on stObj may not have been set at this point.
Observed in a trace that set up levels 2..9 of a 2d texture, and set the base
level to 2, with height 1. This made the guess logic always bail.

Originally investigated by Ilia Mirkin, this patch gets rid of the somewhat
redundant storage of width0/height0/depth0 and makes sure we always compute
pipe texture sizes that are compatible with the base level image of the
GL texture.

Fixes the gl-1.2-texture-base-level piglit test provided by Brian Paul.

v2:
- try to re-use an existing pipe texture when possible
- handle a corner case where the base level is not level 0 and it is of
  size 1x1x1

v3:
- ptHeight = ptWidth in cube map 1x1 case (suggested by Brian)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-08 19:12:07 +02:00
Timothy Arceri
8c3ecde0e1 glsl: stop allocating memory for SSBOs and builtins
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.

These uniform types don't use this storage.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-08 13:19:32 +10:00
Ilia Mirkin
6e6fd911da st/mesa: use buffer usage history to set dirty flags for revalidation
We were previously unconditionally doing this for arrays and ubo's, and
ignoring texture/storage/atomic buffers. Instead use the usage history
to determine which atoms need to be revalidated.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-07 22:27:04 -04:00
Gurchetan Singh
d9546b0c5d i965: Integrate precise trig into configuration infrastructure
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put

<option name="precise_trig" value="true"/>

in the proper drirc file.

V2: Make option name more generic

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
2016-06-07 15:42:21 -07:00
Marek Olšák
f39439d166 radeonsi: re-enable PBO ReadPixels acceleration
disabled by 4f1cccf570

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-08 00:22:45 +02:00
Marek Olšák
7c6e88b643 radeonsi: allow MSAA resolving into a texture that has DCC enabled
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
9a472a3e0b gallium/radeon: move DCC clearing into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
ffd54d1936 radeonsi: allow direct hw MSAA resolve for scanout surfaces
No idea why this was disabled, but it works fine.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
4be46c7d9d radeonsi: don't allocate DCC for the temporary MSAA resolve surface
Allocating it has no effect, but it adds overhead (useless DCC clear).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
c06246501e radeonsi: don't enable DCC in the sampler if first_level doesn't have it
If first_level > 0 and DCC is disabled for that level, let's skip DCC
reads entirely.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
00389100b6 winsys/amdgpu: enable DCC for mipmapped textures
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
c65361763c gallium/radeon: don't disable DCC because of SDMA
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
2fd74a05bb radeonsi: don't flag renderbuffer feedback loop if DCC has just been disabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
aa7fe70443 radeonsi: add per-level dcc_enabled flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
60e93ddd06 radeonsi: compute DCC register parameters in si_emit_framebuffer_state
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
a01536a29f gallium/radeon: add an assertion checking the validity of PIPE_BIND_SCANOUT
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Marek Olšák
d4d733e39d gallium/radeon: don't allocate DCC for non-renderable texture formats
R9G9B9E5 is the only uncompressed one hopefully.

This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-08 00:22:45 +02:00
Nicolai Hähnle
b42bc90b6a radeonsi: enable WQM in PS prolog when needed
WQM is needed when the PS prolog computes a VGPR that is consumed by a shader
with (implicit or explicit) derivatives.

Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be
effective (otherwise it's just a no-op).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Cc: 12.0 <mesa-dev@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 23:46:02 +02:00
Nicolai Hähnle
d3a584defe tgsi/scan: add uses_derivatives (v2)
v2:
- TG4 does not calculate derivatives (Ilia)
- also handle SAMPLE* instructions (Roland)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-07 23:45:17 +02:00
Nanley Chery
b7a0c0ec7f docs/devinfo: Expound on helpful extension tips
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Nanley Chery
9e7de50cab docs/devinfo: Update bullet in stale extension guide
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Nanley Chery
26b0f023d7 docs/devinfo: Add closing paragraph tag
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-07 11:16:23 -07:00
Tim Rowley
87f0a0448f swr: fix provoking vertex
Use rasterizer provoking vertex API.

Fix rasterizer provoking vertex for tristrips and quad list/strips.

v2: make provoking vertex tables static const

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-06-07 11:47:52 -05:00
Ilia Mirkin
c81b090c92 st/mesa: revalidate image atoms when a texture is updated
A texture may be redefined with _NEW_TEXTURE, which might have been
bound to a shader image slot. We have to revalidate the image atoms to
pick up on the new resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-07 10:18:34 -04:00
Ilia Mirkin
71ad8a173f gk104/ir: fix conditions for adding a texbar
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.

Based on an earlier patch by Samuel Pitoiset.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
2016-06-07 10:18:13 -04:00
Nicolai Hähnle
8239da28e8 radeonsi: keep track of dirty descriptor sets
Reduces CPU load for draw calls that change none or few of the descriptors.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:10 +02:00
Nicolai Hähnle
d152c73712 radeonsi: move si_descriptors into a per-context array
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:07 +02:00
Nicolai Hähnle
a29c4f9ebd radeonsi: pass shader stage to si_disable_shader_image
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:05 +02:00
Nicolai Hähnle
4e0fb72786 radeonsi: access descriptor sets via local variables
This will simplify moving them to a per-context array.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:18:02 +02:00
Nicolai Hähnle
ba4a2840c7 radeonsi: add si_set_rw_buffer to be used for internal descriptors
So that callers outside of si_descriptors.c need to worry less about the
details of descriptor handling.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:59 +02:00
Nicolai Hähnle
c615a055f4 radeonsi: pass shader stage to si_set_shader_image
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:57 +02:00
Nicolai Hähnle
e6612a3e68 radeonsi: pass shader stage to si_set_sampler_view
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:55 +02:00
Nicolai Hähnle
c32cd4b78d radeonsi: move descriptor set begin_new_cs handling into a separate function
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:39 +02:00
Nicolai Hähnle
031b57bc2f radeonsi: move enabled_mask out of si_descriptors
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-07 15:17:23 +02:00
Jason Ekstrand
d1e141a661 anv/entrypoints: Stop using the C preprocessor
Now that we emit guards for everything, we can just generate the files and
trust build flags to keep us safe.  This should also fix the tarball
problems.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Jason Ekstrand
d1a53f91ee anv/entrypoints: Emit #if guards for all platforms
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Haixia Shi
1ea233c6f3 platform_android: prevent deadlock in droid_swap_buffers
To avoid blocking other EGL calls, release the display mutex before
we enqueue the buffer to the Android framework and re-acquire the mutex
upon return.

v2: moved lock/unlock inside droid_window_enqueue_buffer().

TEST=verify pinch zoom in Photos app no longer causes hangs

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:25 +01:00
Emil Velikov
b7f7ec7843 mesa: automake: distclean git_sha1.h when building OOT
In the case of out-of-tree (OOT) builds, in particular when building
from tarball, we'll end up with the file in both srcdir and builddir.

We want the former to remain intact (since we need it on rebuild) while
the latter should be removed otherwise `make distclean' gets angry at
us.

Ideally there'll be a solution that feels a bit less of a hack. Until
then this does the job exactly as expected.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:30:23 +01:00
Emil Velikov
2c424e00c3 mesa: automake: ensure that git_sha1.h.tmp has the right attributes
... when copied from git_sha1.h.

As the latter file can be lacking the write attribute, one should set it
explicitly. Otherwise we'll get a warning/failure at cleanup stage.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:21:46 +01:00
Emil Velikov
359d9dfec3 mesa: automake: add directory prefix for git_sha1.h
Otherwise the build will assume that we're talking about builddir, which
is not the case in the else statement.

Here the file is already generated and is part of the tarball.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-07 12:21:45 +01:00
Emil Velikov
1816c837c1 egl: android: don't add the image loader extension for !render_node
With an earlier commit we introduced support for render_node devices, which
was coupled with the use of the image loader extension.

As the work was inspired by egl/wayland we (erroneously) added the
extension for the !render_node path as well.

That works for wayland, as the implementations of the DRI2 and IMAGE
loader extensions converge behind the scenes. As that is not yet
the case for Android we shouldn't expose the extension.

Fixes: 34ddef39ce ("egl: android: add dma-buf fd support")

Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-07 12:21:45 +01:00
Marek Olšák
095803a37a gallium/radeon: add support for sharing textures with DCC between processes
v2: use a function for calculating WORD1 of bo metadata

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-07 11:12:26 +02:00
Marek Olšák
9e5b5fbde0 gallium/radeon: don't discard DCC if an external user can write to it
We don't import textures with DCC now, but soon we will.

v2: if we can't disable DCC for image writes, at least decompress DCC
    at bind time

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-07 11:12:26 +02:00
Dave Airlie
c6b14bafa4 i915: fix typo CAP.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 18:31:14 +10:00
Jakob Sinclair
b450f29073 glsl: initialise pointer to NULL
Could cause issues if you tried to read from an uninitialised pointer.
This just initialises the pointer to null to avoid that being a problem.
Discovered by Coverity.

CID: 1343616

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-07 08:13:25 +02:00
Dave Airlie
c295923d13 i965/gen8: fix cull distance emission for tessellation shaders.
This fixes some cases of:
GL45-CTS.cull_distance.functional
on Skylake.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 11:52:17 +10:00
Ilia Mirkin
704bc0f0e9 nvc0: add support for VOTE tgsi opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
f64c36e2d7 st/mesa: expose GL_ARB_shader_group_vote when supported by backend
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
edfa7a4b25 gallium: add PIPE_CAP_TGSI_VOTE for when the VOTE ops are allowed
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:29 -04:00
Ilia Mirkin
30684b50d7 gallium: add VOTE_* opcodes to implement GL_ARB_shader_group_vote
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:28 -04:00
Ilia Mirkin
5189f0243a mesa: hook up core bits of GL_ARB_shader_group_vote
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:48:46 -04:00
Kenneth Graunke
13b859de04 glsl: Make opt_copy_propagation_elements actually propagate into loops.
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.

The shader-db statistics are basically a wash:

   No change in instruction counts.

   total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
   cycles in affected programs: 2102646 -> 2097396 (-0.25%)
   helped: 48
   HURT: 83

I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-06 14:14:31 -07:00
Kenneth Graunke
0756e3a25c glsl: Make opt_copy_propagation actually propagate into loops.
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.

The shader-db statistics are not terribly impressive:

   total instructions in shared programs: 9008589 -> 9008613 (0.00%)
   instructions in affected programs: 4293 -> 4317 (0.56%)
   helped: 0
   HURT: 10

   total cycles in shared programs: 78550978 -> 78575760 (0.03%)
   cycles in affected programs: 655426 -> 680208 (3.78%)
   helped: 75
   HURT: 88

   GAINED: 2

Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one.  That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.

This patch is also useful for a couple of reasons:

1. It can clean up copies created by varying packing (previously, we
   couldn't if the uses were inside a loop).

   This fixes a bug when interpolateAt*() is used on a packed varying
   inside a loop: glsl_to_nir struggles to see through the extra copy
   and mistakenly believes the variable is not an input.

2. It will help propagate uniform array access created by
   lower_const_array_to_uniforms().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-06 14:14:31 -07:00
Samuel Pitoiset
08ddfe7b2f nv50/ir: use round toward 0 when converting doubles to integers
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.

This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).
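
To illustrate the difference between the two rounding modes, here is a
minimal C sketch. It is not the nv50/ir code: the helper names are
hypothetical, and the "nearest" variant rounds halfway cases away from
zero to stay free of libm calls, which may differ from the hardware's
tie-breaking.

```c
/* Round toward zero: this is what a C cast from double to int does. */
static int convert_toward_zero(double value)
{
   return (int)value;
}

/* Round to nearest. Halfway cases go away from zero, an assumption made
 * here for simplicity; it is not claimed to match the hardware. */
static int convert_to_nearest(double value)
{
   return (int)(value < 0.0 ? value - 0.5 : value + 0.5);
}
```

With these helpers, 2.7 converts to 2 when rounding toward zero and to 3
when rounding to nearest, which is the kind of mismatch the piglit tests
detect.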

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 22:56:04 +02:00
Marek Olšák
00e6899ae5 gallium/radeon: don't re-set BO metadata after CMASK deallocation
CMASK has no effect on metadata, because it's not sharable.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
589d6b58c3 st/mesa: change SQRT lowering to fix the game Risen
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94627
(against nouveau)

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-06 22:50:55 +02:00
Marek Olšák
991cbfcb14 radeonsi: add a performance tweak for 4 SE parts
Ported from Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
2802310c25 radeonsi: simplify PRIMGROUP_SIZE computation for tessellation
Ported from Vulkan.

v2: keep the comment

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
014c8ec770 r600g: use hw MSAA resolve for non-trivial resolves
This improves MSAA resolve performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Marek Olšák
6b449783f6 radeonsi: use hw MSAA resolve for non-trivial resolves
This improves MSAA resolve performance.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-06 22:50:55 +02:00
Dave Airlie
07403014c3 mesa/program_resource: return -1 for index if no location.
The GL4.5 spec quote seems clear on this:
"The value -1 will be returned by either command if an error occurs,
if name does not identify an active variable on programInterface,
or if name identifies an active variable that does not have a valid
location assigned, as described above."

This fixes:
GL45-CTS.program_interface_query.output-built-in

[airlied: use _mesa_program_resource_location_index as
suggested by Eduardo]
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-07 06:10:19 +10:00
Nicolai Hähnle
ec2b52e2d9 radeonsi: set descriptor dirty mask on shader buffer unbind
Found randomly while skimming the code. This might have caused VM faults in
robustness tests.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 21:43:18 +02:00
Nicolai Hähnle
0f916d4ca7 st/mesa: fix resource leak in try_pbo_readpixels
Found by inspection after seeing
https://bugs.freedesktop.org/show_bug.cgi?id=96343

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 21:42:27 +02:00
Charmaine Lee
627e975896 tgsi: fix mixed data type comparison in tgsi_point_sprite.c
Cast the unsigned semantic index to integer datatype before comparing
to max_generic, otherwise, max_generic which is initialized to -1
will be converted to unsigned int before the comparison, causing a wrong
semantic index to be assigned to a shader output.

Fixes the assert running TurboCAD_gl.trace. (VMware bug 1667265)

Also tested with glretrace, mesa demos pointblast, spriteblast and pointcoord.

v2: use the original max_generic variable but add the (int) cast
    to the semantic index, as suggested by Brian.
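
The pitfall described above can be reproduced in a few lines of C. This
is only a sketch with hypothetical helper names and a hypothetical ">"
comparison, not the code from tgsi_point_sprite.c. The first helper
spells out the conversion that the compiler otherwise applies
implicitly.

```c
/* Mixed comparison: the int operand is converted to unsigned, so -1
 * becomes UINT_MAX and no index ever compares greater. The cast below
 * makes the implicit conversion explicit. */
static int exceeds_max_mixed(unsigned semantic_index, int max_generic)
{
   return semantic_index > (unsigned)max_generic;
}

/* Casting the index to int keeps the comparison signed, so index 0 is
 * correctly seen as greater than the initial -1. */
static int exceeds_max_signed(unsigned semantic_index, int max_generic)
{
   return (int)semantic_index > max_generic;
}
```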

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-06 10:20:45 -06:00
Charmaine Lee
304b5a1446 svga: print shader linkage info when tgsi debug bit is on
When TGSI debug flag is enabled, print the shader linkage info as well.

Tested with mesa demos with SVGA_DEBUG=tgsi

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-06 10:20:45 -06:00
Ilia Mirkin
4f1cccf570 st/mesa: check shader image format support before using PBO download
ARB_shader_image_load_store only requires a very fixed list of formats
to be supported, while textures may be in all kinds of formats, like
BGRA which are presently not supported on at least Kepler.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-06 12:05:59 -04:00
Lars Hamre
4163c71010 tgsi: use truncf in micro_trunc
Switches to using truncf in micro_trunc.

Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-trunc-float
fs-trunc-vec2
fs-trunc-vec3
fs-trunc-vec4
vs-trunc-float
vs-trunc-vec2
vs-trunc-vec3
vs-trunc-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-trunc-float
gs-trunc-vec2
gs-trunc-vec3
gs-trunc-vec4

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-06 15:56:28 +02:00
Samuel Iglesias Gonsálvez
2b648ec17c i965/gs/scalar: Fix load input for doubles
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Samuel Iglesias Gonsálvez
2d6f82a294 i965/fs: fix offset when loading double vector input varyings
When we are not packing a double input varying, we might need to
read its data at an offset that is not 64-bit aligned, so we read
the wrong data. This happens when using explicit locations
for varyings because Mesa disables varying packing in that case.

const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, the const_index value might not be aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
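
The unit mismatch and the workaround can be sketched in plain C. This is
an illustration under stated assumptions, not the i965 backend code: the
helper names and the flat array of 32-bit slots are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset when an index in 32-bit units is scaled by the
 * destination type size (8 for a double): index 3 lands at byte 24. */
static unsigned offset_scaled_by_type(unsigned const_index, unsigned type_size)
{
   return const_index * type_size;
}

/* Byte offset when the same index is treated as 32-bit components:
 * index 3 lands at byte 12, where the data actually is. */
static unsigned offset_in_dwords(unsigned const_index)
{
   return const_index * 4;
}

/* Load one double as two consecutive 32-bit components, so the start
 * does not need to be 64-bit aligned. */
static double load_double_as_two_dwords(const uint32_t *slots,
                                        unsigned const_index)
{
   uint32_t halves[2] = { slots[const_index], slots[const_index + 1] };
   double value;
   memcpy(&value, halves, sizeof(value));
   return value;
}
```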

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Samuel Iglesias Gonsálvez
cb30727648 i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
Data starts at suboffset 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is a double input varying, read it as
a vector of floats with twice the number of components.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-06 12:37:16 +02:00
Dave Airlie
4c86399378 glsl: geom shader max_vertices layout must match.
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."

Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 18:02:19 +10:00
Jason Ekstrand
ffcef720b7 anv/pipeline: Add support for caching the push constant map
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2016-06-06 00:44:32 -07:00
Dave Airlie
78659ade40 glsl: use enum glsl_interface_packing in more places. (v2)
Although the glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.

This makes things a bit nicer in gdb, and improves type
safety.

v2: fix a few pieces of interface I missed that caused some
piglit regressions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-06-06 15:58:37 +10:00
Dave Airlie
ff2e569153 i965: don't use NumLayers for 3D textures.
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.

This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 13:07:07 +10:00
Dave Airlie
1f66a4b689 glsl: for anonymous struct matching use without_array() (v3)
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().

Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in

v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:41 +10:00
Dave Airlie
6702c15810 glsl/ast: don't crash when func_name is NULL
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types

If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:30 +10:00
Dave Airlie
4336196b7f glsl: handle ast_aggregate in has_sequence_subexpression. (v2)
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };

The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.

v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-06 12:54:19 +10:00
Kenneth Graunke
f657a59d98 mesa: Try to unbreak the MSVC build.
PATH_MAX is apparently not a thing on Windows.  Borrow the hack from
pipe_loader.c to try and make this work.
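
A fallback of this kind typically looks like the sketch below. It is an
assumption about the general shape of such a hack, not a copy of the
code in pipe_loader.c; the final 4096 fallback in particular is
hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

#ifndef PATH_MAX
#  ifdef _MSC_VER
     /* MSVC has no PATH_MAX; _MAX_PATH from <stdlib.h> plays that role. */
#    define PATH_MAX _MAX_PATH
#  else
     /* Hypothetical last-resort value for other platforms lacking it. */
#    define PATH_MAX 4096
#  endif
#endif
```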

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-05 16:32:08 -07:00
Kenneth Graunke
c417c0c9c3 mesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.
This writes linked shader programs to .shader_test files to
$MESA_SHADER_CAPTURE_PATH in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).

It supports both GLSL shaders and ARB programs.  All stages that
are linked together are written in a single .shader_test file.

This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly.  It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.

We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
  all linked shaders in one file), at draw time (not link time),
  with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
  MESA_SHADER_READ_PATH) also uses separate files per shader stage,
  but allows reading in files to replace an app's shader code.

v2:  Dump ARB programs too, not just GLSL.
v3:  Don't dump bogus 0.shader_test file.
v4:  Add "GL_ARB_separate_shader_objects" to the [require] block.
v5:  Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6:  Don't hardcode /tmp/mesa.
v7:  Fix memoization of getenv().
v8:  Also print "SSO ENABLED" (suggested by Timothy).
v9:  Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
     _mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.
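
The length guard mentioned in v10 can be sketched as follows. This is a
minimal illustration, not the code from this patch: the function name
and parameters are hypothetical, and only the "<number>.shader_test"
naming is taken from the notes above.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "<capture_dir>/<program_name>.shader_test" into out.
 * Returns 0 on success, -1 if the result would not fit. */
static int build_capture_filename(char *out, size_t out_size,
                                  const char *capture_dir,
                                  unsigned program_name)
{
   int written = snprintf(out, out_size, "%s/%u.shader_test",
                          capture_dir, program_name);

   /* snprintf returns the length it wanted to write; a value at or
    * beyond out_size means the path was truncated. */
   if (written < 0 || (size_t)written >= out_size)
      return -1;
   return 0;
}
```

A caller would emit a warning and skip the capture when the function
returns -1, rather than writing to a truncated path.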

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-06-05 13:48:57 -07:00
Ilia Mirkin
092ec3920f nv50,nvc0: fix BGR10_A2UI vertex format
This is mostly academic as this is not reachable from GL, which only has
the packed RGB10_A2UI vertex format.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 15:13:46 -04:00
Samuel Pitoiset
be365f34f0 nvc0: do not clear surfaces bins in the validate function
We should not call nouveau_bufctx_reset() inside a validate function.
This only affects Fermi where images are aliased between 3D and CP.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 19:02:59 +02:00
Samuel Pitoiset
43d3ecfb33 nvc0: re-validate images after launching a grid on Fermi
Image invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a compute shader
to help in fixing 3D<->CP interaction.

However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.

This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-05 18:48:02 +02:00
Marek Olšák
3b44864ab7 radeonsi: fix images with level > 0
This should fix spec@arb_shader_image_load_store@level.

Broken by:
    Commit: 95c5bbae66
    radeonsi: set some image descriptor fields at bind time

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-05 17:00:14 +02:00
Ilia Mirkin
fd6bbc2ee2 nvc0: reduce overhead from always marking images dirty
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
0f673db6f0 nvc0: reduce overhead from always marking buffers dirty
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
e8ee161b16 nvc0: fix memory barrier flag handling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Ilia Mirkin
29abbeecd8 nvc0: mark bound buffer range valid
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-04 23:50:56 -04:00
Dave Airlie
f018456901 anv/entrypoints: don't go using wayland/xcb unless they are configured
The fix in:
anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards

breaks things if wayland headers aren't installed.

Separate things out properly to avoid that problem.

[airlied: fixed up to put in pre-existing sections].
Reported-by: Arjan van de Ven
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-05 07:03:12 +10:00
Marek Olšák
d5491a81ff gallium/radeon: don't use the DMA ring for pipelined buffer uploads
Submitting a DMA IB flushes the GFX IB and all GPU caches.

Vedran Miletić said:
  "On Tonga 380X, this improves The Talos Principle from 8.3 fps to 28.3 fps
   (all graphics settings Ultra, 4xAA, 1080p resolution with downsampling
   from 1200p)."

Some anonymous dude said:
   R9 390 results:
      Tomb Raider (normal settings): 80 -> 88 FPS
      Talos Principle (custom settings): 23 -> 56 FPS
      Metro Last Light Redux (default benchmark settings): 39 -> 40 FPS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Vedran Miletić <vedran@miletic.net>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
9c35ec2042 r600g: don't flush caches when binding shader resources
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
eff94af794 r600g: only do necessary cache flushes in cp_dma_copy_buffer
The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
9e62012c30 r600g: only do necessary cache flushes in cp_dma_clear_buffer
The main impact is that fast color clear doesn't flush TC, CONST, DB.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
c92a3ae7e9 r600g: remove a CP DMA workaround that's not needed anymore
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
5ea5ed6050 r600g: fix CP DMA hazard with index buffer fetches (v3)
v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
    otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ade16e1f5d r600g: properly sync CP with CP DMA on R6xx
This will allow removing useless cache & IB flushes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
7746903d3a r600g: write WAIT_UNTIL in the correct place
This has been wrong all along. Fixing this will allow removing useless
cache flushes.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ee0c96c11e gallium/radeon: rename allocator_so_filled_size -> allocator_zeroed_memory
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Marek Olšák
ada3d8f31e gallium/u_suballoc: allow different alignment for each allocation
Just move the alignment parameter from u_suballocator_create
to u_suballocator_alloc.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Jason Ekstrand
441194edd9 anv/blit: Use CLAMP_TO_EDGE for scaled blits
When upscaling you can end up interpolating between the edge pixel and one
past the edge.  Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case.  This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*
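
For readers unfamiliar with the addressing mode, the effect of
CLAMP_TO_EDGE on an integer texel coordinate can be sketched like this.
The helper is hypothetical and is not part of the anv blit code, where
the mode is simply set on the sampler.

```c
/* Clamp a texel coordinate into [0, size - 1], the integer analogue of
 * CLAMP_TO_EDGE addressing: a fetch one past the edge returns the edge
 * texel instead of border or wrapped data. */
static int clamp_to_edge(int coord, int size)
{
   if (coord < 0)
      return 0;
   if (coord >= size)
      return size - 1;
   return coord;
}
```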

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
9313a56816 anv/copy: Account for the anv_surface.offset when creating a blit2d_surf
This was causing problems if the user tried to copy to/from the stencil
portion of a combined depth/stencil image.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
526a8de22d nir/spirv: Make a decoration switch complete
Getting rid of the default case makes the compiler warn if we are missing
cases.  While we're here, we also add the one missing case.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
62c6e94bd6 nir/spirv: Make unhandled decorations and capabilities non-fatal
glslang frequently throws bogus decorations into shaders.  While we are free
to assert-fail, it's a bit nicer to the application to just warn.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
ed14d21d04 nir/spirv: Add a way to print non-fatal warnings
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
2e46a5d155 nir/spirv: Add string lookup tables for a couple of SPIR-V enums
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
5a1e56f344 nir/spirv: Complete the list of capabilities
Previously we supported a subset of capabilities and just left a default
case for the others.  It's time to stop being lazy and actually audit the
capabilities.  This should bring them up-to-date with reality.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
9fa958e95b anv/pipeline: Add support for early depth stencil
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
66bd2e1133 mesa: Get rid of _mesa_active_fragment_shader_has_side_effects
It is no longer used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
35bf4d9dc2 i965/ps_state: Use wm_prog_data.has_side_effects
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
3fb289f957 i965/fs: Add a wm_prog_data bit for has_side_effects
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
4d3b8318a7 nir/info: Get rid of uses_interp_var_at_offset
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
56a178922f anv/pipeline: Silently pass tests if depth or stencil is missing
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
bc7f7e1953 anv/pipeline: Unify gen7/8 emit_ds_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
fdc3c5dd05 genxml/gen6,7,75: s/BackFace/Backface
This is more consistent with gen8+

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
1f7b54ed29 nir/spirv: Handle the WorkgroupSize builtin decoration
This fixes the 7 dEQP-VK.pipeline.spec_constant.compute.local_size.* tests
in the latest dev version of the Vulkan CTS.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
b26cdd65e8 nir/spirv: Use breaks instead of returns in constant handling
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
a19ae36ce5 anv/pipeline: Refactor specialization constant handling a bit
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
45542f554c nir/lower_indirect_derefs: Use the direct array deref for recursion
This fixes about 100 of the new Vulkan CTS tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Jason Ekstrand
59f06ac389 anv/clear: Handle ClearImage on 3-D images
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-03 19:29:28 -07:00
Francisco Jerez
7244dc1e06 Revert "i965/fs: Allow scalar source regions on SNB math instructions."
This reverts commit c1107cec44.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode.  Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):

   es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
   es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
   es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_float_frag_xvary
   es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
   es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
   es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-06-03 18:47:29 -07:00
Francisco Jerez
a2135c6fd9 i965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 18:38:51 -07:00
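The flag inversion described in a2135c6fd9 can be checked with a small model. This is only a sketch: the function names are hypothetical, and it assumes that CMP writes all ones to its destination when the comparison is true and zero otherwise. It models the two instructions from the commit message, not the actual cmod propagation pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of "cmp.z.f0 tmp, a, b": the conditional mod selects the
 * comparison itself. The flag is set when a == b, and tmp receives all ones
 * on true or zero on false (an assumption of this model). */
static bool model_cmp_z(uint32_t a, uint32_t b, uint32_t *tmp)
{
    bool flag = (a == b);
    *tmp = flag ? UINT32_MAX : 0;
    return flag;
}

/* Simplified model of "mov.z.f0 null, src": the conditional mod is evaluated
 * on the result of the move, so the flag is set when src == 0. */
static bool model_mov_z(uint32_t src)
{
    return src == 0;
}

/* Flag value after the original two-instruction sequence. */
static bool flag_original(uint32_t a, uint32_t b)
{
    uint32_t tmp;
    (void)model_cmp_z(a, b, &tmp);   /* this flag is overwritten by the mov */
    return model_mov_z(tmp);
}

/* Flag value after the incorrect propagation that drops the mov. */
static bool flag_misoptimized(uint32_t a, uint32_t b)
{
    uint32_t tmp;
    return model_cmp_z(a, b, &tmp);
}
```

Under this model, flag_original() and flag_misoptimized() disagree for every pair of inputs, which matches the "negation of the flag result" described in the commit message.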
Emil Velikov
7a3a0d9212 anv: add the X related and Wayland CFLAGS to VULKAN_ENTRYPOINT_CPPFLAGS
Otherwise we will fail to find the headers in some scenarios.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2016-06-04 00:52:00 +01:00
Emil Velikov
a1256c0ea7 nir: automake: add nir_search_helpers.h to the sources list(s)
Fixes: dfbae7d64f ("nir/algebraic: support for power-of-two
optimizations")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-04 00:18:40 +01:00
Rob Clark
1535519e51 freedreno/ir3: do idiv lowering after main opt loop
Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-03 16:05:03 -04:00
Rob Clark
dfbae7d64f nir/algebraic: support for power-of-two optimizations
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two.  Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 16:05:03 -04:00
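The rewrite that dfbae7d64f makes possible can be sketched in plain C. The helpers below are hypothetical and are not the nir_search_helpers.h code; they only show why the "constant power of two" constraint is needed before a multiply or divide may become a shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v is a non-zero power of two (exactly one bit set). */
static bool is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* log2 of a power of two, found by locating its single set bit. */
static unsigned log2_pow2(uint32_t v)
{
    unsigned shift = 0;
    assert(is_power_of_two(v));
    while ((v >> shift) != 1)
        shift++;
    return shift;
}

/* Unsigned multiply by a constant power of two, rewritten as a left shift. */
static uint32_t umul_pow2(uint32_t a, uint32_t c)
{
    return a << log2_pow2(c);
}

/* Unsigned divide by a constant power of two, rewritten as a right shift. */
static uint32_t udiv_pow2(uint32_t a, uint32_t c)
{
    return a >> log2_pow2(c);
}
```

For a constant that is not a power of two, such as 12, no single shift gives the same result, which is why the search expression has to carry the extra condition.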
Nicolai Hähnle
a64c7cd2ba radeonsi: mark buffer texture range valid for shader images
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.

This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.

Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-03 14:11:05 +02:00
Marek Olšák
8c361e84ad Revert "egl: Check if API is supported when using eglBindAPI."
This reverts commit e8b38ca202.

It broke Glamor for Gallium at least.
2016-06-03 11:33:45 +02:00
Alejandro Piñeiro
9bdbb9c0e0 mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment
For ES 3.0, the NUM_SAMPLE_COUNTS spec says that some formats will
always be zero, but on ES 3.1 they can be nonzero.

The current code is correctly checking exactly against version 3.0,
but the comment only mentions the 3.0 spec. It is clearer to mention both.

v2: better wording on the comment (Ian Romanick)

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-03 07:38:25 +02:00
Dave Airlie
d10ae20b96 mesa/get: return correct value for layer provoking vertex.
This fixes:
GL45-CTS.geometry_shader.layered_rendering.layered_rendering

on Skylake.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 12:33:34 +10:00
Plamena Manolova
0b67efaed2 egl: Account for default values of texture target and format
When validating attributes during surface creation we should account
for the default values of texture target and format (EGL_NO_TEXTURE)
since the user is not obligated to explicitly set both via the
attribute list passed to eglCreatePbufferSurface.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-06-02 16:07:31 -07:00
Samuel Pitoiset
28590eb949 nvc0: mark buffer texture range valid for shader images
Loosely based on radeonsi (Thanks to Nicolai).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-06-03 00:12:23 +02:00
Mauro Rossi
278c2212ac isl: add support for Android libmesa_isl static library
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.

Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}

Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-02 22:31:44 +01:00
Mauro Rossi
4143245c23 android: libmesa_glsl: add a dependency on libmesa_nir static
Fixes the following building error:

target  C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
                 from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-02 22:31:00 +01:00
Emil Velikov
af1a0ae8ce isl: automake: don't include isl_format_layout.c in two lists.
Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.

v2: Drop unrelated vulkan hunk (Jason).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 22:26:04 +01:00
Emil Velikov
af2637aa32 automake: bring back the .PHONY git_sha1.h.tmp rule
With earlier commit 3689ef32af ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.

The former is used to ensure that the header is regenerated on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.

Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-06-02 22:23:12 +01:00
Kenneth Graunke
f74a29188c i965: Add _NEW_POINT to a couple of comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-06-02 14:11:55 -07:00
Charmaine Lee
0cf0d7c02e svga: allow copy box in svga_transfer_dma_band()
Instead of only allowing a rectangle to be copied in svga_transfer_dma_band(),
this patch allows it to copy a box, so a 3D texture can be copied
in one transfer.

Fixes a black screen when running Heaven after commit fb9fe35. (Bug 1663282)

Tested with Heaven, glretrace, piglit.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 15:03:41 -06:00
Rob Clark
94d8fbd217 freedreno: fix bad bitshift warnings
Coverity doesn't realize idx will never be negative.  Throw in some
assert()s to help it out.

(Hopefully assert() isn't getting compiled out for coverity build.. but
there seems to be just one way to find out.  We might have to change
these to assume())

Fixes CID 1362442, 1362443

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 16:29:32 -04:00
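The pattern from 94d8fbd217 can be sketched as follows. The find_slot() helper is hypothetical and stands in for any lookup a static analyzer cannot see through; the actual freedreno code is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup that returns a slot index, or -1 when the key has no
 * slot. Callers in this sketch only pass keys that are known to be valid. */
static int find_slot(int key)
{
    return (key >= 0 && key < 32) ? key : -1;
}

/* Build a one-bit mask for a valid key. The assert states the invariant
 * explicitly, so the shift count is known to be in range at this point. */
static uint32_t slot_mask(int key)
{
    int idx = find_slot(key);
    assert(idx >= 0 && idx < 32);
    return UINT32_C(1) << idx;
}
```

As the commit message notes, this only helps if assert() is not compiled out in the analysis build.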
Rob Clark
676c77a923 freedreno: assume builtin shaders do compile
Maybe we should switch to ureg to build the builtin shaders.  But at any
rate, if they fail to compile it is because someone messed them up (or
changed TGSI syntax?).

CID 1362444

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 16:29:32 -04:00
Francisco Jerez
060c8d245d i965/fs: Reindent emit_zip().
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:24:48 -07:00
Francisco Jerez
7aa76d66a1 i965/fs: Skip SIMD lowering destination zipping if possible.
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.

v2: Don't handle partial source-destination overlap for simplicity
    (Jason).  No instruction count regressions with respect to v1 in
    either shader-db or the few FP64 shader_runner test-cases with
    partial overlap I've checked manually.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:24:48 -07:00
Anuj Phogat
75da9c9933 blorp: Fix 16x multisample scaled blits
Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled
(with added 16x sample support) now passes with this patch.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-02 13:21:26 -07:00
Anuj Phogat
59c19b7687 meta: Fix indentation in shader code
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2016-06-02 13:21:26 -07:00
Dave Airlie
af7bf610cf mesa/copyimage: report INVALID_VALUE for missing cube face
The spec says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.

This fixes:
GL45-CTS.copy_image.non_existent_mipmap

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 06:08:44 +10:00
Dave Airlie
c0856eacf1 mesa/copyimage: fix num samples check to handle renderbuffers.
This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.

This fixes:
GL45-CTS.copy_image.invalid_target

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-03 06:08:22 +10:00
Rob Clark
80c2886033 freedreno/a4xx: silence coverity warning
CID 1362451

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
9b854ce53c freedreno/a3xx+a4xx: fix potential null ptr deref
Coverity spotted the a3xx case (not sure why not the a4xx).

CID 1362452

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
27a97097e1 freedreno/ir3: fix coverity warning
CID 1362453

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
374ad2e2bd freedreno/ir3: use nir_shader_get_entrypoint() helper
Should also fix coverity warning: CID 1362454

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
df64cd6814 freedreno/a4xx: fix incorrect enum type
a4xx has its own enum, different from a2xx/a3xx.

Spotted by coverity: CID 1362458, 1362459

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
1632b0eac0 freedreno: fix coverity negative array index warning
This can never happen, since the query would not have been created in the
first place if pidx(query_type) returned a negative value.  Let's let
coverity realize this.

CID 1362460

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
ba452d43e0 freedreno: fix dereference before null check
ptr can actually never be null so just drop the check.

CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
228b2b36f4 gallium/util: remove u_staging
Unused, and fixes a couple of coverity warnings: CID 1362171, 1362170

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-06-02 15:44:07 -04:00
Rob Clark
18fb922faa freedreno/a3xx: only update/emit bordercolor state when needed
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Rob Clark
11f0652404 freedreno/a4xx: only update/emit bordercolor state when needed
I noticed in stk that it was contributing to a lot of overhead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-06-02 15:44:07 -04:00
Matt Turner
0d81a684c1 i965: Add missing types to type_sz().
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-06-02 11:34:09 -07:00
Nanley Chery
c06cef7f9b mesa/extensions: Fix ES1 extension reporting
Commit eda15abd84 unintentionally
advertised these extensions in ES1 contexts. Undo this error.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-02 10:46:59 -07:00
Plamena Manolova
e8b38ca202 egl: Check if API is supported when using eglBindAPI.
According to the EGL specification, before binding an API
we must first check whether it is supported. If it is not, eglBindAPI
should return EGL_FALSE and generate an EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 07:45:19 -07:00
Eric Engestrom
17f4c723eb st/osmesa: remove double-write (overwriting)
These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.

CoverityID: 1296205

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-02 07:05:05 -06:00
Nayan Deshmukh
6c9a352d79 st/vdpau: check for null pointer in get/put bits.
Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-02 09:28:48 +02:00
Christian König
b3e75c3997 radeon/uvd: fix the H264 level for Tonga v2
We have supported 5.2 for a while now.

v2: we even support 5.2 for H264, 5.1 is for HEVC.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
2016-06-02 09:27:57 +02:00
Alejandro Piñeiro
b48c42cd1f mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-02 08:54:17 +02:00
Alejandro Piñeiro
c1ceee6cc9 i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-06-02 08:54:10 +02:00
Alejandro Piñeiro
58617bcebe i965/eu: use simd8 when exec_size != EXECUTE_16
Among other things, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-06-02 08:08:10 +02:00
Jordan Justen
0a3acff5b5 i965: Remove old CS local ID handling
The old method pushed data for each channel's uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
b1f22c6317 i965: Enable cross-thread constants and compact local IDs for hsw+
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
 * Minimize size of patch that switches from the old local ID layout
   to the new layout (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
3ba9594f32 anv: Support new local ID generation & cross-thread constants
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
30685392e0 i965: Support new local ID push constant & cross-thread constants
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
d437798ace i965: Add CS push constant info to brw_cs_prog_data
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.

When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.

For now we add the code that would calculate cross-thread constant
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.

v4:
 * Support older local ID push constant layout as well. (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
1b79e7ebbd i965: Store number of threads in brw_cs_prog_data
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
3ef0957dac i965: Add nir based intrinsic lowering and thread ID uniform
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver-specific lowered nir code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.

v2:
 * Create variable during lowering pass. (Ken)

v3:
 * Don't create a variable, but instead just insert an intrinsic call
   to load a uniform from the allocated location. (Jason)

v4:
 * Don't run this pass if thread_local_id_index < 0

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
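The second lowering step in 3ef0957dac, deriving gl_LocalInvocationID from gl_LocalInvocationIndex, is the inverse of the index formula that GLSL defines. A minimal sketch in C, assuming a hypothetical uvec3 struct and helper names (the real pass emits nir instructions, and how the index itself is built from the thread ID uniform is not shown here):

```c
#include <assert.h>
#include <stdint.h>

struct uvec3 { uint32_t x, y, z; };

/* Flatten a 3-D local ID into the linear index, following the GLSL
 * definition: index = z * size.x * size.y + y * size.x + x. */
static uint32_t local_index_from_id(struct uvec3 id, struct uvec3 size)
{
    return id.z * size.x * size.y + id.y * size.x + id.x;
}

/* Recover the 3-D local ID from the linear index (the inverse mapping). */
static struct uvec3 local_id_from_index(uint32_t index, struct uvec3 size)
{
    struct uvec3 id;
    assert(size.x > 0 && size.y > 0 && size.z > 0);
    assert(index < size.x * size.y * size.z);
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}
```

For a 4x2x3 work group, index 13 maps to the ID (1, 1, 1), and flattening (1, 1, 1) gives 13 again.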
Jordan Justen
04fc72501a i965: Put CS local thread ID uniform in last push register
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)

v2:
 * Add variable in intrinsics lowering pass
 * Make sure the ID is pushed last in assign_constant_locations, and
   that we save a spot for the ID in the push constants

v3:
 * Simplify code based on Jason's suggestions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
fa279dfbf0 i965: Add uniform for a CS thread local base ID
v4:
 * Force thread_local_id_index to -1 for now, and have
   fs_visitor::setup_cs_payload look at thread_local_id_index. This
   enables us to more easily cut over from the old local ID layout to
   the new layout, as suggested by Jason.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
8f48d23e0f i965: Add nir channel_num system value
v2:
 * simd16/32 fixes (curro)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
6f316c9d86 nir: Make lowering gl_LocalInvocationIndex optional
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jordan Justen
7b9def3583 glsl: Add glsl LowerCsDerivedVariables option
v2:
 * Move lower flag to context constants. (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jason Ekstrand
1205999c22 i965/fs: Copy the offset when lowering logical pull constant sends
This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 16:00:44 -07:00
Dave Airlie
8d4f4adfbd glsl/distance: make sure we use clip dist varying slot for lowered var.
When lowering, we always want to use the clip dist varying.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-02 07:09:21 +10:00
Nicolai Hähnle
c7877b9dab winsys/amdgpu: decay max_ib_size over time
So that memory use will eventually decrease again after a temporary peak.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
6aff6377b1 winsys/amdgpu: implement IB chaining on the gfx ring
As a consequence, CE IB size never triggers a flush anymore.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
45be461f55 winsys/amdgpu: consolidate IB size management in amdgpu_ib_finalize
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
89ba076de4 radeon/winsys: introduce radeon_winsys_cs_chunk
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
a7c26bfc0c radeonsi/sid: add packet definitions for IB chaining
While we're at it, add packet printing in si_debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
83a01cb498 winsys/amdgpu: start with smaller IBs, growing as necessary
This avoids allocating giant IBs from the outset, especially for CE and DMA.

Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.

With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.

v2:
- clean up buffer_size calculation

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
f80c6abb9e winsys/amdgpu: add amdgpu_ib and amdgpu_cs_from_ib helper functions
The latter function allows getting the containing amdgpu_cs from any IB
(including non-main ones).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
9e5ed559ba winsys/amdgpu: extract IB big buffer allocation for re-use
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
9db851b5ee winsys/amdgpu: add IB buffer in amdgpu_get_new_ib
Adding the buffer when we start using it for the IB makes the logic for
chaining a bit simpler.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
d6211a61b0 gallium/radeon: use cs_check_space throughout
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
46ad3561be radeon/winsys: add cs_check_space
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
92d5d97b10 winsys/amdgpu: simplify interface of amdgpu_get_new_ib
We'll want to have an amdgpu_cs pointer for future changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Nicolai Hähnle
8396ab4241 winsys/amdgpu: add amdgpu_cs_has_user_fence
v2: style change

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Kenneth Graunke
25e1b8d366 i965: Fix isoline reads in scalar TES.
Isolines aren't reversed.  commit 5b2d8c2273 fixed this for the vec4
TES backend, but not the scalar one.

Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2016-06-01 13:46:09 -07:00
Nicolai Hähnle
ed0e9862c5 st/mesa: implement PBO downloads for ReadPixels
v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of
    the texture targets, but all hardware that has shader images should also
    have this cap.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:51 +02:00
Nicolai Hähnle
f3b62d4c74 st/mesa: hook up a no-op try_pbo_readpixels
For better bisectability given that the order of some of the fallback tests
in the blit path is rearranged.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:48 +02:00
Nicolai Hähnle
1cb4be94ae st/mesa: add layer_offset to PBO fragment shader
This will be used to select a slice of a 3D texture.

v2: fix a comment (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:43 +02:00
Nicolai Hähnle
2bf6dfac8a st/mesa: create PBO download fragment shaders
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:40 +02:00
Nicolai Hähnle
852d3fcd3b st/mesa: add PBO download enable bit and fragment shaders
For downloads, the fragment shader must know the source texture target, hence
we may cache multiple fragment shaders.

v2: break long line (Marek)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:34 +02:00
Nicolai Hähnle
581c001532 st/mesa: move shareable parts of PBO upload state and draw to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:31 +02:00
Nicolai Hähnle
e16800226e st/mesa: move PBO buffer address calculation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:28 +02:00
Nicolai Hähnle
21e069f7d4 st/mesa: move PBO upload fs creation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:26 +02:00
Nicolai Hähnle
979688a027 st/mesa: rename pbo_upload to pbo
At the same time, rename members that are upload-specific to say so.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:23 +02:00
Nicolai Hähnle
be82065fbe st/mesa: move PBO vertex and geometry shader creation to st_pbo.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:20 +02:00
Nicolai Hähnle
4ecc32b0e1 st/mesa: begin moving PBO functions into their own file
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:18 +02:00
Nicolai Hähnle
d9893feb2c gallium/cso: allow saving the first fragment shader image slot
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:15 +02:00
Nicolai Hähnle
fc0352ff9c gallium/u_inlines: allow NULL src in util_copy_image_view
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:12 +02:00
Nicolai Hähnle
57f576f1fb gallium: add PIPE_BARRIER_ALL define
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:36:48 +02:00
Ian Romanick
a428c955ce glsl: Use Geom.VerticesOut == -1 to specify unset
Because apparently layout(max_vertices=0) is a thing.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:39 -07:00
Ian Romanick
b27dfa5403 i965: If control_data_header_size_bits is zero, don't do EndPrimitive
This can occur when max_vertices=0 is explicitly specified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:39 -07:00
Ian Romanick
049bb94d2e mesa: Fix bogus strncmp
The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned.  That string has length 3.  strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.

v2: Not the same as strncmp(..., 3).  Noticed by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 11:11:25 -07:00
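
The equivalence described in this commit message can be checked with a small standalone sketch. The three helper names below are hypothetical and are not Mesa code; they only contrast the old 4-byte strncmp, the replacement strcmp, and the 3-byte strncmp that the v2 note says is not the same thing.

```c
#include <string.h>

/* Hypothetical helpers, not Mesa code: they only illustrate the comparison. */

/* Old form: compares up to 4 bytes, which includes the terminating NUL of
 * the 3-character string "[0]". */
static int matches_old(const char *s)
{
   return strncmp(s, "[0]\0", 4) == 0;
}

/* New form: plain strcmp against the same 3-character string. */
static int matches_new(const char *s)
{
   return strcmp(s, "[0]") == 0;
}

/* A 3-byte limit behaves differently: it only checks the prefix. */
static int matches_prefix(const char *s)
{
   return strncmp(s, "[0]", 3) == 0;
}
```

For an input such as "[0].x", the first two helpers reject the string while the prefix variant accepts it, which is why the commit switches to strcmp rather than strncmp(..., 3).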
Marek Olšák
12740efd29 radeonsi: set correct stencil tile mode for texturing
Sadly, this doesn't affect SI and VI in any way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
ea68215c54 winsys/amdgpu: set flags correctly when allocating depth-stencil buffers
This mimics Vulkan. It also documents how to fix stencil texturing.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
532a5af47f gallium/radeon: lower memory usage during texture transfers
This improves throughput by keeping TTM overhead down.

Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.

v2: use gart_size / 4 as the threshold

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-01 17:35:30 +02:00
Marek Olšák
614e3c6272 gallium/radeon: invalidate busy linear textures for whole-texture uploads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
fc1479a954 gallium/radeon: degrade tiled textures mapped often to linear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
9927c8138a gallium/radeon: clean up and better comment use_staging_texture
Next commits will add other things around this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
b033584299 radeonsi: set some colorbuffer register fields at emit time
to allow reallocating the texture storage with different parameters

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
30b2b860b0 radeonsi: implement global resetting of texture descriptors
it will be used by texture reallocation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
28de7aec0c radeonsi: move code for setting one shader image into separate function
v2: fix set_shader_images(..., NULL). Found by Christoph Haag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
95c5bbae66 radeonsi: set some image descriptor fields at bind time
mainly the fields that can change by reallocating a texture and changing
the tile mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
ef765d0789 gallium/radeon: strengthen some checking for DMA preparation
Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.

v1.1: fix the comment about DCC

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák
9d881cc0ac gallium/util: add util_texrange_covers_whole_level from radeon
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Ilia Mirkin
ca135a2612 nir: allow sat on all float destination types
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 10:44:40 -04:00
Alex Deucher
bd85e4a041 radeonsi: fix the raster config setup for 1 RB iceland chips
I didn't realize there were 1 and 2 RB variants when this code
was originally added.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
2016-06-01 09:59:57 -04:00
Dave Airlie
6400144041 mesa/sampler: fix error codes for sampler parameters.
The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.

Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 17:01:19 +10:00
Dave Airlie
0ebf4257a3 glsl: define some GLES3 constants in GLSL 4.1
The GLSL 4.1 spec adds:
gl_MaxVertexUniformVectors
gl_MaxFragmentUniformVectors
gl_MaxVaryingVectors

This fixes:
GL45-CTS.gtf31.GL3Tests.uniform_buffer_object.uniform_buffer_object_build_in_constants

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 17:01:13 +10:00
Topi Pohjolainen
6ca118d2f4 i965: Add norbc debug option
This INTEL_DEBUG option disables lossless compression (also known
as render buffer compression).

v2: (Matt) Use likely(!lossless_compression_disabled) instead of
           !likely(lossless_compression_disabled)
    (Grazvydas) Update docs/envvars.html

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-01 09:16:36 +03:00
Topi Pohjolainen
30e9e6bd07 i965/gen9: Configure rbc buffers as plain for non-rbc tex views
Fixes rendering in Shadow of Mordor with rbc. The application writes an
RGBA_UNORM texture, filling it with values that it later wants to
treat as SRGB_ALPHA.
The Intel driver enables lossless compression for the buffer at the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on, and unfortunately there is a
restriction in the hardware on using lossless compression for srgb
formats, which seems to extend to the sampling engine as well.
Requesting srgb to linear conversion on top of a compressed buffer
results in color values that are pretty much garbage.

Fortunately none of the tracked benchmarks showed a regression with
this.

v2 (Matt): Add missing space

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-01 09:16:36 +03:00
Kenneth Graunke
a3dc99f3d4 i965: Fix the passthrough TCS for isolines.
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants.  We at least
need to initialize them to zero.  We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).

Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.

(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2016-05-31 23:09:13 -07:00
Dave Airlie
ebb81cd683 i965/xfb: skip components in correct buffer.
The driver was adding the skip components but always for buffer 0.

This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 15:53:00 +10:00
Dave Airlie
1fe7bbb911 glsl/linker: fix multiple streams transform feedback.
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.

caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.

The problem was that it was using the skip components varying to set
the stream id, when it should wait until a varying was written.
This just adds the varying checks in the right place.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 13:30:41 +10:00
Dave Airlie
e891f7cf55 mesa/bufferobj: use mapping range in BufferSubData.
According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.

So we should use the "is range mapped" check path.

This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-01 13:30:40 +10:00
Ilia Mirkin
18d11c9989 nv50/ir: fix error finding free element in bitset in some situations
This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).

Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-05-31 23:25:51 -04:00
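
The failure mode described above (ffs returning 0 for a full word, giving pos = -1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the nv50/ir BitSet code: `find_free_bit` is a hypothetical helper, and it treats a set bit as "in use" and a clear bit as "free".

```c
#include <stdint.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Hypothetical helper, not the nv50/ir BitSet code.
 * Returns the index of the first clear bit among `size` bits, or -1. */
static int find_free_bit(const uint32_t *words, unsigned size)
{
   unsigned nwords = (size + 31) / 32;

   for (unsigned i = 0; i < nwords; i++) {
      /* ffs() returns 0 when no bit is set, so pos is -1 for a full word. */
      int pos = ffs((int)~words[i]) - 1;
      if (pos < 0)
         continue;   /* full word: must not be treated as a valid position */

      unsigned bit = i * 32 + (unsigned)pos;
      /* A hit in the padding bits past `size` means nothing is free. */
      return bit < size ? (int)bit : -1;
   }
   return -1;
}
```

With a size that is an exact multiple of 32 there are no padding bits in the last word, so a completely full set only reports "nothing free" if the pos < 0 case is handled explicitly, as in the sketch.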
Ilia Mirkin
d873608bcf nv50/ir: print relevant file's bitset when showing RA info
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-31 23:25:50 -04:00
Timothy Arceri
98d40b4d11 Revert "glsl: fix xfb_offset unsized array validation"
This reverts commit aac90ba292.

The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom

Also the CTS test it was meant to fix seems like it may be bogus.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-01 10:33:57 +10:00
Francisco Jerez
c1107cec44 i965/fs: Allow scalar source regions on SNB math instructions.
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:

 "The supported regioning modes for math instructions are align16,
  align1 with the following restrictions:
   - Scalar source is supported.
  [...]
   - Source and destination offset must be the same, except the case of
     scalar source."

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-31 15:57:41 -07:00
Francisco Jerez
06d8765bc0 i965/fs: Fix constant combining for instructions that cannot accept source mods.
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
303ec22ed6 i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
4fe4f6e8a7 i965/fs: Fix compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already.  The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.

Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa5 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:41 -07:00
Francisco Jerez
1898673f58 i965/fs: Teach compute_to_mrf() about the COMPR4 address transformation.
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
485fbaff03 i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction.  In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
4b0ec9f475 i965/fs: Fix compute-to-mrf VGRF region coverage condition.
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ.  Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:57:40 -07:00
Francisco Jerez
bb61e24787 i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:56:54 -07:00
Francisco Jerez
88f380a2dd i965/fs: Teach regions_overlap() about COMPR4 MRF regions.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-31 15:22:04 -07:00
Dylan Baker
604010a7ed Don't use python 3
There are now no files that require python 3, so for now just remove
the python 3 dependency and use python 2. I think the right plan is to
get all of the python ready for python 3, and then use whatever
python is available.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
ab31817fed genxml: change shebang to python 2
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
12c1a01c72 genxml: use the isalpha method rather than str.isalpha.
This fixes gen_pack_header to work on python 2, where name[0] is unicode
not str.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
a45a25418b genxml: require future imports for python2 compatibility.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
e5681e4d70 genxml: mark re strings as raw
This is a correctness issue.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
de2e9da2e9 genxml: Make classes descendants of object
This is the default in python3, but in python2 you get old style
classes. No one likes old-style classes.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Dylan Baker
9f50e3572c genxml: mark gen_pack_header.py as encoded in utf-8
There is unicode in this file, and I'm actually surprised that the
python interpreter hasn't gotten grumpy.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
2016-05-31 15:09:06 -07:00
Bas Nieuwenhuizen
35818129a6 radeonsi: Decompress DCC textures in a render feedback loop.
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:04 +02:00
Bas Nieuwenhuizen
cbe3421f05 radeonsi: Add counter to check if a texture is bound to a framebuffer.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 21:43:00 +02:00
Rhys Kidd
8cb74dd4e6 vc4: Fix compiler warnings in fail_instr path of QIR validate pass
Introduced in 8e2d0843c0.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-31 10:56:02 -07:00
Emil Velikov
b8e1f59d62 anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards
The generated sources should follow the example set by the vulkan
headers and our non-generated code. Namely: the code for all supported
platforms should be available, each one guarded by its respective
VK_USE_PLATFORM_*_KHR macro.

v2: Reword commit message.

Cc: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96285
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1 over IRC)
2016-05-31 18:41:28 +01:00
Brian Paul
6bea33008e svga: change enum pipe_resource_usage back to unsigned
This parameter is actually a bitmask of PIPE_TRANSFER_x flags.
Change it back to a simple unsigned type.  IIRC, some compilers
complain about masks of enum values.  Also, this make the function
signature match u_resource_vtbl::transfer_map() again.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-31 10:20:36 -06:00
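
The reasoning for using a plain unsigned parameter can be illustrated with a short sketch. The enum and function names are hypothetical stand-ins loosely modeled on the PIPE_TRANSFER_x flags, not the real gallium definitions.

```c
/* Hypothetical flag names, loosely modeled on PIPE_TRANSFER_x. */
enum example_transfer_flag {
   EXAMPLE_TRANSFER_READ  = 1 << 0,
   EXAMPLE_TRANSFER_WRITE = 1 << 1
};

/* The parameter holds an OR of several flags, so its value need not be
 * one of the enumerators; an unsigned bitmask type describes it better
 * than the enum type does. */
static int wants_write(unsigned usage)
{
   return (usage & EXAMPLE_TRANSFER_WRITE) != 0;
}
```

A caller would pass something like EXAMPLE_TRANSFER_READ | EXAMPLE_TRANSFER_WRITE, a value that matches neither enumerator on its own.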
Marek Olšák
7ca55d2da8 radeonsi: fix CP DMA hazard with index buffer fetches
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-31 16:59:32 +02:00
Marek Olšák
d427110882 r600g: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:55 +02:00
Marek Olšák
d5882bb0df radeonsi: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:54 +02:00
Marek Olšák
921ab0028e gallium/u_blitter: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:53 +02:00
Marek Olšák
8a10192b4b mesa: fix crash in driver_RenderTexture_is_safe
This just fixes the crash with the apitrace in the bug report.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95246

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:43:34 +02:00
Marek Olšák
fc4896e686 radeonsi: don't flush TC at the end of IBs on DRM >= 3.2.0
It's not needed since it was fixed in the kernel.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-05-31 16:41:22 +02:00
Jakob Sinclair
877c00c653 gallium/radeon: fixed division by zero
Coverity is getting a false positive that a division by zero can occur
here. This change will silence the Coverity warnings as a division by zero
cannot occur in this case.

Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 12:51:20 +02:00
Eric Engestrom
35fd5282ea st/glsl_to_tgsi: prevent infinite loop
`unsigned j` would never fail `j >= 0`, leading to an infinite loop as
`j--` wraps around.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-31 11:46:30 +02:00
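
The pitfall named in this commit message can be shown in isolation. The sketch below is not the glsl_to_tgsi code and does not claim to reproduce the actual fix; `last_nonzero` is a hypothetical helper that walks an array backwards with an unsigned index in one safe way.

```c
/* Hypothetical helper, not the glsl_to_tgsi code: returns the index of the
 * last nonzero element, or -1 if there is none. */
static int last_nonzero(const int *values, unsigned count)
{
   /* Broken pattern: for (unsigned j = count - 1; j >= 0; j--)
    * The condition is always true for an unsigned j, and j-- at 0 wraps
    * around to UINT_MAX. */

   /* One safe form: test before decrementing, so j never goes below 0. */
   for (unsigned j = count; j > 0; j--) {
      if (values[j - 1] != 0)
         return (int)(j - 1);
   }
   return -1;
}
```

Switching the index to a signed type is another common way to make a `j >= 0` condition meaningful; which approach the commit took is not shown in this log.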
Dave Airlie
f87352d769 glsl/images: bounds check image unit assignment
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms, we reject this later,
but not until after we trash the contents of the
struct gl_shader.

Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-31 10:41:44 +10:00
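
The idea of counting every image uniform while only writing in-range entries can be sketched as below. The struct, its fields, and `record_image_unit` are hypothetical and much simpler than struct gl_shader; the limit of 16 is taken from the error message quoted above.

```c
#define EXAMPLE_MAX_IMAGE_UNIFORMS 16

/* Hypothetical stand-in for the per-shader state, not struct gl_shader. */
struct example_shader {
   unsigned num_images;
   unsigned char image_units[EXAMPLE_MAX_IMAGE_UNIFORMS];
   unsigned guard;   /* stands in for whatever follows the array */
};

/* Always count the uniform so a later limit check can report the real
 * total, but only store the unit while the index is inside the array. */
static void record_image_unit(struct example_shader *sh, unsigned unit)
{
   if (sh->num_images < EXAMPLE_MAX_IMAGE_UNIFORMS)
      sh->image_units[sh->num_images] = (unsigned char)unit;
   sh->num_images++;
}
```

After 192 calls the counter reads 192, so a later "too many image uniforms" check can print a sensible number, and nothing outside the array has been overwritten.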
Ilia Mirkin
4b1a167a2b nvc0/ir: fix spilling predicates to registers
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-05-30 18:15:14 -04:00
Ilia Mirkin
1f895caba0 nvc0/ir: limit max number of regs based on availability in SM
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-30 18:15:10 -04:00
Ilia Mirkin
27a51ff9b4 nv50/ir: record number of threads in a compute shader
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-05-30 18:14:55 -04:00
Pierre Moreau
ae70879530 nv50/ir: Add missing handling of U64/S64 in inlines
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-30 16:12:12 -04:00
Emil Velikov
9074470d7b docs: rename release notes to 12.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7ad2cb6f08)
2016-05-30 20:33:30 +01:00
Ilia Mirkin
68d135011b docs: move nvc0 out of individual lines of GL 4.2, 4.3, ES 3.1
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-30 15:18:32 -04:00
Emil Velikov
888cf6eea2 docs: add 12.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 20:03:19 +01:00
Marek Olšák
4291229488 docs/GL3: mark radeonsi as all done up to GL 4.3 and GLES 3.1
2016-05-30 20:48:51 +02:00
Emil Velikov
922b471777 nir: add the SConscript.nir to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-05-30 19:19:01 +01:00
283 changed files with 5564 additions and 3728 deletions


@@ -34,6 +34,10 @@ MESA_VERSION := $(shell cat $(MESA_TOP)/VERSION)
LOCAL_CFLAGS += \
-Wno-unused-parameter \
-Wno-date-time \
-Wno-pointer-arith \
-Wno-missing-field-initializers \
-Wno-initializer-overrides \
-Wno-mismatched-tags \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
@@ -78,6 +82,12 @@ LOCAL_CFLAGS += \
-D__STDC_LIMIT_MACROS
endif
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm
endif
LOCAL_CPPFLAGS += \
$(if $(filter true,$(MESA_LOLLIPOP_BUILD)),-D_USING_LIBCXX) \
-Wno-error=non-virtual-dtor \


@@ -1 +1 @@
12.0.0-rc3
12.1.0-devel


@@ -1,2 +0,0 @@
# The offending commit that this patch (part) reverts isn't in 12.0
be32a2132785fbc119f17e62070e007ee7d17af7 i965/compiler: Bring back the INTEL_PRECISE_TRIG environment variable


@@ -146,45 +146,45 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, r600, radeonsi
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe)
GL 4.2, GLSL 4.20 -- all DONE: radeonsi
GL 4.2, GLSL 4.20 -- all DONE: nvc0, radeonsi
GL_ARB_texture_compression_bptc DONE (i965, nvc0, r600, radeonsi)
GL_ARB_texture_compression_bptc DONE (i965, r600)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_atomic_counters DONE (i965, softpipe)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_transform_feedback_instanced DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_shader_image_load_store DONE (i965, softpipe)
GL_ARB_conservative_depth DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_420pack DONE (all drivers that support GLSL 1.30)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_internalformat_query DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_internalformat_query DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_map_buffer_alignment DONE (all drivers)
GL 4.3, GLSL 4.30:
GL 4.3, GLSL 4.30 -- all DONE: nvc0, radeonsi
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_copy_image DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_compute_shader DONE (i965, softpipe)
GL_ARB_copy_image DONE (i965, nv50, r600, softpipe, llvmpipe)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, r600, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, r600, softpipe)
GL_ARB_internalformat_query2 DONE (all drivers)
GL_ARB_invalidate_subdata DONE (all drivers)
GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_multi_draw_indirect DONE (i965, r600, llvmpipe, softpipe, swr)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior DONE (i965, nvc0, radeonsi)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_robust_buffer_access_behavior DONE (i965)
GL_ARB_shader_image_size DONE (i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, softpipe)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (nv50, i965, r600, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_view DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_vertex_attrib_binding DONE (all drivers)
@@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
GL_ARB_ES3_1_compatibility DONE (nvc0, radeonsi)
GL_ARB_clip_control DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_cull_distance DONE (i965, nv50, nvc0, llvmpipe, softpipe)
GL_ARB_cull_distance DONE (i965, nv50, nvc0, llvmpipe, softpipe, swr)
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
@@ -222,32 +222,32 @@ GL 4.5, GLSL 4.50:
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
These are the extensions cherry-picked to make GLES 3.1
GLES3.1, GLSL ES 3.1
GLES3.1, GLSL ES 3.1 -- all DONE: nvc0, radeonsi
GL_ARB_arrays_of_arrays DONE (all drivers that support GLSL 1.30)
GL_ARB_compute_shader DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_compute_shader DONE (i965, softpipe)
GL_ARB_draw_indirect DONE (i965, r600, llvmpipe, softpipe, swr)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (i965, nvc0, r600, radeonsi, softpipe)
GL_ARB_framebuffer_no_attachments DONE (i965, r600, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_load_store DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_image_size DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, nvc0, radeonsi, softpipe)
GL_ARB_shader_atomic_counters DONE (i965, softpipe)
GL_ARB_shader_image_load_store DONE (i965, softpipe)
GL_ARB_shader_image_size DONE (i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (i965, softpipe)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_texture_multisample (Multisample textures) DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_vertex_attrib_binding DONE (all drivers)
GS5 Enhanced textureGather DONE (i965, nvc0, r600, radeonsi)
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GS5 Enhanced textureGather DONE (i965, r600)
GS5 Packing/bitfield/conversion functions DONE (i965, r600)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functionality not covered above:
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glGetBooleani_v - restrict to GLES enums
gl_HelperInvocation support DONE (i965, nvc0, r600, radeonsi)
gl_HelperInvocation support DONE (i965, r600)
GLES3.2, GLSL ES 3.2
GL_EXT_color_buffer_float DONE (all drivers)


@@ -684,9 +684,11 @@ To add a new GL extension to Mesa you have to do at least the following.
</li>
<li>
Add a new entry to the <code>gl_extensions</code> struct in mtypes.h
if the extension requires driver capabilities not already exposed by
another extension.
</li>
<li>
Update the <code>extensions.c</code> file.
Add a new entry to the src/mesa/main/extensions_table.h file.
</li>
<li>
From this point, the best way to proceed is to find another extension,
@@ -697,12 +699,18 @@ To add a new GL extension to Mesa you have to do at least the following.
If the new extension adds new GL state, the functions in get.c, enable.c
and attrib.c will most likely require new code.
</li>
<li>
To determine if the new extension is active in the current context,
use the auto-generated _mesa_has_##name_str() function defined in
src/mesa/main/extensions.h.
</li>
<li>
The dispatch tests check_table.cpp and dispatch_sanity.cpp
should be updated with details about the new extensions functions. These
tests are run using 'make check'
</li>
</ul>
</p>
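The `_mesa_has_##name_str()` helper named in the list above is produced by preprocessor token pasting. As a rough sketch only, assuming a much simpler table than the real `extensions_table.h` (the `demo_` names, the struct layout, and the two-entry table are hypothetical and not Mesa's actual definitions), one table entry can be expanded into one query function like this:

```cpp
// Hypothetical, simplified stand-in for the per-context extension flags.
struct demo_extensions {
   bool ARB_shader_group_vote;
   bool ARB_cull_distance;
};

// Hypothetical extension table: one EXT() entry per extension.
#define DEMO_EXTENSION_TABLE \
   EXT(ARB_shader_group_vote) \
   EXT(ARB_cull_distance)

// Expand every table entry into a demo_has_<name>() query function.
// The ## operator pastes the entry name onto the "demo_has_" prefix.
#define EXT(name_str) \
   static inline bool demo_has_##name_str(const demo_extensions *ext) \
   { \
      return ext->name_str; \
   }
DEMO_EXTENSION_TABLE
#undef EXT
```

With this shape, adding a line to the table is enough to get a matching `demo_has_...()` function, which is presumably why the documentation points at the table file rather than at hand-written checks.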

docs/relnotes/12.1.0.html (new file, 60 lines)

@@ -0,0 +1,60 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.1.0 Release Notes / TBD</h1>
<p>
Mesa 12.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 12.1.1.
</p>
<p>
Mesa 12.1.0 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_shader_group_vote on nvc0</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>


@@ -235,6 +235,7 @@ NIR_FILES = \
nir/nir_repair_ssa.c \
nir/nir_search.c \
nir/nir_search.h \
nir/nir_search_helpers.h \
nir/nir_split_var_copies.c \
nir/nir_sweep.c \
nir/nir_to_ssa.c \


@@ -3393,7 +3393,7 @@ apply_layout_qualifier_to_variable(const struct ast_type_qualifier *qual,
(qual_component + components - 1) > 3) {
_mesa_glsl_error(loc, state, "component overflow (%u > 3)",
(qual_component + components - 1));
} else if (qual_component == 1 && type->is_double()) {
} else if (qual_component == 1 && type->is_64bit()) {
/* We don't bother checking for 3 as it should be caught by the
* overflow check above.
*/
@@ -6843,7 +6843,7 @@ ast_process_struct_or_iface_block_members(exec_list *instructions,
}
} else {
if (layout && layout->flags.q.explicit_xfb_offset) {
unsigned align = field_type->is_double() ? 8 : 4;
unsigned align = field_type->is_64bit() ? 8 : 4;
fields[i].offset = glsl_align(block_xfb_offset, align);
block_xfb_offset +=
MAX2(xfb_stride, (int) (4 * field_type->component_slots()));
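The second hunk above picks an 8-byte alignment for 64-bit field types and a 4-byte alignment otherwise before rounding the transform feedback offset up. A minimal sketch of that arithmetic, assuming a conventional round-up-to-multiple helper (the `demo_` functions are hypothetical and are not Mesa's `glsl_align`):

```cpp
// Round `offset` up to the next multiple of `align` (align must be non-zero).
static unsigned demo_align(unsigned offset, unsigned align)
{
   return (offset + align - 1) / align * align;
}

// Choose the alignment the way the hunk does: 8 bytes for 64-bit types,
// 4 bytes for everything else, then align the running block offset.
static unsigned demo_xfb_field_offset(unsigned block_xfb_offset, bool is_64bit)
{
   const unsigned align = is_64bit ? 8 : 4;
   return demo_align(block_xfb_offset, align);
}
```

For a running offset of 4, a 32-bit field stays at 4 while a 64-bit field is pushed to 8.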


@@ -528,6 +528,12 @@ barrier_supported(const _mesa_glsl_parse_state *state)
state->stage == MESA_SHADER_TESS_CTRL;
}
static bool
vote(const _mesa_glsl_parse_state *state)
{
return state->ARB_shader_group_vote_enable;
}
/** @} */
/******************************************************************************/
@@ -853,6 +859,8 @@ private:
ir_function_signature *_shader_clock(builtin_available_predicate avail,
const glsl_type *type);
ir_function_signature *_vote(enum ir_expression_operation opcode);
#undef B0
#undef B1
#undef B2
@@ -2935,6 +2943,10 @@ builtin_builder::create_builtins()
glsl_type::uvec2_type),
NULL);
add_function("anyInvocationARB", _vote(ir_unop_vote_any), NULL);
add_function("allInvocationsARB", _vote(ir_unop_vote_all), NULL);
add_function("allInvocationsEqualARB", _vote(ir_unop_vote_eq), NULL);
#undef F
#undef FI
#undef FIUD
@@ -5576,6 +5588,16 @@ builtin_builder::_shader_clock(builtin_available_predicate avail,
return sig;
}
ir_function_signature *
builtin_builder::_vote(enum ir_expression_operation opcode)
{
ir_variable *value = in_var(glsl_type::bool_type, "value");
MAKE_SIG(glsl_type::bool_type, vote, 1, value);
body.emit(ret(expr(opcode, value)));
return sig;
}
/** @} */
/******************************************************************************/
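The three built-ins registered above take one boolean per invocation and return a single boolean for the group. As a CPU-side illustration of the intended semantics only (the `demo_` functions and the use of a vector to stand in for the set of active invocations are assumptions, not how any driver implements the opcodes):

```cpp
#include <vector>

// Each element is the boolean argument supplied by one active invocation.

// anyInvocationARB: true if at least one invocation passed true.
static bool demo_vote_any(const std::vector<bool> &values)
{
   for (bool v : values)
      if (v)
         return true;
   return false;
}

// allInvocationsARB: true if every invocation passed true.
static bool demo_vote_all(const std::vector<bool> &values)
{
   for (bool v : values)
      if (!v)
         return false;
   return true;
}

// allInvocationsEqualARB: true if every invocation passed the same value,
// that is, either all true or all false.
static bool demo_vote_eq(const std::vector<bool> &values)
{
   return demo_vote_all(values) || !demo_vote_any(values);
}
```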


@@ -2467,6 +2467,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t versio
if (extensions->ARB_cull_distance)
add_builtin_define(parser, "GL_ARB_cull_distance", 1);
if (extensions->ARB_shader_group_vote)
add_builtin_define(parser, "GL_ARB_shader_group_vote", 1);
}
}


@@ -594,6 +594,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
EXT(ARB_shader_bit_encoding, true, false, ARB_shader_bit_encoding),
EXT(ARB_shader_clock, true, false, ARB_shader_clock),
EXT(ARB_shader_draw_parameters, true, false, ARB_shader_draw_parameters),
EXT(ARB_shader_group_vote, true, false, ARB_shader_group_vote),
EXT(ARB_shader_image_load_store, true, false, ARB_shader_image_load_store),
EXT(ARB_shader_image_size, true, false, ARB_shader_image_size),
EXT(ARB_shader_precision, true, false, ARB_shader_precision),
@@ -1602,6 +1603,7 @@ ast_struct_specifier::ast_struct_specifier(const char *identifier,
name = identifier;
this->declarations.push_degenerate_list_at_head(&declarator_list->link);
is_declaration = true;
layout = NULL;
}
void ast_subroutine_list::print(void) const


@@ -575,6 +575,8 @@ struct _mesa_glsl_parse_state {
bool ARB_shader_clock_warn;
bool ARB_shader_draw_parameters_enable;
bool ARB_shader_draw_parameters_warn;
bool ARB_shader_group_vote_enable;
bool ARB_shader_group_vote_warn;
bool ARB_shader_image_load_store_enable;
bool ARB_shader_image_load_store_warn;
bool ARB_shader_image_size_enable;


@@ -1284,9 +1284,6 @@ nir_visitor::visit(ir_expression *ir)
intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
intrin->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
if (intrin->intrinsic == nir_intrinsic_interp_var_at_offset)
shader->info.uses_interp_var_at_offset = true;
unsigned bit_size = glsl_get_bit_size(deref->type);
add_instr(&intrin->instr, deref->type->vector_elements, bit_size);


@@ -341,6 +341,12 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
this->type = glsl_type::int_type;
break;
case ir_unop_vote_any:
case ir_unop_vote_all:
case ir_unop_vote_eq:
this->type = glsl_type::bool_type;
break;
default:
assert(!"not reached: missing automatic type setup for ir_expression");
this->type = op0->type;
@@ -563,6 +569,9 @@ static const char *const operator_strs[] = {
"interpolate_at_centroid",
"get_buffer_size",
"ssbo_unsized_array_length",
"vote_any",
"vote_all",
"vote_eq",
"+",
"-",
"*",


@@ -537,6 +537,10 @@ public:
return this->interface_type;
}
enum glsl_interface_packing get_interface_type_packing() const
{
return this->interface_type->get_interface_packing();
}
/**
* Get the max_ifc_array_access pointer
*
@@ -1477,10 +1481,17 @@ enum ir_expression_operation {
*/
ir_unop_ssbo_unsized_array_length,
/**
* Vote among threads on the value of the boolean argument.
*/
ir_unop_vote_any,
ir_unop_vote_all,
ir_unop_vote_eq,
/**
* A sentinel marking the last of the unary operations.
*/
ir_last_unop = ir_unop_ssbo_unsized_array_length,
ir_last_unop = ir_unop_vote_eq,
ir_binop_add,
ir_binop_sub,

File diff suppressed because it is too large.


@@ -119,7 +119,7 @@ mark(struct gl_program *prog, ir_variable *var, int offset, int len,
/* double inputs read is only for vertex inputs */
if (stage == MESA_SHADER_VERTEX &&
var->type->without_array()->is_dual_slot_double())
var->type->without_array()->is_dual_slot())
prog->DoubleInputsRead |= bitfield;
if (stage == MESA_SHADER_FRAGMENT) {
@@ -306,7 +306,7 @@ ir_set_program_inouts_visitor::try_mark_partial_variable(ir_variable *var,
/* double element width for double types that takes two slots */
if (this->shader_stage != MESA_SHADER_VERTEX ||
var->data.mode != ir_var_shader_in) {
if (type->without_array()->is_dual_slot_double())
if (type->without_array()->is_dual_slot())
elem_width *= 2;
}


@@ -453,6 +453,14 @@ ir_validate::visit_leave(ir_expression *ir)
assert(ir->operands[0]->type->base_type == GLSL_TYPE_SUBROUTINE);
assert(ir->type->base_type == GLSL_TYPE_INT);
break;
case ir_unop_vote_any:
case ir_unop_vote_all:
case ir_unop_vote_eq:
assert(ir->type == glsl_type::bool_type);
assert(ir->operands[0]->type == glsl_type::bool_type);
break;
case ir_binop_add:
case ir_binop_sub:
case ir_binop_mul:


@@ -167,8 +167,7 @@ link_uniform_block_active_visitor::visit(ir_variable *var)
* also considered active, even if no member of the block is
* referenced."
*/
if (var->get_interface_type()->interface_packing ==
GLSL_INTERFACE_PACKING_PACKED)
if (var->get_interface_type_packing() == GLSL_INTERFACE_PACKING_PACKED)
return visit_continue;
/* Process the block. Bail if there was an error.
@@ -258,8 +257,7 @@ link_uniform_block_active_visitor::visit_enter(ir_dereference_array *ir)
* std140 layout qualifier, all its instances have been already marked
* as used in link_uniform_block_active_visitor::visit(ir_variable *).
*/
if (var->get_interface_type()->interface_packing ==
GLSL_INTERFACE_PACKING_PACKED) {
if (var->get_interface_type_packing() == GLSL_INTERFACE_PACKING_PACKED) {
b->var = var;
process_arrays(this->mem_ctx, ir, b);
}


@@ -70,7 +70,7 @@ private:
}
virtual void enter_record(const glsl_type *type, const char *,
bool row_major, const unsigned packing) {
bool row_major, const enum glsl_interface_packing packing) {
assert(type->is_record());
if (packing == GLSL_INTERFACE_PACKING_STD430)
this->offset = glsl_align(
@@ -81,7 +81,7 @@ private:
}
virtual void leave_record(const glsl_type *type, const char *,
bool row_major, const unsigned packing) {
bool row_major, const enum glsl_interface_packing packing) {
assert(type->is_record());
/* If this is the last field of a structure, apply rule #9. The
@@ -106,7 +106,7 @@ private:
virtual void visit_field(const glsl_type *type, const char *name,
bool row_major, const glsl_type *,
const unsigned packing,
const enum glsl_interface_packing packing,
bool last_field)
{
assert(this->index < this->num_variables);


@@ -222,7 +222,7 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program *prog,
val->array_elements[0]->type->base_type;
const unsigned int elements = val->array_elements[0]->type->components();
unsigned int idx = 0;
unsigned dmul = (base_type == GLSL_TYPE_DOUBLE) ? 2 : 1;
unsigned dmul = glsl_base_type_is_64bit(base_type) ? 2 : 1;
assert(val->type->length >= storage->array_elements);
for (unsigned int i = 0; i < storage->array_elements; i++) {


@@ -65,7 +65,7 @@ program_resource_visitor::process(const glsl_type *type, const char *name)
unsigned record_array_count = 1;
char *name_copy = ralloc_strdup(NULL, name);
unsigned packing = type->interface_packing;
enum glsl_interface_packing packing = type->get_interface_packing();
recursion(type, &name_copy, strlen(name), false, NULL, packing, false,
record_array_count, NULL);
@@ -79,9 +79,9 @@ program_resource_visitor::process(ir_variable *var)
const bool row_major =
var->data.matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR;
const unsigned packing = var->get_interface_type() ?
var->get_interface_type()->interface_packing :
var->type->interface_packing;
const enum glsl_interface_packing packing = var->get_interface_type() ?
var->get_interface_type_packing() :
var->type->get_interface_packing();
const glsl_type *t =
var->data.from_named_ifc_block ? var->get_interface_type() : var->type;
@@ -116,7 +116,7 @@ void
program_resource_visitor::recursion(const glsl_type *t, char **name,
size_t name_length, bool row_major,
const glsl_type *record_type,
const unsigned packing,
const enum glsl_interface_packing packing,
bool last_field,
unsigned record_array_count,
const glsl_struct_field *named_ifc_member)
@@ -228,7 +228,7 @@ void
program_resource_visitor::visit_field(const glsl_type *type, const char *name,
bool row_major,
const glsl_type *,
const unsigned,
const enum glsl_interface_packing,
bool /* last_field */)
{
visit_field(type, name, row_major);
@@ -243,13 +243,13 @@ program_resource_visitor::visit_field(const glsl_struct_field *field)
void
program_resource_visitor::enter_record(const glsl_type *, const char *, bool,
const unsigned)
const enum glsl_interface_packing)
{
}
void
program_resource_visitor::leave_record(const glsl_type *, const char *, bool,
const unsigned)
const enum glsl_interface_packing)
{
}
@@ -402,7 +402,9 @@ private:
* uniforms.
*/
this->num_active_uniforms++;
this->num_values += values;
if(!is_gl_identifier(name) && !is_shader_storage)
this->num_values += values;
}
struct string_to_uint_map *hidden_map;
@@ -660,7 +662,7 @@ private:
}
virtual void enter_record(const glsl_type *type, const char *,
bool row_major, const unsigned packing) {
bool row_major, const enum glsl_interface_packing packing) {
assert(type->is_record());
if (this->buffer_block_index == -1)
return;
@@ -673,7 +675,7 @@ private:
}
virtual void leave_record(const glsl_type *type, const char *,
bool row_major, const unsigned packing) {
bool row_major, const enum glsl_interface_packing packing) {
assert(type->is_record());
if (this->buffer_block_index == -1)
return;
@@ -687,7 +689,7 @@ private:
virtual void visit_field(const glsl_type *type, const char *name,
bool row_major, const glsl_type * /* record_type */,
const unsigned packing,
const enum glsl_interface_packing packing,
bool /* last_field */)
{
assert(!type->without_array()->is_record());
@@ -762,13 +764,14 @@ private:
current_var->data.how_declared == ir_var_hidden;
this->uniforms[id].builtin = is_gl_identifier(name);
/* Do not assign storage if the uniform is builtin */
if (!this->uniforms[id].builtin)
this->uniforms[id].storage = this->values;
this->uniforms[id].is_shader_storage =
current_var->is_in_shader_storage_block();
/* Do not assign storage if the uniform is builtin */
if (!this->uniforms[id].builtin &&
!this->uniforms[id].is_shader_storage)
this->uniforms[id].storage = this->values;
if (this->buffer_block_index != -1) {
this->uniforms[id].block_index = this->buffer_block_index;
@@ -819,7 +822,9 @@ private:
this->uniforms[id].row_major = false;
}
this->values += values_for_type(type);
if (!this->uniforms[id].builtin &&
!this->uniforms[id].is_shader_storage)
this->values += values_for_type(type);
}
/**
@@ -1251,7 +1256,8 @@ link_assign_uniform_locations(struct gl_shader_program *prog,
#ifndef NDEBUG
for (unsigned i = 0; i < num_uniforms; i++) {
assert(uniforms[i].storage != NULL || uniforms[i].builtin);
assert(uniforms[i].storage != NULL || uniforms[i].builtin ||
uniforms[i].is_shader_storage);
}
assert(parcel.values == data_end);


@@ -397,15 +397,15 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
unsigned slot_limit = idx + num_elements;
unsigned last_comp;
if (var->type->without_array()->is_record()) {
if (type->without_array()->is_record()) {
/* The component qualifier can't be used on structs so just treat
* all component slots as used.
*/
last_comp = 4;
} else {
unsigned dmul = var->type->is_double() ? 2 : 1;
unsigned dmul = type->without_array()->is_64bit() ? 2 : 1;
last_comp = var->data.location_frac +
var->type->without_array()->vector_elements * dmul;
type->without_array()->vector_elements * dmul;
}
while (idx < slot_limit) {
@@ -425,7 +425,7 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
for (unsigned j = 0; j < 4; j++) {
if (explicit_locations[idx][j] &&
(explicit_locations[idx][j]->type->without_array()
->base_type != var->type->without_array()->base_type)) {
->base_type != type->without_array()->base_type)) {
linker_error(prog,
"Varyings sharing the same location must "
"have the same underlying numerical type. "
@@ -443,7 +443,7 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
* worry about components beginning at anything other than 0 as
* the spec does not allow this for dvec3 and dvec4.
*/
if (i == 3 && last_comp > 4) {
if (i == 4 && last_comp > 4) {
last_comp = last_comp - 4;
/* Bump location index and reset the component index */
idx++;
@@ -708,7 +708,7 @@ tfeedback_decl::assign_location(struct gl_context *ctx,
+ this->matched_candidate->toplevel_var->data.location_frac
+ this->matched_candidate->offset;
const unsigned dmul =
this->matched_candidate->type->without_array()->is_double() ? 2 : 1;
this->matched_candidate->type->without_array()->is_64bit() ? 2 : 1;
if (this->matched_candidate->type->is_array()) {
/* Array variable */
@@ -886,7 +886,7 @@ tfeedback_decl::store(struct gl_context *ctx, struct gl_shader_program *prog,
}
if (explicit_stride && explicit_stride[buffer]) {
if (this->is_double() && info->Buffers[buffer].Stride % 2) {
if (this->is_64bit() && info->Buffers[buffer].Stride % 2) {
linker_error(prog, "invalid qualifier xfb_stride=%d must be a "
"multiple of 8 as its applied to a type that is or "
"contains a double.",
@@ -1937,7 +1937,7 @@ canonicalize_shader_io(exec_list *ir, enum ir_variable_mode io_mode)
* 64 bit map. Per-vertex and per-patch both have separate location domains
* with a max of MAX_VARYING.
*/
static uint64_t
uint64_t
reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode)
{
assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
@@ -1999,7 +1999,8 @@ assign_varying_locations(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *producer, gl_shader *consumer,
unsigned num_tfeedback_decls,
tfeedback_decl *tfeedback_decls)
tfeedback_decl *tfeedback_decls,
const uint64_t reserved_slots)
{
/* Tessellation shaders treat inputs and outputs as shared memory and can
* access inputs and outputs of other invocations.
@@ -2177,10 +2178,6 @@ assign_varying_locations(struct gl_context *ctx,
}
}
const uint64_t reserved_slots =
reserved_varying_slot(producer, ir_var_shader_out) |
reserved_varying_slot(consumer, ir_var_shader_in);
const unsigned slots_used = matches.assign_locations(prog, reserved_slots);
matches.store_locations();
@@ -2263,14 +2260,16 @@ assign_varying_locations(struct gl_context *ctx,
bool
check_against_output_limit(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *producer)
gl_shader *producer,
unsigned num_explicit_locations)
{
unsigned output_vectors = 0;
unsigned output_vectors = num_explicit_locations;
foreach_in_list(ir_instruction, node, producer->ir) {
ir_variable *const var = node->as_variable();
if (var && var->data.mode == ir_var_shader_out &&
if (var && !var->data.explicit_location &&
var->data.mode == ir_var_shader_out &&
var_counts_against_varying_limit(producer->Stage, var)) {
/* outputs for fragment shader can't be doubles */
output_vectors += var->type->count_attribute_slots(false);
@@ -2305,14 +2304,16 @@ check_against_output_limit(struct gl_context *ctx,
bool
check_against_input_limit(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *consumer)
gl_shader *consumer,
unsigned num_explicit_locations)
{
unsigned input_vectors = 0;
unsigned input_vectors = num_explicit_locations;
foreach_in_list(ir_instruction, node, consumer->ir) {
ir_variable *const var = node->as_variable();
if (var && var->data.mode == ir_var_shader_in &&
if (var && !var->data.explicit_location &&
var->data.mode == ir_var_shader_in &&
var_counts_against_varying_limit(consumer->Stage, var)) {
/* vertex inputs aren't varying counted */
input_vectors += var->type->count_attribute_slots(false);


@@ -151,7 +151,7 @@ public:
return this->size;
else
return this->vector_elements * this->matrix_columns * this->size *
(this->is_double() ? 2 : 1);
(this->is_64bit() ? 2 : 1);
}
unsigned get_location() const {
@@ -160,7 +160,7 @@ public:
private:
bool is_double() const
bool is_64bit() const
{
switch (this->type) {
case GL_DOUBLE:
@@ -320,16 +320,22 @@ assign_varying_locations(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *producer, gl_shader *consumer,
unsigned num_tfeedback_decls,
tfeedback_decl *tfeedback_decls);
tfeedback_decl *tfeedback_decls,
const uint64_t reserved_slots);
uint64_t
reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode);
bool
check_against_output_limit(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *producer);
gl_shader *producer,
unsigned num_explicit_locations);
bool
check_against_input_limit(struct gl_context *ctx,
struct gl_shader_program *prog,
gl_shader *consumer);
gl_shader *consumer,
unsigned num_explicit_locations);
#endif /* GLSL_LINK_VARYINGS_H */


@@ -2863,7 +2863,7 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
* issue (3) of the GL_ARB_vertex_attrib_64bit behavior, this
* is optional behavior, but it seems preferable.
*/
if (var->type->without_array()->is_dual_slot_double())
if (var->type->without_array()->is_dual_slot())
double_storage_locations |= (use_mask << attr);
}
@@ -2940,7 +2940,7 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
to_assign[i].var->data.is_unmatched_generic_inout = 0;
used_locations |= (use_mask << location);
if (to_assign[i].var->type->without_array()->is_dual_slot_double())
if (to_assign[i].var->type->without_array()->is_dual_slot())
double_storage_locations |= (use_mask << location);
}
@@ -4850,9 +4850,12 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
*/
if (last < MESA_SHADER_FRAGMENT &&
(num_tfeedback_decls != 0 || prog->SeparateShader)) {
const uint64_t reserved_out_slots =
reserved_varying_slot(prog->_LinkedShaders[last], ir_var_shader_out);
if (!assign_varying_locations(ctx, mem_ctx, prog,
prog->_LinkedShaders[last], NULL,
num_tfeedback_decls, tfeedback_decls))
num_tfeedback_decls, tfeedback_decls,
reserved_out_slots))
goto done;
}
@@ -4870,6 +4873,9 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
gl_shader *const sh = prog->_LinkedShaders[last];
if (prog->SeparateShader) {
const uint64_t reserved_slots =
reserved_varying_slot(sh, ir_var_shader_in);
/* Assign input locations for SSO, output locations are already
* assigned.
*/
@@ -4877,7 +4883,8 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
NULL /* producer */,
sh /* consumer */,
0 /* num_tfeedback_decls */,
NULL /* tfeedback_decls */))
NULL /* tfeedback_decls */,
reserved_slots))
goto done;
}
@@ -4898,9 +4905,15 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
gl_shader *const sh_i = prog->_LinkedShaders[i];
gl_shader *const sh_next = prog->_LinkedShaders[next];
const uint64_t reserved_out_slots =
reserved_varying_slot(sh_i, ir_var_shader_out);
const uint64_t reserved_in_slots =
reserved_varying_slot(sh_next, ir_var_shader_in);
if (!assign_varying_locations(ctx, mem_ctx, prog, sh_i, sh_next,
next == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
tfeedback_decls))
tfeedback_decls,
reserved_out_slots | reserved_in_slots))
goto done;
do_dead_builtin_varyings(ctx, sh_i, sh_next,
@@ -4909,11 +4922,14 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
/* This must be done after all dead varyings are eliminated. */
if (sh_i != NULL) {
if (!check_against_output_limit(ctx, prog, sh_i)) {
unsigned slots_used = _mesa_bitcount_64(reserved_out_slots);
if (!check_against_output_limit(ctx, prog, sh_i, slots_used)) {
goto done;
}
}
if (!check_against_input_limit(ctx, prog, sh_next))
unsigned slots_used = _mesa_bitcount_64(reserved_in_slots);
if (!check_against_input_limit(ctx, prog, sh_next, slots_used))
goto done;
next = i;
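In the hunks above, the reserved varying slots are computed once as 64-bit masks, OR-ed together for location assignment, and then bit-counted so the limit checks can start from the number of explicitly located slots. A small sketch of that mask arithmetic, assuming one bit per slot (the `demo_` helpers are hypothetical; how Mesa actually fills the masks is not shown in this diff):

```cpp
#include <bitset>
#include <cstdint>

// Mark `num_slots` consecutive slots starting at `first_slot` as reserved.
// Slots beyond bit 63 are ignored in this sketch.
static uint64_t demo_reserve_slots(unsigned first_slot, unsigned num_slots)
{
   uint64_t mask = 0;
   for (unsigned i = 0; i < num_slots; i++) {
      const unsigned slot = first_slot + i;
      if (slot < 64)
         mask |= UINT64_C(1) << slot;
   }
   return mask;
}

// Count reserved slots, the role _mesa_bitcount_64() plays in the hunk.
static unsigned demo_count_slots(uint64_t mask)
{
   return static_cast<unsigned>(std::bitset<64>(mask).count());
}
```

Keeping the producer and consumer masks separate lets the output check and the input check each count only their own side, while the combined mask is still available for assignment.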


@@ -156,7 +156,7 @@ protected:
*/
virtual void visit_field(const glsl_type *type, const char *name,
bool row_major, const glsl_type *record_type,
const unsigned packing,
const enum glsl_interface_packing packing,
bool last_field);
/**
@@ -180,10 +180,10 @@ protected:
virtual void visit_field(const glsl_struct_field *field);
virtual void enter_record(const glsl_type *type, const char *name,
bool row_major, const unsigned packing);
bool row_major, const enum glsl_interface_packing packing);
virtual void leave_record(const glsl_type *type, const char *name,
bool row_major, const unsigned packing);
bool row_major, const enum glsl_interface_packing packing);
virtual void set_buffer_offset(unsigned offset);
@@ -199,7 +199,7 @@ private:
*/
void recursion(const glsl_type *t, char **name, size_t name_length,
bool row_major, const glsl_type *record_type,
const unsigned packing,
const enum glsl_interface_packing packing,
bool last_field, unsigned record_array_count,
const glsl_struct_field *named_ifc_member);
};


@@ -114,7 +114,7 @@ lower_buffer_access::emit_access(void *mem_ctx,
/* For a row-major matrix, the next column starts at the next
* element.
*/
int size_mul = deref->type->is_double() ? 8 : 4;
int size_mul = deref->type->is_64bit() ? 8 : 4;
emit_access(mem_ctx, is_write, col_deref, base_offset,
deref_offset + i * size_mul,
row_major, deref->type->matrix_columns, packing,
@@ -125,7 +125,7 @@ lower_buffer_access::emit_access(void *mem_ctx,
/* std430 doesn't round up vec2 size to a vec4 size */
if (packing == GLSL_INTERFACE_PACKING_STD430 &&
deref->type->vector_elements == 2 &&
!deref->type->is_double()) {
!deref->type->is_64bit()) {
size_mul = 8;
} else {
/* std140 always rounds the stride of arrays (and matrices) to a
@@ -137,7 +137,7 @@ lower_buffer_access::emit_access(void *mem_ctx,
* machine units, the base alignment is 4N. For vec4, base
* alignment is 4N.
*/
size_mul = (deref->type->is_double() &&
size_mul = (deref->type->is_64bit() &&
deref->type->vector_elements > 2) ? 32 : 16;
}
@@ -159,7 +159,7 @@ lower_buffer_access::emit_access(void *mem_ctx,
is_write ? write_mask : (1 << deref->type->vector_elements) - 1;
insert_buffer_access(mem_ctx, deref, deref->type, offset, mask, -1);
} else {
unsigned N = deref->type->is_double() ? 8 : 4;
unsigned N = deref->type->is_64bit() ? 8 : 4;
/* We're dereffing a column out of a row-major matrix, so we
* gather the vector from each stored row.
@@ -328,7 +328,7 @@ lower_buffer_access::setup_buffer_access(void *mem_ctx,
bool *row_major,
int *matrix_columns,
const glsl_struct_field **struct_field,
unsigned packing)
enum glsl_interface_packing packing)
{
*offset = new(mem_ctx) ir_constant(0u);
*row_major = is_dereferenced_thing_row_major(deref);
@@ -358,7 +358,7 @@ lower_buffer_access::setup_buffer_access(void *mem_ctx,
* thread or SIMD channel is modifying the same vector.
*/
array_stride = 4;
if (deref_array->array->type->is_double())
if (deref_array->array->type->is_64bit())
array_stride *= 2;
} else if (deref_array->array->type->is_matrix() && *row_major) {
/* When loading a vector out of a row major matrix, the
@@ -367,7 +367,7 @@ lower_buffer_access::setup_buffer_access(void *mem_ctx,
* vector) is handled below in emit_ubo_loads.
*/
array_stride = 4;
if (deref_array->array->type->is_double())
if (deref_array->array->type->is_64bit())
array_stride *= 2;
*matrix_columns = deref_array->array->type->matrix_columns;
} else if (deref_array->type->without_array()->is_interface()) {
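The `size_mul` selection in the second and third hunks above can be read as a small pure function of the packing, the vector width, and whether the base type is 64-bit. The sketch below mirrors only the branch visible in this diff; the `demo_` enum and function are hypothetical names, and the surrounding row-major handling is left out:

```cpp
enum demo_packing { DEMO_PACKING_STD140, DEMO_PACKING_STD430 };

// Stride in bytes between consecutive columns, following the visible branch:
//  - std430 keeps a 32-bit two-component vector at 8 bytes;
//  - otherwise the stride is rounded to 16 bytes, or 32 bytes for 64-bit
//    vectors with more than two components.
static int demo_column_stride(demo_packing packing, unsigned vector_elements,
                              bool is_64bit)
{
   if (packing == DEMO_PACKING_STD430 && vector_elements == 2 && !is_64bit)
      return 8;

   return (is_64bit && vector_elements > 2) ? 32 : 16;
}
```

A 64-bit two-component vector already occupies 16 bytes, which is why it falls through to the 16-byte case under either packing.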


@@ -58,7 +58,7 @@ public:
ir_rvalue **offset, unsigned *const_offset,
bool *row_major, int *matrix_columns,
const glsl_struct_field **struct_field,
unsigned packing);
enum glsl_interface_packing packing);
};
} /* namespace lower_buffer_access */


@@ -432,7 +432,7 @@ lower_packed_varyings_visitor::lower_rvalue(ir_rvalue *rvalue,
bool gs_input_toplevel,
unsigned vertex_index)
{
unsigned dmul = rvalue->type->is_double() ? 2 : 1;
unsigned dmul = rvalue->type->is_64bit() ? 2 : 1;
/* When gs_input_toplevel is set, we should be looking at a geometry shader
* input array.
*/
@@ -480,7 +480,7 @@ lower_packed_varyings_visitor::lower_rvalue(ir_rvalue *rvalue,
char right_swizzle_name[4] = { 0, 0, 0, 0 };
left_components = 4 - fine_location % 4;
if (rvalue->type->is_double()) {
if (rvalue->type->is_64bit()) {
/* We might actually end up with 0 left components! */
left_components /= 2;
}
@@ -676,7 +676,7 @@ lower_packed_varyings_visitor::needs_lowering(ir_variable *var)
return false;
type = type->without_array();
if (type->vector_elements == 4 && !type->is_double())
if (type->vector_elements == 4 && !type->is_64bit())
return false;
return true;
}


@@ -138,7 +138,7 @@ lower_shared_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
bool row_major;
int matrix_columns;
assert(var->get_interface_type() == NULL);
const unsigned packing = GLSL_INTERFACE_PACKING_STD430;
const enum glsl_interface_packing packing = GLSL_INTERFACE_PACKING_STD430;
setup_buffer_access(mem_ctx, var, deref,
&offset, &const_offset,
@@ -206,7 +206,7 @@ lower_shared_reference_visitor::handle_assignment(ir_assignment *ir)
bool row_major;
int matrix_columns;
assert(var->get_interface_type() == NULL);
const unsigned packing = GLSL_INTERFACE_PACKING_STD430;
const enum glsl_interface_packing packing = GLSL_INTERFACE_PACKING_STD430;
setup_buffer_access(mem_ctx, var, deref,
&offset, &const_offset,
@@ -365,7 +365,7 @@ lower_shared_reference_visitor::lower_shared_atomic_intrinsic(ir_call *ir)
bool row_major;
int matrix_columns;
assert(var->get_interface_type() == NULL);
const unsigned packing = GLSL_INTERFACE_PACKING_STD430;
const enum glsl_interface_packing packing = GLSL_INTERFACE_PACKING_STD430;
buffer_access_type = shared_atomic_access;
setup_buffer_access(mem_ctx, var, deref,


@@ -61,7 +61,7 @@ public:
unsigned *const_offset,
bool *row_major,
int *matrix_columns,
unsigned packing);
enum glsl_interface_packing packing);
uint32_t ssbo_access_params();
ir_expression *ubo_load(void *mem_ctx, const struct glsl_type *type,
ir_rvalue *offset);
@@ -99,7 +99,7 @@ public:
ir_expression *emit_ssbo_get_buffer_size(void *mem_ctx);
unsigned calculate_unsized_array_stride(ir_dereference *deref,
unsigned packing);
enum glsl_interface_packing packing);
ir_call *lower_ssbo_atomic_intrinsic(ir_call *ir);
ir_call *check_for_ssbo_atomic_intrinsic(ir_call *ir);
@@ -273,7 +273,7 @@ lower_ubo_reference_visitor::setup_for_load_or_store(void *mem_ctx,
unsigned *const_offset,
bool *row_major,
int *matrix_columns,
unsigned packing)
enum glsl_interface_packing packing)
{
/* Determine the name of the interface block */
ir_rvalue *nonconst_block_index;
@@ -344,7 +344,7 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
unsigned const_offset;
bool row_major;
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
enum glsl_interface_packing packing = var->get_interface_type_packing();
this->buffer_access_type =
var->is_in_shader_storage_block() ?
@@ -557,7 +557,7 @@ lower_ubo_reference_visitor::write_to_memory(void *mem_ctx,
unsigned const_offset;
bool row_major;
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
enum glsl_interface_packing packing = var->get_interface_type_packing();
this->buffer_access_type = ssbo_store_access;
this->variable = var;
@@ -666,7 +666,7 @@ lower_ubo_reference_visitor::emit_ssbo_get_buffer_size(void *mem_ctx)
unsigned
lower_ubo_reference_visitor::calculate_unsized_array_stride(ir_dereference *deref,
unsigned packing)
enum glsl_interface_packing packing)
{
unsigned array_stride = 0;
@@ -736,7 +736,7 @@ lower_ubo_reference_visitor::process_ssbo_unsized_array_length(ir_rvalue **rvalu
unsigned const_offset;
bool row_major;
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
enum glsl_interface_packing packing = var->get_interface_type_packing();
int unsized_array_stride = calculate_unsized_array_stride(deref, packing);
this->buffer_access_type = ssbo_unsized_array_length_access;
@@ -970,7 +970,7 @@ lower_ubo_reference_visitor::lower_ssbo_atomic_intrinsic(ir_call *ir)
unsigned const_offset;
bool row_major;
int matrix_columns;
unsigned packing = var->get_interface_type()->interface_packing;
enum glsl_interface_packing packing = var->get_interface_type_packing();
this->buffer_access_type = ssbo_atomic_access;
this->variable = var;

@@ -83,6 +83,7 @@ public:
}
virtual ir_visitor_status visit(class ir_dereference_variable *);
void handle_loop(class ir_loop *, bool keep_acp);
virtual ir_visitor_status visit_enter(class ir_loop *);
virtual ir_visitor_status visit_enter(class ir_function_signature *);
virtual ir_visitor_status visit_enter(class ir_function *);
@@ -252,21 +253,24 @@ ir_copy_propagation_visitor::visit_enter(ir_if *ir)
return visit_continue_with_parent;
}
ir_visitor_status
ir_copy_propagation_visitor::visit_enter(ir_loop *ir)
void
ir_copy_propagation_visitor::handle_loop(ir_loop *ir, bool keep_acp)
{
exec_list *orig_acp = this->acp;
exec_list *orig_kills = this->kills;
bool orig_killed_all = this->killed_all;
/* FINISHME: For now, the initial acp for loops is totally empty.
* We could go through once, then go through again with the acp
* cloned minus the killed entries after the first run through.
*/
this->acp = new(mem_ctx) exec_list;
this->kills = new(mem_ctx) exec_list;
this->killed_all = false;
if (keep_acp) {
/* Populate the initial acp with a copy of the original */
foreach_in_list(acp_entry, a, orig_acp) {
this->acp->push_tail(new(this->acp) acp_entry(a->lhs, a->rhs));
}
}
visit_list_elements(this, &ir->body_instructions);
if (this->killed_all) {
@@ -284,6 +288,20 @@ ir_copy_propagation_visitor::visit_enter(ir_loop *ir)
}
ralloc_free(new_kills);
}
ir_visitor_status
ir_copy_propagation_visitor::visit_enter(ir_loop *ir)
{
/* Make a conservative first pass over the loop with an empty ACP set.
* This also removes any killed entries from the original ACP set.
*/
handle_loop(ir, false);
/* Then, run it again with the real ACP set, minus any killed entries.
* This takes care of propagating values from before the loop into it.
*/
handle_loop(ir, true);
/* already descended into the children. */
return visit_continue_with_parent;
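
The two-pass loop handling above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the Mesa pass: a loop body is a list of `(lhs, rhs)` copies, the ACP (available copy set) is a plain dict, and the names `handle_loop` and `visit_loop` merely mirror the C++ methods. Python is used for brevity.

```python
def handle_loop(body, outer_acp, keep_acp):
    """Process one loop body.

    body      -- list of (lhs, rhs) assignments, all names are strings
    outer_acp -- copies known before the loop, as {copy: source}
    keep_acp  -- start from the outer ACP (True) or from an empty one (False)

    Returns the rewritten body and the outer ACP minus killed entries.
    """
    acp = dict(outer_acp) if keep_acp else {}
    kills = set()
    new_body = []
    for lhs, rhs in body:
        rhs = acp.get(rhs, rhs)  # propagate a known copy into the use
        # Writing lhs invalidates every entry that mentions it.
        acp = {d: s for d, s in acp.items() if lhs not in (d, s)}
        kills.add(lhs)
        if rhs.isidentifier() and rhs != lhs:
            acp[lhs] = rhs  # lhs is now a copy of rhs
        new_body.append((lhs, rhs))
    surviving = {d: s for d, s in outer_acp.items()
                 if d not in kills and s not in kills}
    return new_body, surviving


def visit_loop(body, outer_acp):
    # Pass 1: conservative, empty ACP; only used here to find the kills.
    _, outer_acp = handle_loop(body, outer_acp, keep_acp=False)
    # Pass 2: start from the pre-loop copies that the loop does not kill.
    return handle_loop(body, outer_acp, keep_acp=True)
```

With `b = a` established before the loop, a body of `c = b; d = c` is rewritten to read `a` directly, whereas a body that also writes `a` leaves `c = b` untouched because the first pass removes the `b -> a` entry.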

@@ -106,6 +106,7 @@ public:
ralloc_free(mem_ctx);
}
void handle_loop(ir_loop *, bool keep_acp);
virtual ir_visitor_status visit_enter(class ir_loop *);
virtual ir_visitor_status visit_enter(class ir_function_signature *);
virtual ir_visitor_status visit_leave(class ir_assignment *);
@@ -374,8 +375,8 @@ ir_copy_propagation_elements_visitor::visit_enter(ir_if *ir)
return visit_continue_with_parent;
}
ir_visitor_status
ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
void
ir_copy_propagation_elements_visitor::handle_loop(ir_loop *ir, bool keep_acp)
{
exec_list *orig_acp = this->acp;
exec_list *orig_kills = this->kills;
@@ -389,6 +390,13 @@ ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
this->kills = new(mem_ctx) exec_list;
this->killed_all = false;
if (keep_acp) {
/* Populate the initial acp with a copy of the original */
foreach_in_list(acp_entry, a, orig_acp) {
this->acp->push_tail(new(this->acp) acp_entry(a));
}
}
visit_list_elements(this, &ir->body_instructions);
if (this->killed_all) {
@@ -406,6 +414,13 @@ ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
}
ralloc_free(new_kills);
}
ir_visitor_status
ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
{
handle_loop(ir, false);
handle_loop(ir, true);
/* already descended into the children. */
return visit_continue_with_parent;

@@ -144,7 +144,7 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned)
* layouts, do not eliminate it.
*/
if (entry->var->is_in_buffer_block()) {
if (entry->var->get_interface_type()->interface_packing !=
if (entry->var->get_interface_type_packing() !=
GLSL_INTERFACE_PACKING_PACKED)
continue;
}

@@ -1434,7 +1434,7 @@ glsl_type::can_implicitly_convert_to(const glsl_type *desired,
unsigned
glsl_type::std140_base_alignment(bool row_major) const
{
unsigned N = is_double() ? 8 : 4;
unsigned N = is_64bit() ? 8 : 4;
/* (1) If the member is a scalar consuming <N> basic machine units, the
* base alignment is <N>.
@@ -1552,7 +1552,7 @@ glsl_type::std140_base_alignment(bool row_major) const
unsigned
glsl_type::std140_size(bool row_major) const
{
unsigned N = is_double() ? 8 : 4;
unsigned N = is_64bit() ? 8 : 4;
/* (1) If the member is a scalar consuming <N> basic machine units, the
* base alignment is <N>.
@@ -1689,7 +1689,7 @@ unsigned
glsl_type::std430_base_alignment(bool row_major) const
{
unsigned N = is_double() ? 8 : 4;
unsigned N = is_64bit() ? 8 : 4;
/* (1) If the member is a scalar consuming <N> basic machine units, the
* base alignment is <N>.
@@ -1798,7 +1798,7 @@ glsl_type::std430_base_alignment(bool row_major) const
unsigned
glsl_type::std430_array_stride(bool row_major) const
{
unsigned N = is_double() ? 8 : 4;
unsigned N = is_64bit() ? 8 : 4;
/* Notice that the array stride of a vec3 is not 3 * N but 4 * N.
* See OpenGL 4.30 spec, section 7.6.2.2 "Standard Uniform Block Layout"
@@ -1816,7 +1816,7 @@ glsl_type::std430_array_stride(bool row_major) const
unsigned
glsl_type::std430_size(bool row_major) const
{
unsigned N = is_double() ? 8 : 4;
unsigned N = is_64bit() ? 8 : 4;
/* OpenGL 4.30 spec, section 7.6.2.2 "Standard Uniform Block Layout":
*

@@ -64,6 +64,11 @@ enum glsl_base_type {
GLSL_TYPE_ERROR
};
static inline bool glsl_base_type_is_64bit(enum glsl_base_type type)
{
return type == GLSL_TYPE_DOUBLE;
}
enum glsl_sampler_dim {
GLSL_SAMPLER_DIM_1D = 0,
GLSL_SAMPLER_DIM_2D,
@@ -490,11 +495,19 @@ struct glsl_type {
}
/**
* Query whether a double takes two slots.
* Query whether a 64-bit type takes two slots.
*/
bool is_dual_slot_double() const
bool is_dual_slot() const
{
return base_type == GLSL_TYPE_DOUBLE && vector_elements > 2;
return is_64bit() && vector_elements > 2;
}
/**
* Query whether or not a type is 64-bit
*/
bool is_64bit() const
{
return glsl_base_type_is_64bit(base_type);
}
/**
@@ -745,6 +758,14 @@ struct glsl_type {
*/
bool record_compare(const glsl_type *b, bool match_locations = true) const;
/**
* Get the type interface packing.
*/
enum glsl_interface_packing get_interface_packing() const
{
return (enum glsl_interface_packing)interface_packing;
}
private:
static mtx_t mutex;
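
To make the effect of switching from `is_double()` to `is_64bit()` concrete, here is a rough sketch of the scalar and vector cases only. It assumes the usual std140 rules (scalar N, two-component vector 2N, three- or four-component vector 4N); the function names are hypothetical and the real `glsl_type` methods also handle arrays, matrices and structs.

```python
def base_alignment(is_64bit, vector_elements):
    """Base alignment in bytes for a scalar or vector member."""
    n = 8 if is_64bit else 4  # N: size of one component
    if vector_elements == 1:
        return n
    if vector_elements == 2:
        return 2 * n
    return 4 * n  # vec3 is aligned like vec4


def is_dual_slot(is_64bit, vector_elements):
    """Mirrors glsl_type::is_dual_slot(): 64-bit vec3/vec4 need two slots."""
    return is_64bit and vector_elements > 2
```

Under these assumptions a `dvec3` has a base alignment of 32 bytes and occupies two slots, while a `dvec2` fits in one.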

@@ -1651,6 +1651,9 @@ typedef struct nir_shader_compiler_options {
/* lower {slt,sge,seq,sne} to {flt,fge,feq,fne} + b2f: */
bool lower_scmp;
/** enables rules to lower idiv by power-of-two: */
bool lower_idiv;
/* Does the native fdot instruction replicate its result for four
* components? If so, then opt_algebraic_late will turn all fdotN
* instructions into fdot_replicatedN instructions.
@@ -1720,9 +1723,6 @@ typedef struct nir_shader_info {
/* Whether or not this shader ever uses textureGather() */
bool uses_texture_gather;
/** Whether or not this shader uses nir_intrinsic_interp_var_at_offset */
bool uses_interp_var_at_offset;
/* Whether or not this shader uses the gl_ClipDistance output */
bool uses_clip_distance_out;

@@ -76,6 +76,7 @@ class Value(object):
return Constant(val, name_base)
__template = mako.template.Template("""
#include "compiler/nir/nir_search_helpers.h"
static const ${val.c_type} ${val.name} = {
{ ${val.type_enum}, ${val.bit_size} },
% if isinstance(val, Constant):
@@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
${val.index}, /* ${val.var_name} */
${'true' if val.is_constant else 'false'},
${val.type() or 'nir_type_invalid' },
${val.cond if val.cond else 'NULL'},
% elif isinstance(val, Expression):
${'true' if val.inexact else 'false'},
nir_op_${val.opcode},
@@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
Variable=Variable,
Expression=Expression)
_constant_re = re.compile(r"(?P<value>[^@]+)(?:@(?P<bits>\d+))?")
_constant_re = re.compile(r"(?P<value>[^@\(]+)(?:@(?P<bits>\d+))?")
class Constant(Value):
def __init__(self, val, name):
@@ -150,7 +152,8 @@ class Constant(Value):
return "nir_type_float"
_var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
r"(?P<cond>\([^\)]+\))?")
class Variable(Value):
def __init__(self, val, name, varset):
@@ -161,6 +164,7 @@ class Variable(Value):
self.var_name = m.group('name')
self.is_constant = m.group('const') is not None
self.cond = m.group('cond')
self.required_type = m.group('type')
self.bit_size = int(m.group('bits')) if m.group('bits') else 0
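
As a quick check of the extended variable syntax, the pattern from the hunk above can be exercised on its own. The regular expression is copied from the diff; `parse_variable` is a hypothetical helper written for this illustration and is not part of nir_algebraic.py.

```python
import re

# Pattern copied from the diff above: "[#]name[@type][(cond)]".
_var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
                          r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
                          r"(?P<cond>\([^\)]+\))?")


def parse_variable(spec):
    """Hypothetical helper: split a variable spec into its parts."""
    m = _var_name_re.fullmatch(spec)
    if m is None:
        raise ValueError("not a variable spec: %r" % spec)
    return {
        "name": m.group("name"),
        "is_constant": m.group("const") is not None,
        "type": m.group("type"),
        "bit_size": int(m.group("bits")) if m.group("bits") else 0,
        "cond": m.group("cond"),  # keeps the surrounding parentheses
    }
```

Note that the `cond` group keeps its parentheses, so `'#b@32(is_pos_power_of_two)'` yields `'(is_pos_power_of_two)'`, which the template can emit into the generated C initializer as written.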

@@ -57,10 +57,6 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader *shader)
shader->info.gs.uses_end_primitive = 1;
break;
case nir_intrinsic_interp_var_at_offset:
shader->info.uses_interp_var_at_offset = 1;
break;
default:
break;
}

@@ -45,10 +45,11 @@ d = 'd'
# however, be used for backend-requested lowering operations as those need to
# happen regardless of precision.
#
# Variable names are specified as "[#]name[@type]" where "#" indicates that
# the given variable will only match constants and the type indicates that
# Variable names are specified as "[#]name[@type][(cond)]" where "#" indicates
# that the given variable will only match constants and the type indicates that
# the given variable will only match values from ALU instructions with the
# given output type.
# given output type, and (cond) specifies an additional condition function
# (see nir_search_helpers.h).
#
# For constants, you have to be careful to make sure that it is the right
# type because python is unaware of the source and destination types of the
@@ -62,6 +63,14 @@ d = 'd'
# constructed value should have that bit-size.
optimizations = [
(('imul', a, '#b@32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b))),
(('imul', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('ishl', a, ('find_lsb', ('iabs', b))))),
(('udiv', a, '#b@32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b))),
(('idiv', a, '#b@32(is_pos_power_of_two)'), ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),
(('idiv', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))), 'options->lower_idiv'),
(('umod', a, '#b(is_pos_power_of_two)'), ('iand', a, ('isub', b, 1))),
(('fneg', ('fneg', a)), a),
(('ineg', ('ineg', a)), a),
(('fabs', ('fabs', a)), ('fabs', a)),
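
The power-of-two rules added above rest on a few integer identities, which can be checked with a short sketch. It uses unbounded Python integers, so 32-bit wrap-around is deliberately ignored; `trunc_div` models C-style division that rounds toward zero, and every function name here is illustrative rather than a NIR opcode implementation.

```python
def find_lsb(x):
    """Index of the least significant set bit; x must be non-zero."""
    return (x & -x).bit_length() - 1


def is_pos_power_of_two(x):
    return x > 0 and (x & (x - 1)) == 0


def sign(a):
    return (a > 0) - (a < 0)


def trunc_div(a, b):
    """Reference integer division rounding toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def lowered_imul(a, b):
    # imul a, +2^k -> ishl a, k ; imul a, -2^k -> ineg(ishl a, k)
    r = a << find_lsb(abs(b))
    return r if b > 0 else -r


def lowered_umod(a, b):
    # umod a, 2^k -> iand a, (2^k - 1), for non-negative a
    return a & (b - 1)


def lowered_idiv(a, b):
    # idiv a, +/-2^k -> isign(a) * ushr(iabs(a), k), negated for negative b
    r = sign(a) * (abs(a) >> find_lsb(abs(b)))
    return r if b > 0 else -r
```

The `idiv` form goes through `iabs` and `isign` because a plain arithmetic shift rounds toward negative infinity, while integer division rounds toward zero.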

@@ -127,6 +127,9 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src,
instr->src[src].src.ssa->parent_instr->type != nir_instr_type_load_const)
return false;
if (var->cond && !var->cond(instr, src, num_components, new_swizzle))
return false;
if (var->type != nir_type_invalid) {
if (instr->src[src].src.ssa->parent_instr->type != nir_instr_type_alu)
return false;

@@ -68,6 +68,16 @@ typedef struct {
* never match anything.
*/
nir_alu_type type;
/** Optional condition fxn ptr
*
* This is only allowed in search expressions, and allows additional
* constraints to be placed on the match. Typically used for 'is_constant'
* variables to require, for example, power-of-two in order for the search
* to match.
*/
bool (*cond)(nir_alu_instr *instr, unsigned src,
unsigned num_components, const uint8_t *swizzle);
} nir_search_variable;
typedef struct {

@@ -0,0 +1,94 @@
/*
* Copyright © 2016 Red Hat
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*
* Authors:
* Rob Clark <robclark@freedesktop.org>
*/
#ifndef _NIR_SEARCH_HELPERS_
#define _NIR_SEARCH_HELPERS_
#include "nir.h"
static inline bool
__is_power_of_two(unsigned int x)
{
return ((x != 0) && !(x & (x - 1)));
}
static inline bool
is_pos_power_of_two(nir_alu_instr *instr, unsigned src, unsigned num_components,
const uint8_t *swizzle)
{
nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
/* only constant src's: */
if (!val)
return false;
for (unsigned i = 0; i < num_components; i++) {
switch (nir_op_infos[instr->op].input_types[src]) {
case nir_type_int:
if (val->i32[swizzle[i]] < 0)
return false;
if (!__is_power_of_two(val->i32[swizzle[i]]))
return false;
break;
case nir_type_uint:
if (!__is_power_of_two(val->u32[swizzle[i]]))
return false;
break;
default:
return false;
}
}
return true;
}
static inline bool
is_neg_power_of_two(nir_alu_instr *instr, unsigned src, unsigned num_components,
const uint8_t *swizzle)
{
nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
/* only constant src's: */
if (!val)
return false;
for (unsigned i = 0; i < num_components; i++) {
switch (nir_op_infos[instr->op].input_types[src]) {
case nir_type_int:
if (val->i32[swizzle[i]] > 0)
return false;
if (!__is_power_of_two(abs(val->i32[swizzle[i]])))
return false;
break;
default:
return false;
}
}
return true;
}
#endif /* _NIR_SEARCH_HELPERS_ */

@@ -61,12 +61,6 @@ ifeq ($(shell echo "$(MESA_ANDROID_VERSION) >= 4.2" | bc),1)
LOCAL_SHARED_LIBRARIES += libsync
endif
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm
endif
ifeq ($(strip $(MESA_BUILD_CLASSIC)),true)
# require i915_dri and/or i965_dri
LOCAL_REQUIRED_MODULES += \

@@ -160,8 +160,14 @@ droid_window_dequeue_buffer(struct dri2_egl_surface *dri2_surf)
}
static EGLBoolean
droid_window_enqueue_buffer(struct dri2_egl_surface *dri2_surf)
droid_window_enqueue_buffer(_EGLDisplay *disp, struct dri2_egl_surface *dri2_surf)
{
/* To avoid blocking other EGL calls, release the display mutex before
* we enter droid_window_enqueue_buffer() and re-acquire the mutex upon
* return.
*/
mtx_unlock(&disp->Mutex);
#if ANDROID_VERSION >= 0x0402
/* Queue the buffer without a sync fence. This informs the ANativeWindow
* that it may access the buffer immediately.
@@ -185,14 +191,15 @@ droid_window_enqueue_buffer(struct dri2_egl_surface *dri2_surf)
dri2_surf->buffer->common.decRef(&dri2_surf->buffer->common);
dri2_surf->buffer = NULL;
mtx_lock(&disp->Mutex);
return EGL_TRUE;
}
static void
droid_window_cancel_buffer(struct dri2_egl_surface *dri2_surf)
droid_window_cancel_buffer(_EGLDisplay *disp, struct dri2_egl_surface *dri2_surf)
{
/* no cancel buffer? */
droid_window_enqueue_buffer(dri2_surf);
droid_window_enqueue_buffer(disp, dri2_surf);
}
static __DRIbuffer *
@@ -325,7 +332,7 @@ droid_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf)
if (dri2_surf->base.Type == EGL_WINDOW_BIT) {
if (dri2_surf->buffer)
droid_window_cancel_buffer(dri2_surf);
droid_window_cancel_buffer(disp, dri2_surf);
dri2_surf->window->common.decRef(&dri2_surf->window->common);
}
@@ -435,7 +442,7 @@ droid_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
dri2_flush_drawable_for_swapbuffers(disp, draw);
if (dri2_surf->buffer)
droid_window_enqueue_buffer(dri2_surf);
droid_window_enqueue_buffer(disp, dri2_surf);
(*dri2_dpy->flush->invalidate)(dri2_surf->dri_drawable);

@@ -236,6 +236,12 @@ _eglParseSurfaceAttribList(_EGLSurface *surf, const EGLint *attrib_list)
}
if (type == EGL_PBUFFER_BIT) {
if (tex_target == -1)
tex_target = surf->TextureTarget;
if (tex_format == -1)
tex_format = surf->TextureFormat;
if ((tex_target == EGL_NO_TEXTURE && tex_format != EGL_NO_TEXTURE) ||
(tex_format == EGL_NO_TEXTURE && tex_target != EGL_NO_TEXTURE)) {
err = EGL_BAD_MATCH;

@@ -288,8 +288,6 @@ C_SOURCES := \
util/u_slab.h \
util/u_split_prim.h \
util/u_sse.h \
util/u_staging.c \
util/u_staging.h \
util/u_string.h \
util/u_suballoc.c \
util/u_suballoc.h \

@@ -91,6 +91,9 @@ struct cso_context {
struct pipe_constant_buffer aux_constbuf_current[PIPE_SHADER_TYPES];
struct pipe_constant_buffer aux_constbuf_saved[PIPE_SHADER_TYPES];
struct pipe_image_view fragment_image0_current;
struct pipe_image_view fragment_image0_saved;
unsigned nr_so_targets;
struct pipe_stream_output_target *so_targets[PIPE_MAX_SO_BUFFERS];
@@ -371,6 +374,9 @@ void cso_destroy_context( struct cso_context *ctx )
pipe_resource_reference(&ctx->aux_constbuf_saved[i].buffer, NULL);
}
pipe_resource_reference(&ctx->fragment_image0_current.resource, NULL);
pipe_resource_reference(&ctx->fragment_image0_saved.resource, NULL);
for (i = 0; i < PIPE_MAX_SO_BUFFERS; i++) {
pipe_so_target_reference(&ctx->so_targets[i], NULL);
pipe_so_target_reference(&ctx->so_targets_saved[i], NULL);
@@ -1352,6 +1358,35 @@ cso_restore_fragment_sampler_views(struct cso_context *ctx)
}
void
cso_set_shader_images(struct cso_context *ctx, unsigned shader_stage,
unsigned start, unsigned count,
struct pipe_image_view *images)
{
if (shader_stage == PIPE_SHADER_FRAGMENT && start == 0 && count >= 1) {
util_copy_image_view(&ctx->fragment_image0_current, &images[0]);
}
ctx->pipe->set_shader_images(ctx->pipe, shader_stage, start, count, images);
}
static void
cso_save_fragment_image0(struct cso_context *ctx)
{
util_copy_image_view(&ctx->fragment_image0_saved,
&ctx->fragment_image0_current);
}
static void
cso_restore_fragment_image0(struct cso_context *ctx)
{
cso_set_shader_images(ctx, PIPE_SHADER_FRAGMENT, 0, 1,
&ctx->fragment_image0_saved);
}
void
cso_set_stream_outputs(struct cso_context *ctx,
unsigned num_targets,
@@ -1541,6 +1576,8 @@ cso_save_state(struct cso_context *cso, unsigned state_mask)
cso_save_viewport(cso);
if (state_mask & CSO_BIT_PAUSE_QUERIES)
cso->pipe->set_active_query_state(cso->pipe, false);
if (state_mask & CSO_BIT_FRAGMENT_IMAGE0)
cso_save_fragment_image0(cso);
}
@@ -1594,6 +1631,8 @@ cso_restore_state(struct cso_context *cso)
cso_restore_viewport(cso);
if (state_mask & CSO_BIT_PAUSE_QUERIES)
cso->pipe->set_active_query_state(cso->pipe, true);
if (state_mask & CSO_BIT_FRAGMENT_IMAGE0)
cso_restore_fragment_image0(cso);
cso->saved_state = 0;
}

@@ -171,6 +171,7 @@ void cso_set_render_condition(struct cso_context *cso,
#define CSO_BIT_VERTEX_SHADER 0x20000
#define CSO_BIT_VIEWPORT 0x40000
#define CSO_BIT_PAUSE_QUERIES 0x80000
#define CSO_BIT_FRAGMENT_IMAGE0 0x100000
#define CSO_BITS_ALL_SHADERS (CSO_BIT_VERTEX_SHADER | \
CSO_BIT_FRAGMENT_SHADER | \
@@ -191,6 +192,14 @@ cso_set_sampler_views(struct cso_context *cso,
struct pipe_sampler_view **views);
/* shader images */
void
cso_set_shader_images(struct cso_context *cso, unsigned shader_stage,
unsigned start, unsigned count,
struct pipe_image_view *views);
/* constant buffers */
void cso_set_constant_buffer(struct cso_context *cso, unsigned shader_stage,

@@ -1123,10 +1123,8 @@ generate_viewport(struct draw_llvm_variant *variant,
/* divide by w */
out = LLVMBuildFMul(builder, out, out3, "");
/* mult by scale */
out = LLVMBuildFMul(builder, out, scale, "");
/* add translation */
out = LLVMBuildFAdd(builder, out, trans, "");
/* mult by scale, add translation */
out = lp_build_fmuladd(builder, out, scale, trans);
/* store transformed outputs */
LLVMBuildStore(builder, out, outputs[pos][i]);
@@ -1303,22 +1301,19 @@ generate_clipmask(struct draw_llvm *llvm,
plane_ptr = LLVMBuildGEP(builder, planes_ptr, indices, 3, "");
plane1 = LLVMBuildLoad(builder, plane_ptr, "plane_y");
planes = lp_build_broadcast(gallivm, vs_type_llvm, plane1);
test = LLVMBuildFMul(builder, planes, cv_y, "");
sum = LLVMBuildFAdd(builder, sum, test, "");
sum = lp_build_fmuladd(builder, planes, cv_y, sum);
indices[2] = lp_build_const_int32(gallivm, 2);
plane_ptr = LLVMBuildGEP(builder, planes_ptr, indices, 3, "");
plane1 = LLVMBuildLoad(builder, plane_ptr, "plane_z");
planes = lp_build_broadcast(gallivm, vs_type_llvm, plane1);
test = LLVMBuildFMul(builder, planes, cv_z, "");
sum = LLVMBuildFAdd(builder, sum, test, "");
sum = lp_build_fmuladd(builder, planes, cv_z, sum);
indices[2] = lp_build_const_int32(gallivm, 3);
plane_ptr = LLVMBuildGEP(builder, planes_ptr, indices, 3, "");
plane1 = LLVMBuildLoad(builder, plane_ptr, "plane_w");
planes = lp_build_broadcast(gallivm, vs_type_llvm, plane1);
test = LLVMBuildFMul(builder, planes, cv_w, "");
sum = LLVMBuildFAdd(builder, sum, test, "");
sum = lp_build_fmuladd(builder, planes, cv_w, sum);
test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, zero, sum);
temp = lp_build_const_int_vec(gallivm, i32_type, 1LL << plane_idx);

@@ -50,7 +50,6 @@
#include "util/u_memory.h"
#include "util/u_debug.h"
#include "util/u_math.h"
#include "util/u_string.h"
#include "util/u_cpu_detect.h"
#include "lp_bld_type.h"
@@ -262,6 +261,28 @@ lp_build_min_simple(struct lp_build_context *bld,
}
LLVMValueRef
lp_build_fmuladd(LLVMBuilderRef builder,
LLVMValueRef a,
LLVMValueRef b,
LLVMValueRef c)
{
LLVMTypeRef type = LLVMTypeOf(a);
assert(type == LLVMTypeOf(b));
assert(type == LLVMTypeOf(c));
if (HAVE_LLVM < 0x0304) {
/* XXX: LLVM 3.3 does not break down llvm.fmuladd into mul+add when FMA is
* not supported, and instead it falls back to a C function.
*/
return LLVMBuildFAdd(builder, LLVMBuildFMul(builder, a, b, ""), c, "");
}
char intrinsic[32];
lp_format_intrinsic(intrinsic, sizeof intrinsic, "llvm.fmuladd", type);
LLVMValueRef args[] = { a, b, c };
return lp_build_intrinsic(builder, intrinsic, type, args, 3, 0);
}
/**
* Generate max(a, b)
* No checks for special case values of a or b = 1 or 0 are done.
@@ -1023,6 +1044,22 @@ lp_build_mul(struct lp_build_context *bld,
}
/* a * b + c */
LLVMValueRef
lp_build_mad(struct lp_build_context *bld,
LLVMValueRef a,
LLVMValueRef b,
LLVMValueRef c)
{
const struct lp_type type = bld->type;
if (type.floating) {
return lp_build_fmuladd(bld->gallivm->builder, a, b, c);
} else {
return lp_build_add(bld, lp_build_mul(bld, a, b), c);
}
}
/**
* Small vector x scale multiplication optimization.
*/
@@ -1153,6 +1190,11 @@ lp_build_lerp_simple(struct lp_build_context *bld,
delta = lp_build_sub(bld, v1, v0);
if (bld->type.floating) {
assert(flags == 0);
return lp_build_mad(bld, x, delta, v0);
}
if (flags & LP_BLD_LERP_WIDE_NORMALIZED) {
if (!bld->type.sign) {
if (!(flags & LP_BLD_LERP_PRESCALED_WEIGHTS)) {
@@ -2717,23 +2759,10 @@ lp_build_sin_or_cos(struct lp_build_context *bld,
/*
* The magic pass: "Extended precision modular arithmetic"
* x = ((x - y * DP1) - y * DP2) - y * DP3;
* xmm1 = _mm_mul_ps(y, xmm1);
* xmm2 = _mm_mul_ps(y, xmm2);
* xmm3 = _mm_mul_ps(y, xmm3);
*/
LLVMValueRef xmm1 = LLVMBuildFMul(b, y_2, DP1, "xmm1");
LLVMValueRef xmm2 = LLVMBuildFMul(b, y_2, DP2, "xmm2");
LLVMValueRef xmm3 = LLVMBuildFMul(b, y_2, DP3, "xmm3");
/*
* x = _mm_add_ps(x, xmm1);
* x = _mm_add_ps(x, xmm2);
* x = _mm_add_ps(x, xmm3);
*/
LLVMValueRef x_1 = LLVMBuildFAdd(b, x_abs, xmm1, "x_1");
LLVMValueRef x_2 = LLVMBuildFAdd(b, x_1, xmm2, "x_2");
LLVMValueRef x_3 = LLVMBuildFAdd(b, x_2, xmm3, "x_3");
LLVMValueRef x_1 = lp_build_fmuladd(b, y_2, DP1, x_abs);
LLVMValueRef x_2 = lp_build_fmuladd(b, y_2, DP2, x_1);
LLVMValueRef x_3 = lp_build_fmuladd(b, y_2, DP3, x_2);
/*
* Evaluate the first polynom (0 <= x <= Pi/4)
@@ -2755,10 +2784,8 @@ lp_build_sin_or_cos(struct lp_build_context *bld,
* y = *(v4sf*)_ps_coscof_p0;
* y = _mm_mul_ps(y, z);
*/
LLVMValueRef y_3 = LLVMBuildFMul(b, z, coscof_p0, "y_3");
LLVMValueRef y_4 = LLVMBuildFAdd(b, y_3, coscof_p1, "y_4");
LLVMValueRef y_5 = LLVMBuildFMul(b, y_4, z, "y_5");
LLVMValueRef y_6 = LLVMBuildFAdd(b, y_5, coscof_p2, "y_6");
LLVMValueRef y_4 = lp_build_fmuladd(b, z, coscof_p0, coscof_p1);
LLVMValueRef y_6 = lp_build_fmuladd(b, y_4, z, coscof_p2);
LLVMValueRef y_7 = LLVMBuildFMul(b, y_6, z, "y_7");
LLVMValueRef y_8 = LLVMBuildFMul(b, y_7, z, "y_8");
@@ -2796,13 +2823,10 @@ lp_build_sin_or_cos(struct lp_build_context *bld,
* y2 = _mm_add_ps(y2, x);
*/
LLVMValueRef y2_3 = LLVMBuildFMul(b, z, sincof_p0, "y2_3");
LLVMValueRef y2_4 = LLVMBuildFAdd(b, y2_3, sincof_p1, "y2_4");
LLVMValueRef y2_5 = LLVMBuildFMul(b, y2_4, z, "y2_5");
LLVMValueRef y2_6 = LLVMBuildFAdd(b, y2_5, sincof_p2, "y2_6");
LLVMValueRef y2_4 = lp_build_fmuladd(b, z, sincof_p0, sincof_p1);
LLVMValueRef y2_6 = lp_build_fmuladd(b, y2_4, z, sincof_p2);
LLVMValueRef y2_7 = LLVMBuildFMul(b, y2_6, z, "y2_7");
LLVMValueRef y2_8 = LLVMBuildFMul(b, y2_7, x_3, "y2_8");
LLVMValueRef y2_9 = LLVMBuildFAdd(b, y2_8, x_3, "y2_9");
LLVMValueRef y2_9 = lp_build_fmuladd(b, y2_7, x_3, x_3);
/*
* select the correct result from the two polynoms
@@ -2969,19 +2993,19 @@ lp_build_polynomial(struct lp_build_context *bld,
if (i % 2 == 0) {
if (even)
even = lp_build_add(bld, coeff, lp_build_mul(bld, x2, even));
even = lp_build_mad(bld, x2, even, coeff);
else
even = coeff;
} else {
if (odd)
odd = lp_build_add(bld, coeff, lp_build_mul(bld, x2, odd));
odd = lp_build_mad(bld, x2, odd, coeff);
else
odd = coeff;
}
}
if (odd)
return lp_build_add(bld, lp_build_mul(bld, odd, x), even);
return lp_build_mad(bld, odd, x, even);
else if (even)
return even;
else
@@ -3212,7 +3236,7 @@ lp_build_log2_approx(struct lp_build_context *bld,
LLVMValueRef exp = NULL;
LLVMValueRef mant = NULL;
LLVMValueRef logexp = NULL;
LLVMValueRef logmant = NULL;
LLVMValueRef p_z = NULL;
LLVMValueRef res = NULL;
assert(lp_check_value(bld->type, x));
@@ -3261,13 +3285,11 @@ lp_build_log2_approx(struct lp_build_context *bld,
z = lp_build_mul(bld, y, y);
/* compute P(z) */
logmant = lp_build_polynomial(bld, z, lp_build_log2_polynomial,
ARRAY_SIZE(lp_build_log2_polynomial));
p_z = lp_build_polynomial(bld, z, lp_build_log2_polynomial,
ARRAY_SIZE(lp_build_log2_polynomial));
/* logmant = y * P(z) */
logmant = lp_build_mul(bld, y, logmant);
res = lp_build_add(bld, logmant, logexp);
/* y * P(z) + logexp */
res = lp_build_mad(bld, y, p_z, logexp);
if (type.floating && handle_edge_cases) {
LLVMValueRef negmask, infmask, zmask;
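
The multiply-add form that `lp_build_polynomial` now uses can be sketched in scalar code. This is an illustration only: `mad` stands in for `lp_build_mad` as a plain `a * b + c`, and the loop order (highest coefficient first) is an assumption, since the loop header lies outside the hunk shown above.

```python
def mad(a, b, c):
    """a * b + c; a real fused multiply-add would round only once."""
    return a * b + c


def polynomial(x, coeffs):
    """Evaluate sum(coeffs[i] * x**i) with separate even and odd Horner chains."""
    if not coeffs:
        raise ValueError("need at least one coefficient")
    x2 = x * x
    even = odd = None
    for i in reversed(range(len(coeffs))):
        c = coeffs[i]
        if i % 2 == 0:
            even = c if even is None else mad(x2, even, c)
        else:
            odd = c if odd is None else mad(x2, odd, c)
    if odd is not None:
        # P(x) = odd(x^2) * x + even(x^2)
        return mad(odd, x, even)
    return even
```

Splitting the polynomial into even and odd halves gives two independent dependency chains in `x2`, and every step after the first in each chain is exactly one multiply-add. With floating-point inputs, a fused multiply-add may differ from this unfused version in the last bit.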

@@ -87,6 +87,21 @@ lp_build_div(struct lp_build_context *bld,
LLVMValueRef b);
/* llvm.fmuladd.* intrinsic */
LLVMValueRef
lp_build_fmuladd(LLVMBuilderRef builder,
LLVMValueRef a,
LLVMValueRef b,
LLVMValueRef c);
/* a * b + c */
LLVMValueRef
lp_build_mad(struct lp_build_context *bld,
LLVMValueRef a,
LLVMValueRef b,
LLVMValueRef c);
/**
* Set when the weights for normalized are prescaled, that is, in range
* 0..2**n, as opposed to range 0..2**(n-1).

@@ -311,7 +311,7 @@ lp_build_clamped_float_to_unsigned_norm(struct gallivm_state *gallivm,
* important, we also get exact results for 0.0 and 1.0.
*/
unsigned n = MIN2(src_type.width - 1, dst_width);
unsigned n = MIN2(src_type.width - 1u, dst_width);
double scale = (double)(1ULL << n);
unsigned lshift = dst_width - n;
@@ -445,7 +445,7 @@ int lp_build_conv_auto(struct gallivm_state *gallivm,
unsigned num_srcs,
LLVMValueRef *dst)
{
int i;
unsigned i;
int num_dsts = num_srcs;
if (src_type.floating == dst_type->floating &&

@@ -289,8 +289,7 @@ lp_build_linear_to_srgb(struct gallivm_state *gallivm,
c_const = lp_build_const_vec(gallivm, src_type, -0.0620f * 255.0f);
tmp = lp_build_mul(&f32_bld, a_const, x0375);
tmp2 = lp_build_mul(&f32_bld, b_const, x05);
tmp2 = lp_build_add(&f32_bld, tmp2, c_const);
tmp2 = lp_build_mad(&f32_bld, b_const, x05, c_const);
pow_final = lp_build_add(&f32_bld, tmp, tmp2);
}

@@ -420,6 +420,7 @@ lp_build_init(void)
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
util_cpu_caps.has_fma = 0;
}
#endif
@@ -454,6 +455,7 @@ lp_build_init(void)
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
util_cpu_caps.has_fma = 0;
}
#ifdef PIPE_ARCH_PPC_64

@@ -88,8 +88,6 @@ lp_build_compare_ext(struct gallivm_state *gallivm,
LLVMValueRef cond;
LLVMValueRef res;
assert(func >= PIPE_FUNC_NEVER);
assert(func <= PIPE_FUNC_ALWAYS);
assert(lp_check_value(type, a));
assert(lp_check_value(type, b));
@@ -98,6 +96,9 @@ lp_build_compare_ext(struct gallivm_state *gallivm,
if(func == PIPE_FUNC_ALWAYS)
return ones;
assert(func > PIPE_FUNC_NEVER);
assert(func < PIPE_FUNC_ALWAYS);
if(type.floating) {
LLVMRealPredicate op;
switch(func) {
@@ -176,8 +177,6 @@ lp_build_compare(struct gallivm_state *gallivm,
LLVMValueRef zeros = LLVMConstNull(int_vec_type);
LLVMValueRef ones = LLVMConstAllOnes(int_vec_type);
assert(func >= PIPE_FUNC_NEVER);
assert(func <= PIPE_FUNC_ALWAYS);
assert(lp_check_value(type, a));
assert(lp_check_value(type, b));
@@ -186,6 +185,9 @@ lp_build_compare(struct gallivm_state *gallivm,
if(func == PIPE_FUNC_ALWAYS)
return ones;
assert(func > PIPE_FUNC_NEVER);
assert(func < PIPE_FUNC_ALWAYS);
#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
/*
* There are no unsigned integer comparison instructions in SSE.

@@ -570,6 +570,15 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
*/
MAttrs.push_back(util_cpu_caps.has_avx ? "+avx" : "-avx");
MAttrs.push_back(util_cpu_caps.has_f16c ? "+f16c" : "-f16c");
if (HAVE_LLVM >= 0x0304) {
MAttrs.push_back(util_cpu_caps.has_fma ? "+fma" : "-fma");
} else {
/*
* The old JIT in LLVM 3.3 has a bug encoding llvm.fmuladd.f32 and
* llvm.fmuladd.v2f32 intrinsics when FMA is available.
*/
MAttrs.push_back("-fma");
}
MAttrs.push_back(util_cpu_caps.has_avx2 ? "+avx2" : "-avx2");
/* disable avx512 and all subvariants */
#if HAVE_LLVM >= 0x0304

@@ -236,7 +236,7 @@ lp_build_concat_n(struct gallivm_state *gallivm,
unsigned num_dsts)
{
int size = num_srcs / num_dsts;
int i;
unsigned i;
assert(num_srcs >= num_dsts);
assert((num_srcs % size) == 0);

@@ -155,10 +155,10 @@ lp_build_print_value(struct gallivm_state *gallivm,
}
static int
static unsigned
lp_get_printf_arg_count(const char *fmt)
{
int count =0;
unsigned count = 0;
const char *p = fmt;
int c;
@@ -195,8 +195,7 @@ lp_build_printf(struct gallivm_state *gallivm,
{
LLVMValueRef params[50];
va_list arglist;
int argcount;
int i;
unsigned argcount, i;
argcount = lp_get_printf_arg_count(fmt);
assert(ARRAY_SIZE(params) >= argcount + 1);

@@ -580,10 +580,8 @@ lp_build_brilinear_lod(struct lp_build_context *bld,
lp_build_ifloor_fract(bld, lod, out_lod_ipart, &lod_fpart);
lod_fpart = lp_build_mul(bld, lod_fpart,
lp_build_const_vec(bld->gallivm, bld->type, factor));
lod_fpart = lp_build_add(bld, lod_fpart,
lod_fpart = lp_build_mad(bld, lod_fpart,
lp_build_const_vec(bld->gallivm, bld->type, factor),
lp_build_const_vec(bld->gallivm, bld->type, post_offset));
/*
@@ -639,10 +637,8 @@ lp_build_brilinear_rho(struct lp_build_context *bld,
/* fpart = rho / 2**ipart */
lod_fpart = lp_build_extract_mantissa(bld, rho);
lod_fpart = lp_build_mul(bld, lod_fpart,
lp_build_const_vec(bld->gallivm, bld->type, factor));
lod_fpart = lp_build_add(bld, lod_fpart,
lod_fpart = lp_build_mad(bld, lod_fpart,
lp_build_const_vec(bld->gallivm, bld->type, factor),
lp_build_const_vec(bld->gallivm, bld->type, post_offset));
/*

@@ -467,7 +467,7 @@ lp_build_swizzle_aos(struct lp_build_context *bld,
LLVMValueRef res;
struct lp_type type4;
unsigned cond = 0;
unsigned chan;
int chan;
int shift;
/*

@@ -186,15 +186,15 @@ void lp_build_fetch_args(
}
/**
* with doubles src and dst channels aren't 1:1.
* with 64-bit src and dst channels aren't 1:1.
* check the src/dst types for the opcode,
* 1. if neither is double then src == dst;
* 2. if dest is double
* 1. if neither is 64-bit then src == dst;
* 2. if dest is 64-bit
* - don't store to y or w
* - if src is double then src == dst.
* - if src is 64-bit then src == dst.
* - else for f2d, d.xy = s.x
* - else for f2d, d.zw = s.y
* 3. if dst is single, src is double
* 3. if dst is single, src is 64-bit
* - map dst x,z to src xy;
* - map dst y,w to src zw;
*/
@@ -204,12 +204,12 @@ static int get_src_chan_idx(unsigned opcode,
enum tgsi_opcode_type dtype = tgsi_opcode_infer_dst_type(opcode);
enum tgsi_opcode_type stype = tgsi_opcode_infer_src_type(opcode);
if (dtype != TGSI_TYPE_DOUBLE && stype != TGSI_TYPE_DOUBLE)
if (!tgsi_type_is_64bit(dtype) && !tgsi_type_is_64bit(stype))
return dst_chan_index;
if (dtype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(dtype)) {
if (dst_chan_index == 1 || dst_chan_index == 3)
return -1;
if (stype == TGSI_TYPE_DOUBLE)
if (tgsi_type_is_64bit(stype))
return dst_chan_index;
if (dst_chan_index == 0)
return 0;
@@ -335,7 +335,7 @@ lp_build_emit_fetch(
enum tgsi_opcode_type stype = tgsi_opcode_infer_src_type(inst->Instruction.Opcode);
if (chan_index == LP_CHAN_ALL) {
swizzle = ~0;
swizzle = ~0u;
} else {
swizzle = tgsi_util_get_full_src_register_swizzle(reg, chan_index);
if (swizzle > 3) {
@@ -398,7 +398,7 @@ lp_build_emit_fetch(
* Swizzle the argument
*/
if (swizzle == ~0) {
if (swizzle == ~0u) {
res = bld_base->emit_swizzle(bld_base, res,
reg->Register.SwizzleX,
reg->Register.SwizzleY,
@@ -453,7 +453,7 @@ lp_build_emit_fetch_texoffset(
* Swizzle the argument
*/
if (swizzle == ~0) {
if (swizzle == ~0u) {
res = bld_base->emit_swizzle(bld_base, res,
off->SwizzleX,
off->SwizzleY,
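The channel mapping described in the comment above get_src_chan_idx can be sketched in isolation as follows. This is only an illustration: the `sketch_*` names and the two-value enum are hypothetical stand-ins for the TGSI types, and the last branch (32-bit destination, 64-bit source) is one reading of the comment, not a copy of the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum tgsi_opcode_type; only the split between
 * 32-bit and 64-bit types matters for this sketch. */
enum sketch_type { SKETCH_FLOAT, SKETCH_DOUBLE };

static bool sketch_is_64bit(enum sketch_type t)
{
   return t == SKETCH_DOUBLE;
}

/* Returns the source channel to fetch for a destination channel,
 * or -1 if nothing should be fetched for that destination channel. */
static int sketch_src_chan_idx(enum sketch_type dtype, enum sketch_type stype,
                               int dst_chan)
{
   /* 1. Neither side is 64-bit: channels map 1:1. */
   if (!sketch_is_64bit(dtype) && !sketch_is_64bit(stype))
      return dst_chan;

   if (sketch_is_64bit(dtype)) {
      /* 2. 64-bit destination: y and w hold the second halves, skip them. */
      if (dst_chan == 1 || dst_chan == 3)
         return -1;
      /* 64-bit source as well: src == dst. */
      if (sketch_is_64bit(stype))
         return dst_chan;
      /* 32-bit source (f2d style): d.xy = s.x, d.zw = s.y. */
      return dst_chan == 0 ? 0 : 1;
   }

   /* 3. 32-bit destination, 64-bit source (assumed reading of the comment):
    * dst x,z read the pair starting at src x, dst y,w the pair at src z. */
   return (dst_chan == 0 || dst_chan == 2) ? 0 : 2;
}
```

Because the decision depends only on whether a type is 64-bit, routing every check through one predicate (tgsi_type_is_64bit in the diff) means a later 64-bit integer type would need no changes here.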

@@ -52,7 +52,7 @@
extern "C" {
#endif
#define LP_CHAN_ALL ~0
#define LP_CHAN_ALL ~0u
#define LP_MAX_INSTRUCTIONS 256
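A minimal sketch of why the sentinel is spelled `~0u` rather than `~0`: `~0` has type int and value -1, so comparing it with an unsigned swizzle relies on an implicit conversion and can trigger signed/unsigned comparison warnings, while `~0u` is already unsigned. The names below are hypothetical, and the macro is parenthesised here purely as a defensive habit; the diff itself defines LP_CHAN_ALL without parentheses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for LP_CHAN_ALL; all bits set, unsigned type. */
#define SKETCH_CHAN_ALL (~0u)

static bool sketch_is_all_channels(unsigned chan_index)
{
   /* Both operands are unsigned, so no implicit sign conversion happens. */
   return chan_index == SKETCH_CHAN_ALL;
}
```

The value compared is the same either way (the int -1 converts to the all-ones unsigned value), so the change is about type cleanliness rather than behaviour.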

@@ -1577,6 +1577,19 @@ log_emit_cpu(
}
/* TGSI_OPCODE_MAD (CPU Only) */
static void
mad_emit_cpu(
const struct lp_build_tgsi_action * action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
{
emit_data->output[emit_data->chan] =
lp_build_mad(&bld_base->base,
emit_data->args[0], emit_data->args[1], emit_data->args[2]);
}
/* TGSI_OPCODE_MAX (CPU Only) */
static void
@@ -2162,6 +2175,7 @@ lp_set_default_actions_cpu(
bld_base->op_actions[TGSI_OPCODE_LG2].emit = lg2_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_LOG].emit = log_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_MAD].emit = mad_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_MAX].emit = max_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_MIN].emit = min_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_MOD].emit = mod_emit_cpu;

@@ -642,7 +642,7 @@ static boolean default_analyse_is_last(struct lp_exec_mask *mask,
{
unsigned pc = bld_base->pc;
struct function_ctx *ctx = func_ctx(mask);
unsigned curr_switch_stack = ctx->switch_stack_size;
int curr_switch_stack = ctx->switch_stack_size;
if (ctx->switch_stack_size > LP_MAX_TGSI_NESTING) {
return false;
@@ -653,7 +653,7 @@ static boolean default_analyse_is_last(struct lp_exec_mask *mask,
pc++;
}
while (pc != -1 && pc < bld_base->num_instructions) {
while (pc != ~0u && pc < bld_base->num_instructions) {
unsigned opcode = bld_base->instructions[pc].Instruction.Opcode;
switch (opcode) {
case TGSI_OPCODE_CASE:
@@ -856,7 +856,7 @@ static void lp_exec_mask_endsub(struct lp_exec_mask *mask, int *pc)
static LLVMValueRef
get_file_ptr(struct lp_build_tgsi_soa_context *bld,
unsigned file,
unsigned index,
int index,
unsigned chan)
{
LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
@@ -1227,7 +1227,7 @@ emit_fetch_constant(
LLVMValueRef res;
/* XXX: Handle fetching xyzw components as a vector */
assert(swizzle != ~0);
assert(swizzle != ~0u);
if (reg->Register.Dimension) {
assert(!reg->Dimension.Indirect);
@@ -1264,7 +1264,7 @@ emit_fetch_constant(
index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
index_vec = lp_build_add(uint_bld, index_vec, swizzle_vec);
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
LLVMValueRef swizzle_vec2;
swizzle_vec2 = lp_build_const_int_vec(gallivm, uint_bld->type, swizzle + 1);
index_vec2 = lp_build_shl_imm(uint_bld, indirect_index, 2);
@@ -1299,14 +1299,14 @@ emit_fetch_constant(
}
/**
* Fetch double values from two separate channels.
* Doubles are stored split across two channels, like xy and zw.
* Fetch 64-bit values from two separate channels.
* 64-bit values are stored split across two channels, like xy and zw.
* This function creates a set of 16 floats,
* extracts the values from the two channels,
* puts them in the correct place, then casts to 8 doubles.
* puts them in the correct place, then casts to 8 64-bits.
*/
static LLVMValueRef
emit_fetch_double(
emit_fetch_64bit(
struct lp_build_tgsi_context * bld_base,
enum tgsi_opcode_type stype,
LLVMValueRef input,
@@ -1369,7 +1369,7 @@ emit_fetch_immediate(
indirect_index,
swizzle,
FALSE);
if (stype == TGSI_TYPE_DOUBLE)
if (tgsi_type_is_64bit(stype))
index_vec2 = get_soa_array_offsets(&bld_base->uint_bld,
indirect_index,
swizzle + 1,
@@ -1383,7 +1383,7 @@ emit_fetch_immediate(
bld->imms_array, &lindex, 1, "");
res = LLVMBuildLoad(builder, imms_ptr, "");
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
LLVMValueRef lindex1;
LLVMValueRef imms_ptr2;
LLVMValueRef res2;
@@ -1393,22 +1393,19 @@ emit_fetch_immediate(
imms_ptr2 = LLVMBuildGEP(builder,
bld->imms_array, &lindex1, 1, "");
res2 = LLVMBuildLoad(builder, imms_ptr2, "");
res = emit_fetch_double(bld_base, stype, res, res2);
res = emit_fetch_64bit(bld_base, stype, res, res2);
}
}
}
else {
res = bld->immediates[reg->Register.Index][swizzle];
if (stype == TGSI_TYPE_DOUBLE)
res = emit_fetch_double(bld_base, stype, res, bld->immediates[reg->Register.Index][swizzle + 1]);
if (tgsi_type_is_64bit(stype))
res = emit_fetch_64bit(bld_base, stype, res, bld->immediates[reg->Register.Index][swizzle + 1]);
}
if (stype == TGSI_TYPE_UNSIGNED) {
res = LLVMBuildBitCast(builder, res, bld_base->uint_bld.vec_type, "");
} else if (stype == TGSI_TYPE_SIGNED) {
res = LLVMBuildBitCast(builder, res, bld_base->int_bld.vec_type, "");
} else if (stype == TGSI_TYPE_DOUBLE) {
res = LLVMBuildBitCast(builder, res, bld_base->dbl_bld.vec_type, "");
if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED || stype == TGSI_TYPE_DOUBLE) {
struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype);
res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, "");
}
return res;
}
@@ -1441,7 +1438,7 @@ emit_fetch_input(
indirect_index,
swizzle,
TRUE);
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
index_vec2 = get_soa_array_offsets(&bld_base->uint_bld,
indirect_index,
swizzle + 1,
@@ -1461,7 +1458,7 @@ emit_fetch_input(
bld->inputs_array, &lindex, 1, "");
res = LLVMBuildLoad(builder, input_ptr, "");
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
LLVMValueRef lindex1;
LLVMValueRef input_ptr2;
LLVMValueRef res2;
@@ -1471,24 +1468,21 @@ emit_fetch_input(
input_ptr2 = LLVMBuildGEP(builder,
bld->inputs_array, &lindex1, 1, "");
res2 = LLVMBuildLoad(builder, input_ptr2, "");
res = emit_fetch_double(bld_base, stype, res, res2);
res = emit_fetch_64bit(bld_base, stype, res, res2);
}
}
else {
res = bld->inputs[reg->Register.Index][swizzle];
if (stype == TGSI_TYPE_DOUBLE)
res = emit_fetch_double(bld_base, stype, res, bld->inputs[reg->Register.Index][swizzle + 1]);
if (tgsi_type_is_64bit(stype))
res = emit_fetch_64bit(bld_base, stype, res, bld->inputs[reg->Register.Index][swizzle + 1]);
}
}
assert(res);
if (stype == TGSI_TYPE_UNSIGNED) {
res = LLVMBuildBitCast(builder, res, bld_base->uint_bld.vec_type, "");
} else if (stype == TGSI_TYPE_SIGNED) {
res = LLVMBuildBitCast(builder, res, bld_base->int_bld.vec_type, "");
} else if (stype == TGSI_TYPE_DOUBLE) {
res = LLVMBuildBitCast(builder, res, bld_base->dbl_bld.vec_type, "");
if (stype == TGSI_TYPE_SIGNED || stype == TGSI_TYPE_UNSIGNED || stype == TGSI_TYPE_DOUBLE) {
struct lp_build_context *bld_fetch = stype_to_fetch(bld_base, stype);
res = LLVMBuildBitCast(builder, res, bld_fetch->vec_type, "");
}
return res;
@@ -1548,7 +1542,7 @@ emit_fetch_gs_input(
swizzle_index);
assert(res);
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
LLVMValueRef swizzle_index = lp_build_const_int32(gallivm, swizzle + 1);
LLVMValueRef res2;
res2 = bld->gs_iface->fetch_input(bld->gs_iface, bld_base,
@@ -1558,7 +1552,7 @@ emit_fetch_gs_input(
attrib_index,
swizzle_index);
assert(res2);
res = emit_fetch_double(bld_base, stype, res, res2);
res = emit_fetch_64bit(bld_base, stype, res, res2);
} else if (stype == TGSI_TYPE_UNSIGNED) {
res = LLVMBuildBitCast(builder, res, bld_base->uint_bld.vec_type, "");
} else if (stype == TGSI_TYPE_SIGNED) {
@@ -1595,7 +1589,7 @@ emit_fetch_temporary(
indirect_index,
swizzle,
TRUE);
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
index_vec2 = get_soa_array_offsets(&bld_base->uint_bld,
indirect_index,
swizzle + 1,
@@ -1614,12 +1608,12 @@ emit_fetch_temporary(
temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle);
res = LLVMBuildLoad(builder, temp_ptr, "");
if (stype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(stype)) {
LLVMValueRef temp_ptr2, res2;
temp_ptr2 = lp_get_temp_ptr_soa(bld, reg->Register.Index, swizzle + 1);
res2 = LLVMBuildLoad(builder, temp_ptr2, "");
res = emit_fetch_double(bld_base, stype, res, res2);
res = emit_fetch_64bit(bld_base, stype, res, res2);
}
}
@@ -1790,20 +1784,19 @@ emit_fetch_predicate(
}
/**
* store an array of 8 doubles into two arrays of 8 floats
* store an array of 8 64-bit into two arrays of 8 floats
* i.e.
* value is d0, d1, d2, d3 etc.
* each double has high and low pieces x, y
* each 64-bit has high and low pieces x, y
* so gets stored into the separate channels as:
* chan_ptr = d0.x, d1.x, d2.x, d3.x
* chan_ptr2 = d0.y, d1.y, d2.y, d3.y
*/
static void
emit_store_double_chan(struct lp_build_tgsi_context *bld_base,
int dtype,
LLVMValueRef chan_ptr, LLVMValueRef chan_ptr2,
LLVMValueRef pred,
LLVMValueRef value)
emit_store_64bit_chan(struct lp_build_tgsi_context *bld_base,
LLVMValueRef chan_ptr, LLVMValueRef chan_ptr2,
LLVMValueRef pred,
LLVMValueRef value)
{
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
@@ -1870,9 +1863,9 @@ emit_store_chan(
if (reg->Register.Indirect) {
/*
* Currently the mesa/st doesn't generate indirect stores
* to doubles, it normally uses MOV to do indirect stores.
* to 64-bit values, it normally uses MOV to do indirect stores.
*/
assert(dtype != TGSI_TYPE_DOUBLE);
assert(!tgsi_type_is_64bit(dtype));
indirect_index = get_indirect_index(bld,
reg->Register.File,
reg->Register.Index,
@@ -1912,11 +1905,11 @@ emit_store_chan(
LLVMValueRef out_ptr = lp_get_output_ptr(bld, reg->Register.Index,
chan_index);
if (dtype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(dtype)) {
LLVMValueRef out_ptr2 = lp_get_output_ptr(bld, reg->Register.Index,
chan_index + 1);
emit_store_double_chan(bld_base, dtype, out_ptr, out_ptr2,
pred, value);
emit_store_64bit_chan(bld_base, out_ptr, out_ptr2,
pred, value);
} else
lp_exec_mask_store(&bld->exec_mask, float_bld, pred, value, out_ptr);
}
@@ -1924,7 +1917,7 @@ emit_store_chan(
case TGSI_FILE_TEMPORARY:
/* Temporaries are always stored as floats */
if (dtype != TGSI_TYPE_DOUBLE)
if (!tgsi_type_is_64bit(dtype))
value = LLVMBuildBitCast(builder, value, float_bld->vec_type, "");
else
value = LLVMBuildBitCast(builder, value, LLVMVectorType(LLVMFloatTypeInContext(gallivm->context), bld_base->base.type.length * 2), "");
@@ -1950,12 +1943,12 @@ emit_store_chan(
LLVMValueRef temp_ptr;
temp_ptr = lp_get_temp_ptr_soa(bld, reg->Register.Index, chan_index);
if (dtype == TGSI_TYPE_DOUBLE) {
if (tgsi_type_is_64bit(dtype)) {
LLVMValueRef temp_ptr2 = lp_get_temp_ptr_soa(bld,
reg->Register.Index,
chan_index + 1);
emit_store_double_chan(bld_base, dtype, temp_ptr, temp_ptr2,
pred, value);
emit_store_64bit_chan(bld_base, temp_ptr, temp_ptr2,
pred, value);
}
else
lp_exec_mask_store(&bld->exec_mask, float_bld, pred, value, temp_ptr);
@@ -2035,7 +2028,7 @@ emit_store(
TGSI_FOR_EACH_DST0_ENABLED_CHANNEL( inst, chan_index ) {
if (dtype == TGSI_TYPE_DOUBLE && (chan_index == 1 || chan_index == 3))
if (tgsi_type_is_64bit(dtype) && (chan_index == 1 || chan_index == 3))
continue;
emit_store_chan(bld_base, inst, 0, chan_index, pred[chan_index], dst[chan_index]);
}
@@ -2882,7 +2875,7 @@ emit_dump_file(struct lp_build_tgsi_soa_context *bld,
int chan;
if (index < 8 * sizeof(unsigned) &&
(info->file_mask[file] & (1 << index)) == 0) {
(info->file_mask[file] & (1u << index)) == 0) {
/* This was not declared.*/
continue;
}
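The `1 << index` to `1u << index` change in emit_dump_file matters for the top bit: with a 32-bit int, `1 << 31` overflows a signed value, which C leaves undefined, whereas `1u << 31` is well defined. A small sketch of the same test, using a hypothetical helper name and a plain mask argument instead of `info->file_mask[file]`:

```c
#include <assert.h>
#include <stdbool.h>

/* True only when the mask can represent this index and its bit is clear,
 * i.e. the register is known not to have been declared. Indices beyond the
 * mask width are never reported as undeclared, mirroring the guard in the
 * diff. */
static bool sketch_known_undeclared(unsigned file_mask, unsigned index)
{
   return index < 8 * sizeof(unsigned) &&
          (file_mask & (1u << index)) == 0;
}
```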

@@ -38,10 +38,7 @@ LOCAL_SRC_FILES := $(COMMON_SOURCES)
LOCAL_MODULE := libmesa_pipe_loader
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SRC_FILES += $(DRM_SOURCES)
LOCAL_SHARED_LIBRARIES := libdrm
LOCAL_STATIC_LIBRARIES := libmesa_loader
endif

@@ -676,10 +676,10 @@ static void
micro_trunc(union tgsi_exec_channel *dst,
const union tgsi_exec_channel *src)
{
dst->f[0] = (float)(int)src->f[0];
dst->f[1] = (float)(int)src->f[1];
dst->f[2] = (float)(int)src->f[2];
dst->f[3] = (float)(int)src->f[3];
dst->f[0] = truncf(src->f[0]);
dst->f[1] = truncf(src->f[1]);
dst->f[2] = truncf(src->f[2]);
dst->f[3] = truncf(src->f[3]);
}
static void

@@ -262,6 +262,9 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
{ 1, 1, 0, 0, 0, 0, 0, COMP, "DFLR", TGSI_OPCODE_DFLR },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "DROUND", TGSI_OPCODE_DROUND },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "DSSG", TGSI_OPCODE_DSSG },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "VOTE_ANY", TGSI_OPCODE_VOTE_ANY },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "VOTE_ALL", TGSI_OPCODE_VOTE_ALL },
{ 1, 1, 0, 0, 0, 0, 0, COMP, "VOTE_EQ", TGSI_OPCODE_VOTE_EQ },
};
const struct tgsi_opcode_info *

@@ -101,6 +101,13 @@ enum tgsi_opcode_type {
TGSI_TYPE_DOUBLE
};
static inline bool tgsi_type_is_64bit(enum tgsi_opcode_type type)
{
if (type == TGSI_TYPE_DOUBLE)
return true;
return false;
}
enum tgsi_opcode_type
tgsi_opcode_infer_src_type( uint opcode );

@@ -96,7 +96,7 @@ struct psprite_transform_context
unsigned stream_out_point_pos:1; // set if to stream out original point pos
unsigned aa_point:1; // set if doing aa point
unsigned out_tmp_index[PIPE_MAX_SHADER_OUTPUTS];
int max_generic;
int max_generic; // max generic semantic index
};
static inline struct psprite_transform_context *
@@ -133,7 +133,7 @@ psprite_decl(struct tgsi_transform_context *ctx,
else if (decl->Semantic.Name == TGSI_SEMANTIC_GENERIC &&
decl->Semantic.Index < 32) {
ts->point_coord_decl |= 1 << decl->Semantic.Index;
ts->max_generic = MAX2(ts->max_generic, decl->Semantic.Index);
ts->max_generic = MAX2(ts->max_generic, (int)decl->Semantic.Index);
}
ts->num_out = MAX2(ts->num_out, decl->Range.Last + 1);
}
@@ -216,7 +216,7 @@ psprite_prolog(struct tgsi_transform_context *ctx)
if (en & 0x1) {
tgsi_transform_output_decl(ctx, ts->num_out++,
TGSI_SEMANTIC_GENERIC, i, 0);
ts->max_generic = MAX2(ts->max_generic, i);
ts->max_generic = MAX2(ts->max_generic, (int)i);
}
}
}

@@ -100,8 +100,6 @@ struct blitter_context_priv
/* FS which outputs an average of all samples. */
void *fs_resolve[PIPE_MAX_TEXTURE_TYPES][NUM_RESOLVE_FRAG_SHADERS][2];
void *fs_resolve_sint[PIPE_MAX_TEXTURE_TYPES][NUM_RESOLVE_FRAG_SHADERS][2];
void *fs_resolve_uint[PIPE_MAX_TEXTURE_TYPES][NUM_RESOLVE_FRAG_SHADERS][2];
/* Blend state. */
void *blend[PIPE_MASK_RGBA+1][2]; /**< blend state with writemask */
@@ -487,16 +485,6 @@ void util_blitter_destroy(struct blitter_context *blitter)
for (f = 0; f < 2; f++)
if (ctx->fs_resolve[i][j][f])
ctx->delete_fs_state(pipe, ctx->fs_resolve[i][j][f]);
for (j = 0; j< ARRAY_SIZE(ctx->fs_resolve_sint[i]); j++)
for (f = 0; f < 2; f++)
if (ctx->fs_resolve_sint[i][j][f])
ctx->delete_fs_state(pipe, ctx->fs_resolve_sint[i][j][f]);
for (j = 0; j< ARRAY_SIZE(ctx->fs_resolve_uint[i]); j++)
for (f = 0; f < 2; f++)
if (ctx->fs_resolve_uint[i][j][f])
ctx->delete_fs_state(pipe, ctx->fs_resolve_uint[i][j][f]);
}
if (ctx->fs_empty)
@@ -891,18 +879,18 @@ static void *blitter_get_fs_texfetch_col(struct blitter_context_priv *ctx,
if (src_nr_samples > 1) {
void **shader;
if (dst_nr_samples <= 1) {
/* OpenGL requires that integer textures just copy 1 sample instead
* of averaging.
*/
if (dst_nr_samples <= 1 &&
stype != TGSI_RETURN_TYPE_UINT &&
stype != TGSI_RETURN_TYPE_SINT) {
/* The destination has one sample, so we'll do color resolve. */
unsigned index = GET_MSAA_RESOLVE_FS_IDX(src_nr_samples);
assert(filter < 2);
if (stype == TGSI_RETURN_TYPE_UINT)
shader = &ctx->fs_resolve_uint[target][index][filter];
else if (stype == TGSI_RETURN_TYPE_SINT)
shader = &ctx->fs_resolve_sint[target][index][filter];
else
shader = &ctx->fs_resolve[target][index][filter];
shader = &ctx->fs_resolve[target][index][filter];
if (!*shader) {
assert(!ctx->cached_all_shaders);

@@ -369,6 +369,7 @@ util_cpu_detect(void)
((regs2[2] >> 27) & 1) && // OSXSAVE
((xgetbv() & 6) == 6); // XMM & YMM
util_cpu_caps.has_f16c = ((regs2[2] >> 29) & 1) && util_cpu_caps.has_avx;
util_cpu_caps.has_fma = ((regs2[2] >> 12) & 1) && util_cpu_caps.has_avx;
util_cpu_caps.has_mmx2 = util_cpu_caps.has_sse; /* SSE cpus supports mmxext too */
#if defined(PIPE_ARCH_X86_64)
util_cpu_caps.has_daz = 1;

@@ -66,6 +66,7 @@ struct util_cpu_caps {
unsigned has_avx:1;
unsigned has_avx2:1;
unsigned has_f16c:1;
unsigned has_fma:1;
unsigned has_3dnow:1;
unsigned has_3dnow_ext:1;
unsigned has_xop:1;
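The new has_fma capability follows the same pattern as has_f16c: one feature bit, gated on AVX being usable. A sketch of that test, assuming `regs2[2]` in util_cpu_detect holds ECX from CPUID leaf 1 (where bit 12 reports FMA and bit 29 reports F16C); the function name and parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cpuid1_ecx: ECX as returned by CPUID leaf 1 (assumption, see lead-in).
 * avx_usable: result of the AVX + OSXSAVE + XGETBV check done earlier. */
static bool sketch_has_fma(uint32_t cpuid1_ecx, bool avx_usable)
{
   /* FMA instructions operate on the AVX register state, so the feature
    * bit alone is not enough; the OS must also have enabled that state. */
   return ((cpuid1_ecx >> 12) & 1) && avx_usable;
}
```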

@@ -626,10 +626,17 @@ static inline void
util_copy_image_view(struct pipe_image_view *dst,
const struct pipe_image_view *src)
{
pipe_resource_reference(&dst->resource, src->resource);
dst->format = src->format;
dst->access = src->access;
dst->u = src->u;
if (src) {
pipe_resource_reference(&dst->resource, src->resource);
dst->format = src->format;
dst->access = src->access;
dst->u = src->u;
} else {
pipe_resource_reference(&dst->resource, NULL);
dst->format = PIPE_FORMAT_NONE;
dst->access = 0;
memset(&dst->u, 0, sizeof(dst->u));
}
}
static inline unsigned
@@ -650,6 +657,18 @@ util_max_layer(const struct pipe_resource *r, unsigned level)
}
}
static inline bool
util_texrange_covers_whole_level(const struct pipe_resource *tex,
unsigned level, unsigned x, unsigned y,
unsigned z, unsigned width,
unsigned height, unsigned depth)
{
return x == 0 && y == 0 && z == 0 &&
width == u_minify(tex->width0, level) &&
height == u_minify(tex->height0, level) &&
depth == util_max_layer(tex, level) + 1;
}
#ifdef __cplusplus
}
#endif
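To make the new util_texrange_covers_whole_level helper concrete, here is a simplified, self-contained sketch. It assumes a 3D-style texture whose layer count at a level is the minified depth; the real helper instead uses `util_max_layer(tex, level) + 1`, which also handles array textures. The struct and `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced form of struct pipe_resource. */
struct sketch_tex {
   unsigned width0, height0, depth0;
};

/* Size of a dimension at a mip level, never smaller than 1. */
static unsigned sketch_minify(unsigned value, unsigned level)
{
   unsigned v = value >> level;
   return v > 1 ? v : 1;
}

/* True when the box starts at the origin and spans the full level. */
static bool sketch_covers_whole_level(const struct sketch_tex *tex,
                                      unsigned level,
                                      unsigned x, unsigned y, unsigned z,
                                      unsigned width, unsigned height,
                                      unsigned depth)
{
   return x == 0 && y == 0 && z == 0 &&
          width == sketch_minify(tex->width0, level) &&
          height == sketch_minify(tex->height0, level) &&
          depth == sketch_minify(tex->depth0, level);
}
```

For a 64x32x1 texture, level 2 is 16x8x1, so only a box of exactly that size at offset (0, 0, 0) counts as covering the whole level.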

@@ -1,136 +0,0 @@
/**************************************************************************
*
* Copyright 2010 Luca Barbieri
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "util/u_staging.h"
#include "pipe/p_context.h"
#include "util/u_memory.h"
#include "util/u_inlines.h"
static void
util_staging_resource_template(struct pipe_resource *pt, unsigned width,
unsigned height, unsigned depth,
struct pipe_resource *template)
{
memset(template, 0, sizeof(struct pipe_resource));
if (pt->target != PIPE_BUFFER && depth <= 1)
template->target = PIPE_TEXTURE_RECT;
else
template->target = pt->target;
template->format = pt->format;
template->width0 = width;
template->height0 = height;
template->depth0 = depth;
template->array_size = 1;
template->last_level = 0;
template->nr_samples = pt->nr_samples;
template->bind = 0;
template->usage = PIPE_USAGE_STAGING;
template->flags = 0;
}
struct util_staging_transfer *
util_staging_transfer_init(struct pipe_context *pipe,
struct pipe_resource *pt,
unsigned level, enum pipe_resource_usage usage,
const struct pipe_box *box,
boolean direct, struct util_staging_transfer *tx)
{
struct pipe_screen *pscreen = pipe->screen;
struct pipe_resource staging_resource_template;
pipe_resource_reference(&tx->base.resource, pt);
tx->base.level = level;
tx->base.usage = usage;
tx->base.box = *box;
if (direct) {
tx->staging_resource = pt;
return tx;
}
util_staging_resource_template(pt, box->width, box->height,
box->depth, &staging_resource_template);
tx->staging_resource = pscreen->resource_create(pscreen,
&staging_resource_template);
if (!tx->staging_resource) {
pipe_resource_reference(&tx->base.resource, NULL);
FREE(tx);
return NULL;
}
if (usage & PIPE_TRANSFER_READ) {
/* XXX this looks wrong dst is always the same but looping over src z? */
int zi;
struct pipe_box sbox;
sbox.x = box->x;
sbox.y = box->y;
sbox.z = box->z;
sbox.width = box->width;
sbox.height = box->height;
sbox.depth = 1;
for (zi = 0; zi < box->depth; ++zi) {
sbox.z = sbox.z + zi;
pipe->resource_copy_region(pipe, tx->staging_resource, 0, 0, 0, 0,
tx->base.resource, level, &sbox);
}
}
return tx;
}
void
util_staging_transfer_destroy(struct pipe_context *pipe,
struct pipe_transfer *ptx)
{
struct util_staging_transfer *tx = (struct util_staging_transfer *)ptx;
if (tx->staging_resource != tx->base.resource) {
if (tx->base.usage & PIPE_TRANSFER_WRITE) {
/* XXX this looks wrong src is always the same but looping over dst z? */
int zi;
struct pipe_box sbox;
sbox.x = 0;
sbox.y = 0;
sbox.z = 0;
sbox.width = tx->base.box.width;
sbox.height = tx->base.box.height;
sbox.depth = 1;
for (zi = 0; zi < tx->base.box.depth; ++zi)
pipe->resource_copy_region(pipe, tx->base.resource, tx->base.level,
tx->base.box.x, tx->base.box.y,
tx->base.box.z + zi,
tx->staging_resource, 0, &sbox);
}
pipe_resource_reference(&tx->staging_resource, NULL);
}
pipe_resource_reference(&ptx->resource, NULL);
FREE(ptx);
}

@@ -1,67 +0,0 @@
/**************************************************************************
*
* Copyright 2010 Luca Barbieri
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
/* Direct3D 10/11 has no concept of transfers. Applications instead
* create resources with a STAGING or DYNAMIC usage, copy between them
* and the real resource and use Map to map the STAGING/DYNAMIC resource.
*
* This util module allows to implement Gallium drivers as a Direct3D
* driver would be implemented: transfers allocate a resource with
* PIPE_USAGE_STAGING, and copy the data between it and the real resource
* with resource_copy_region.
*/
#ifndef U_STAGING_H
#define U_STAGING_H
#include "pipe/p_state.h"
struct util_staging_transfer {
struct pipe_transfer base;
/* if direct, same as base.resource, otherwise the temporary staging
* resource
*/
struct pipe_resource *staging_resource;
};
/* user must be stride, slice_stride and offset.
* pt->usage == PIPE_USAGE_DYNAMIC || pt->usage == PIPE_USAGE_STAGING
* should be a good value to pass for direct staging resource is currently
* created with PIPE_USAGE_STAGING
*/
struct util_staging_transfer *
util_staging_transfer_init(struct pipe_context *pipe,
struct pipe_resource *pt,
unsigned level, enum pipe_resource_usage usage,
const struct pipe_box *box,
boolean direct, struct util_staging_transfer *tx);
void
util_staging_transfer_destroy(struct pipe_context *pipe,
struct pipe_transfer *ptx);
#endif

@@ -41,7 +41,6 @@ struct u_suballocator {
struct pipe_context *pipe;
unsigned size; /* Size of the whole buffer, in bytes. */
unsigned alignment; /* Alignment of each sub-allocation. */
unsigned bind; /* Bitmask of PIPE_BIND_* flags. */
enum pipe_resource_usage usage;
boolean zero_buffer_memory; /* If the buffer contents should be zeroed. */
@@ -58,8 +57,7 @@ struct u_suballocator {
* cleared to 0 after the allocation.
*/
struct u_suballocator *
u_suballocator_create(struct pipe_context *pipe, unsigned size,
unsigned alignment, unsigned bind,
u_suballocator_create(struct pipe_context *pipe, unsigned size, unsigned bind,
enum pipe_resource_usage usage,
boolean zero_buffer_memory)
{
@@ -68,8 +66,7 @@ u_suballocator_create(struct pipe_context *pipe, unsigned size,
return NULL;
allocator->pipe = pipe;
allocator->size = align(size, alignment);
allocator->alignment = alignment;
allocator->size = size;
allocator->bind = bind;
allocator->usage = usage;
allocator->zero_buffer_memory = zero_buffer_memory;
@@ -85,17 +82,18 @@ u_suballocator_destroy(struct u_suballocator *allocator)
void
u_suballocator_alloc(struct u_suballocator *allocator, unsigned size,
unsigned *out_offset, struct pipe_resource **outbuf)
unsigned alignment, unsigned *out_offset,
struct pipe_resource **outbuf)
{
unsigned alloc_size = align(size, allocator->alignment);
allocator->offset = align(allocator->offset, alignment);
/* Don't allow allocations larger than the buffer size. */
if (alloc_size > allocator->size)
if (size > allocator->size)
goto fail;
/* Make sure we have enough space in the buffer. */
if (!allocator->buffer ||
allocator->offset + alloc_size > allocator->size) {
allocator->offset + size > allocator->size) {
/* Allocate a new buffer. */
pipe_resource_reference(&allocator->buffer, NULL);
allocator->offset = 0;
@@ -117,15 +115,15 @@ u_suballocator_alloc(struct u_suballocator *allocator, unsigned size,
}
}
assert(allocator->offset % allocator->alignment == 0);
assert(allocator->offset % alignment == 0);
assert(allocator->offset < allocator->buffer->width0);
assert(allocator->offset + alloc_size <= allocator->buffer->width0);
assert(allocator->offset + size <= allocator->buffer->width0);
/* Return the buffer. */
*out_offset = allocator->offset;
pipe_resource_reference(outbuf, allocator->buffer);
allocator->offset += alloc_size;
allocator->offset += size;
return;
fail:

@@ -34,8 +34,7 @@
struct u_suballocator;
struct u_suballocator *
u_suballocator_create(struct pipe_context *pipe, unsigned size,
unsigned alignment, unsigned bind,
u_suballocator_create(struct pipe_context *pipe, unsigned size, unsigned bind,
enum pipe_resource_usage usage,
boolean zero_buffer_memory);
@@ -44,6 +43,7 @@ u_suballocator_destroy(struct u_suballocator *allocator);
void
u_suballocator_alloc(struct u_suballocator *allocator, unsigned size,
unsigned *out_offset, struct pipe_resource **outbuf);
unsigned alignment, unsigned *out_offset,
struct pipe_resource **outbuf);
#endif
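The suballocator change moves alignment from a property of the allocator to a parameter of each allocation. The idea can be sketched as a bump allocator over a single offset; this is an illustration with hypothetical names, it assumes power-of-two alignments, and it models "allocate a new buffer" as simply resetting the offset to 0.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_suballoc {
   unsigned size;    /* size of the backing buffer, in bytes */
   unsigned offset;  /* next free byte in the backing buffer */
};

/* Round value up to a power-of-two alignment. */
static unsigned sketch_align(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Returns true and writes *out_offset on success. */
static bool sketch_alloc(struct sketch_suballoc *a, unsigned size,
                         unsigned alignment, unsigned *out_offset)
{
   /* Align the start of this allocation, not its size. */
   a->offset = sketch_align(a->offset, alignment);

   /* Requests larger than the whole buffer can never succeed. */
   if (size > a->size)
      return false;

   /* Not enough room left: start over in a fresh buffer. */
   if (a->offset + size > a->size)
      a->offset = 0;

   *out_offset = a->offset;
   a->offset += size;
   return true;
}
```

With per-call alignment, a small 4-byte-aligned allocation no longer pays the padding that a following 64-byte-aligned one needs, which is the practical effect of dropping the `alignment` field from the allocator.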

@@ -238,8 +238,21 @@ util_fill_box(ubyte * dst,
}
/** Mipmap level size computation, with minimum block size */
static inline unsigned
minify(unsigned value, unsigned levels, unsigned blocksize)
{
return MAX2(blocksize, value >> levels);
}
/**
* Fallback function for pipe->resource_copy_region().
* We support copying between different formats (including compressed/
* uncompressed) if the bytes per block or pixel matches. If copying
* compressed -> uncompressed, the dst region is reduced by the block
* width, height. If copying uncompressed -> compressed, the dest region
* is expanded by the block width, height. See GL_ARB_copy_image.
* Note: (X,Y)=(0,0) is always the upper-left corner.
*/
void
@@ -249,14 +262,15 @@ util_resource_copy_region(struct pipe_context *pipe,
unsigned dst_x, unsigned dst_y, unsigned dst_z,
struct pipe_resource *src,
unsigned src_level,
const struct pipe_box *src_box)
const struct pipe_box *src_box_in)
{
struct pipe_transfer *src_trans, *dst_trans;
uint8_t *dst_map;
const uint8_t *src_map;
MAYBE_UNUSED enum pipe_format src_format;
enum pipe_format dst_format;
struct pipe_box dst_box;
struct pipe_box src_box, dst_box;
unsigned src_bs, dst_bs, src_bw, dst_bw, src_bh, dst_bh;
assert(src && dst);
if (!src || !dst)
@@ -268,47 +282,112 @@ util_resource_copy_region(struct pipe_context *pipe,
src_format = src->format;
dst_format = dst->format;
assert(util_format_get_blocksize(dst_format) == util_format_get_blocksize(src_format));
assert(util_format_get_blockwidth(dst_format) == util_format_get_blockwidth(src_format));
assert(util_format_get_blockheight(dst_format) == util_format_get_blockheight(src_format));
/* init src box */
src_box = *src_box_in;
/* init dst box */
dst_box.x = dst_x;
dst_box.y = dst_y;
dst_box.z = dst_z;
dst_box.width = src_box.width;
dst_box.height = src_box.height;
dst_box.depth = src_box.depth;
src_bs = util_format_get_blocksize(src_format);
src_bw = util_format_get_blockwidth(src_format);
src_bh = util_format_get_blockheight(src_format);
dst_bs = util_format_get_blocksize(dst_format);
dst_bw = util_format_get_blockwidth(dst_format);
dst_bh = util_format_get_blockheight(dst_format);
/* Note: all box positions and sizes are in pixels */
if (src_bw > 1 && dst_bw == 1) {
/* Copy from compressed to uncompressed.
* Shrink dest box by the src block size.
*/
dst_box.width /= src_bw;
dst_box.height /= src_bh;
}
else if (src_bw == 1 && dst_bw > 1) {
/* Copy from uncompressed to compressed.
* Expand dest box by the dest block size.
*/
dst_box.width *= dst_bw;
dst_box.height *= dst_bh;
}
else {
/* compressed -> compressed or uncompressed -> uncompressed copy */
assert(src_bw == dst_bw);
assert(src_bh == dst_bh);
}
assert(src_bs == dst_bs);
if (src_bs != dst_bs) {
/* This can happen if we fail to do format checking before hand.
* Don't crash below.
*/
return;
}
/* check that region boxes are block aligned */
assert(src_box.x % src_bw == 0);
assert(src_box.y % src_bh == 0);
assert(src_box.width % src_bw == 0 ||
src_box.x + src_box.width == minify(src->width0, src_level, src_bw));
assert(src_box.height % src_bh == 0 ||
src_box.y + src_box.height == minify(src->height0, src_level, src_bh));
assert(dst_box.x % dst_bw == 0);
assert(dst_box.y % dst_bh == 0);
assert(dst_box.width % dst_bw == 0 ||
dst_box.x + dst_box.width == minify(dst->width0, dst_level, dst_bw));
assert(dst_box.height % dst_bh == 0 ||
dst_box.y + dst_box.height == minify(dst->height0, dst_level, dst_bh));
/* check that region boxes are not out of bounds */
assert(src_box.x + src_box.width <=
minify(src->width0, src_level, src_bw));
assert(src_box.y + src_box.height <=
minify(src->height0, src_level, src_bh));
assert(dst_box.x + dst_box.width <=
minify(dst->width0, dst_level, dst_bw));
assert(dst_box.y + dst_box.height <=
minify(dst->height0, dst_level, dst_bh));
/* check that total number of src, dest bytes match */
assert((src_box.width / src_bw) * (src_box.height / src_bh) * src_bs ==
(dst_box.width / dst_bw) * (dst_box.height / dst_bh) * dst_bs);
src_map = pipe->transfer_map(pipe,
src,
src_level,
PIPE_TRANSFER_READ,
src_box, &src_trans);
&src_box, &src_trans);
assert(src_map);
if (!src_map) {
goto no_src_map;
}
dst_box.x = dst_x;
dst_box.y = dst_y;
dst_box.z = dst_z;
dst_box.width = src_box->width;
dst_box.height = src_box->height;
dst_box.depth = src_box->depth;
dst_map = pipe->transfer_map(pipe,
dst,
dst_level,
PIPE_TRANSFER_WRITE | PIPE_TRANSFER_DISCARD_RANGE,
&dst_box, &dst_trans);
PIPE_TRANSFER_WRITE |
PIPE_TRANSFER_DISCARD_RANGE, &dst_box,
&dst_trans);
assert(dst_map);
if (!dst_map) {
goto no_dst_map;
}
if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
assert(src_box->height == 1);
assert(src_box->depth == 1);
memcpy(dst_map, src_map, src_box->width);
assert(src_box.height == 1);
assert(src_box.depth == 1);
memcpy(dst_map, src_map, src_box.width);
} else {
util_copy_box(dst_map,
dst_format,
src_format,
dst_trans->stride, dst_trans->layer_stride,
0, 0, 0,
src_box->width, src_box->height, src_box->depth,
src_box.width, src_box.height, src_box.depth,
src_map,
src_trans->stride, src_trans->layer_stride,
0, 0, 0);
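
The destination-box adjustment in the hunk above can be sketched in isolation. This is only an illustration: `struct box2d` and `scale_dst_box` are hypothetical names that do not exist in Mesa, and the sketch covers only the divide/multiply rule visible in the diff, not the alignment or bounds assertions.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct pipe_box (2D only). */
struct box2d {
   int width, height;
};

/*
 * Derive the destination box size from the source box size, following the
 * rule in the hunk above. All sizes are in pixels. src_bw/src_bh and
 * dst_bw/dst_bh are the format block dimensions (1x1 for uncompressed
 * formats).
 */
static struct box2d
scale_dst_box(struct box2d src, unsigned src_bw, unsigned src_bh,
              unsigned dst_bw, unsigned dst_bh)
{
   struct box2d dst = src;

   if (src_bw > 1 && dst_bw == 1) {
      /* compressed -> uncompressed: shrink by the source block size */
      dst.width /= (int)src_bw;
      dst.height /= (int)src_bh;
   } else if (src_bw == 1 && dst_bw > 1) {
      /* uncompressed -> compressed: expand by the dest block size */
      dst.width *= (int)dst_bw;
      dst.height *= (int)dst_bh;
   }
   /* otherwise both sides use the same block size; nothing to scale */
   return dst;
}
```

Under these assumptions, a 16x8 pixel source region in a format with 4x4 blocks maps to a 4x2 region in an uncompressed destination, and the reverse copy expands a 16x8 region to 64x32.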


@@ -132,8 +132,10 @@ create_frag_shader_video_buffer(struct vl_compositor *c)
struct ureg_src tc;
struct ureg_src csc[3];
struct ureg_src sampler[3];
struct ureg_src lumakey;
struct ureg_dst texel;
struct ureg_dst fragment;
struct ureg_dst temp[2];
unsigned i;
shader = ureg_create(PIPE_SHADER_FRAGMENT);
@@ -145,6 +147,11 @@ create_frag_shader_video_buffer(struct vl_compositor *c)
csc[i] = ureg_DECL_constant(shader, i);
sampler[i] = ureg_DECL_sampler(shader, i);
}
for (i = 0; i < 2; ++i)
temp[i] = ureg_DECL_temporary(shader);
lumakey = ureg_DECL_constant(shader, 3);
texel = ureg_DECL_temporary(shader);
fragment = ureg_DECL_output(shader, TGSI_SEMANTIC_COLOR, 0);
@@ -160,7 +167,17 @@ create_frag_shader_video_buffer(struct vl_compositor *c)
for (i = 0; i < 3; ++i)
ureg_DP4(shader, ureg_writemask(fragment, TGSI_WRITEMASK_X << i), csc[i], ureg_src(texel));
ureg_MOV(shader, ureg_writemask(fragment, TGSI_WRITEMASK_W), ureg_imm1f(shader, 1.0f));
ureg_MOV(shader, ureg_writemask(temp[0], TGSI_WRITEMASK_W),
ureg_scalar(ureg_src(texel), TGSI_SWIZZLE_Z));
ureg_SLE(shader, ureg_writemask(temp[1],TGSI_WRITEMASK_W),
ureg_src(temp[0]), ureg_scalar(lumakey, TGSI_SWIZZLE_X));
ureg_SGT(shader, ureg_writemask(temp[0],TGSI_WRITEMASK_W),
ureg_src(temp[0]), ureg_scalar(lumakey, TGSI_SWIZZLE_Y));
ureg_MAX(shader, ureg_writemask(fragment, TGSI_WRITEMASK_W),
ureg_src(temp[0]), ureg_src(temp[1]));
for (i = 0; i < 2; ++i)
ureg_release_temporary(shader, temp[i]);
ureg_release_temporary(shader, texel);
ureg_END(shader);
@@ -852,20 +869,23 @@ vl_compositor_cleanup(struct vl_compositor *c)
}
void
vl_compositor_set_csc_matrix(struct vl_compositor_state *s, vl_csc_matrix const *matrix)
vl_compositor_set_csc_matrix(struct vl_compositor_state *s,
vl_csc_matrix const *matrix,
float luma_min, float luma_max)
{
struct pipe_transfer *buf_transfer;
assert(s);
memcpy
(
pipe_buffer_map(s->pipe, s->csc_matrix,
PIPE_TRANSFER_WRITE | PIPE_TRANSFER_DISCARD_RANGE,
&buf_transfer),
matrix,
sizeof(vl_csc_matrix)
);
float *ptr = pipe_buffer_map(s->pipe, s->csc_matrix,
PIPE_TRANSFER_WRITE | PIPE_TRANSFER_DISCARD_RANGE,
&buf_transfer);
memcpy(ptr, matrix, sizeof(vl_csc_matrix));
ptr += sizeof(vl_csc_matrix)/sizeof(float);
ptr[0] = luma_min;
ptr[1] = luma_max;
pipe_buffer_unmap(s->pipe, buf_transfer);
}
@@ -1142,13 +1162,13 @@ vl_compositor_init_state(struct vl_compositor_state *s, struct pipe_context *pip
pipe->screen,
PIPE_BIND_CONSTANT_BUFFER,
PIPE_USAGE_DEFAULT,
sizeof(csc_matrix)
sizeof(csc_matrix) + 2*sizeof(float)
);
vl_compositor_clear_layers(s);
vl_csc_get_matrix(VL_CSC_COLOR_STANDARD_IDENTITY, NULL, true, &csc_matrix);
vl_compositor_set_csc_matrix(s, (const vl_csc_matrix *)&csc_matrix);
vl_compositor_set_csc_matrix(s, (const vl_csc_matrix *)&csc_matrix, 1.0f, 0.0f);
return true;
}
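
The alpha computation that the SLE/SGT/MAX sequence adds to the fragment shader can be mirrored on the CPU. This is a minimal sketch, assuming the usual TGSI semantics of `SLE` and `SGT` (1.0 when the comparison holds, 0.0 otherwise); `luma_key_alpha`, `sle` and `sgt` are hypothetical helper names, not functions from the Mesa tree.

```c
#include <assert.h>

/* TGSI SLE: 1.0 if a <= b, else 0.0 */
static float sle(float a, float b) { return a <= b ? 1.0f : 0.0f; }

/* TGSI SGT: 1.0 if a > b, else 0.0 */
static float sgt(float a, float b) { return a > b ? 1.0f : 0.0f; }

/*
 * Mirror of the shader sequence: the fragment stays opaque when the
 * sampled value is at or below luma_min, or above luma_max.
 */
static float
luma_key_alpha(float value, float luma_min, float luma_max)
{
   float below = sle(value, luma_min);
   float above = sgt(value, luma_max);

   return below > above ? below : above;   /* MAX */
}
```

With the defaults passed in `vl_compositor_init_state` (`luma_min = 1.0f`, `luma_max = 0.0f`), every value that is at most 1.0 or greater than 0.0 yields an alpha of 1.0, so keying is effectively disabled until a caller supplies a real range.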


@@ -138,7 +138,9 @@ vl_compositor_init_state(struct vl_compositor_state *state, struct pipe_context
* set yuv -> rgba conversion matrix
*/
void
vl_compositor_set_csc_matrix(struct vl_compositor_state *settings, const vl_csc_matrix *matrix);
vl_compositor_set_csc_matrix(struct vl_compositor_state *settings,
const vl_csc_matrix *matrix,
float luma_min, float luma_max);
/**
* reset dirty area, so it's cleared with the clear colour


@@ -447,7 +447,8 @@ vl_deint_filter_render(struct vl_deint_filter *filter,
struct pipe_sampler_view *sampler_views[4];
struct pipe_surface **dst_surfaces;
const unsigned *plane_order;
int i, j;
int i;
unsigned j;
assert(filter && prevprev && prev && cur && next && field <= 1);


@@ -321,13 +321,11 @@ static void *
create_stage1_frag_shader(struct vl_idct *idct)
{
struct ureg_program *shader;
struct ureg_src l_addr[2], r_addr[2];
struct ureg_dst l[4][2], r[2];
struct ureg_dst *fragment;
int i, j;
unsigned i;
int j;
shader = ureg_create(PIPE_SHADER_FRAGMENT);
if (!shader)


@@ -85,7 +85,7 @@ create_frag_shader(struct vl_matrix_filter *filter, unsigned num_offsets,
struct ureg_dst t_sum;
struct ureg_dst o_fragment;
bool first;
int i;
unsigned i;
shader = ureg_create(PIPE_SHADER_FRAGMENT);
if (!shader) {


@@ -84,7 +84,7 @@ create_frag_shader(struct vl_median_filter *filter,
struct ureg_dst *t_array = MALLOC(sizeof(struct ureg_dst) * num_offsets);
struct ureg_dst o_fragment;
const unsigned median = num_offsets >> 1;
int i, j;
unsigned i, j;
assert(num_offsets & 1); /* we need an odd number of offsets */
if (!(num_offsets & 1)) { /* yeah, we REALLY need an odd number of offsets!!! */
@@ -158,7 +158,8 @@ static void
generate_offsets(enum vl_median_filter_shape shape, unsigned size,
struct vertex2f **offsets, unsigned *num_offsets)
{
int i = 0, half_size;
unsigned i = 0;
int half_size;
struct vertex2f v;
assert(offsets && num_offsets);


@@ -583,12 +583,12 @@ init_dct_coeff_table(struct dct_coeff *dst, const struct dct_coeff_compressed *s
break;
}
for(i=0; i<(1 << (17 - coeff.length)); ++i)
for(i = 0; i < (1u << (17 - coeff.length)); ++i)
dst[src->bitcode << 1 | i] = coeff;
if (has_sign) {
coeff.level = -coeff.level;
for(; i<(1 << (18 - coeff.length)); ++i)
for(; i < (1u << (18 - coeff.length)); ++i)
dst[src->bitcode << 1 | i] = coeff;
}
}


@@ -79,7 +79,7 @@ vl_vlc_init_table(struct vl_vlc_entry *dst, unsigned dst_size, const struct vl_v
}
for(; src_size > 0; --src_size, ++src) {
for(i=0; i<(1 << (bits - src->entry.length)); ++i)
for(i = 0; i < (1u << (bits - src->entry.length)); ++i)
dst[src->bitcode >> (16 - bits) | i] = src->entry;
}
}
@@ -293,7 +293,7 @@ vl_vlc_search_byte(struct vl_vlc *vlc, unsigned num_bits, uint8_t value)
{
/* make sure we are on a byte boundary */
assert((vl_vlc_valid_bits(vlc) % 8) == 0);
assert(num_bits == ~0 || (num_bits % 8) == 0);
assert(num_bits == ~0u || (num_bits % 8) == 0);
/* deplete the bit buffer */
while (vl_vlc_valid_bits(vlc) > 0) {
@@ -305,7 +305,7 @@ vl_vlc_search_byte(struct vl_vlc *vlc, unsigned num_bits, uint8_t value)
vl_vlc_eatbits(vlc, 8);
if (num_bits != ~0) {
if (num_bits != ~0u) {
num_bits -= 8;
if (num_bits == 0)
return FALSE;
@@ -332,7 +332,7 @@ vl_vlc_search_byte(struct vl_vlc *vlc, unsigned num_bits, uint8_t value)
}
++vlc->data;
if (num_bits != ~0) {
if (num_bits != ~0u) {
num_bits -= 8;
if (num_bits == 0) {
vl_vlc_align_data_ptr(vlc);
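
The `1u <<` and `~0u` changes in the hunks above keep both operands of each comparison unsigned. The diff does not state the motivation; avoiding signed/unsigned comparison warnings is an assumption here. A small sketch of the two patterns, using hypothetical helper names `fill_count` and `is_unlimited`:

```c
#include <assert.h>
#include <limits.h>

/*
 * Count how many table slots a code of `length` bits fills in a table
 * indexed by `bits` bits. The loop bound uses 1u so that it has the same
 * unsigned type as the counter.
 */
static unsigned
fill_count(unsigned bits, unsigned length)
{
   unsigned i, n = 0;

   for (i = 0; i < (1u << (bits - length)); ++i)
      ++n;
   return n;
}

/* ~0u (all bits set, i.e. UINT_MAX) serves as the "no limit" sentinel. */
static int
is_unlimited(unsigned num_bits)
{
   return num_bits == ~0u;
}
```

Comparing an `unsigned` against plain `~0` also works, because the `int` value -1 is converted to `UINT_MAX`, but the explicit `u` suffix makes the intent visible and keeps the types consistent.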


@@ -99,15 +99,12 @@ static void *
create_vert_shader(struct vl_zscan *zscan)
{
struct ureg_program *shader;
struct ureg_src scale;
struct ureg_src vrect, vpos, block_num;
struct ureg_dst tmp;
struct ureg_dst o_vpos;
struct ureg_dst *o_vtex;
signed i;
unsigned i;
shader = ureg_create(PIPE_SHADER_VERTEX);
if (!shader)


@@ -340,6 +340,7 @@ The integer capabilities:
extension and thus implements proper support for culling planes.
* ``PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES``: Whether primitive restart is
supported for patch primitives.
* ``PIPE_CAP_TGSI_VOTE``: Whether the ``VOTE_*`` ops can be used in shaders.
.. _pipe_capf:


@@ -2557,6 +2557,23 @@ only be used with 32-bit integer image formats.
resource[offset] = (dst_x > src_x ? dst_x : src_x)
.. _voteopcodes:
Vote opcodes
^^^^^^^^^^^^
These opcodes compare the given value across the shader invocations
running in the current SIMD group. The details of exactly which
invocations get compared are implementation-defined, and it would be a
correct implementation to only ever consider the current thread's
value (i.e. a SIMD group of 1). The argument is treated as a boolean.
.. opcode:: VOTE_ANY - Value is set in any of the current invocations
.. opcode:: VOTE_ALL - Value is set in all of the current invocations
.. opcode:: VOTE_EQ - Value is the same in all of the current invocations
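
The three vote opcodes can be modelled on the CPU as reductions over the values held by the invocations in one group. This is a sketch only: the array-based helpers `vote_any`, `vote_all` and `vote_eq` are hypothetical, the group is assumed to contain at least one invocation, and real hardware decides for itself which invocations take part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* VOTE_ANY: true if the value is set in any invocation of the group. */
static bool
vote_any(const int *values, size_t n)
{
   for (size_t i = 0; i < n; i++)
      if (values[i])
         return true;
   return false;
}

/* VOTE_ALL: true if the value is set in all invocations of the group. */
static bool
vote_all(const int *values, size_t n)
{
   for (size_t i = 0; i < n; i++)
      if (!values[i])
         return false;
   return true;
}

/* VOTE_EQ: true if the value, read as a boolean, is the same everywhere. */
static bool
vote_eq(const int *values, size_t n)
{
   for (size_t i = 1; i < n; i++)
      if ((values[i] != 0) != (values[0] != 0))
         return false;
   return true;
}
```

For a group of one invocation, `vote_any` and `vote_all` both return that invocation's own value and `vote_eq` is always true, which matches the minimal implementation the text allows.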
Explanation of symbols used
------------------------------


@@ -40,7 +40,7 @@ LOCAL_C_INCLUDES := \
LOCAL_GENERATED_SOURCES := $(MESA_GEN_NIR_H)
LOCAL_SHARED_LIBRARIES := libdrm libdrm_freedreno
LOCAL_SHARED_LIBRARIES := libdrm_freedreno
LOCAL_STATIC_LIBRARIES := libmesa_glsl libmesa_nir
LOCAL_MODULE := libmesa_pipe_freedreno


@@ -142,16 +142,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
[SB_FRAG_TEX] = REG_A3XX_TPL1_TP_FS_BORDER_COLOR_BASE_ADDR,
};
struct fd3_context *fd3_ctx = fd3_context(ctx);
unsigned i, j, off;
void *ptr;
u_upload_alloc(fd3_ctx->border_color_uploader,
0, BORDER_COLOR_UPLOAD_SIZE,
BORDER_COLOR_UPLOAD_SIZE, &off,
&fd3_ctx->border_color_buf,
&ptr);
fd_setup_border_colors(tex, ptr, tex_off[sb]);
bool needs_border = false;
unsigned i, j;
if (tex->num_samplers > 0) {
/* output sampler state: */
@@ -170,6 +162,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
OUT_RING(ring, sampler->texsamp0);
OUT_RING(ring, sampler->texsamp1);
needs_border |= sampler->needs_border;
}
}
@@ -233,10 +227,23 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
}
}
OUT_PKT0(ring, bcolor_reg[sb], 1);
OUT_RELOC(ring, fd_resource(fd3_ctx->border_color_buf)->bo, off, 0, 0);
if (needs_border) {
unsigned off;
void *ptr;
u_upload_unmap(fd3_ctx->border_color_uploader);
u_upload_alloc(fd3_ctx->border_color_uploader,
0, BORDER_COLOR_UPLOAD_SIZE,
BORDER_COLOR_UPLOAD_SIZE, &off,
&fd3_ctx->border_color_buf,
&ptr);
fd_setup_border_colors(tex, ptr, tex_off[sb]);
OUT_PKT0(ring, bcolor_reg[sb], 1);
OUT_RELOC(ring, fd_resource(fd3_ctx->border_color_buf)->bo, off, 0, 0);
u_upload_unmap(fd3_ctx->border_color_uploader);
}
}
/* emit texture state for mem->gmem restore operation.. eventually it would


@@ -79,7 +79,8 @@ emit_mrt(struct fd_ringbuffer *ring, unsigned nr_bufs,
if (rsc->stencil) {
rsc = rsc->stencil;
pformat = rsc->base.b.format;
bases++;
if (bases)
bases++;
}
slice = fd_resource_slice(rsc, psurf->u.tex.level);
format = fd3_pipe2color(pformat);


@@ -36,7 +36,7 @@
#include "fd3_format.h"
static enum a3xx_tex_clamp
tex_clamp(unsigned wrap, bool clamp_to_edge)
tex_clamp(unsigned wrap, bool clamp_to_edge, bool *needs_border)
{
/* Hardware does not support _CLAMP, but we emulate it: */
if (wrap == PIPE_TEX_WRAP_CLAMP) {
@@ -50,6 +50,7 @@ tex_clamp(unsigned wrap, bool clamp_to_edge)
case PIPE_TEX_WRAP_CLAMP_TO_EDGE:
return A3XX_TEX_CLAMP_TO_EDGE;
case PIPE_TEX_WRAP_CLAMP_TO_BORDER:
*needs_border = true;
return A3XX_TEX_CLAMP_TO_BORDER;
case PIPE_TEX_WRAP_MIRROR_CLAMP_TO_EDGE:
/* only works for PoT.. need to emulate otherwise! */
@@ -113,6 +114,7 @@ fd3_sampler_state_create(struct pipe_context *pctx,
so->saturate_r = (cso->wrap_r == PIPE_TEX_WRAP_CLAMP);
}
so->needs_border = false;
so->texsamp0 =
COND(!cso->normalized_coords, A3XX_TEX_SAMP_0_UNNORM_COORDS) |
COND(!cso->seamless_cube_map, A3XX_TEX_SAMP_0_CUBEMAPSEAMLESSFILTOFF) |
@@ -120,9 +122,9 @@ fd3_sampler_state_create(struct pipe_context *pctx,
A3XX_TEX_SAMP_0_XY_MAG(tex_filter(cso->mag_img_filter, aniso)) |
A3XX_TEX_SAMP_0_XY_MIN(tex_filter(cso->min_img_filter, aniso)) |
A3XX_TEX_SAMP_0_ANISO(aniso) |
A3XX_TEX_SAMP_0_WRAP_S(tex_clamp(cso->wrap_s, clamp_to_edge)) |
A3XX_TEX_SAMP_0_WRAP_T(tex_clamp(cso->wrap_t, clamp_to_edge)) |
A3XX_TEX_SAMP_0_WRAP_R(tex_clamp(cso->wrap_r, clamp_to_edge));
A3XX_TEX_SAMP_0_WRAP_S(tex_clamp(cso->wrap_s, clamp_to_edge, &so->needs_border)) |
A3XX_TEX_SAMP_0_WRAP_T(tex_clamp(cso->wrap_t, clamp_to_edge, &so->needs_border)) |
A3XX_TEX_SAMP_0_WRAP_R(tex_clamp(cso->wrap_r, clamp_to_edge, &so->needs_border));
if (cso->compare_mode)
so->texsamp0 |= A3XX_TEX_SAMP_0_COMPARE_FUNC(cso->compare_func); /* maps 1:1 */
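
The `needs_border` plumbing above follows a simple pattern: the wrap-mode translator raises a flag through an out-parameter, the sampler object stores it, and the emit path ORs the flags of all bound samplers before deciding whether to upload border colors. A reduced sketch of that pattern, with hypothetical types and names (`enum wrap_mode`, `hw_clamp`, `make_sampler`, `any_needs_border`) that only stand in for the freedreno ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrap modes; the real code uses PIPE_TEX_WRAP_* values. */
enum wrap_mode { WRAP_REPEAT, WRAP_CLAMP_TO_EDGE, WRAP_CLAMP_TO_BORDER };

/* Translate a wrap mode and flag border-color usage via the out-parameter. */
static int
hw_clamp(enum wrap_mode wrap, bool *needs_border)
{
   if (wrap == WRAP_CLAMP_TO_BORDER)
      *needs_border = true;
   return (int)wrap;   /* placeholder for the hardware enum value */
}

struct sampler {
   bool needs_border;
};

/* Build a sampler; the flag starts false and any axis may set it. */
static struct sampler
make_sampler(enum wrap_mode s, enum wrap_mode t, enum wrap_mode r)
{
   struct sampler so;

   so.needs_border = false;
   (void)hw_clamp(s, &so.needs_border);
   (void)hw_clamp(t, &so.needs_border);
   (void)hw_clamp(r, &so.needs_border);
   return so;
}

/* Emit-time check: upload border colors only if some sampler needs them. */
static bool
any_needs_border(const struct sampler *samplers, size_t n)
{
   bool needs = false;

   for (size_t i = 0; i < n; i++)
      needs |= samplers[i].needs_border;
   return needs;
}
```

The flag must be cleared before the translator calls, as the hunk does with `so->needs_border = false`, because the translator only ever sets it.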


@@ -41,6 +41,7 @@ struct fd3_sampler_stateobj {
struct pipe_sampler_state base;
uint32_t texsamp0, texsamp1;
bool saturate_s, saturate_t, saturate_r;
bool needs_border;
};
static inline struct fd3_sampler_stateobj *


@@ -88,7 +88,7 @@ fd4_draw(struct fd_context *ctx, struct fd_ringbuffer *ring,
}
static inline enum pc_di_index_size
static inline enum a4xx_index_size
fd4_size2indextype(unsigned index_size)
{
switch (index_size) {


@@ -131,16 +131,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
[SB_FRAG_TEX] = REG_A4XX_TPL1_TP_FS_BORDER_COLOR_BASE_ADDR,
};
struct fd4_context *fd4_ctx = fd4_context(ctx);
unsigned i, off;
void *ptr;
u_upload_alloc(fd4_ctx->border_color_uploader,
0, BORDER_COLOR_UPLOAD_SIZE,
BORDER_COLOR_UPLOAD_SIZE, &off,
&fd4_ctx->border_color_buf,
&ptr);
fd_setup_border_colors(tex, ptr, 0);
bool needs_border = false;
unsigned i;
if (tex->num_samplers > 0) {
int num_samplers;
@@ -166,6 +158,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
&dummy_sampler;
OUT_RING(ring, sampler->texsamp0);
OUT_RING(ring, sampler->texsamp1);
needs_border |= sampler->needs_border;
}
for (; i < num_samplers; i++) {
@@ -235,10 +229,22 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
debug_assert(v->astc_srgb.count == 0);
}
OUT_PKT0(ring, bcolor_reg[sb], 1);
OUT_RELOC(ring, fd_resource(fd4_ctx->border_color_buf)->bo, off, 0, 0);
if (needs_border) {
unsigned off;
void *ptr;
u_upload_unmap(fd4_ctx->border_color_uploader);
u_upload_alloc(fd4_ctx->border_color_uploader,
0, BORDER_COLOR_UPLOAD_SIZE,
BORDER_COLOR_UPLOAD_SIZE, &off,
&fd4_ctx->border_color_buf,
&ptr);
fd_setup_border_colors(tex, ptr, 0);
OUT_PKT0(ring, bcolor_reg[sb], 1);
OUT_RELOC(ring, fd_resource(fd4_ctx->border_color_buf)->bo, off, 0, 0);
u_upload_unmap(fd4_ctx->border_color_uploader);
}
}
/* emit texture state for mem->gmem restore operation.. eventually it would


@@ -80,7 +80,8 @@ emit_mrt(struct fd_ringbuffer *ring, unsigned nr_bufs,
if (rsc->stencil) {
rsc = rsc->stencil;
pformat = rsc->base.b.format;
bases++;
if (bases)
bases++;
}
slice = fd_resource_slice(rsc, psurf->u.tex.level);


@@ -121,6 +121,12 @@ emit_shader(struct fd_ringbuffer *ring, const struct ir3_shader_variant *so)
OUT_RELOC(ring, so->bo, 0,
CP_LOAD_STATE_1_STATE_TYPE(ST_SHADER), 0);
}
/* for how clever coverity is, it is sometimes rather dull, and
* doesn't realize that the only case where bin==NULL, sz==0:
*/
assume(bin || (sz == 0));
for (i = 0; i < sz; i++) {
OUT_RING(ring, bin[i]);
}


@@ -36,7 +36,7 @@
#include "fd4_format.h"
static enum a4xx_tex_clamp
tex_clamp(unsigned wrap, bool clamp_to_edge)
tex_clamp(unsigned wrap, bool clamp_to_edge, bool *needs_border)
{
/* Hardware does not support _CLAMP, but we emulate it: */
if (wrap == PIPE_TEX_WRAP_CLAMP) {
@@ -50,6 +50,7 @@ tex_clamp(unsigned wrap, bool clamp_to_edge)
case PIPE_TEX_WRAP_CLAMP_TO_EDGE:
return A4XX_TEX_CLAMP_TO_EDGE;
case PIPE_TEX_WRAP_CLAMP_TO_BORDER:
*needs_border = true;
return A4XX_TEX_CLAMP_TO_BORDER;
case PIPE_TEX_WRAP_MIRROR_CLAMP_TO_EDGE:
/* only works for PoT.. need to emulate otherwise! */
@@ -113,14 +114,15 @@ fd4_sampler_state_create(struct pipe_context *pctx,
so->saturate_r = (cso->wrap_r == PIPE_TEX_WRAP_CLAMP);
}
so->needs_border = false;
so->texsamp0 =
COND(miplinear, A4XX_TEX_SAMP_0_MIPFILTER_LINEAR_NEAR) |
A4XX_TEX_SAMP_0_XY_MAG(tex_filter(cso->mag_img_filter, aniso)) |
A4XX_TEX_SAMP_0_XY_MIN(tex_filter(cso->min_img_filter, aniso)) |
A4XX_TEX_SAMP_0_ANISO(aniso) |
A4XX_TEX_SAMP_0_WRAP_S(tex_clamp(cso->wrap_s, clamp_to_edge)) |
A4XX_TEX_SAMP_0_WRAP_T(tex_clamp(cso->wrap_t, clamp_to_edge)) |
A4XX_TEX_SAMP_0_WRAP_R(tex_clamp(cso->wrap_r, clamp_to_edge));
A4XX_TEX_SAMP_0_WRAP_S(tex_clamp(cso->wrap_s, clamp_to_edge, &so->needs_border)) |
A4XX_TEX_SAMP_0_WRAP_T(tex_clamp(cso->wrap_t, clamp_to_edge, &so->needs_border)) |
A4XX_TEX_SAMP_0_WRAP_R(tex_clamp(cso->wrap_r, clamp_to_edge, &so->needs_border));
so->texsamp1 =
// COND(miplinear, A4XX_TEX_SAMP_1_MIPFILTER_LINEAR_FAR) |

Some files were not shown because too many files have changed in this diff.