Compare commits

1186 Commits

Author SHA1 Message Date
Chad Versace
82b324c24b i965/gen8: Remove gen<8 checks in gen8 code
Some assertions in gen8_surface_state.c checked for gen < 8.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-09 14:24:12 -07:00
Chad Versace
8a0c85b258 i965/gen9: Enable rep clears on gen9
The (gen < 9) check in brw_clear() was too broad. It disabled all types
of fast color clears:
    a. singlesample rep clears
    b. singlesample MCS fast clears
    c. multisample MCS fast clears

The MCS clears are still buggy, but the rep clear works well. So let's
enable it.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-09 14:24:12 -07:00
Chad Versace
dcd59a9e32 i965/gen9: Disable MCS for 1x color surfaces
Fast color clears are disabled for gen9 (see the checks in
brw_meta_fast_clear), so there is no reason to allocate the MCS and
track its clear/resolve state.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-09 14:24:12 -07:00
Roland Scheidegger
4c4ba5a8c3 tgsi: (trivial) kill c99-ism. 2015-10-09 23:12:14 +02:00
Marek Olšák
d695c676ea program: remove _mesa_init_*_program wrappers
They didn't do anything useful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:19 +02:00
Marek Olšák
092f0427dc program: remove other unused functions
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
5042a3eef8 program: remove unused cloning and combining functions
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c947a3a4c4 program: remove unused function _mesa_find_line_column
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
ee01942eb5 st/mesa: release the glsl_to_tgsi visitor after translation
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
e5073e8d0c st/mesa: translate tessellation shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
897177020b st/mesa: translate geometry shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
a907b5dd16 st/mesa: translate fragment shaders into TGSI when we get them
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
46021ace51 st/mesa: translate vertex shaders into TGSI when we get them
The translate function is split into two:
- translation to TGSI
- creating the variant (TGSI transformations only)

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
de6a004035 st/mesa: fix glDrawPixels with a texture
The samplers for DrawPixels data and the pixel map are assigned to slots
which don't overlap with the existing sampler slots.

The texture coordinates for the user texture are uploaded as a constant.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
f15bb3e633 st/mesa: implement DrawPixels shader transformation using tgsi_transform_shader
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
b55b986dc9 st/mesa: make Z/S drawpix shaders independent of variants, don't use Mesa IR v2
- there is no connection to user fragment shaders, so having these as
  shader variants makes no sense
- don't use Mesa IR, use TGSI
- don't create gl_fragment_program, just create the shader CSO

v2: generate exactly the same shader as before to fix llvmpipe

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
f4ec81032b st/mesa: implement glBitmap shader transformation using tgsi_transform_shader
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
3eedb63371 st/mesa: remove old emulation for VS and FS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c04e91a0e9 st/mesa: use TGSI utility to emulate features for FS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
941721ee2a st/mesa: use TGSI utility to emulate features for VS variants
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
4bbe418b4b st/mesa: decrease the size of st_vertex_program
The other variables can't be moved.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
4a21edf067 st/mesa: inline st_prepare_vertex_program
No other shader stage has a "prepare" function.
This will allow removing some variables from st_vertex_program.

Also, prepare_fragment_program was a dead prototype.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
c80c19a9d5 tgsi/scan: add info about declared samplers (v2)
v2: get it from declarations, not instructions
2015-10-09 22:02:18 +02:00
Marek Olšák
417927ebde tgsi: add a utility for emulating some GL features
st/mesa will use this, but drivers can use it too.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Marek Olšák
9ea2a86809 mesa: call ProgramStringNotify for fixed-function vertex programs
Drivers weren't notified about this at all.
This allows disabling on-demand compilation in drivers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2015-10-09 22:02:18 +02:00
Rob Clark
c9b982b72d glsl: move shader_enums into nir
First step towards inverting the dependency between glsl and nir (so nir
can be used without glsl).  Also solves this issue with 'make distclean'

  Making distclean in mesa
  make[2]: Entering directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
  Makefile:2486: ../glsl/.deps/shader_enums.Plo: No such file or directory
  make[2]: *** No rule to make target '../glsl/.deps/shader_enums.Plo'. Stop.
  make[2]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
  Makefile:684: recipe for target 'distclean-recursive' failed
  make[1]: *** [distclean-recursive] Error 1
  make[1]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src'
  Makefile:615: recipe for target 'distclean-recursive' failed
  make: *** [distclean-recursive] Error 1

Reported-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-09 15:03:28 -04:00
Francisco Jerez
7e441bf025 mesa: Get rid of texture-dependent image unit derived state.
The point is to avoid having to re-validate all image units when
_NEW_TEXTURE is flagged, which can be expensive if the driver exposes
a large number of image units.  This has been reported to fix a 36%
performance regression in the Synmark2 Multithread benchmark on the
i965 driver which exposes 192 image units.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788
Reported-by: Wendy Wang <wendy.wang@intel.com>
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:49:01 +03:00
Francisco Jerez
2d97a78b37 i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.
gl_image_unit::_Valid will be removed in a future commit.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:52 +03:00
Francisco Jerez
25d3338be3 mesa: Skip redundant texture completeness checking during image validation.
The call to _mesa_test_texobj_completeness() is unnecessary if the
texture is already known to be complete.  If the texture object is
dirtied in the meantime _BaseComplete and _MipmapComplete will be
reset to false.  _mesa_is_image_unit_valid() will start to be called
more frequently in a future commit, so it seems desirable to avoid the
unnecessary work.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:46 +03:00
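The shortcut described in 25d3338be3 can be sketched as follows. This is a minimal illustration, not the Mesa code: struct texture, test_completeness(), mark_dirty() and image_unit_valid() are hypothetical stand-ins, and the full check is assumed to always succeed. It only shows that the expensive test reruns after the object has been dirtied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced texture object: two cached flags plus a counter
 * that records how often the expensive check actually ran. */
struct texture {
   bool base_complete;
   bool mipmap_complete;
   int full_checks;
};

/* Stand-in for the full completeness test (assumed to always succeed). */
static void
test_completeness(struct texture *t)
{
   t->full_checks++;
   t->base_complete = true;
   t->mipmap_complete = true;
}

/* Any state change resets both cached flags. */
static void
mark_dirty(struct texture *t)
{
   t->base_complete = false;
   t->mipmap_complete = false;
}

/* Only rerun the full test when nothing is cached as complete. */
static bool
image_unit_valid(struct texture *t)
{
   if (!t->base_complete && !t->mipmap_complete)
      test_completeness(t);
   return t->base_complete;
}
```

Calling image_unit_valid() twice in a row runs the full check once; a mark_dirty() in between forces it to run again.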
Francisco Jerez
5152db415f mesa: Expose function to calculate whether a shader image unit is valid.
A future commit will remove all texture object-dependent derived state
from the image unit struct to make validation unnecessary on texture
state changes.  Instead of checking gl_image_unit::_Valid drivers will
be required to call this function when needed to find out whether an
image unit is in a valid state and whether access from the shader is
allowed.

Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-09 17:48:28 +03:00
Francisco Jerez
5346c11670 i965: Don't tell the hardware about our UAV access.
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives.  The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.

Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage.  An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required".  The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.

The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.

None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled.  We do set the "Accesses UAV" bits
though since my commit ac7664e493 (which this patch partially
reverts), mainly because of comments like the following from the BDW
PRM:

> 3DSTATE_GS
>[...]
> 12 Accesses UAV
>    Format: Enable
>    This field must be set when GS has a UAV access.

There are similar comments in the documentation for the other
3DSTATE_*S commands.  The "must" part is misleading and unjustified
AFAIK.  Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty.  There is an exception though.  On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.

The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, which will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).

The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only.  Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-10-09 17:48:26 +03:00
Connor Abbott
bb59ba8634 nir/instr_set: remove unnecessary check in nir_instrs_equal()
This was originally added to nir_instrs_equal() instead of
nir_instr_can_cse() incorrectly, but this was fixed when moving to the
instruction set API (as it had to be, otherwise hashing wouldn't work).
Now, this is dead code since instr_can_rewrite() will only return true
for texture instructions that use an index, so we can turn the check into
an assert. This also means that now nir_instrs_equal(instr, instr) will
always return true unless it assert-fails.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:15:28 -04:00
Connor Abbott
bf5f931aee nir: make nir_instrs_equal() static
This was previously tied to CSE, since it would only work for
instructions where nir_can_cse() (now instr_can_rewrite()) returned
true. Now that CSE uses the instruction set abstraction which only uses
this internally, we can make it local to nir_instr_set.c.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:15:15 -04:00
Connor Abbott
e8308d0523 nir/cse: use the instruction set API
This replaces an O(n^2) algorithm with an O(n) one, while allowing us to
import most of the infrastructure required for GVN. The idea is to walk
the dominance tree depth-first, similar to when converting to SSA, and
remove the instructions from the set when we're done visiting the
sub-tree of the dominance tree so that the only instructions in the set
are the instructions that dominate the current block.

No piglit regressions. No shader-db changes.

Compilation time for full shader-db:

Difference at 95.0% confidence
        -35.826 +/- 2.16018
        -6.2852% +/- 0.378975%
        (Student's t, pooled s = 3.37504)

v2:
- rebase on start_block removal
- remove useless state struct
- change commit message

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:42 -04:00
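The dominance-tree walk described in e8308d0523 can be sketched roughly as below. Everything here is an assumption made for illustration: the struct layouts, the opcode enum, and the names find_equal() and cse_block() are hypothetical and are not NIR's API. The "set" is a plain array with linear lookup, so the sketch shows only the scoping of the set, not the complexity improvement. Rewriting the uses of a replaced instruction is also omitted.

```c
#include <assert.h>

#define MAX_INSTRS   32
#define MAX_CHILDREN 4

enum { OP_ADD, OP_MUL };   /* hypothetical opcodes */

struct instr {
   int op, src0, src1;
   int replaced_by;        /* index of the equal dominating instr, or -1 */
};

struct block {
   int first, count;                /* range of this block in instrs[] */
   int children[MAX_CHILDREN];      /* children in the dominance tree */
   int num_children;
};

struct cse_state {
   struct instr *instrs;
   const struct block *blocks;
   int set[MAX_INSTRS];    /* instrs that dominate the current point */
   int set_size;
};

/* Linear search standing in for a hash-set lookup. */
static int
find_equal(const struct cse_state *s, const struct instr *in)
{
   for (int i = 0; i < s->set_size; i++) {
      const struct instr *other = &s->instrs[s->set[i]];
      if (other->op == in->op && other->src0 == in->src0 &&
          other->src1 == in->src1)
         return s->set[i];
   }
   return -1;
}

/* Depth-first walk: add while descending, drop when leaving the subtree. */
static void
cse_block(struct cse_state *s, int b)
{
   const struct block *blk = &s->blocks[b];
   int saved_size = s->set_size;

   for (int i = blk->first; i < blk->first + blk->count; i++) {
      int match = find_equal(s, &s->instrs[i]);
      if (match >= 0) {
         s->instrs[i].replaced_by = match;
      } else {
         assert(s->set_size < MAX_INSTRS);
         s->set[s->set_size++] = i;
      }
   }

   for (int c = 0; c < blk->num_children; c++)
      cse_block(s, blk->children[c]);

   /* Instructions from this subtree do not dominate its siblings. */
   s->set_size = saved_size;
}
```

With a root block and two children, an instruction in the first child is not reused by the second child, because the set is restored before the sibling is visited. An instruction in the root is visible to both.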
Connor Abbott
523a28d3fe nir: add an instruction set API
This will replace direct usage of nir_instrs_equal() in the CSE pass,
which replaces an O(n^2) algorithm with an effectively O(n) one. It'll
also be useful for implementing GVN on top of GCM.

v2:
- Add texture support.
- Add more comments.
- Rename instr_can_hash() to instr_can_rewrite() since it's really more
about whether its uses can be rewritten, and it's implicitly used by
nir_instrs_equal() as well.
- Rename nir_instr_set_add() to nir_instr_set_add_or_rewrite() (Jason).
- Make the HASH() macro less magical (Topi).
- Rewrite the commit message.

v3:
- For sorting phi sources, use a VLA, store pointers to the sources, and
compare the predecessor pointer directly (Jason).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:35 -04:00
Connor Abbott
005c2efb7b nir: constify instruction comparison functions
v2: rebase, don't constify nir_srcs_equal() as it's pass-by-value
anyways

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:28 -04:00
Connor Abbott
d6bc35934f nir: constify nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:14:20 -04:00
Connor Abbott
20d6d812dc nir: split out instruction comparison functions
Right now nir_instrs_equal() is tied pretty tightly to CSE, but we're
going to introduce the idea of an instruction set and tie it to that
instead.  In anticipation of that, move this into its own file where
we'll add the rest of the instruction set implementation later.

v2: Rebase on texture support.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-09 10:13:27 -04:00
Neil Roberts
da361acd1c i965/fs: Handle non-const sample number in interpolateAtSample
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.

The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.

This fixes the following two Piglit tests:

arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform

v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
    Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
    number.
v5: Don't special case immediate values in
    brw_pixel_interpolator_query. Make a better wrapper for the
    function to set up the PI send instruction. Ensure that the SHL
    instructions are scalar. (Francisco Jerez).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-09 15:13:40 +02:00
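The loop over live sample numbers described in da361acd1c can be modelled on the CPU as below. This is only a sketch under stated assumptions: NUM_CHANNELS, query_per_sample() and the out[] bookkeeping are hypothetical, and picking the first live channel's value merely plays the role the commit assigns to emit_uniformize. No hardware message is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 8   /* hypothetical SIMD width */

/* Handles every live channel, one distinct sample number per iteration.
 * out[ch] receives the iteration in which channel ch was handled, or -1
 * for channels that were not live.  Returns the number of iterations. */
static int
query_per_sample(const int sample_id[NUM_CHANNELS], uint32_t live_mask,
                 int out[NUM_CHANNELS])
{
   int iterations = 0;

   live_mask &= (1u << NUM_CHANNELS) - 1;
   for (int ch = 0; ch < NUM_CHANNELS; ch++)
      out[ch] = -1;

   while (live_mask != 0) {
      /* Take the sample number of the first channel that is still live. */
      int first = 0;
      while (!(live_mask & (1u << first)))
         first++;
      int current = sample_id[first];

      /* One "query" covers all live channels sharing that number. */
      for (int ch = 0; ch < NUM_CHANNELS; ch++) {
         if ((live_mask & (1u << ch)) && sample_id[ch] == current) {
            out[ch] = iterations;
            live_mask &= ~(1u << ch);
         }
      }
      iterations++;
   }
   return iterations;
}
```

The iteration count equals the number of distinct sample numbers among the live channels, so a dynamically uniform input finishes in a single pass.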
Neil Roberts
728d7bc85f i965: Add a second successor to BRW_OPCODE_WHILE
It is possible to directly predicate the WHILE instruction. In this
case there will be a second successor block because the execution can
resume from the instruction after the loop. This will be used in a
subsequent patch.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-09 15:13:22 +02:00
Neil Roberts
886d46b089 nir: Add a function to determine if a source is dynamically uniform
Adds nir_src_is_dynamically_uniform which returns true if the source
is known to be dynamically uniform. This will be used in a later patch
to add a workaround for cases that only work with dynamically uniform
sources. Note that the function is not definitive, it can return false
negatives (but not false positives). Currently it only detects
constants and uniform accesses. It could easily be extended to include
more cases.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-09 15:10:40 +02:00
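The conservative behaviour described in 886d46b089 (false negatives allowed, false positives not) can be illustrated with a small sketch. The enum src_kind and the function below are hypothetical simplifications and do not reflect NIR's data structures. The only point is that any case not explicitly recognised falls through to false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of where a source value comes from. */
enum src_kind {
   SRC_CONSTANT,       /* literal constant */
   SRC_UNIFORM_LOAD,   /* read of a uniform */
   SRC_INPUT_LOAD,     /* per-invocation input */
   SRC_ALU_RESULT      /* result of arbitrary arithmetic */
};

/* Returns true only when the value is known to be identical across
 * invocations.  Unrecognised cases return false, so the answer may be a
 * false negative but never a false positive. */
static bool
src_is_dynamically_uniform(enum src_kind kind)
{
   switch (kind) {
   case SRC_CONSTANT:
   case SRC_UNIFORM_LOAD:
      return true;
   default:
      return false;
   }
}
```

An ALU result computed purely from uniforms would still be reported as non-uniform here, which is the kind of false negative the commit message mentions.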
Samuel Pitoiset
7129cbf5f4 nvc0: move HW SM queries to nvc0_query_hw_sm.c/h files
Global performance counters (PCOUNTER) will be added to
nvc0_query_hw_pm.c/h files.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
224fec05ea nvc0: move HW queries to nvc0_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
77b6990d14 nvc0: move SW queries to nvc0_query_sw.c/h files
Loosely based on freedreno driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
0678530b9e nvc0: move nvc0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Pitoiset
0644196ab1 nvc0: add a header file for nvc0_query
This will allow splitting SW and HW queries in an upcoming patch.

While we are at it, make use of nvc0_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-09 14:09:57 +02:00
Samuel Iglesias Gonsalvez
3da58730ee main: fix length of values written to glGetProgramResourceiv() for ACTIVE_VARIABLES
Return the number of values written.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
Samuel Iglesias Gonsalvez
d0992fa15a main: buffer array variables can have array size of 0 if they are unsized
From ARB_program_query_interface:

  For the property ARRAY_SIZE, a single integer identifying the number of
  active array elements of an active variable is written to <params>. The
  array size returned is in units of the type associated with the property
  TYPE. For active variables not corresponding to an array of basic types,
  the value one is written to <params>. If the variable is a shader
  storage block member in an array with no declared size, the value zero
  is written to <params>.

v2:
- Unsized arrays of arrays have an array size different than zero

v3:
- Arrays and unsized arrays will have an array_stride > 0. Use it
  instead of is_unsized_array flag (Timothy).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
Samuel Iglesias Gonsalvez
66ca8e6632 main: consider that unsized arrays have at least one active element
From ARB_shader_storage_buffer_object:

"When using the ARB_program_interface_query extension to enumerate the
 set of active buffer variables, only the first element of arrays (sized
 or unsized) will be enumerated"

_mesa_program_resource_array_size() is used when getting the name (and
name length) of the active variables. When it is an unsized array,
we want to indicate it has one active element so the returned name
would have "[0]" at the end.

v2:
- Use array_stride > 0 and array_elements == 0 to detect unsized
  arrays. Because of that, we don't need is_unsized_array flag
  (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-09 08:13:55 +02:00
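The detection rule from the v2 note of 66ca8e6632 (array_stride > 0 and array_elements == 0 marks an unsized array) can be sketched as below. The struct buffer_var and both helper names are hypothetical. Only the two field names and the "one active element" rule come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced description of a buffer variable. */
struct buffer_var {
   unsigned array_elements;   /* 0 for non-arrays and for unsized arrays */
   unsigned array_stride;     /* > 0 for arrays, sized or unsized */
};

static bool
is_unsized_array(const struct buffer_var *v)
{
   return v->array_stride > 0 && v->array_elements == 0;
}

/* Element count used when building the resource name: an unsized array
 * counts as having one active element, so its name ends in "[0]". */
static unsigned
active_array_size(const struct buffer_var *v)
{
   return is_unsized_array(v) ? 1 : v->array_elements;
}
```

A plain scalar (stride 0, elements 0) is not treated as an array in this sketch and keeps a size of 0.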
Samuel Iglesias Gonsalvez
77c0b64ce3 main: fix TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE
When the active variable is an array which is already a top-level
shader storage block member, don't return its array size and stride
when querying TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE
respectively.

Fixes the following 12 dEQP-GLES31 tests:

dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.column_major_mat3x4

v2:
- Fix check when the shader storage block is instanced
- Write auxiliary function to do the check.

v3:
- Check if full_instanced_name is NULL just after allocation (Ilia)
- Remove () from one strcmp() in the if statement (Ilia)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-09 08:13:49 +02:00
Samuel Iglesias Gonsalvez
5be9bf2746 main: fix goto in program_resource_top_level_array_stride
Use found_top_level_array_stride instead of found_top_level_array_size.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-09 08:12:10 +02:00
Tapani Pälli
d8d0e4a81e mesa: add GL_UNSIGNED_INT_24_8 to _mesa_pack_depth_span
Patch adds missing type (used with NV_read_depth) so that it gets
handled correctly. This fixes errors seen with the following CTS test:

   ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-09 09:11:14 +03:00
Brian Paul
7d7dd18711 mesa,meta: move gl_texture_object::TargetIndex initializations
Before, we were unconditionally assigning the TargetIndex field in
_mesa_BindTexture(), even if it was already set properly.  Now we
initialize TargetIndex wherever we initialize the Target field, in
_mesa_initialize_texture_object(), finish_texture_init(), etc.

v2: also update the meta_copy_image code.  In make_view() the
view_tex_obj->Target field was set, but not the TargetIndex field.
Also, remove a second, redundant assignment to view_tex_obj->Target.
Add sanity check assertions too.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Brian Paul
d61f492aba mesa: remove unused _mesa_create_nameless_texture()
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Brian Paul
b373c77693 mesa: remove unneeded error check in create_textures()
Callers of create_texture() will either pass target=0 or a validated
GL texture target enum so no need to do another error check inside
the loop.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-10-08 13:53:33 -06:00
Kristian Høgsberg Kristensen
c71f0d45e6 i965: Link compiler unit tests to libi965_compiler.la
We can now link the unit tests against just libi965_compiler.la. This
lets us drop a lot of DRI driver dependencies, but we still pull in all
of libmesa and more.

This also provides a few standalone users of libi965_compiler.la, which
will help us avoid accidentally using i965_dri.so functions from the compiler.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
08d890d3bb i965: Break out backend compiler to its own library
This introduces a new libtool helper library, libi965_compiler.la.  This
library is moderately self-contained, but still needs to link to all of
libmesa.la among other things.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
9a2573e5fc i965/cs: Get max_cs_threads from brw_compiler devinfo
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
ee0f0108c8 i965: Move brw_get_shader_time_index() call out of emit functions
brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks to Jason's recent
refactoring, we can just get the index and pass to the emit functions
instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
ffc841cae5 i965: Move brw_select_clip_planes() to brw_shader.cpp
We call this from the compiler so move it to brw_shader.cpp.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
365e5d7892 i965: Use util_next_power_of_two() for brw_get_scratch_size()
This function computes the next power of two, but at least 1024. We can
do that by bitwise or'ing in 1023 and calling util_next_power_of_two().

We use brw_get_scratch_size() from the compiler so we need it out of
brw_program.c. We could move it to brw_shader.cpp, but let's make it a
small inline function instead.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
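The trick described in 365e5d7892 (OR in 1023, then round up to a power of two) can be sketched as below. next_power_of_two() is a local stand-in written for this example and is only assumed to behave like util_next_power_of_two(). scratch_size() is likewise a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: smallest power of two >= x, for 1 <= x <= 2^31. */
static uint32_t
next_power_of_two(uint32_t x)
{
   assert(x >= 1 && x <= 0x80000000u);
   uint32_t p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* OR-ing in 1023 sets the low ten bits, so the value passed on is never
 * below 1023 and the result is never below 1024. */
static uint32_t
scratch_size(uint32_t size)
{
   return next_power_of_two(size | 1023);
}
```

For example, scratch_size(1000) is 1024 and scratch_size(3000) is 4096. In this sketch an input that is already a power of two of at least 1024 rounds up to the next one (1024 gives 2048), because the OR turns it into 2047 first.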
Kristian Høgsberg Kristensen
cc4683992b i965: Move brw_mark_surface_used() to brw_shader.cpp
brw_program.c won't be part of the compiler library, but we need
brw_mark_surface_used() in the compiler. Move to brw_shader.cpp.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:03 -07:00
Kristian Høgsberg Kristensen
469d0e449b i965/cs: Split out helper for building local id payload
The initial motivation for this patch was to avoid calling
brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the
compiler. This commit ends up refactoring things a bit more so as to
split out the logic to build the local id payload to brw_fs.cpp. This
moves the payload building closer to the compiler code that uses the
payload layout and makes it available to other users of the compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:15:02 -07:00
Kristian Høgsberg Kristensen
4f33700f5a i965: Move brw_link_shader() and friends to new file brw_link.cpp
We want to use the rest of brw_shader.cpp with the rest of the compiler
without pulling in the GLSL linking code.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:14:44 -07:00
Kristian Høgsberg Kristensen
99ca2256c1 i965: Configure bufmgr debug options from intel_screen.c
We need the debug flag parsing and INTEL_DEBUG in the compiler, but we
don't want the dependency on bufmgr (libdrm_intel) in there. Move to
intel_screen.c.

There are now only two lines left in brw_process_intel_debug_variable(),
but we keep it in intel_debug.h to avoid having to expose
'debug_control' as a global variable.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Kristian Høgsberg Kristensen
04158fb0f6 util: Move DRI parse_debug_string() to util
We want to use intel_debug.c in code that doesn't link to dri common.

v2: Remove unnecessary stddef.h include (Topi), use util/debug.h
    in all DRI driver and remove driParseDebugString() (Iago).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Kristian Høgsberg Kristensen
ba71d581ae i965: Move brw_dump_ir() out of brw_*_emit() functions
We move these calls one level up into the codegen functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-08 12:13:31 -07:00
Emil Velikov
1fda56cdb2 gallium/ddebug: add missing dd_util.h to sources list
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-08 18:13:24 +01:00
Emil Velikov
62741ff052 gallium/ddebug: automake: sort sources alphabetically
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-08 18:13:24 +01:00
Jason Ekstrand
9c528f5dfa nir/sweep: Reparent the shader name
Previously the name of the nir shader was being freed prematurely during
nir_sweep. Since 756613ed35 the name was later being used to generate
filenames for the optimiser debug output and these would end up with
garbage from the dangling pointer.

Co-authored-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-08 08:20:31 -07:00
Jan Vesely
c8031a879a c11/threads: initialize timeout structure
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-08 14:05:57 +01:00
Boyan Ding
89ae41ab4c docs/relnotes: document EGL_KHR_create_context on llvmpipe and softpipe
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-10-08 14:05:36 +01:00
Iago Toral Quiroga
1efbb8151b i965/gs/gen6: Maximum allowed size of SEND messages is 15 (4 bits)
Commit d48ac93066 addressed this for VS, but we forgot to do the same for
URB writes generated by the gen6 GS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
3141906fa3 i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place
That should make tracking where we do spills and pull loads a bit easier.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
36e82b137d i965: make pull constant loads in gen6 start at MRFs 16/17
So they do not conflict with our (un)spills (MRF 21..23) or our
URB writes (MRF 1..15)
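
A small sketch of the gen6 MRF layout described above, useful for checking that the three ranges stay disjoint. The register numbers come from this message; the enum names and helpers are made up for illustration and are not the driver's definitions.

```c
#include <stdbool.h>

/* Gen6 MRF ranges as stated in the commit message (names are hypothetical). */
enum {
    URB_WRITE_FIRST_MRF = 1,  URB_WRITE_LAST_MRF = 15,
    PULL_LOAD_FIRST_MRF = 16, PULL_LOAD_LAST_MRF = 17,
    SPILL_FIRST_MRF     = 21, SPILL_LAST_MRF     = 23,
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(int a_first, int a_last, int b_first, int b_last)
{
    return a_first <= b_last && b_first <= a_last;
}

static bool gen6_mrf_layout_is_disjoint(void)
{
    return !ranges_overlap(URB_WRITE_FIRST_MRF, URB_WRITE_LAST_MRF,
                           PULL_LOAD_FIRST_MRF, PULL_LOAD_LAST_MRF) &&
           !ranges_overlap(PULL_LOAD_FIRST_MRF, PULL_LOAD_LAST_MRF,
                           SPILL_FIRST_MRF, SPILL_LAST_MRF) &&
           !ranges_overlap(URB_WRITE_FIRST_MRF, URB_WRITE_LAST_MRF,
                           SPILL_FIRST_MRF, SPILL_LAST_MRF);
}
```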

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Iago Toral Quiroga
0c2add7751 i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-08 11:28:16 +02:00
Tapani Pälli
aee28a0aa3 mesa: include bad type in error string of _mesa_pack_depth_span
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-08 09:25:16 +03:00
Tapani Pälli
4e7fd66cf0 glsl: add varyings to resource list only with SSO
Varyings can be considered inputs or outputs of a program only when
SSO is in use. With multi-stage programs, inputs contain only inputs
for the first stage and outputs contain only outputs of the final shader stage.

I've tested that the fix works for Assault Android Cactus (demo version)
and does not cause Piglit or CTS regressions in glGetProgramiv tests.

Following ES 3.1 CTS separate shader tests that do query properties
of varyings in SSO shader programs pass:

   ES31-CTS.program_interface_query.separate-programs-vertex
   ES31-CTS.program_interface_query.separate-programs-fragment

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92122
2015-10-08 07:43:11 +03:00
Jason Ekstrand
6ad9ebb073 mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks
The EXT_texture_format_BGRA8888 extension (which mesa supports
unconditionally) adds a new format and internal format called GL_BGRA_EXT.
Previously, this was not really handled at all in
_mesa_es3_error_check_format_and_type.  When the checks were tightened in
commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT
would always cause an error to be thrown.

There were two primary issues here.  First, is that
_mesa_es3_effective_internal_format_for_format_and_type didn't handle the
GL_BGRA_EXT format.  Second is that it blindly uses _mesa_base_tex_format
which returns GL_RGBA for GL_BGRA_EXT.  This commit fixes both of these
issues as well as adds explicit checks that GL_BGRA_EXT is only ever used
with GL_BGRA_EXT and GL_UNSIGNED_BYTE.
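
The explicit check described above might look roughly like this. It is a sketch under assumptions, not the code from the patch: bgra_combination_ok() is a hypothetical helper, and only the enum values are taken from the OpenGL headers.

```c
#include <stdbool.h>

/* Enum values as defined in the OpenGL / OpenGL ES headers. */
#define GL_UNSIGNED_BYTE 0x1401
#define GL_RGBA          0x1908
#define GL_BGRA_EXT      0x80E1

/* Hypothetical helper: if either the format or the internal format is
 * GL_BGRA_EXT, then both must be GL_BGRA_EXT and the type must be
 * GL_UNSIGNED_BYTE. Other combinations are left to other checks. */
static bool bgra_combination_ok(unsigned format, unsigned type,
                                unsigned internal_format)
{
    if (format == GL_BGRA_EXT || internal_format == GL_BGRA_EXT) {
        return format == GL_BGRA_EXT &&
               internal_format == GL_BGRA_EXT &&
               type == GL_UNSIGNED_BYTE;
    }
    return true;
}
```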

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-07 20:32:53 -07:00
Emil Velikov
bbf728f11b Revert "mesa: enable KHR_debug for ES contexts"
This reverts commit b69cfbdf18.

This isn't quite baked yet. Seems that despite building the ES piglits,
none of them got executed.
2015-10-07 21:49:50 +01:00
Matt Turner
164c8277f0 egl/dri2: Properly dereference array.
Fixes a regression that broke EGL since

commit 858f2f2ae6
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date:   Sun Sep 13 12:25:27 2015 +0100

    egl/dri2: ease srgb __DRIconfig conditionals
2015-10-07 11:48:49 -07:00
Marek Olšák
13e69805ea radeonsi: fix a GS hang on VI
Broken by one of the cleanups: 0d46c3bc9d
Not applicable to stable.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Marek Olšák
5749676d03 radeonsi: remove TC L2 cache flush for index buffers on VI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 19:18:50 +02:00
Brian Paul
6ed8fd3d67 svga: whitespace fixes in svga_sampler_view.c 2015-10-07 08:45:56 -06:00
Brian Paul
70c4cde453 svga: whitespace fixes in svga_resource_buffer.c 2015-10-07 08:45:56 -06:00
Stefan Dösinger
a2bc4a7b04 mesa: Remove GL_ARB_sampler_object depth compare error checking.
Version 3: Simplify the code comment, word wrap commit description.

Version 2: Return GL_FALSE if ARB_shadow is unsupported instead of
pretending to store the value as suggested by Brian Paul.

This fixes a GL error warning on r200 in Wine.

The GL_ARB_sampler_objects extension does not specify a dependency on
GL_ARB_shadow or GL_ARB_depth_texture for setting the depth texture
compare mode and function. Silently ignore attempts to change these
settings. They won't matter without a depth texture being assigned
anyway.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-07 08:45:56 -06:00
Brian Paul
2bad030ac9 svga: round UBO constant buffer size up/down to multiple of 16 bytes
The svga3d device requires constant buffers to be a multiple of 16 bytes
in size.  OpenGL UBOs may not fit that restriction.  As a work-around,
round the size up if possible, else round down.
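
A minimal sketch of the rounding rule, assuming that "if possible" means the padded range still fits inside the underlying buffer. The function name and parameters are hypothetical and not taken from the svga driver.

```c
#include <stdint.h>

/* Round a constant buffer size to a multiple of 16 bytes: up when the
 * padded range still fits in the buffer (assumption), otherwise down. */
static uint32_t align_cbuf_size_sketch(uint32_t offset, uint32_t size,
                                       uint32_t buffer_size)
{
    uint32_t up = (size + 15u) & ~15u;
    uint32_t down = size & ~15u;

    /* 64-bit sum avoids wrap-around for large offsets. */
    return ((uint64_t)offset + up <= buffer_size) ? up : down;
}
```

For example, a 20-byte range at offset 0 becomes 32 bytes in a 64-byte buffer, but only 16 bytes if the buffer itself is just 20 bytes long.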

Note that this patch only affects UBO constant buffers (index 1 or higher),
not the 0th/default constant buffer.

Fixes the game Grim Fandango Remastered.  VMware bug 1510130.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-07 08:45:56 -06:00
Emil Velikov
4ea5ed9f51 egl/dri2: enable EGL_KHR_gl_colorspace for swrast
No driver changes needed for softpipe/llvmpipe - things just work.

v2: Whitespace fixes.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 15:18:03 +01:00
Emil Velikov
858f2f2ae6 egl/dri2: ease srgb __DRIconfig conditionals
One can simplify the if-else chain, by declaring the driconfigs as a
two-element array, whilst using srgb as an index to the correct entry.
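
The idea can be sketched like this. The structure and field names are invented for illustration; only the pattern (a two-element array indexed by the srgb flag instead of an if-else chain) comes from this message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-display data: one entry per
 * colorspace, index 0 for linear and index 1 for sRGB. */
struct display_sketch {
    int config_id[2];
};

/* A bool converts to exactly 0 or 1, so it can index the array
 * directly and no branch is needed. */
static int pick_config(const struct display_sketch *dpy, bool srgb)
{
    return dpy->config_id[srgb];
}
```

Note that the array has to be indexed (or otherwise dereferenced) at every use site; passing the array itself where a single entry is expected is an easy mistake to make with this pattern.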

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-07 15:17:57 +01:00
Emil Velikov
b69cfbdf18 mesa: enable KHR_debug for ES contexts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:08:50 +01:00
Matthew Waters
70643a1389 main/get: make KHR_debug enums available everywhere
Move all the enums but CONTEXT_FLAGS. The spec seems quite explicit
about the latter (wrt OpenGL ES)

    "In OpenGL ES versions prior to and including ES 3.1 there is no
    CONTEXT_FLAGS state and therefore the CONTEXT_FLAG_DEBUG_BIT cannot
    be queried."

v2 [Emil Velikov] Rebase.
v3 [Emil Velikov] Drop the CONTEXT_FLAGS hunk - not applicable for GLES

Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:07:01 +01:00
Matthew Waters
ae6ff72f5a glapi: add function pointers for KHR_debug for gles
v2 [Emil Velikov]
 - Rebase.
 - Correct version in gles11 dispatch_sanity.
 - Move the extension enable to a separate patch.

Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:07:01 +01:00
Varad Gautam
deb1765ec6 egl: move memcpy to bring conf->base operations together
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:05:28 +01:00
Varad Gautam
f988eff379 egl: restore surface type before linking config to its display
commit c2c2e9a (egl: implement EGL_KHR_gl_colorspace (v2)) leaves
_EGLConfig->SurfaceType set incorrectly before calling _eglLinkConfig(),
and the bad value is passed around to platform_android. Set it to zero
as earlier.

v2: Set SurfaceType to 0, rather than surface_type (Suggested by Emil)

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91596
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-07 15:05:20 +01:00
Ilia Mirkin
47d11990b2 nouveau: make sure there's always room to emit a fence
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.
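
A rough sketch of the reservation scheme, with made-up types and an illustrative reserve size (the real value and the nouveau pushbuf API are not shown in this message):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only, not taken from the driver. */
#define FENCE_RESERVE_DWORDS 8

/* Hypothetical command buffer bookkeeping. */
struct pushbuf_sketch {
    uint32_t used;
    uint32_t capacity;
};

/* Ordinary commands must always leave room for a trailing fence. */
static bool pushbuf_has_space(const struct pushbuf_sketch *pb, uint32_t dwords)
{
    return pb->used + dwords + FENCE_RESERVE_DWORDS <= pb->capacity;
}

/* Fence emission bypasses the reserve, since the reserve exists for it. */
static bool pushbuf_has_space_for_fence(const struct pushbuf_sketch *pb,
                                        uint32_t dwords)
{
    return pb->used + dwords <= pb->capacity;
}
```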

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-07 04:30:05 -04:00
Boyan Ding
64d9d4b730 vc4: use nir two-sided-color lowering
Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering).
No piglit regression.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-06 16:34:07 -07:00
Eric Anholt
b6cd39fc47 vc4: Fix a leak of the last color read/write surface on context destroy. 2015-10-06 16:32:03 -07:00
Eric Anholt
922e0680f9 vc4: Fix a memory leak in the simulator case.
We validate per draw call, and need to free the shader per draw call, too.
2015-10-06 16:29:14 -07:00
Mark Janes
3861010213 mesa: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
3475b68abd radeon/r200: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
eb6b80842f i965: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Mark Janes
83f9f911b2 i915: remove unneeded #include of colormac.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 12:36:32 -07:00
Ville Syrjälä
3bcc780126 i915: Drop broken front_buffer_reading/drawing optimization
Bring the following commit over to i915:
 commit ec542d7457
 Author: Eric Anholt <eric@anholt.net>
 Date:   Mon Mar 3 10:43:10 2014 -0800

    i965: Drop broken front_buffer_reading/drawing optimization.

Not sure if it might fix anything, but since the i965 and i915 used to
share a bunch of that code, it would seem reasonable the same problems
could be present in the i915 code still, and the i965 approach is well
tested by now so bringing it over seems fairly safe.

No piglit regressions on 855.

v2: Rebase on _mesa_is_front_buffer_* refactor.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
ea8b77e892 mesa/i965: Refactor brw_is_front_buffer_{drawing,reading} to common code
There are multiple similar implementations of these functions, and a
later patch was going to add another.

v2: Move removing intel_framebuffer to a different patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
5c4ef9f1d2 st/mesa: Don't override NewFramebuffer just to call _mesa_new_framebuffer
v2: Since state_tracker does not call _mesa_init_driver_functions, we
need to initialize the dd::NewFramebuffer pointer to
_mesa_new_framebuffer here.  Suggested by Brian.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:36:37 -07:00
Ian Romanick
df75babf74 radeon: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-06 11:36:32 -07:00
Ian Romanick
e32a6590a4 i915: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:28:00 -07:00
Ian Romanick
ed7f00f564 i965: Don't override NewFramebuffer just to call _mesa_new_framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-06 11:27:45 -07:00
Ville Syrjälä
021f15816e i830: Fix culling with user fbos on gen2
Flip the cull bits when rendering to a user fbo on gen2. This
was already done on gen3 (since before git history starts)
but was missing from the gen2 code.

Fixes rendering of the driver+kart model in supertuxkart kart
selection screen.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
3e2c7ca773 i915: Adjust line size limits
The hardware can draw lines 0.5 to 7.5 pixels wide. Adjust the limits
to 1.0-7.0. The old limits seem to be from the era when i915 and i965
were sharing this code.

Not really sure if 1.0-7.0 is correct. Maybe it could be 0.5-7.5 as
those are the hw limits, or maybe some combination of the two?

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
00ee403883 i915: Enable intel_render path for points
The sub-pixel adjustment for points was killed off in
 commit 60d762aa62
 Author: Xiang, Haihao <haihao.xiang@intel.com>
 Date:   Wed Jan 2 11:38:51 2008 +0800

    i915: Needn't adjust pixel centers. fix #12944

so if we don't need it in intel_tris.c we don't need it in
intel_render.c either, which means we can allow intel_render.c to render
points.

No apparent regressions on PNV in ES1 or ES2 conformance.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
0febd0ecfd i915: Use COPY_DWORDS for points
The sub-pixel adjustment for points was killed off in
 commit 60d762aa62
 Author: Xiang, Haihao <haihao.xiang@intel.com>
 Date:   Wed Jan 2 11:38:51 2008 +0800

    i915: Needn't adjust pixel centers. fix #12944

so we can just as well use COPY_DWORDS().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
bcf650496f i915: Use _tnl_RenderClippedPolygon and _tnl_RenderClippedLine
_tnl_RenderClippedPolygon and _tnl_RenderClippedLine already do most of
what we want so use them.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
303895655c i915: Handle provoking vertex in intelFastRenderClippedPoly()
intelFastRenderClippedPoly() renders the polygon using triangles. For
polygons the provoking vertex is always the first one, and currently
this function assumes that the provoking vertex for triangles is the
last one. In case the user changed the provoking vertex convention,
the hardware may be configured to treat the first vertex of triangles
as the provoking vertex. So check the convention and emit the triangles
in the appropriate order to avoid having to change the hardware
provoking vertex convention for rendering polygons.
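
The ordering trick can be sketched as below. This is an assumption-laden illustration, not the i915 code: the types are invented, and the exact vertex order the driver emits may differ.

```c
#include <stdbool.h>

struct tri_sketch {
    int v[3];
};

/* Split a polygon with vertex indices 0..n-1 into n-2 triangles.
 * Vertex 0 is the polygon's provoking vertex; it is placed first or
 * last in every triangle to match the hardware's triangle convention.
 * Both orders are rotations of (0, i, i+1), so winding is unchanged.
 * Returns the number of triangles written to 'out'. */
static int polygon_to_tris(int n, bool first_vertex_convention,
                           struct tri_sketch *out)
{
    int count = 0;

    for (int i = 1; i + 1 < n; i++) {
        if (first_vertex_convention)
            out[count] = (struct tri_sketch){ { 0, i, i + 1 } };
        else
            out[count] = (struct tri_sketch){ { i, i + 1, 0 } };
        count++;
    }
    return count;
}
```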

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
0886426503 t_dd_dmatmp: Check provoking vertex convention when rendering quads
When drawing quads using triangles we need to be careful to make
the provoking vertices match when flat shading.

v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
83d511e190 t_dd_dmatmp: Disallow flat shading when rendering quad strips via tri strips
When rendering quad strips via tri strips we can't get the provoking
vertex right, so disallow flat shading.

v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ville Syrjälä
b15b4581d1 t_dd_dmatmp: Allow flat shaded polygons with tri fans
We can allow rendering flat shaded polygons using tri fans if we check
the provoking vertex convention.

v2 (idr): Remove _EXT suffixes from GL_FIRST_VERTEX_CONVENTION.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-06 11:16:19 -07:00
Ian Romanick
5ca00e0b8d t_dd_dmatmp: Replace fprintf with unreachable
From http://lists.freedesktop.org/archives/mesa-dev/2015-May/084883.html:

    "There are no real error cases here, just dead code.
    validate_render() is supposed to make sure we never call these
    functions if the code can't actually render the primitives. The
    fprintf()+return branches should really just contain assert(0) or
    equivalent."

I also rearranged the if-else-block in render_quad_strip_verts to look
more like the other functions.  A future patch is going to change a
bunch of that code anyway.
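
For readers unfamiliar with the pattern, here is a sketch of what replacing an fprintf()+return branch with an "unreachable" marker can look like. UNREACHABLE_SKETCH is a hypothetical macro written for this example; it is not claimed to match Mesa's own definition.

```c
#include <assert.h>

/* Hypothetical macro: fail loudly in debug builds and, on GCC/Clang,
 * tell the optimizer the path cannot be taken in release builds. */
#if defined(__GNUC__)
#define UNREACHABLE_SKETCH(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define UNREACHABLE_SKETCH(msg) assert(!msg)
#endif

enum prim_sketch { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

/* The caller is assumed to have validated 'prim', so falling out of
 * the switch is dead code rather than a runtime error case. */
static int verts_per_prim(enum prim_sketch prim)
{
    switch (prim) {
    case PRIM_POINTS:    return 1;
    case PRIM_LINES:     return 2;
    case PRIM_TRIANGLES: return 3;
    }
    UNREACHABLE_SKETCH("caller validated the primitive type");
}
```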

v2: Make "unreachable" message more descriptive.  Suggested by Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-06 10:44:00 -07:00
Ian Romanick
46b13666d8 radeon: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.
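
As an illustration of the readability gain, a table using C99 designated initializers might look like this. The enum names and values are invented for the sketch and do not correspond to any driver's hardware codes.

```c
/* Invented API-side primitive enum. */
enum api_prim_sketch {
    SK_POINTS,
    SK_LINES,
    SK_LINE_STRIP,
    SK_TRIANGLES,
    SK_PRIM_COUNT
};

/* Invented hardware primitive codes. */
enum hw_prim_sketch {
    HW_POINTLIST = 0x10,
    HW_LINELIST  = 0x20,
    HW_LINESTRIP = 0x21,
    HW_TRILIST   = 0x30
};

/* Each entry names the index it belongs to, so the mapping no longer
 * depends on the order of the lines in the initializer. */
static const int hw_prim[SK_PRIM_COUNT] = {
    [SK_POINTS]     = HW_POINTLIST,
    [SK_LINES]      = HW_LINELIST,
    [SK_LINE_STRIP] = HW_LINESTRIP,
    [SK_TRIANGLES]  = HW_TRILIST,
};
```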

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Ian Romanick
68976a5a00 i965: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Ville Syrjälä
fad5fd3a25 i915: Use C99 initializers for primitive arrays
Using C99 initializers for the primitive arrays makes things more
readable.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-06 10:41:56 -07:00
Brian Paul
3801fa65c1 tgsi: add const qualifier to silence warning
Trivial.
2015-10-06 08:51:33 -06:00
Brian Paul
b7766a95e1 glsl: whitespace/formatting/typo fixes in link_uniforms.cpp 2015-10-06 08:51:33 -06:00
Samuel Iglesias Gonsalvez
50d5a36f35 main: array stride for unsized arrays of arrays are calculated like records
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-06 14:28:26 +02:00
Samuel Iglesias Gonsalvez
82db642042 glsl: add std430 layout support for AoA
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-06 14:02:13 +02:00
Timothy Arceri
6483183279 docs: Mark GL_ARB_enhanced_layouts as in progress 2015-10-06 14:04:23 +11:00
Ilia Mirkin
dbae576f7f i965: add EXT_polygon_offset_clamp support to gen4/gen5
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-05 14:39:38 -07:00
Matt Turner
833fa9a8cd meta: Update comment about unsupported texture types.
Ken added support for 2DArray (commit ec23d5197e) and 1DArray (commit
14ca61125) last year.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 14:35:13 -07:00
Matt Turner
d4ff638504 glx: Drop CRAY support.
It couldn't have worked anyway. There were calls to undefined functions.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-05 14:34:16 -07:00
Matt Turner
617eb5e6c3 glsl: Remove CSE pass.
With NIR, it actually hurts things.

total instructions in shared programs: 6529329 -> 6528888 (-0.01%)
instructions in affected programs:     14833 -> 14392 (-2.97%)
helped:                                299
HURT:                                  1

In all affected programs I inspected (including the single hurt one) the
pass CSE'd some multiplies and caused some reassociation (e.g., caused
(A * B) * C to be A * (B * C)) when the original intermediate result was
reused elsewhere.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 14:31:26 -07:00
Matt Turner
5a360dcad1 i965: Generalize predicated break pass for use in vec4 backend.
instructions in affected programs:     44204 -> 43762 (-1.00%)
helped:                                221

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
4098a756b5 i965/fs: Use backend_instruction in predicated break peephole.
We're not using any fs_inst fields, and the next commit will make the
peephole used by the vec4 backend.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
5964419921 i965/fs: Remove SNB embedded-comparison support from optimizations.
We never emit IF instructions with an embedded comparison (lost in the
switch to NIR), so this code is not used. If we want to readd support,
we should have a pass that merges a CMP instruction with an IF or a
WHILE instruction after other optimizations have run.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-05 13:42:58 -07:00
Matt Turner
36ea9922ad mesa: Add missing _mm_mfence() before streaming loads.
According to the Intel Software Development Manual (Volume 1: Basic
Architecture, 12.10.3 Streaming Load Hint Instruction):

   Streaming loads may be weakly ordered and may appear to software to
   execute out of order with respect to other memory operations.
   Software must explicitly use fences (e.g. MFENCE) if it needs to
   preserve order among streaming loads or between streaming loads and
   other memory operations.

That is, a memory fence is needed to preserve the order between the GPU
writing the buffer and the streaming loads reading it back.

Reported-by: Joseph Nuzman <joseph.nuzman@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-10-05 12:06:33 -07:00
Chad Versace
93161be9e7 i965: Fix intel_miptree_is_fast_clear_capable()
There are three types of fast clears:
  a. fast depth clears
  b. fast singlesample color clears
  c. fast multisample color clears
Function intel_miptree_is_fast_clear_capable() checks if a miptree
supports fast clears of type (b).

Rename the function to disambiguate what it does:
  old: intel_miptree_is_fast_clear_capable
  new: intel_miptree_supports_non_msrt_fast_clear

The function accidentally rejected multisampled color surfaces
because it thought they were singlesample array surfaces. Fix that by
explicitly rejecting surfaces with samples > 1.

This fix would have been needed before we enabled layered fast
singlesample color clears (introduced in gen8), which we want to do
eventually. For now, though, this patch changes no behavior; it just
fixes how the driver chooses its behavior.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 11:14:04 -07:00
Chad Versace
125a04b474 i965/mt: Declare some functions as static
intel_tiling_supports_non_msrt_mcs() and
intel_miptree_is_fast_clear_capable() are not used outside of
intel_mipmap_tree.c.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-05 11:10:11 -07:00
Iago Toral Quiroga
73e0dfbaca i965: Make vec4_visitor's destructor virtual
We need a virtual destructor when at least one of the class' methods is virtual.
Failure to do so might lead to undefined behavior when destructing derived classes.
Fixes the following warning:

brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
    delete gs;

Curro: This shouldn't be causing any actual bugs at the moment because
gen6_gs_visitor is the only subclass of vec4_visitor destroyed through
a pointer of a base class (vec4_gs_visitor *) and its destructor is
basically the same as its parent's. Anyway it seems sensible to change
this so it doesn't bite us in the future.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-05 13:50:15 +02:00
Tapani Pälli
a90feb581a glsl: set glsl error if binding qualifier used on global scope
Fixes following Piglit test:
	global-scope-binding-qualifier.frag

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-05 14:44:24 +03:00
Iago Toral Quiroga
102f6c446b i965: Assert on the number of combined UBO and SSBO binding table entries
In theory we can't break this assertion since the compiler frontend checks
that we don't exceed any of the individual limits, but it does not hurt to
be extra safe.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:19:34 +02:00
Iago Toral Quiroga
20cbe3688a i965: Reserve binding table space for SSBO surfaces
These share the space with UBO surfaces but we need to make sure we
allocate enough space for both sets (12 of each)
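
A tiny sketch of the accounting, assuming the limit of 12 per set stated above. The macro and function names are hypothetical; the driver's binding table layout is more involved than this.

```c
#include <assert.h>

/* Limits as stated in the message: 12 UBO and 12 SSBO surfaces. */
#define MAX_UBO_SKETCH  12
#define MAX_SSBO_SKETCH 12

/* UBO and SSBO surfaces share one block of binding table entries, so
 * the block must be sized for the sum of both limits. */
static unsigned combined_block_surfaces(unsigned num_ubos, unsigned num_ssbos)
{
    assert(num_ubos <= MAX_UBO_SKETCH);
    assert(num_ssbos <= MAX_SSBO_SKETCH);
    return num_ubos + num_ssbos;
}
```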

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Iago Toral Quiroga
41c4d45e08 i965: Define BRW_MAX_SSBO
Instead of using hard-coded values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Iago Toral Quiroga
440f9348c1 i965: Define BRW_MAX_UBO
Instead of using hard-coded values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-05 08:12:17 +02:00
Matt Turner
4caa10193f i965/vec4: Remove more dead visitor/vertex program code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-04 23:03:59 -07:00
Matt Turner
cd7fa1034a i965: Don't print line numbers with INTEL_DEBUG=optimizer.
The thing you want to do with the output files is diff them, which is
made more difficult by line numbers changing.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2015-10-04 23:03:59 -07:00
Ilia Mirkin
78ec9e28ec nv30: always go through translate module on big-endian
It seems like things are either coming in slightly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-04 21:50:41 -04:00
Ilia Mirkin
1fec05d114 nv30: pretend to have packed texture/surface formats
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-04 21:50:41 -04:00
Michel Dänzer
87c3c9acd2 st/dri: Use packed RGB formats
Fixes Gallium based DRI drivers failing to load on big endian hosts
because they can't find any matching fbconfigs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71789
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-04 21:50:31 -04:00
Timothy Arceri
763cd8c080 glsl: reduce memory footprint of uniform_storage struct
The uniform will only be of a single type so store the data for
opaque types in a single array.

Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-05 10:53:24 +11:00
Kenneth Graunke
b85757bc72 i965: Remove shader_prog from vec4_gs_visitor.
Unfortunately it has to stay in gen6_gs_visitor.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
21585048a2 i965: Use nir->has_transform_feedback_varyings to avoid shader_prog.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
7768b802e5 nir: Add a nir_shader_info::has_transform_feedback_varyings flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
5d7f8cb5a5 nir: Introduce new nir_intrinsic_load_per_vertex_input intrinsics.
Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number.  While GLSL makes
this look like a normal array, it can be very different behind the
scenes.

On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct.  This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.

NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset.  We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.

This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case.  They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.
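
To make the addressing model concrete, here is a sketch in plain C. It is only an analogy for the layout described above: the types, the slot count and load_per_vertex_input_sketch() are all invented, and NIR intrinsics are of course not C function calls.

```c
/* Hypothetical layout: every vertex owns one record holding all of
 * its inputs, and the records may live in unrelated memory. */
#define NUM_INPUT_SLOTS 4

struct vec4_sketch {
    float f[4];
};

struct vertex_record_sketch {
    struct vec4_sketch slot[NUM_INPUT_SLOTS];
};

/* The vertex index is kept separate (like src[0] of the intrinsic) and
 * selects a record; only the offset inside that record is computed the
 * way an ordinary input load would compute it. */
static const struct vec4_sketch *
load_per_vertex_input_sketch(const struct vertex_record_sketch *const *verts,
                             unsigned vertex, unsigned slot)
{
    return &verts[vertex]->slot[slot];
}
```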

v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-04 14:00:01 -07:00
Kenneth Graunke
f2a4b40cf1 nir/lower_io: Make get_io_offset() return a nir_ssa_def * for indirects.
get_io_offset() already walks the dereference chain and discovers
whether or not we have an indirect; we can just return that rather than
computing it a second time via deref_has_indirect().  This means moving
the call a bit earlier.

By returning a nir_ssa_def *, we can pass back both an existence flag
(via NULL checking the pointer) and the value in one parameter.  It
also simplifies the code somewhat.  nir_lower_samplers works in a
similar fashion.
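
The pointer-as-flag idea can be sketched as follows. This is not the real get_io_offset(); struct ssa_value, struct deref_step and walk_offset are hypothetical stand-ins, and a real pass would combine several indirects arithmetically rather than keeping only the last one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA value; not the real nir_ssa_def. */
struct ssa_value {
   int id;
};

/* Hypothetical dereference step: a constant index or an indirect one. */
struct deref_step {
   unsigned const_index;
   struct ssa_value *indirect;   /* NULL when the index is constant */
   unsigned elem_size;
};

/* Walk the chain once. The constant part is accumulated into
 * *const_offset; the return value is the indirect value, or NULL if the
 * whole chain was constant. The caller needs no separate "has indirect"
 * query. */
static struct ssa_value *
walk_offset(const struct deref_step *steps, size_t count,
            unsigned *const_offset)
{
   struct ssa_value *indirect = NULL;

   *const_offset = 0;
   for (size_t i = 0; i < count; i++) {
      if (steps[i].indirect)
         indirect = steps[i].indirect;   /* sketch: remember the last one */
      else
         *const_offset += steps[i].const_index * steps[i].elem_size;
   }
   return indirect;
}
```

A caller would test the returned pointer against NULL to choose between a direct and an indirect load.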

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-04 14:00:01 -07:00
Timothy Arceri
6994ca20aa glsl: fix whitespace
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-04 17:42:41 +11:00
Marek Olšák
814b7d1ab9 radeonsi: enable PIPE_CAP_FORCE_PERSAMPLE_INTERP
Now st/mesa won't generate 2 variants for this state.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
b3c55fc669 radeonsi: do force_persample_interp in shaders for non-trivial cases
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
9652bfcf2d radeonsi: implement the simple case of force_persample_interp
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
214de2d815 radeonsi: move SPI_PS_INPUT_ENA/ADDR registers to a separate state
This will be a derived state used for changing center->sample and
centroid->sample at runtime.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
55d406b71e tgsi/scan: add interpolation info into tgsi_shader_info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
6b0f21cb28 st/mesa: automatically set per-sample interpolation if using SampleID/Pos
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
4e9fc7e4e2 st/mesa: set force_persample_interp if ARB_sample_shading is used
This is only half of the work. The next patch will handle
gl_SampleID/SamplePos, which is the other half of ARB_sample_shading.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
f3b37e321f gallium: add per-sample interpolation control into rasterizer state
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
d8932a355d st/mesa: add ST_DEBUG=precompile support for tessellation shaders
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-03 22:06:09 +02:00
Marek Olšák
dd340b34f3 mesa: remove Driver.BindImageTexture
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
92709dcb9b mesa: remove Driver.DeleteSamplerObject
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
00f6beed02 mesa: remove Driver.EndCallList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
ef6c0714af mesa: remove Driver.BeginCallList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
f457964885 mesa: remove Driver.EndList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
55735cad00 mesa: remove Driver.NewList
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
7a54939728 mesa: remove Driver.NotifySaveBegin
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:09 +02:00
Marek Olšák
4b8bb2f559 mesa: remove Driver.SaveFlushVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
72a5dff9cb mesa: remove Driver.FlushVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
91799880b3 mesa: remove Driver.BeginVertices
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
82a950f187 mesa: remove Driver.BindArrayObject
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
d1269a844f mesa: remove Driver.DeleteArrayObject
Nothing reimplements it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
7401807e8d mesa: remove Driver.NewArrayObject
Nothing reimplements it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
1044f99812 mesa: remove Driver.Hint
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
8de82faf95 mesa: remove Driver.ColorMaskIndexed
Nothing sets it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
379255298f mesa: remove some Driver.Blend* hooks
Nothing sets them.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
a6cc895e93 mesa: remove Driver.Accum
Nothing calls it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
a4fca24484 mesa: remove Driver.ResizeBuffers
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
6863d5b02a mesa: remove Driver.DeleteShaderProgram
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
b37dcb8c18 mesa: remove Driver.NewShaderProgram
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
95e0303312 mesa: remove Driver.DeleteShader
Nothing overrides it.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
18123a732b egl/dri2: don't require a context for ClientWaitSync (v2)
The spec doesn't require it. This fixes a crash on Android.

v2: don't set any flags if ctx == NULL
v3: add the spec note

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
b78336085b st/dri: don't use _ctx in client_wait_sync
Not needed and it can be NULL.

v2: fix dri2_get_fence_from_cl_event - thanks Albert

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
27b102e7fd r600g: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
c23c92c965 radeonsi: only do depth-only or stencil-only in-place decompression
instead of always doing both.
Usually, only depth is needed, so stencil decompression is useless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
5804c6adf8 gallium/radeon: add separate stencil level dirty flags
We will only do depth-only or stencil-only decompress blits, whichever is
needed by textures, instead of always doing both.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
cc92b90375 radeonsi: dump buffer lists while debugging
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:08 +02:00
Marek Olšák
eb55610c89 winsys/radeon: implement cs_get_buffer_list
This is more complicated, because tracking priority_usage needed changing
the relocs_bo type.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
6f48e2bee1 winsys/amdgpu: add winsys function cs_get_buffer_list
For debugging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
93641f4341 gallium/radeon: stop using "reloc" in a few places
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
2edb060639 gallium/radeon: tell the winsys the exact resource binding types
Use the priority flags and expand them.
This information will be used for debugging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
9bd7928a35 radeonsi: add an option for debugging VM faults
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
4502d0bf88 radeonsi: move dumping the last IB into its own function
v2: indentation fix

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Marek Olšák
89f73827d0 ddebug: separate creation of debug files
This will be used by radeonsi for logging.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-03 22:06:07 +02:00
Emil Velikov
3cd5395206 docs: add news item and link release notes for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-03 13:23:13 +01:00
Emil Velikov
61c35ce4f9 docs: add sha256 checksums for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8957b696f9)
2015-10-03 13:20:08 +01:00
Emil Velikov
b2a987fc12 docs: add release notes for 10.6.9
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit ab9aacce2d)
2015-10-03 13:20:06 +01:00
Matthew Waters
11cabc45b7 egl: rework handling EGL_CONTEXT_FLAGS
As of version 15 of the EGL_KHR_create_context spec, debug contexts
are allowed for ES contexts.  We should allow creation instead of
erroring.

While we're here, provide more comprehensive checking for the other two
flags - ROBUST_ACCESS_BIT_KHR and FORWARD_COMPATIBLE_BIT_KHR

v2 [Emil Velikov] Rebase. Minor tweak in commit message.

Cc: Boyan Ding <boyan.j.ding@gmail.com>
Cc: Chad Versace <chad.versace@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91044
Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-03 12:30:13 +01:00
Jason Ekstrand
443d3bf340 i965/wm: Make compute_barycentric_interp_modes take a nir_shader and a devinfo
Now that everything comes in through NIR, we can pick this directly out of
the shader source and don't need to reference the gl_fragment_program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:20 -07:00
Jason Ekstrand
1e3c1b107e i965: Use nir_foreach_variable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:18 -07:00
Jason Ekstrand
050e4787d3 nir: Add a nir_foreach_variable macro
This is a common enough operation that it's nice to not have to think about
the arguments to foreach_list_typed every time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:16 -07:00
Jason Ekstrand
ca941799ce i965/nir: Remove the prog parameter from brw_nir_lower_inputs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 21:21:00 -07:00
Tom Stellard
a2e1e3d325 radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2
This fixes a race condition in the glx-multithreaded-shader-compile
test.

v2:
  - Replace gallivm_init_llvm_{begin,end}() with gallivm_init_llvm_targets().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:41:27 +00:00
Tom Stellard
76cfd6f1da gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2
Drivers and state trackers that use LLVM for generating code must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.

When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multithreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).

The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.

This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.
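
A minimal sketch of the once-only registration pattern, assuming a C11 toolchain that ships <threads.h>. The names targets_once, register_targets, register_count and init_targets are hypothetical and do not reproduce the gallivm code:

```c
#include <assert.h>
#include <threads.h>

static once_flag targets_once = ONCE_FLAG_INIT;
static int register_count = 0;

/* Stand-in for adding targets to a global, non-thread-safe registry. */
static void
register_targets(void)
{
   register_count++;
}

/* May be called from any thread, any number of times. call_once runs
 * register_targets exactly once, and every caller returns only after
 * that run has completed. */
static void
init_targets(void)
{
   call_once(&targets_once, register_targets);
}
```

If every component that touches the registry calls such an entry point before querying it, no thread can observe a partially populated registry.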

v2:
  - Use call_once and remove mutexes and static initializations.
  - Replace gallivm_init_llvm_{begin,end}() with
    gallivm_init_llvm_targets().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:41:26 +00:00
Tom Stellard
3219b48ae5 gallium/radeon: Use call_once() when initializing LLVM targets
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 23:19:01 +00:00
Jason Ekstrand
bf7b6fd3fd i965/shader: Get rid of the shader, prog, and shader_prog fields
Unfortunately, we can't get rid of them entirely.  The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE.  The GS vec4 backend still
needs gl_shader_program for handling transform feedback.  However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:54 -07:00
Jason Ekstrand
404419ee1a i965/fs,vec4: Get rid of the sanity_param_count
It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip.  Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
ca6a436f12 i965/vec4: Use nir info instead of pulling things out of [shader_]prog
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
756613ed35 i965/fs: Use the nir info instead of pulling things out of [shader_]prog
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
b62e36d18f i965/fs: Move sampler unit lookup into rescale_texcoord
The texunit variable we create and assign in nir_emit_texture gets passed
through two more layers of function calls before it gets to its sole use in
rescale_texcoord.  The best part is that we already pass the sampler into
rescale_texcoord so we can just look it up there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7b974c5f90 i965/cs: Remove the prog argument from local_id_payload_dwords
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7926c3ea7d i965/backend_shader: Add a field to store the NIR shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
7a8d06b6dd nir: Move GS data to nir_shader_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
e4fea486da nir: Add a nir_shader_info struct
This commit also adds code to glsl_to_nir and prog_to_nir to fill it out.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
cd1ae6ebfa nir/glsl: Take a gl_shader_program and a stage rather than a gl_shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
30c6357113 i965: Move prog_data uniform setup to the codegen level
As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor.  This makes uniform setup more of a
language/API thing than a backend compiler thing.  This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
ea006c4cb5 i965: Move binding table setup to codegen time.
Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data.  This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:53 -07:00
Jason Ekstrand
28709e37d9 i965/shader: Pull assign_common_binding_table_offsets out of backend_shader
This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 14:22:52 -07:00
Jason Ekstrand
cdf314cb21 i965/nir: Simplify uniform setup
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
7fee8b6f05 i965/nir: Pull GLSL uniform handling into a common function
The way we deal with GLSL uniforms and builtins is basically the same in
both the vec4 and the fs backend.  This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
03c4171b57 i965/nir: Pull common ARB program uniform handling into a common function
The way we deal with ARB program uniforms is basically the same in both the
vec4 and the fs backend.  This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
390b48fc4a i965/vec4: Use the uniform count from nir_assign_var_locations
Previously, we were counting up uniforms as we set them up.  However, this
count should be exactly identical to shader->num_uniforms provided by
nir_assign_var_locations.  (If it's not, we're in trouble anyway because
that means that locations don't match up.)  This matches what the fs
backend is already doing.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
3de81508ea i965/shader: Get rid of the setup_vec4_uniform_value helper
It's not used by anything anymore.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
58cea0c2b6 i965/shader: Pull setup_image_uniform_values out of backend_shader
I tried to do this once before but Curro pointed out that having it in
backend_shader meant it could use the setup_vec4_uniform_values helper
which did different things in vec4 and fs.  Now the setup_uniform_values
function differs only by an assert in the two backends so there's no real
good reason to be using it anymore.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
5609e0d7b4 i965/vec4: Get rid of the uniform_vector_size array
The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
ea35fb0fbe i965/vec4: Use the actual channels used in pack_uniform_registers
Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms.  However, we have to
walk through the uniforms and figure out liveness anyway, so we might as
well record the number of channels used as we go.  This may also allow us
to pack things tighter in a few cases.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
cd2132f45b glsl/types: Make subroutine types have a single matrix column
That way, if we do the usual thing of multiplying vector_elements by
matrix_columns we get the actual number of components in the type as per
component_slots().

While we're at it, we also switch to using the actual C++ field
initializers for vector_elements and matrix_columns.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
a7e0f755bc i965: Pull stage_prog_data.nr_params out of the NIR shader
Previously, we had a bunch of code in each stage to figure out how many
slots we needed in stage_prog_data.param.  This code was mostly identical
across the stages and had been copied and pasted around.  Unfortunately,
this meant that any time you did something special, you had to add code for
it to each of these places.  In particular, none of the stages took
subroutines into account; they were working entirely by accident.  By
taking this data from the NIR shader, we know the exact number of entries
we need and everything goes a bit smoother.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:39 -07:00
Jason Ekstrand
fc3f45234b i965/vs: Move lazy NIR creation to codegen_vs_prog
The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases.  It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:19:38 -07:00
Jason Ekstrand
64b145422b i965/vec4: Delete the old vec4_vp code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:19:36 -07:00
Jason Ekstrand
1153f12076 i965/vec4: Delete the old ir_visitor code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:19:34 -07:00
Jason Ekstrand
b85761d11d i965/vec4: Always use NIR
GLSL IR vs. NIR shader-db results for vec4 programs on i965:

   total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
   instructions in affected programs:     1245199 -> 1134225 (-8.91%)
   helped:                                7469
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on G4x:

   total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
   instructions in affected programs:     1205599 -> 1094625 (-9.20%)
   helped:                                7469
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:

   total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
   instructions in affected programs:     1205503 -> 1094531 (-9.21%)
   helped:                                7468
   HURT:                                  2440

GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:

   total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
   instructions in affected programs:     1850547 -> 1621331 (-12.39%)
   helped:                                14856
   HURT:                                  1481

GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
   instructions in affected programs:     1660279 -> 1460468 (-12.03%)
   helped:                                14668
   HURT:                                  1369

I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:

   Test                        master-glsl     master-nir     diff
   bench_OglGeomPoint          461.556         463.006        1.450
   bench_OglTerrainFlyInst     184.484         187.574        3.090
   bench_OglTerrainPanInst     132.412         136.307        3.895
   bench_OglTexFilterAniso     19.653          19.645         -0.008
   bench_OglTexFilterTri       58.333          58.009         -0.324
   bench_OglVSInstancing       65.049          65.327         0.278
   bench_trexoff               69.474          69.694         0.220
   bench_valley                40.708          41.125         0.417

v2 (Jason Ekstrand):
 - Remove more uses of NirOptions as a switch
 - New shader-db numbers
 - Added benchmark numbers

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-02 14:18:46 -07:00
Ilia Mirkin
4e0a8e0a50 i965: don't forget to free image_param on prog_data free
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Ilia Mirkin
19598aaa5d glsl: avoid leaking hiddenUniforms map when there are no uniforms
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Ilia Mirkin
da2fdf950f mesa: avoid leaking closure when iterating over a string_to_uint_map
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 14:14:27 -04:00
Chris Wilson
6b7036498a nir: Fix uninitialized 'progress' variable in nir_lower_system_values.
Commit 0a1adaf11d (nir: Report progress
from nir_lower_system_values().) introduced a bug caught by Valgrind:

==823== Conditional jump or move depends on uninitialised value(s)
==823==    at 0xB09020C: convert_block (nir_lower_system_values.c:68)
==823==    by 0xB079FB8: foreach_cf_node (nir.c:1310)
==823==    by 0xB07A0AF: nir_foreach_block (nir.c:1336)
==823==    by 0xB09026B: convert_impl (nir_lower_system_values.c:79)
...
==823==  Uninitialised value was created by a stack allocation
==823==    at 0xB090249: convert_impl (nir_lower_system_values.c:76)

which is trivially fixed by initializing progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-02 10:44:28 -07:00
Connor Abbott
33da78adee nir/remove_phis: handle trivial back-edges
Some loops may have phi nodes that look like:

foo = ...
loop {
    bar = phi(foo, bar)
    ...
}

in which case we can remove the phi node and replace all uses of 'bar'
with 'foo'. In particular, there are some L4D2 vertex shaders with loops
that, after optimization, look like:

        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 block_4 */
                vec1 ssa_2195 = phi block_0: ssa_2136, block_4: ssa_994
                vec1 ssa_7321 = phi block_0: ssa_8195, block_4: ssa_7321
                vec1 ssa_7324 = phi block_0: ssa_8198, block_4: ssa_7324
                vec1 ssa_7327 = phi block_0: ssa_8174, block_4: ssa_7327
                vec1 ssa_8139 = intrinsic load_uniform () () (232)
                vec1 ssa_588 = ige ssa_2195, ssa_8139
                /* succs: block_2 block_3 */
                if ssa_588 {
                        block block_2:
                        /* preds: block_1 */
                        break
                        /* succs: block_5 */
                } else {
                        block block_3:
                        /* preds: block_1 */
                        /* succs: block_4 */
                }
                block block_4:
                /* preds: block_3 */
                vec1 ssa_994 = iadd ssa_2195, ssa_2150
                /* succs: block_1 */
        }

where after removing the second, third, and fourth phi nodes, the loop becomes
entirely dead, and this patch will cause the loop to be deleted entirely.
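
The test for such a trivial phi can be sketched as below. This is an illustration under simplifying assumptions, not the nir_opt_remove_phis implementation: values are identified by non-negative integer ids, and struct phi and trivial_phi_source are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phi node: its own result id plus the incoming value ids. */
struct phi {
   int def;
   const int *srcs;
   size_t num_srcs;
};

/* Return the id of the single value every source refers to, ignoring
 * sources that are the phi itself (trivial back-edges). Return -1 if
 * the phi merges two or more distinct values and must be kept. */
static int
trivial_phi_source(const struct phi *phi)
{
   int unique = -1;

   for (size_t i = 0; i < phi->num_srcs; i++) {
      int src = phi->srcs[i];

      if (src == phi->def)
         continue;               /* back-edge carrying the phi itself */
      if (unique != -1 && src != unique)
         return -1;              /* two distinct incoming values */
      unique = src;
   }
   return unique;
}
```

With the ids from the listing above, ssa_7321 = phi(ssa_8195, ssa_7321) reduces to ssa_8195, while ssa_2195 = phi(ssa_2136, ssa_994) has two distinct sources and stays.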

No piglit regressions.

Shader-db results on bdw:

instructions in affected programs:     5824 -> 5664 (-2.75%)
total loops in shared programs:        2234 -> 2202 (-1.43%)
helped:                                32

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-10-02 13:19:45 -04:00
Kyle Brenneman
d35391cfda glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)
Add a macro GL_LIB_NAME to hold the filename that configure comes up with
based on the --with-gl-lib-name and --enable-mangling options.

In driOpenDriver, use the GL_LIB_NAME macro instead of hard-coding
"libGL.so.1".

v2: Add an #ifndef/#define for GL_LIB_NAME so that non-autoconf builds will
    work.
v3: Fix the library filename in the Makefile.

Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 13:25:05 +01:00
Kyle Brenneman
798f260a2f mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.

This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.
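
A rough sketch of the prefix handling, assuming the lookup works on the name with its prefix removed. strip_gl_prefix is a hypothetical helper, not the mapi code, and it accepts the "m" unconditionally rather than only when USE_MGL_NAMESPACE is defined:

```c
#include <assert.h>
#include <string.h>

/* Return the name with a leading "gl" or "mgl" removed, or NULL if the
 * name carries neither prefix. */
static const char *
strip_gl_prefix(const char *name)
{
   if (name[0] == 'm')
      name++;                    /* optional mangling prefix */
   if (name[0] != 'g' || name[1] != 'l')
      return NULL;
   return name + 2;
}
```

Checking for the "m" before skipping it is what makes "glFoo" and "mglFoo" resolve to the same entry.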

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-02 13:23:18 +01:00
Kyle Brenneman
a27f2d991b glx: Fix build errors with --enable-mangling (v2)
Rearranged the GLX_ALIAS macro in glextensions.h so that it will pick up
the renames from glx_mangle.h.

Fixed the alias attribute for glXGetProcAddress when USE_MGL_NAMESPACE is
defined.

v2: Add a comment clarifying why GLX_ALIAS needs two macros.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-02 13:22:46 +01:00
Tapani Pälli
85313ff8ab glsl: validate binding qualifier on block members
Fixes following Piglit test:
	member-invalid-binding-qualifier.frag

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-02 10:50:42 +03:00
Samuel Iglesias Gonsalvez
f42466322a glsl: emit row_major matrix's SSBO stores only for components in writemask
When writing to a column of a row-major matrix, each component of the
vector is stored to non-consecutive memory addresses, so we generate
one instruction per component.

This patch skips the disabled components in the writemask, saving some
store instructions and avoiding storing wrong data to each disabled
component.
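
The per-component store loop can be sketched as follows, assuming component i of the written column lives in row i and that rows are matrix_stride bytes apart. emit_column_stores and its parameters are hypothetical; the sketch records byte offsets instead of emitting GLSL IR:

```c
#include <assert.h>

/* Record the byte offset of each store needed to write one column of a
 * row-major matrix. Components whose writemask bit is clear are skipped,
 * so no store (and no stale data) is generated for them. Returns the
 * number of stores recorded in store_offsets. */
static unsigned
emit_column_stores(unsigned base, unsigned matrix_stride,
                   unsigned num_components, unsigned writemask,
                   unsigned *store_offsets)
{
   unsigned count = 0;

   for (unsigned i = 0; i < num_components; i++) {
      if (!(writemask & (1u << i)))
         continue;               /* disabled component: emit nothing */
      store_offsets[count++] = base + i * matrix_stride;
   }
   return count;
}
```

For a four-component column with a 16-byte stride and writemask 0x5 (x and z), this records two stores, at offsets 0 and 32.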

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-02 08:34:25 +02:00
Tapani Pälli
a552b77dcc glsl: error out if non-constant indexing of SSBO arrays with GLSL ES
Fixes a failing subtest in:
	ES31-CTS.shader_storage_buffer_object.negative-glsl-compileTime

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-02 08:37:02 +03:00
Daniel Scharrer
b3f9c5cc0f mesa: Add abs input modifier to base for POW in ffvertex_prog
The result of POW for a negative base is undefined. Even when the result
is multiplied by zero (which is the case here whenever the base is
negative), the Inf and NaNs can propagate past that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91342
Signed-off-by: Daniel Scharrer <daniel@constexpr.org>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-01 16:37:55 -04:00
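The propagation problem mentioned above follows from IEEE 754 arithmetic: multiplying Inf or NaN by zero yields NaN rather than zero. The sketch below demonstrates only that arithmetic fact. `demo_masked_term` is a hypothetical stand-in for a term that is multiplied by a 0/1 factor; it is not code from ffvertex_prog.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical stand-in for "POW result times a 0/1 mask".
 * If pow_result is Inf or NaN, the product is NaN even when mask is 0.
 */
float demo_masked_term(float pow_result, float mask)
{
    assert(mask == 0.0f || mask == 1.0f);
    return pow_result * mask;
}
```

This is why masking the result afterwards is not enough, and why the fix takes the absolute value of the base so that the undefined case never reaches POW.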
Kenneth Graunke
604ce8253a i965/fs: Print reg and reg_offset separately for ATTR files.
Reading this output was really confusing.  reg represents attribute
slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 11:01:58 -07:00
Kenneth Graunke
193d29516d i965/nir: Refactor input/output lowering setup into helpers.
The code for input lowering is going to get significantly more
complicated shortly, so I wanted to pull it out.  Vertex shader inputs
are handled nearly identically regardless of vec4/scalar mode, so I
opted to not split that.

I thought about having each function actually do the lowering, but one
pass through nir_lower_io that handles all types (which weren't handled
earlier) is probably more efficient.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 10:58:30 -07:00
Kenneth Graunke
39a1d36a67 nir: Allow nir_lower_io() to only lower one type of variable.
We may want to use different type_size functions for (e.g.) inputs
vs. uniforms.  Passing in -1 for mode ignores this, handling all
modes as before.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-01 10:58:30 -07:00
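The "-1 means all modes" convention described above can be sketched as a simple filter. The enum and function below are hypothetical and do not reproduce NIR's actual variable-mode enum or the nir_lower_io() signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variable modes; not NIR's actual enum. */
enum demo_mode {
    DEMO_MODE_INPUT,
    DEMO_MODE_OUTPUT,
    DEMO_MODE_UNIFORM,
    DEMO_MODE_COUNT
};

/* Passing this value keeps the old behaviour: every mode is lowered. */
#define DEMO_MODE_ALL (-1)

/* Returns true if a variable of 'var_mode' should be lowered by this pass. */
bool demo_should_lower(int requested_mode, enum demo_mode var_mode)
{
    assert(requested_mode >= DEMO_MODE_ALL && requested_mode < DEMO_MODE_COUNT);
    return requested_mode == DEMO_MODE_ALL || requested_mode == (int)var_mode;
}
```

A caller that wants a different type_size function for inputs than for uniforms would run the pass once per mode, each time with its own size callback.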
Brian Paul
1c6689bf03 mesa: fix incorrect error in _mesa_BindTextureUnit()
If the texture object exists, but the Name field is zero, it means
the object was created but never bound to a target.  Trying to bind it
in _mesa_BindTextureUnit() should generate GL_INVALID_OPERATION.

Fixes piglit's arb_direct_state_access-bind-texture-unit test.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
a9408f3ca1 mesa: remove _mesa_get_tex_unit_err() and fix error handling
This helper was only called from _mesa_BindTextureUnit().  It's simpler
to just inline it.

The error check / code / message in the helper was incorrect.  It was
written for glBindTextures(), not glBindTextureUnit().  The correct
error for a bad texture unit number is GL_INVALID_VALUE.  The error
message now reports the unit number rather than a GL_TEXTUREi enum.

Fixes a failure in piglit's arb_direct_state_access-bind-texture-unit test.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
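The two error rules from the _mesa_BindTextureUnit() fixes above can be summarised in a small decision function. This is a sketch under stated assumptions: the struct, its `ever_bound_to_target` flag and the function name are hypothetical, and only the numeric values of the two GL error codes are taken from the OpenGL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard OpenGL error code values. */
#define DEMO_GL_NO_ERROR          0x0000
#define DEMO_GL_INVALID_VALUE     0x0501
#define DEMO_GL_INVALID_OPERATION 0x0502

/* Hypothetical model of a texture object; not Mesa's gl_texture_object. */
struct demo_texobj {
    bool ever_bound_to_target; /* false: created but never given a target */
};

/* Returns the GL error the call should raise, or DEMO_GL_NO_ERROR. */
unsigned demo_bind_texture_unit_error(unsigned unit, unsigned max_units,
                                      const struct demo_texobj *tex)
{
    assert(tex != NULL);

    /* A bad unit number is a bad value, reported by unit index. */
    if (unit >= max_units)
        return DEMO_GL_INVALID_VALUE;

    /* An object that exists but has no target yet cannot be bound. */
    if (!tex->ever_bound_to_target)
        return DEMO_GL_INVALID_OPERATION;

    return DEMO_GL_NO_ERROR;
}
```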
Brian Paul
c277fa3940 mesa: consolidate texture binding code
Before, we were doing the actual _mesa_reference_texobj() call and
ctx->Driver.BindTexture() and misc housekeeping in three different
places.  This consolidates the common code in a new bind_texture()
function.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
78f908c54b mesa: fix indentation in _mesa_create_nameless_texture() 2015-10-01 07:45:43 -06:00
Brian Paul
aa249190a5 st/mesa: clean up #includes in st_draw.c
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
82e3d8ba8b mesa: clean up #includes in sampler.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
32a4999ee7 mesa: clean up #includes in ir_to_mesa.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
b9b13d873a mesa: clean up #includes in uniforms.h
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:43 -06:00
Brian Paul
e13b515044 mesa: clean up #includes in uniform_query.cpp
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Brian Paul
85ea125620 mesa: clean up #includes in pipelineobj.c
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Brian Paul
1a22550725 mesa: clean up #includes in ff_fragment_shader.cpp
Get rid of "../glsl/" paths.  Sort alphabetically.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 07:45:42 -06:00
Iago Toral Quiroga
7455324030 main: Fix block index when mixing UBO and SSBO blocks
Since we store both in UniformBlocks, we can't just compute the index by
subtracting the array address start, we need to count the number of
buffers of the appropriate type.

v2:
  - Just fall back to calc_resource_index (Tapani)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-01 09:25:30 +02:00
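To illustrate why pointer subtraction gives the wrong index once UBO and SSBO blocks share one array, here is a minimal sketch that counts only the preceding blocks of the same kind. The struct and function are hypothetical and are not Mesa's calc_resource_index().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block record; not Mesa's gl_uniform_block. */
struct demo_block {
    bool is_shader_storage;
};

/*
 * Index of blocks[pos] among the blocks of its own kind.
 * 'pos' alone would be wrong whenever blocks of the other kind precede it.
 */
unsigned demo_block_index(const struct demo_block *blocks, size_t num_blocks,
                          size_t pos)
{
    unsigned index = 0;

    assert(blocks != NULL && pos < num_blocks);
    for (size_t i = 0; i < pos; i++) {
        if (blocks[i].is_shader_storage == blocks[pos].is_shader_storage)
            index++;
    }
    return index;
}
```

With the array order UBO, SSBO, UBO, SSBO, the second UBO sits at array position 2 but has uniform block index 1.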
Tapani Pälli
ca2e16d26e mesa: use strtok_s for strtok_r on windows
https://msdn.microsoft.com/en-us/library/ftsafwz3.aspx

v2: use _WIN32 instead of _MSC_VER (Brian Paul)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92183
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-01 08:01:03 +03:00
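A sketch of the mapping this commit describes, assuming Microsoft's three-argument strtok_s (which differs from the four-argument C11 Annex K function of the same name). The `demo_` names are hypothetical, and the real patch may define the mapping differently.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* request strtok_r() on POSIX systems */
#endif
#include <assert.h>
#include <string.h>

#if defined(_WIN32)
/* Microsoft's strtok_s takes the same three arguments as POSIX strtok_r. */
#define demo_strtok_r strtok_s
#else
#define demo_strtok_r strtok_r
#endif

/* Counts the tokens in 'text'; note that tokenizing modifies 'text'. */
unsigned demo_count_tokens(char *text, const char *delims)
{
    unsigned count = 0;
    char *state = NULL;

    assert(text != NULL && delims != NULL);
    for (char *tok = demo_strtok_r(text, delims, &state); tok != NULL;
         tok = demo_strtok_r(NULL, delims, &state))
        count++;
    return count;
}
```

Testing `_WIN32` rather than `_MSC_VER`, as the v2 note says, also covers non-Microsoft compilers that target Windows.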
Ian Romanick
9bd9cf1fa4 meta: Handle array textures in scaled MSAA blits
The old code had some significant problems with respect to
sampler2DArray textures.  The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2.  The resulting shader would not even compile.
Since there were no tests for this path, nobody noticed.

The input to the fragment shader is always treated as a vec3.  If the
source data is only vec2, the vertex puller will supply 0 for the .z
component.  The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer.  If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.

Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.

Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 16:22:56 -07:00
Chad Versace
b217e6f035 i965/miptree: Add PRM references for most struct members (v2)
Add comments that link the driver's miptree structures to the hardware
structures documented in the PRM.  This provides sorely needed
orientation to developers new to the miptree code. And for miptree
veterans, this clarifies some of the more obscure miptree data.

For each driver struct field that closely corresponds to a
hardware struct field, add a PRM reference to that hardware field's
name. For example,

    struct intel_mipmap_tree {
       ...
       /**
        * @brief One of GL_TEXTURE_2D, GL_TEXTURE_2D_ARRAY, etc.
        *
        * @see RENDER_SURFACE_STATE.SurfaceType
        * @see RENDER_SURFACE_STATE.SurfaceArray
        * @see 3DSTATE_DEPTH_BUFFER.SurfaceType
        */
       GLenum target;
       ...
    };

Also annotate the INTEL_MSAA_LAYOUT_* enums with the name of the PRM
section that documents the layout.

v2: Replace "2D subimage" with "slice", and define what a "slice" is.
    For Ben.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
2015-09-30 15:32:03 -07:00
Chad Versace
f7fe9fb0f1 i965/miptree: Rename align_w,align_h -> halign,valign
The values of intel_mipmap_tree::align_w and ::align_h correspond to the
hardware enums HALIGN_* and VALIGN_*.

See the confusion?
    align_h != HALIGN
    align_h == VALIGN

Reduce the confusion by renaming the variables to match the hardware
enum names:
    git ls-files |
    xargs sed -i -e 's/align_w/halign/g' \
                 -e 's/align_h/valign/g'

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-30 15:31:06 -07:00
Chad Versace
56367b0290 i965/miptree: Rename intel_miptree_map::mt -> ::linear_mt (v2)
Because that's what it is. It's an untiled, *linear* miptree.

v2:
  - Add space after /*.
  - Use one comment per function argument.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:31:04 -07:00
Chad Versace
b7882ae677 i965/miptree: Fix comments for map mode
The comment for intel_miptree_map::mode claimed that it was a bitmask of
GL_MAP_{READ,WRITE,INVALIDATE}_BIT. In reality, the bitmask may include
any of {GL,BRW}_MAP_*_BIT.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:31:03 -07:00
Chad Versace
bd191b7cc6 i965/miptree: More comments for BRW_MAP_DIRECT_BIT (v2)
Clarify that this bit extends the set of GL_MAP_*_BIT enums.
Also fix typo of "temporary".

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-30 15:30:55 -07:00
Kenneth Graunke
651395b6e8 i965: Remove duplicate copy of is_scalar_shader_stage().
Jason open coded this in 60befc63 when cleaning up some ugly code;
using our existing helper tidies it up a bit more.

v2: Drop inline (suggested by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-30 13:56:24 -07:00
Ville Syrjälä
a1a3f0961b i915: Remember to call intel_prepare_render() before blitting
Bring over the following fix from i965:
 commit fb3d62fe3d
 Author: Kenneth Graunke <kenneth@whitecape.org>
 Date:   Tue Aug 6 14:36:09 2013 -0700

    i965: Remember to call intel_prepare_render() before blitting.

Fixes a crash in the following piglit tests:
 bin/fbo-sys-blit -auto
 bin/fbo-sys-sub-blit -auto

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 13:10:03 -07:00
Ville Syrjälä
c349031c27 i915: Fix texcoord vs. varying collision in fragment programs
i915 fragment programs utilize the texture coordinate registers
for both texture coordinates and varyings. Unfortunately the
code doesn't check if the same index might be in use for both.
It just naively uses the index to pick a texture unit, which
could lead to collisions.

Add an extra mapping step to allocate non conflicting texture
units for both uses.

The issue can be reproduced with a pair of simple shaders like
these:
 attribute vec4 in_mod;
 varying vec4 mod;
 void main() {
   mod = in_mod;
   gl_TexCoord[0] = gl_MultiTexCoord0;
   gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
 }

 varying vec4 mod;
 uniform sampler2D tex;
 void main() {
   gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod;
 }

Fixes many piglit tests on i915:

    glsl-link-varyings-2
    glsl-orangebook-ch06-bump
    interpolation-none-gl_frontcolor-smooth-fixed
    interpolation-none-gl_frontcolor-smooth-none
    interpolation-none-gl_frontcolor-smooth-vertex
    interpolation-none-gl_frontsecondarycolor-smooth-fixed
    interpolation-none-gl_frontsecondarycolor-smooth-vertex
    interpolation-none-gl_frontsecondarycolor-smooth-none
    interpolation-none-other-flat-fixed
    interpolation-none-other-flat-none
    interpolation-none-other-flat-vertex
    interpolation-none-other-smooth-fixed
    interpolation-none-other-smooth-none
    interpolation-none-other-smooth-vertex

v2 [idr]: Minor formatting tweaks.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 13:10:03 -07:00
Ville Syrjälä
9504740f3e i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)
I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy
the same bit. Move the texture bits upwards a bit to make room for
I830_UPLOAD_RASTER_RULES.

Now the driver will actually upload the raster rules which is rather
important to get the provoking vertex right. Fixes the appearance
of glxgears teeth on gen2.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-30 12:49:28 -07:00
Jordan Justen
7b391142e9 i965/cs: Upload UBO/SSBO surfaces
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-30 11:28:12 -07:00
Rhys Kidd
83018f5c20 mesa: Fix format specifier warning in mesa_DispatchComputeIndirect()
Commit 1665d29ee3 introduced an incorrect
format specifier that operates on GLintptr indirect within the function
_mesa_DispatchComputeIndirect().

This patch mitigates the introduced GCC warning:

src/mesa/main/compute.c: In function '_mesa_DispatchComputeIndirect':
src/mesa/main/compute.c:53:7: warning: format '%d' expects argument of type 'int', but argument 3 has type 'GLintptr' [-Wformat=]
       _mesa_debug(ctx, "glDispatchComputeIndirect(%d)\n", indirect);
           ^

v2: Amend for Boyan Ding <boyan.j.ding@gmail.com> feedback.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-30 10:13:41 -07:00
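One portable way to avoid this class of warning is to widen the value explicitly and use the matching format macro. The sketch below assumes a pointer-sized signed integer standing in for GLintptr; it is not necessarily the specifier the actual patch chose, and `demo_format_indirect` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: a pointer-sized signed integer, standing in for GLintptr. */
typedef ptrdiff_t demo_intptr;

/* Formats the debug message; returns what snprintf() returns. */
int demo_format_indirect(char *buf, size_t size, demo_intptr indirect)
{
    assert(buf != NULL && size > 0);
    /* "%d" expects an int; widen to intmax_t and use PRIdMAX instead. */
    return snprintf(buf, size, "glDispatchComputeIndirect(%" PRIdMAX ")\n",
                    (intmax_t)indirect);
}
```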
Jason Ekstrand
3948ac19a4 i965: Get rid of prog_data compare functions
They are no longer used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Jason Ekstrand
bfdc76c133 i965/state_cache: Remove the aux_compare fields
They haven't been used since 1bba29ed40 so
there's no good reason to keep them around.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Jason Ekstrand
a4734b34b3 i965/copy_image: Fix a copy+paste error
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-30 08:35:32 -07:00
Chris Wilson
70e91d61fd i965: Remove early release of DRI2 miptree
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-30 10:52:30 +03:00
Samuel Iglesias Gonsalvez
e21bb9e7bd glsl: assert base_alignment > 0 for records
From GLSL 1.50 spec, section 4.1.8 "Structures":

"Structures must have at least one member declaration."

So the base_alignment should be higher than zero.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-30 08:13:07 +02:00
Samuel Iglesias Gonsalvez
f3afcbecc6 util: use strnlen() in strndup() implementations
If the string being copied is not NULL-terminated the result of
strlen() is undefined.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-30 08:13:07 +02:00
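A minimal sketch of a strndup() replacement that never reads past the given bound, in the spirit of this change. It uses memchr() as a portable equivalent of strnlen() and allocates with calloc(), as the Windows fallback elsewhere in this log is described as doing. The `demo_` names are hypothetical and this is not the code in Mesa's strndup.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bounded length: reads at most 'max' bytes, like strnlen(). */
size_t demo_bounded_len(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}

/* Copies at most 'max' bytes of 's' into a new, terminated string. */
char *demo_strndup(const char *s, size_t max)
{
    size_t n;
    char *copy;

    if (s == NULL)
        return NULL;

    n = demo_bounded_len(s, max);
    assert(n <= max);

    copy = calloc(n + 1, sizeof(char));
    if (copy == NULL)
        return NULL; /* allocation failed */

    memcpy(copy, s, n);
    return copy; /* calloc() already supplied the terminating zero */
}
```

Calling strlen() on the source first would walk past the end of an unterminated buffer, which is the undefined behaviour the commit removes.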
Samuel Iglesias Gonsalvez
023165a734 i965/vec4/nir: add nir_intrinsic_memory_barrier support
Fix OpenGL ES 3.1 conformance tests: advanced-readWrite-case1-vsfs
and advanced-matrix-vsfs.

v2:
- Fix SHADER_OPCODE_MEMORY_FENCE emission and the allocation of 'tmp'
  (Francisco).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-30 08:13:07 +02:00
Samuel Iglesias Gonsalvez
f24e5e68d6 glsl: apply shader storage block member rules when adding program resources
From ARB_program_interface_query:

"For an active shader storage block member declared as an array, an
 entry will be generated only for the first array element, regardless
 of its type. For arrays of aggregate types, the enumeration rules are
 applied recursively for the single enumerated array element."

v2:
- Simplify 'if' conditions and return true if it is not a buffer
  variable, because these rules only apply to buffer variables (Timothy).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-30 08:13:07 +02:00
Jordan Justen
4810d02112 nir: Don't set dest in SSBO store glsl_to_nir conversion
This matches the function signature created in
lower_ubo_reference_visitor::ssbo_store which has a void return.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-29 17:17:20 -07:00
Kenneth Graunke
476e6d732f nir: Use a system value for gl_PrimitiveIDIn.
At least on Intel hardware, gl_PrimitiveIDIn comes in as a special part
of the payload rather than a normal input.  This is typically what we
use system values for.  Dave and Ilia also agree that a system value
would be nicer.

At some point, we should change it at the GLSL IR level as well.  But
that requires changing most of the drivers.  For now, let's at least
make NIR do the right thing, which is easy.

v2: Add a comment about not creating a temporary (suggested by Iago).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-29 14:19:32 -07:00
Brian Paul
cb758b892a st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats
For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag
to try to get a hardware format which also supports rendering (for FBO
textures).  Do the same thing for floating point formats.

This allows the Redway3D Flat demo to run.

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-29 11:52:22 -06:00
Brian Paul
daf23bd4cb st/mesa: add some debugging code in st_ChooseTextureFormat()
I've temporarily added code like this many times.  Wrap it in a
conditional that can be enabled when needed.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-29 11:52:03 -06:00
Brian Paul
7147f7098e mesa: clean up #includes in shaderapi.c
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:56 -06:00
Brian Paul
b24c6d3fef mesa: clean up the #includes in shader_query.cpp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:51 -06:00
Brian Paul
3bbff1e26e mesa: remove an extern "C" wrapper in shader_query.cpp
The shaderapi.h header already has the extern "C" wrapper.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-29 11:51:38 -06:00
Jordan Justen
681b4badae i965/cs: Generate code to load gl_NumWorkGroups
This code also sets cs_prog_data->uses_num_work_groups which is later
used by state setup to indicate that the gl_NumWorkGroups surface
needs to be set up.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
4c6ddd3397 nir: Convert SYSTEM_VALUE_NUM_WORK_GROUPS to a nir intrinsic
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
f6ae914069 glsl/cs: Add gl_NumWorkGroups as a system value
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
63d7b33f51 i965/cs: Setup surface binding for gl_NumWorkGroups
This will only be set up when the prog_data uses_num_work_groups
boolean is set.

At this point nothing will set uses_num_work_groups, but soon code
will set it when emitting code for the intrinsic that loads
gl_NumWorkGroups.

We can't emit this surface information earlier at the start of the
DispatchCompute* call because we may not have generated the program
yet. Until we generate the program, we don't know if the
gl_NumWorkGroups variable is accessed.

We also can't emit the surface as part of the brw_cs_state atom,
because we might not need the surface if gl_NumWorkGroups is not used
by the program.

Lastly, we cannot emit the surface later (after state upload) in the
DispatchCompute* call, because it needs to be run before the
brw_cs_state atom is emitted, since it changes the surface state.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
d1be9d2126 i965/cs: Add a binding table entry for gl_NumWorkGroups
If glDispatchComputeIndirect is used, then the value for this variable
must be read from the indirect BO.

To allow the same generated code to support indirect and
glDispatchCompute, we will also set up a BO for the number of work
groups using the intel_upload_data mechanism. This will only be
required if the gl_NumWorkGroups variable is accessed.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
d57a85f32b i965/cs: Store compute invocation information in brw context
We will need this in an atom to set up a surface to read the
gl_NumWorkGroups values from.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
60cf84dea7 i965/cs: Re-emit cs_state when surfaces have changed
Unlike rendering (BINDING_TABLE_POINTERS_*S), compute doesn't have a
binding table pointers command. Instead it is part of the
MEDIA_INTERFACE_DESCRIPTOR structure loaded by the brw_cs_state atom.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
2ec5f3e1d5 i965/cs: Re-emit push constants and cs_state on new batches
We need to re-emit push constants when a new batch is started since
the push constants are stored in the batch. We also need to re-emit
the MEDIA_INTERFACE_DESCRIPTOR (in brw_cs_state) since it is stored in
the batch.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jordan Justen
1665d29ee3 mesa/cs: Add MESA_VERBOSE=api support in DispatchCompute*
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 08:23:47 -07:00
Jose Fonseca
952366a60e util: Fix strndup prototype on C++.
Trivial.
2015-09-29 16:01:56 +01:00
Tapani Pälli
c0722be9f5 mesa: fix ARRAY_SIZE query for GetProgramResourceiv
The patch also refactors the name length queries, which used the array
size in their computation. This has to be done at the same time to avoid
a regression in the arb_program_interface_query-resource-query Piglit test.

Fixes the rest of the failures with
   ES31-CTS.program_interface_query.no-locations

v2: make additional check only for GS inputs
v3: create helper function for resource name length
    so that it gets calculated only in one place

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
2015-09-29 12:46:28 +03:00
Iago Toral Quiroga
12d510ab74 glsl: Fix forward NULL dereference coverity warning
The comment says that it should be impossible for decl_type to be NULL
here, so don't try to handle the case where it is, simply add an assert.

>>>     CID 1324977:  Null pointer dereferences  (FORWARD_NULL)
>>>     Comparing "decl_type" to null implies that "decl_type" might be null.

No piglit regressions observed.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:53:08 +02:00
Iago Toral Quiroga
1dc2db7a4d glsl: Fix null return coverity warning
Add an assert on the result of as_dereference() not being NULL:

>>>     CID 1324978:  Null pointer dereferences  (NULL_RETURNS)
>>>     Dereferencing a null pointer "deref_record->record->as_dereference()".

Since we are introducing a new variable to hold the result of
as_dereference(), take the opportunity to rename deref_record_type to
interface_type and just name the new variable interface_deref, which is
less confusing.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:53:08 +02:00
Iago Toral Quiroga
6bf718fec2 glsl: Fix unused value warning reported by Coverity
We don't use param in this part of the code, so no point in advancing
the pointer forward:

>>>     CID 1324983:  Code maintainability issues  (UNUSED_VALUE)
>>>     Assigning value from "param->get_next()" to "param" here, but that stored value is overwritten before it can be used.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:53:08 +02:00
Samuel Iglesias Gonsalvez
bea66d22f2 util: implement strndup for WIN32
v2:
- Add strndup.h to Makefile.sources (Emil)
- Use calloc instead of malloc (Emil).
- Check if allocation fails (Emil, Jose)
- Add '#pragma once' and include stdlib.h to strndup.h (Jose)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92124
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
7efb235019 glsl: use correct number of uniform blocks in error message
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
6668eb5a45 mesa: rename gl_shader_program's NumUniformBlocks to NumBufferInterfaceBlocks
Because it counts shader storage blocks too.

v2:
- Use NumBufferInterfaceBlocks instead (Jordan).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-29 10:03:47 +02:00
Samuel Iglesias Gonsalvez
38004eb17c main: fix ACTIVE_UNIFORM_BLOCKS value
NumUniformBlocks also counts shader storage blocks.
NumUniformBlocks variable will be renamed in a later patch to avoid
misunderstandings.

v2:

- Modify the condition to use !IsShaderStorage and the list of
  uniform blocks (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-29 10:03:47 +02:00
Emil Velikov
589249a792 docs: add news item and link release notes for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-29 00:22:32 +01:00
Emil Velikov
dda02d202e docs: add sha256 checksums for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4c0b484612)
2015-09-29 00:21:14 +01:00
Emil Velikov
58e02b2a4e docs: add release notes for 11.0.2
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 51e0b06d99)
2015-09-29 00:21:12 +01:00
Anuj Phogat
945592f92c i965/gen9: Add a condition for starting pixel in fast copy blit
This condition restricts the use of fast copy blit to cases
where the starting pixel of src and dst is oword (16 byte) aligned.

Many piglit tests (if using fast copy blit in Mesa) failed earlier
because I missed adding this condition. Fast copy blit is currently
enabled for use only with Yf/Ys tiling.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 15:00:53 -07:00
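The oword condition above can be sketched as a byte-offset check. This assumes the alignment applies to the byte offset of the starting pixel within its surface, computed from x, y, pitch and bytes per pixel. The helper names are hypothetical and the driver's actual check may be expressed differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OWORD_BYTES 16u

/* True if the pixel at (x, y) starts on a 16-byte boundary. */
bool demo_pixel_is_oword_aligned(uint32_t x, uint32_t y,
                                 uint32_t pitch_bytes, uint32_t cpp)
{
    uint64_t offset;

    assert(cpp > 0);
    offset = (uint64_t)y * pitch_bytes + (uint64_t)x * cpp;
    return offset % DEMO_OWORD_BYTES == 0;
}

/* Both the source and the destination start pixel must qualify. */
bool demo_fast_copy_start_ok(uint32_t src_x, uint32_t src_y, uint32_t src_pitch,
                             uint32_t dst_x, uint32_t dst_y, uint32_t dst_pitch,
                             uint32_t cpp)
{
    return demo_pixel_is_oword_aligned(src_x, src_y, src_pitch, cpp) &&
           demo_pixel_is_oword_aligned(dst_x, dst_y, dst_pitch, cpp);
}
```

With 4 bytes per pixel, a start at x = 4 is aligned (16 bytes) while x = 3 is not (12 bytes), so the second case would fall back to another blit path.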
Ilia Mirkin
1d8cba9b51 nouveau: wait to unref the transfer's bo until it's no longer used
The bo will often come from a slab in which case it doesn't matter. But
for larger allocations this will be in its own bo, and we have to make
sure to wait until it's no longer used in order for it to be freed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Ilia Mirkin
3a6b9a7830 nouveau: delay deleting buffer with unflushed fence
If there is an unflushed fence on the bo, then the resource may still be
used in commands built up in the local pushbuf. Flushing can cause all
sorts of unwanted effects, so just free the bo when the relevant fence
is hit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Ilia Mirkin
d4e650b07b nouveau: be more careful about freeing temporary transfer buffers
Deleting a buffer does not flush the command stream. Make sure that we
wait for the copies to finish before deleting the temporary bo.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-28 17:28:54 -04:00
Anuj Phogat
4c5308bbf4 i965: Rename intel_miptree_get_dimensions_for_image()
This function isn't specific to miptrees. So, drop the "miptree"
from function name.

V3: Add a comment explaining how the 1D Array texture height and
    depth is interpreted by Intel hardware.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
0bfd914f9f i965/gen9: Fix {src, dst}_pitch alignment check for XY_FAST_COPY_BLT
I misinterpreted the alignment restriction in XY_FAST_COPY_BLT earlier.
Instead of checking pitch for 64KB alignment we need to check it for
tile width alignment.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
0fa39bff19 i965: Fix {src, dst}_pitch alignment check for XY_SRC_COPY_BLT
Current code checks the alignment restrictions only for Y tiling.
From Broadwell PRM vol 10:

 "pitch is of 512Byte granularity for Tile-X: This means the tiled-x
  surface pitch can be (512, 1024, 1536, 2048...)/4 (in Dwords)."

This patch adds the restriction for X tiling as well.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
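The pitch restriction quoted above amounts to requiring that the pitch is a whole number of tile widths. The sketch below assumes the usual Intel tile widths of 512 bytes for X tiling and 128 bytes for Y tiling, and deliberately does not model linear surfaces. The enum and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiling enum; not the driver's I915_TILING_* values. */
enum demo_tiling { DEMO_TILING_NONE, DEMO_TILING_X, DEMO_TILING_Y };

/* Tile width in bytes; 0 means "not modelled in this sketch". */
uint32_t demo_tile_width_bytes(enum demo_tiling tiling)
{
    switch (tiling) {
    case DEMO_TILING_X: return 512;
    case DEMO_TILING_Y: return 128;
    default:            return 0;
    }
}

/* True if 'pitch_bytes' is a multiple of the tile width for 'tiling'. */
bool demo_pitch_is_tile_aligned(enum demo_tiling tiling, uint32_t pitch_bytes)
{
    uint32_t width = demo_tile_width_bytes(tiling);

    assert(pitch_bytes > 0);
    if (width == 0)
        return true; /* linear: no tile restriction modelled here */
    return pitch_bytes % width == 0;
}
```

A pitch of 640 bytes passes for Y tiling (5 tiles of 128 bytes) but fails for X tiling, which is the case the old Y-only check let through.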
Anuj Phogat
e83b07aa7b i965: Move conversion of {src, dst}_pitch to dwords outside if/else
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
485285498f i965: Delete temporary variable 'src_pitch'
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
bbbc9fd8e5 i965: Use helper function intel_get_tile_dims() in surface setup
It takes care of using the correct tile width if we later use other
tiling patterns for aux miptree.

V2: Remove the comment about using Yf for aux miptree.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
1dc41be9eb i965: Use intel_get_tile_dims() to get tile masks
This requires a change to the parameters passed to
intel_miptree_get_tile_masks().

V2: Rearrange the order of parameters. (Ben)
    Change the name to intel_get_tile_masks(). (Topi)

V3: Use temporary variables in intel_get_tile_masks()
    for clarity. Fix mask_y computation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
Anuj Phogat
21fdc59d34 i965: Add a helper function intel_get_tile_dims()
V2:
- Do the tile width/height computations in the new helper
  function and use it later in intel_miptree_get_tile_masks().
- Change the name to intel_get_tile_dims().

V3: Return the tile_h in number of rows in place of bytes.
    Document the units of tile_w, tile_h parameters.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-09-28 12:43:43 -07:00
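As a rough illustration of how tile masks could be derived from tile dimensions, here is a minimal sketch. It assumes tile_w is given in bytes and tile_h in rows (the V3 note above only confirms rows for tile_h), and that both dimensions are powers of two. The function name `sk_get_tile_masks` is hypothetical and does not reproduce the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: compute masks that select the intra-tile part of
 * an x/y pixel coordinate. Assumes tile_w is in bytes, tile_h in rows,
 * and cpp is the number of bytes per pixel. */
static void
sk_get_tile_masks(uint32_t tile_w, uint32_t tile_h, uint32_t cpp,
                  uint32_t *mask_x, uint32_t *mask_y)
{
   assert(cpp != 0 && tile_w >= cpp && tile_w % cpp == 0);
   assert(tile_h != 0);

   /* Temporary for clarity: tile width expressed in pixels. */
   uint32_t tile_w_pixels = tile_w / cpp;

   /* The mask trick only works for power-of-two dimensions. */
   assert((tile_w_pixels & (tile_w_pixels - 1)) == 0);
   assert((tile_h & (tile_h - 1)) == 0);

   *mask_x = tile_w_pixels - 1;
   *mask_y = tile_h - 1;
}
```

For a tile of 512 bytes by 8 rows with 4 bytes per pixel, this yields mask_x = 127 and mask_y = 7.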
Eduardo Lima Mitev
5edd9961c1 mesa: Use the effective internal format instead for validation
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:

    "if internalformat is a base internal format, the effective internal
     format is a sized internal format that is derived from the format and
     type for internal use by the GL. Table 3.12 specifies the mapping of
     format and type to effective internal formats. The effective internal
     format is used by the GL for purposes such as texture completeness or
     type checks for CopyTex* commands. In these cases, the GL is required
     to operate as if the effective internal format was used as the
     internalformat when specifying the texture data."

v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsized format
instead.

v4: * Improved comments in
      _mesa_es3_effective_internal_format_for_format_and_type().
    * Split out into a separate patch the chunk that reorders the
      error_check_subtexture_dimensions() error check, which is not directly
      related to this patch.
v5: Dropped the split-out patch because it was actually a workaround for 3
    dEQP tests that are buggy:

    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
    dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
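The effective-internal-format lookup described in the commit above can be sketched as a small table walk. The enum names below are illustrative stand-ins, not real GL enum values, and the function is hypothetical rather than Mesa's _mesa_es3_effective_internal_format_for_format_and_type(). The RGBA/RGB rows follow Table 3.12 as I recall it and should be verified against the spec; the luminance/alpha rows return unsized formats, as the v2 note describes.

```c
#include <assert.h>

/* Illustrative stand-ins for GL enums; these are NOT the real GL values. */
enum sk_format { SK_RGBA, SK_RGB, SK_LUMINANCE_ALPHA, SK_LUMINANCE, SK_ALPHA };

enum sk_type {
   SK_UNSIGNED_BYTE,
   SK_UNSIGNED_SHORT_4_4_4_4,
   SK_UNSIGNED_SHORT_5_5_5_1,
   SK_UNSIGNED_SHORT_5_6_5
};

enum sk_internal {
   SK_INVALID,
   SK_RGBA8, SK_RGBA4, SK_RGB5_A1, SK_RGB8, SK_RGB565,
   SK_UNSIZED_LUMINANCE_ALPHA, SK_UNSIZED_LUMINANCE, SK_UNSIZED_ALPHA
};

/* Map an unsized format + type pair to its effective internal format. */
static enum sk_internal
sk_effective_internal_format(enum sk_format format, enum sk_type type)
{
   switch (format) {
   case SK_RGBA:
      switch (type) {
      case SK_UNSIGNED_BYTE:          return SK_RGBA8;
      case SK_UNSIGNED_SHORT_4_4_4_4: return SK_RGBA4;
      case SK_UNSIGNED_SHORT_5_5_5_1: return SK_RGB5_A1;
      default:                        return SK_INVALID;
      }
   case SK_RGB:
      switch (type) {
      case SK_UNSIGNED_BYTE:        return SK_RGB8;
      case SK_UNSIGNED_SHORT_5_6_5: return SK_RGB565;
      default:                      return SK_INVALID;
      }
   /* Per the v2 note: these are not treated as sized formats. */
   case SK_LUMINANCE_ALPHA:
      return type == SK_UNSIGNED_BYTE ? SK_UNSIZED_LUMINANCE_ALPHA : SK_INVALID;
   case SK_LUMINANCE:
      return type == SK_UNSIGNED_BYTE ? SK_UNSIZED_LUMINANCE : SK_INVALID;
   case SK_ALPHA:
      return type == SK_UNSIGNED_BYTE ? SK_UNSIZED_ALPHA : SK_INVALID;
   }
   return SK_INVALID;
}
```

Returning a distinct invalid value for unlisted pairs lets a caller turn a failed lookup into a GL error instead of guessing a format.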
Eduardo Lima Mitev
c6bf1cd146 mesa: Move _mesa_base_tex_format() from teximage to glformats files
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.

Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.

The function is moved as-is, without any modification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
Eduardo Lima Mitev
15ab968f62 mesa: Fix order of format+type and internal format checks for glTexImageXD ops
The more specific GLES constraints should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.

v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-09-28 11:39:53 -07:00
Alexander von Gluck IV
7cdd818d2a egl: Fix missing Haiku include path 2015-09-28 13:58:25 -04:00
Alexander von Gluck IV
255a225265 state_trackers/hgl: Fix missing include path 2015-09-28 13:58:24 -04:00
Francisco Jerez
b61292296b i965/fs: Fix hang on IVB and VLV with image format mismatch.
IVB and VLV hang sporadically when an untyped surface read or write
message is used to access a surface of format other than RAW, as may
happen when there is a mismatch between the format qualifier of the
image uniform and the format of the actual image bound to the
pipeline.  According to the spec this condition gives undefined
results but may not lead to program termination (which is one of the
possible outcomes of the hang).  Fix it by checking at runtime whether
the surface is of the right type.

Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
subtest.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-28 18:10:39 +03:00
Serge Martin
2518645f63 clover: Implement clCreateImage?D w/ clCreateImage.
Replace the clCreateImage2D and clCreateImage3D implementations with a call
to clCreateImage.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 18:10:39 +03:00
Serge Martin
f2c52e392b clover: Implement CL1.2 clCreateImage().
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 18:10:39 +03:00
Francisco Jerez
92666b90c0 clover: Move down canonicalization of memory object flags into validate_flags().
This will be used to share the same logic between buffer and image
creation.

v2: Make memory flag set constants local to validate_flags. (Serge
    Martin)
2015-09-28 18:10:39 +03:00
Samuel Iglesias Gonsalvez
2b9248dc58 docs: mention ARB_shader_storage_buffer_object on 11.1.0 release notes
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-28 16:34:24 +02:00
Iago Toral Quiroga
e7ae6d9e14 glsl: revert "glsl: atomic counters can be declared as buffer-qualified variables"
This reverts commit 586142658e.

The specs are not explicit about any restrictions related to the types allowed
on buffer variables, however, the description of opaque types (like atomic
counters) is in conflict with the purpose of buffer variables:

"The opaque types declare variables that are effectively opaque
 handles to other objects. These objects are
 accessed through built-in functions, not through direct reading or
 writing of the declared variable.
 (...)
 Opaque variables cannot be treated as l-values;(...)"

Also, Mesa is already disallowing opaque types in interface blocks anyway, so
that commit was not really achieving anything.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-28 14:23:26 +02:00
Ilia Mirkin
5bff12ecb4 gallium/util: avoid unreferencing random memory on buffer alloc failure
Found by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-28 02:38:58 -04:00
Ilia Mirkin
6dd059fefe mesa: don't leak interface_name
Found by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-28 02:38:58 -04:00
Timothy Arceri
e413d2fbc4 glsl: fix component size calculation for tessellation and geom shaders
Broken in commit abdab88b30 when adding arrays of arrays support

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-09-28 11:31:50 +10:00
Boyan Ding
3c63a2d2f0 docs/GL3.txt: fix typo
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-09-27 17:54:23 -07:00
Kenneth Graunke
d6a41b5f70 i965/gs: Optimize away the EOT write on Gen8+ with static vertex count.
With static vertex counts, the final EOT write doesn't actually write
any data - it's just there to end the thread.  Typically, the last
thing before ending the thread will be an EmitVertex() call, resulting
in a URB write.  We can just set EOT on that.

Note that this isn't always possible - there might be an intervening
SSBO write/image store, or the URB write may have been in a loop.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3173 -> 3149 (-0.76%)
instructions in affected programs:     176 -> 152 (-13.64%)
helped:                                8

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 12:02:34 -07:00
Kenneth Graunke
08fe5799e6 i965/gs: Allow src0 immediates in GS_OPCODE_SET_WRITE_OFFSET.
GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special
strides.  We can easily make the generator handle constant src[0]
arguments by instead generating a MOV with the product of both operands.

This isn't necessarily a win in and of itself - instead of a MUL, we
generate a MOV, which should be basically the same cost.  However, we
can probably avoid the earlier MOV to put src[0] into a register.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3207 -> 3173 (-1.06%)
instructions in affected programs:     3207 -> 3173 (-1.06%)
helped:                                11

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 12:02:31 -07:00
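The folding described in the commit above (MUL with two constant operands becomes a MOV of their product) can be sketched as follows. All types and names here (`sk_src`, `sk_inst`, `sk_emit_set_write_offset`) are hypothetical and greatly simplified compared with the real i965 generator, which also has to deal with register regions and strides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sk_opcode { SK_OP_MUL, SK_OP_MOV };

/* Hypothetical, simplified source operand. */
struct sk_src {
   bool is_imm;
   uint32_t imm;  /* valid when is_imm is true */
   int reg;       /* valid when is_imm is false */
};

/* Hypothetical, simplified instruction. */
struct sk_inst {
   enum sk_opcode op;
   struct sk_src src0;
   uint32_t imm1;  /* constant second operand; unused for MOV */
};

/* src[1] is always an immediate. If src[0] is an immediate too, emit a
 * MOV of the product instead of a MUL. */
static struct sk_inst
sk_emit_set_write_offset(struct sk_src src0, uint32_t imm1)
{
   struct sk_inst inst;

   if (src0.is_imm) {
      inst.op = SK_OP_MOV;
      inst.src0.is_imm = true;
      inst.src0.imm = src0.imm * imm1;
      inst.src0.reg = -1;
      inst.imm1 = 0;
   } else {
      inst.op = SK_OP_MUL;
      inst.src0 = src0;
      inst.imm1 = imm1;
   }
   return inst;
}
```

With src0 = 3 (immediate) and imm1 = 4, the sketch produces a MOV of 12; with a register src0 it still produces a MUL.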
Kenneth Graunke
f0a618ee7c i965: Implement "Static Vertex Count" geometry shader optimization.
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization.  Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly.  However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.

Thanks to the new NIR GS intrinsics, this is easy.  We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.

Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.

shader-db statistics for geometry shaders only:

total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs:     242 -> 222 (-8.26%)
helped:                                10

v2: Don't break non-NIR paths (just skip this optimization).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:58 -07:00
Kenneth Graunke
bcef2abad7 i965: Move GS_THREAD_END mlen calculations out of the generator.
The visitor was setting a mlen that was wrong for Broadwell, but the
generator was ignoring it and doing the right thing regardless.  We may
as well move the logic fully into the visitor.  This will be useful in
the next commit as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:57 -07:00
Kenneth Graunke
02530c5dc5 nir: Add a function to count the number of vertices a GS emits.
Some hardware (such as Broadwell) can run geometry shaders more
efficiently when the number of vertices emitted is statically known.

This pass provides a way to obtain the constant vertex count, or
-1 indicating that the vertex count is unknown/non-constant.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-26 12:01:53 -07:00
Kenneth Graunke
df221f65e2 i965: Simplify handling of VUE map changes.
The old code was disastrously complex - spread across multiple atoms
which may not even run, inspecting the dirty bits to try and decide
whether it was necessary to do checks...storing VS information in
brw_context...extra flagging...

This code tripped me and Carl up very badly when working on the
shader cache code.  It's very fragile and hard to maintain.

Now that geometry shaders only depend on their inputs and don't have
to worry about the VS VUE map, we can dramatically simplify this:
just compute the VUE map coming out of the geometry shader stage
in brw_upload_programs.  If it changes, flag it.  Done.

v2: Also check vue_map.separable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
6301af22bb i965/gs: Remove the dependency on the VS VUE map.
Because we only support geometry shaders in core profile, we can safely
ignore any driver-extending of VS outputs.

Those are:
- Legacy userclipping (doesn't exist in core profile)
- Edgeflag copying (Gen4-5 only, no GS support)
- Point coord replacement (Gen4-5 only, no GS support)
- front/back color hacks (Gen4-5 only, no GS support)

v2: Rebase; leave a comment about why SSO works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
99df02ca26 i965: Don't re-layout varyings for separate shader programs.
Previously, our VUE map code always assigned slots to varyings
sequentially, in one contiguous block.

This was a bad fit for separate shaders - the GS input layout depended
on the VS output layout, so if we swapped out vertex shaders, we might
have to recompile the GS on the fly - which rather defeats the point of
using separate shader objects.  (Tessellation would suffer from this
as well - we could have to recompile the HS, DS, and GS.)

Instead, this patch makes the VUE map for separate shaders use a fixed
layout, based on the input/output variable's location field.  (This is
either specified by layout(location = ...) or assigned by the linker.)
Corresponding inputs/outputs will match up by location; if there's a
mismatch, we're allowed to have undefined behavior.

This may be less efficient - depending what locations were chosen, we
may have empty padding slots in the VUE.  But applications presumably
use small consecutive integers for locations, so it hopefully won't be
much worse in practice.

3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions.
This seems like a small price to pay for avoiding recompiles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
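The fixed, location-based layout described in the commit above can be sketched like this. It is a simplified model, not the driver's VUE map code: `SK_SLOT_PAD`, `SK_MAX_SLOTS` and `sk_build_fixed_vue_map` are hypothetical names, and the real map also contains header and built-in slots that are ignored here.

```c
#include <assert.h>

#define SK_MAX_SLOTS 32
#define SK_SLOT_PAD  (-1)

/* Fill map[slot] with the index of the varying that lives in that slot,
 * or SK_SLOT_PAD for holes. Each varying goes to the slot named by its
 * location, so the layout does not depend on which other varyings exist.
 * Returns the number of slots in use, including padding holes. */
static int
sk_build_fixed_vue_map(const int *locations, int num_varyings,
                       int map[SK_MAX_SLOTS])
{
   int num_slots = 0;

   for (int s = 0; s < SK_MAX_SLOTS; s++)
      map[s] = SK_SLOT_PAD;

   for (int i = 0; i < num_varyings; i++) {
      int slot = locations[i];
      assert(slot >= 0 && slot < SK_MAX_SLOTS);
      assert(map[slot] == SK_SLOT_PAD);  /* no two varyings share a slot */
      map[slot] = i;
      if (slot + 1 > num_slots)
         num_slots = slot + 1;
   }
   return num_slots;
}
```

Two varyings at locations 0 and 2 occupy three slots, with a padding hole at slot 1. That hole is the inefficiency the commit message mentions, accepted in exchange for a layout that stays stable when a neighbouring shader stage is swapped.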
Kenneth Graunke
1e5180316c i965/vue: Make assign_vue_map() take an explicit slot.
Our plan of assigning consecutive slots doesn't work properly for
separate shader objects - at least, if we want to avoid recompiling them
whenever the interface changes.

As a first step, make assign_vue_map take an explicit slot parameter,
rather than implicitly incrementing it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
268008f98c i965: Initialize unused VUE map slots to BRW_VARYING_SLOT_PAD.
Nothing actually relies on unused slots being initialized to
BRW_VARYING_SLOT_COUNT.  Soon, we're going to have VUE maps with holes
in them, at which point pre-filling with BRW_VARYING_SLOT_PAD makes a lot
more sense.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Kenneth Graunke
39d4b553a8 i965: Fix BRW_VARYING_SLOT_PAD handling in the scalar VS backend.
We can't just break for padding slots.  Instead, treat them like
unwritten output variables, so we handle flushing and incrementing
urb_offset correctly.

Paul introduced the concept of padding slots back in 2011, but we've
never actually used them for anything.  So it's unsurprising that the
scalar VS backend didn't handle them quite right.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-26 11:59:56 -07:00
Samuel Iglesias Gonsalvez
511a86383b main/tests: Enable glShaderStorageBlockBinding() check in dispatch_sanity test
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-26 16:54:02 +02:00
Emil Velikov
d2d4f00a2c docs: add news item and link release notes for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-26 14:25:19 +01:00
Emil Velikov
5d08669e2f docs: add sha256 checksums for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 7f1a77ae66)
2015-09-26 14:23:00 +01:00
Emil Velikov
aeec994954 docs: add release notes for 11.0.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit bcb9e1d26b)
2015-09-26 14:22:59 +01:00
Timothy Arceri
abdab88b30 glsl: calculate component size for arrays of arrays when varying packing disabled
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:48:49 +10:00
Timothy Arceri
1d401f9ce4 glsl: validate binding qualifier for AoA
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:28:05 +10:00
Timothy Arceri
9bad7afbc2 glsl: add helper for calculating size of AoA
V2: return 0 if not array rather than -1

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-26 22:27:47 +10:00
Timothy Arceri
776a3845d6 glsl: clean-up link uniform code
These changes are also needed to allow linking of
struct and interface arrays of arrays.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-26 22:27:24 +10:00
Marek Olšák
9932142192 radeonsi: add scratch buffer to the buffer list when it's re-allocated
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-09-26 01:51:05 +02:00
Leo Liu
1e97b41893 radeon/vce: fix vui time_scale zero error
If the app passes 0 as frame_rate_num, it should not be encoded into the VUI.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-25 18:47:14 -04:00
Matt Turner
1dd943d7fb mesa: Add locking to programs.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
3c57a102eb mesa: Add locking to sampler objects.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
d4b0e0b717 mesa: Remove debugging code from _mesa_reference_*.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
Matt Turner
c8dc04d4c0 c11/threads: Assert that mtx is non-NULL and check return values.
Passing NULL to C11 threads functions isn't safe, so there's no need for
our implementation to handle it. Cuts about 1k of .text.

   text     data      bss      dec      hex  filename
5009514   198440    26328  5234282   4fde6a  i965_dri.so before
5008346   198440    26328  5233114   4fd9da  i965_dri.so after

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 14:08:31 -07:00
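The idea in the commit above (assert on a NULL mutex instead of handling it, and report the result of the underlying call) might look roughly like the following pthread-based sketch. The `sk_` names are hypothetical; this is not the actual contents of Mesa's c11/threads headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { sk_thrd_success = 0, sk_thrd_error = 1 };

typedef pthread_mutex_t sk_mtx_t;

/* Lock a mutex. A NULL mutex is a caller bug, so assert rather than
 * returning an error code for it. */
static int
sk_mtx_lock(sk_mtx_t *mtx)
{
   assert(mtx != NULL);
   return pthread_mutex_lock(mtx) == 0 ? sk_thrd_success : sk_thrd_error;
}

/* Unlock a mutex, propagating failure from the underlying call. */
static int
sk_mtx_unlock(sk_mtx_t *mtx)
{
   assert(mtx != NULL);
   return pthread_mutex_unlock(mtx) == 0 ? sk_thrd_success : sk_thrd_error;
}
```

Since assert() compiles away under NDEBUG, release builds carry no NULL-handling branch at all, which is consistent with the .text reduction reported in the commit.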
Tapani Pälli
266d05a3a0 glsl: fix packed varyings interface type and add default case
fixes Piglit test:
   arb_program_interface_query/linker/query-varyings.shader_test

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-25 12:19:36 +03:00
Antia Puentes
e92c35a872 glsl: Mark as active all elements of shared/std140 block arrays
Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140' blocks
or block members) considered as active 'shared' and 'std140' uniform
blocks and uniform block arrays, but did not include the block array
elements. Because of that, it was possible to have an active uniform
block array without any elements marked as used, making the assertion
   ((b->num_array_elements > 0) == b->type->is_array())
in link_uniform_blocks() fail.

Fixes the following 5 dEQP tests:

 * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18
 * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24
 * dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19
 * dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49
 * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83508
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
065e7d37f1 docs: Mark ARB_shader_storage_buffer_object as done for i965
v2:
- Mark it too for GLES 3.1

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
614b5307fd i965: Enable ARB_shader_storage_buffer_object extension for gen7+
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
5b080e3ddf mesa: enable ARB_shader_storage_buffer_object extension for GLES 3.1
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
10b5c6491f mesa: Add getters for the GL_ARB_shader_storage_buffer_object max constants
v2:
- Add tessellation shader constants support

v3:
- Add GLES 3.1 support.

v4:
- Move the getters to the proper place

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
91191af6d6 glapi: add ARB_shader_storage_block_buffer_object
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
26011fa22a main/tests: add ARB_shader_storage_buffer_object tokens to enum_strings
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
9b477ad49d main: Add SHADER_STORAGE_BLOCK and BUFFER_VARIABLE support for ARB_program_interface_query
Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE queries.

v2:
- Use std430_array_stride() to get top level array stride following std430's rules.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
0f18945cb6 glsl: Do not allow reads from write-only buffer variables
The error location won't be right, but fixing that would require checking
for this as we process each type of AST node that can involve a variable
read.

v2:
  - Limit the check to buffer variables, image variables have different
    semantics involved.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
995a719499 glsl: Do not allow assignments to read-only buffer variables
v2:
  - Merge the error check for the readonly qualifier with the already
    existing check for variables flagged as readonly (Timothy).
  - Limit the check to buffer variables, image variables have different
    semantics involved (Curro).

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
6ef82f039c glsl: Allow memory qualifiers on shader storage buffer blocks
v2:
  - Memory qualifiers on shader storage buffer objects do not come in the form
    of layout qualifiers, they are block-level qualifiers.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
f1b647fdd1 glsl: Apply memory qualifiers to buffer variables
v2:
  - Save memory qualifier info in the top level members of a shader
    storage block.
  - Add a check to record_compare() which is used when comparing
    shader storage buffer declarations in different shaders.
  - Always report an error for incompatible readonly/writeonly
    definitions, whether they are present at block or field level.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
f4c8c01a3d glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
3b2037f88c glsl: fix UNIFORM_BUFFER_START or UNIFORM_BUFFER_SIZE query when no buffer object is bound
According to ARB_uniform_buffer_object spec:

"If the parameter (starting offset or size) was not specified when the
 buffer object was bound (e.g. if bound with BindBufferBase), or if no
 buffer object is bound to <index>, zero is returned."

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
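The rule quoted in the commit above can be sketched as a pair of query helpers. The struct and function names are hypothetical and much simpler than Mesa's binding-point state; in particular, the `range_specified` flag is an assumption about how one might record whether the buffer was bound with an explicit range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state for one indexed buffer binding point. */
struct sk_buffer_binding {
   unsigned buffer;       /* 0 means no buffer object is bound */
   bool range_specified;  /* false when bound without an explicit range,
                           * e.g. via BindBufferBase */
   int64_t offset;
   int64_t size;
};

/* Start-offset query: zero unless a buffer is bound with a range. */
static int64_t
sk_query_binding_start(const struct sk_buffer_binding *b)
{
   assert(b != NULL);
   if (b->buffer == 0 || !b->range_specified)
      return 0;
   return b->offset;
}

/* Size query: the same rule applies. */
static int64_t
sk_query_binding_size(const struct sk_buffer_binding *b)
{
   assert(b != NULL);
   if (b->buffer == 0 || !b->range_specified)
      return 0;
   return b->size;
}
```

Checking for "no buffer bound" first means stale offset or size values left in the binding state are never reported to the application.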
Iago Toral Quiroga
2e16dd1350 mesa: Add queries for GL_SHADER_STORAGE_BUFFER
These handle querying the buffer name attached to a given binding point
as well as the start offset and size of that buffer.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Samuel Iglesias Gonsalvez
4b7b1cf3c0 mesa: add glShaderStorageBlockBinding()
Defined in ARB_shader_storage_buffer_object extension.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
a07d0c2657 glsl: First argument to atomic functions must be a buffer variable
v2:
  - Add ssbo_in the names of the static functions so it is clear that this
    is specific to SSBO atomics.

v3:
  - Move the check after the loop (Kristian Høgsberg)

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
5ef169034c i965/nir/vec4: Implement nir_intrinsic_ssbo_atomic_*
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
14af6f4698 i965/nir/fs: Implement nir_intrinsic_ssbo_atomic_*
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
9d5c0be5d5 nir: Implement lowered SSBO atomic intrinsics
The original GLSL IR intrinsics have been lowered to an internal
version that accepts a block index and an offset instead of a
SSBO reference.

v2 (Connor):
  - Document the sources used by the atomic intrinsics.

Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:23 +02:00
Iago Toral Quiroga
d2719b6e4f glsl: lower SSBO atomic intrinsics
The first argument to SSBO atomics is a reference to a SSBO buffer variable
so we want to compute its block index and offset and provide these values
to an internal version of the intrinsic that takes them instead of the
buffer variable reference.

v2:
- Support single components of integer vectors to be passed in as arguments.
- Get interface packing information from interface's type.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
da659087b9 glsl: use ir_rvalue instead of ir_dereference in auxiliary functions
In a later commit we will need to handle ir_swizzle nodes too, which are
not an ir_dereference. That can happen, for example, when we pass a
component of an integer vector as argument to any of the SSBO atomic
functions.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
ea0a1f5beb glsl: Add atomic functions from ARB_shader_storage_buffer_object
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
2cacebaad3 glsl: Rename atomic counter functions
Shader Storage Buffer Object will add new atomic functions that are not
associated with counters, so better have atomic counter-specific functions
explicitly include the word "counter" in their names.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
586142658e glsl: atomic counters can be declared as buffer-qualified variables
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
475d9c32d1 nir/glsl_to_nir: ignore an instruction's dest if it hasn't any
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
e3f9c7829c i965/nir/vec4: Implement nir_intrinsic_load_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
5b186aafe7 i965/nir/fs: Implement nir_intrinsic_load_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
e59ae238b6 nir: Implement __intrinsic_load_ssbo
v2:
- Fix ssbo loads with boolean variables.

v3:
- Simplify the changes (Kristian)

Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
3e70c968de nir: modify the instruction insertion in nir_visitor::visit(ir_call *ir)
This patch moves nir_instr_insert_after_cf_list call into each case
in the intrinsics switch at nir_visitor::visit(ir_call *ir) and
define a nir_dest variable which will be used when handling
ir->return_deref after the switch.

This patch simplifies the code for nir_intrinsic_load_ssbo
implementation changes we are going to do next.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
922b3d1bb1 i965/nir/vec4: Implement nir_intrinsic_store_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
337dad8cee i965/nir/fs: Implement nir_intrinsic_store_ssbo
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Iago Toral Quiroga
9bb7d9ecf8 nir: Implement __intrinsic_store_ssbo
v2 (Connor):
 - Make the STORE() macro take arguments for the extra sources (and their
   size) and any extra indices required.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
f17c6b9066 i965/vec4: Import surface message builder functions.
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit.

v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
d5503ce39f i965/vec4: Import helpers to convert vectors into arrays and back.
These functions handle the conversion of a vec4 into the form expected
by the dataport unit in message and message return payloads.  The
conversion is not always trivial because some messages don't support
SIMD4x2 for some generations, in which case a strided copy may be
necessary.

v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Francisco Jerez
402cb7ce13 i965/vec4: Introduce VEC4 IR builder.
See "i965/fs: Introduce FS IR builder." for the rationale.

v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument.  Improve handling
    of debug annotations and execution control flags.  Rename "instr"
    variable.  Initialize cursor to NULL by default and add method to
    explicitly point the builder at the end of the program.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
203cd1bf28 glsl: shader storage blocks use different max block size values than uniforms
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
eb9a9b62b1 glsl: ignore buffer variables when counting uniform components
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
138e4ae8ae glsl: number of active shader storage blocks must be within allowed limits
Notice that we should differentiate between shader storage blocks and
uniform blocks, since they have different limits.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
a7b4ab45d0 glsl: a shader storage buffer must be smaller than the maximum size allowed
Otherwise, generate a link time error as per the
ARB_shader_storage_buffer_object spec.

v2:
- Fix error message (Jordan)

v3:
- Move std140_size() changes to its own patch (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
e854a98001 glsl: add std430 interface packing support to ssbo related operations
v2:
- Get interface packing information from interface's type, not the
  variable type.
- Simplify is_std430 condition in emit_access() for readability (Jordan)
- Add a comment explaining why the array of three-component vectors case
  is different in std430 from the rest of the cases.
- Add calls to std430_array_stride().

v3:
- Simplify size_mul change for std430's case (Jordan)
- Fix commit log lines length (Jordan)
- Pass 'packing' instead of 'is_std430' to emit_access() (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
1be180b941 glsl: Add std430 support to program_resource_visitor's member functions
They are used to calculate the offset, array stride of uniform/shader
storage buffer variables. Take into account this info to get the right
value for std430.

v2:
- Fix commit log line length and indentation. (Jordan)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
8f0167c65b glsl: Add parser/compiler support for std430 interface packing qualifier
v2:
- Fix a missing check in has_layout()

v3:
- Mention shader storage block in error message for layout qualifiers
  (Kristian).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez
35476c2bae glsl: Add std430 related member functions to glsl_type class
They are used to calculate size, base alignment and array stride values
for a glsl_type following std430 rules.
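
As a rough illustration of how the two layouts differ for arrays, the
sketch below computes the array stride of vectors of 32-bit components.
The names (enum packing, vec_base_alignment, array_stride) are
hypothetical and are not the glsl_type member functions added here; the
rule shown is the one in the OpenGL layout rules, where std140 rounds the
element alignment up to that of a vec4 and std430 does not.

```c
#include <assert.h>

/* Hypothetical names for illustration; not Mesa's glsl_type API. */
enum packing { PACKING_STD140, PACKING_STD430 };

/* Base alignment in bytes of a vector of 1..4 32-bit components. */
static unsigned vec_base_alignment(unsigned components)
{
    assert(components >= 1 && components <= 4);
    /* Scalars align to N, vec2 to 2N, vec3 and vec4 to 4N (N = 4 bytes). */
    return (components == 3 ? 4 : components) * 4;
}

/* Stride in bytes between consecutive elements of an array of vectors. */
static unsigned array_stride(unsigned components, enum packing packing)
{
    unsigned stride = vec_base_alignment(components);

    /* std140 rounds the element alignment up to that of a vec4. */
    if (packing == PACKING_STD140 && stride < 16)
        stride = 16;

    return stride;
}
```

Under these assumptions a float array has a 16-byte stride in std140 but a
4-byte stride in std430, while a vec3 array has a 16-byte stride in both.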

v2:
- Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
a40f917c4b glsl: allow default qualifiers for shader storage block definitions
This kind of definition:

    layout(xxx) buffer;

was not supported by commit 84fc5fece0.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
3763a0e0a7 glsl: Move interface block processing to glsl_parser_extras.cpp
No functional changes.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
9c1f10b1bc glsl: ignore default qualifier declarations when checking for duplicate layout qualifiers
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
130031168d glsl: layout qualifier can appear more than once since OpenGL 4.20
Also if GL_ARB_shading_language_420pack extension is enabled.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
5bb5eeea00 i965/wm: surfaces should have the API buffer size, not the drm buffer size
The returned drm buffer object has a size multiple of 4096 but that should not
be exposed to the API user, which is working with a different size.

As far as I can see this problem is only visible in the calculation of the
length of unsized arrays used in SSBOs, as the implementation of this needs
to query the underlying buffer size via a message.
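
A small sketch of why the distinction matters, assuming a buffer object
whose allocation is rounded up to a multiple of 4096 bytes as described
above. The helper names (align_up, elements_in) and the example sizes are
hypothetical and not taken from the driver.

```c
#include <assert.h>

/* Allocation granularity mentioned in the commit message. */
#define BO_SIZE_GRANULARITY 4096u

/* Round value up to the next multiple of a power-of-two alignment. */
static unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Number of whole elements of the given stride that fit after offset. */
static unsigned elements_in(unsigned size_bytes, unsigned offset,
                            unsigned stride)
{
    assert(stride != 0);
    return size_bytes > offset ? (size_bytes - offset) / stride : 0;
}
```

For a 100-byte API buffer holding 4-byte elements, elements_in(100, 0, 4)
gives 25, whereas using the rounded size,
elements_in(align_up(100, BO_SIZE_GRANULARITY), 0, 4), gives 1024.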

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
eaa6f01c8d i965/wm: emit null buffer surfaces when null buffers are attached
Otherwise we can expect odd things to happen if, for example, we ask
for the size of the attached buffer from shader code, since that
might query this value from the surface we uploaded and get random
results.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f5dd2c1822 i965/fs/nir: implement nir_intrinsic_get_buffer_size
v2:
- Remove inst->regs_written assignment as the instruction only
  writes to one register.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
b23eb643eb i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZE
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
65d7f5fe9f i965/vec4/nir: implement nir_intrinsic_get_buffer_size
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
6485880232 i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZE
Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
003ce30e36 nir: Implement ir_unop_get_buffer_size
This is how backends provide the buffer size required to compute
the size of unsized arrays in the previous patch

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
750c694474 glsl: implement unsized array length
v2:
- Reduce the number of lines over 80 character line width
  limit. (Thomas Hellan)

v3:
- Inject the formula to compute the array length in the IR, backends
  only need to provide the buffer size (Curro)
- Create an auxiliary function to simplify code (Jordan Justen)
- Rename variables (Jordan Justen)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
273f61a005 glsl: Add parser/compiler support for unsized array's length()
The unsized array length is computed with the following formula:

array.length() =
   max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Of these, only the buffer size needs to be provided by the backends; the
frontend already knows the values of the other two variables.

This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.

It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.
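
The formula above can be sketched in plain C as follows. This is only an
illustration with a hypothetical function name, not the IR that the
lowering pass generates; signed integers are assumed so that the clamp to
zero is meaningful.

```c
#include <assert.h>

/* max((buffer_object_size - offset_of_array) / stride_of_array, 0) */
static int unsized_array_length(int buffer_size, int array_offset,
                                int array_stride)
{
    assert(array_stride > 0);

    int length = (buffer_size - array_offset) / array_stride;

    /* A buffer too small to reach the array yields a length of zero. */
    return length > 0 ? length : 0;
}
```

For example, a 100-byte buffer with the array at offset 16 and a stride of
16 has room for (100 - 16) / 16 = 5 elements.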

v2:
- Do not define a triop that will force backends to implement the
  entire formula, they should only need to provide the buffer size
  since the other values are known by the frontend (Curro).

v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
  of using state->ARB_shader_storage_buffer_object_enable (Tapani).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
1440d2a683 glsl: Add unsized array support to glsl_type::std140_size()
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
68f5a4e6d2 glsl: fix indentation in glsl_types.cpp
No functional changes.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f3f64cd0c4 glsl: add support for unsized arrays in shader storage blocks
They can only be defined in the last position of a shader
storage block.

When an unsized array is used in different shaders, it might be
converted into arrays of different sizes; avoid getting a linker error
in that case.

v2:
- Rework error condition and error messages (Timothy Arceri)

v3:
- Move OpenGL ES check to its own patch.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Samuel Iglesias Gonsalvez
f45d39f6af glsl: return error if unsized arrays are found in OpenGL ES
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
6335c79236 i965/fs: Do not split buffer variables
Buffer variables are the same as uniforms, except that they are
read/write, so we want the same treatment.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
2773a7cf1d i965: handle visiting of ir_var_shader_storage variables
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
37da6a2acd i965: Upload Shader Storage Buffer Object surfaces
Since these are a special kind of UBOs we emit them together reusing the
same infrastructure, however, we use a RAW surface so we can reuse
existing untyped read/write/atomic messages which include a pixel mask
header that we need to set to obtain correct behavior with helper
invocations of the fragment shader.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
bdbabc57e3 i965: Set MaxShaderStorageBuffers for compute shaders
v2:
- Set it after the driver's MaxShaderStorageBuffers value assignment.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Samuel Iglesias Gonsalvez
36f392c4ef i965: set ARB_shader_storage_buffer_object related constant values
v2:
- Add tessellation shader constants assignment

v3:
- Set MaxShaderStorageBufferBindings to 36.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
dfdeb94a5a i965: Implement DriverFlags.NewShaderStorageBuffer
We use the same dirty state for SSBOs and UBOs because they share the
same infrastructure.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Iago Toral Quiroga
332ff009ff i965: Use 64-byte offset alignment for shader storage buffers
This should be a cacheline (64 bytes) so that we can safely have the
CPU and GPU writing the same SSBO on non-cache-coherent systems (our
Atom CPUs). With UBOs, the GPU never writes, so there's no
problem. For an SSBO, the GPU and the CPU can be updating disjoint
regions of the buffer simultaneously and that will break if the
regions overlap the same cacheline.
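
To make the overlap concrete, here is a minimal sketch that checks whether
two byte ranges of a buffer touch a common 64-byte cacheline. The function
name and the example offsets are hypothetical; the driver only sets the
offset alignment and does not perform such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_SIZE 64

/* True if [a, a + a_len) and [b, b + b_len) touch a common cacheline. */
static bool share_cacheline(size_t a, size_t a_len, size_t b, size_t b_len)
{
    assert(a_len > 0 && b_len > 0);

    size_t a_first = a / CACHELINE_SIZE;
    size_t a_last = (a + a_len - 1) / CACHELINE_SIZE;
    size_t b_first = b / CACHELINE_SIZE;
    size_t b_last = (b + b_len - 1) / CACHELINE_SIZE;

    /* The two ranges of cacheline indices intersect. */
    return a_first <= b_last && b_first <= a_last;
}
```

With a 16-byte offset alignment, two 16-byte regions at offsets 0 and 16
are disjoint in bytes but still share a cacheline; with 64-byte alignment
the second region starts at offset 64 and the two no longer share one.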

v2:
- Use cacheline size (64 bytes) instead of 16 bytes (Kristian).
- Update commit log and add a comment in the code explaining
  why we use cacheline size (Ben).

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Samuel Iglesias Gonsalvez
4cf908f9cb mesa: set MAX_SHADER_STORAGE_BUFFERS to 16.
v2:
- Set the value to 16 and drop the comment. (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:20 +02:00
Tapani Pälli
4639cea292 glsl: add packed varyings to program resource list
This makes sure that the user is still able to query properties about
variables that have been packed by the lower_packed_varyings pass.

Fixes the following OpenGL ES 3.1 test:
   ES31-CTS.program_interface_query.separate-programs-vertex

v2: fix 'name included in packed list' check (Ilia Mirkin)
v3: iterate over instances of name using strtok_r (Ilia Mirkin)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-25 08:14:41 +03:00
Tapani Pälli
a6b55beb78 mesa: add packed_varyings list to gl_shader
This is required to store information about packed varyings. Currently
these variables get lost and cannot be retrieved later in a sensible way
for program interface queries. The list will be used by the next patch.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-25 08:05:59 +03:00
Jordan Justen
ebbe6cdad7 i965/cs: Implement DispatchComputeIndirect support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
d11d018ce3 mesa/cs: Implement glDispatchComputeIndirect
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
12cf91db02 mesa/cs: Support GL_DISPATCH_INDIRECT_BUFFER
v2:
 * Use _mesa_has_compute_shaders (Ilia)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Jordan Justen
4a1ba7e6bd mesa/cs: Add _mesa_validate_DispatchCompute
Move API validation to _mesa_validate_DispatchCompute in
api_validate.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-24 19:15:13 -07:00
Roland Scheidegger
19604d30e1 mesa: fix mipmap generation for immutable, compressed textures
If the immutable compressed texture didn't have the full mip pyramid,
this didn't work, because it tried to generate mip levels for non-existing
levels. _mesa_prepare_mipmap_level() would correctly handle this by returning
FALSE if the mip level didn't exist, however we actually created the
non-existing mip level right before that because we used _mesa_get_tex_image()
before calling _mesa_prepare_mipmap_level(). It would then proceed to crash
(we allocated the mip level, which is a bad idea on an immutable texture,
but didn't initialize the values, leading to assertion failures or segfaults).
Fix this by using _mesa_select_tex_image() instead and call it after
_mesa_prepare_mipmap_level(), as that function will allocate missing mip levels
for non-immutable textures already.
This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu
packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip
chains - I believe this app not doing it is actually unintentional, always one
level less than full mip chain...).
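
For reference, the length of a full mip chain for a 2D texture can be
computed as in the sketch below. The function name is hypothetical and is
not part of Mesa; it only illustrates what "full mip pyramid" means here.

```c
#include <assert.h>

/* Number of levels in a full 2D mip chain, from the base level to 1x1. */
static unsigned full_mip_levels(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);

    unsigned size = width > height ? width : height;
    unsigned levels = 1;

    /* Each level halves the largest dimension, rounding down. */
    while (size > 1) {
        size >>= 1;
        levels++;
    }

    return levels;
}
```

A 256x256 texture has 9 levels in its full chain, so an application that
allocates one level fewer, as described above, would use only 8.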

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-25 00:06:10 +02:00
Matt Turner
d6bb46bbe8 glsl: Expose gl_MaxTess{Control,Evaluation}AtomicCounters.
... with only ARB_shader_atomic_counters.

I expected to see interactions with ARB_tessellation_shader in the
ARB_shader_atomic_counters spec, but they do not exist. It seems that we
should unconditionally expose these variables in the presence of
ARB_shader_atomic_counters:

   gl_MaxTessControlAtomicCounters
   gl_MaxTessEvaluationAtomicCounters

This partially reverts commit da7adb99e8. The commit also affected
gl_MaxTessControlImageUniforms and gl_MaxTessEvaluationImageUniforms
similarly but the ARB_shader_image_load_store spec does list an
interaction with ARB_tessellation_shader.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92095
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-24 12:15:47 -07:00
Alejandro Piñeiro
7fee23569b i965/vec4: check swizzle before discarding a uniform on a 3src operand
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.

For example, this is what vec4_visitor::fix_3src_operand() is already
doing:

   if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
      return src;
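
A sketch of what a single-value swizzle test can look like, assuming a
swizzle packed as four 2-bit channel selectors. The SWIZZLE4 macro and
both functions below are written for illustration and are not the
driver's brw_is_single_value_swizzle() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: four 2-bit channel selectors, X in the low bits. */
#define SWIZZLE4(x, y, z, w) ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))

/* Extract the channel selected for component i (0..3). */
static unsigned swizzle_component(unsigned swizzle, unsigned i)
{
    assert(i < 4);
    return (swizzle >> (2 * i)) & 3;
}

/* True if all four components read the same source channel. */
static bool is_single_value_swizzle(unsigned swizzle)
{
    unsigned first = swizzle_component(swizzle, 0);

    for (unsigned i = 1; i < 4; i++) {
        if (swizzle_component(swizzle, i) != first)
            return false;
    }

    return true;
}
```

With this encoding, SWIZZLE4(1, 1, 1, 1) (a .yyyy read) is a single-value
swizzle, while the identity SWIZZLE4(0, 1, 2, 3) is not.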

Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs:     812755 -> 795090 (-2.17%)
helped:                                7930
HURT:                                  0

Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs:     296630 -> 292596 (-1.36%)
helped:                                2533
HURT:                                  0

v2:
- Updated commit message, using Matt Turner's suggestions
- Move the check after we've created the final value, as Jason
  Ekstrand suggested
- Clean up the condition

v3:
- Move the check back to the original place, to keep things
  tidy, as suggested by Jason Ekstrand

v4:
- Fixed missing is_single_value_swizzle() as pointed out by Jason Ekstrand

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-24 21:12:53 +02:00
Mauro Rossi
1d040160f8 android: radeonsi: fix sid_tables.h missing LOCAL_MODULE_CLASS
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 20:05:41 +02:00
Benjamin Bellec
ebcc886d87 gallium/radeon: remove the percentage symbol from HUD temperature
The HUD adds '%' if max == 100.

Signed-off-by: Benjamin Bellec <b.bellec@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 19:54:50 +02:00
Marek Olšák
7bbce21e45 gallium/u_blitter: handle allocation failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
ae418a7b56 radeonsi: handle dummy constant buffer allocation failure
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
b737d9c1dc radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
d556346b35 radeonsi: skip drawing if updating the scratch buffer fails
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
1f99b0be7e radeonsi: skip drawing if PS fails to compile or upload
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
237d7cccce radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
9b6d9dd7d8 radeonsi: handle fixed-func TCS shader create failure
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
5dbadb0257 radeonsi: handle shader precompile failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
263f5a2cf9 radeonsi: skip drawing if GS ring allocations fail
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:43 +02:00
Marek Olšák
22d3ccf5a8 radeonsi: skip drawing if the tess factor ring allocation fails
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
5c219ab552 radeonsi: add malloc fail paths to si_create_shader_state
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
394d67a58f radeonsi: report alloc failure from si_shader_binary_read
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
dea834e639 gallium/radeon: add a fail path for depth MSAA texture readback
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
f95e695059 gallium/radeon: handle buffer alloc failures in r600_draw_rectangle
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
282b378012 gallium/radeon: handle buffer_map staging buffer failures better
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
cd27ff6a0f radeonsi: handle constant buffer alloc failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
29dff6f676 radeonsi: handle index buffer alloc failures
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-24 19:51:42 +02:00
Marek Olšák
f3a0819533 st/mesa: fix front buffer regression after dropping st_validate_state in Blit
Broken by: d082c53249
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-24 19:51:42 +02:00
Kristian Høgsberg Kristensen
21c1c7ff81 wayland: Add copyright notice for wayland-egl.c
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-24 10:51:10 -07:00
Kristian Høgsberg Kristensen
2ea16966ae i965: Respect stride and subreg_offset for ATTR registers
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses an unusual stride and subreg_offset
combination that breaks when one source is an attribute.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-24 10:17:27 -07:00
Brian Paul
200aee4247 mesa: rework Driver.CopyImageSubData() and related code
Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures
to wrap renderbuffer sources/destinations.  This caused a bit of a mess in
the Mesa/gallium state tracker because we had to basically undo that
wrapping.

Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer
and gl_texture_image src/dst pointers (one being null, the other non-null)
so the driver can handle renderbuffer vs. texture as needed.

For the i965 driver, we basically moved the code that wrapped textures
around renderbuffers from copyimage.c down into the meta and driver code.

The old code in copyimage.c also made some questionable calls to
_mesa_BindTexture(), etc. which weren't undone at the end.

v2 (Jason Ekstrand): Rework the intel bits
v3 (Brian Paul): Update the temporary st_CopyImageSubData() function.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
2015-09-24 07:52:42 -06:00
Thomas Hellstrom
c8cb5ed93c st/xa: Fixups for PIPE_FORMAT_R8_UNORM A8 usage v2.
Check for PIPE_FORMAT_R8_UNORM when setting up the copy shader.
Also re-enable the dest alpha blending with A8 destination that
actually turned out to be correct.

Verified using rendercheck that the composite operators
overreverse, in, out, atop, atopreverse and xor seem to work fine
with an a8 destination.

v2: Fix a copy-paste error.

Reported-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-24 04:47:48 -07:00
Ilia Mirkin
1614c39a8f st/mesa: keep track of saturated writes when eliminating dead code
It doesn't matter whether a write is saturated or not, in another
implementation it might even have been a separate opcode. This code was
most likely copied from the copy-propagation pass (where one does have
to distinguish saturation).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-24 00:19:55 -04:00
Timothy Arceri
827d794834 glsl: correctly detect inactive UBO arrays
Previously the code was trying to get the packing type from the array not the
interface.

Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-09-24 10:07:42 +10:00
Ilia Mirkin
71e187430c i965: add ARB_texture_barrier support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 15:49:54 -04:00
Kenneth Graunke
31a36ffbc8 i965/gs: Fix extra level of indentation left by the previous commit.
I left a bunch of code indented a level in the previous patch to make
the diff easier to read.  But now we should fix that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
df31c1850d i965/gs: Use new NIR intrinsics.
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:

total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs:     3491 -> 3259 (-6.65%)
helped:                                11
HURT:                                  0

Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.

This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
542d40d698 nir: Add new GS intrinsics that maintain a count of emitted vertices.
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones.  See the comments above that for the
rationale behind the new intrinsics.

This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.

v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
  Ekstrand - I'd mistakenly used nir_after_block when rebasing this
  code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
  the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
  Michael Schellenberger Costa).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
0a040975ec nir: Add unit tests for control flow graphs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
fbaa1b19d7 nir/cf: Fix dominance metadata in the dead control flow pass.
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on.  Preserving
information about block dominance is hard (and probably not worthwhile).

This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.

We then make the dead control flow pass compute dominance information
right before it uses it.  This is necessary because earlier work by the
pass may have invalidated it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
6560838703 nir/cf: Fix unlink_block_successors to actually unlink the second one.
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successors[0].  So the successors[1] != NULL check will
always fail.
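
The failure mode can be reproduced with a stripped-down model. The struct
and functions below are hypothetical simplifications (no predecessor
sets, no NIR types); unlinking the second successor first is shown as one
way to avoid the problem, not necessarily the exact change in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified block with at most two successors. */
struct block {
    struct block *successors[2];
};

/* Remove succ from block's successors, shifting the second entry down. */
static void unlink_blocks(struct block *block, struct block *succ)
{
    if (block->successors[0] == succ) {
        block->successors[0] = block->successors[1];
        block->successors[1] = NULL;
    } else {
        assert(block->successors[1] == succ);
        block->successors[1] = NULL;
    }
}

/* Buggy order: after the first unlink, successors[1] is already NULL. */
static void unlink_successors_buggy(struct block *block)
{
    if (block->successors[0] != NULL)
        unlink_blocks(block, block->successors[0]);
    if (block->successors[1] != NULL)
        unlink_blocks(block, block->successors[1]);
}

/* Unlinking the second successor first avoids the shift problem. */
static void unlink_successors_fixed(struct block *block)
{
    if (block->successors[1] != NULL)
        unlink_blocks(block, block->successors[1]);
    if (block->successors[0] != NULL)
        unlink_blocks(block, block->successors[0]);
}
```

In this model the buggy version leaves the old second successor sitting in
successors[0], while the reordered version clears both entries.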

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke
024e5ec977 nir/cf: Alter block successors before adding a fake link.
Consider the case of "while (...) { break }".  Or in NIR:

        block block_0 (0x7ab640):
        ...
        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 */
                break
                /* succs: block_2 */
        }
        block block_2:

Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.

Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set.  This is illegal: a block cannot specify the
same successor twice.  In particular, adding the predecessor would have
no effect, as it was already present in the set.

We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set.  It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.

The fix appears to be simple: unlink the block's successors and
recreate them to point at the correct blocks first.  Then, add the fake
link.  In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), so adding
the fake link won't cause a duplicate successor.

v2: Add comments (requested by Connor Abbott) and fix commit message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
0991b2eb35 nir/cf: Conditionally do block_add_normal_succs() in unlink_jump();
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges.  In order to
fix that, I need everything in one routine.

However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
9674c76c0e nir/cf: Don't break outer-block successors in split_block_beginning().
Consider the following NIR:

   block block_0;
   /* succs: block_1 block_2 */
   if (...) {
      block block_1;
      ...
   } else {
      block block_2;
   }

Calling split_block_beginning() on block_1 would break block_0's
successors:  link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succs: new_block */.  This is invalid: the
block before an if statement must have two successors.

Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successors[1] to successors[0], it may not preserve the original
order.  NIR maintains a convention that successors[0] must point to the
"then" block, while successors[1] points to the "else" block, so we need
to take care to preserve this ordering.

This patch creates a new function that swaps out one successor for
another, preserving the ordering.  It then uses this to fix the issue.
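
A minimal sketch of such an order-preserving swap, assuming a simplified
block with a two-entry successor array.  The toy_block type and
toy_replace_successor() are hypothetical names for illustration, and the
predecessor-set updates a real implementation would need are omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a NIR block: slot 0 is the "then" successor,
 * slot 1 is the "else" successor. */
struct toy_block {
   struct toy_block *successors[2];
};

/* Swap old_succ for new_succ in whichever slot it occupies, so the
 * then/else ordering of the two slots is left untouched. */
static void
toy_replace_successor(struct toy_block *block,
                      struct toy_block *old_succ,
                      struct toy_block *new_succ)
{
   if (block->successors[0] == old_succ) {
      block->successors[0] = new_succ;
   } else {
      assert(block->successors[1] == old_succ);
      block->successors[1] = new_succ;
   }
}
```

Because the slot is overwritten in place, nothing is shifted, so replacing
the "then" successor cannot move the "else" successor into slot 0.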

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
e2637db618 nir/cf: Make a helper function for removing a predecessor.
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke
6a67ede6b3 nir: Validate that a block doesn't have two identical successors.
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.
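
The rule being validated can be sketched as a simple predicate.  The
toy_block type and toy_successors_valid() are hypothetical and only
illustrate the check; they are not the validator's actual code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a NIR block: at most two successors. */
struct toy_block {
   struct toy_block *successors[2];
};

/* A block with no successors at all is fine; only a repeated non-NULL
 * successor is rejected. */
static bool
toy_successors_valid(const struct toy_block *block)
{
   return block->successors[0] == NULL ||
          block->successors[0] != block->successors[1];
}
```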

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Jason Ekstrand
8dcbca5957 nir/lower_vec_to_movs: Don't emit unneeded movs
If a vecN operation is involved in a phi node, we could end up moving from
a register to itself.  If swizzling is involved, we still need to emit the
move.  However, if there is no swizzling, then the mov is a no-op and we
might as well not bother emitting it.
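
As a rough illustration of the condition, a mov is redundant when the
source and destination registers match and every written component reads
from the same component index.  The toy_mov layout below is a hypothetical
simplification, not NIR's ALU instruction representation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified description of a register-to-register mov. */
struct toy_mov {
   unsigned dest_reg;
   unsigned src_reg;
   uint8_t swizzle[4];   /* source component read for each dest component */
   unsigned write_mask;  /* bit i set => dest component i is written */
};

/* Returns true when emitting the mov would change nothing. */
static bool
toy_mov_is_noop(const struct toy_mov *mov)
{
   if (mov->dest_reg != mov->src_reg)
      return false;

   for (unsigned i = 0; i < 4; i++) {
      /* Only components that are actually written matter. */
      if ((mov->write_mask & (1u << i)) && mov->swizzle[i] != i)
         return false;
   }

   return true;
}
```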

Shader-db results on Haswell:

   total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
   instructions in affected programs:     184780 -> 181802 (-1.61%)
   helped:                                838
   HURT:                                  0

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Jason Ekstrand
65e80ce5b5 nir/lower_vec_to_movs: Properly handle source modifiers on vecN ops
I don't know of any piglit tests that are currently broken.  However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Ville Syrjälä
aae0c88797 i915: Make hw_prim[] const
The table used to map the GL primitive to the hw primitive never
changes so make it const.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-23 09:57:46 -07:00
Ville Syrjälä
84fec757de t_dd_dmatmp: Make the render_tab[]s const
These tables hold function pointers and they never change so
make them const.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-23 09:57:46 -07:00
Ian Romanick
abbaf3301f mesa: Remove unused HAVE_TRI_STRIP_1 defines
Defined to 0 in a few places, but it's not used anywhere.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:42 -07:00
Ian Romanick
d830965057 t_dd_dmatmp: Constify dmasz
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:38 -07:00
Ian Romanick
8e9968f184 t_dd_dmatmp: Silence comparison between signed and unsigned integer expression warnings
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                                                           ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j);
                                                       ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j);
                                                    ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                         ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
       nr = MIN2(currentsz, count - j + 1);
                                                        ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - j + 1);
                                                           ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                            ^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
          nr = MIN2(currentsz, count - nr);
                                                        ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:26 -07:00
Ian Romanick
d663d8f5d4 t_dd_dmatmp: Use stdbool.h
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:24 -07:00
Ian Romanick
b7259fc6b0 t_dd_dmatmp: General indentation and formatting fixes
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:22 -07:00
Ian Romanick
57ae5c237d t_dd_dmatmp: Indentation and formatting fixes after HAVE_ELTS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:20 -07:00
Ian Romanick
25b42f13bd t_dd_dmatmp: Remove HAVE_ELTS support
Two drivers use this file, and neither supports ELTs.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:17 -07:00
Ian Romanick
1f374958fd t_dd_dmatmp: Indentation and formatting fixes after HAVE_TRI_FANS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:15 -07:00
Ian Romanick
03c3208c18 t_dd_dmatmp: Require HAVE_TRI_FANS
Two drivers use this file, and both support triangle fans.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:13 -07:00
Ian Romanick
2e19ed3cb5 t_dd_dmatmp: Indentation and formatting fixes after HAVE_TRI_STRIPS change
v2: Fix '- nr' typo noticed by Marius.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
2015-09-23 09:57:11 -07:00
Ian Romanick
fd97a05508 t_dd_dmatmp: Require HAVE_TRI_STRIPS
Two drivers use this file, and both support triangle strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:08 -07:00
Ian Romanick
22b73f3c2a t_dd_dmatmp: Require HAVE_TRIANGLES
Two drivers use this file, and both support triangles.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:06 -07:00
Ian Romanick
dcd8e49962 t_dd_dmatmp: Indentation and formatting fixes after HAVE_LINE_STRIPS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:04 -07:00
Ian Romanick
1ecdf956ac t_dd_dmatmp: Require HAVE_LINE_STRIPS
Two drivers use this file, and both support line strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:57:01 -07:00
Ian Romanick
1ab8a69a3b t_dd_dmatmp: Indentation and formatting fixes after HAVE_LINES change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:59 -07:00
Ian Romanick
b8461e03f0 t_dd_dmatmp: Require HAVE_LINES
Two drivers use this file, and both support lines.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:56 -07:00
Ian Romanick
265624c5af t_dd_dmatmp: Indentation and formatting fixes after HAVE_QUADS change
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:53 -07:00
Ian Romanick
4ecc387a93 t_dd_dmatmp: Remove HAVE_QUADS support
Two drivers use this file, and neither supports quads.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:51 -07:00
Ian Romanick
249ba09f59 t_dd_dmatmp: Remove HAVE_QUAD_STRIPS support
Two drivers use this file, and neither supports quad strips.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-23 09:56:48 -07:00
Ian Romanick
25543d8ec5 t_dd_dmatmp: Use addition instead of subtraction in loop bounds
This is used everywhere else in this file because it avoids problems
when count is zero (due to trimming).
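
A small sketch of why the form of the bound matters, assuming count is an
unsigned integer (an assumption here; the functions below are hypothetical
and not taken from t_dd_dmatmp.h).  With subtraction, count - 1 wraps
around when count is zero:

```c
/* Counts iterations of a loop that visits vertex pairs (j, j + 1) using a
 * subtraction in the bound.  When count == 0, count - 1 wraps to UINT_MAX;
 * the limit parameter only exists to stop the runaway loop in this demo. */
static unsigned
pairs_with_subtraction(unsigned count, unsigned limit)
{
   unsigned n = 0;

   for (unsigned j = 0; j < count - 1 && n < limit; j++)
      n++;

   return n;
}

/* Same loop with the bound rewritten to use addition: j + 1 < count is
 * simply false when count == 0, so the body never runs. */
static unsigned
pairs_with_addition(unsigned count)
{
   unsigned n = 0;

   for (unsigned j = 0; j + 1 < count; j++)
      n++;

   return n;
}
```

For non-zero counts both forms iterate the same number of times; they only
differ in the count == 0 case.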

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: Marius Predut <marius.predut@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:46 -07:00
Ian Romanick
c0b3b2f760 t_dd_dmatmp: Pull out common 'count -= count & 3' code
This was missing in the HAVE_TRIANGLES path, and that could cause
incorrect rendering.
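
For reference, the expression in the subject line rounds count down to a
multiple of four.  The helper below is a hypothetical illustration that
assumes an unsigned count; for unsigned values 'count & 3' and 'count % 4'
give the same remainder:

```c
/* Drop the leftover vertices so that count becomes a multiple of four. */
static unsigned
trim_to_multiple_of_4(unsigned count)
{
   return count - (count & 3);
}
```

For example, a count of 7 is trimmed to 4, and a count of 3 is trimmed to
zero, which is one way a zero count can reach the rendering loops.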

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: Marius Predut <marius.predut@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:43 -07:00
Ian Romanick
0d475ee2b9 t_dd_dmatmp: Use '& 3' instead of '% 4' everywhere
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:36 -07:00
Ian Romanick
fad8d54de7 t_dd_dmatmp: Clean up improper code formatting from previous patch
No piglit regressions on i915 (G33) or radeon (Radeon 7500).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:34 -07:00
Ian Romanick
d7bf7969b9 t_dd_dmatmp: Make "count" actually be the count
The value passed in count previously was "vertex after the last vertex
to be processed."  Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count.  That's just silly.

If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.

NOTE: There is some strange formatting left by this patch.  That was
done to make it more obvious that the before and after code is
equivalent.  These will be fixed in the next patch.

No piglit regressions on i915 (G33) or radeon (Radeon 7500).

v2: Fix a remaining (count-start) in render_quad_strip_verts.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-23 09:56:01 -07:00
Antia Puentes
f2e75ac88a i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed
Gen6 MATH instructions cannot execute in align16 mode, so swizzles and
writemasks are not allowed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 13:12:25 +02:00
Iago Toral Quiroga
cf439951b7 mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.
From section 9.2. Binding and Managing Framebuffer Objects:

"Upon successful return from Get*FramebufferAttachmentParameteriv, if
pname is FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, then params will contain
one of NONE, FRAMEBUFFER_DEFAULT, TEXTURE, or RENDERBUFFER, identifying
the type of object which contains the attached image."

And then it clarifies further:

"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default framebuffer is
bound, attachment is DEPTH or STENCIL, and the number of depth or stencil
bits, respectively, is zero"

Currently, if the default framebuffer is bound, we always return
GL_FRAMEBUFFER_DEFAULT for FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, but
according to the spec, when GL_DEPTH or GL_STENCIL attachments are
the ones being queried, we should return GL_NONE if they don't exist.
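
The intended result for the default framebuffer can be sketched as
follows.  The enums, the toy_visual struct and the function name are
hypothetical placeholders rather than Mesa's or OpenGL's real definitions:

```c
/* Hypothetical placeholders for the queried attachment and the result. */
enum toy_attachment { TOY_BACK, TOY_DEPTH, TOY_STENCIL };
enum toy_object_type { TOY_NONE, TOY_FRAMEBUFFER_DEFAULT };

/* Hypothetical description of the default framebuffer's buffers. */
struct toy_visual {
   unsigned depth_bits;
   unsigned stencil_bits;
};

/* Object type reported when the default framebuffer is bound: NONE for a
 * depth or stencil attachment that has zero bits, FRAMEBUFFER_DEFAULT
 * otherwise. */
static enum toy_object_type
toy_default_fb_object_type(const struct toy_visual *vis,
                           enum toy_attachment attachment)
{
   if (attachment == TOY_DEPTH && vis->depth_bits == 0)
      return TOY_NONE;
   if (attachment == TOY_STENCIL && vis->stencil_bits == 0)
      return TOY_NONE;

   return TOY_FRAMEBUFFER_DEFAULT;
}
```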

Fixes the following dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
2015-09-23 12:50:00 +02:00
Tapani Pälli
89524e7171 glsl: bail out early in _mesa_ShaderSource if no shaderobj
This patch fixes a crash in a conformance test that tries out different
invalid arguments for glShaderSource and glGetShaderSource:

   ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle

This is a regression from commit:
   04e201d0c0

Additions in v2 also fix the following failing dEQP test:
   dEQP-GLES[2|3].functional.negative_api.shader.shader_source

v2: cleanup function, do check earlier (Iago Toral)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-23 08:45:00 +03:00
Matt Turner
10da96887c i965/vec4: Detect and delete useless MOVs.
With NIR:

instructions in affected programs:     111508 -> 109193 (-2.08%)
helped:                                507

Without NIR:

instructions in affected programs:     28763 -> 28474 (-1.00%)
helped:                                186

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 21:20:29 -07:00
Jason Ekstrand
e7496fed2a prog_to_nir: Use nir_op_dph
Shader-db results on HSW:

   instructions in affected programs:     72 -> 56 (-22.22%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
999ff3c77d nir/lower_alu_to_scalar: Add support for nir_op_fdph
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
2e5423ad63 i965/vec4: Add support for fdph_replicated
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
e5a9346d00 nir: Add fdph and fdph_replicated opcodes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
0f9bf64770 nir/lower_alu_to_scalar: Return after lower_reduction
We don't use any of the code after the switch anyway.  Since we check for
num_components == 1 and early-return, it doesn't get executed so
everything's ok.  However, it makes it much clearer what's going on if we
simply do an early return.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand
2b79db2c02 nir/lower_alu_to_scalar: Use the builder
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Chris Forbes
f5991ebf34 i965: Add defines for tessellation stages
v2 (Ken):
- Squash together commits for HS, DS, and TE, as well as fixes.
- Add INTEL_MASK variants so we can use SET_FIELD if we want.
- Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match
  the documentation.
- Add some more fields from the PRMs.
- Add Broadwell variants.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:23:46 -07:00
Grazvydas Ignotas
8ae8feca84 r600g: update num_dw in scissor_enable workaround
"r600g: apply disable workaround on all scissors" forgot to update
num_dw, fix it.

Fixes: fbb423b433 "r600g: apply disable workaround on all scissors"
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-23 09:09:04 +10:00
Alejandro Piñeiro
1bd89db921 i965/vec4: refactor brw_vec4_copy_propagation.
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:

1) Build up the value we are propagating as if it were the source of a
single MOV
2) Check that we can propagate that value
3) Build the final value

Previously the logic was tangled, which made some specific cases even
messier to implement, such as deciding whether a value can be propagated
from a previous instruction despite a type mismatch (for example, it
required maintaining more than one has_source_modifiers).  The
refactoring cleans this up and supports that use case without any extra
work (for example, only one has_source_modifiers is used).

Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs:     739837 -> 725032 (-2.00%)
helped:                                6237
HURT:                                  0

v2: using 'arg' index to get the from inst was wrong
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
    only set the source on a direct copy
v5: change the approach for a refactoring
v6: tweaked comments

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-22 19:30:18 +02:00
Brian Paul
4a03066e5a st/mesa: remove st_bind_framebuffer()
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer
is null, Mesa won't try to use it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:32 -06:00
Brian Paul
b590ffd0f9 mesa: const-qualify _mesa_is_legal_tex_storage_format ctx param
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:32 -06:00
Brian Paul
acee1a322d mesa: const-qualify _mesa_base_tex_format() ctx param
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:31 -06:00
Brian Paul
4879b76601 mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-22 10:15:30 -06:00
Brian Paul
76dbab0a69 mesa: whitespace, comment fixes in texstorage.c
2015-09-22 09:10:10 -06:00
Marta Lofstedt
419210005a mesa/es3.1: Enable GL_ARB_vertex_attrib_binding functionality for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:22:13 +02:00
Marta Lofstedt
cf293e518e mesa/es3.1: Allow query of Vertex bindings for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:22:06 +02:00
Marta Lofstedt
6c3de8996f mesa/es3.1: Align OpenGL ES 3.1 glBindVertexBuffer error handling with OpenGL Core
According to OpenGL ES 3.1 specification 10.3.1:
"An INVALID_OPERATION error is generated if buffer is not zero
or a name returned from a previous call to GenBuffers,
or if such a name has since been deleted with DeleteBuffers."
This error check was previously limited to OpenGL Core.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-22 12:21:59 +02:00
Tapani Pälli
7f8815bcb9 i965: fix textureGrad for cubemaps
Fixes bugs exposed by commit
2b1cdb0edd in:
   ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_frag

No regressions observed in deqp, CTS or Piglit.

v2: address review feedback from Iago Toral:
   - move rho calculation to else branch
   - optimize dx and dy calculation
   - fix documentation inconsistencies

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91114
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-22 08:14:20 +03:00
Kenneth Graunke
5cede90f62 nir: Report progress from nir_normalize_cubemap_coords().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:54:34 -07:00
Kenneth Graunke
d7ffd90ecb nir: Add braces around multi-line loop.
This was correct but not our usual style.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:01 -07:00
Kenneth Graunke
0a1adaf11d nir: Report progress from nir_lower_system_values().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:00 -07:00
Kenneth Graunke
dc18b9357b nir: Report progress from nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:59 -07:00
Kenneth Graunke
cfae0f8a3a nir: Report progress from nir_lower_locals_to_regs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:57 -07:00
Kenneth Graunke
1adde5b87e nir: Report progress from nir_remove_dead_variables().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:55 -07:00
Jason Ekstrand
9f5e7ae9d8 nir: Report progress from lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:54 -07:00
Kenneth Graunke
967a5ddb88 nir: Report progress from nir_lower_globals_vars_to_local().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:45 -07:00
Jason Ekstrand
60befc6347 i965: Clean up GLSL compiler option setup
The only functional change here is that we now set EmitNoIndirectOutput and
EmitNoIndirectTemp for compute shaders.  Compute shaders don't have outputs
per-se and we should have been setting EmitNoIndirectTemp all along.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-21 13:26:52 -07:00
Jeremy Huddleston
6dfc5e28f7 configure.ac: Add support to enable read-only text segment on x86.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.gentoo.org/240956
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-21 12:47:09 -07:00
Ben Widawsky
c1e38ad370 i965/skl: Use larger URB size where available.
All SKL SKUs except the lowest one which has half the L3 size actually have 384K
of URB per slice.

For once, I can explain how this mistake was made and how it was missed in
review...  Historically when we enable a platform and put the production sizes,
you can simply look at the "smallest" SKU and see what its URB size is (and we
assumed it was the 1 slice variant). Since on newer platforms the URB sizes are
scaled automatically by HW, this was sufficient. On SKL, this is a bit different
as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not
GT1) variant and it has 384K.

There are no Jenkins tests fixed (or regressions) and we don't expect any fixes
here because you can always run with less URB size.

Thanks to Sarah for bringing this to my attention.

Cc: Sarah Sharp <sarah.a.sharp@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-09-21 11:27:08 -07:00
Jason Ekstrand
46362db4a6 nir/builder: Don't use designated initializers
Designated initializers are not allowed in C++ (not even C++11).  Since
nir_lower_samplers is now using nir_builder, and nir_lower_samplers is in
C++, this breaks the build on some compilers.  Apparently, GCC 5 allows it
in some limited extent because mesa still builds on my system without this
patch.
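
As a rough illustration of the kind of change described above (a sketch
only, using a hypothetical struct example_builder rather than the real
nir_builder), a header can stay valid in both C and C++ by zeroing the
struct and assigning members one by one:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a builder struct; not the real nir_builder. */
struct example_builder {
   int cursor;
   void *shader;
};

/* Valid as both C and C++: zero the struct, then assign each field,
 * instead of writing (struct example_builder){ .cursor = 1, .shader = s }. */
static inline struct example_builder
example_builder_create(void *shader)
{
   struct example_builder b;

   assert(shader != NULL);
   memset(&b, 0, sizeof(b));
   b.cursor = 1;
   b.shader = shader;
   return b;
}
```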

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92052
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 10:41:43 -07:00
Jason Ekstrand
d513388c8a nir: Move system value -> intrinsic mapping into nir.c
This way they're right next to the map going the other direction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 09:49:40 -07:00
Emil Velikov
de7ffdb383 nir: rename nir_lower_samplers.c{pp,}
With the only C++ function having its own wrapper we can 'demote' this
file to a normal C one. This allows us to get rid of extern C { #include
<foo.h> } 'hacks'. Plus some of the headers may use C99 initializers,
which are not supported by the ISO standard.

This may cause build issues on incremental builds. If so run the
following:

sed -i -e 's|samplers\.cpp|samplers.c|' src/glsl/nir/.deps/nir_lower_samplers.Plo

Fixes: ef8eebc6ad5(nir: support indirect indexing samplers in struct arrays)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Gottfried Haider <gottfried.haider@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:02:06 +01:00
Emil Velikov
d130cda453 nir: add C wrapper around glsl_type::record_location_offset
This will allow us to convert nir_lower_sampler.cpp to C.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:56 +01:00
Emil Velikov
bdb1faf44e nir: move stdio.h inclusion before extern C
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:32 +01:00
Kenneth Graunke
c1070550c2 i965: Fix MRF register number assertions for compr4.
compr4 is represented by setting the high bit on the MRF number.
We need to mask it out before sanity checking the register number.
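
A minimal sketch of the masking step, assuming the compr4 flag is bit 7
and a 16-entry register file (both constants are illustrative; the
driver headers are authoritative):

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define EXAMPLE_MRF_COMPR4 (1u << 7)   /* high bit flags compr4 */
#define EXAMPLE_MAX_MRF    16u

/* Strip the compr4 flag, then range-check the plain register number. */
static inline unsigned
example_checked_mrf_nr(unsigned nr)
{
   unsigned plain = nr & ~EXAMPLE_MRF_COMPR4;

   assert(plain < EXAMPLE_MAX_MRF);
   return plain;
}
```

Without the mask, a compr4 register such as (EXAMPLE_MRF_COMPR4 | 4)
would compare as 132 and trip the assertion.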

Fixes ~8000 assert fails on Ironlake and G45.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 07:45:14 -07:00
Ilia Mirkin
72ebd532a1 radeonsi: implement TXQS support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fredrik Bruhn <f@unibap.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-21 08:31:29 -04:00
Ilia Mirkin
7d5162bdc0 radeonsi: load fmask ptr relative to the resources array
res_ptr already contains the resource values. fmask_ptr needs to be
looked up relative to the start of the resource params.

Note that this only affects indirect loads of MS sampler arrays.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-21 08:30:51 -04:00
Iago Toral Quiroga
5d23ce2f15 i965/vec4: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:48:05 +02:00
Iago Toral Quiroga
6789a32075 i965/fs: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:56 +02:00
Iago Toral Quiroga
f50645d05c i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631

Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).

Notice that the hardware docs are not clear about this fact:

SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"

However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:

"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume that MRF addresses above m15 wrap to legal MRF registers."

Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests were helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:

Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461

Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461

This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
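
A sketch of the idea, with hypothetical EXAMPLE_ names: the limit
becomes a function of the generation, and the three spill registers sit
at the top of whatever range that generation reports. Only the gen6 and
pre-gen6 values quoted above are assumed here.

```c
#include <assert.h>

/* Sketch: 24 MRFs on gen6, 16 otherwise, per the description above. */
#define EXAMPLE_MAX_MRF(gen) ((gen) == 6 ? 24 : 16)

/* First of the three registers reserved for spilling. */
static inline int
example_first_spill_mrf(int gen)
{
   assert(gen > 0);
   return EXAMPLE_MAX_MRF(gen) - 3;
}
```

With these assumptions gen6 spills through m21..m23 and earlier
generations keep using m13..m15.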

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:45 +02:00
Iago Toral Quiroga
0858610836 i965: Move MRF register asserts out of brw_reg.h
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.

Ken suggested adding the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:35 +02:00
Iago Toral Quiroga
d48ac93066 i965: Maximum allowed size of SEND messages is 15 (4 bits)
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow using MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.

Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.
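
The split condition can be sketched as follows. The helper name and its
parameters are hypothetical; only the 15-register limit and the
"stop at the spill range" rule come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_MAX_SEND_MLEN 15   /* mlen is a 4-bit field */

/* Decide whether the URB write under construction must be emitted before
 * another register is appended.  base_mrf is the first register of the
 * message, next_mrf the register the next payload would occupy, and
 * first_spill_mrf the start of the range reserved for spilling. */
static inline bool
example_must_flush_urb_write(int base_mrf, int next_mrf, int first_spill_mrf)
{
   int mlen = next_mrf - base_mrf;   /* registers already in the message */

   assert(mlen >= 0);
   return next_mrf >= first_spill_mrf || mlen >= EXAMPLE_MAX_SEND_MLEN;
}
```

With spilling at m13 the first test alone was enough; once spilling
moves to m21, the explicit length test is what stops a message at 15
registers.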

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 12:47:03 +02:00
Rob Clark
b65f91dd32 nir/print: fix coverity error
Not something actually hit in real life (state is now never null; the
only case where state->syms is null is the nir_print_instr() path).  But it
was something I overlooked the first time, so might as well fix it.

    *** CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    /src/glsl/nir/nir_print.c: 299 in print_var_decl()
    293
    294           fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
    295        }
    296
    297        fprintf(fp, "\n");
    298
    >>>     CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    >>>     Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    299        if (state) {
    300           _mesa_set_add(state->syms, name);
    301           _mesa_hash_table_insert(state->ht, var, name);
    302        }
    303     }
    304

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-20 14:04:06 -04:00
Eduardo Lima Mitev
6ba291db4b i965/vec4/nir: Remove all "this->" snippets
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so let's get rid of them all.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-20 17:11:49 +02:00
Marcin Ślusarz
8f6fd57db2 dri/common: fix gbm-symbols-check regression
Broken by commit c228514c72
"dri/common: use sysconfdir when looking for drirc".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92054
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-09-20 13:44:07 +02:00
Emil Velikov
1e01db0fa9 docs: add news item and link release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-20 11:59:24 +01:00
Emil Velikov
278a32374c docs: add sha256 checksums for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 02387926ad)
2015-09-20 11:58:04 +01:00
Emil Velikov
72d407da10 docs: add release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 91c6302734)
2015-09-20 11:58:03 +01:00
Nanley Chery
99b1f4751f mesa/teximage: reuse compressed format utility functions for base_format
Reuse utility functions instead of reimplementing the same logic.

* _mesa_is_compressed_format() performs the required checking to
  determine format support in the current context.
* _mesa_gl_compressed_format_base_format() returns the base format.

As a side effect, we now check that we're in a desktop context when
determining support for the FXT1 and RGTC formats. This is in agreement
with our extension table and the glext headers.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:27:15 -07:00
Nanley Chery
db2777091d mesa/texcompress: add compressed formats to base format utility function
Add S3TC and PALETTE formats.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:27:10 -07:00
Nanley Chery
29835fe19e mesa/glformats: refactor compressed format support function
Instead of case statements, use _mesa_get_format_layout() to
determine if a GL format is part of a family of compressed formats.

v2. restrict LATC formats to API_OPENGL_COMPAT (Ilia).
    rename the variable mFormat to m_format.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:26:55 -07:00
Nanley Chery
31a5135cd7 mesa/formats: add MESA_LAYOUT_LATC
This enables us to predicate statements on a compressed format being
a type of LATC format. Also, remove the comment that lists the enum
(it was getting a tad long).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-19 13:25:59 -07:00
Marcin Ślusarz
c228514c72 dri/common: use sysconfdir when looking for drirc
Useful when locally installed mesa has more quirks than the system one.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-19 19:17:34 +02:00
Rob Clark
9ffc1049ca freedreno/ir3: use nir two-sided-color lowering
With this, we completely switch over to nir lowering passes instead of
tgsi_lowering.  So one step closer to supporting direct glsl or spirv to
nir support for freedreno a3xx/a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-18 21:07:50 -04:00
Rob Clark
e13ed3ffb4 nir: add two-sided-color lowering pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-18 21:07:50 -04:00
Rob Clark
e4dfcdcbec nir/build: add nir_vec() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-18 21:07:50 -04:00
Rob Clark
c71cb670ba freedreno/ir3: lower txp/clamp in NIR
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-18 21:07:50 -04:00
Rob Clark
3745c38425 nir/lower_tex: add support to clamp texture coords
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP.  This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.
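
Why the order matters, as a scalar sketch (the helper names are
hypothetical and this is plain C, not NIR builder code): the coordinate
is divided by the projector first and only then clamped.

```c
#include <assert.h>

/* Clamp a value to [0.0, 1.0]. */
static inline float
example_saturate(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Project first (divide by q), then clamp the projected coordinate. */
static inline float
example_project_then_clamp(float coord, float q)
{
   assert(q != 0.0f);
   return example_saturate(coord / q);
}
```

For coord = 3.0 and q = 2.0 this yields 1.0, whereas clamping before
the divide would yield 0.5.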

v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
1ce8060c25 nir/lower_tex: support for lowering RECT textures
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
faf5f174dd nir/lower_tex: support projector lowering per sampler type
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types.  In this case we want more fine grained control over
which texture projectors get lowered.

v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
f83ba7bc41 nir/lower_tex: split out project_src() helper
Split this out to reduce noise in later patches.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark
d9b9ff76f1 nir: rename nir_lower_tex_projector
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Alejandro Piñeiro
06d31dceae i965/vec4: Change types as needed to propagate source modifiers using current instruction
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around.  So those kinds of instructions could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.

This commit adds support for copy propagation using current instruction
as reference.

Equivalent to commit 472ef9 but for vec4.

v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
    solve (among others) the following deqp regression with v2:
    dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-19 00:31:25 +02:00
Iago Toral Quiroga
f7ca52dd6d i965/fs: Fix comparison between signed and unsigned integer expressions
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-18 13:37:25 +02:00
Tapani Pälli
afa1efdc85 mesa: fix errors when reading depth with glReadPixels
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().

Fix regression caused by 81d2fd91a9 in:
   ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels

Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.

v2: change check in _mesa_error_check_format_and_type to be explicit
    for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
    this function for both glReadPixels and glDrawPixels validation.
    (No Piglit regressions seen with v2.)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-18 07:41:47 +03:00
Rob Clark
2e4ab489b5 nir/builder: fix c++11 compiler warning
Fixes:

   In file included from nir/nir_lower_samplers.cpp:27:0:
   nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
   nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
       unsigned swizzle[4] = {c, c, c, c};
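
One way to silence this kind of warning, shown as a sketch with a
hypothetical helper (the actual change in the commit is not reproduced
here), is to make the value unsigned before it reaches the braced
initializer:

```c
#include <assert.h>

/* Taking the channel as unsigned avoids the int -> unsigned narrowing
 * inside the braced initializer that C++11 rejects. */
static inline void
example_fill_swizzle(unsigned c, unsigned out[4])
{
   unsigned swizzle[4] = {c, c, c, c};

   assert(c < 4);
   for (int i = 0; i < 4; i++)
      out[i] = swizzle[i];
}
```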

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:08:25 -04:00
Rob Clark
7c72f593ad nir: really actually fix comment this time
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:06:11 -04:00
Rob Clark
5305603b9d nir/print: print variable names
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:26:12 -04:00
Rob Clark
ba78260b0f nir: some comment fixups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:25:33 -04:00
Rob Clark
c70ed86172 freedreno/ir3: add --gpu arg to cmdline compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
c970ec0577 freedreno/a4xx: wire up ucp support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
91ec210ea8 freedreno/ir3: add support for ucp
Use nir_lower_clip pass for adding the VS/FS instructions to handle
user-clip-planes and CLIPDIST.  Wire up support for load_user_clip_plane
intrinsic to fetch ucp[plane] values as driver-params (passed as const's
to the shader).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:57:52 -04:00
Rob Clark
509e0c4505 nir: add lowering stage for user-clip-planes / clipdist
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).

Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them).  But it's better than no UCP
support at all for drivers that don't have this in hw.
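
The two halves of the lowering can be sketched in scalar C (hypothetical
helper names; the real pass emits NIR instructions instead): the vertex
stage computes a distance per plane, and the fragment stage discards
when the interpolated distance is negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Vertex side: clip distance is dot(position, plane). */
static inline float
example_clip_distance(const float pos[4], const float plane[4])
{
   assert(pos && plane);
   return pos[0] * plane[0] + pos[1] * plane[1] +
          pos[2] * plane[2] + pos[3] * plane[3];
}

/* Fragment side: kill when the interpolated distance is negative. */
static inline bool
example_should_kill(float clipdist)
{
   return clipdist < 0.0f;
}
```

Because the kill decision is made once per fragment, every sample of
that fragment shares it, which is the MSAA limitation noted above.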

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-17 19:57:21 -04:00
Rob Clark
53671a3723 nir: add sysval for user-clip-planes
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-09-17 19:55:43 -04:00
Rob Clark
c4572b7dfe freedreno/ir3: convert from tgsi semantic/index to varying-slot
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Rob Clark
4a121e1a90 glsl: add SYSTEM_VALUE_VERTEX_CNT
Used internally in freedreno/ir3 to calc stream-out position.  Seems
like a generic enough way to implement stream-out (using str instrs),
plus it avoids compiler warnings by sneaking in a non-enum value in
switch statements.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Rob Clark
e523f69b1d freedreno/ir3: switch to shader_enums.h interp constants
A small step towards un-TGSI'ifying ir3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 19:55:43 -04:00
Ilia Mirkin
e844e1007d nv50,nvc0: flush texture cache in presence of coherent bufs
This fixes the newly-added arb_texture_buffer_object-bufferstorage
piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-17 19:50:47 -04:00
Ilia Mirkin
323c912506 nv50,nvc0: detect underlying resource changes and update tic
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.

This fixes:
  arb_direct_state_access-texture-buffer
  arb_texture_buffer_object-data-sync

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-17 19:50:47 -04:00
Boyan Ding
8d3b92af21 vc4: Try to pair up instructions when only one of them has PM bit
Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.

total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs:     11688 -> 10939 (-6.41%)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-17 14:57:46 -04:00
Jason Ekstrand
fc11dbe13f i965/vec4: Use nir_move_vec_src_uses_to_dest
The idea here is that it gives register coalescing a little bit of a
helping hand.  It doesn't actually fix the coalescing problems, but it
seems to help a good bit.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
   instructions in affected programs:     1259166 -> 1196845 (-4.95%)
   helped:                                11363
   HURT:                                  148

v2 (Jason Ekstrand):
 - Run nir_move_vec_src_uses_to_dest after going out of SSA
 - New shader-db numbers

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-17 08:21:31 -07:00
Jason Ekstrand
a6c467d6c5 nir: Add a pass to rewrite uses of vecN sources to the vecN destination
v2 (Jason Ekstrand):
 - Handle non-SSA sources and destinations

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-17 08:19:48 -07:00
Jason Ekstrand
ddffe30f40 nir: Add comments to nir_index_instrs and nir_index_ssa_defs
The provided indices have the very nice property that if A dominates B then
A->index <= B->index.  We should document that somewhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Jason Ekstrand
8ecaef967d nir: Add a generic instruction index
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Ulrich Weigand
bd016a2601 mesa: Fix texture compression on big-endian systems
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.

The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order.  However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.

This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer.  This fixes
about 100 piglit test case failures on s390x for me.
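
The mismatch can be reproduced with a small sketch. The helpers below
are hypothetical and only mimic the two packed layouts named above; they
pick the packing by host endianness so that the bytes in memory always
read R, G, B, A.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* R in the least-significant byte (R8G8B8A8-style packing). */
static inline uint32_t
example_pack_r8g8b8a8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return (uint32_t)r | ((uint32_t)g << 8) |
          ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

/* A in the least-significant byte (A8B8G8R8-style packing). */
static inline uint32_t
example_pack_a8b8g8r8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return (uint32_t)a | ((uint32_t)b << 8) |
          ((uint32_t)g << 16) | ((uint32_t)r << 24);
}

/* Detect host byte order by inspecting the first byte of the integer 1. */
static inline int
example_is_big_endian(void)
{
   const uint32_t one = 1;
   uint8_t first;

   memcpy(&first, &one, 1);
   return first == 0;
}

/* Store one texel so that byte-wise readers see R, G, B, A in memory. */
static inline void
example_store_rgba_bytes(uint8_t out[4],
                         uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   uint32_t packed = example_is_big_endian()
      ? example_pack_a8b8g8r8(r, g, b, a)
      : example_pack_r8g8b8a8(r, g, b, a);

   memcpy(out, &packed, sizeof(packed));
   assert(out[0] == r && out[3] == a);
}
```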

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-09-17 21:23:45 +10:00
Thomas Hellstrom
7e28650649 st/xa: Use PIPE_FORMAT_R8_UNORM when available
XA has been using L8_UNORM for a8 and yuv component surfaces.
This commit instead makes XA prefer R8_UNORM since it's assumed to have a
higher availability.

Also, neither of these formats is suitable as destination formats using
destination alpha blending, so reject those operations.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-17 00:03:00 -07:00
Tapani Pälli
ba02f7a3b6 mesa: return initial value for VALIDATE_STATUS if pipe not bound
From OpenGL 4.5 Core spec (7.13):

    "If pipeline is a name that has been generated (without subsequent
    deletion) by GenProgramPipelines, but refers to a program pipeline
    object that has not been previously bound, the GL first creates a
    new state vector in the same manner as when BindProgramPipeline
    creates a new program pipeline object."

I interpret this as "If GetProgramPipelineiv gets called without a
bound (but valid) pipeline object, the state should reflect initial
state of a new pipeline object." This is also expected behaviour by
ES31-CTS.sepshaderobjs.PipelineApi conformance test.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Tapani Pälli
d9689be5c6 mesa: return initial value for PROGRAM_SEPARABLE when not linked
From OpenGL ES 3.1 spec (7.12):

    "Most properties set within program objects are specified not to
    take effect until the next call to LinkProgram or ProgramBinary.
    Some properties further require a successful call to either of
    these commands before taking effect. GetProgramiv returns the
    properties currently in effect for program, which may differ from
    the properties set within program since the most recent call to
    LinkProgram or ProgramBinary, which have not yet taken effect. If
    there has been no such call putting changes to pname into effect,
    initial values are returned."

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Tapani Pälli
8f1ae9abeb mesa: enable query of PROGRAM_PIPELINE_BINDING for ES 3.1
Specified in OpenGL ES 3.1 spec, Table 23.32: Program Object State.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-09-17 08:26:33 +03:00
Timothy Arceri
ef8eebc6ad nir: support indirect indexing samplers in struct arrays
As a bonus we get indirect support for arrays of arrays for free.

V5: couple of small clean-ups suggested by Jason.

V4: fix struct member location calculation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason

V3: Use nir_instr_rewrite_src() with empty src rather than clearing
the use_link list directly for the old indirects as suggested by Jason

V2: Fixed validation error in debug build

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:34 +10:00
Timothy
0ad44ce373 glsl: add helper for calculating offsets for struct members
V2: update comments

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:27 +10:00
Timothy Arceri
12af915e27 glsl: make variables private
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:21 +10:00
Timothy Arceri
dcd9cd0383 glsl: store uniform slot id in var location field
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.

V3: remove line wrap change from this patch

V2: store slot number for all non-UBO uniforms to make code more
consistent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:14 +10:00
Timothy Arceri
9788700caf glsl: assign hidden uniforms their slot id earlier
This is required so that the next patch can safely assign the slot id
to the var.

The ids are now assigned in the order we want before allocating storage
so there is no need to sort the storage array and move things around.

V2: rename variable to make code easier to follow as suggested by Jason

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:26:45 +10:00
Timothy Arceri
874a0217fd glsl: order indices for samplers inside a struct array
This allows the correct offset to be easily calculated for indirect
indexing when a struct array contains multiple samplers, or any crazy
nesting.

The indices for the following struct will now look like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 1 Name: s[1].tex
Sampler index: 2 Name: s[0].si.tex
Sampler index: 3 Name: s[1].si.tex
Sampler index: 4 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2

Before this change it looked like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 3 Name: s[1].tex
Sampler index: 1 Name: s[0].si.tex
Sampler index: 4 Name: s[1].si.tex
Sampler index: 2 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2

struct S_inner {
   sampler2D tex;
   sampler2D tex2;
};

struct S {
   sampler2D tex;
   S_inner si;
};

uniform S s[2];
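
The new numbering in the listing is consistent with a simple formula for
this single-level case. The helper below is a hypothetical sketch, not
the linker code: member_slot is the sampler's position among the
struct's samplers in declaration order (tex = 0, si.tex = 1,
si.tex2 = 2).

```c
#include <assert.h>

/* Index of sampler member `member_slot` in element `elem` of a struct
 * array of length `array_len`, with all elements of one member stored
 * contiguously. */
static inline unsigned
example_sampler_index(unsigned member_slot, unsigned array_len, unsigned elem)
{
   assert(elem < array_len);
   return member_slot * array_len + elem;
}
```

An indirect access s[i].si.tex then only needs the base index of si.tex
plus i, which is what makes the offset easy to compute.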

V3: Update comments with suggestions from Jason

V2: rename struct array counter to have better name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:26:39 +10:00
Dave Airlie
b5df52b112 Revert "mesa/extensions: restrict GL_OES_EGL_image to GLES"
This reverts commit 48961fa3ba.

glamor/Xwayland use this, the spec saying something when it
was written, and the fact that the comment says Mesa relies on it
hasn't changed.

I also don't have a copy of this patch in my mail archive, which
seems weird; did it get posted to mesa-dev?

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-17 06:58:51 +10:00
Eric Anholt
f5b26b4744 vc4: Only build in simulator mode if we find pkg-config for it.
This will let other developers build it on x86 for build-testing purposes.
2015-09-16 15:54:00 -04:00
Ilia Mirkin
37d0becfd9 freedreno/a3xx: use NUM_USER_CLIP_PLANES helper instead of magic number
Use the helper from the newly-updated generated header file.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-16 15:42:55 -04:00
Ilia Mirkin
545a3cbb01 freedreno/a3xx: fix blending of L8 format
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-16 15:42:55 -04:00
Ilia Mirkin
ee6b95c82c freedreno/a3xx: add support for dual-source blending
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-16 15:42:54 -04:00
Eric Anholt
cfa980f493 vc4: convert from tgsi semantic/index to varying-slot
(originally part of previous patch, split out to separate patch by Rob)

v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:07:08 -04:00
Eric Anholt
8fd3e53f3d gallium/ttn: Convert to using VARYING_SLOT_* / FRAG_RESULT_*.
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.

v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
    helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
    frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:03:53 -04:00
Ilia Mirkin
7a275fcda8 nv50, nvc0: fix max texture buffer size to 128M elements
This is what the hardware supports, there never was any sort of 64K
limit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-16 12:51:58 -04:00
Ilia Mirkin
eb081681df st/mesa: avoid integer overflows with buffers >= 512MB
This fixes failures with the newly-submitted max-size texture buffer
piglit test for GPUs exposing >= 128M max texels.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-16 12:51:58 -04:00
Brian Paul
1aff899a87 mesa: move GL_APPLE_object_purgeable functions to new file
Move this code out of bufferobj.c since it's not strongly connected to
buffer objects.

Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-16 09:02:40 -06:00
Brian Paul
8faed71830 mesa: remove trailing whitespace in bufferobj.c
Trivial.
2015-09-16 08:53:21 -06:00
Brian Paul
edc01c6704 mesa: whitespace, line wrap fixes in varray.c
Trivial.
2015-09-16 08:53:21 -06:00
Rob Clark
aecbc93f2d nir/print: print symbolic names from shader-enum
v2: split out moving of FILE *fp into state structure into its own
(more complete patch) to reduce the noise in this one

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 10:15:35 -04:00
Rob Clark
840df72f93 nir/print: bit of state refactoring
Rename print_var_state to print_state, and stuff FILE ptr into the state
object.  This avoids passing around an extra parameter everywhere.

v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-16 10:15:17 -04:00
Rob Clark
f2533f2f8c glsl: shader-enum to name debug fxns
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 10:04:13 -04:00
Rob Clark
5bb41d9094 freedreno: one screen to rule them all
Similar to fee0686c21, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.

Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 09:14:39 -04:00
Rob Clark
b3958f9f83 freedreno/ir3: use NIR to lower ffract instead of tgsi_lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 08:28:18 -04:00
Rob Clark
d9efe40dc9 nir: add lowering for ffract
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 08:27:36 -04:00
Jordan Justen
47e18a5957 i965/fs: The barrier send uses only 1 payload register
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.

fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.

These issues were flagged by:

commit cadd7dd384
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Jul 2 15:41:02 2015 -0700

    i965/fs: Add a very basic validation pass

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-15 15:41:07 -07:00
Jason Ekstrand
cb503c3227 nir/builder: Use a normal temporary array in nir_channel
C++ gets cranky if we take references of temporaries.  This isn't a problem
yet in master because nir_builder is never used from C++.  However, it will
be in the future so we should fix it now.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 14:51:05 -07:00
Rob Clark
18385bc3ac freedreno/a4xx: more texture formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
d85267c4bb freedreno/a4xx: border-color support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
f8222724f5 freedreno/a4xx: wire up texture clamp lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
9124a49d54 freedreno: helper for a3xx/a4xx border-colors
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:01 -04:00
Rob Clark
76977222af freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 17:29:00 -04:00
Jason Ekstrand
29348631fe nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions.  We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction.  With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
   instructions in affected programs:     143274 -> 141801 (-1.03%)
   helped:                                667
   HURT:                                  0

It turns out that dot-products matter...

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
a88ce0c1c4 i965/vec4: Use the replicated fdot instruction in NIR
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
47739c7df4 nir: Add a fdot instruction that replicates the result to a vec4
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand
2458ea95c5 nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed.  This adds code to try and
coalesce into the destination of the instruction providing the value.

Shader-db results for vec4 shaders on Haswell:

   total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
   instructions in affected programs:     231230 -> 224563 (-2.88%)
   helped:                                1017
   HURT:                                  2

This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>.  Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:07 -07:00
Jason Ekstrand
2b2f1f16a0 nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable.  I think this was a relic
of the way that GLSL IR implements writemasks.  In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing.  In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone.  We can simplify things a good deal (and
make them correct) by removing this "compacting" step.
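
The writemask rule described above can be sketched as follows. The helper is hypothetical and is not part of NIR; it only models the statement that bit i of the writemask selects source i, and that unset bits are skipped without renumbering the remaining sources.

```c
#include <assert.h>

/* Illustrative helper, not a NIR function: collect the source indices that
 * a vecN instruction reads for a given writemask.  Bit i selects source i;
 * unset bits are skipped and nothing is "compacted". */
static unsigned
sources_for_writemask(unsigned writemask, unsigned num_channels,
                      unsigned out_srcs[4])
{
   unsigned count = 0;

   assert(num_channels <= 4);
   for (unsigned i = 0; i < num_channels; i++) {
      if (writemask & (1u << i))
         out_srcs[count++] = i;
   }
   return count;
}
```

For a vec4 with writemask 0xd (binary 1101) this yields sources 0, 2 and 3, matching the example in the message.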

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-15 11:13:48 -07:00
Jason Ekstrand
c951bb8305 i965/vec4_nir: Use partial SSA form rather than full non-SSA
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier.  In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:48 -07:00
Jason Ekstrand
c3f8cde964 nir/lower_vec_to_movs: Handle partially SSA shaders
v2 (Jason Ekstrand):
 - Use nir_instr_rewrite_dest
 - Pass the impl directly into lower_vec_to_movs_block

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:45 -07:00
Jason Ekstrand
b7eeced3c7 nir/lower_vec_to_movs: Pass the shader around directly
Previously, we were passing the shader around; we were just calling it
"mem_ctx".  However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:40 -07:00
Jason Ekstrand
cadd7dd384 i965/fs: Add a very basic validation pass
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's.  We can add more as
we find it to be useful.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-15 11:11:50 -07:00
Jason Ekstrand
0c6df7a1cb i965/fs_surface_builder: Only apply predicate to components that exist
In certain conditions, we have to do bounds-checking in the shader for
image_load_store.  The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds.  However, we were hard-coding 4 components which may not be
correct.  Instead, we should be using size which is the number of
components read.
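
A scalar model of the per-component select may make the fix easier to follow. This is only a sketch under assumptions: the function name is hypothetical and it does not use the i965 builder API; it just shows the loop bound being the number of components read rather than a hard-coded 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative scalar model, not the i965 builder API: after a predicated
 * load, each component that actually exists is replaced by 0 when the
 * access was out of bounds.  Only `size` components are touched. */
static void
apply_bounds_predicate(uint32_t *components, unsigned size, bool in_bounds)
{
   assert(size <= 4);
   for (unsigned c = 0; c < size; c++)
      components[c] = in_bounds ? components[c] : 0;
}
```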

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-15 11:09:48 -07:00
Jason Ekstrand
5182400054 i965/fs: Only read output_components many components when writing an output
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 11:08:12 -07:00
Jason Ekstrand
f55836f567 i965/fs: Set output_components for lowered clip distance outputs
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 11:07:54 -07:00
Nanley Chery
8200793649 mesa/teximage: restrict GL_ETC1_RGB8_OES support to GLES
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:11:14 -07:00
Nanley Chery
48961fa3ba mesa/extensions: restrict GL_OES_EGL_image to GLES
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:00:00 -07:00
Nanley Chery
fe796a1831 mesa/extensions: restrict luminance alpha formats to API_OPENGL_COMPAT
According to the GL 3.1 spec, luminance alpha formats are deprecated.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-09-15 10:00:00 -07:00
Thomas Hellstrom
edfb7ed109 gallium/svga: Enable PIPE_FORMAT_L8_UNORM for vgpu10
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-15 09:25:02 -07:00
Emil Velikov
a1ac742f70 egl/dri2: don't leak the fd on dri2_terminate
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).
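
The pattern described above can be sketched in isolation. The struct and function names are hypothetical stand-ins, not the egl/dri2 code; the point is only that calloc() leaves the field at 0, which is a valid descriptor, so "no fd" has to be marked with -1 and tested with fd >= 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a display structure that may own a device fd. */
struct display {
   int fd;
};

/* calloc() zeroes the struct, and 0 is a valid file descriptor, so the
 * "no fd yet" state is set explicitly to -1. */
static struct display *
display_create(void)
{
   struct display *dpy = calloc(1, sizeof(*dpy));
   if (dpy)
      dpy->fd = -1;
   return dpy;
}

/* Only negative values mean "nothing to close"; fd == 0 must be closed. */
static bool
display_owns_fd(const struct display *dpy)
{
   assert(dpy != NULL);
   return dpy->fd >= 0;
}
```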

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:39:02 +01:00
Emil Velikov
bd5bcb5b8c egl/dri2/drm: compact existing device mgmt
Move the fcntl(dupfd_cloexec) to the else branch where it belongs.
Otherwise it's not immediately obvious that the code is hit only when
an existing device is used.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:37:27 +01:00
Matt Turner
e4f0d26c8c egl/dri2: Close file descriptor on error.
v2: [Emil Velikov]
Rework the error path to a common goto, close only if we own the fd.
v3: [Emil Velikov]
Always close the fd (we either opened the device or dup'd) (Boyan, Ian)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-15 12:37:26 +01:00
Ray Strode
4bf151e662 gbm: convert gbm bo format to fourcc format on dma-buf import
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)

This commit addresses the problem in two ways:

1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created

2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
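
The translation in point 2 can be sketched without pulling in gbm.h. Everything prefixed SKETCH_ below is a local stand-in modelled on the gbm.h definitions (the old-style enum values and the 'XR24'/'AR24' fourcc codes are assumptions of this sketch), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles on its own; real code would use
 * the definitions from gbm.h instead. */
#define SKETCH_FOURCC(a, b, c, d) \
   ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

enum sketch_bo_format {
   SKETCH_BO_FORMAT_XRGB8888 = 0,   /* old-style enum value */
   SKETCH_BO_FORMAT_ARGB8888 = 1,
};

#define SKETCH_FORMAT_XRGB8888 SKETCH_FOURCC('X', 'R', '2', '4')
#define SKETCH_FORMAT_ARGB8888 SKETCH_FOURCC('A', 'R', '2', '4')

/* Map an old-style format to its fourcc equivalent; values that are not
 * old-style enums are assumed to be fourcc codes already and pass through. */
static uint32_t
sketch_bo_format_to_fourcc(uint32_t format)
{
   switch (format) {
   case SKETCH_BO_FORMAT_XRGB8888:
      return SKETCH_FORMAT_XRGB8888;
   case SKETCH_BO_FORMAT_ARGB8888:
      return SKETCH_FORMAT_ARGB8888;
   default:
      return format;
   }
}
```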

Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-15 12:27:45 +01:00
Alejandro Piñeiro
a26e82b81d docs: document INTEL_DEBUG 'optimizer' envvar
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-15 08:33:35 +02:00
Kristian Høgsberg Kristensen
a548c75e31 i965: Move perf_debug code to brw_codegen_*_prog()
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:56:59 -07:00
Kristian Høgsberg Kristensen
84f2ed2cfd i965: Move brw_fs_precompile() to brw_wm.c
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:55:49 -07:00
Kristian Høgsberg Kristensen
dc70c86b9b i965: Move compute shader code around
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:

    brw_vs.c, brw_wm.c brw_gs.c:

        code to drive code generation and implement precompiling and
        cache search.

    genX_<stage>_state.c

        gen specific implementation of the state emission for the shader
        stage.

The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).

To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files.  Finally, move state setup
and atoms to gen7_cs_state.c.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-09-14 16:52:42 -07:00
Anuj Phogat
64e25167ed meta: Abort meta pbo path if TexSubImage needs signed/unsigned conversion
See similar fix for Readpixels in mesa commit 0d20790. Jason suggested
we need that for TexSubImage as well.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 15:22:37 -07:00
Ilia Mirkin
5877a594d5 nvc0/ir: start offset at texBindBase for txq, like regular texturing
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-14 17:26:25 -04:00
Eric Anholt
64aee8fe9f vc4: Fix build from recent NIR cleanups.
2015-09-14 11:21:07 -04:00
Antia Puentes
b8d2263c83 i965/vec4_nir: Load constants as integers
Loads constants using integer as their register type, like it is
done in FS backend.

No shader-db changes in HSW.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91716
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 12:11:46 +02:00
Antia Puentes
79f1a7ae28 i965/vec4: Fix saturation errors when coalescing registers
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.

This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.

For example,
   mov      vgrf7:D vgrf2:D
   mov.sat  m4:F vgrf7:F

is coalesced to:
   mov.sat  m4:D vgrf2:D

The patch prevents coalescing in such a scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
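
Why the two sequences above are not equivalent can be shown with a small model. This is a sketch under assumptions, not the hardware definition: it treats a float-typed mov.sat as "reinterpret the bits as float and clamp to [0, 1]", and a D-to-D mov.sat as leaving an in-range 32-bit integer unchanged. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "mov.sat m4:F vgrf7:F": the register bits are read as a float
 * and clamped to [0, 1]. */
static uint32_t
mov_sat_as_float(uint32_t bits)
{
   float f;

   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));
   if (f < 0.0f)
      f = 0.0f;
   else if (f > 1.0f)
      f = 1.0f;
   memcpy(&bits, &f, sizeof(bits));
   return bits;
}

/* Model of "mov.sat m4:D vgrf2:D": assumed here to leave any 32-bit
 * integer value unchanged, since it already fits the destination type. */
static uint32_t
mov_sat_as_int(uint32_t bits)
{
   return bits;
}
```

For the bit pattern 0x40000000 (2.0 as a float) the first model yields 0x3f800000 (1.0), while the second returns the input unchanged, which is the mismatch the patch avoids.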

Shader-db results in HSW (only vec4 instructions shown):

total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs:     74 -> 75 (1.35%)
helped:                                0
HURT:                                  1
GAINED:                                0
LOST:                                  0

Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-14 12:11:46 +02:00
Tapani Pälli
d1bce52e13 docs: cleanups + mark some work as done
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-14 09:29:30 +03:00
Ilia Mirkin
f0b9d53262 docs: only astc ldr required for ES3.2, not hdr
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-14 02:08:42 -04:00
Ilia Mirkin
67d2d3ba43 st/mesa: emit TXQS, support ARB_shader_texture_image_samples
The image component of the ext is a no-op since there is no image support
in gallium (yet).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 18:24:45 -04:00
Ilia Mirkin
ec3fe42b3a r600g: add support for TXQS tgsi opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-13 18:24:44 -04:00
Ilia Mirkin
4294db90b1 nv50/ir: add support for TXQS tgsi opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-13 18:24:44 -04:00
Ilia Mirkin
f46a53ffa5 gallium: add PIPE_CAP_TGSI_TXQS to let st know if TXQS is supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-09-13 18:24:37 -04:00
Ilia Mirkin
d173c5e77d tgsi: add a TXQS opcode to retrieve the number of texture samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-09-13 18:24:01 -04:00
Jordan Justen
c4cf824658 glsl/cs: Initialize gl_LocalInvocationIndex in main()
We initialize gl_LocalInvocationIndex based on the extension spec
formula:

    gl_LocalInvocationIndex =
        gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
        gl_LocalInvocationID.y * gl_WorkGroupSize.x +
        gl_LocalInvocationID.x;

https://www.opengl.org/registry/specs/ARB/compute_shader.txt
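
As a plain C illustration of the formula above (a sketch, not the GLSL IR that the patch emits; the struct and function names are hypothetical):

```c
#include <assert.h>

/* Hypothetical three-component unsigned vector, standing in for uvec3. */
struct uvec3 {
   unsigned x, y, z;
};

/* Flatten a 3D local invocation ID into a linear index, x varying fastest. */
static unsigned
local_invocation_index(struct uvec3 id, struct uvec3 size)
{
   assert(id.x < size.x && id.y < size.y && id.z < size.z);
   return id.z * size.x * size.y + id.y * size.x + id.x;
}
```

For a 4x2x2 work group, the last invocation (3, 1, 1) maps to 1*4*2 + 1*4 + 3 = 15.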

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:17 -07:00
Jordan Justen
6823e12d5a glsl/cs: Exclude gl_LocalInvocationIndex from builtin variable stripping
We lower gl_LocalInvocationIndex based on the extension spec formula:

    gl_LocalInvocationIndex =
        gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
        gl_LocalInvocationID.y * gl_WorkGroupSize.x +
        gl_LocalInvocationID.x;

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

We need to set this variable in main(), even if gl_LocalInvocationIndex
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate it as a dead variable.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
2b6cc0395b glsl/cs: Initialize gl_GlobalInvocationID in main()
We initialize gl_GlobalInvocationID based on the extension spec
formula:

    gl_GlobalInvocationID =
        gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID

https://www.opengl.org/registry/specs/ARB/compute_shader.txt
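
The same formula written component-wise in C, as a sketch only (the function name and the use of three-element arrays in place of uvec3 are assumptions of this example):

```c
#include <assert.h>

/* Component-wise: global = work_group_id * work_group_size + local_id. */
static void
global_invocation_id(const unsigned group[3], const unsigned size[3],
                     const unsigned local[3], unsigned out[3])
{
   for (unsigned i = 0; i < 3; i++) {
      assert(local[i] < size[i]);
      out[i] = group[i] * size[i] + local[i];
   }
}
```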

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
c4d049f646 glsl: Move link_get_main_function_signature to a common location
Also rename to _mesa_get_main_function_signature.

We will call it near the end of compilation to insert some code into
main for initializing some compute shader global variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
34e187ec38 glsl/cs: Don't strip gl_GlobalInvocationID and dependencies
We lower gl_GlobalInvocationID based on the extension spec formula:

    gl_GlobalInvocationID =
        gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID

https://www.opengl.org/registry/specs/ARB/compute_shader.txt

We need to set this variable in main(), even if gl_GlobalInvocationID
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate these as dead variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
c5743a5d7f i965/nir: Support gl_WorkGroupID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
4e454cb7c6 i965/cs: Initialize gl_WorkGroupID variable from payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
4f178f0d8b nir: Add gl_WorkGroupID system variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
f5bb5a1bf1 glsl/cs: Add gl_WorkGroupID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
49f999b9cb i965/nir: Support gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
43624361df i965/cs: Initialize gl_LocalInvocationID from payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
b94b57f7c5 i965/cs: Initialize gl_LocalInvocationID in push constant data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
c7161a3c35 i965/cs: Reserve local invocation id in payload regs
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-13 09:53:16 -07:00
Jordan Justen
62e011d593 nir: Add gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen
bf8d6e501c glsl/cs: Add gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Krzesimir Nowak
08ceb5e076 softpipe: Change faces type to uint
This is to avoid needless float<->int conversions, since all
face-related computations are made on integers. Spotted by Emil
Velikov.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 09:50:21 -06:00
Rob Clark
59519c2283 freedreno/ir3: fix compile warn after 1807a08e
New enum to add to switch so compiler doesn't complain.

   commit 1807a08e4f
   Author:     Ilia Mirkin <imirkin@alum.mit.edu>
   AuthorDate: Thu Aug 27 23:05:03 2015 -0400
   Commit:     Ilia Mirkin <imirkin@alum.mit.edu>
   CommitDate: Thu Sep 10 17:38:33 2015 -0400

       nir: add nir_texop_texture_samples and convert from glsl

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-13 11:31:45 -04:00
Rob Clark
bf45a7d28e freedreno/ir3: fix compile break after a4aa25be
Following commit dropped the unused memctx arg:

   commit a4aa25be1e
   Author:     Jason Ekstrand <jason.ekstrand@intel.com>
   AuthorDate: Wed Sep 9 13:24:35 2015 -0700
   Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
   CommitDate: Fri Sep 11 09:21:20 2015 -0700

       nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-13 11:31:30 -04:00
Rob Clark
b88aeff4f5 nir: add nir_channel() to get at single components of vec's
Rather than make yet another copy of channel(), let's move it into nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-13 11:08:27 -04:00
Rob Clark
86358e949e tgsi/scan: add support to figure out max nesting depth
Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
know.  And pretty trivial for scan to figure this out for us.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-13 11:08:27 -04:00
Kai Wasserbäch
d6fbcf6ee2 r600: Fix llvm build since const buffer changes
In commit f9caabe8f1:

One place in r600_llvm.c was forgotten when replacing
R600_UCP_CONST_BUFFER with R600_BUFFER_INFO_CONST_BUFFER.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91985
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-09-13 07:09:08 +10:00
Jason Ekstrand
1037e0a84f i965/vec4: Don't reswizzle hardware registers
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91719
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-12 10:46:26 -07:00
Jason Ekstrand
dd7290cf59 i965/emit: Add assertions for accumulator restrictions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-12 10:46:26 -07:00
Emil Velikov
7852a44e3c docs: add news item and link release notes for 11.0.0
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-12 13:50:33 +01:00
Emil Velikov
c34ed46217 docs: add sha256 checksums for 11.0.0
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c4bae5792b)
2015-09-12 13:48:15 +01:00
Emil Velikov
09223bfa9b docs: Update 11.0.0 release notes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4f1e500150)
2015-09-12 13:48:14 +01:00
Glenn Kennard
ce34048b57 r600: Enable fp64 on chips with native support
Cypress/Cayman/Aruba only. Earlier r6xx/r7xx chips only support a subset
of the needed fp64 ops, and don't do GL4 anyway.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 07:32:08 +01:00
Glenn Kennard
d2ca9afd5d r600g: Support I2D/U2D/D2I/D2U
Only for Cypress/Cayman/Aruba; older chips have only partial fp64 support.
Uses float intermediate values so only accurate for int24 range, which
matches what the blob does.
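
The precision limit mentioned above follows from the 24-bit significand of a 32-bit float. A minimal sketch in C, assuming IEEE 754 single precision; the function name is hypothetical and this models only the integer-to-double direction, not the r600 shader code:

```c
#include <assert.h>
#include <stdint.h>

/* Model of an int -> double conversion routed through a float
 * intermediate.  `volatile` forces the value to be rounded to float
 * precision before it is widened. */
static double
int_to_double_via_float(int32_t value)
{
   volatile float intermediate = (float)value;
   return (double)intermediate;
}
```

Values up to 2^24 (16777216) survive the round trip exactly; 16777217 does not, because it is not representable as a float.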

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 07:30:10 +01:00
Dave Airlie
f9caabe8f1 r600g: lower number of driver const buffers
I'm going to want a driver constant buffer for tess to coordinate
LDS storage, so before I go tackling that I decided to merge the
clip/samplepos and texture info buffers into one. So I can steal
the spare one.

This creates a single constant buffer between the two, with
clip/samplepos taking up a reserved 128 bytes at the start.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 06:56:58 +01:00
Dave Airlie
0337a9b2af r600: define some values for the fetch constant offsets.
This just puts these in one place and #defines them.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-12 06:56:51 +01:00
Thomas Helland
2e7e3fe55f docs: Update with GLES3.2 entries and status
V2: -Change to "not started" for most entries
    -Add status for multisample_2d_array
    -Change shader_multisample_interpolation to "not started"

V3 (idr): Move the GLES 3.2 section after the "Additional functions"
section from GLES 3.1.  Note that GL_KHR_texture_compression_astc_hdr is
done for i965 on gen9+ hardware.  Note that GL_OES_shader_io_blocks is
based on some features from GLSL 1.50.

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v2]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-11 18:46:43 -07:00
Krzesimir Nowak
2135aba8d9 softpipe: Constify variables
This commit makes a lot of variables constant - this is basically done
by moving the computation to variable definition. Some of them are
moved into lower scopes (like in img_filter_2d_ewa).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:37:00 -06:00
Krzesimir Nowak
231687c19b softpipe: Constify sp_tgsi_sampler
Add a small inline function doing the casting - this is to make sure
we don't do a cast from some completely unrelated type. This commit
does not make tgsi_sampler parameters const in vfuncs themselves for
now - probably llvmpipe would need looking at before making such a
change.
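
A minimal sketch of what such a casting helper can look like. The struct
and function names below are stand-ins invented for illustration, not
the real softpipe definitions; the point is only that the helper accepts
the base type, so passing a pointer to an unrelated type is diagnosed by
the compiler instead of being hidden by a bare cast.

```c
/* Stand-in for the base sampler type. */
struct tgsi_sampler_base {
   int dummy;
};

/* Stand-in for the derived type; the base must be the first member for
 * the cast below to be valid. */
struct sp_sampler_sketch {
   struct tgsi_sampler_base base;
   int num_views;
};

/* Typed downcast: only a const base pointer is accepted. */
static inline const struct sp_sampler_sketch *
sp_sampler_sketch_const(const struct tgsi_sampler_base *sampler)
{
   return (const struct sp_sampler_sketch *) sampler;
}
```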

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:54 -06:00
Krzesimir Nowak
ac23116de5 softpipe: Constify sampler and view parameters in mip filters
Those functions actually could always take them as constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:47 -06:00
Krzesimir Nowak
ea764baa61 softpipe: Constify sampler and view parameters in img filters
Those functions actually could always take them as constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:43 -06:00
Krzesimir Nowak
ba72e6cfb8 tgsi, softpipe: Constify tgsi_sampler in query_lod vfunc
A followup from the previous commit - since all functions called by
query_lod take pointers to const sp_sampler_view and const sp_sampler,
which are taken from the tgsi_sampler subclass, we can make the
tgsi_sampler itself const now.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:38 -06:00
Krzesimir Nowak
ea0fecd1a3 softpipe: Constify some sampler and view parameters
This is to prepare for making tgsi_sampler parameter in query_lod a
const too. These functions do not modify anything in either sampler or
view anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:32 -06:00
Krzesimir Nowak
4ca2896e8e softpipe: Move the faces array from view to filter_args
With that, sp_sampler_view instances are no longer abused as local
storage, so we can later make them constant.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 15:36:23 -06:00
Jason Ekstrand
ca11c3c0a4 nir/from_ssa: Use instr_rewrite_dest
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
cee29220e3 nir: Add a function for rewriting instruction destinations
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
106a3b2cc3 nir: Only unlink sources that are actually valid
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
a4aa25be1e nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand
8c8fc5f833 nir: Fix a bunch of ralloc parenting errors
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader.  In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.

We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:04 -07:00
Jason Ekstrand
794355e771 nir/lower_outputs_to_temporaries: Reparent the output name
We copy the output, make the old output the temporary, and give the
temporary a new name.  The copy keeps the pointer to the old name.  This
works just fine up until the point where we lower things to SSA and delete
the old variable and, with it, the name.  Instead, we should re-parent to
the copy.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 08:55:51 -07:00
Alejandro Piñeiro
d4e29af234 i965/vec4: check writemask when bailing out at register coalesce
opt_register_coalesce stopped checking previous instructions to
coalesce with if somebody else was writing to the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
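
The idea can be sketched as a plain bitmask test. This is a simplified
illustration with made-up names and mask values, not the actual
opt_register_coalesce code: an earlier write to the same register only
gets in the way if it touches a channel we also want to write.

```c
#include <stdbool.h>

/* One bit per vec4 channel (values are illustrative). */
enum {
   WM_X = 1 << 0,
   WM_Y = 1 << 1,
   WM_Z = 1 << 2,
   WM_W = 1 << 3
};

/* Hypothetical check: two writes conflict only if they target the same
 * register and their writemasks share at least one channel. */
static bool
writes_overlap(unsigned reg_a, unsigned mask_a,
               unsigned reg_b, unsigned mask_b)
{
   return reg_a == reg_b && (mask_a & mask_b) != 0;
}
```

A write to .xy followed by a write to .zw of the same register does not
overlap, so the scan for a coalescing candidate can continue.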

Shader DB results (taking into account only vec4):

total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs:     1238390 -> 1191754 (-3.77%)
helped:                                12782
HURT:                                  0
GAINED:                                0
LOST:                                  0

v2: removed some parentheses, fixed indentation, as suggested by
    Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-11 17:43:22 +02:00
Brian Paul
2c52c794d7 tgsi,softpipe: capitalize the tgsi_sampler_control enum values
We use capitalized enum values everywhere else.
This improves understanding a bit too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-11 08:50:10 -06:00
Kenneth Graunke
b811085b79 nir: Store some geometry shader data in nir_shader.
This makes it possible for NIR shaders to know the number of output
vertices and the number of invocations.  Drivers could also access
these directly without going through gl_program.

We should probably add InputType and OutputType here too, but currently
those are stored as GL_* enums, and I wanted to avoid using those in
NIR, as I suspect Vulkan/SPIR-V will use different enums.  (We should
probably make our own.)

We could add VerticesIn, but it's easily computable from the input
topology, so I'm not sure whether it's worth it.  It's also currently
not stored in gl_shader (only gl_shader_program), which would require
changes to the glsl_to_nir interface or require us to store it there.

This is a bit of duplication of data...ideally, we would factor these
substructs out of gl_program, gl_shader_program, and nir_shader, creating
a gl_geometry_info class...but it would need to go in a new place (in
src/glsl?) that isn't mtypes.h nor nir.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:05:09 -07:00
Kenneth Graunke
cb2b118e40 nir/builder: Add nir_load_var() and nir_store_var() helpers.
These provide a convenient way to do simple variable loads and stores.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:04:17 -07:00
Kenneth Graunke
4654439fdd glsl: Use hash tables for opt_constant_propagation() kill sets.
Cuts compile/link time of the fragment shader in #91857 by 19%
(16.28 -> 13.05).

I didn't bother with the acp sets because they're smaller, but it
might be worth doing as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Kenneth Graunke
e20f30eb51 i965: Use hash tables for brw_fs_vector_splitting().
Cuts compile/link time of the fragment shader in #91857 by 25%
(21.64 -> 16.28).

v2: Drop unnecessary _mesa_hash_table_destroy call, and use
    refs.ht->entries == 0 rather than ad-hoc checking (suggested by
    Timothy Arceri).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Kenneth Graunke
2fc0ce293a glsl: Use hash tables in opt_constant_variable().
Cuts compile/link time of the fragment shader in bug #91857 by 31%
(31.79 -> 21.64).  It has over 8,000 variables so linked lists are
terrible.
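
To show why the data structure matters (a toy sketch only; it does not
use Mesa's hash table utilities, and every name in it is invented): a
linked-list lookup walks up to N entries per query, while a hashed
pointer set usually finds its slot in a handful of probes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SET_SLOTS 1024   /* power of two; fixed size keeps the sketch short */

struct ptr_set {
   const void *slot[SET_SLOTS];   /* NULL marks an empty slot */
};

static size_t
hash_ptr(const void *p)
{
   /* Drop the low alignment bits, then scramble with a multiplier. */
   return (size_t) (((uintptr_t) p >> 4) * 2654435761u) & (SET_SLOTS - 1);
}

/* Returns true if the key was already present; inserts it otherwise.
 * Assumes the table never fills up (this sketch does no resizing). */
static bool
ptr_set_test_and_add(struct ptr_set *set, const void *key)
{
   size_t i = hash_ptr(key);

   while (set->slot[i] != NULL) {
      if (set->slot[i] == key)
         return true;
      i = (i + 1) & (SET_SLOTS - 1);   /* linear probing */
   }
   set->slot[i] = key;
   return false;
}
```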

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91857
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-11 00:01:24 -07:00
Ian Romanick
4603723722 meta: Use result of texture coordinate clamping operation
Previously the result of the complicated clamp() expression was just dropped
on the floor: clamp does not modify any of its parameters.  Looking at
the surrounding code, I believe this is supposed to modify the value of
tex_coord.
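
The class of bug, sketched in C rather than in the GLSL that meta
generates (function names here are hypothetical stand-ins): a clamp
helper returns its result and leaves its arguments alone, so calling it
as a bare statement has no effect.

```c
/* Scalar stand-in for GLSL clamp(): returns the clamped value and does
 * not modify its arguments. */
static float
clampf(float x, float lo, float hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

static float
clamp_coord(float tex_coord, float lo, float hi)
{
   /* Writing just `clampf(tex_coord, lo, hi);` here would compute the
    * value and discard it; the result has to be assigned. */
   tex_coord = clampf(tex_coord, lo, hi);
   return tex_coord;
}
```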

This change (along with a change to avoid the use of
brw_blorp_framebuffer) does not affect any existing piglit tests.  I'm
not sure what this clamp is trying to accomplish, so I'm not sure how to
write a test to exercise this path.

I also noticed another bug in this code.  There is no way the array
texture case could possibly work.  This will generate code for the
TEXEL_FETCH macro like:

    #define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord), sample_map[int(2 * fract(coord.x))]);

Since the coord parameter of this macro is a vec2 at all invocations, no
expansion of this macro will even compile.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
767c33e881 meta: Always bind the texture
We may have been called from glGenerateTextureMipmap with CurrentUnit
still set to 0, so we don't know when we can skip binding the texture.
Assume that _mesa_BindTexture will be fast if we're rebinding the same
texture.

v2: Remove currentTexUnitSave because it is now unused.  Suggested by
both Neil and Anuj.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91847
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
86c0a2d574 i915, i965: Silence unused parameter warnings in intel_batchbuffer_advance
These only occurred in release builds, but they occurred in every file
that included intel_batchbuffer.h.  Lots of spam. :(

intel_batchbuffer.h: In function 'intel_batchbuffer_advance':
intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_batchbuffer_advance(struct brw_context *brw)
                                               ^
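
One common way to silence this kind of warning, shown as a generic
sketch and not necessarily the exact change made here. It assumes the
parameter is only referenced by debug-only code; the struct and function
names are invented for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver context. */
struct ctx {
   int emitted;
   int reserved;
};

static inline void
batch_advance(struct ctx *c)
{
#ifndef NDEBUG
   /* Debug builds use the parameter for a sanity check. */
   assert(c->emitted <= c->reserved);
#else
   /* Release builds have no other use for it; the cast to void marks it
    * as intentionally unused and keeps -Wunused-parameter quiet. */
   (void) c;
#endif
}
```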

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
307d5e5849 i915: Silence unused parameter warning in intel_miptree_create_layout
The for_bo parameter of intel_miptree_create_layout appears to be unused
since 27eedca when Eric removed some Gen5 code (after the i915 and i965
drivers parted ways).

intel_mipmap_tree.c: In function 'old_intel_miptree_create_layout':
intel_mipmap_tree.c:77:35: warning: unused parameter 'for_bo' [-Wunused-parameter]
                             bool for_bo)
                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
5c8aa21309 i915, i965: Silence unused parameter warnings in intel_miptree_unmap_gtt
intel_mipmap_tree.c: In function 'intel_miptree_unmap_gtt':
intel_mipmap_tree.c:777:34: warning: unused parameter 'map' [-Wunused-parameter]
    struct intel_miptree_map *map,
                                  ^
intel_mipmap_tree.c:778:17: warning: unused parameter 'level' [-Wunused-parameter]
    unsigned int level,
                 ^
intel_mipmap_tree.c:779:17: warning: unused parameter 'slice' [-Wunused-parameter]
    unsigned int slice)
                 ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
0412231266 i915: Silence unused parameter warnings
intel_mipmap_tree.c: In function 'old_intel_miptree_unmap_raw':
intel_mipmap_tree.c:726:51: warning: unused parameter 'intel' [-Wunused-parameter]
 intel_miptree_unmap_raw(struct intel_context *intel,
                                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
20915dd2e0 i915: Remove prototype for nonexistent brw_miptree_layout
Hasn't existed in the i915 source since the i915 and i965 drivers parted
ways.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
31f0967fb5 i965: Make intel_miptree_map_raw static
This hasn't been used outside intel_mipmap_tree.c since d5d4ba9 started
using meta instead of the blitter for PBO TexSubImage.  While we're
here, remove the unused brw parameter from the function formerly known
as intel_miptree_unmap_raw.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
68b44dd5b2 i915, i965: Silence unused parameter warnings in intel_mipmap_tree.h
These only occurred in release builds, but they occurred in every file
that included intel_mipmap_tree.h.  Lots of spam. :(

intel_mipmap_tree.h: In function 'intel_miptree_check_level_layer':
intel_mipmap_tree.h:595:59: warning: unused parameter 'mt' [-Wunused-parameter]
 intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
                                                           ^
intel_mipmap_tree.h:596:42: warning: unused parameter 'level' [-Wunused-parameter]
                                 uint32_t level,
                                          ^
intel_mipmap_tree.h:597:42: warning: unused parameter 'layer' [-Wunused-parameter]
                                 uint32_t layer)
                                          ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:51 -07:00
Ian Romanick
094877f9d2 i965: Silence unused parameter warnings in intel_mipmap_tree.c
The target parameter of compute_msaa_layout appears to be unused since
83b83fb when support for CMS textures was added for Gen7.

The brw parameter of intel_get_non_msrt_mcs_alignment appears to be
unused since e92fbdc when the GEN check (along with the "can we fast
clear" decision) was moved to a different function.

intel_mipmap_tree.c: In function 'compute_msaa_layout':
intel_mipmap_tree.c:62:73: warning: unused parameter 'target' [-Wunused-parameter]
 compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
                                                                         ^
intel_mipmap_tree.c: In function 'intel_get_non_msrt_mcs_alignment':
intel_mipmap_tree.c:143:54: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
                                                      ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
2015-09-10 20:29:50 -07:00
Ian Romanick
38e412d548 i965: Silence unused parameter warnings in intel_fbo.c
intel_fbo.c: In function 'intel_alloc_window_storage':
intel_fbo.c:415:48: warning: unused parameter 'ctx' [-Wunused-parameter]
 intel_alloc_window_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
                                                ^
intel_fbo.c: In function 'intel_nop_alloc_storage':
intel_fbo.c:428:74: warning: unused parameter 'rb' [-Wunused-parameter]
 intel_nop_alloc_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
                                                                          ^
intel_fbo.c:429:32: warning: unused parameter 'internalFormat' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                ^
intel_fbo.c:429:55: warning: unused parameter 'width' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                                       ^
intel_fbo.c:429:69: warning: unused parameter 'height' [-Wunused-parameter]
                         GLenum internalFormat, GLuint width, GLuint height)
                                                                     ^
intel_fbo.c: In function 'intel_blit_framebuffer_with_blitter':
intel_fbo.c:790:61: warning: unused parameter 'filter' [-Wunused-parameter]
                                     GLbitfield mask, GLenum filter)
                                                             ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-10 20:29:50 -07:00
Dave Airlie
b46cbc3607 st/mesa: set the vbuffer to NULL if we are skipping it
If we skip a vbuffer we need to make sure we NULL out
the contents, otherwise when it gets passed to the driver
it will get confused.

This was hit by:
GL41-CTS.gpu_shader_fp64.varyings

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-11 03:05:42 +01:00
Jordan Justen
34cff76fc2 i965/cs: Enable barrier in MEDIA_INTERFACE_DESCRIPTOR
Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
barrier() GLSL function.

On Ivy Bridge and Haswell, this allows the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass. On gen8, this enables a similar test with a local group size
of 896 to pass.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
b01d047391 i965/cs: Emit texture surfaces to enable CS sampling
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
1180b79487 i965: Set up sampler state for compute shaders
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
af48612b88 i965/fs: Set first_non_payload_grf in assign_curb_setup
first_non_payload_grf may be updated in assign_urb_setup for FS or
assign_vs_urb_setup for VS.

We need to set this in assign_curb_setup for compute shaders since cs
does not have an assign_cs_urb_setup like assign_urb_setup (fs) or
assign_vs_urb_setup (vs).

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
75d04e561b i965: Support compute shaders in is_scalar_shader_stage()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Jordan Justen
2b9c35945a i965: Support CS in update_stage_texture_surfaces
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-10 16:46:29 -07:00
Ilia Mirkin
bfc5ace5bd i965: enable ARB_shader_texture_image_samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:39:46 -04:00
Ilia Mirkin
55ebaa6d00 i965: add handling for imageSamples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:55 -04:00
Ilia Mirkin
56238305e5 nir: convert glsl imageSamples into a new intrinsic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:52 -04:00
Ilia Mirkin
37c5c86281 glsl: add support for the imageSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:49 -04:00
Ilia Mirkin
0b91bcea98 i965: add support for textureSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:45 -04:00
Ilia Mirkin
0c7fbcb844 glsl: add support for the textureSamples function
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:41 -04:00
Ilia Mirkin
fb18ee9ba6 glsl: add ARB_shader_texture_image_samples infrastructure
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:37 -04:00
Ilia Mirkin
1807a08e4f nir: add nir_texop_texture_samples and convert from glsl
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:33 -04:00
Ilia Mirkin
f9052914e9 glsl: add ir_texture_samples texture opcode
Will be used for textureSamples()

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:29 -04:00
Ilia Mirkin
6efae687b7 mesa: add infra for ARB_shader_texture_image_samples
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:37:05 -04:00
Ian Romanick
284dcad20a i965: Fix typos in license
grep -lr 'sub license' | while read f; do \
    sed --in-place -e 's/sub license/sublicense/' $f ;\
    done

grep -lr 'NON-INFRINGEMENT' | while read f; do \
    sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\
    done

As noted by Matt, both of these changes match the MIT license text found
at http://opensource.org/licenses/MIT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-10 11:36:30 -07:00
Ian Romanick
aa1a5c0c9e i965: Remove horizontal bars from file header comments
Why was that ever a thing?

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-09-10 11:36:03 -07:00
Brian Paul
a9b143a648 svga: clean up the compile_vs/gs/fs() functions
Simplify structure and remove gotos.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
289804515f svga: fix shader variant memory leak
Fixes a small leak in a seldom-hit corner case for VS/FS compilation.
Found with coverity.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
ece33f9687 svga: remove useless MAX2() call
The sum of two unsigned ints is always >= 0.  Found with Coverity.
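
A small standalone sketch of why the call is redundant. The MAX2
definition below is a local stand-in written for this example, and the
function names are made up: with unsigned operands the sum can wrap
around, but it can never be negative, so taking the maximum with 0
returns the sum unchanged.

```c
/* Local stand-in for a MAX2-style macro. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

static unsigned
total_with_max(unsigned a, unsigned b)
{
   return MAX2(a + b, 0);   /* the comparison against 0 is a no-op */
}

static unsigned
total_plain(unsigned a, unsigned b)
{
   return a + b;
}
```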

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Brian Paul
bc75fe214d winsys/svga: remove useless assertion
An unsigned int is always >= 0.  Found with Coverity.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-10 12:23:46 -06:00
Emil Velikov
9de62819c9 docs: add news item and link release notes for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 19:12:38 +01:00
Emil Velikov
ded289e348 docs: add sha256 checksums for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8789dd627c)
2015-09-10 19:10:58 +01:00
Emil Velikov
e3c5aeee71 docs: add release notes for 10.6.7
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 32efdc87cb)
2015-09-10 19:10:57 +01:00
Krzesimir Nowak
423a1dca2f docs: Update wrt. textureQueryLod on softpipe
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
60905f2b19 softpipe: Implement and enable textureQueryLod
Passes the shader piglit tests and introduces no regressions.

This commit finally makes use of the refactoring in previous
commits.

v2:
  - adapted the code to changes in previous commits (renames,
    need_cube_convert stuff)
  - split overly long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
263d4a7406 tgsi: Add code for handling lodq opcode
This introduces a new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.

v2:
  - split overly long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
d71a3be860 softpipe: Add functions for computing relative mipmap level
These functions will be used by textureQueryLod.

v2:

  - renamed mip_level_* funcs to mip_rel_level_* to indicate that
    these functions return mip level relative to base level and
    documented them
  - renamed a level member in sp_filter_funcs struct to relative_level
  - changed mip_rel_level_none and mip_rel_level_nearest to return mip
    level relative to base level, mip_rel_level_linear already did
    that
  - documented clamp_lod function

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
ac3637dda0 softpipe: Split 3D to 2D coords conversion into separate function
This is to avoid tying the conversion to the sampling -
textureQueryLod will need to do the conversion too, but it does not do
any sampling.

So instead of a "get_samples" vfunc, there is just a bool saying
whether the conversion is needed or not. This solution keeps a nice
property of not adding any overhead for the common case (2D textures).

v2:
  - replaced the "convert_coords" vfunc with a "need_cube_convert"
    boolean to avoid overhead of copying arrays in common case
  - removed an unused typedef
  - split overly long lines in convert_cube
  - const fixes in convert_cube

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
380a3c0804 softpipe: Split code getting a filter into separate function
This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.

v2:
  - adapted to changes in previous commit (renames)
  - simplified conditions a bit
  - updated docs
  - split overly long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
b9bc6c42c9 softpipe: Put mip_filter_func inside a struct
Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning relative mip level.

v2:
  - renamed sp_mip struct to sp_filter_funcs
  - renamed sp_filter_funcs instances from mip_foo to funcs_foo
  - split overly long lines
  - sp_sampler now holds a pointer to sp_filter_funcs instead of an
    instance of it
  - some const fixes

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
16084cd2cf softpipe: Split compute_lambda_lod into two functions
textureQueryLod returns a vec2 with mipmap information and a
LOD. The latter must not be clamped.
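
Roughly, the split looks like the sketch below. The function names,
parameters and the way the bias is combined are assumptions made for
illustration, not the real softpipe signatures: one function produces
the raw LOD, and a second one applies the clamp for callers that need
it.

```c
/* Hypothetical: raw LOD, as textureQueryLod would want to report it. */
static float
compute_lod_unclamped(float lambda, float lod_bias)
{
   return lambda + lod_bias;
}

/* Hypothetical: the same value clamped to [min_lod, max_lod], as the
 * sampling path would use it. */
static float
compute_lod_clamped(float lambda, float lod_bias,
                    float min_lod, float max_lod)
{
   float lod = compute_lod_unclamped(lambda, lod_bias);

   if (lod < min_lod)
      return min_lod;
   if (lod > max_lod)
      return max_lod;
   return lod;
}
```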

v2:
  - changed the "not_clamped" part to "unclamped"
  - corrected "clamp into" to "clamp to"
  - split overly long lines

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:14 -06:00
Krzesimir Nowak
bdc69552ca softpipe: Fix textureLod with nonzero GL_TEXTURE_LOD_BIAS value
The level-of-detail bias simply wasn't added in the explicit LOD case.
This case seems to be tested only in piglit's
fs-texturequerylod-nearest-biased test, which is currently skipped, as
softpipe does not support textureQueryLod at the moment.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:13 -06:00
Krzesimir Nowak
85500fe2e1 tgsi: Remove trailing backslash in comment
It clearly is here by accident.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-10 09:45:13 -06:00
Marek Olšák
b409524fef gallium/radeon: handle PIPE_TRANSFER_FLUSH_EXPLICIT
Basically, do the same thing as for buffer_unmap, but use the explicit range
instead. It's for apps which want to map a whole buffer and mark touched
ranges explicitly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
60ec8fb448 radeonsi: don't update polygon offset state if it has no effect
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
afa752d3f0 radeonsi: decrease the size of si_pm4_state
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
6a684ff67e radeonsi/compute: add buffers to the CS directly
Packets are emitted immediately anyway.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
2176b3b09f radeonsi: only use new versions of LLVM image and sample intrinsics
Just a cleanup I had made a long time ago and forgot about.

v2: use tgsi_is_shadow_target

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
e6d3846dd0 gallium/radeon: drop support for LLVM 3.4
This allows using the new tex intrinsics unconditionally.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
5fbfd8dd23 r600/llvm: remove dead code for LLVM 3.3
LLVM 3.3 has been unsupported for quite a while.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
5c6c5b5246 r600g: use pipe_resource::width0 instead of pb_buffer::size
pb_buffer::size was aligned by 29aaab2b5f,
which broke the CMASK code I think.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91881

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
7956eae1c7 radeonsi: enable VGPR spilling on VI
This fixes corruption in Unigine Heaven on VI

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-10 17:14:15 +02:00
Marek Olšák
c6502e880b winsys/amdgpu: calculate the maximum number of compute units
Required for register spilling.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-10 17:14:15 +02:00
Jon TURNEY
adeba943e1 Use IMP_LIB_EXT when checking for LLVM shared libraries
When checking for LLVM shared libraries, use IMP_LIB_EXT for the extension for
shared libraries appropriate to the target, rather than hardcoding '.so'

Also add some comments to explain why we have this circus of pain.

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 15:09:30 +01:00
Rhys Kidd
2c3007652d i965: Resolve GCC sign-compare warning.
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_control_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:805:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) {
                      ^
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_source_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:839:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) {
                      ^
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c: In function 'dump_sampler_state':
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c:382:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (i = 0; i < size / 16; i++) {
                  ^
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c: In function 'brw_pipeline_state_finished':
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:801:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (i != pipeline) {
             ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen7_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1544:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                               ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen8_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1638:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                            ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_hiz':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1771:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int level = mt->first_level; level <= mt->last_level; ++level) {
                                            ^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1775:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int layer = 0; layer < mt->level[level].depth; ++layer) {
                                 ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
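The warnings listed in this commit all come from comparing a signed value with an unsigned one. The following is a minimal sketch of the pattern and of one common fix (an unsigned loop counter). `ARRAY_SIZE` is redefined locally, and `table` and `find_index` are hypothetical names for illustration; this is not the actual i965 change.

```c
#include <stddef.h>

/* Local stand-in for an ARRAY_SIZE macro; the result is a size_t (unsigned). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical lookup table, for illustration only. */
static const unsigned table[] = { 3, 5, 8, 13 };

/* Returns the index of value in table, or -1 if it is absent.
 * Declaring i as unsigned matches the unsigned type of ARRAY_SIZE(), so
 * -Wsign-compare has nothing to report; "int i" would trigger the warning. */
int find_index(unsigned value)
{
   for (unsigned i = 0; i < ARRAY_SIZE(table); i++) {
      if (table[i] == value)
         return (int)i;
   }
   return -1;
}
```

Whether to change the counter type or cast the bound depends on the surrounding code; the sketch only shows the first option.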
Rhys Kidd
1c194840fd mesa: Resolve GCC sign-compare warning.
mesa/src/mesa/program/prog_to_nir.c: In function 'setup_registers_and_variables':
/mesa/src/mesa/program/prog_to_nir.c:1059:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < c->prog->NumTemporaries; i++) {
                      ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Rhys Kidd
32cdb49fe2 glsl: Resolve GCC sign-compare warning.
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:63:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = 0; i < tex->num_srcs; i++) {
                         ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:114:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = proj_index + 1; i < tex->num_srcs; i++) {
                                      ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:53:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (proj_index = 0; proj_index < tex->num_srcs; proj_index++) {
                                       ^
mesa/src/glsl/nir/nir_lower_tex_projector.c:57:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (proj_index == tex->num_srcs)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < num_components; ++i)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:110:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          for (int i = 0; i < num_components; ++i) {
                            ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:139:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             if (i < num_components)
                   ^
mesa/src/glsl/nir/nir_opt_peephole_ffma.c: In function 'get_mul_for_src':
mesa/src/glsl/nir/nir_opt_peephole_ffma.c:130:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (unsigned i = 0; i < num_components; i++)
                           ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Rhys Kidd
548bf70fd2 mesa: Resolve GCC missing field initializer warning.
Resolve a series of missing field initializer warnings within get_hash_params.py

Of the form:
In file included from mesa/src/mesa/main/get.c:495:0:
mesa/src/mesa/main/get_hash.h:180:5: warning: missing initializer for field
'extra' of 'const struct value_desc' [-Wmissing-field-initializers]
     { GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES, LOC_CUSTOM, TYPE_INT, 0 },
     ^
mesa/src/mesa/main/get.c:165:15: note: 'extra' declared here
    const int *extra;
               ^

This patch addresses some likely code rot around the *extra field, where the
initialization is via C code generated indirectly from a Python script.
It resolves a number of warnings reported by GCC when configured to be pedantic.

$ gcc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2

No piglit regressions on Ironlake.

v2:
- Squash series into a single patch.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
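As a rough illustration of the warning quoted above: leaving out a trailing struct field is valid C, but GCC flags it under -Wmissing-field-initializers. The sketch below uses a simplified, hypothetical descriptor (`value_desc_sketch`) and a placeholder enum value; it is not the generated code from get_hash_params.py.

```c
#include <stddef.h>

/* Hypothetical, simplified descriptor; the real struct value_desc in get.c
 * is not reproduced here. */
struct value_desc_sketch {
   unsigned pname;
   int location;
   int type;
   int offset;
   const int *extra;
};

/* Omitting the last field would still zero-initialize it, but GCC warns
 * about the missing initializer.  Spelling out NULL silences the warning
 * without changing the resulting value. */
const struct value_desc_sketch example_desc = {
   0x1234, /* placeholder enum value, assumed for illustration */
   0,
   0,
   0,
   NULL,
};
```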
Albert Freeman
1691ead1b8 clover: Avoid using typename to allow compilation of clover by clang
When parsing a variable declaration qualified with the typename
keyword, clang attempted to declare a variable with the type of non-type
member "enum type type" of module::argument (within the header
file clover/core/module.hpp) instead of the typed member of
module::argument "enum type".

Replaced "typename" with "enum" to force clang to declare the variable
marg_type with type "enum type" of module::argument.

CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Albert Freeman <albertwdfreeman@gmail.com>
2015-09-10 14:56:40 +01:00
Kenneth Graunke
bf58a2c362 i965: Advertise 65536 for GL_MAX_UNIFORM_BLOCK_SIZE.
Our old value of 16384 is the minimum value.  DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise.  AMD advertises MAX_INT.

Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit.  Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders.  While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.

We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger.  For now, matching
nVidia and Intel/Windows seems like a good plan.

We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.

According to Tapani, there's an unreleased game that asserts on this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 02:26:26 -07:00
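The reason the combined limit goes stale can be sketched as follows, assuming it is derived as the default-block component count plus one block's worth of components per uniform block (four bytes per component). The struct and function names are hypothetical, and the numbers in the usage note are illustrative rather than real i965 limits.

```c
/* Hypothetical, trimmed-down per-stage limits; field names are illustrative. */
struct stage_limits {
   unsigned max_uniform_components; /* components in the default block */
   unsigned max_uniform_blocks;     /* uniform blocks available to the stage */
};

/* Each uniform component is assumed to be 4 bytes, so a block of
 * max_uniform_block_size bytes holds max_uniform_block_size / 4 components.
 * The combined limit therefore depends on the block size and must be
 * recomputed whenever the block size changes. */
unsigned combined_uniform_components(const struct stage_limits *s,
                                     unsigned max_uniform_block_size)
{
   return s->max_uniform_components +
          s->max_uniform_blocks * (max_uniform_block_size / 4);
}
```

With illustrative limits of 1024 components and 12 blocks, raising the block size from 16384 to 65536 moves the combined value from 50176 to 197632, which is why a value computed before the change cannot be reused.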
Ilia Mirkin
74b86b971f nv50/ir: don't fold immediate into mad if registers are too high
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 05:03:24 -04:00
Ilia Mirkin
ce28ca7133 nv50/ir: fix emission of 8-byte wide interp instruction
This can come up if the target register number is > 63, which is fairly
rare.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91551
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 04:30:45 -04:00
Ilia Mirkin
641eda0c79 nv50/ir: r63 is only 0 if we are using less than 63 registers
It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 04:30:45 -04:00
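The choice described above can be sketched as a single decision. This assumes the register allocator exposes the highest register index it assigned (the `max_gpr` parameter is hypothetical) and that the cutoff is exactly 63, as the subject line suggests; the real nv50_ir code may express the condition differently.

```c
/* Picks the register number that stands in for the constant zero.
 * max_gpr is the highest register index assigned by register allocation
 * (hypothetical parameter).  r63 fits the shorter encoding, but it only
 * reads as zero while the program stays below 63 registers; otherwise the
 * sketch falls back to r127. */
int zero_register(int max_gpr)
{
   return (max_gpr < 63) ? 63 : 127;
}
```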
Ilia Mirkin
a072ef8748 nv50/ir: make edge splitting fix up phi node sources
Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so
the mapping between source and the actual BB is by inbound edge order.
So when manipulating edges one has to be extremely careful. We were
insufficiently careful when splitting critical edges which resulted in
the phi nodes being confused as to where their sources were coming from.

This primarily manifests itself with the TXL-lowering logic on nv50,
when it is inside of a conditional. I've been unable to trigger the
issue anywhere else so far. This resolves rendering failures
in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2,
XCOM:Enemy Unknown, Stacking. It also improves the situation in
Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief.
However more work needs to be done there (splitting a lot more edges
solves it, so it's some other sort of RA-related issue).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-10 03:11:31 -04:00
Ian Romanick
13a974f9ae glsl: Remove ADD_VARYING macro
The purpose of the macro was to create the name_as_gs_input from name.
The previous commit removed the name_as_gs_input from add_varying, so
the macro is unnecessary.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-09 19:15:15 -07:00
Ian Romanick
bd0245b8b2 glsl: Silence unused parameter warnings
builtin_variables.cpp:1062:53: warning: unused parameter 'name_as_gs_input' [-Wunused-parameter]
                                         const char *name_as_gs_input)
                                                     ^
builtin_functions.cpp:4774:47: warning: unused parameter 'intrinsic_name' [-Wunused-parameter]
                                   const char *intrinsic_name,
                                               ^
builtin_functions.cpp:4907:66: warning: unused parameter 'state' [-Wunused-parameter]
 _mesa_glsl_find_builtin_function_by_name(_mesa_glsl_parse_state *state,
                                                                  ^
builtin_functions.cpp:4915:49: warning: unused parameter 'num_arguments' [-Wunused-parameter]
                                        unsigned num_arguments,
                                                 ^
builtin_functions.cpp:4916:49: warning: unused parameter 'flags' [-Wunused-parameter]
                                        unsigned flags)
                                                 ^
ir_print_visitor.cpp:589:37: warning: unused parameter 'ir' [-Wunused-parameter]
 ir_print_visitor::visit(ir_barrier *ir)
                                     ^
linker.cpp:3212:48: warning: unused parameter 'ctx' [-Wunused-parameter]
 build_program_resource_list(struct gl_context *ctx,
                                                ^
standalone_scaffolding.cpp:65:57: warning: unused parameter ‘id’ [-Wunused-parameter]
 _mesa_shader_debug(struct gl_context *, GLenum, GLuint *id,
                                                         ^

v2: Rebase on top of GL_ARB_shader_image_size work (especially
58a86897).  Silence more warnings added by that work.

v3: Remove mention of the removed parameter from comments.  Suggested by
Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "Martin Peres <martin.peres@linux.intel.com>"
2015-09-09 19:15:15 -07:00
Ilia Mirkin
342e68dc60 nvc0: remove BGRA4 format support
Something is wrong with the support somewhere. I couldn't get the blob
driver to use it either, although it happily used RGB5_A1.
teximage-colors works, but WoW seems to fail in the menus for drawing
text.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91526
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 21:54:47 -04:00
Rob Clark
9ce2e30726 gallium/ttn: fix cursor handling vs builder
After inserting instructions the cursor.option becomes _after_instr
(even if it started life as an _after_block).  So we cannot simply stash
the current cursor on the if/loop_stack.  Otherwise we end up inserting
instructions after the endif/endloop in the block preceding the if/
loop.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-09 17:34:47 -04:00
Ilia Mirkin
e50c01d5af nvc0: keep track of cb bindings per buffer, use for upload settings
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.

So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.

This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 16:29:21 -04:00
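The lookup described above (find a binding that encompasses the region being updated, otherwise fall back to a regular upload) could look roughly like this. `cb_binding` and `find_enclosing_binding` are hypothetical names; the actual nvc0 data structures are not shown in the commit message.

```c
#include <stddef.h>

/* Hypothetical record of one place where a constant buffer is bound. */
struct cb_binding {
   unsigned offset; /* start of the bound range within the buffer, in bytes */
   unsigned size;   /* length of the bound range, in bytes */
};

/* Returns the first binding whose range fully contains
 * [offset, offset + size), or NULL if there is none.  In the NULL case the
 * caller would upload the data as a regular data update. */
const struct cb_binding *
find_enclosing_binding(const struct cb_binding *bindings, size_t count,
                       unsigned offset, unsigned size)
{
   for (size_t i = 0; i < count; i++) {
      const struct cb_binding *b = &bindings[i];
      if (offset >= b->offset && offset + size <= b->offset + b->size)
         return b;
   }
   return NULL;
}
```

A partially overlapping binding is deliberately treated as no match here, since the message only mentions bindings that encompass the whole region.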
Jason Ekstrand
b828f7a27b nir/glsl: Use lower_outputs_to_temporaries instead of relying on GLSL IR
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:38 -07:00
Jason Ekstrand
1dbe4af9c9 nir: Add a pass to lower outputs to temporary variables
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:21 -07:00
Jason Ekstrand
f5e08ab6b1 nir/cursor: Add a constructor for the end of a block but before the jump
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:28:51 -07:00
Hans de Goede
3e9df0e3af nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards, resulting in dmesg saying:

 [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
 [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
 [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12

Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.

To work around this we do not allow msaa visuals by default and allow
the user to override this via NV30_MAX_MSAA.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 12:10:20 -04:00
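An opt-in of this kind might be read as in the sketch below. It uses plain getenv/strtol rather than any Mesa helper, and the function names and the upper bound of 64 are assumptions made for the example, not values taken from the driver.

```c
#include <stdlib.h>

/* Parses the value of the environment variable.  NULL or unparsable input
 * yields 0, which a caller would treat as "MSAA disabled".  The upper bound
 * of 64 is an arbitrary sanity cap chosen for this sketch. */
int parse_max_msaa(const char *value)
{
   if (value == NULL)
      return 0;
   long n = strtol(value, NULL, 10);
   return (n > 0 && n <= 64) ? (int)n : 0;
}

/* Looks the variable up; per the note in the commit message, the result
 * would be stored on the screen so the lookup happens only once. */
int query_max_msaa(void)
{
   return parse_max_msaa(getenv("NV30_MAX_MSAA"));
}
```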
Hans de Goede
ac066bf65c nv30: Fix color resolving for nv3x cards
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.

This commit divides the source and dest surfaces into tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-09 11:57:34 -04:00
Rob Clark
30a915bd17 gallium/docs: clarify dmabuf fd ownership
Since debugging issues w/ fd's close()d at the wrong time can be quite
fun, this should probably be made more explicit in the docs.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-09 11:24:56 -04:00
Mauro Rossi
c12ffb30b4 android: radeonsi: add support for sid_tables.h generated sources
This patch is necessary to avoid a build error on android
due to the missing sid_tables.h generated sources.

v2:[Emil Velikov] Correctly split the lists.

Fixes: fbbebeae10f(radeonsi: inline si_cmd_context_control)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:27:31 +01:00
Mauro Rossi
8056b3ffeb android: Always define __STDC_LIMIT_MACROS.
Analogous to commit 02a4fe22b1 (configure.ac: Always define
__STDC_LIMIT_MACROS.)

v2: [Emil Velikov] keep the LLVM specific __STDC_FORMAT_MACROS

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:26:46 +01:00
Mauro Rossi
5235bfe7b7 android: rename LLVM_VERSION_PATCH to MESA_LLVM_VERSION_PATCH
Fixes: 797f4eacea8(configure.ac: rename LLVM_VERSION_PATCH to avoid
conflict with llvm-config.h)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:26:06 +01:00
Mauro Rossi
e838d91b94 nouveau: android: add space before PRIx64 macro
Otherwise the android build fails with

   error : unable to find string literal operator ‘operator"" PRIx64’

There are several resources referring to the problem, which is related
to c++11, in our case used when building mesa for lollipop.

http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883

I've not investigated all the semantics; some people even suggested a
bug in the gcc compiler.
I just saw that the build error was solved with one little space for
lollipop, with no side effect when c++11 is not used.

v2: [Emil Velikov] add an alternative commit message from Mauro.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 15:25:35 +01:00
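The spacing issue can be shown with a small formatting helper. In C++11 a string literal immediately followed by an identifier is read as a user-defined literal suffix, which is what the quoted error refers to; a space restores ordinary string concatenation. The helper name `format_hex64` is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats a 64-bit value as hexadecimal.  Note the space between the string
 * literal and PRIx64: written without it, a C++11 compiler treats PRIx64 as
 * a literal suffix and fails, while the spaced form concatenates as
 * intended.  The spaced form is equally valid in C. */
int format_hex64(char *buf, size_t len, uint64_t value)
{
   return snprintf(buf, len, "0x%" PRIx64, value);
}
```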
Emil Velikov
d9df8c2fa2 svga: pick all the files into the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-09-09 14:52:34 +01:00
Emil Velikov
0d39279448 auxiliary: rework the python generated sources rules
There are a few bits this commit aims to resolve:

One can generalise the mkdir rule to a simple MKDIR_P $(@D) which will
expand appropriately even if we change the subdir name, and/or add
new rules. We can also drop the explicit $(srcdir) prefix for the
dependency rules, as they are not strictly required, nor used
elsewhere in mesa.

Finally replace $< with explicit filename to be consistent through the
file, and honour PYTHON_FLAGS.

v2: Add comprehensive commit summary/message (Ian, Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:50 +01:00
Emil Velikov
c373eaedfc glsl: build: remove bogus dependency
v2: rebase on top of the previous commit - don't touch the LOCAL_PATH
prefix for nir_constant_expressions.h

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:47 +01:00
Emil Velikov
a3b05e0492 glsl: build: use makefile.sources variables when possible
Rather than folding one variable within the other only to unwrap them,
just use the ones we need.

v2: bring back LOCAL_PATH prefix for nir_constant_expressions.h

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
2015-09-09 12:48:43 +01:00
Emil Velikov
da5e4559ee glsl: automake: reuse $(NIR_GENERATED_FILES) where possible
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:39 +01:00
Emil Velikov
9e0594418d glsl: automake: rework the sources generation rules
The glsl equivalent of "mesa: automake: rework the source generation
rules". Plus let's make things consistent and always explicitly provide
the header name.

v2: Rebase on top of reverted "remove custom AM_V_LEX/YACC" (Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:33 +01:00
Emil Velikov
fd913f47b7 mesa: automake: rework the source generation rules
Same logic as previous commit applies.

Additionally remove the odd (set -e/mv/INDENT) from the rules.
The last one is the only one we remotely care about, if reading the
generated sources.

Upcoming work from DylanB will replace the existing python
scripts with ones that produce more readable output anyway.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:29 +01:00
Emil Velikov
96509aa804 mapi: automake: rework the source generation rules
Same logic as previous commit applies. Also fix bogus MESA_MAPI_DIR -
the sources are located in the source dir (duh).

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:25 +01:00
Emil Velikov
449ce5d64f mapi: automake: rework the *api/glapi_mapi_tmp.h rules
Same logic as previous commit applies.

v2: Merge with "inline glapi_gen_mapi define" (Matt)

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-09 12:48:18 +01:00
Emil Velikov
d65bd7a7be util: automake: rework the format_srgb.c rule
A handful of changes/cleanups paving the way to bmake support:
 - Remove optional $(srcdir)/ prefix for files in the prereq list.
 - Drop the space after the AM_V_GEN variable.
 - Using $< in a non-suffix rule is a GNU make idiom.
 - Use $(@D) over $(dir $@). The former is a POSIX standard.

v2: Cosmetic tweaks in the commit summary.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
2015-09-09 12:48:09 +01:00
Emil Velikov
c8984a7a46 xmlpool: 'promote' LOCALEDIR variable
This is the only place in mesa that uses this construct which seems
to be GNUmake-ism. Attempting to build with POSIX make implementations
(bmake) would fail as below.

--- options.h ---
LOCALEDIR := .
sh: line 2: LOCALEDIR: command not found
*** [options.h] Error code 127

So let's keep things consistent and compatible by making the variable
non target specific.

v2:
 - Bring back LOCALEDIR.
 - Reword the commit message
 - Change mesa-stable tag 10.6 > 11.0

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-09 12:48:04 +01:00
Boyan Ding
63c4b7ee1e egl_dri2: Add support for EGL_KHR_create_context when using swrast
This requires swrast version >= 3. Also EGL_EXT_create_context_robustness
is supported if __DRI2_ROBUSTNESS extension is found.

Reference: https://bugs.freedesktop.org/show_bug.cgi?id=80821
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:26:48 +01:00
Boyan Ding
6345d2da60 egl_dri2: Use createContextAttribs if swrast version >= 3
v2: Change return type of the new function from int to bool

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:25:55 +01:00
Boyan Ding
b9ea608c1a egl_dri2: Move filling context_attrib array in a separate function
v2: Change return type of the new function from int to bool

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-09-09 11:25:18 +01:00
Marta Lofstedt
b8d6de87f6 mesa: Allow query of GL_VERTEX_BINDING_BUFFER
According to the OpenGL ES 3.1 specification, table 20.2, and the
OpenGL 4.4 specification, table 23.4, the glGetIntegeri_v
function should report the name of the buffer bound
when called with GL_VERTEX_BINDING_BUFFER.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-09 09:29:04 +02:00
Marta Lofstedt
ea69ae04db mesa/es3.1: Enable GL_MAX_VERTEX_ATTRIB enums for GLES 3.1
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-09-09 09:29:04 +02:00
Kenneth Graunke
0cc331dddd i965/nir: Use nir_system_value_from_intrinsic to reduce duplication.
This code is all pretty much identical.  We just needed the translation
from one enum value to the other.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-08 18:02:16 -07:00
Kenneth Graunke
d5d74d0b86 nir: Add a nir_system_value_from_intrinsic() function.
This converts NIR intrinsics that load system values into Mesa's
SYSTEM_VALUE_* enumerations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-08 18:02:08 -07:00
Kenneth Graunke
8fbc4ae330 i965: Mark topologies with adjacency information as G45+.
These didn't exist on the original 965.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-08 18:00:42 -07:00
Kenneth Graunke
aa18fa30c5 i965: Fix value of _3DPRIM_TRIFAN_NOSTIPPLE.
TRIFAN_NOSTIPPLE has always been 0x16 - 0x15 is marked "Reserved" on all
platforms.  See the 965 PRM, Volume 2, Table 3-1, "3D Primitive Topology
Type Encoding" for a list.

We don't currently use this, and I don't expect we will, but we may as
well not leave the bogus value around.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-08 18:00:40 -07:00
Chris Forbes
70650094ef i965: Add 64-bit dirty flag handling to brw_upload_pull_constants
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-08 18:00:36 -07:00
Chris Forbes
a9df772e0e i965: Add defines for all new Gen7/8 URB opcodes
Tessellation needs to emit URB reads and atomics.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-08 17:57:54 -07:00
Ben Widawsky
e8a219ab46 i965/gen8+: Skip depth stalls on state change
Docs suggest this is no longer required starting with Gen8.

Perf (no regressions in n=20)
OglMultithread       0.67%
OglTerrainPanInst    0.12%
trex                 0.45%
warsow               0.64%

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-09-08 16:09:52 -07:00
Dave Airlie
6d2ceb10cd r600: don't use shader key without verifying shader type (v2)
Since 7a32652231
r600: Turn 'r600_shader_key' struct into union

we were accessing key fields that might be aliased in the union
with other fields, so we should check what shader type we are
compiling for before using key values from it.

v1.1: make it compile
v2: have caffeine, make it work - we don't set type
until later, so don't reference it until we've set it.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-09 08:42:06 +10:00
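The aliasing hazard described above can be sketched with a two-member union. The type and field names below are hypothetical and only loosely modelled on the idea of a per-stage key; they are not the actual r600 definitions.

```c
/* Hypothetical shader kinds and key layout, for illustration only. */
enum shader_kind { KIND_VERTEX, KIND_FRAGMENT };

union shader_key_sketch {
   struct { unsigned as_es; } vs;    /* vertex-only field */
   struct { unsigned nr_cbufs; } ps; /* fragment-only field */
};

/* Reads a vertex-only field.  Because vs and ps share storage, reading
 * key->vs for a fragment shader would return whatever ps holds, so the
 * shader kind is checked before the key is consulted. */
unsigned key_as_es(enum shader_kind kind, const union shader_key_sketch *key)
{
   return (kind == KIND_VERTEX) ? key->vs.as_es : 0;
}
```

As the v2 note points out, a check like this is only meaningful once the shader kind has actually been set.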
Ben Widawsky
f5509874aa i965/skl: Use more compact hiz dimensions
I meant to do this here, but it was in the wrong place:

commit c1151b18f2
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Jun 24 20:07:54 2015 -0700

   i965/skl: Use more compact hiz dimensions

NOTE: Jordan did go back and look at the original mailing list post. I mailed
the right thing, and pushed the wrong one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-09-08 15:36:01 -07:00
Ilia Mirkin
458e55d7c5 st/mesa: increase viewport bounds limits for GL4 hw
According to the ARB_viewport_array spec, GL4 limit is higher than the
GL3 limit. Also take this opportunity to fix the GL3 limit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-08 17:15:02 -04:00
Ilia Mirkin
39df725f73 nvc0: always emit a full shader colormask
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.

Fixes the following piglit on Fermi (Kepler is unaffected):
  fbo-stencil blit GL_DEPTH32F_STENCIL8

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-08 17:13:12 -04:00
Brian Paul
a3b0b3fda5 docs: fix date formatting in index.html 2015-09-08 08:47:01 -06:00
Iago Toral Quiroga
205ff843ff nir: UBO loads no longer use const_index[1]
Commit 2126c68e5c killed the array elements parameter on load/store
intrinsics that was stored in const_index[1]. It looks like that
patch missed removing this assignment in the UBO path.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-08 09:06:34 +02:00
Hans de Goede
87073c69f3 nv30: Fix max width / height checks in nv30 sifm code
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever, resulting
in it seeing a surface of e.g. 1024x256 as being outside of the input size
limit.

This commit fixes this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-07 16:10:23 -04:00
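A straightforward form of the input-size check, using the 1024x1024 limit quoted in the message, is sketched below. The macro and function names are hypothetical, and the sketch does not reproduce the original (incorrect) check, which the message does not show.

```c
#include <stdbool.h>

/* Input-size limit of the sifm object, as quoted in the commit message. */
#define SIFM_MAX_IN_DIM 1024u

/* Checks width and height independently against the limit, so a 1024x256
 * surface is accepted while a 1025x1 surface is rejected. */
bool sifm_input_fits(unsigned width, unsigned height)
{
   return width <= SIFM_MAX_IN_DIM && height <= SIFM_MAX_IN_DIM;
}
```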
Chris Wilson
be519c2d50 i965: Disallow fast blit paths for CopyTexImage with PixelTransfer ops
glCopyTexImage behaves similarly to glReadPixels with respect to the
pixel transfer operations. Therefore if any are set we cannot use the
simple blit-only fast paths.

(Though it would be possible to relax the blorp path to handle
pixel zoom, or we can just enhance meta.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-09-07 20:50:07 +01:00
Jon TURNEY
a1575b55c2 mesa/tests: Remove unneeded X11_CFLAGS
X11_CFLAGS is never defined.  Path to X11 headers is not needed here, so
just remove.

Future work: Using AM_CFLAGS here looks wrong, as this Makefile only builds
C++ files

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-07 10:43:32 +01:00
Jon TURNEY
5f9c72ad23 glx/tests: Use X11_INCLUDES instead of X11_CFLAGS
X11_CFLAGS is undefined, so these tests will fail to build if x11proto is
installed in a non-standard location.

(See also commits 35189d76, bc93c3798, 54b028ba, d901d7e08, etc.)

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-07 10:43:32 +01:00
Thomas Hellstrom
f1ef89eaab svga: Fix surface view error handling
Make sure errors are correctly propagated.
Also don't flush during state emission if emission fails.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-07 01:25:08 -07:00
Rob Clark
1432a18241 xa: add xa_surface_from_handle2 v2
Like xa_surface_from_handle(), but takes a handle type, rather than
hard-coding 'shared' handle.  This is needed to fix bugs seen with
xf86-video-freedreno with xrandr rotation, for example.  The root issue
is that doing a GEM_OPEN ioctl on a bo that already has a GEM handle
associated with the drm_file will result in two unique handles for the
same bo.  Which causes all sorts of follow-on fail.

v2:
- Add support for fd handles.
- Avoid duplicating code.
- Bump xa version minor.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2015-09-07 01:25:08 -07:00
Alejandro Piñeiro
00c568f679 i965/nir/vec4: removed unneeded tex src swizzle set
At that point the swizzle should be correct.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-07 10:10:42 +02:00
Ilia Mirkin
ae535cb0bf util: make mesa-sha1.c completely empty when there are no SHA1 impls
My earlier attempt to fix this missed the fact that there was a #else
clause that assumes that you have openssl. This moves the whole thing
under #ifdef HAVE_SHA1 which should avoid this issue.

Fixes: 13bfa5201 (util: always include sha1 into the build)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91898
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
2015-09-07 00:18:12 -04:00
Ilia Mirkin
13bfa52011 util: always include sha1 into the build
SHA1 is now used in all builds when HAVE_SHA1 is defined. Adjust src to
do the same thing, rather than predicating on shader cache.

Fixes: 04e201d0c0 ("mesa: change 'SHADER_SUBST' facility to work with env variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
2015-09-06 16:11:24 -04:00
Ilia Mirkin
e40f32d562 st/mesa: don't fall back to 16F when 32F is requested
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-06 14:15:59 -04:00
Ilia Mirkin
bfd3d5244b st/mesa: properly handle u_upload_alloc failure
vbuf is never null. We want to make sure that a resource was allocated
for the vbuf, which is *vbuf.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-09-06 11:32:07 -04:00
Ilia Mirkin
a778831735 nouveau: don't mark full range as used on unmap with explicit flush
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:23 -04:00
Ilia Mirkin
c830d193db nv50: avoid using inline vertex data submit when gl_VertexID is used
The hardware only generates vertexid when vertices come from a VBO. This
fixes:

  vertexid-drawelements
  vertexid-drawarrays

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-09-05 23:04:21 -04:00
Ilia Mirkin
4a025c6bc8 nv50: don't flush vertex arrays when index buffer changes
The index buffer is fed in inline over a pushbuf. It's not related to
vertices or any caching that might be done on them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:18 -04:00
Ilia Mirkin
1f62d36ae2 nv50: rebind bo to bufctx when invalidating idxbuf storage
There is nothing to be done on a dirty idxbuf, but the bo may have
changed, so we have to rebind it to the bufctx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:15 -04:00
Ilia Mirkin
114cc18b98 nv50: clear buffer status on all vertex bufs, not just the first one
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:04:08 -04:00
Ilia Mirkin
75e34d1df8 nv50: fix drawing from tfb, direct-to-pushbuf submits
The stride was being set to 0, which is illegal (and also non-sensical).
Also we must wait for the buffer to become available for reading as
otherwise a wrong value may be prefetched. Since we must wait for the
buffer anyways, and it's mapped and in GART, we may as well avoid the
annoyance of the indirect pushbuf submit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-09-05 23:03:52 -04:00
Ben Widawsky
5165e464f2 i965: Remove base miplevel from sampler state.
Gen9 changes the meaning of this to coarse LOD quality mode. Although that's a
desirable thing to be setting, it doesn't match the gen8 behavior and this was
unintentional. More importantly, we don't ever use this field. So instead of
getting it "wrong" drop it entirely.

This is a respin of a patch which only [incorrectly] tried to address gen9.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-04 16:05:02 -07:00
Emil Velikov
509ba61d5a docs: add news item and link release notes for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 23:11:40 +01:00
Emil Velikov
f39bc1c828 docs: add sha256 checksums for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e3e2a3e0e5)
2015-09-04 23:10:09 +01:00
Emil Velikov
5685ed72b8 docs: add release notes for 10.6.6
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 4b05739e9d)
2015-09-04 23:10:07 +01:00
Oded Gabbay
4f2290d161 llvmpipe: convert double to long long instead of unsigned long long
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is undefined behavior when the result is negative. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.

This bug manifests itself on POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
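
A minimal sketch of the two-step conversion described above. The helper name
imm_bits_from_double is hypothetical and is not the llvmpipe code; its
argument stands for the result of round(val*dscale) and is assumed to fit in
a long long.

```c
#include <assert.h>

/* Hypothetical helper, not the llvmpipe source. `rounded` stands for the
 * result of round(val * dscale) and is assumed to fit in a long long. */
static unsigned long long
imm_bits_from_double(double rounded)
{
   /* Guard the assumption stated above (roughly the long long range). */
   assert(rounded >= -9.2e18 && rounded <= 9.2e18);

   /* double -> long long is well defined for in-range values, including
    * negative ones. */
   long long as_signed = (long long) rounded;

   /* long long -> unsigned long long is defined as reduction modulo 2^N,
    * so -1 becomes all ones on every platform. */
   return (unsigned long long) as_signed;
}
```

With this shape, an immediate of -1.0 yields an all-ones bit pattern instead
of depending on how the target handles an out-of-range conversion.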

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-04 17:37:17 -04:00
Hans de Goede
3c6c4d4f29 nv30: Implement color resolve for msaa
Note this is not ideal. Since the sifm can only do source sizes up to
1024x1024 we end up using the blitter on nv4x, which is not that fast.

And on nv3x we end up using the cpu which is really slow.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 16:07:08 -04:00
Hans de Goede
3329703eb1 nv30: Fix creation of scanout buffers
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.

These constraints have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.

nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.

This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather than by the ddx, and the
wrong stride of these buffers was causing the garbled display.

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 16:07:08 -04:00
Boyan Ding
48de40ce9c vc4: Initialize pack field of qreg to 0 in qir_get_temp
This avoids generation of undefined packing in qir and qpu instructions,
fixing a lot of rendering errors.

Fixes 8b36d107fd (vc4: Pack the unorm-packing bits into a src MUL
instruction when possible.)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 12:16:07 -07:00
Chris Wilson
099f5b3a62 i965: Disallow PixelTransfer operations for tiled-memcpy TexImage/ReadPixels
The tiled memcpy fast paths perform a simple blit (with only a couple of
trivial pixel conversion routines) and do not accommodate PixelTransfer
operations. Therefore if any are set, fallback to the regular routines.
Note that PixelTransfer only applies to TexImage and ReadPixels, not to
GetTexImage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-09-04 20:11:15 +01:00
Iago Toral Quiroga
96ea166308 i965/vec4: Don't unspill the same register in consecutive instructions
If we have spilled/unspilled a register in the current instruction, avoid
emitting unspills for the same register in the same instruction or consecutive
instructions following the current one as long as they keep reading the spilled
register. This should allow us to avoid emitting costly unspills that come with
little benefit to register allocation.

v2:
  - Apply the same logic when evaluating spilling costs (Curro).

v3:
  - Abstract the logic that decides if a register can be reused in a function
    that can be used from both spill_reg and evaluate_spill_costs (Curro).

v4:
  - Do not disallow reusing scratch_reg in predicated reads (Curro).
  - Track if previous sources in the same instruction read scratch_reg (Curro).
  - Return prev_inst_read_scratch_reg at the end (Curro).
  - No need to explicitly skip scratch read/write opcodes in spill_reg (Curro).
  - Fix the comments explaining what happens when we hit an instruction that
    does not read or write scratch_reg (Curro)
  - Return true early when the current or previous instructions read
    scratch_reg with a compatible mask.

v5:
  - Do not return true early, the loop should not be expensive anyway
    and this adds more complexity (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-04 15:13:49 +02:00
Iago Toral Quiroga
bd6e516fc2 i965: Add a debug option for spilling everything in vec4 code
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-09-04 12:49:36 +02:00
Francisco Jerez
6cf4142db8 dri/common: Tokenize driParseDebugString() argument before matching debug flags.
Fixes debug string parsing when one of the supported flags is a
substring of another.
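
A minimal sketch of why tokenizing matters, assuming flags are separated by
commas or spaces. The helper debug_flag_set and the flag names used with it
are hypothetical and are not taken from driParseDebugString().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true only if `flag` appears in `debug` as a
 * complete token. Tokens are assumed to be separated by ',' or ' '. */
static bool
debug_flag_set(const char *debug, const char *flag)
{
   assert(debug != NULL && flag != NULL);

   const char *delims = ", ";
   size_t flag_len = strlen(flag);

   while (*debug) {
      /* Skip separators, then measure the next token. */
      debug += strspn(debug, delims);
      size_t tok_len = strcspn(debug, delims);

      /* Compare whole tokens, so "sync" does not match "syncdraw". */
      if (tok_len == flag_len && tok_len > 0 &&
          strncmp(debug, flag, flag_len) == 0)
         return true;

      debug += tok_len;
   }
   return false;
}
```

A plain strstr() search would report "sync" as set when only "syncdraw" was
requested; comparing token lengths first avoids that.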

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-04 12:49:36 +02:00
Francisco Jerez
3d4f75506c dri/common: Fix codestyle of driParseDebugString().
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-04 12:49:36 +02:00
Tapani Pälli
08e9049e3d glsl: error out on ES 3.1 if VS or FS present but not both
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 09:22:24 +03:00
Tapani Pälli
69678953d1 glsl: error on linking if no shaders are attached to program
This applies to OpenGL Core >= 4.5 and OpenGL ES >= 3.1.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-04 09:01:00 +03:00
Kenneth Graunke
4323e78d3f i965: Improve disassembly of data port read messages.
We now print out the name of the message instead of its numerical
value, and label the message control and surface numbers.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
0e23c246c0 i965: Optimize VUE map comparisons.
The entire VUE map is computed based on the slots_valid bitfield;
calling brw_compute_vue_map on the same bitfield will return the
same result.  So we can simply compare those.

struct brw_vue_map is 136 bytes; doing a single 8-byte comparison is
much cheaper and should work just as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
6e03377daf i965/gs: Don't reserve space for clip plane uniforms.
These were only for legacy userclipping, which we no longer support
in geometry shaders.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
fba4823a91 i965: Don't do legacy userclipping in non-compatibility contexts.
According to the GLSL 1.50 specification, page 76:
"The shader must also set all values in gl_ClipDistance that have been
 enabled via the OpenGL API, or results are undefined."

With this patch, we only enable clip distance writes when the shader
actually writes them.  We no longer force a value to be written when
clip planes are enabled in the API.  This could mean the first varying
slot would be used as clip distances - I believe it should be the safe
kind of undefined behavior.

Empirically, it doesn't seem to cause a problem.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
4f4b7c4711 i965: Remove the brw_vue_prog_key base class.
The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys.  So there's no need for a "VUE" key base class.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:04 -07:00
Kenneth Graunke
3239621825 i965: Virtualize vec4_visitor::emit_urb_slot().
This avoids a downcast of key, which won't exist in the base class soon.

I'm not a huge fan of this patch, but given that we're currently using
inheritance, this seems like the "right" way to do it.  The alternative
is to make key a void pointer in the parent class and continue
downcasting.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
27e83b62bb i965: Store a key_tex pointer in vec4_visitor.
I'm about to remove the base class for VS/GS/HS/DS program keys, at
which point we won't be able to use key->tex anymore.  Instead, we'll
need to store a direct pointer (like we do in the FS backend).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
014b90221a i965: Move legacy clip plane handling to vec4_vs_visitor.
This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.

Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter.  This eliminates the bogus NULL parameter in the
GS case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
082b7f1876 i965: Delete the brw_vue_program_key::userclip_active flag.
There are two uses of this flag.

The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances.  In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value.  Checking if it's
> 0 works for detecting this case.

Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags.  Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
294282aaa6 i965: Remove legacy clip plane handling from geometry shaders.
We only support geometry shaders in core profiles, where gl_ClipVertex
doesn't exist.  Presumably the even older behavior of clipping to
gl_Position isn't supported either.  In fact, GLSL 1.50 page 76 claims:

"The shader must also set all values in gl_ClipDistance that have been
 enabled via the OpenGL API, or results are undefined."

So we don't need to handle legacy clipping in geometry shaders.  I think
Paul added this back when we were considering supporting the old
GL_ARB_geometry_shader4 extension.

This removes a non-orthogonal state dependency on GS compilation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Kenneth Graunke
a2151560b8 i965: Move brw_setup_tex_for_precompile to brw_program.[ch].
This living in brw_fs.{h,cpp} is a historical artifact of us supporting
texturing for fragment shaders before any other stages.  It's kind of
awkward given that we use it for all stages.

This avoids having to include brw_fs.h in geometry shader code in order
to access this function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-09-03 22:31:03 -07:00
Tapani Pälli
04e201d0c0 mesa: change 'SHADER_SUBST' facility to work with env variables
Patch modifies existing shader source and replace functionality to work
with environment variables rather than enable dumping on compile time.
Also instead of _mesa_str_checksum, _mesa_sha1_compute is used to avoid
collisions.

Functionality is controlled via two environment variables:

MESA_SHADER_DUMP_PATH - path where shader sources are dumped
MESA_SHADER_READ_PATH - path where replacement shaders are read

v2: cleanups, add strerror if fopen fails, put all functionality
    inside HAVE_SHA1 since sha1 is required

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-04 08:22:37 +03:00
Tapani Pälli
0db323a624 build: add HAVE_SHA1 define when using --with-sha1 option
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
2015-09-04 08:05:24 +03:00
Kenneth Graunke
2ace64fd59 i965: Fix copy propagation type changes.
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around".  It didn't account for type conversion moves,
however.  So it would happily turn this:

   mov(8) vgrf6:D, -vgrf5:D
   mov(8) vgrf7:F, vgrf6:UD

into this:

   mov(8) vgrf6:D, -vgrf5:D
   mov(8) vgrf7:D, -vgrf5:D

which erroneously drops the conversion to float.

Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-03 21:12:54 -07:00
Dave Airlie
5fa5a012b1 r600: fix loop overrun in cayman_mul_double_instr
Coverity warned about this. Ilia pointed it out.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-04 08:02:14 +10:00
Ben Widawsky
b05619c627 i965/gen9: Annotate input coverage mask change
As far as I can tell, the behavior is preserved from the previous generations.
Before we set a single bit to tell the FS whether or not we'll be using an input
coverage mask. Now we have some options which are implementing various
extensions. These bits are used for the various conservative rasterization
mechanisms (for collision detection, binning, and whatever else).

I believe that the behavior is preserved because the problem which conservative
rasterization is attempting to fix would go away with the "NORMAL" mode (at the
cost of performance, I believe).

This patch serves as documentation of the change by creating the enums, as well
as giving some of the history with the links here so that the next person who
comes along and looks at it doesn't spend as long as I had to in order to
determine if there is an issue or not.

Previously, this algorithm had been done in software, and this can still be used
as long as we don't export an extension stating otherwise.

References: https://www.opengl.org/registry/specs/NV/conservative_raster.txt
References: https://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-03 11:55:31 -07:00
Brian Paul
70dbdca15f svga: update call to u_upload_alloc()
u_upload_alloc() no longer returns a value.

Trivial.
2015-09-03 11:24:24 -06:00
Marek Olšák
efea7c3a3f winsys/radeon: remove exported buffers from the cache
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:45 +02:00
Marek Olšák
54964c7751 winsys/amdgpu: remove exported buffers from the cache
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:42 +02:00
Marek Olšák
35d0f12797 gallium/pb_bufmgr_cache: add a way to remove buffers from the cache explicitly
This must be done before exporting a buffer as dmabuf fds, because
we lose track of who is using it and can't trust the reference counter.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-09-03 18:41:40 +02:00
Marek Olšák
44dbaa1746 u_upload_mgr: remove the return value from u_upload_data
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:50 +02:00
Marek Olšák
0c5df863ba u_upload_mgr: remove the return value from u_upload_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:48 +02:00
Marek Olšák
b4f7639955 u_upload_mgr: remove the return value from u_upload_alloc_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:43 +02:00
Marek Olšák
8c6ff05517 u_upload_mgr: remove the return value from u_upload_alloc
The return buffer or the returned pointer can be used instead.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:14:09 +02:00
Marek Olšák
6c1e368cf3 u_upload_mgr: optimize u_upload_alloc
This is probably the most called util function. It does almost nothing,
yet it can consume 10% of the CPU in the profile. This drops it down to 5%.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-03 18:09:13 +02:00
Grazvydas Ignotas
722ce74743 gallium/radeon: remove 'dirty' member from r600_atom
It's no longer used by either r600 or radeonsi.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:51 +02:00
Grazvydas Ignotas
ccbc7952a4 r600g: simplify dirty atom tracking
Now that R600_NUM_ATOMS is below 64, dirty atom tracking can be
simplified.
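
A minimal sketch of what mask-based tracking can look like once every atom
id fits in a 64-bit word. It assumes ids run from 1 to 63 with 0 reserved;
the function names are hypothetical and this is not the r600g code.

```c
#include <assert.h>
#include <stdint.h>

/* Mark one atom dirty by setting its bit. Id 0 is reserved, so bit 0 is
 * never set. */
static void
mark_atom_dirty_sketch(uint64_t *dirty_mask, unsigned id)
{
   assert(id > 0 && id < 64);
   *dirty_mask |= UINT64_C(1) << id;
}

/* Remove and return the lowest dirty atom id, or 0 if nothing is dirty.
 * Returning 0 for "none" is unambiguous because id 0 is reserved. */
static unsigned
pop_dirty_atom_sketch(uint64_t *dirty_mask)
{
   if (*dirty_mask == 0)
      return 0;

   unsigned id = 0;
   while (!(*dirty_mask & (UINT64_C(1) << id)))
      id++;

   *dirty_mask &= ~(UINT64_C(1) << id);
   return id;
}
```

Emitting state then becomes a loop that pops ids until the function returns
0, with no per-atom dirty flag to keep in sync.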

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:42 +02:00
Grazvydas Ignotas
6ef4572937 r600g: start numbering atoms from 1
There doesn't seem to be any reason to start from 4.
Start from 1 instead (0 is left reserved to catch uninitialized atoms).

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:29 +02:00
Grazvydas Ignotas
4d9af438bc r600g: make all viewport states use single atom
Similarly to scissor states, we can use single atom to track all viewport
states. This will allow us to simplify dirty atom handling later.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:06:14 +02:00
Grazvydas Ignotas
fbb423b433 r600g: apply disable workaround on all scissors
During review of the "r600g: make all scissor states use single atom" patch
Marek Olšák noticed that scissor disable workaround should be applied on
all scissor states and not just first one, so let's do so.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:05:58 +02:00
Grazvydas Ignotas
7d475bad66 r600g: make all scissor states use single atom
As suggested by Marek Olšák, we can use single atom to track all scissor
states. This will allow us to simplify dirty atom handling later.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-09-03 18:05:54 +02:00
Neil Roberts
ce181aea6c mesa/pbo: Handle zero width, height or depth when validating access
It's legal to call glTexSubImage with zero values for the width,
height or depth. Previously this was breaking the PBO access
validation because it tries to work out the last pixel accessed by
getting the pixel at height-1 and depth-1 which would end up with
bogus values.
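
A minimal sketch of a bounds check that treats an empty region as touching
no memory. The function name, its parameters, and the simple stride layout
are assumptions made for illustration, not the Mesa validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does a width x height x depth region starting at
 * `offset` stay within a buffer of `buf_size` bytes? */
static bool
pbo_access_in_bounds_sketch(size_t buf_size, size_t offset,
                            size_t row_stride, size_t image_stride,
                            size_t bytes_per_pixel,
                            size_t width, size_t height, size_t depth)
{
   assert(bytes_per_pixel > 0);

   /* An empty region accesses nothing, so neither the start nor the end
    * needs validating. This also avoids computing height - 1 or
    * depth - 1 when the dimension is zero. */
   if (width == 0 || height == 0 || depth == 0)
      return true;

   /* One past the last byte of the last pixel in the last row. */
   size_t end = offset
              + (depth - 1) * image_stride
              + (height - 1) * row_stride
              + width * bytes_per_pixel;

   return end <= buf_size;
}
```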

This was causing GL errors to be generated during the Piglit
texsubimage test, although the test was passing anyway.

v2: Also check for width == 0. Don't validate the start pointer if any
    of the dimensions are zero.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-03 17:00:54 +01:00
Kenneth Graunke
30e84530a0 glsl: Remove unused total_attribs_size variable.
Accidentally left behind by my previous patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-03 00:56:18 -07:00
Kenneth Graunke
c3294ca5a1 glsl: Handle attribute aliasing in attribute storage limit check.
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.

So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes.  Instead, we need to look at the
enabled slots.

This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots).  We then
count normal attributes once, and count the double-size attributes
a second time.
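
A minimal sketch of the counting rule, assuming enabled attribute locations
and double-size attributes are each tracked as a bitmask. The function and
parameter names are hypothetical, not the linker's.

```c
#include <stdint.h>

/* Count set bits portably (Kernighan's method). */
static unsigned
popcount64_sketch(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Aliased variables share a location bit, so they are counted once.
 * Attributes flagged as double-size take a second vec4 slot, so they are
 * counted one more time. */
static unsigned
count_attribute_slots_sketch(uint64_t enabled_locations,
                             uint64_t double_size_locations)
{
   return popcount64_sketch(enabled_locations) +
          popcount64_sketch(enabled_locations & double_size_locations);
}
```

For example, two variables aliased at location 0 plus one double-size
attribute at location 3 occupy three slots under this rule, where summing
per-variable sizes would give four.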

Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.

No Piglit changes on llvmpipe (which actually supports dvecs).

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-02 23:28:20 -07:00
Ian Romanick
6e37304521 i965/meta: Fix typo in comment
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-02 16:24:18 -07:00
Ian Romanick
7237c937af mesa: Don't allow wrong type setters for matrix uniforms
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4.  Both are illegal.  The latter also
overwrites the storage for the mat4 and causes bad things to happen.

Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-02 16:24:17 -07:00
Ian Romanick
a6976f0972 mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type
This matches _mesa_uniform, and it enables the bug fix in the next
patch.

v2: s/type/basicType/ in the assert in _mesa_uniform_matrix.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au> [v1]
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-09-02 16:24:17 -07:00
Ian Romanick
882aab00ab mesa: Silence unused parameter warnings in bufferobj.c
main/bufferobj.c: In function 'count_buffer_size':
main/bufferobj.c:520:26: warning: unused parameter 'key' [-Wunused-parameter]
 count_buffer_size(GLuint key, void *data, void *userData)
                          ^
main/bufferobj.c: In function 'flush_mapped_buffer_range_fallback':
main/bufferobj.c:740:56: warning: unused parameter 'index' [-Wunused-parameter]
                                    gl_map_buffer_index index)
                                                        ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
8ba3b7661b mesa: Remove target parameter from _mesa_handle_bind_buffer_gen
main/bufferobj.c: In function '_mesa_handle_bind_buffer_gen':
main/bufferobj.c:915:37: warning: unused parameter 'target' [-Wunused-parameter]
                              GLenum target,
                                     ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
1e4d3d25ff i965: Make gen7_enable_hw_binding_tables static
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-09-02 16:24:17 -07:00
Ian Romanick
97ce8bd437 i965: Make gen8_upload_state_base_address static
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-09-02 16:24:17 -07:00
Ian Romanick
4ff9e599cb linker: Silence GCC unused parameter warnings
linker.cpp:320:55: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_function *ir)
                                                       ^
linker.cpp:327:53: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_return *ir)
                                                     ^
linker.cpp:333:49: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_enter(ir_if *ir)
                                                 ^
linker.cpp:339:49: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_if *ir)
                                                 ^
linker.cpp:345:51: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_enter(ir_loop *ir)
                                                   ^
linker.cpp:351:51: warning: unused parameter 'ir' [-Wunused-parameter]
    virtual ir_visitor_status visit_leave(ir_loop *ir)
                                                   ^
linker.cpp:2824:53: warning: unused parameter 'ctx' [-Wunused-parameter]
 link_calculate_subroutine_compat(struct gl_context *ctx, struct gl_shader_program *prog)
                                                     ^
linker.cpp:2854:47: warning: unused parameter 'ctx' [-Wunused-parameter]
 check_subroutine_resources(struct gl_context *ctx, struct gl_shader_program *prog)
                                               ^
linker.cpp:3368:49: warning: unused parameter 'ctx' [-Wunused-parameter]
 link_assign_subroutine_types(struct gl_context *ctx,
                                                 ^

Also make link_assign_subroutine_types static since it is only called
from this file.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Ian Romanick
8fafb0a67f mesa: Fix warning about static being in the wrong place
Because the compiler already has enough things to complain about.

    grep -rl 'const static' src/ | while read -r f
    do
        sed --in-place -e 's/const static/static const/g' "$f"
    done

brw_eu_emit.c: In function 'brw_reg_type_to_hw_type':
brw_eu_emit.c:98:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
       const static int imm_hw_types[] = {
       ^
brw_eu_emit.c:120:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
       const static int hw_types[] = {
       ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-02 16:24:17 -07:00
Jordan Justen
06ada493fb i965/cs: Setup push constant data for uniforms
brw_upload_cs_push_constants was based on gen6_upload_push_constants.

v2:
 * Add FINISHME comments about more efficient ways to push uniforms

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-09-02 14:17:24 -07:00
Jordan Justen
4bdd5e09c3 meta: Save/restore compute shaders
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-02 14:17:24 -07:00
Charmaine Lee
4a9480b64a svga: fix referencing a NULL framebuffer cbuf
Check for a valid framebuffer cbuf pointer before accessing its
associated surface.

Fix piglit test fbo-drawbuffers-none.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-02 13:22:42 -06:00
Charmaine Lee
5a5e5e3959 svga: increment texture age when surface is to be marked as dirty
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture(). This implies a surface can now be reused for a
render buffer. Currently, when we render to a texture, we mark the
surface as dirty. But in svga_mark_surface_dirty(), if the surface
is already marked as dirty, it does not increment the texture age.
Any view to this texture might not be updated properly then.

With this patch, the texture age is incremented regardless of whether
the surface is already marked as dirty or not.

Fix bug 1499181.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-09-02 13:22:42 -06:00
Charmaine Lee
b2fd41ce46 svga: fix backed surface view regression
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture() and exposes a bug in the backed surface view
creation.  Currently a backed surface view for a conflicted surface view
is created at framebuffer emit time. But if shader sampler views are changed
but framebuffer surface views remain unchanged, emit_framebuffer() will not
be called and conflicted surface views will not be detected.

To fix this, also check for conflicted surface views when setting sampler
views. If there are any conflicted surface views, enable the
framebuffer dirty bit so that the framebuffer emit code has a chance to
create a backed surface view for the conflicted surface view.

Fix cinebench-r11-test regression.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-02 13:22:42 -06:00
Matt Turner
9390cb8459 i965/fs: Handle MRF destinations in lower_integer_multiplication().
The lowered code reads from the destination, which isn't possible from
message registers.

Fixes the following dEQP tests on SNB:

    dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
    dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
    dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment

Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-02 11:52:10 -07:00
Brian Paul
4fd314852c docs: document VMware OpenGL 3.3 support
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
e054251ed1 svga: update driver for version 10 GPU interface
This is a squash commit of roughly two years of development work.
Authors include:
  Brian Paul
  Charmaine Lee
  Thomas Hellstrom
  Jakob Bornecrantz
  Sinclair Yeh
  Mingcheng Chen
  Kai Ninomiya
  MengLin Wu

The driver supports OpenGL 3.3.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
656dac120d svga: add new version 10 device command prototypes
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:27:43 -06:00
Brian Paul
e8c20d97eb svga: add new svga_streamout.h file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:24 -06:00
Brian Paul
8ddf98d671 svga: add new svga_state_tgsi_transform.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:24 -06:00
Brian Paul
26d8bae889 svga: add new svga_state_sampler.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
a633948e7e svga: add new svga_state_gs.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
ff85bcdba2 svga: add new svga_pipe_streamout.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
7ce20cf59a svga: add new svga_pipe_gs.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
9cb2d9ddfa svga: add new svga_link.[ch] files
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
53d07910c3 svga: add new svga_cmd_vgpu10.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
35bb29d499 svga: add new svga_tgsi_vgpu10.c file
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
1c5468e9c0 svga: remove unused SVGA3D_* command functions
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
133a47107c gallium/st: add pipe_context::get_timestamp()
The VMware svga driver doesn't directly support pipe_screen::get_timestamp()
but we can do a work-around.  However, we need a gallium context to do so.
This patch adds a new pipe_context::get_timestamp() function that will only
be called if the pipe_screen::get_timestamp() function is NULL.
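
The fallback described above can be sketched roughly as follows. This is
only an illustration: the sketch_* structs and functions are hypothetical,
simplified stand-ins and not the real gallium interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the screen and context objects. */
struct sketch_screen {
   uint64_t (*get_timestamp)(struct sketch_screen *screen);
};

struct sketch_context {
   struct sketch_screen *screen;
   uint64_t (*get_timestamp)(struct sketch_context *ctx);
};

/* Example hooks returning fixed values, only for demonstration. */
static uint64_t
fake_screen_time(struct sketch_screen *screen)
{
   (void) screen;
   return 100;
}

static uint64_t
fake_context_time(struct sketch_context *ctx)
{
   (void) ctx;
   return 200;
}

/* Prefer the screen hook; use the context hook only if the screen hook
 * is NULL. */
static uint64_t
sketch_query_timestamp(struct sketch_context *ctx)
{
   struct sketch_screen *screen = ctx->screen;

   if (screen->get_timestamp)
      return screen->get_timestamp(screen);

   assert(ctx->get_timestamp != NULL);
   return ctx->get_timestamp(ctx);
}
```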

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
e2a1d21cb6 svga/winsys: Add support for VGPU10
This involves a few driver modifications to keep things building.
The driver may not actually run properly at this point.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
c191b507cb svga: update the svga3d device header files
Remove some obsolete svga_dump.c code for items which no longer exist.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
3a92526704 svga: add new version 10 device header files
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Brian Paul
75f92e28b4 winsys/svga: add new vmw_query.c[h] files
Functions for creating, destroying, getting queries, etc.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-09-02 09:05:23 -06:00
Chris Wilson
f30cf3258e meta: Compute correct buffer size with SkipRows/SkipPixels
If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:

==28136== Invalid read of size 8
==28136==    at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136==    at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==

Fixes regression from commit 7f396189f0
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 5 18:17:04 2015 -0800

    meta: Add a BlitFramebuffers-based implementation of TexSubImage

v2: However, the teximage we create does need to be width x full_height x 1
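
As a rough illustration of the size computation (not the code from this
patch), the bytes needed for a 2D subregion can be computed so that the
last row only counts the pixels actually read. The helper name and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes that must be readable, starting at
 * the first selected pixel, to cover a width x height subregion of a 2D
 * image whose rows are row_stride bytes apart. */
static size_t
subregion_size_bytes(size_t width, size_t height,
                     size_t bytes_per_pixel, size_t row_stride)
{
   assert(width * bytes_per_pixel <= row_stride);

   if (width == 0 || height == 0)
      return 0;

   /* Every row but the last advances by a full stride; the last row only
    * contributes the pixels that are actually read. */
   return (height - 1) * row_stride + width * bytes_per_pixel;
}
```

For a 16x16 image with 4 bytes per pixel (1,024 bytes, stride 64), skipping
4 rows and 4 pixels leaves a 12x12 subregion starting at byte 272. The
helper gives 752 bytes, ending exactly at byte 1,024, whereas stride*height
(768) would read 16 bytes past the end.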

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-09-02 10:08:39 +01:00
Alejandro Piñeiro
4de86e1371 i965/vec4: fill src_reg type using the constructor type parameter
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.

This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs:     2918 -> 2833 (-2.91%)
helped:                                79
HURT:                                  0
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-02 09:59:47 +02:00
Glenn Kennard
d2cab815b4 r600g: Add doubles support for CYPRESS
This doesn't enable the support, just adds some of
the code, so we don't have to keep rebasing.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:34:39 +10:00
Dave Airlie
3be5ee1574 r600g: add doubles support for CAYMAN
Only a subset of AMD GPUs supported by r600g support doubles.
CAYMAN and CYPRESS are probably all we'll try to support; however,
I don't have a CYPRESS, so ignore that for now.

This disables SB support for doubles, as we think we need to
make the scheduler smarter to introduce delay slots.

[airlied: pushing this to avoid pain of rebasing, it mostly
works on cayman only so far, Glenn has some ideas about
delay slot issues we need to look into. turned off by
default for now]

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:06:18 +10:00
Dave Airlie
ee67fd70c2 tgsi/scan: add uses_doubles to tgsi scanner
This allows drivers to work out if a shader contains any
double opcodes easily.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 16:06:13 +10:00
Glenn Kennard
3bfa345c1e r600g: add multiple stream support for geom shaders
This patch is taken from work by Glenn and myself,
and I've spent some time making it all work here.

This adds support for the multiple streams part of
ARB_gpu_shader5 to r600g.

It doesn't enable ARB_gpu_shader5 yet.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
3d497e0d91 r600g/sb: add support for multiple streams to SB backend
This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
d503bbbf30 r600g: add support for streams to the assembler.
This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
90ac5fb6bb r600g/sb: dump sampler/resource index modes for textures.
This just aids debugging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 15:55:47 +10:00
Dave Airlie
32769ac016 mesa/readpixels: check strides are equal before skipping conversion
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.

This fixes all the core mesa problems with CTS packed_pixels
tests.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:34:21 +10:00
Dave Airlie
b4a70401f5 texcompress_s3tc/fxt1: fix stride checks (v1.1)
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.

Align the RowLength to the pack alignment before comparing.

This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.
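
A minimal sketch of the comparison as described above, assuming the row
length is simply rounded up to the alignment before it is compared with the
width. The function names are hypothetical and the real check may differ in
units and detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Round value up to the next multiple of alignment. The alignment must be
 * a power of two, as the GL pixel-store alignments 1, 2, 4 and 8 are. */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical fast-path test: only take the fast path when the aligned
 * row length matches the image width. */
static bool
can_use_fast_path(unsigned row_length, unsigned alignment, unsigned width)
{
   return align_up(row_length, alignment) == width;
}
```

With a row length of 7 and an alignment of 4, the aligned row length is 8,
so a width of 7 no longer matches.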

v1.1: add fxt1 fix (Iago)

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:32:26 +10:00
Dave Airlie
6a3e1fb958 st/readpixels: fix accel path for skipimages.
We don't need to use the 3D image address here, as that will
include SKIP_IMAGES, and we are only blitting a single
2D image anyway, so just use the 2D path.

This fixes some memory overruns under CTS
packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:30:48 +10:00
Dave Airlie
c3c242070e mesa/formats: 8-bit channel integer formats addition
Add enough 8-bit channel formats to handle all the
different things CTS throws at us.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:26:34 +10:00
Dave Airlie
8185a02316 mesa/formats: add some formats from GL3.3
GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.

While playing with ogl conform, one of the tests must have
attempted all valid formats for GL3.3 and hit the
unreachable here.

This adds the first chunk of formats that hit the
assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:26:13 +10:00
Dave Airlie
5b6c7da460 mesa: handle SwapBytes in compressed texture get code.
This case just wasn't handled, so add support for it.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:17:29 +10:00
Dave Airlie
0ad3a475ef mesa: fix SwapBytes handling in numerous places
In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.

This adds a common routine to swap bytes in a 2D image and uses this
code in:

texture storage
texture get
readpixels
swrast drawpixels.
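
To illustrate the idea of a stride-aware swap (this is not the routine
added by the patch), the sketch below swaps 16-bit values row by row and
leaves the alignment padding between rows untouched. The name and
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte-swap the 16-bit values of a 2D image in place.
 * Only the first values_per_row values of each row are swapped; padding
 * bytes up to row_stride are left alone. */
static void
swap_bytes_2d_16(uint8_t *image, size_t row_stride,
                 size_t values_per_row, size_t height)
{
   assert(values_per_row * 2 <= row_stride);

   for (size_t y = 0; y < height; y++) {
      uint8_t *row = image + y * row_stride;

      for (size_t x = 0; x < values_per_row; x++) {
         uint8_t tmp = row[2 * x];
         row[2 * x] = row[2 * x + 1];
         row[2 * x + 1] = tmp;
      }
   }
}
```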

[airlied: updated with Brian's nitpicks].

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-02 09:16:43 +10:00
José Fonseca
60aea30115 auxiliary/os: Don't implement os_get_option() on embedded builds.
Let it be defined externally instead, allowing setting mechanisms other
than environment variables.

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
84e71ef2ee util: add a couple primitive restart helper functions
The first function translates prim restart indexes to be 0xffff or
0xffffffff.

The second splits indexed primitives with restart indexes into sub-
primitives without restart indexes.
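
The splitting done by the second helper can be pictured with the sketch
below. It is an illustration only: the struct, the function name and the
signature are hypothetical and do not reflect the helper added here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A run of consecutive indices that contains no restart index. */
struct index_range {
   size_t start;
   size_t count;
};

/* Split indices[0..count) at every occurrence of restart_index and store
 * the non-empty runs in ranges. Returns the number of runs found. */
static size_t
split_at_restart(const uint32_t *indices, size_t count,
                 uint32_t restart_index,
                 struct index_range *ranges, size_t max_ranges)
{
   size_t num = 0;
   size_t start = 0;

   for (size_t i = 0; i <= count; i++) {
      /* The end of the index list closes the last run as well. */
      if (i == count || indices[i] == restart_index) {
         if (i > start) {
            assert(num < max_ranges);
            ranges[num].start = start;
            ranges[num].count = i - start;
            num++;
         }
         start = i + 1;
      }
   }
   return num;
}
```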

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
14f35194d8 tgsi: add tgsi utility to transform a fragment shader to support aa point
This adds a tgsi utility tgsi_add_aa_point to transform a fragment shader
to support anti-aliased wide point by computing the fragment distance from
the point center. This utility assumes the geometry shader is emitting
an extra generic output with point coord data. The semantic index of
this generic output is passed to the tgsi_add_aa_point utility.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
bca238d4f5 tgsi: adds tgsi utility to transform a shader to support point sprite
This adds a tgsi utility tgsi_add_point_sprite to transform a geometry
shader to emulate wide points by drawing quads. This utility adds an
extra output for the original point position if the point position is
to be written to a stream output buffer. It also assumes the driver will
add a constant for inverse viewport scale after the user defined constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
a65bdf5f47 tgsi: add new tgsi_two_side.c utility code
This could be used by any driver where the device doesn't directly
support two-sided lighting.  This code modifies a fragment shader
to accept back-face colors and choose between the front/back colors
depending on the triangle's front-face sign.
2015-09-01 16:29:17 -06:00
Brian Paul
da33c2434b util: add util_strcasecmp() wrapper
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
0c4b621590 gallium/util: add a utility to create geometry passthrough shader
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Roland Scheidegger
1754208617 gallium/util: fix returning empty box for rectangle intersection
These functions deal with inclusive coordinates, hence a 0/0/0/0 rect
returned when there's no intersection doesn't actually represent an empty
rectangle. Hence return 0/-1/0/-1 instead.
This fixes some problems in llvmpipe with empty scissor rects (which up
to now didn't really matter because while the intersect test returned the
wrong result all pixels were scissored away later anyway).
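
A small sketch of why 0/0/0/0 is not empty when the bounds are inclusive,
and of returning 0/-1/0/-1 instead. The struct and function names are
hypothetical; the field order x0/x1/y0/y1 is assumed from the notation
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rectangle with inclusive bounds: it covers x0..x1 and y0..y1, so
 * 0/0/0/0 still contains the single pixel (0, 0). */
struct rect {
   int x0, x1;
   int y0, y1;
};

static bool
rect_is_empty(const struct rect *r)
{
   return r->x0 > r->x1 || r->y0 > r->y1;
}

/* Intersect two inclusive rectangles; an empty result is normalized to
 * 0/-1/0/-1 so that it really tests as empty. */
static struct rect
rect_intersect(const struct rect *a, const struct rect *b)
{
   struct rect r;

   assert(a != NULL && b != NULL);
   r.x0 = a->x0 > b->x0 ? a->x0 : b->x0;
   r.x1 = a->x1 < b->x1 ? a->x1 : b->x1;
   r.y0 = a->y0 > b->y0 ? a->y0 : b->y0;
   r.y1 = a->y1 < b->y1 ? a->y1 : b->y1;

   if (rect_is_empty(&r)) {
      r.x0 = 0;
      r.x1 = -1;
      r.y0 = 0;
      r.y1 = -1;
   }
   return r;
}
```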
2015-09-01 16:29:17 -06:00
Roland Scheidegger
fec4f5de67 gallium/util: return FALSE for intersection if there's empty rectangles
It isn't really obvious whether the intersection test should take empty
rectangles into account or whether the caller should do it. Most callers
verified one of the rects but not the other, and now that an empty rect is
returned correctly, that other rect could actually be empty, leading to
more bugs. Hence just verify both rects for emptiness in the intersection
test itself, which makes the code easier in the caller (though it will be
slower if the caller knows the rectangles are non-empty).

Reviewed-by: Zack Rusin <zackr@vmware.com>
2015-09-01 16:29:17 -06:00
Charmaine Lee
1775687637 tgsi: add some more helper functions
This patch adds some more helper functions such as
   . tgsi_transform_temps_decl
   . tgsi_transform_output_decl
   . tgsi_transform_dst_reg
   . tgsi_transform_src_reg

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
f8da1e1459 tgsi: added tgsi_is_shadow_target() helper 2015-09-01 16:29:17 -06:00
Brian Paul
bd883c9070 tgsi: add negate parameter to tgsi_transform_kill_inst()
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-09-01 16:29:17 -06:00
Brian Paul
56852e925e util: added ffsll() function
v2: fix errant _GNU_SOURCE test, per Matt Turner.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-01 16:29:17 -06:00
Brian Paul
84dad65088 util: added util_set_index_buffer()
Like util_set_vertex_buffers_count(), this basically just copies a
pipe_index_buffer object, taking care of refcounting.
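
A rough sketch of a refcount-aware copy follows. The sketch_* types are
hypothetical stand-ins, not the gallium structures, and accepting a NULL
source to unbind is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a refcounted resource and an
 * index buffer binding. */
struct sketch_resource {
   int refcount;
};

struct sketch_index_buffer {
   unsigned index_size;
   unsigned offset;
   struct sketch_resource *buffer;
};

/* Make *dst point at src, adjusting both reference counts. A real
 * implementation would destroy the old resource when its count hits 0. */
static void
sketch_resource_reference(struct sketch_resource **dst,
                          struct sketch_resource *src)
{
   if (*dst == src)
      return;
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--;
   *dst = src;
}

/* Copy src into dst; a NULL src unbinds the buffer. */
static void
sketch_set_index_buffer(struct sketch_index_buffer *dst,
                        const struct sketch_index_buffer *src)
{
   assert(dst != NULL);

   if (src) {
      sketch_resource_reference(&dst->buffer, src->buffer);
      dst->index_size = src->index_size;
      dst->offset = src->offset;
   } else {
      sketch_resource_reference(&dst->buffer, NULL);
      dst->index_size = 0;
      dst->offset = 0;
   }
}
```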
2015-09-01 16:29:17 -06:00
Jason Ekstrand
47b4efc710 mesa: Move gl_vert_attrib from mtypes.h to shader_enums.h
It is a shader enum after all...

Acked-by: Brian Paul <brianp@vmware.com>
2015-09-01 14:45:37 -07:00
Matt Turner
e34834f059 glapi: Inline x86_64_current_tls().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-01 13:23:13 -07:00
Edward O'Callaghan
d351bab9c5 r600g: Simplify out a couple of unnecessary branches
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-09-01 21:55:23 +02:00
Marek Olšák
2d8f7d3c15 radeonsi: use an indirect buffer for init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
df12ddb55d radeonsi: add IB2 indirect buffer support for pm4 states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
8a9ab86ca6 winsys/radeon: add a flag telling how gfx IBs should be padded
This is always false on amdgpu (set by calloc).

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
ba79ff7fa8 winsys/amdgpu: remove IB padding for SI
SI is unsupported by amdgpu

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
0f4688fbe7 radeonsi: remove unused macro si_pm4_set_state
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
b89fa63d45 radeonsi: remove si_pm4_cleanup
All remaining pm4 states are created and destroyed by state trackers.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
a9971e85d9 radeonsi: rework uploading border colors
The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
5e2619ef30 radeonsi: use all built-in border colors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
fbbebeae10 radeonsi: inline si_cmd_context_control
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
77f80a20be radeonsi: remove unused si_pm4_state code
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
228e80123a radeonsi: reorder si_context variables
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
28b34b474e radeonsi: don't send IB dword usage to si_need_cs_space
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
aad43f0768 radeonsi: don't set number of IB dwords for states
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
ec9d5e181e radeonsi: don't count IB space for states, just use an upper bound
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
fc95058add radeonsi: convert SPI state to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:15 +02:00
Marek Olšák
7ff2991e34 gallium/radeon: rename r600_context_bo_reloc -> radeon_add_to_buffer_list
this name should be easy to understand without other knowledge

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
d2e63ac042 gallium/radeon: rename write_*_reg functions
e.g. radeon_set_context_reg is nicer and looks consistent next to
radeon_emit().

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0da159ecac radeonsi: rename and precalculate polygon offset states
one less calloc and state construction while drawing

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
45e549fcbc radeonsi: convert CB_TARGET_MASK setup to an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
8a67e78bb8 radeonsi: don't set VGT_VTX_CNT_EN twice in init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
e21418f221 radeonsi: convert stencil ref state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c44de30979 radeonsi: convert blend color state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
74aa64876b radeonsi: convert sample mask state into an atom
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
12b205341a radeonsi: convert clip state into an atom
Reducing calloc overhead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0c2eed0ede radeonsi: avoid redundant CB and DB register updates
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c2a42d1f9f radeonsi: don't rebind GSVS ring buffers every draw call using GS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
c9a3196b14 radeonsi: don't clear the tessellation factor ring buffer
Leftover from the bring-up.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
a2c6ae07b4 radeonsi: remove the tf_ring state, add the registers to init_config
One less state to worry about.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
0d46c3bc9d radeonsi: remove the gs_rings state, add the registers to init_config
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
87c1e9e19c radeonsi: use a bitmask for tracking dirty atoms
This mainly removes the cache misses when checking the dirty flags.
Not much else though.
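
The general technique can be sketched as below: one bit per atom in a
single word, so checking for dirty atoms touches one integer instead of a
flag inside every atom. All names are hypothetical and the "emit" step is
reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_ATOMS 8

/* Hypothetical atom: emitting is reduced to counting for illustration. */
struct sketch_atom {
   unsigned emit_count;
};

struct dirty_tracker {
   struct sketch_atom atoms[SKETCH_NUM_ATOMS];
   uint32_t dirty_atoms;   /* bit i set means atoms[i] must be re-emitted */
};

static void
mark_dirty(struct dirty_tracker *t, unsigned id)
{
   assert(id < SKETCH_NUM_ATOMS);
   t->dirty_atoms |= 1u << id;
}

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned
lowest_bit(uint32_t mask)
{
   unsigned i = 0;

   assert(mask != 0);
   while (!(mask & 1u)) {
      mask >>= 1;
      i++;
   }
   return i;
}

/* Emit every dirty atom once and clear the mask. Returns how many atoms
 * were emitted. */
static unsigned
emit_dirty_atoms(struct dirty_tracker *t)
{
   unsigned emitted = 0;
   uint32_t mask = t->dirty_atoms;

   while (mask) {
      unsigned id = lowest_bit(mask);

      mask &= mask - 1;   /* clear the lowest set bit */
      t->atoms[id].emit_count++;
      emitted++;
   }
   t->dirty_atoms = 0;
   return emitted;
}
```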

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
2fe040ee61 radeonsi: initialize atom IDs for external atoms
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:14 +02:00
Marek Olšák
5bb0ad7ccc radeonsi: call si_init_atom for remaining radeonsi atoms
I need to initialize more atom IDs.

This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type for the emit functions, though)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
e191c58324 radeonsi: initialize atom IDs
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
ba7a6cf626 radeonsi: define the state atom array separately
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
8a97528b3a radeonsi: optimize viewport states
same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
f6a10f60b7 radeonsi: optimize scissor states
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
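
Emitting consecutive scissors with one packet requires finding runs of
consecutive dirty bits. A minimal sketch of such a scan is shown below; the
function name and signature are hypothetical and are not taken from the
utility code.

```c
#include <assert.h>
#include <stdint.h>

/* Find the lowest run of consecutive set bits in *mask, report where it
 * starts and how long it is, and clear those bits. *mask must be
 * non-zero. */
static void
scan_consecutive_range(uint32_t *mask, unsigned *start, unsigned *count)
{
   unsigned s = 0;
   unsigned c = 0;

   assert(*mask != 0);

   /* Skip the clear bits below the run. */
   while (!((*mask >> s) & 1u))
      s++;

   /* Measure the run, never shifting by 32 or more. */
   while (s + c < 32 && ((*mask >> (s + c)) & 1u))
      c++;

   for (unsigned i = 0; i < c; i++)
      *mask &= ~(1u << (s + i));

   *start = s;
   *count = c;
}
```

Calling it in a loop until the mask is zero yields one (start, count) pair
per run, which a caller could turn into one packet each.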

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
02c8e06497 radeonsi: add SI_MAX_ATTRIBS
PIPE_MAX_ATTRIBS is 32, but we currently only support 16.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
05af645a95 radeonsi: fix memory usage checking for big IBs
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
08775a2196 radeonsi: set all 16 viewport Z bounds for GL 4.1
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
9b510a9652 radeonsi: fix a Unigine Heaven hang when drirc is missing
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
b1e5451211 winsys/amdgpu: use small IBs for better performance on VI
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
Marek Olšák
fc292b5821 gallium/util: add u_bit_scan_consecutive_range
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2015-09-01 21:51:13 +02:00
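A rough illustration for commits f6a10f60b7 and fc292b5821: emitting consecutive scissors in one packet presumably relies on walking a dirty mask one run of adjacent set bits at a time. The sketch below is not Mesa's implementation; the function name `scan_consecutive_range` and its exact behaviour are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the Mesa code): find the lowest run of
 * consecutive set bits in *mask, report it as start/count, and clear it. */
static void
scan_consecutive_range(uint32_t *mask, int *start, int *count)
{
   assert(*mask != 0);

   /* Skip the zero bits below the first set bit. */
   int s = 0;
   while (!((*mask >> s) & 1u))
      s++;

   /* Count how many set bits follow without a gap. */
   int c = 0;
   while (s + c < 32 && ((*mask >> (s + c)) & 1u))
      c++;

   /* Build the run mask in 64 bits so that count == 32 does not overflow. */
   uint32_t run = (uint32_t)(((UINT64_C(1) << c) - 1) << s);
   *mask &= ~run;
   *start = s;
   *count = c;
}
```

A caller would loop while the mask is non-zero and emit one packet per (start, count) pair, so a mask such as 0x6E yields two packets instead of five.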
Chris Wilson
d38a560106 i965: Prevent coordinate overflow in intel_emit_linear_blit
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Apr 15 03:04:33 2015 -0700

    i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.

which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.

Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-09-01 16:41:07 +01:00
Connor Abbott
1484d8c9aa i965/nir: enable the dead control flow optimization
total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
instructions in affected programs:     3054 -> 2884 (-5.57%)
helped:                                29

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 01:48:04 -07:00
Connor Abbott
aec6744501 nir/dead_cf: add support for removing useless loops
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
    checking for values live after the loop to catch const_load
    instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-01 00:58:17 -07:00
Connor Abbott
019eea1c4f nir: add a helper for iterating over blocks in a cf node
We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
89dc0626bd nir: add nir_block_get_following_loop() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
f649afc9dd nir/dead_cf: delete code that's unreachable due to jumps
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott
1e6ad4b027 nir: add an optimization for removing dead control flow
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Dave Airlie
0de53ccc8c r600g: fix calculation for gpr allocation
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.

This is totally my fault as well, maths fail 101.

This makes geom shaders on r600 not fail heavily.

Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-01 16:43:22 +10:00
Marta Lofstedt
f8a938814e mesa: Limit Framebuffer Parameter OpenGL ES 3.1 usage
According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:

"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."

Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
2015-09-01 08:24:37 +03:00
Marta Lofstedt
d770e2746c mesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1
V2: Conform to new standard for exposing enums for OpenGL ES 3.1.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-09-01 08:19:11 +03:00
Jason Ekstrand
e16531fbe3 nir/builder: Use nir_after_instr to advance the cursor
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node.  Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node.  By just resetting to after_instr, we
avoid all these problems.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-31 18:17:07 -07:00
Nanley Chery
f3a483069a i965: advertise ASTC support for Skylake
v2: remove OES ASTC extension reference.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 17:29:36 -07:00
Nanley Chery
be7f640257 mesa/glformats: recognize ASTC formats as color formats
ASTC formats contain RGBA components.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 17:23:10 -07:00
Nanley Chery
76f17266ec mesa/texformat: use format conversion function in _mesa_choose_tex_format
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:21 -07:00
Nanley Chery
01024ded1e mesa/texcompress: correct mapping of S3TC formats in conversion function
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:08 -07:00
Dave Airlie
3063913f77 r600/sb: update last_cf for finalize if.
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.

I think this fixes one of the earlier shader going off end
of memory problems we've stopped.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-01 07:39:24 +10:00
Matt Turner
a4ba41638d i965/fs: Use greater-equal cmod to implement maximum.
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods is commutative.

See commit 3b7f683f.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-08-31 11:51:59 -07:00
Ben Widawsky
d2e3638ef9 i965/chv|skl: Apply sampler bypass w/a
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

v3:
Use an inline function only in gen8_emit_texture_surface_state() (Matt).

Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-31 10:08:43 -07:00
Dave Airlie
78027c965a st/mesa: move to renumbering registers in a group
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:33 +01:00
Dave Airlie
aee73f2942 st/mesa: reduce time spent in calculating temp read/writes
The glsl->tgsi converter does some temporary register reduction;
however, in profiling shader-db this shows up quite high.

So optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:18 +01:00
Dave Airlie
46968c1140 st/mesa: cache tgsi opcode info in the instruction
Instead of looking this up repeatedly, let's just cache it in the instruction
translation up front. I just noticed this function was high in a profile
of shader-db on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:26:23 +01:00
Dave Airlie
03b7ec8778 r600: move prim convert from geom shader to function.
This should avoid C++ fail including this header.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 19:45:13 +10:00
Timothy Arceri
c8bc8d7235 glsl: remove special case subroutine type counting
Unlike samplers we can get the correct value for subroutines from
component_slots()

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-31 13:10:44 +10:00
Edward O'Callaghan
0d19dc302f r600g: Use TGSI parse results instead of manually exfiltrating
This makes better use of the work that the TGSI API has done for
us.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:14 +02:00
Edward O'Callaghan
3eed81a97b r600g: Set geometry properties in r600_create_shader_state()
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said properties in
the selector as read-only within r600_shader_from_tgsi().

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:00 +02:00
Edward O'Callaghan
b4dee1b636 r600g: Move geometry properties state from shader to selector
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:44 +02:00
Edward O'Callaghan
7b6369eb69 r600g: Remove dead assignment to 'gs_input_prim' in shader state
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:26 +02:00
Marek Olšák
7dc8a3497f radeonsi: don't use the emit qt keyword in si_init_atom
It confuses my editor.
2015-08-29 23:18:23 +02:00
Marek Olšák
379e3382e8 radeonsi: remove no-op 32-bit masking
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:21 +02:00
Marek Olšák
437cb1e3f4 gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:08 +02:00
Marek Olšák
e321596e9f winsys/radeon: handle non-zero finite timeout when waiting for buffers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:06 +02:00
Ilia Mirkin
a5a96118ed freedreno/a3xx: implement half-z clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-29 16:18:04 -04:00
Ilia Mirkin
58e24b4761 freedreno/a3xx: add basic clip plane support
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-29 16:18:04 -04:00
Samuel Pitoiset
c8a61ea4fb nvc0: change prefix of MP performance counters to HW_SM
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 11:04:00 +02:00
Samuel Pitoiset
21bdb4d8f3 nvc0: sort performance counter queries by name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:50 +02:00
Samuel Pitoiset
ebca85423c nvc0: make names of performance counter queries consistent
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:44 +02:00
Samuel Pitoiset
981f46aa95 nvc0: use enumerations for driver queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:40 +02:00
Samuel Pitoiset
0eac599001 nvc0: remove commented out code related to PCOUNTER queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:35 +02:00
Dave Airlie
6941883175 r600: port si_conv_prim_to_gs_out from radeonsi
This code was broken by the tess merge, and I totally missed it
until now. I'm not sure this fixes anything but it stops the assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
c149d84d45 r600g: use PRIi64 for some compute debug printfs
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
8d6d0cc17d gallium/util: fix debug_get_flags_option on 32-bit
On 32-bit we need to use PRIu64 flags for printfs;
otherwise this segfaults in R600_DEBUG=help.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
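For commits c149d84d45 and 8d6d0cc17d, a minimal sketch of the portable pattern: the `<inttypes.h>` macros expand to the correct length modifier for 64-bit integers on every ABI. A mismatched conversion is undefined behaviour, and on 32-bit targets it can in practice misalign the arguments that follow, so a later `%s` may read a bogus pointer. The function `format_flag` is hypothetical and not taken from Mesa.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: writes a 64-bit flag value followed by its name. */
static int
format_flag(char *buf, size_t size, uint64_t value, const char *name)
{
   /* PRIu64 matches uint64_t whether long is 32 or 64 bits wide, so the
    * string argument that follows is read from the right position. */
   return snprintf(buf, size, "%" PRIu64 " %s", value, name);
}
```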
Ilia Mirkin
275c5810ca glsl: provide the option of using BFE for unpack builtin lowering
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
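To illustrate commit 275c5810ca: a signed bitfield extract returns a field already sign-extended, which is why it can stand in for a shift, a mask and a separate left/right shift pair. The C model below only emulates what such an operation computes; the name `bitfield_extract_signed` and the restriction to fields narrower than 32 bits are assumptions of this sketch, not the GLSL compiler's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signed bitfield extract: take `bits` bits of
 * `value` starting at bit `offset` and sign-extend the result. */
static int32_t
bitfield_extract_signed(uint32_t value, int offset, int bits)
{
   assert(offset >= 0 && bits > 0 && bits < 32 && offset + bits <= 32);

   uint32_t mask = (1u << bits) - 1u;
   uint32_t field = (value >> offset) & mask;
   uint32_t sign = 1u << (bits - 1);

   /* Flipping the sign bit and subtracting it back sign-extends the field
    * without relying on implementation-defined signed shifts. */
   return (int32_t)(field ^ sign) - (int32_t)sign;
}
```

Unpacking two signed 16-bit halves of a word is then one call per component, with offsets 0 and 16.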
Ilia Mirkin
889a946a45 glsl: use bitfield_insert instead of and + shift + or for packing
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
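To illustrate commit 889a946a45: packing with a bitfield insert replaces the separate and, shift and or steps with one operation per component. The sketch below emulates the operation in plain C, with the argument order following GLSL's `bitfieldInsert(base, insert, offset, bits)`. The helper `pack_2x16` is hypothetical and only shows the shape of the lowering.

```c
#include <assert.h>
#include <stdint.h>

/* C model of a bitfield insert: replace `bits` bits of `base`, starting at
 * bit `offset`, with the low bits of `insert`. */
static uint32_t
bitfield_insert(uint32_t base, uint32_t insert, int offset, int bits)
{
   assert(offset >= 0 && bits > 0 && bits < 32 && offset + bits <= 32);

   uint32_t mask = ((1u << bits) - 1u) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}

/* Hypothetical packing helper: the low half is used as the base and the
 * high half is inserted above it. */
static uint32_t
pack_2x16(uint16_t lo, uint16_t hi)
{
   return bitfield_insert(lo, hi, 16, 16);
}
```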
Matt Turner
c676c432f3 i965/fs: Remove fs_visitor::try_replace_with_sel().
No shader-db changes on g4x, snb, hsw, or bdw.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
64e312d7fa i965/fs: Replace awful variable names.
   start_to      -> dst_start
   end_to        -> dst_end
   start_from    -> src_start
   end_from      -> src_end
   var_to        -> dst_var
   var_from      -> src_var
   reg_to        -> dst_reg
   reg_to_offset -> dst_reg_offset
   reg_from      -> src_reg

Not sure how these made sense to me before.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
a2ff1e95a4 i965/fs: Skip blocks in register coalescing interference check.
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f2f8c43af9 i965/fs: Improve register coalescing interference check.
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.

The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.

With that fixed, we can calculate interference more accurately.

total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs:     346282 -> 342880 (-0.98%)
helped:                                1552

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f3d0a894af i965/fs: Use overwrites_reg() instead of dst.equals().
equals() returns false for registers with different types, so using it
isn't appropriate to determine whether an instruction is overwriting a register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
8765f1d7dd i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.
Noticed when debugging things that lead to the next patch.

On G45 (and presumably ILK) this helps register coalescing:

total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs:     43751 -> 43718 (-0.08%)
helped:                                52
HURT:                                  2

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Marta Lofstedt
2581fe931a i965/fs: Do not set the size for zero-size uniforms
Zero sized uniforms can exist in the list, but they don't get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them.  This was previously fixed in:

commit: 781dc7c0e1.

However,

commit: 259f7291de

removed the fix.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-28 09:52:59 -07:00
Daniel Scharrer
0516159613 mesa: return old name for deleted samplers for SAMPLER_BINDING queries
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should be returned. None of the other binding point
queries check for deletion in another context.

Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:39 +02:00
Daniel Scharrer
5aaaaebf22 mesa: add missing queries for ARB_direct_state_access
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as texture queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:26 +02:00
Neil Roberts
2dbc6a0ad9 docs: Fix a typo in GL3.txt concerning GL_KHR_context_flush_control 2015-08-28 14:29:22 +01:00
Ilia Mirkin
b319fd7c14 mesa: fix dispatch sanity with GL_OES_texture_storage_multisample_2d_array
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91785
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-08-28 03:12:05 -04:00
Vinson Lee
2ef5a4f830 ABI-check: Use more portable bash invocation.
Fixes 'make check' on FreeBSD.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-27 23:48:43 -07:00
Boyan Ding
86c57ebe0e i965/nir: Make use of nir_opt_undef
Shader-db result on Ivy Bridge:
total instructions in shared programs: 145484 -> 145445 (-0.03%)
instructions in affected programs:     225 -> 186 (-17.33%)
helped:                                5
HURT:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-08-27 23:33:49 -07:00
Matt Turner
559b8842fa glapi: Remove _x86_64_get_get_dispatch symbol from x86-64 assembly.
Never used.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-08-27 22:28:49 -07:00
Ilia Mirkin
4a6a47ed05 glsl: clean up textureSize prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-27 23:49:13 -04:00
Glenn Kennard
608c7b4a63 r600g/sb: Don't crash on empty if jump target
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:36 +10:00
Glenn Kennard
a830225adb r600g/sb: Don't read junk after EOP
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.

Add a couple of asserts, and print EOP bit if set in old asm printer.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:32 +10:00
Glenn Kennard
36f1999a87 r600g/sb: Handle undef in read port tracker
e8e443 missed adding check for undef values also in
unreserve function, leading to an assert triggering.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:14 +10:00
Brian Paul
52f7487923 mesa: rename rowStride to imageStride in texturesubimage()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 15:22:01 -06:00
Ilia Mirkin
2259b11100 mesa: only copy the requested teximage faces
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.

zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-27 17:18:43 -04:00
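A simplified model of the fix in commit 2259b11100: since each cube face has its own image, only faces zoffset through zoffset + depth - 1 should be written, and exactly `depth` images should be read from the client array. The data layout, the function name `copy_cube_faces` and the tightly packed source are assumptions of this sketch; it is not the Mesa code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NUM_FACES = 6 };

/* Hypothetical model: dst[] holds one buffer per cube face, and src holds
 * `depth` tightly packed face images of face_bytes each. */
static void
copy_cube_faces(unsigned char *dst[NUM_FACES], const unsigned char *src,
                size_t face_bytes, int zoffset, int depth)
{
   /* The real code validates these earlier; the sketch just asserts. */
   assert(zoffset >= 0 && depth >= 0 && zoffset + depth <= NUM_FACES);

   for (int i = 0; i < depth; i++) {
      /* Source image i lands in face zoffset + i; other faces are untouched. */
      memcpy(dst[zoffset + i], src + (size_t)i * face_bytes, face_bytes);
   }
}
```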
Kenneth Graunke
0a913a9d85 nir: Convert the builder to use the new NIR cursor API.
The NIR cursor API is exactly what we want for the builder's insertion
point.  This simplifies the API, the implementation, and is actually
more flexible as well.

This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.

v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
2015-08-27 13:36:57 -07:00
Kenneth Graunke
3e3cb77901 nir: Convert the NIR instruction insertion API to use cursors.
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point.  It then reworks the existing API to
simply be a wrapper around that for compatibility.

This largely involves moving the existing code into a new function.

Suggested by Connor Abbott.

v2: Make the legacy functions static inline in nir.h (requested by
    Connor Abbott).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
f90c6b1ce0 nir: Move nir_cursor to nir.h.
We want to use this for normal instruction insertion too, not just
control flow.  Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
c44d507752 nir: Strengthen "no jumps" assertions in instruction insertion API.
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.

Previously, we only checked this when the new instruction being inserted
was a jump.  This is a red herring - inserting *any* kind of instruction
after a jump is illegal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Brian Paul
bcae4640c8 st/mesa: use PROGRAM_ARRAY for storing structs containing arrays
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices.  But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.

Before, we failed an assertion:
  state_tracker/st_glsl_to_tgsi.cpp:4900:
  Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-08-27 13:11:26 -06:00
Brian Paul
42c7be5877 glsl: fix comment typo: s/filed/field/ 2015-08-27 13:11:26 -06:00
Brian Paul
3c256f572b gallium/util: fix code formatting in u_blitter.h
Trivial.
2015-08-27 13:11:26 -06:00
Jason Ekstrand
fee0c5af11 i965/fs: Split VGRFs after lowering pull constants
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.

This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:36 -07:00
Jason Ekstrand
f2e667172a i965/fs: Refactor assign_constant_locations
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things.  In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:24 -07:00
Kenneth Graunke
885a9b058c i965: Rename INTEL_DEBUG=vec4vs to INTEL_DEBUG=vec4.
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().

This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.

We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2015-08-27 11:38:50 -07:00
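The pitfall described in commit 885a9b058c can be reproduced with a few lines of C. The first function mirrors the substring matching the message describes. The second is a hypothetical stricter variant that only accepts whole comma-separated tokens; it is not what driParseDebugString() does and is shown purely for contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Substring matching: any occurrence of the flag name counts. */
static bool
flag_set_substring(const char *env, const char *flag)
{
   return strstr(env, flag) != NULL;
}

/* Hypothetical stricter variant: only whole comma-separated tokens match. */
static bool
flag_set_token(const char *env, const char *flag)
{
   size_t len = strlen(flag);
   const char *p = env;

   while (*p) {
      /* Length of the current token, up to the next comma or the end. */
      size_t tok = strcspn(p, ",");
      if (tok == len && strncmp(p, flag, len) == 0)
         return true;
      p += tok;
      if (*p == ',')
         p++;
   }
   return false;
}
```

With substring matching, the value "vec4vs" also enables "vs"; with token matching it does not.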
Tapani Pälli
16ad1d2a8d mesa: enable enums for OES_texture_storage_multisample_2d_array
v2: use _mesa_is_gles31(ctx) for verifying we are on ES 3.1,
    remove _es31 usage from get_hash_params.py

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:58:10 +03:00
Tapani Pälli
c2c64fd269 glsl: add support for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample enable bit

Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:54:41 +03:00
Tapani Pälli
b9101b1443 mesa: Add extension enable for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample bit to enable extension

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:53:15 +03:00
Tapani Pälli
f4280b740d glapi: add GL_OES_texture_storage_multisample_2d_array extension
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:52:46 +03:00
Nanley Chery
9a759a6ee0 swrast: add a new macro, FETCH_COMPRESSED
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
42ee16176d mesa: return bool instead of GLboolean in compressedteximage_only_format()
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.

Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
43d5b4db96 i965: refactor miptree alignment calculation code
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.

v2: reintroduce brw.
    don't include functional changes.
    don't adjust function parameters or create a new function.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
a687734135 i965: change the meaning of cpp for compressed textures
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.

Because the original definition was bytes per block divided by block width, all
references to the mipmap width must be divided by the block width. This keeps the
address calculation formulas consistent. For example, the units for miptree_level
x_offset and miptree total_width have changed from pixels to blocks.

v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into a separate commit.
    simplify cpp assignment in copy_image_with_blitter().
    update miptree width and offset variables in: intel_miptree_copy_slice(),
        intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
1a9ceed4ba i965: correct mt->align_h for 2D textures on Skylake
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.

v2: add newlines to separate declarations, statements, and comments.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
10ff64fd3d i965: use ALIGN_NPOT for setting ASTC mipmap layouts
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.

v2: handle texture arrays and LDR-only systems.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
54d2aa4258 mesa/macros: move ALIGN_NPOT to macros.h
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.

v2: add greater than 0 assertion (Anuj).
    convert the macro to a static inline function.
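
A minimal sketch of what such a helper might look like, assuming only what the
message states (a static inline function with a greater-than-zero assertion).
The name align_npot_sketch is hypothetical and this is not the code from
macros.h:

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment. Unlike a mask-based
 * ALIGN, this works when alignment is not a power of two. */
static inline uintptr_t
align_npot_sketch(uintptr_t value, int32_t alignment)
{
   assert(alignment > 0);
   uintptr_t a = (uintptr_t)alignment;
   return (value + a - 1) / a * a;
}
```

The division makes it slower than the power-of-two variant, which is presumably
why both are kept.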

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
97f4efd573 mesa/macros: add power-of-two assertions for alignment macros
The comments for ALIGN and ROUND_DOWN_TO both require that the alignment
value passed into the macro be a power of two. Using software assertions
verifies this to be the case.

v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indentation (Brian).
v4: add greater than zero requirement (Anuj).
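
A sketch of the idea, assuming static inline functions as described in v2. The
names carry a _sketch suffix to mark them as illustrative and not the actual
macros.h definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v has exactly one bit set. */
static inline bool
is_pot_sketch(uintptr_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Round value up to a multiple of a power-of-two alignment. */
static inline uintptr_t
align_pot_sketch(uintptr_t value, int32_t alignment)
{
   assert(alignment > 0 && is_pot_sketch((uintptr_t)alignment));
   uintptr_t a = (uintptr_t)alignment;
   return (value + a - 1) & ~(a - 1);
}

/* Round value down to a multiple of a power-of-two alignment. */
static inline uintptr_t
round_down_to_sketch(uintptr_t value, int32_t alignment)
{
   assert(alignment > 0 && is_pot_sketch((uintptr_t)alignment));
   uintptr_t a = (uintptr_t)alignment;
   return value & ~(a - 1);
}
```

The mask trick only gives correct results for powers of two, so the assertion
turns a silent miscalculation into a failure in debug builds.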

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
8b1f008e9a i965/surface_formats: add support for 2D ASTC surface formats
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.

ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.

v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
    update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
cd49b97a8a mesa/teximage: return the base internal format of the ASTC formats
This is necessary to initialize the gl_texture_image struct.

From the KHR_texture_compression_astc_ldr spec:
  "Added to Section 3.8.6, Compressed Texture Images

   Add the tokens specified above to Table 3.16, Compressed Internal Formats.
   In all cases, the base internal format will be RGBA. The encoding allows
   images to be encoded with fewer channels, but this is always presented as
   RGBA to the sampler."

v2. use _mesa_is_astc_format().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
12b519b457 mesa/teximage: accept ASTC formats for 3D texture specification
The ASTC spec was revised as follows:

   Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
   commands accepting ASTC format tokens in the New Tokens section [...].

Support only exists in the HDR submode:

   Add a second new column "3D Tex." which is empty for all non-ASTC
   formats. If only the LDR profile is supported by the implementation,
   this column is also empty for all ASTC formats. If both the LDR and HDR
   profiles are supported only, this column is checked for all ASTC
   formats.

LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.

v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
23c9cd5a96 mesa/texcompress: enable translation between MESA and GL ASTC formats
v3. conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
692578ed13 mesa/glformats: recognize ASTC formats as compressed
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
4143511b15 mesa: add ASTC extensions to the extensions table
v2: alphabetize the extensions.
    remove OES ASTC extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
582ce1ea97 mesa: don't enable online compression for ASTC formats
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.

v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
e9fd8e154f glapi: add support for KHR_texture_compression_astc_ldr
v2: correct the spelling of the sRGB variants.
    remove spaces around "=" when setting the enum value.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
8ae37365f3 mesa/formats: define the 2D ASTC formats
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().

v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
    only used for generating mipmaps - something ASTC formats do not support
    due to lack of online compression.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Ilia Mirkin
c4cbaca327 nouveau: avoid build failures since 0fc21ecf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 14:04:41 -04:00
Marek Olšák
6924ecac77 gallium/radeon: read_registers should return bool meaning success or failure
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:20 +02:00
Marek Olšák
16e5d8ad38 radeonsi: add IB parser support for CP DMA packets
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
2c14a6d3b1 radeonsi: add IB tracing support for debug contexts
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
189953ee13 radeonsi: remove old CS tracing code
Some of it is left there and it will be re-used in the next commit.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
df6a5666b6 radeonsi: parse and dump status registers on GPU hang
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]

This may print too much information that we might not understand yet,
but some of the bits are very useful.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
61df4f0cd3 radeonsi: add an IB parser
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
be6dc87776 radeonsi: save the contents of indirect buffers for debug contexts
This will be used by the IB parser.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
a6a6c68955 radeonsi: generate register and packet tables for an IB parser from sid.h
This makes writing a good IB parser a lot easier.

It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
d15b71b4bd radeonsi: remove duplicated register definitions and instruction definitions
Instruction encoding isn't needed in Mesa.

The border color address registers were duplicated.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
c59ad265df r600g,radeonsi: remove unused ill-formed register field definitions
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
110873ed11 radeonsi: add an initial dump_debug_state implementation dumping shaders
This is usually called after a draw call.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
93d97db349 radeonsi: allow si_dump_key to write to a file
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
525921ed51 gallium/ddebug: new pipe for hang detection and driver state dumping (v2)
v2: lots of improvements

This is like identity or trace, but simpler. It doesn't wrap most states.

Run with:
  GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in milliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.

If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
  /home/$username/dd_dumps/$processname_$pid_$index.

Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.

You can also do:
  GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.

Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.

v2: drop pipe-loader changes, drop SConscript
    rename dd.h -> dd_pipe.h

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
0fc21ecfc0 gallium: add flags parameter to pipe_screen::context_create
This allows creating compute-only and debug contexts.

Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
7b5c92391f gallium: add an interface for dumping debug driver state
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Ilia Mirkin
a3b617a258 mesa: remove pointless es31 checks, fix indirect to only be in es31
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 12:37:38 -04:00
Ilia Mirkin
332fb341dd mesa: uncomment checks in es31 computation, add texture_ms
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
2015-08-26 12:37:17 -04:00
Marek Olšák
f432ae899f mesa: create multisample fallback textures like normal textures
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-26 15:42:26 +02:00
Grazvydas Ignotas
f8b01ae47c radeonsi: mark unreachable paths to avoid warnings
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 15:42:26 +02:00
Tapani Pälli
e0c2ea0337 mesa: GetTexLevelParameter{if}v changes for OpenGL ES 3.1
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.

v2: remove extra is_gles31() checks (suggested by Ilia)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 08:38:25 +03:00
Marta Lofstedt
ae8d0e7abe mesa/es3.1: Allow GL_COMPUTE_WORK_GROUP_SIZE for OpenGL ES 3.1
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE is supported by the
glGetProgramiv function.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Marta Lofstedt
c2a766880d mesa/es3.1: Enable getting MAX_COMPUTE_WORK_GROUP_ values for OpenGL ES 3.1
According to the OpenGL ES 3.1 specification chapter 17, the
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
values are available for glGetIntegeri_v.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Dave Airlie
73e5adc4b2 mesa/formats: pass correct parameter to _mesa_is_format_compressed
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date:   Fri Jul 31 10:26:36 2015 -0700

    mesa/formats: remove compressed formats from matching function

caused a regression in my CTS testing, this looks like a clear
thinko.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 14:13:27 +10:00
Roland Scheidegger
48e6404c04 gallium/auxiliary: optimize rgb9e5 helper some more
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:38 +02:00
Roland Scheidegger
941346a803 gallium/auxiliary: optimize rgb9e5 helper a bit
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
  getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that this keeps the double math for now (float x + 0.5), as the results may
otherwise differ.
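
A small sketch of the "simple exponent manipulation" idea, assuming IEEE 754
single-precision floats. The function name is hypothetical and this is not the
helper's actual code. It builds 2^e by writing the biased exponent field
directly, and the reciprocal 2^-e can then replace a division with a multiply:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 2^e as a float for normal exponents (-126 <= e <= 127) by
 * placing the biased exponent in bits 23..30 and leaving the mantissa 0. */
static inline float
pow2_from_exponent_sketch(int e)
{
   assert(e >= -126 && e <= 127);
   uint32_t bits = (uint32_t)(e + 127) << 23;
   float f;
   memcpy(&f, &bits, sizeof f);   /* type-pun without aliasing issues */
   return f;
}
```

Since powers of two are exact in binary floating point, x * 2^-e gives the same
result as x / 2^e as long as nothing overflows or underflows.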

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:37 +02:00
Dave Airlie
c1452983b4 mesa/texgetimage: fix missing stencil check
GetTexImage can read to stencil8 but only from
stencil or depthstencil textures.

This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 10:22:09 +10:00
Nanley Chery
1d2a844e7d mesa/teximage: Add GL error parameter to _mesa_target_can_be_compressed
Enables _mesa_target_can_be_compressed to return the appropriate GL error
depending on its inputs. Use the parameter to return the appropriate GL error
for ETC2 formats on GLES3.

Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:53:46 -07:00
Nanley Chery
26c549e69d mesa/formats: remove compressed formats from matching function
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.

v2. Since the exhaustive switch is removed, add a gtest to ensure
    all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
    fix formatting and misc improvements (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:17 -07:00
Nanley Chery
8e581747d2 mesa/formats: make format testing a gtest
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhaustive switch for
_mesa_uncompressed_format_to_type_and_comps().

v2. indentation and conditional inclusion fixes (Chad).
    allow tests to continue running if any format fails
    and display the failing format name.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:13 -07:00
Kenneth Graunke
1bec29d04d gallium/ttn: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
78856194c1 prog_to_nir: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
5f14c417c8 nir: Use nir_shader::stage rather than passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
d4d5b430a5 nir: Store gl_shader_stage in nir_shader.
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.

Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Jason Ekstrand
dfacae3a56 i965/fs: Combine assign_constant_locations and move_uniform_array_access_to_pull_constants
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assigning locations of pull constants.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c999a58f50 nir/lower_io: Remove assign_var_locations_direct_first
This is no longer used so we might as well get rid of it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
259f7291de i965/fs: Rework uniform handling
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect.  This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform.  This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate.  This should allow us
to more easily make decisions about what to push.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
cfa056c6a5 i965/vec4_nir: Get rid of the uniform_driver_location tracking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
ce5e9139aa nir/lower_io: Separate driver_location and base offset for uniforms
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
0db8e87b4a nir/intrinsics: Add a second const index to load_uniform
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path.  In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load.  Previously, it was completely flattened by
nir_lower_io which made things difficult.  This adds more information to
the intrinsic to make this easier for us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
6c33d6bbf9 nir: Pass a type_size() function pointer into nir_lower_io().
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play.  The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer.  This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
 - One side-effect of passing in a function pointer is that nir_lower_io is
   now aware of and properly allocates space for image uniforms, allowing
   us to drop hacks in the backend
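
To illustrate the function-pointer approach, here is a self-contained sketch.
The type and function names are hypothetical stand-ins, not the real glsl_type
or nir_lower_io() interfaces. The location-assignment loop is written once and
the backend chooses how sizes are counted:

```c
#include <assert.h>

/* Hypothetical stand-in for a GLSL type; not Mesa's glsl_type. */
struct sketch_type {
   unsigned components;   /* 1..4 for scalars and vectors */
   unsigned array_len;    /* 0 when the type is not an array */
};

typedef int (*type_size_fn)(const struct sketch_type *);

/* Scalar-style counting: every component takes one slot. */
static int
type_size_scalar_sketch(const struct sketch_type *t)
{
   unsigned n = t->array_len ? t->array_len : 1;
   return (int)(n * t->components);
}

/* vec4-style counting: every array element takes one vec4 slot. */
static int
type_size_vec4_sketch(const struct sketch_type *t)
{
   unsigned n = t->array_len ? t->array_len : 1;
   return (int)n;
}

/* Assign consecutive locations using the caller-supplied size function.
 * Returns the total number of slots used. */
static int
assign_locations_sketch(const struct sketch_type *vars, int count,
                        int *locations, type_size_fn type_size)
{
   int next = 0;
   for (int i = 0; i < count; i++) {
      locations[i] = next;
      next += type_size(&vars[i]);
   }
   return next;
}
```

For a vec3 followed by a vec4[2], the scalar callback yields locations 0 and 3
(11 slots in total), while the vec4 callback yields 0 and 1 (3 slots).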

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
a23f82053d prog_to_nir: Don't allocate nir_variable with type vec4[0] for uniforms.
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.

Should avoid i965 backend assertions in future patches Jason and I are
working on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
640c472fd0 i965: Move type_size() methods out of visitor classes.
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c56899f41a i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
8d8b8f5854 i965: Rename setup_vector_uniform_values to setup_vec4_uniform_value
The new name more accurately represents what it does: Set up a single vec4
uniform value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Rob Clark
0ab29751b6 freedreno/ir3: fix compile break after splitting out nir_control_flow.h
The commit:

  commit b49371b8ed
  Author:     Connor Abbott <cwabbott0@gmail.com>
  AuthorDate: Tue Jul 21 19:54:18 2015 -0700

      nir: move control flow modification to its own file

split out some control flow related APIs into a separate header, but did
not update drivers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:17:30 -04:00
Rob Clark
8b2d0bb844 freedreno/ir3: fix compile break after fxn->start_block removal
The commit:

  commit 8e0d4ef341
  Author:     Kenneth Graunke <kenneth@whitecape.org>
  AuthorDate: Thu Aug 6 18:18:40 2015 -0700

      nir: Delete the nir_function_impl::start_block field.

removed the start_block field without fixing up drivers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:13:04 -04:00
Dave Airlie
529acab22a mesa: enable texture stencil8 for multisample
This fixes GL45-CTS.gtf44.GL31Tests.texture_stencil8.texture_stencil8_gl44
from the ogl conform suite.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-25 11:06:58 +10:00
Brian Paul
e089ca26e1 mesa: make _mesa_bind_texture_unit() static
It's only called from the file it's defined in.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-24 18:23:19 -06:00
Nanley Chery
8f378d1083 mesa/formats: store whether or not a format is sRGB in gl_format_info
v2: remove extra newline.
v3: use bool instead of GLboolean.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 16:08:01 -07:00
Kenneth Graunke
4f2cdd8497 nir: Use !block_ends_in_jump() in a few places rather than open-coding.
Connor introduced this helper recently; we should use it here too.

I had to move the function earlier in the file for it to be available.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-24 15:10:55 -07:00
Connor Abbott
d7971b41ce nir/cf: reimplement nir_cf_node_remove() using the new API
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
fc7f2d2364 nir/cf: add new control modification API's
These will help us do a number of things, including:

- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:

if (foo) {
    ...
}
if (!foo) {
    ...
}

with:

if (foo) {
    ...
} else {
    ...
}

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
476eb5e4a1 nir/cf: use a cursor for inserting control flow
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
d356f84d4c nir/cf: add split_block_cursor()
This is a helper that will be shared between the new control flow
insertion and modification code.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
58a360c6b8 nir/cf: add split_block_before_instr()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6e47a34b29 nir/cf: add a cursor structure
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().

In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.
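
A rough sketch of why a cursor collapses the variants, using hypothetical names
and an index-based block model that is not NIR's actual data structure. One
value describes any position, and a single function resolves it to a split
point:

```c
#include <assert.h>

/* Hypothetical cursor: a position relative to a block or an instruction. */
enum cursor_option_sketch {
   CURSOR_BEFORE_BLOCK,
   CURSOR_AFTER_BLOCK,
   CURSOR_BEFORE_INSTR,
   CURSOR_AFTER_INSTR
};

struct cursor_sketch {
   enum cursor_option_sketch option;
   int instr_index;   /* only meaningful for the *_INSTR options */
};

/* Resolve a cursor to the number of instructions that stay in the first
 * half when a block of block_len instructions is split at the cursor. */
static int
split_point_sketch(struct cursor_sketch c, int block_len)
{
   switch (c.option) {
   case CURSOR_BEFORE_BLOCK: return 0;
   case CURSOR_AFTER_BLOCK:  return block_len;
   case CURSOR_BEFORE_INSTR: return c.instr_index;
   case CURSOR_AFTER_INSTR:  return c.instr_index + 1;
   }
   return -1;   /* unreachable for valid options */
}
```

An extract operation can then take two such cursors instead of needing a
separate function for each begin/end combination.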

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6f5c81f86f nir/cf: fix link_blocks() when there are no successors
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6d028749ac nir/cf: clean up jumps when cleaning up CF nodes
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
211c79515d nir/cf: remove uses of SSA definitions that are being deleted
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
633cbbc068 nir/cf: handle jumps better in stitch_blocks()
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurrence, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
940873bf22 nir/cf: handle jumps in split_block_end()
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
f596e4021c nir/cf: add block_ends_in_jump()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
788d45cb47 nir/cf: handle phi nodes better in split_block_beginning()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
747ddc3cdd nir/cf: split up and improve nir_handle_remove_jumps()
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accommodate this, and add the missing
support for removing phi node sources.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
13482111d0 nir/cf: add remove_phi_src() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
f41e108d8b nir: add nir_foreach_phi_src_safe()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
762ae436ea nir/cf: add insert_phi_undef() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
b49371b8ed nir: move control flow modification to its own file
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
1c53f89696 nir: make cleanup_cf_node() not use remove_defs_uses()
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
9d5944053c nir: inline block_add_pred() a few places
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
c7df141c71 nir/validate: check successors/predecessors more carefully
We should be checking almost everything now.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Kenneth Graunke
8e0d4ef341 nir: Delete the nir_function_impl::start_block field.
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't need to store a pointer to it
explicitly.  Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.

Thanks to Connor Abbott for suggesting this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-24 13:31:41 -07:00
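Illustrative sketch (not the Mesa code): the idea in the commit above is that the start block can be derived from the head of the body list instead of being cached in a separate field. The toy list and the toy_* names below are hypothetical; the real code uses Mesa's own list type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy control flow node in a singly linked body list (hypothetical). */
struct toy_cf_node {
    struct toy_cf_node *next;
    int id;
};

/* Toy function implementation: no separate start_block pointer. */
struct toy_function_impl {
    struct toy_cf_node *body_head;
};

/* The start block is simply the first node of the body list. */
static struct toy_cf_node *toy_start_block(const struct toy_function_impl *impl)
{
    assert(impl->body_head != NULL);
    return impl->body_head;
}

/* Insert a node at the head, e.g. after splitting the start block.
 * There is no cached pointer that needs to be fixed up afterwards. */
static void toy_push_head(struct toy_function_impl *impl,
                          struct toy_cf_node *node)
{
    node->next = impl->body_head;
    impl->body_head = node;
}
```

Because the lookup is computed on demand, any control flow edit that changes the head of the list is reflected automatically.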
Nanley Chery
9f00af672b mesa/formats: only do type and component lookup for uncompressed formats
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.

v2. include compressed format cases to avoid gcc warnings (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 11:27:46 -07:00
Rob Clark
000e225360 freedreno/a4xx: formats update
Fixes glamor, which wants to use R8 integer textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:16:27 -04:00
Rob Clark
afb6c24a20 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:15:57 -04:00
Chris Wilson
4e5752e2b7 i965: Always re-emit the pipeline select during invariant state emission
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of

commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Wed Apr 22 11:43:50 2015 -0700

    i965/state: Emit pipeline select when changing pipelines

However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipeline was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.

Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-24 08:57:55 +01:00
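Illustrative sketch (not the i965 code): the commit above describes a cached pipeline selection that goes stale across batches on platforms without logical contexts. The toy_* types and functions below are hypothetical; the sketch only shows the difference between the cached path and an unconditional re-emit that also refreshes the cache.

```c
#include <assert.h>

enum toy_pipeline { TOY_PIPELINE_3D, TOY_PIPELINE_MEDIA };

/* Toy driver context (hypothetical, not struct brw_context). */
struct toy_context {
    enum toy_pipeline last_pipeline; /* cached selection */
    int selects_emitted;             /* counts emitted select commands */
};

/* Unconditionally emit a pipeline select and update the cache. */
static void toy_emit_select_pipeline(struct toy_context *ctx,
                                     enum toy_pipeline pipeline)
{
    assert(pipeline == TOY_PIPELINE_3D || pipeline == TOY_PIPELINE_MEDIA);
    ctx->selects_emitted++;
    ctx->last_pipeline = pipeline;
}

/* Cached path: emit only when the requested pipeline differs. */
static void toy_select_pipeline(struct toy_context *ctx,
                                enum toy_pipeline pipeline)
{
    if (ctx->last_pipeline != pipeline)
        toy_emit_select_pipeline(ctx, pipeline);
}

/* Invariant state runs at the start of each batch on context-less
 * hardware, so it bypasses the cache: another client may have changed
 * the hardware state since the cache was last written. */
static void toy_upload_invariant_state(struct toy_context *ctx)
{
    toy_emit_select_pipeline(ctx, TOY_PIPELINE_3D);
}
```

The design point is that the cache is only trustworthy within a batch; forcing the emit at the batch boundary re-establishes a known state and resets the cache at the same time.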
Marek Olšák
a83c36b5c0 Revert "radeon/winsys: increase the IB size for VM"
This reverts commit 567394112d.

It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-08-23 19:01:15 +02:00
Ilia Mirkin
e18c29b031 nv50: fix 2d engine blits for 64- and 128-bit formats
This fixes bin/ext_framebuffer_multisample-formats all_samples

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 03:12:07 -04:00
Ilia Mirkin
a6ad49cbbd nv50: account for the int RT0 rule for alpha-to-one/cov
Same as commit 1af0641db but for nvc0. If an integer texture is
bound to RT0, don't do alpha-to-one or alpha-to-coverage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 02:58:58 -04:00
Dave Airlie
45971fd0df mesa/arb_gpu_shader_fp64: add support for glGetUniformdv
This was missed when I did fp64, I've sent a piglit test to cover
the case as well.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-23 15:56:35 +10:00
Ilia Mirkin
abbf05cfc2 nv50,nvc0: disable depth bounds test on blit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 01:39:29 -04:00
Neil Roberts
3a1ab23480 i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.

v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
    when gl_VertexID is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:25:39 -07:00
Neil Roberts
fb02b4ec48 i965: Swap the order of the vertex ID and edge flag attributes
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.

v2: Also implement for BDW+

v3 [by Ben]: Remove 10.5 tag. Too late.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:20:33 -07:00
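Illustrative sketch (not the i965 code): the two commits above both depend on the rule that the edge flag must occupy the last vertex element, with the vertex ID element placed before it. The helper names and the index convention below (edge_flag_index < 0 meaning "no edge flag") are assumptions made for this example; the point is that any per-element state has to use the reordered slot rather than the original index.

```c
#include <assert.h>
#include <stdbool.h>

/* Map an input element index to its hardware slot (hypothetical helper).
 * Ordinary elements keep their relative order, the vertex ID element
 * (if any) follows them, and the edge flag always goes last. */
static int toy_element_slot(int i, int nr_elements, int edge_flag_index,
                            bool uses_vertex_id)
{
    assert(i >= 0 && i < nr_elements);
    assert(edge_flag_index < nr_elements);

    if (edge_flag_index < 0)
        return i; /* no edge flag: nothing is reordered */

    if (i == edge_flag_index)
        return nr_elements - 1 + (uses_vertex_id ? 1 : 0);

    /* Elements after the edge flag shift down by one. */
    return i < edge_flag_index ? i : i - 1;
}

/* Slot of the extra vertex ID element: right after the ordinary ones. */
static int toy_vertex_id_slot(int nr_elements, int edge_flag_index)
{
    assert(edge_flag_index < nr_elements);
    return edge_flag_index < 0 ? nr_elements : nr_elements - 1;
}
```

For example, with three elements where element 1 is the edge flag and the vertex ID is in use, elements 0 and 2 land in slots 0 and 1, the vertex ID in slot 2, and the edge flag in slot 3.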
Glenn Kennard
50932268aa r600g: Fix assert in tgsi_cmp
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=91726

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-08-23 09:31:12 +10:00
Alexander von Gluck IV
5abbd1cacc egl: scons: fix the haiku build, do not build the dri2 backend
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 10:13:31 -05:00
Emil Velikov
a8c5c62359 docs: add 11.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 13:28:16 +01:00
906 changed files with 61019 additions and 28936 deletions


@@ -42,6 +42,7 @@ LOCAL_CFLAGS += \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
LOCAL_CFLAGS += \
-D__STDC_LIMIT_MACROS \
-DHAVE___BUILTIN_EXPECT \
-DHAVE___BUILTIN_FFS \
-DHAVE___BUILTIN_FFSLL \
@@ -70,7 +71,7 @@ endif
ifeq ($(MESA_ENABLE_LLVM),true)
LOCAL_CFLAGS += \
-DHAVE_LLVM=0x0305 -DLLVM_VERSION_PATCH=2 \
-DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2 \
-D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS \
-D__STDC_LIMIT_MACROS


@@ -1 +1 @@
11.0.0-devel
11.1.0-devel


@@ -74,7 +74,7 @@ LIBDRM_AMDGPU_REQUIRED=2.4.63
LIBDRM_INTEL_REQUIRED=2.4.61
LIBDRM_NVVIEUX_REQUIRED=2.4.33
LIBDRM_NOUVEAU_REQUIRED=2.4.62
LIBDRM_FREEDRENO_REQUIRED=2.4.64
LIBDRM_FREEDRENO_REQUIRED=2.4.65
DRI2PROTO_REQUIRED=2.6
DRI3PROTO_REQUIRED=1.0
PRESENTPROTO_REQUIRED=1.0
@@ -533,15 +533,32 @@ AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
dnl
dnl library names
dnl
dnl Unfortunately we need to do a few things that libtool can't help us with,
dnl so we need some knowledge of shared library filenames:
dnl
dnl LIB_EXT is the extension used when creating symlinks for alternate
dnl filenames for a shared library which will be dynamically loaded
dnl
dnl IMP_LIB_EXT is the extension used when checking for the presence of a
dnl the file for a shared library we wish to link with
dnl
case "$host_os" in
darwin* )
LIB_EXT='dylib' ;;
LIB_EXT='dylib'
IMP_LIB_EXT=$LIB_EXT
;;
cygwin* )
LIB_EXT='dll' ;;
LIB_EXT='dll'
IMP_LIB_EXT='dll.a'
;;
aix* )
LIB_EXT='a' ;;
LIB_EXT='a'
IMP_LIB_EXT=$LIB_EXT
;;
* )
LIB_EXT='so' ;;
LIB_EXT='so'
IMP_LIB_EXT=$LIB_EXT
;;
esac
AC_SUBST([LIB_EXT])
@@ -1110,6 +1127,11 @@ AC_MSG_RESULT([$with_sha1])
AC_SUBST(SHA1_LIBS)
AC_SUBST(SHA1_CFLAGS)
# Enable a define for SHA1
if test "x$with_sha1" != "x"; then
DEFINES="$DEFINES -DHAVE_SHA1"
fi
# Allow user to configure out the shader-cache feature
AC_ARG_ENABLE([shader-cache],
AS_HELP_STRING([--disable-shader-cache], [Disable binary shader cache]),
@@ -1289,6 +1311,16 @@ AC_SUBST(GLX_TLS, ${GLX_USE_TLS})
AS_IF([test "x$GLX_USE_TLS" = xyes -a "x$ax_pthread_ok" = xyes],
[DEFINES="${DEFINES} -DGLX_USE_TLS"])
dnl Read-only text section on x86 hardened platforms
AC_ARG_ENABLE([glx-read-only-text],
[AS_HELP_STRING([--enable-glx-read-only-text],
[Disable writable .text section on x86 (decreases performance) @<:@default=disabled@:>@])],
[enable_glx_read_only_text="$enableval"],
[enable_glx_read_only_text=no])
if test "x$enable_glx_read_only_text" = xyes; then
DEFINES="$DEFINES -DGLX_X86_READONLY_TEXT"
fi
dnl
dnl More DRI setup
dnl
@@ -2051,7 +2083,7 @@ radeon_llvm_check() {
if test "x$enable_gallium_llvm" != "xyes"; then
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
llvm_check_version_for "3" "4" "2" $1
llvm_check_version_for "3" "5" "0" $1
if test true && $LLVM_CONFIG --targets-built | grep -iqvw $amdgpu_llvm_target_name ; then
AC_MSG_ERROR([LLVM $amdgpu_llvm_target_name not enabled in your LLVM build.])
fi
@@ -2139,11 +2171,8 @@ if test -n "$with_gallium_drivers"; then
gallium_require_drm "vc4"
gallium_require_drm_loader
case "$host_cpu" in
i?86 | x86_64 | amd64)
USE_VC4_SIMULATOR=yes
;;
esac
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes], [USE_VC4_SIMULATOR=no])
;;
*)
AC_MSG_ERROR([Unknown Gallium driver: $driver])
@@ -2163,10 +2192,14 @@ if test "x$MESA_LLVM" != x0; then
LLVM_LIBS="`$LLVM_CONFIG --libs ${LLVM_COMPONENTS}`"
dnl llvm-config may not give the right answer when llvm is a built as a
dnl single shared library, so we must work the library name out for
dnl ourselves.
dnl (See https://llvm.org/bugs/show_bug.cgi?id=6823)
if test "x$enable_llvm_shared_libs" = xyes; then
dnl We can't use $LLVM_VERSION because it has 'svn' stripped out,
LLVM_SO_NAME=LLVM-`$LLVM_CONFIG --version`
AS_IF([test -f "$LLVM_LIBDIR/lib$LLVM_SO_NAME.so"], [llvm_have_one_so=yes])
AS_IF([test -f "$LLVM_LIBDIR/lib$LLVM_SO_NAME.$IMP_LIB_EXT"], [llvm_have_one_so=yes])
if test "x$llvm_have_one_so" = xyes; then
dnl LLVM was built using auto*, so there is only one shared object.
@@ -2174,7 +2207,7 @@ if test "x$MESA_LLVM" != x0; then
else
dnl If LLVM was built with CMake, there will be one shared object per
dnl component.
AS_IF([test ! -f "$LLVM_LIBDIR/libLLVMTarget.so"],
AS_IF([test ! -f "$LLVM_LIBDIR/libLLVMTarget.$IMP_LIB_EXT"],
[AC_MSG_ERROR([Could not find llvm shared libraries:
Please make sure you have built llvm with the --enable-shared option
and that your llvm libraries are installed in $LLVM_LIBDIR
@@ -2317,6 +2350,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
src/gallium/drivers/freedreno/Makefile
src/gallium/drivers/ddebug/Makefile
src/gallium/drivers/i915/Makefile
src/gallium/drivers/ilo/Makefile
src/gallium/drivers/llvmpipe/Makefile


@@ -109,14 +109,14 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
- Enhanced per-sample shading DONE (r600)
- Interpolation functions DONE (r600)
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (llvmpipe, softpipe)
GL_ARB_gpu_shader_fp64 DONE (r600, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50, r600)
GL_ARB_shader_subroutine DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_tessellation_shader DONE ()
GL_ARB_texture_buffer_object_rgb32 DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_texture_cube_map_array DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_texture_query_lod DONE (i965, nv50, r600)
GL_ARB_texture_query_lod DONE (i965, nv50, r600, softpipe)
GL_ARB_transform_feedback2 DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_transform_feedback3 DONE (i965, nv50, r600, llvmpipe, softpipe)
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (llvmpipe, softpipe)
GL_ARB_vertex_attrib_64bit DONE (r600, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, r600, llvmpipe)
@@ -164,7 +164,7 @@ GL 4.3, GLSL 4.30:
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior not started
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object in progress (Iago Toral, Samuel Iglesias)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
@@ -178,7 +178,13 @@ GL 4.4, GLSL 4.40:
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965) (gallium - in progress, VMware)
GL_ARB_enhanced_layouts not started
GL_ARB_enhanced_layouts in progress (Timothy)
- compile-time constant expressions in progress
- explicit byte offsets for blocks in progress
- forced alignment within blocks in progress
- specified vec4-slot component numbers in progress
- specified transform/feedback layout in progress
- input/output block locations in progress
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object not started
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
@@ -194,9 +200,9 @@ GL 4.5, GLSL 4.50:
GL_ARB_derivative_control DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples not started
GL_ARB_texture_barrier DONE (nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EXT extension to be useful)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_texture_barrier DONE (i965, nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robust_buffer_access_behavior not started
GL_KHR_robustness 90% done (the ARB variant)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
@@ -212,7 +218,7 @@ GLES3.1, GLSL ES 3.1
GL_ARB_shader_atomic_counters DONE (i965)
GL_ARB_shader_image_load_store DONE (i965)
GL_ARB_shader_image_size DONE (i965)
GL_ARB_shader_storage_buffer_object in progress (Iago Toral, Samuel Iglesias)
GL_ARB_shader_storage_buffer_object DONE (i965)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
@@ -223,10 +229,35 @@ GLES3.1, GLSL ES 3.1
GS5 Packing/bitfield/conversion functions DONE (i965, nvc0, r600, radeonsi)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
Additional functions not covered above:
glMemoryBarrierByRegion
glGetTexLevelParameter[fi]v - needs updates to restrict to GLES enums
glGetBooleani_v - needs updates to restrict to GLES enums
Additional functionality not covered above:
glMemoryBarrierByRegion DONE
glGetTexLevelParameter[fi]v - needs updates DONE
glGetBooleani_v - restrict to GLES enums
gl_HelperInvocation support
GLES3.2, GLSL ES 3.2
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced not started
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness 90% done (the ARB variant)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image not started (based on GL_ARB_copy_image, which is done for some drivers)
GL_OES_draw_buffers_indexed not started
GL_OES_draw_elements_base_vertex not started (based on GL_ARB_draw_elements_base_vertex, which is done for all drivers)
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
http://dri.freedesktop.org/wiki/MissingFunctionality


@@ -87,6 +87,13 @@ created in a <code>lib64</code> directory at the top of the Mesa source
tree.</p>
</dd>
<dt><code>--sysconfdir=DIR</code></dt>
<dd><p>This option specifies the directory where the configuration
files will be installed. The default is <code>${prefix}/etc</code>.
Currently there's only one config file provided when dri drivers are
enabled - it's <code>drirc</code>.</p>
</dd>
<dt><code>--enable-static, --disable-shared</code></dt>
<dd><p>By default, Mesa
will build shared libraries. Either of these options will force static
@@ -217,7 +224,7 @@ GLX.
<dt><code>--with-expat=DIR</code>
<dd><p><strong>DEPRECATED</strong>, use <code>PKG_CONFIG_PATH</code> instead.</p>
<p>The DRI-enabled libGL uses expat to
parse the DRI configuration files in <code>/etc/drirc</code> and
parse the DRI configuration files in <code>${sysconfdir}/drirc</code> and
<code>~/.drirc</code>. This option allows a specific expat installation
to be used. For example, <code>--with-expat=/usr/local</code> will
search for expat headers and libraries in <code>/usr/local/include</code>


@@ -153,6 +153,7 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
</ul>
</ul>


@@ -16,25 +16,72 @@
<h1>News</h1>
<h2>August 22 2015</h2>
<h2>October 3, 2015</h2>
<p>
<a href="relnotes/10.6.9.html">Mesa 10.6.9</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 10.6.9 will be the final release in the 10.6
series. Users of 10.5 are encouraged to migrate to the 11.0 series in order
to obtain future fixes.
</p>
<h2>September 28, 2015</h2>
<p>
<a href="relnotes/11.0.2.html">Mesa 11.0.2</a> is released.
This is a bug-fix release.
</p>
<h2>September 26, 2015</h2>
<p>
<a href="relnotes/11.0.1.html">Mesa 11.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>September 20, 2015</h2>
<p>
<a href="relnotes/10.6.8.html">Mesa 10.6.8</a> is released.
This is a bug-fix release.
</p>
<h2>September 12, 2015</h2>
<p>
<a href="relnotes/11.0.0.html">Mesa 11.0.0</a> is released. This is a new
development release. See the release notes for more information about
the release.
</p>
<h2>September 10, 2015</h2>
<p>
<a href="relnotes/10.6.7.html">Mesa 10.6.7</a> is released.
This is a bug-fix release.
</p>
<h2>September 4, 2015</h2>
<p>
<a href="relnotes/10.6.6.html">Mesa 10.6.6</a> is released.
This is a bug-fix release.
</p>
<h2>August 22, 2015</h2>
<p>
<a href="relnotes/10.6.5.html">Mesa 10.6.5</a> is released.
This is a bug-fix release.
</p>
<h2>August 11 2015</h2>
<h2>August 11, 2015</h2>
<p>
<a href="relnotes/10.6.4.html">Mesa 10.6.4</a> is released.
This is a bug-fix release.
</p>
<h2>July 26 2015</h2>
<h2>July 26, 2015</h2>
<p>
<a href="relnotes/10.6.3.html">Mesa 10.6.3</a> is released.
This is a bug-fix release.
</p>
<h2>July 11 2015</h2>
<h2>July 11, 2015</h2>
<p>
<a href="relnotes/10.6.2.html">Mesa 10.6.2</a> is released.
This is a bug-fix release.


@@ -21,6 +21,13 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/10.6.9.html">10.6.9 release notes</a>
<li><a href="relnotes/11.0.2.html">11.0.2 release notes</a>
<li><a href="relnotes/11.0.1.html">11.0.1 release notes</a>
<li><a href="relnotes/10.6.8.html">10.6.8 release notes</a>
<li><a href="relnotes/11.0.0.html">11.0.0 release notes</a>
<li><a href="relnotes/10.6.7.html">10.6.7 release notes</a>
<li><a href="relnotes/10.6.6.html">10.6.6 release notes</a>
<li><a href="relnotes/10.6.5.html">10.6.5 release notes</a>
<li><a href="relnotes/10.6.4.html">10.6.4 release notes</a>
<li><a href="relnotes/10.6.3.html">10.6.3 release notes</a>

164
docs/relnotes/10.6.6.html Normal file

@@ -0,0 +1,164 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.6 Release Notes / September 04, 2015</h1>
<p>
Mesa 10.6.6 is a bug fix release which fixes bugs found since the 10.6.5 release.
</p>
<p>
Mesa 10.6.6 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
416517aa9df4791f97d34451a9e4da33c966afcd18c115c5769b92b15b018ef5 mesa-10.6.6.tar.gz
570f2154b7340ff5db61ff103bc6e85165b8958798b78a50fa2df488e98e5778 mesa-10.6.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90748">Bug 90748</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.depth.rg_half_float_oes fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90902">Bug 90902</a> - [bsw][regression] dEQP: &quot;Found invalid pixel values&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - &quot;high fidelity&quot;: Segfault in _mesa_program_resource_find_name</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
</ul>
<h2>Changes</h2>
<p>Chris Wilson (2):</p>
<ul>
<li>i965: Prevent coordinate overflow in intel_emit_linear_blit</li>
<li>i965: Always re-emit the pipeline select during invariant state emission</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: add missing queries for ARB_direct_state_access</li>
</ul>
<p>Dave Airlie (8):</p>
<ul>
<li>mesa/arb_gpu_shader_fp64: add support for glGetUniformdv</li>
<li>mesa/texgetimage: fix missing stencil check</li>
<li>st/readpixels: fix accel path for skipimages.</li>
<li>texcompress_s3tc/fxt1: fix stride checks (v1.1)</li>
<li>mesa/readpixels: check strides are equal before skipping conversion</li>
<li>mesa: enable texture stencil8 for multisample</li>
<li>r600/sb: update last_cf for finalize if.</li>
<li>r600g: fix calculation for gpr allocation</li>
</ul>
<p>David Heidelberg (1):</p>
<ul>
<li>st/nine: Require gcc &gt;= 4.6</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.5</li>
<li>get-pick-list.sh: Require explicit "10.6" for nominating stable patches</li>
</ul>
<p>Glenn Kennard (4):</p>
<ul>
<li>r600g: Fix assert in tgsi_cmp</li>
<li>r600g/sb: Handle undef in read port tracker</li>
<li>r600g/sb: Don't read junk after EOP</li>
<li>r600g/sb: Don't crash on empty if jump target</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: fix assignments with 4-operand arguments (i.e. BFI)</li>
<li>st/mesa: pass through 4th opcode argument in bitmap/pixel visitors</li>
<li>nv50,nvc0: disable depth bounds test on blit</li>
<li>nv50: fix 2d engine blits for 64- and 128-bit formats</li>
<li>mesa: only copy the requested teximage faces</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Split VGRFs after lowering pull constants</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>i965: Fix copy propagation type changes.</li>
<li>Revert "i965: Advertise a line width of 40.0 on Cherryview and Skylake."</li>
<li>i965: Momentarily pretend to support ARB_texture_stencil8 for blits.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets</li>
<li>mesa: create multisample fallback textures like normal textures</li>
<li>radeonsi: fix a Unigine Heaven hang when drirc is missing</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>i965/fs: Handle MRF destinations in lower_integer_multiplication().</li>
</ul>
<p>Neil Roberts (2):</p>
<ul>
<li>i965: Swap the order of the vertex ID and edge flag attributes</li>
<li>i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used</li>
</ul>
<p>Tapani Pälli (5):</p>
<ul>
<li>mesa: update fbo state in glTexStorage</li>
<li>glsl: build stageref mask using IR, not symbol table</li>
<li>glsl: expose build_program_resource_list function</li>
<li>glsl: create program resource list after LinkShader</li>
<li>mesa: add GL_RED, GL_RG support for floating point textures</li>
</ul>
</div>
</body>
</html>

75
docs/relnotes/10.6.7.html Normal file

@@ -0,0 +1,75 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.7 Release Notes / September 10, 2015</h1>
<p>
Mesa 10.6.7 is a bug fix release which fixes bugs found since the 10.6.6 release.
</p>
<p>
Mesa 10.6.7 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
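<p>
To see which OpenGL version a particular driver actually reports, the output
of the <code>glxinfo</code> utility can be inspected. The following is a
minimal sketch, assuming glxinfo prints its usual "OpenGL version string:" or
"OpenGL core profile version string:" line; the helper name
<code>reported_gl_version</code> is hypothetical and not part of Mesa.
</p>

```shell
# Hypothetical helper: read glxinfo output on stdin and print the version
# number from the first (core profile or compatibility) "version string" line.
reported_gl_version() {
    sed -n 's/^OpenGL\( core profile\)\{0,1\} version string: \([0-9][0-9.]*\).*/\2/p' | head -n 1
}

# Usage on a machine with a working display:
# glxinfo | reported_gl_version
```

<p>
The helper only parses text, so it works equally well on saved glxinfo logs
attached to a bug report.
</p>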
<h2>SHA256 checksums</h2>
<pre>
4ba10c59abee30d72476543a57afd2f33803dabf4620dc333b335d47966ff842 mesa-10.6.7.tar.gz
feb1f640b915dada88a7c793dfaff0ae23580f8903f87a6b76469253de0d28d8 mesa-10.6.7.tar.xz
</pre>
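<p>
The published checksums can be compared against a downloaded tarball before
building. The following is a minimal sketch, assuming the
<code>sha256sum</code> tool from GNU coreutils is installed; the helper name
<code>verify_sha256</code> is hypothetical and not part of Mesa.
</p>

```shell
# Hypothetical helper: succeed only if the file's SHA256 digest matches
# the expected hex string.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage (the tarball must already be in the current directory):
# verify_sha256 mesa-10.6.7.tar.xz \
#     feb1f640b915dada88a7c793dfaff0ae23580f8903f87a6b76469253de0d28d8 \
#     && echo "checksum OK"
```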
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90751">Bug 90751</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.stencil.stencil_index8 fails</li>
</ul>
<h2>Changes</h2>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/teximage: use correct extension for accept stencil texture.</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.6</li>
<li>Revert "i965: Momentarily pretend to support ARB_texture_stencil8 for blits."</li>
<li>Update version to 10.6.7</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>glsl: Handle attribute aliasing in attribute storage limit check.</li>
</ul>
</div>
</body>
</html>

136
docs/relnotes/10.6.8.html Normal file

@@ -0,0 +1,136 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.8 Release Notes / September 20, 2015</h1>
<p>
Mesa 10.6.8 is a bug fix release which fixes bugs found since the 10.6.7 release.
</p>
<p>
Mesa 10.6.8 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
1f34dba2a8059782e3e4e0f18b9628004e253b2c69085f735b846d2e63c9e250 mesa-10.6.8.tar.gz
e36ee5ceeadb3966fb5ce5b4cf18322dbb76a4f075558ae49c3bba94f57d58fd mesa-10.6.8.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Antia Puentes (1):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.7</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
</ul>
<p>Hans de Goede (4):</p>
<ul>
<li>nv30: Fix creation of scanout buffers</li>
<li>nv30: Implement color resolve for msaa</li>
<li>nv30: Fix max width / height checks in nv30 sifm code</li>
<li>nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type</li>
<li>mesa: Don't allow wrong type setters for matrix uniforms</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: don't fall back to 16F when 32F is requested</li>
<li>nvc0: always emit a full shader colormask</li>
<li>nvc0: remove BGRA4 format support</li>
<li>st/mesa: avoid integer overflows with buffers &gt;= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/vec4: Don't reswizzle hardware registers</li>
</ul>
<p>Jose Fonseca (1):</p>
<ul>
<li>gallivm: Workaround LLVM PR23628.</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Momentarily pretend to support ARB_texture_stencil8 for blits.</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: convert double to long long instead of unsigned long long</li>
</ul>
<p>Ray Strode (1):</p>
<ul>
<li>gbm: convert gbm bo format to fourcc format on dma-buf import</li>
</ul>
<p>Ulrich Weigand (1):</p>
<ul>
<li>mesa: Fix texture compression on big-endian systems</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>gallivm: Do not use NoFramePointerElim with LLVM 3.7.</li>
</ul>
</div>
</body>
</html>

130
docs/relnotes/10.6.9.html Normal file

@@ -0,0 +1,130 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 10.6.9 Release Notes / October 03, 2015</h1>
<p>
Mesa 10.6.9 is a bug fix release which fixes bugs found since the 10.6.8 release.
</p>
<p>
Mesa 10.6.9 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
3406876aac67546d0c3e2cb97da330b62644c313e7992b95618662e13c54296a mesa-10.6.9.tar.gz
b04c4de6280b863babc2929573da17218d92e9e4ba6272d548d135415723e8c3 mesa-10.6.9.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats</li>
</ul>
<p>Chris Wilson (1):</p>
<ul>
<li>i965: Remove early release of DRI2 miptree</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.8</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
<li>Update version to 10.6.9</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.</li>
</ul>
<p>Ian Romanick (5):</p>
<ul>
<li>t_dd_dmatmp: Make "count" actually be the count</li>
<li>t_dd_dmatmp: Clean up improper code formatting from previous patch</li>
<li>t_dd_dmatmp: Use '&amp; 3' instead of '% 4' everywhere</li>
<li>t_dd_dmatmp: Pull out common 'count -= count &amp; 3' code</li>
<li>t_dd_dmatmp: Use addition instead of subtraction in loop bounds</li>
</ul>
<p>Jeremy Huddleston (1):</p>
<ul>
<li>configure.ac: Add support to enable read-only text segment on x86.</li>
</ul>
<p>Kristian Høgsberg Kristensen (1):</p>
<ul>
<li>i965: Respect stride and subreg_offset for ATTR registers</li>
</ul>
<p>Kyle Brenneman (3):</p>
<ul>
<li>glx: Fix build errors with --enable-mangling (v2)</li>
<li>mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.</li>
<li>glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: fix vui time_scale zero error</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: fix front buffer regression after dropping st_validate_state in Blit</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>mesa: fix mipmap generation for immutable, compressed textures</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.0 Release Notes / TBD</h1>
<h1>Mesa 11.0.0 Release Notes / September 12, 2015</h1>
<p>
Mesa 11.0.0 is a new development release.
@@ -33,7 +33,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD.
7d7e4ddffa3b162506efa01e2cc41e329caa4995336b92e5cc21f2e1fb36c1b3 mesa-11.0.0.tar.gz
e095a3eb2eca9dfde7efca8946527c8ae20a0cc938a8c78debc7f158ad44af32 mesa-11.0.0.tar.xz
</pre>
@@ -83,13 +84,175 @@ Note: some of the new features are only available with certain drivers.
<li>EGL 1.5 on r600, radeonsi, nv50, nvc0</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=51658">Bug 51658</a> - r200 (&amp; possibly radeon) DRI fixes for gnome shell on Mesa 8.0.3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=65525">Bug 65525</a> - [llvmpipe] lp_scene.h:210:lp_scene_alloc: Assertion `size &lt;= (64 * 1024)' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=66346">Bug 66346</a> - shader_query.cpp:49: error: invalid conversion from 'void*' to 'GLuint'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=73512">Bug 73512</a> - [clover] mesa.icd. should contain full path</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=74329">Bug 74329</a> - Please expose OES_texture_float and OES_texture_half_float on the ES3 context</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84225">Bug 84225</a> - Allow constant-index-expression sampler array indexing with GLSL-ES &lt; 300</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90000">Bug 90000</a> - [i965 Bisected NIR] Piglit/gglean_fragprog1-z-write_test fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90073">Bug 90073</a> - Leaks in xcb_dri3_open_reply_fds() and get_render_node_from_id_path_tag</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90249">Bug 90249</a> - Fails to build egl_dri2 on osx</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error undefined reference to `nir_metadata_preserve'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90537">Bug 90537</a> - radeonsi bo/va conflict on RADEON_GEM_VA (rscreen-&gt;ws-&gt;buffer_from_handle returns NULL)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90547">Bug 90547</a> - [BDW/BSW/SKL Bisected]Piglit/glean&#64;vertprog1-rsq_test_2_(reciprocal_square_root_of_negative_value) fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90580">Bug 90580</a> - [HSW bisected] integer multiplication bug</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90600">Bug 90600</a> - IOError: [Errno 2] No such file or directory: 'gl_API.xml'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90629">Bug 90629</a> - [i965] SIMD16 dual_source_blend assertion `src[i].file != GRF || src[i].width == dst.width' failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90691">Bug 90691</a> - [BSW]Piglit/spec/nv_conditional_render/dlist fails intermittently</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90728">Bug 90728</a> - dvd playback with vlc and vdpau causes segmentation fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90748">Bug 90748</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.depth.rg_half_float_oes fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90749">Bug 90749</a> - [BDW Bisected]dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90751">Bug 90751</a> - [BDW Bisected]dEQP-GLES3.functional.fbo.completeness.renderable.texture.stencil.stencil_index8 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90797">Bug 90797</a> - [ALL bisected] Mesa change cause performance case manhattan fail.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90817">Bug 90817</a> - swrast fails to load with certain remote X servers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90895">Bug 90895</a> - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex &amp; ALU and many SynMark cases performance reduced by 10-23%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90902">Bug 90902</a> - [bsw][regression] dEQP: &quot;Found invalid pixel values&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90903">Bug 90903</a> - egl_dri2.c:dri2_load fails to load libglapi on osx</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90904">Bug 90904</a> - OSX: EXC_BAD_ACCESS when using translate_sse + gallium + softpipe/llvmpipe</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90905">Bug 90905</a> - mesa: Finish subdir-objects transition</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - &quot;high fidelity&quot;: Segfault in _mesa_program_resource_find_name</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91022">Bug 91022</a> - [g45 g965 bisected] assertions generated from textureGrad cube samplers fix</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91077">Bug 91077</a> - dri2_glx.c:1186: undefined reference to `loader_open_device'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91099">Bug 91099</a> - [llvmpipe] piglit glsl-max-varyings &gt;max_varying_components regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91101">Bug 91101</a> - [softpipe] piglit glsl-1.50&#64;execution&#64;geometry&#64;max-input-components regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91173">Bug 91173</a> - Oddworld: Stranger's Wrath HD: disfigured models in wrong colors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91193">Bug 91193</a> - [290x] Dota2 reborn ingame rendering breaks with git-af4b9c7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91222">Bug 91222</a> - lp_test_format regression on CentOS 7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91226">Bug 91226</a> - Crash in glLinkProgram (NEW)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91231">Bug 91231</a> - [NV92] Psychonauts (native) segfaults on start when DRI3 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regression) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91290">Bug 91290</a> - SIGSEGV glcpp/glcpp-parse.y:1077</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91337">Bug 91337</a> - OSMesaGetProcAdress(&quot;OSMesaPixelStore&quot;) returns nil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91418">Bug 91418</a> - Visual Studio 2015 vsnprintf build error</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91425">Bug 91425</a> - [regression, bisected] Piglit spec/ext_packed_float/ getteximage-invalid-format-for-packed-type fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91441">Bug 91441</a> - make check DispatchSanity_test.GL30 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91444">Bug 91444</a> - regression bisected radeonsi: don't change pipe_resource in resource_copy_region</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91461">Bug 91461</a> - gl_TessLevel* writes have no effect for all but the last TCS invocation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91513">Bug 91513</a> - [IVB/HSW/BDW/SKL Bisected] Lightsmark performance reduced by 7%-10%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91544">Bug 91544</a> - [i965, regression, bisected] regression of several tests in 93977d3a151675946c03e</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91591">Bug 91591</a> - rounding.h:102:2: error: #error &quot;Unsupported or undefined LONG_BIT&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image &amp; DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
</ul>
<h2>Changes</h2>
TBD.
<li>Removed the EGL loader from the Linux SCons build.</li>
</div>
</body>

134
docs/relnotes/11.0.1.html Normal file

@@ -0,0 +1,134 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.1 Release Notes / September 26, 2015</h1>
<p>
Mesa 11.0.1 is a bug fix release which fixes bugs found since the 11.0.0 release.
</p>
<p>
Mesa 11.0.1 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
6dab262877e12c0546a0e2970c6835a0f217e6d4026ccecb3cd5dd733d1ce867 mesa-11.0.1.tar.gz
43d0dfcd1f1e36f07f8228cd76d90175d3fc74c1ed25d7071794a100a98ef2a6 mesa-11.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
</ul>
<h2>Changes</h2>
<p>Antia Puentes (2):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
<li>i965/vec4_nir: Load constants as integers</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>meta: Abort meta pbo path if TexSubImage need signed unsigned conversion</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.0</li>
<li>Update version to 11.0.1</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>mesa: Fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for default framebuffer.</li>
</ul>
<p>Ian Romanick (5):</p>
<ul>
<li>t_dd_dmatmp: Make "count" actually be the count</li>
<li>t_dd_dmatmp: Clean up improper code formatting from previous patch</li>
<li>t_dd_dmatmp: Use '&amp; 3' instead of '% 4' everywhere</li>
<li>t_dd_dmatmp: Pull out common 'count -= count &amp; 3' code</li>
<li>t_dd_dmatmp: Use addition instead of subtraction in loop bounds</li>
</ul>
<p>Ilia Mirkin (6):</p>
<ul>
<li>st/mesa: avoid integer overflows with buffers &gt;= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
<li>freedreno/a3xx: fix blending of L8 format</li>
<li>nv50,nvc0: detect underlying resource changes and update tic</li>
<li>nv50,nvc0: flush texture cache in presence of coherent bufs</li>
<li>radeonsi: load fmask ptr relative to the resources array</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir: Fix a bunch of ralloc parenting errors</li>
<li>i965/vec4: Don't reswizzle hardware registers</li>
</ul>
<p>Jeremy Huddleston (1):</p>
<ul>
<li>configure.ac: Add support to enable read-only text segment on x86.</li>
</ul>
<p>Ray Strode (1):</p>
<ul>
<li>gbm: convert gbm bo format to fourcc format on dma-buf import</li>
</ul>
<p>Tapani Pälli (2):</p>
<ul>
<li>mesa: fix errors when reading depth with glReadPixels</li>
<li>i965: fix textureGrad for cubemaps</li>
</ul>
<p>Ulrich Weigand (1):</p>
<ul>
<li>mesa: Fix texture compression on big-endian systems</li>
</ul>
</div>
</body>
</html>

85
docs/relnotes/11.0.2.html Normal file

@@ -0,0 +1,85 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.2 Release Notes / September 28, 2015</h1>
<p>
Mesa 11.0.2 is a bug fix release which fixes bugs found since the 11.0.1 release.
</p>
<p>
Mesa 11.0.2 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
45170773500d6ae2f9eb93fc85efee69f7c97084411ada4eddf92f78bca56d20 mesa-11.0.2.tar.gz
fce11fb27eb87adf1e620a76455d635c6136dfa49ae58c53b34ef8d0c7b7eae4 mesa-11.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91582">Bug 91582</a> - [bisected] Regression in DEQP gles2.functional.negative_api.texture.texsubimage2d_neg_offset</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
</ul>
<h2>Changes</h2>
<p>Eduardo Lima Mitev (3):</p>
<ul>
<li>mesa: Fix order of format+type and internal format checks for glTexImageXD ops</li>
<li>mesa: Move _mesa_base_tex_format() from teximage to glformats files</li>
<li>mesa: Use the effective internal format instead for validation</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.1</li>
<li>Update version to 11.0.2</li>
</ul>
<p>Kristian Høgsberg Kristensen (1):</p>
<ul>
<li>i965: Respect stride and subreg_offset for ATTR registers</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>glsl: Expose gl_MaxTess{Control,Evaluation}AtomicCounters.</li>
</ul>
</div>
</body>
</html>

67
docs/relnotes/11.1.0.html Normal file

@@ -0,0 +1,67 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.0 Release Notes / TBD</h1>
<p>
Mesa 11.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.1.1.
</p>
<p>
Mesa 11.1.0 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_blend_func_extended on freedreno (a3xx)</li>
<li>GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips</li>
<li>GL_ARB_shader_storage_buffer_object on i965</li>
<li>GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, radeonsi</li>
<li>GL_ARB_texture_barrier / GL_NV_texture_barrier on i965</li>
<li>GL_ARB_texture_query_lod on softpipe</li>
<li>EGL_KHR_create_context on softpipe, llvmpipe</li>
<li>EGL_KHR_gl_colorspace on softpipe, llvmpipe</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

View File

@@ -63,6 +63,20 @@ execution. These are generally used for debugging.
Example: export MESA_GLSL=dump,nopt
</p>
<p>
Shaders can be dumped and replaced at runtime for debugging purposes. Mesa
needs to be configured with '--with-sha1' to enable this functionality. This
feature is not currently supported by the SCons build.
This is controlled via the following environment variables:
<ul>
<li><b>MESA_SHADER_DUMP_PATH</b> - path where shader sources are dumped
<li><b>MESA_SHADER_READ_PATH</b> - path where replacement shaders are read
</ul>
Note: each path must already exist before the application runs for dumping
or replacing to work.
When both are set, these paths should be different so the dumped shaders do
not clobber the replacement shaders.
</p>
<h2 id="support">GLSL Version</h2>
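Review note: the two rules in the added documentation (each path must already exist, and the two paths should differ) can be checked before launching an application. The following is a minimal sketch of such a check, not code from Mesa. The helper names `demo_is_dir` and `demo_check_shader_paths` and the status enum are hypothetical; only the two environment variable names come from the text above. The comparison is purely textual, so two different spellings of the same directory are not detected.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch. */
enum demo_paths_status { DEMO_PATHS_OK, DEMO_PATHS_MISSING, DEMO_PATHS_SAME };

/* Return nonzero if `path` names an existing directory. */
static int
demo_is_dir(const char *path)
{
   struct stat st;
   return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Check the two shader path variables described in the documentation.
 * An unset variable is fine; a set variable must name a directory. */
static enum demo_paths_status
demo_check_shader_paths(void)
{
   const char *dump_path = getenv("MESA_SHADER_DUMP_PATH");
   const char *read_path = getenv("MESA_SHADER_READ_PATH");

   if ((dump_path && !demo_is_dir(dump_path)) ||
       (read_path && !demo_is_dir(read_path)))
      return DEMO_PATHS_MISSING;

   /* Textual comparison only: dumped shaders would clobber replacements. */
   if (dump_path && read_path && strcmp(dump_path, read_path) == 0)
      return DEMO_PATHS_SAME;

   return DEMO_PATHS_OK;
}
```

A launcher script or test harness could call `demo_check_shader_paths()` once at startup and refuse to continue on anything other than `DEMO_PATHS_OK`.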

@@ -26,6 +26,31 @@ VMware Workstation running on Linux or Windows and VMware Fusion running on
MacOS are all supported.
</p>
<p>
With the August 2015 Workstation 12 / Fusion 8 releases, OpenGL 3.3
is supported in the guest.
This requires:
<ul>
<li>The VM is configured for virtual hardware version 12.
<li>The host OS, GPU and graphics driver support DX11 (Windows) or
OpenGL 4.0 (Linux, Mac)
<li>On Linux, the vmwgfx kernel module must be version 2.9.0 or later.
<li>A recent version of Mesa with the updated svga gallium driver.
</ul>
</p>
<p>
Otherwise, OpenGL 2.1 is supported.
</p>
<p>
OpenGL 3.3 support can be disabled by setting the environment variable
SVGA_VGPU10=0.
You will then have OpenGL 2.1 support.
This may be useful to work around application bugs (such as incorrect use
of the OpenGL 3.x core profile).
</p>
<p>
Most modern Linux distros include the SVGA3D driver so end users shouldn't
be concerned with this information.
@@ -227,6 +252,16 @@ If you don't see this, try setting this environment variable:
then rerun glxinfo and examine the output for error messages.
</p>
<p>
If OpenGL 3.3 is not working (you only get OpenGL 2.1):
</p>
<ul>
<li>Make sure the VM uses hardware version 12.
<li>Make sure the vmwgfx kernel module is version 2.9.0 or later.
<li>Check the vmware.log file for errors.
<li>Run 'dmesg | grep vmwgfx' and look for "DX: yes".
</ul>
</div>
</body>
</html>

@@ -102,9 +102,8 @@ call_once(once_flag *flag, void (*func)(void))
static inline int
cnd_broadcast(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_broadcast(cond);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_broadcast(cond) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.2
@@ -119,18 +118,16 @@ cnd_destroy(cnd_t *cond)
static inline int
cnd_init(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_init(cond, NULL);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_init(cond, NULL) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.4
static inline int
cnd_signal(cnd_t *cond)
{
if (!cond) return thrd_error;
pthread_cond_signal(cond);
return thrd_success;
assert(cond != NULL);
return (pthread_cond_signal(cond) == 0) ? thrd_success : thrd_error;
}
// 7.25.3.5
@@ -139,7 +136,14 @@ cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt)
{
struct timespec abs_time;
int rt;
if (!cond || !mtx || !xt) return thrd_error;
assert(mtx != NULL);
assert(cond != NULL);
assert(xt != NULL);
abs_time.tv_sec = xt->sec;
abs_time.tv_nsec = xt->nsec;
rt = pthread_cond_timedwait(cond, mtx, &abs_time);
if (rt == ETIMEDOUT)
return thrd_busy;
@@ -150,9 +154,9 @@ cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt)
static inline int
cnd_wait(cnd_t *cond, mtx_t *mtx)
{
if (!cond || !mtx) return thrd_error;
pthread_cond_wait(cond, mtx);
return thrd_success;
assert(mtx != NULL);
assert(cond != NULL);
return (pthread_cond_wait(cond, mtx) == 0) ? thrd_success : thrd_error;
}
@@ -161,7 +165,7 @@ cnd_wait(cnd_t *cond, mtx_t *mtx)
static inline void
mtx_destroy(mtx_t *mtx)
{
assert(mtx);
assert(mtx != NULL);
pthread_mutex_destroy(mtx);
}
@@ -170,7 +174,7 @@ static inline int
mtx_init(mtx_t *mtx, int type)
{
pthread_mutexattr_t attr;
if (!mtx) return thrd_error;
assert(mtx != NULL);
if (type != mtx_plain && type != mtx_timed && type != mtx_try
&& type != (mtx_plain|mtx_recursive)
&& type != (mtx_timed|mtx_recursive)
@@ -188,9 +192,8 @@ mtx_init(mtx_t *mtx, int type)
static inline int
mtx_lock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
pthread_mutex_lock(mtx);
return thrd_success;
assert(mtx != NULL);
return (pthread_mutex_lock(mtx) == 0) ? thrd_success : thrd_error;
}
static inline int
@@ -203,7 +206,9 @@ thrd_yield(void);
static inline int
mtx_timedlock(mtx_t *mtx, const xtime *xt)
{
if (!mtx || !xt) return thrd_error;
assert(mtx != NULL);
assert(xt != NULL);
{
#ifdef EMULATED_THREADS_USE_NATIVE_TIMEDLOCK
struct timespec ts;
@@ -233,7 +238,7 @@ mtx_timedlock(mtx_t *mtx, const xtime *xt)
static inline int
mtx_trylock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
assert(mtx != NULL);
return (pthread_mutex_trylock(mtx) == 0) ? thrd_success : thrd_busy;
}
@@ -241,9 +246,8 @@ mtx_trylock(mtx_t *mtx)
static inline int
mtx_unlock(mtx_t *mtx)
{
if (!mtx) return thrd_error;
pthread_mutex_unlock(mtx);
return thrd_success;
assert(mtx != NULL);
return (pthread_mutex_unlock(mtx) == 0) ? thrd_success : thrd_error;
}
@@ -253,7 +257,7 @@ static inline int
thrd_create(thrd_t *thr, thrd_start_t func, void *arg)
{
struct impl_thrd_param *pack;
if (!thr) return thrd_error;
assert(thr != NULL);
pack = (struct impl_thrd_param *)malloc(sizeof(struct impl_thrd_param));
if (!pack) return thrd_nomem;
pack->func = func;
@@ -329,7 +333,7 @@ thrd_yield(void)
static inline int
tss_create(tss_t *key, tss_dtor_t dtor)
{
if (!key) return thrd_error;
assert(key != NULL);
return (pthread_key_create(key, dtor) == 0) ? thrd_success : thrd_error;
}
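Review note: the pattern applied throughout this file is to treat a NULL argument as a programming error (caught by `assert`) and to report the pthread return code instead of discarding it. The following is a minimal sketch of that pattern outside the header. The `demo_*` names and the `demo_status` enum are hypothetical stand-ins for `thrd_success` / `thrd_error`; only the pthread calls are real.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical status codes standing in for thrd_success / thrd_error. */
enum demo_status { DEMO_SUCCESS = 0, DEMO_ERROR = 1 };

/* A NULL pointer is a caller bug (assert); a failing pthread call is a
 * runtime condition that is reported to the caller. */
static int
demo_mtx_lock(pthread_mutex_t *mtx)
{
   assert(mtx != NULL);
   return (pthread_mutex_lock(mtx) == 0) ? DEMO_SUCCESS : DEMO_ERROR;
}

static int
demo_mtx_unlock(pthread_mutex_t *mtx)
{
   assert(mtx != NULL);
   return (pthread_mutex_unlock(mtx) == 0) ? DEMO_SUCCESS : DEMO_ERROR;
}

/* Lock and unlock a freshly initialized mutex, returning the first
 * non-success status encountered. */
static int
demo_lock_roundtrip(void)
{
   pthread_mutex_t mtx;
   int status;

   if (pthread_mutex_init(&mtx, NULL) != 0)
      return DEMO_ERROR;

   status = demo_mtx_lock(&mtx);
   if (status == DEMO_SUCCESS)
      status = demo_mtx_unlock(&mtx);

   pthread_mutex_destroy(&mtx);
   return status;
}
```

One consequence worth keeping in mind: with `NDEBUG` defined the asserts compile away, so a NULL argument is then passed straight to the pthread function rather than producing an error return.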

@@ -8,6 +8,7 @@ env = env.Clone()
env.Append(CPPPATH = [
'#/include',
'#/include/HaikuGL',
'#/src/egl/main',
'#/src',
])
@@ -15,7 +16,6 @@ env.Append(CPPPATH = [
# parse Makefile.sources
egl_sources = env.ParseSourceList('Makefile.sources', 'LIBEGL_C_FILES')
egl_sources.append(env.ParseSourceList('Makefile.sources', 'dri2_backend_core_FILES'))
env.Append(CPPDEFINES = [
'_EGL_NATIVE_PLATFORM=_EGL_PLATFORM_HAIKU',

@@ -27,6 +27,7 @@
#define WL_HIDE_DEPRECATED
#include <stdbool.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
@@ -130,12 +131,10 @@ const __DRIconfig *
dri2_get_dri_config(struct dri2_egl_config *conf, EGLint surface_type,
EGLenum colorspace)
{
if (colorspace == EGL_GL_COLORSPACE_SRGB_KHR)
return surface_type == EGL_WINDOW_BIT ? conf->dri_srgb_double_config :
conf->dri_srgb_single_config;
else
return surface_type == EGL_WINDOW_BIT ? conf->dri_double_config :
conf->dri_single_config;
const bool srgb = colorspace == EGL_GL_COLORSPACE_SRGB_KHR;
return surface_type == EGL_WINDOW_BIT ? conf->dri_double_config[srgb] :
conf->dri_single_config[srgb];
}
static EGLBoolean
@@ -283,14 +282,10 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id,
if (num_configs == 1) {
conf = (struct dri2_egl_config *) matching_config;
if (double_buffer && srgb && !conf->dri_srgb_double_config)
conf->dri_srgb_double_config = dri_config;
else if (double_buffer && !srgb && !conf->dri_double_config)
conf->dri_double_config = dri_config;
else if (!double_buffer && srgb && !conf->dri_srgb_single_config)
conf->dri_srgb_single_config = dri_config;
else if (!double_buffer && !srgb && !conf->dri_single_config)
conf->dri_single_config = dri_config;
if (double_buffer && !conf->dri_double_config[srgb])
conf->dri_double_config[srgb] = dri_config;
else if (!double_buffer && !conf->dri_single_config[srgb])
conf->dri_single_config[srgb] = dri_config;
else
/* a similar config type is already added (unlikely) => discard */
return NULL;
@@ -300,18 +295,13 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id,
if (conf == NULL)
return NULL;
if (double_buffer)
conf->dri_double_config[srgb] = dri_config;
else
conf->dri_single_config[srgb] = dri_config;
memcpy(&conf->base, &base, sizeof base);
if (double_buffer) {
if (srgb)
conf->dri_srgb_double_config = dri_config;
else
conf->dri_double_config = dri_config;
} else {
if (srgb)
conf->dri_srgb_single_config = dri_config;
else
conf->dri_single_config = dri_config;
}
conf->base.SurfaceType = 0;
conf->base.ConfigID = config_id;
_eglLinkConfig(&conf->base);
@@ -588,7 +578,8 @@ dri2_setup_screen(_EGLDisplay *disp)
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB))
disp->Extensions.KHR_gl_colorspace = EGL_TRUE;
if (dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) {
if ((dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) ||
(dri2_dpy->swrast && dri2_dpy->swrast->base.version >= 3)) {
disp->Extensions.KHR_create_context = EGL_TRUE;
if (dri2_dpy->robustness)
@@ -784,7 +775,7 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp)
if (dri2_dpy->own_dri_screen)
dri2_dpy->core->destroyScreen(dri2_dpy->dri_screen);
if (dri2_dpy->fd)
if (dri2_dpy->fd >= 0)
close(dri2_dpy->fd);
if (dri2_dpy->driver)
dlclose(dri2_dpy->driver);
@@ -902,6 +893,55 @@ dri2_create_context_attribs_error(int dri_error)
_eglError(egl_error, "dri2_create_context");
}
static bool
dri2_fill_context_attribs(struct dri2_egl_context *dri2_ctx,
struct dri2_egl_display *dri2_dpy,
uint32_t *ctx_attribs,
unsigned *num_attribs)
{
int pos = 0;
assert(*num_attribs >= 8);
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MAJOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMajorVersion;
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMinorVersion;
if (dri2_ctx->base.Flags != 0) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode. Instead, generate the required EGL error here.
*/
if ((dri2_ctx->base.Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) != 0
&& !dri2_dpy->robustness) {
_eglError(EGL_BAD_MATCH, "eglCreateContext");
return false;
}
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_FLAGS;
ctx_attribs[pos++] = dri2_ctx->base.Flags;
}
if (dri2_ctx->base.ResetNotificationStrategy != EGL_NO_RESET_NOTIFICATION_KHR) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it a reset strategy. It may
* explode. Instead, generate the required EGL error here.
*/
if (!dri2_dpy->robustness) {
_eglError(EGL_BAD_CONFIG, "eglCreateContext");
return false;
}
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_RESET_STRATEGY;
ctx_attribs[pos++] = __DRI_CTX_RESET_LOSE_CONTEXT;
}
*num_attribs = pos;
return true;
}
/**
* Called via eglCreateContext(), drv->API.CreateContext().
*/
@@ -970,10 +1010,10 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
* doubleBufferMode check in
* src/mesa/main/context.c:check_compatible()
*/
if (dri2_config->dri_double_config)
dri_config = dri2_config->dri_double_config;
if (dri2_config->dri_double_config[0])
dri_config = dri2_config->dri_double_config[0];
else
dri_config = dri2_config->dri_single_config;
dri_config = dri2_config->dri_single_config[0];
/* EGL_WINDOW_BIT is set only when there is a dri_double_config. This
* makes sure the back buffer will always be used.
@@ -987,44 +1027,12 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
if (dri2_dpy->dri2) {
if (dri2_dpy->dri2->base.version >= 3) {
unsigned error;
unsigned num_attribs = 0;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_MAJOR_VERSION;
ctx_attribs[num_attribs++] = dri2_ctx->base.ClientMajorVersion;
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[num_attribs++] = dri2_ctx->base.ClientMinorVersion;
if (dri2_ctx->base.Flags != 0) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode. Instead, generate the required EGL error here.
*/
if ((dri2_ctx->base.Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) != 0
&& !dri2_dpy->robustness) {
_eglError(EGL_BAD_MATCH, "eglCreateContext");
goto cleanup;
}
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_FLAGS;
ctx_attribs[num_attribs++] = dri2_ctx->base.Flags;
}
if (dri2_ctx->base.ResetNotificationStrategy != EGL_NO_RESET_NOTIFICATION_KHR) {
/* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it a reset strategy. It may
* explode. Instead, generate the required EGL error here.
*/
if (!dri2_dpy->robustness) {
_eglError(EGL_BAD_CONFIG, "eglCreateContext");
goto cleanup;
}
ctx_attribs[num_attribs++] = __DRI_CTX_ATTRIB_RESET_STRATEGY;
ctx_attribs[num_attribs++] = __DRI_CTX_RESET_LOSE_CONTEXT;
}
assert(num_attribs <= ARRAY_SIZE(ctx_attribs));
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->dri2->createContextAttribs(dri2_dpy->dri_screen,
@@ -1046,12 +1054,33 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
}
} else {
assert(dri2_dpy->swrast);
dri2_ctx->dri_context =
dri2_dpy->swrast->createNewContextForAPI(dri2_dpy->dri_screen,
api,
dri_config,
shared,
dri2_ctx);
if (dri2_dpy->swrast->base.version >= 3) {
unsigned error;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->swrast->createContextAttribs(dri2_dpy->dri_screen,
api,
dri_config,
shared,
num_attribs / 2,
ctx_attribs,
& error,
dri2_ctx);
dri2_create_context_attribs_error(error);
} else {
dri2_ctx->dri_context =
dri2_dpy->swrast->createNewContextForAPI(dri2_dpy->dri_screen,
api,
dri_config,
shared,
dri2_ctx);
}
}
if (!dri2_ctx->dri_context)
@@ -2384,13 +2413,18 @@ dri2_client_wait_sync(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSync *sync,
unsigned wait_flags = 0;
EGLint ret = EGL_CONDITION_SATISFIED_KHR;
if (flags & EGL_SYNC_FLUSH_COMMANDS_BIT_KHR)
/* The EGL_KHR_fence_sync spec states:
*
* "If no context is current for the bound API,
* the EGL_SYNC_FLUSH_COMMANDS_BIT_KHR bit is ignored."
*/
if (dri2_ctx && flags & EGL_SYNC_FLUSH_COMMANDS_BIT_KHR)
wait_flags |= __DRI2_FENCE_FLAG_FLUSH_COMMANDS;
/* the sync object should take a reference while waiting */
dri2_egl_ref_sync(dri2_sync);
if (dri2_dpy->fence->client_wait_sync(dri2_ctx->dri_context,
if (dri2_dpy->fence->client_wait_sync(dri2_ctx ? dri2_ctx->dri_context : NULL,
dri2_sync->fence, wait_flags,
timeout))
dri2_sync->base.SyncStatus = EGL_SIGNALED_KHR;
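Review note: the `dri2_fill_context_attribs()` helper introduced earlier in this file's diff uses an in/out count: the caller passes the array capacity, the helper writes key/value pairs, and the count is updated to the number of entries written (callers then pass `num_attribs / 2` pairs). The following is a minimal sketch of that convention. The `demo_*` names and attribute key values are hypothetical, and unlike the diff this sketch returns `false` on insufficient capacity instead of asserting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute keys; not the real __DRI_CTX_ATTRIB_* values. */
enum { DEMO_ATTRIB_MAJOR = 1, DEMO_ATTRIB_MINOR = 2, DEMO_ATTRIB_FLAGS = 3 };

/* On entry *num is the capacity of `attribs`; on success it is the number
 * of entries written (always even: key followed by value). */
static bool
demo_fill_attribs(uint32_t *attribs, unsigned *num,
                  uint32_t major, uint32_t minor, uint32_t flags)
{
   unsigned pos = 0;

   /* Worst case: three key/value pairs. */
   if (*num < 6)
      return false;

   attribs[pos++] = DEMO_ATTRIB_MAJOR;
   attribs[pos++] = major;
   attribs[pos++] = DEMO_ATTRIB_MINOR;
   attribs[pos++] = minor;

   /* Optional attributes are only emitted when they carry information. */
   if (flags != 0) {
      attribs[pos++] = DEMO_ATTRIB_FLAGS;
      attribs[pos++] = flags;
   }

   *num = pos;
   return true;
}
```

Sharing one helper between the two callers keeps the hardware and software-rasterizer paths from drifting apart, which appears to be the motivation for the refactoring in the diff.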

@@ -284,10 +284,8 @@ struct dri2_egl_surface
struct dri2_egl_config
{
_EGLConfig base;
const __DRIconfig *dri_single_config;
const __DRIconfig *dri_double_config;
const __DRIconfig *dri_srgb_single_config;
const __DRIconfig *dri_srgb_double_config;
const __DRIconfig *dri_single_config[2];
const __DRIconfig *dri_double_config[2];
};
struct dri2_egl_image
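Review note: replacing the four separate config pointers with two arrays of size 2 works because a C `bool` converts to 0 or 1 and can index an array directly. The following is a minimal sketch of the lookup, assuming index 0 is the linear config and index 1 the sRGB config as in the diff. All `demo_*` names are hypothetical, and strings stand in for the `__DRIconfig` pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL surface-type and colorspace enums. */
enum demo_surface_type { DEMO_WINDOW, DEMO_PIXMAP };
enum demo_colorspace { DEMO_LINEAR, DEMO_SRGB };

struct demo_config {
   /* Index 0 holds the linear config, index 1 the sRGB config. */
   const char *single_config[2];
   const char *double_config[2];
};

static const char *
demo_get_config(const struct demo_config *conf,
                enum demo_surface_type type,
                enum demo_colorspace colorspace)
{
   /* The comparison yields exactly 0 or 1, so it is a valid array index. */
   const bool srgb = (colorspace == DEMO_SRGB);

   return type == DEMO_WINDOW ? conf->double_config[srgb]
                              : conf->single_config[srgb];
}

/* Example data used to exercise the lookup. */
static const struct demo_config demo_conf = {
   .single_config = { "single-linear", "single-srgb" },
   .double_config = { "double-linear", "double-srgb" },
};
```

With this layout, code that only cares about the non-sRGB case can use index `[0]` explicitly, as `dri2_create_context()` does in the diff.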

@@ -101,6 +101,7 @@ dri2_drm_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
struct dri2_egl_surface *dri2_surf;
struct gbm_surface *window = native_window;
struct gbm_dri_surface *surf;
const __DRIconfig *config;
(void) drv;
@@ -130,21 +131,20 @@ dri2_drm_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
goto cleanup_surf;
}
if (dri2_dpy->dri2) {
const __DRIconfig *config =
dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
config = dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
if (dri2_dpy->dri2) {
dri2_surf->dri_drawable =
(*dri2_dpy->dri2->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf->gbm_surf);
} else {
assert(dri2_dpy->swrast != NULL);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf->gbm_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf->gbm_surf);
}
if (dri2_surf->dri_drawable == NULL) {
@@ -623,27 +623,19 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
dri2_dpy->own_device = 1;
gbm = gbm_create_device(fd);
if (gbm == NULL)
return EGL_FALSE;
goto cleanup;
} else {
fd = fcntl(gbm_device_get_fd(gbm), F_DUPFD_CLOEXEC, 3);
if (fd < 0)
goto cleanup;
}
if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0) {
free(dri2_dpy);
return EGL_FALSE;
}
if (strcmp(gbm_device_get_backend_name(gbm), "drm") != 0)
goto cleanup;
dri2_dpy->gbm_dri = gbm_dri_device(gbm);
if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI) {
free(dri2_dpy);
return EGL_FALSE;
}
if (fd < 0) {
fd = fcntl(gbm_device_get_fd(gbm), F_DUPFD_CLOEXEC, 3);
if (fd < 0) {
free(dri2_dpy);
return EGL_FALSE;
}
}
if (dri2_dpy->gbm_dri->base.type != GBM_DRM_DRIVER_TYPE_DRI)
goto cleanup;
dri2_dpy->fd = fd;
dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd);
@@ -727,4 +719,11 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
dri2_dpy->vtbl = &dri2_drm_display_vtbl;
return EGL_TRUE;
cleanup:
if (fd >= 0)
close(fd);
free(dri2_dpy);
return EGL_FALSE;
}
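Review note: the change above funnels every failure in `dri2_initialize_drm()` through a single `cleanup:` label, so the duplicated descriptor and the display allocation are released on all error paths. The following is a minimal sketch of that single-exit pattern. The `demo_*` names and the `fail_late` parameter are hypothetical and exist only to simulate a failure after the descriptor has been opened.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

struct demo_display {
   int fd;
};

/* Open `path` and wrap the descriptor in a display object. When
 * `fail_late` is set, a failure after the open is simulated. */
static struct demo_display *
demo_display_create(const char *path, bool fail_late)
{
   struct demo_display *dpy;
   int fd = -1;   /* -1 means "nothing to close"; 0 is a valid descriptor */

   dpy = calloc(1, sizeof *dpy);
   if (dpy == NULL)
      return NULL;

   fd = open(path, O_RDONLY);
   if (fd < 0)
      goto cleanup;

   if (fail_late)
      goto cleanup;

   dpy->fd = fd;
   return dpy;

cleanup:
   /* Single exit path: release whatever was acquired so far. */
   if (fd >= 0)
      close(fd);
   free(dpy);
   return NULL;
}

static void
demo_display_destroy(struct demo_display *dpy)
{
   if (dpy == NULL)
      return;
   if (dpy->fd >= 0)
      close(dpy->fd);
   free(dpy);
}
```

The `fd >= 0` test mirrors the related fix in `dri2_terminate()`: testing `if (fd)` would skip descriptor 0 and would try to close -1.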

@@ -1645,6 +1645,7 @@ dri2_wl_swrast_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
struct dri2_egl_config *dri2_conf = dri2_egl_config(conf);
struct wl_egl_window *window = native_window;
struct dri2_egl_surface *dri2_surf;
const __DRIconfig *config;
(void) drv;
@@ -1669,10 +1670,12 @@ dri2_wl_swrast_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
dri2_surf->base.Width = -1;
dri2_surf->base.Height = -1;
config = dri2_get_dri_config(dri2_conf, EGL_WINDOW_BIT,
dri2_surf->base.GLColorspace);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen,
config, dri2_surf);
if (dri2_surf->dri_drawable == NULL) {
_eglError(EGL_BAD_ALLOC, "swrast->createNewDrawable");
goto cleanup_dri_drawable;
@@ -1804,6 +1807,7 @@ dri2_initialize_wayland_swrast(_EGLDriver *drv, _EGLDisplay *disp)
if (roundtrip(dri2_dpy) < 0 || dri2_dpy->formats == 0)
goto cleanup_shm;
dri2_dpy->fd = -1;
dri2_dpy->driver_name = strdup("swrast");
if (!dri2_load_driver_swrast(disp))
goto cleanup_shm;

@@ -206,6 +206,7 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
xcb_generic_error_t *error;
xcb_drawable_t drawable;
xcb_screen_t *screen;
const __DRIconfig *config;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_surface));
drawable = (uintptr_t) native_surface;
@@ -245,19 +246,18 @@ dri2_x11_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
dri2_surf->drawable = drawable;
}
if (dri2_dpy->dri2) {
const __DRIconfig *config =
dri2_get_dri_config(dri2_conf, type, dri2_surf->base.GLColorspace);
config = dri2_get_dri_config(dri2_conf, type,
dri2_surf->base.GLColorspace);
if (dri2_dpy->dri2) {
dri2_surf->dri_drawable =
(*dri2_dpy->dri2->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf);
} else {
assert(dri2_dpy->swrast);
dri2_surf->dri_drawable =
(*dri2_dpy->swrast->createNewDrawable) (dri2_dpy->dri_screen,
dri2_conf->dri_double_config,
dri2_surf);
(*dri2_dpy->swrast->createNewDrawable)(dri2_dpy->dri_screen, config,
dri2_surf);
}
if (dri2_surf->dri_drawable == NULL) {
@@ -1161,6 +1161,7 @@ dri2_initialize_x11_swrast(_EGLDriver *drv, _EGLDisplay *disp)
* Every hardware driver_name is set using strdup. Doing the same
* here allows us to simply free the memory in dri2_terminate().
*/
dri2_dpy->fd = -1;
dri2_dpy->driver_name = strdup("swrast");
if (!dri2_load_driver_swrast(disp))
goto cleanup_conn;

@@ -152,12 +152,51 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay *dpy,
/* The EGL_KHR_create_context spec says:
*
* "Flags are only defined for OpenGL context creation, and
* specifying a flags value other than zero for other types of
* contexts, including OpenGL ES contexts, will generate an
* error."
* "If the EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR flag bit is set in
* EGL_CONTEXT_FLAGS_KHR, then a <debug context> will be created.
* [...]
* In some cases a debug context may be identical to a non-debug
* context. This bit is supported for OpenGL and OpenGL ES
* contexts."
*/
if (api != EGL_OPENGL_API && val != 0) {
if ((val & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR) &&
(api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API)) {
err = EGL_BAD_ATTRIBUTE;
break;
}
/* The EGL_KHR_create_context spec says:
*
* "If the EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR flag bit
* is set in EGL_CONTEXT_FLAGS_KHR, then a <forward-compatible>
* context will be created. Forward-compatible contexts are
* defined only for OpenGL versions 3.0 and later. They must not
* support functionality marked as <deprecated> by that version of
* the API, while a non-forward-compatible context must support
* all functionality in that version, deprecated or not. This bit
* is supported for OpenGL contexts, and requesting a
* forward-compatible context for OpenGL versions less than 3.0
* will generate an error."
*/
if ((val & EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR) &&
(api != EGL_OPENGL_API || ctx->ClientMajorVersion < 3)) {
err = EGL_BAD_ATTRIBUTE;
break;
}
/* The EGL_KHR_create_context spec says:
*
* "If the EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR bit is set in
* EGL_CONTEXT_FLAGS_KHR, then a context supporting <robust buffer
* access> will be created. Robust buffer access is defined in the
* GL_ARB_robustness extension specification, and the resulting
* context must also support either the GL_ARB_robustness
* extension, or a version of OpenGL incorporating equivalent
* functionality. This bit is supported for OpenGL contexts."
*/
if ((val & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) &&
(api != EGL_OPENGL_API ||
!dpy->Extensions.EXT_create_context_robustness)) {
err = EGL_BAD_ATTRIBUTE;
break;
}
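Review note: the three checks above replace a single "flags must be zero unless the API is OpenGL" test with one rule per flag bit. The following is a minimal sketch of the same decision logic as a pure function. The `demo_*` enum and flag values are hypothetical stand-ins and are not the real EGL constants; the conditions follow the three `if` statements in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the client API enum. */
enum demo_api { DEMO_API_GL, DEMO_API_GLES, DEMO_API_VG };

/* Hypothetical flag bits; illustrative values only. */
#define DEMO_FLAG_DEBUG          (1u << 0)
#define DEMO_FLAG_FORWARD_COMPAT (1u << 1)
#define DEMO_FLAG_ROBUST_ACCESS  (1u << 2)

/* Return true if `flags` is acceptable for the requested context. */
static bool
demo_flags_valid(enum demo_api api, int major_version,
                 bool has_robustness, unsigned flags)
{
   /* Debug contexts: OpenGL and OpenGL ES only. */
   if ((flags & DEMO_FLAG_DEBUG) &&
       api != DEMO_API_GL && api != DEMO_API_GLES)
      return false;

   /* Forward-compatible contexts: OpenGL 3.0 and later only. */
   if ((flags & DEMO_FLAG_FORWARD_COMPAT) &&
       (api != DEMO_API_GL || major_version < 3))
      return false;

   /* Robust access: OpenGL only, and only if the display supports it. */
   if ((flags & DEMO_FLAG_ROBUST_ACCESS) &&
       (api != DEMO_API_GL || !has_robustness))
      return false;

   return true;
}
```

The visible behavior change is that an OpenGL ES context may now request the debug bit, while the other two bits remain restricted to OpenGL.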

@@ -197,7 +197,7 @@ drm_authenticate(struct wl_client *client,
wl_resource_post_event(resource, WL_DRM_AUTHENTICATED);
}
const static struct wl_drm_interface drm_interface = {
static const struct wl_drm_interface drm_interface = {
drm_authenticate,
drm_create_buffer,
drm_create_planar_buffer,

@@ -1,3 +1,32 @@
/*
* Copyright © 2011 Kristian Høgsberg
* Copyright © 2011 Benjamin Franzke
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*
* Authors:
* Kristian Høgsberg <krh@bitplanet.net>
* Benjamin Franzke <benjaminfranzke@googlemail.com>
*/
#include <stdlib.h>
#include <wayland-client.h>

@@ -11,6 +11,7 @@ SUBDIRS += auxiliary
##
SUBDIRS += \
drivers/ddebug \
drivers/noop \
drivers/trace \
drivers/rbug

@@ -38,18 +38,23 @@ libgallium_la_SOURCES += \
endif
indices/u_indices_gen.c: $(srcdir)/indices/u_indices_gen.py
$(AM_V_at)$(MKDIR_P) indices
$(AM_V_GEN) $(PYTHON2) $< > $@
MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
indices/u_unfilled_gen.c: $(srcdir)/indices/u_unfilled_gen.py
$(AM_V_at)$(MKDIR_P) indices
$(AM_V_GEN) $(PYTHON2) $< > $@
indices/u_indices_gen.c: indices/u_indices_gen.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/indices/u_indices_gen.py > $@
util/u_format_table.c: $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format_pack.py $(srcdir)/util/u_format_parse.py $(srcdir)/util/u_format.csv
$(AM_V_at)$(MKDIR_P) util
$(AM_V_GEN) $(PYTHON2) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@
indices/u_unfilled_gen.c: indices/u_unfilled_gen.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/indices/u_unfilled_gen.py > $@
util/u_format_table.c: util/u_format_table.py \
util/u_format_pack.py \
util/u_format_parse.py \
util/u_format.csv
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@
noinst_LTLIBRARIES += libgalliumvl_stub.la
libgalliumvl_stub_la_SOURCES = \

@@ -129,12 +129,16 @@ C_SOURCES := \
rtasm/rtasm_execmem.h \
rtasm/rtasm_x86sse.c \
rtasm/rtasm_x86sse.h \
tgsi/tgsi_aa_point.c \
tgsi/tgsi_aa_point.h \
tgsi/tgsi_build.c \
tgsi/tgsi_build.h \
tgsi/tgsi_dump.c \
tgsi/tgsi_dump.h \
tgsi/tgsi_exec.c \
tgsi/tgsi_exec.h \
tgsi/tgsi_emulate.c \
tgsi/tgsi_emulate.h \
tgsi/tgsi_info.c \
tgsi/tgsi_info.h \
tgsi/tgsi_iterate.c \
@@ -144,6 +148,8 @@ C_SOURCES := \
tgsi/tgsi_opcode_tmp.h \
tgsi/tgsi_parse.c \
tgsi/tgsi_parse.h \
tgsi/tgsi_point_sprite.c \
tgsi/tgsi_point_sprite.h \
tgsi/tgsi_sanity.c \
tgsi/tgsi_sanity.h \
tgsi/tgsi_scan.c \
@@ -154,6 +160,8 @@ C_SOURCES := \
tgsi/tgsi_text.h \
tgsi/tgsi_transform.c \
tgsi/tgsi_transform.h \
tgsi/tgsi_two_side.c \
tgsi/tgsi_two_side.h \
tgsi/tgsi_ureg.c \
tgsi/tgsi_ureg.h \
tgsi/tgsi_util.c \
@@ -260,6 +268,8 @@ C_SOURCES := \
util/u_pack_color.h \
util/u_pointer.h \
util/u_prim.h \
util/u_prim_restart.c \
util/u_prim_restart.h \
util/u_pstipple.c \
util/u_pstipple.h \
util/u_range.h \

@@ -240,7 +240,8 @@ aa_transform_prolog(struct tgsi_transform_context *ctx)
TGSI_FILE_INPUT, texInput, TGSI_SWIZZLE_W);
/* KILL_IF -tmp0.yyyy; # if -tmp0.y < 0, KILL */
tgsi_transform_kill_inst(ctx, TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_Y);
tgsi_transform_kill_inst(ctx, TGSI_FILE_TEMPORARY, tmp0,
TGSI_SWIZZLE_Y, TRUE);
/* compute coverage factor = (1-d)/(1-k) */

@@ -280,7 +280,8 @@ pstip_transform_prolog(struct tgsi_transform_context *ctx)
/* KILL_IF -texTemp.wwww; # if -texTemp < 0, KILL fragment */
tgsi_transform_kill_inst(ctx,
TGSI_FILE_TEMPORARY, pctx->texTemp, TGSI_SWIZZLE_W);
TGSI_FILE_TEMPORARY, pctx->texTemp,
TGSI_SWIZZLE_W, TRUE);
}

@@ -311,7 +311,7 @@ lp_build_const_elem(struct gallivm_state *gallivm,
else {
double dscale = lp_const_scale(type);
elem = LLVMConstInt(elem_type, round(val*dscale), 0);
elem = LLVMConstInt(elem_type, (long long) round(val*dscale), 0);
}
return elem;

@@ -81,6 +81,8 @@
# pragma pop_macro("DEBUG")
#endif
#include "c11/threads.h"
#include "os/os_thread.h"
#include "pipe/p_config.h"
#include "util/u_debug.h"
#include "util/u_cpu_detect.h"
@@ -103,6 +105,33 @@ static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
}
static once_flag init_native_targets_once_flag;
static void init_native_targets()
{
// If we have a native target, initialize it to ensure it is linked in and
// usable by the JIT.
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetDisassembler();
}
/**
* The llvm target registry is not thread-safe, so drivers and state-trackers
* that want to initialize targets should use the gallivm_init_llvm_targets()
* function to safely initialize targets.
*
* LLVM targets should be initialized before the driver or state-tracker tries
* to access the registry.
*/
extern "C" void
gallivm_init_llvm_targets(void)
{
call_once(&init_native_targets_once_flag, init_native_targets);
}
extern "C" void
lp_set_target_options(void)
{
@@ -115,13 +144,7 @@ lp_set_target_options(void)
llvm::DisablePrettyStackTrace = true;
#endif
// If we have a native target, initialize it to ensure it is linked in and
// usable by the JIT.
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetDisassembler();
gallivm_init_llvm_targets();
}
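Review note: `gallivm_init_llvm_targets()` wraps a non-thread-safe initialization in `call_once`, so any number of threads may request it and the initializer still runs exactly once. The following is a minimal sketch of the same idea in plain C. It uses POSIX `pthread_once` rather than the `call_once` from the diff, and all `demo_*` names are hypothetical; the counter stands in for the LLVM target registration.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t demo_targets_once = PTHREAD_ONCE_INIT;
static int demo_init_count = 0;

/* Stand-in for the expensive, non-thread-safe initialization. */
static void
demo_init_targets(void)
{
   demo_init_count++;
}

/* Safe to call from any thread, any number of times. */
static void
demo_ensure_targets(void)
{
   pthread_once(&demo_targets_once, demo_init_targets);
}

static void *
demo_worker(void *arg)
{
   (void) arg;
   demo_ensure_targets();
   return NULL;
}

/* Start several threads that all request initialization, then return how
 * many times the initializer actually ran. */
static int
demo_run_workers(void)
{
   pthread_t threads[4];
   int started = 0;

   for (int i = 0; i < 4; i++) {
      if (pthread_create(&threads[started], NULL, demo_worker, NULL) == 0)
         started++;
   }
   for (int i = 0; i < started; i++)
      pthread_join(threads[i], NULL);

   /* A further call from this thread is also a no-op. */
   demo_ensure_targets();
   return demo_init_count;
}
```

Note that the once-control object in this sketch is explicitly initialized with `PTHREAD_ONCE_INIT`, which `pthread_once` requires.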

@@ -41,6 +41,8 @@ extern "C" {
struct lp_generated_code;
extern void
gallivm_init_llvm_targets(void);
extern void
lp_set_target_options(void);

@@ -24,9 +24,10 @@
#include "util/ralloc.h"
#include "glsl/nir/nir.h"
#include "glsl/nir/nir_control_flow.h"
#include "glsl/nir/nir_builder.h"
#include "glsl/list.h"
#include "glsl/shader_enums.h"
#include "glsl/nir/shader_enums.h"
#include "nir/tgsi_to_nir.h"
#include "tgsi/tgsi_parse.h"
@@ -64,24 +65,24 @@ struct ttn_compile {
nir_register *addr_reg;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* For each IF/ELSE/ENDIF block, if_stack[if_stack_pos] has where the else
* instructions should be placed, and if_stack[if_stack_pos - 1] has where
* the next instructions outside of the if/then/else block go.
*/
struct exec_list **if_stack;
nir_cursor *if_stack;
unsigned if_stack_pos;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* loop_stack[loop_stack_pos - 1] contains the cf_node_list for the outside
* of the loop.
*/
struct exec_list **loop_stack;
nir_cursor *loop_stack;
unsigned loop_stack_pos;
/* How many TGSI_FILE_IMMEDIATE vec4s have been parsed so far. */
@@ -93,6 +94,128 @@ struct ttn_compile {
#define ttn_channel(b, src, swiz) \
nir_swizzle(b, src, SWIZ(swiz, swiz, swiz, swiz), 1, false)
static gl_varying_slot
tgsi_varying_semantic_to_slot(unsigned semantic, unsigned index)
{
switch (semantic) {
case TGSI_SEMANTIC_POSITION:
return VARYING_SLOT_POS;
case TGSI_SEMANTIC_COLOR:
if (index == 0)
return VARYING_SLOT_COL0;
else
return VARYING_SLOT_COL1;
case TGSI_SEMANTIC_BCOLOR:
if (index == 0)
return VARYING_SLOT_BFC0;
else
return VARYING_SLOT_BFC1;
case TGSI_SEMANTIC_FOG:
return VARYING_SLOT_FOGC;
case TGSI_SEMANTIC_PSIZE:
return VARYING_SLOT_PSIZ;
case TGSI_SEMANTIC_GENERIC:
return VARYING_SLOT_VAR0 + index;
case TGSI_SEMANTIC_FACE:
return VARYING_SLOT_FACE;
case TGSI_SEMANTIC_EDGEFLAG:
return VARYING_SLOT_EDGE;
case TGSI_SEMANTIC_PRIMID:
return VARYING_SLOT_PRIMITIVE_ID;
case TGSI_SEMANTIC_CLIPDIST:
if (index == 0)
return VARYING_SLOT_CLIP_DIST0;
else
return VARYING_SLOT_CLIP_DIST1;
case TGSI_SEMANTIC_CLIPVERTEX:
return VARYING_SLOT_CLIP_VERTEX;
case TGSI_SEMANTIC_TEXCOORD:
return VARYING_SLOT_TEX0 + index;
case TGSI_SEMANTIC_PCOORD:
return VARYING_SLOT_PNTC;
case TGSI_SEMANTIC_VIEWPORT_INDEX:
return VARYING_SLOT_VIEWPORT;
case TGSI_SEMANTIC_LAYER:
return VARYING_SLOT_LAYER;
default:
fprintf(stderr, "Bad TGSI semantic: %d/%d\n", semantic, index);
abort();
}
}
/* Temporary helper to remap back to TGSI style semantic name/index
* values, for use in drivers that haven't been converted to using
* VARYING_SLOT_
*/
void
varying_slot_to_tgsi_semantic(gl_varying_slot slot,
unsigned *semantic_name, unsigned *semantic_index)
{
static const unsigned map[][2] = {
[VARYING_SLOT_POS] = { TGSI_SEMANTIC_POSITION, 0 },
[VARYING_SLOT_COL0] = { TGSI_SEMANTIC_COLOR, 0 },
[VARYING_SLOT_COL1] = { TGSI_SEMANTIC_COLOR, 1 },
[VARYING_SLOT_BFC0] = { TGSI_SEMANTIC_BCOLOR, 0 },
[VARYING_SLOT_BFC1] = { TGSI_SEMANTIC_BCOLOR, 1 },
[VARYING_SLOT_FOGC] = { TGSI_SEMANTIC_FOG, 0 },
[VARYING_SLOT_PSIZ] = { TGSI_SEMANTIC_PSIZE, 0 },
[VARYING_SLOT_FACE] = { TGSI_SEMANTIC_FACE, 0 },
[VARYING_SLOT_EDGE] = { TGSI_SEMANTIC_EDGEFLAG, 0 },
[VARYING_SLOT_PRIMITIVE_ID] = { TGSI_SEMANTIC_PRIMID, 0 },
[VARYING_SLOT_CLIP_DIST0] = { TGSI_SEMANTIC_CLIPDIST, 0 },
[VARYING_SLOT_CLIP_DIST1] = { TGSI_SEMANTIC_CLIPDIST, 1 },
[VARYING_SLOT_CLIP_VERTEX] = { TGSI_SEMANTIC_CLIPVERTEX, 0 },
[VARYING_SLOT_PNTC] = { TGSI_SEMANTIC_PCOORD, 0 },
[VARYING_SLOT_VIEWPORT] = { TGSI_SEMANTIC_VIEWPORT_INDEX, 0 },
[VARYING_SLOT_LAYER] = { TGSI_SEMANTIC_LAYER, 0 },
};
if (slot >= VARYING_SLOT_VAR0) {
*semantic_name = TGSI_SEMANTIC_GENERIC;
*semantic_index = slot - VARYING_SLOT_VAR0;
return;
}
if (slot >= VARYING_SLOT_TEX0 && slot <= VARYING_SLOT_TEX7) {
*semantic_name = TGSI_SEMANTIC_TEXCOORD;
*semantic_index = slot - VARYING_SLOT_TEX0;
return;
}
if (slot >= ARRAY_SIZE(map)) {
fprintf(stderr, "Unknown varying slot %d\n", slot);
abort();
}
*semantic_name = map[slot][0];
*semantic_index = map[slot][1];
}
/* Temporary helper to remap back to TGSI style semantic name/index
* values, for use in drivers that haven't been converted to using
* FRAG_RESULT_
*/
void
frag_result_to_tgsi_semantic(gl_frag_result slot,
unsigned *semantic_name, unsigned *semantic_index)
{
static const unsigned map[][2] = {
[FRAG_RESULT_DEPTH] = { TGSI_SEMANTIC_POSITION, 0 },
[FRAG_RESULT_COLOR] = { TGSI_SEMANTIC_COLOR, -1 },
[FRAG_RESULT_DATA0 + 0] = { TGSI_SEMANTIC_COLOR, 0 },
[FRAG_RESULT_DATA0 + 1] = { TGSI_SEMANTIC_COLOR, 1 },
[FRAG_RESULT_DATA0 + 2] = { TGSI_SEMANTIC_COLOR, 2 },
[FRAG_RESULT_DATA0 + 3] = { TGSI_SEMANTIC_COLOR, 3 },
[FRAG_RESULT_DATA0 + 4] = { TGSI_SEMANTIC_COLOR, 4 },
[FRAG_RESULT_DATA0 + 5] = { TGSI_SEMANTIC_COLOR, 5 },
[FRAG_RESULT_DATA0 + 6] = { TGSI_SEMANTIC_COLOR, 6 },
[FRAG_RESULT_DATA0 + 7] = { TGSI_SEMANTIC_COLOR, 7 },
};
*semantic_name = map[slot][0];
*semantic_index = map[slot][1];
}
static nir_ssa_def *
ttn_src_for_dest(nir_builder *b, nir_alu_dest *dest)
{
@@ -215,12 +338,15 @@ ttn_emit_declaration(struct ttn_compile *c)
var->data.mode = nir_var_shader_in;
var->name = ralloc_asprintf(var, "in_%d", idx);
/* We should probably translate to a VERT_ATTRIB_* or VARYING_SLOT_*
* instead, but nothing in NIR core is looking at the value
* currently, and this is less change to drivers.
*/
var->data.location = decl->Semantic.Name;
var->data.index = decl->Semantic.Index;
if (c->scan->processor == TGSI_PROCESSOR_FRAGMENT) {
var->data.location =
tgsi_varying_semantic_to_slot(decl->Semantic.Name,
decl->Semantic.Index);
} else {
assert(!decl->Declaration.Semantic);
var->data.location = VERT_ATTRIB_GENERIC0 + idx;
}
var->data.index = 0;
/* We definitely need to translate the interpolation field, because
* nir_print will decode it.
@@ -240,6 +366,8 @@ ttn_emit_declaration(struct ttn_compile *c)
exec_list_push_tail(&b->shader->inputs, &var->node);
break;
case TGSI_FILE_OUTPUT: {
int semantic_name = decl->Semantic.Name;
int semantic_index = decl->Semantic.Index;
/* Since we can't load from outputs in the IR, we make temporaries
* for the outputs and emit stores to the real outputs at the end of
* the shader.
@@ -251,14 +379,40 @@ ttn_emit_declaration(struct ttn_compile *c)
var->data.mode = nir_var_shader_out;
var->name = ralloc_asprintf(var, "out_%d", idx);
var->data.index = 0;
var->data.location = decl->Semantic.Name;
if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
decl->Semantic.Index == 0 &&
c->scan->properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS])
var->data.index = -1;
else
var->data.index = decl->Semantic.Index;
if (c->scan->processor == TGSI_PROCESSOR_FRAGMENT) {
switch (semantic_name) {
case TGSI_SEMANTIC_COLOR: {
/* TODO tgsi loses some information, so we cannot
* actually differentiate here between DSB and MRT
* at this point. But so far no drivers using tgsi-
* to-nir support dual source blend:
*/
bool dual_src_blend = false;
if (dual_src_blend && (semantic_index == 1)) {
var->data.location = FRAG_RESULT_DATA0;
var->data.index = 1;
} else {
if (c->scan->properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS])
var->data.location = FRAG_RESULT_COLOR;
else
var->data.location = FRAG_RESULT_DATA0 + semantic_index;
}
break;
}
case TGSI_SEMANTIC_POSITION:
var->data.location = FRAG_RESULT_DEPTH;
break;
default:
fprintf(stderr, "Bad TGSI semantic: %d/%d\n",
decl->Semantic.Name, decl->Semantic.Index);
abort();
}
} else {
var->data.location =
tgsi_varying_semantic_to_slot(semantic_name, semantic_index);
}
if (is_array) {
unsigned j;
@@ -307,7 +461,7 @@ ttn_emit_immediate(struct ttn_compile *c)
for (i = 0; i < 4; i++)
load_const->value.u[i] = tgsi_imm->u[i].Uint;
nir_instr_insert_after_cf_list(b->cf_node_list, &load_const->instr);
nir_builder_instr_insert(b, &load_const->instr);
}
static nir_src
@@ -363,7 +517,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->variables[0] = ttn_array_deref(c, load, var, offset, indirect);
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
@@ -414,7 +568,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->num_components = ncomp;
nir_ssa_dest_init(&load->instr, &load->dest, ncomp, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -476,7 +630,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
srcn++;
}
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -552,7 +706,7 @@ ttn_get_dest(struct ttn_compile *c, struct tgsi_full_dst_register *tgsi_fdst)
load->dest = nir_dest_for_reg(reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
} else {
assert(!tgsi_dst->Indirect);
dest.dest.reg.reg = c->temp_regs[index].reg;
@@ -667,7 +821,7 @@ ttn_alu(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
instr->src[i].src = nir_src_for_ssa(src[i]);
instr->dest = dest;
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -683,7 +837,7 @@ ttn_move_dest_masked(nir_builder *b, nir_alu_dest dest,
mov->src[0].src = nir_src_for_ssa(def);
for (unsigned i = def->num_components; i < 4; i++)
mov->src[0].swizzle[i] = def->num_components - 1;
nir_instr_insert_after_cf_list(b->cf_node_list, &mov->instr);
nir_builder_instr_insert(b, &mov->instr);
}
static void
@@ -902,7 +1056,7 @@ ttn_kill(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
{
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -912,7 +1066,7 @@ ttn_kill_if(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
discard->src[0] = nir_src_for_ssa(cmp);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -920,10 +1074,6 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
{
nir_builder *b = &c->build;
/* Save the outside-of-the-if-statement node list. */
c->if_stack[c->if_stack_pos] = b->cf_node_list;
c->if_stack_pos++;
src = ttn_channel(b, src, X);
nir_if *if_stmt = nir_if_create(b->shader);
@@ -932,11 +1082,14 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
} else {
if_stmt->condition = nir_src_for_ssa(nir_fne(b, src, nir_imm_int(b, 0)));
}
nir_cf_node_insert_end(b->cf_node_list, &if_stmt->cf_node);
nir_builder_cf_insert(b, &if_stmt->cf_node);
nir_builder_insert_after_cf_list(b, &if_stmt->then_list);
c->if_stack[c->if_stack_pos] = nir_after_cf_node(&if_stmt->cf_node);
c->if_stack_pos++;
c->if_stack[c->if_stack_pos] = &if_stmt->else_list;
b->cursor = nir_after_cf_list(&if_stmt->then_list);
c->if_stack[c->if_stack_pos] = nir_after_cf_list(&if_stmt->else_list);
c->if_stack_pos++;
}
@@ -945,7 +1098,7 @@ ttn_else(struct ttn_compile *c)
{
nir_builder *b = &c->build;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos - 1]);
b->cursor = c->if_stack[c->if_stack_pos - 1];
}
static void
@@ -954,7 +1107,7 @@ ttn_endif(struct ttn_compile *c)
nir_builder *b = &c->build;
c->if_stack_pos -= 2;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos]);
b->cursor = c->if_stack[c->if_stack_pos];
}
static void
@@ -962,28 +1115,27 @@ ttn_bgnloop(struct ttn_compile *c)
{
nir_builder *b = &c->build;
/* Save the outside-of-the-loop node list. */
c->loop_stack[c->loop_stack_pos] = b->cf_node_list;
nir_loop *loop = nir_loop_create(b->shader);
nir_builder_cf_insert(b, &loop->cf_node);
c->loop_stack[c->loop_stack_pos] = nir_after_cf_node(&loop->cf_node);
c->loop_stack_pos++;
nir_loop *loop = nir_loop_create(b->shader);
nir_cf_node_insert_end(b->cf_node_list, &loop->cf_node);
nir_builder_insert_after_cf_list(b, &loop->body);
b->cursor = nir_after_cf_list(&loop->body);
}
static void
ttn_cont(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_continue);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
ttn_brk(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_break);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -992,7 +1144,7 @@ ttn_endloop(struct ttn_compile *c)
nir_builder *b = &c->build;
c->loop_stack_pos--;
nir_builder_insert_after_cf_list(b, c->loop_stack[c->loop_stack_pos]);
b->cursor = c->loop_stack[c->loop_stack_pos];
}
static void
@@ -1279,7 +1431,7 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
assert(src_number == num_srcs);
nir_ssa_dest_init(&instr->instr, &instr->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
/* Resolve the writemask on the texture op. */
ttn_move_dest(b, dest, &instr->dest.ssa);
@@ -1318,10 +1470,10 @@ ttn_txq(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
txs->src[0].src_type = nir_tex_src_lod;
nir_ssa_dest_init(&txs->instr, &txs->dest, 3, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &txs->instr);
nir_builder_instr_insert(b, &txs->instr);
nir_ssa_dest_init(&qlv->instr, &qlv->dest, 1, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &qlv->instr);
nir_builder_instr_insert(b, &qlv->instr);
ttn_move_dest_masked(b, dest, &txs->dest.ssa, TGSI_WRITEMASK_XYZ);
ttn_move_dest_masked(b, dest, &qlv->dest.ssa, TGSI_WRITEMASK_W);
@@ -1730,7 +1882,7 @@ ttn_emit_instruction(struct ttn_compile *c)
store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
@@ -1759,11 +1911,26 @@ ttn_add_output_stores(struct ttn_compile *c)
store->const_index[0] = loc;
store->src[0].reg.reg = c->output_regs[loc].reg;
store->src[0].reg.base_offset = c->output_regs[loc].offset;
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
}
static gl_shader_stage
tgsi_processor_to_shader_stage(unsigned processor)
{
switch (processor) {
case TGSI_PROCESSOR_FRAGMENT: return MESA_SHADER_FRAGMENT;
case TGSI_PROCESSOR_VERTEX: return MESA_SHADER_VERTEX;
case TGSI_PROCESSOR_GEOMETRY: return MESA_SHADER_GEOMETRY;
case TGSI_PROCESSOR_TESS_CTRL: return MESA_SHADER_TESS_CTRL;
case TGSI_PROCESSOR_TESS_EVAL: return MESA_SHADER_TESS_EVAL;
case TGSI_PROCESSOR_COMPUTE: return MESA_SHADER_COMPUTE;
default:
unreachable("invalid TGSI processor");
};
}
struct nir_shader *
tgsi_to_nir(const void *tgsi_tokens,
const nir_shader_compiler_options *options)
@@ -1775,17 +1942,19 @@ tgsi_to_nir(const void *tgsi_tokens,
int ret;
c = rzalloc(NULL, struct ttn_compile);
s = nir_shader_create(NULL, options);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
s = nir_shader_create(NULL, tgsi_processor_to_shader_stage(scan.processor),
options);
nir_function *func = nir_function_create(s, "main");
nir_function_overload *overload = nir_function_overload_create(func);
nir_function_impl *impl = nir_function_impl_create(overload);
nir_builder_init(&c->build, impl);
nir_builder_insert_after_cf_list(&c->build, &impl->body);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
c->build.cursor = nir_after_cf_list(&impl->body);
s->num_inputs = scan.file_max[TGSI_FILE_INPUT] + 1;
s->num_uniforms = scan.const_file_max[0] + 1;
@@ -1801,10 +1970,10 @@ tgsi_to_nir(const void *tgsi_tokens,
c->num_samp_types = scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1;
c->samp_types = rzalloc_array(c, nir_alu_type, c->num_samp_types);
c->if_stack = rzalloc_array(c, struct exec_list *,
c->if_stack = rzalloc_array(c, nir_cursor,
(scan.opcode_count[TGSI_OPCODE_IF] +
scan.opcode_count[TGSI_OPCODE_UIF]) * 2);
c->loop_stack = rzalloc_array(c, struct exec_list *,
c->loop_stack = rzalloc_array(c, nir_cursor,
scan.opcode_count[TGSI_OPCODE_BGNLOOP]);
ret = tgsi_parse_init(&parser, tgsi_tokens);
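
The table-driven remapping in `varying_slot_to_tgsi_semantic` above can be sketched in isolation as follows. This is a minimal illustration, not Mesa code: the `SLOT_*` and `SEM_*` enums and the `slot_to_semantic` name are hypothetical stand-ins for the real `gl_varying_slot` and `TGSI_SEMANTIC_*` values, and the sketch returns `false` where the helper in the diff prints a message and calls `abort()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Mesa's gl_varying_slot / TGSI_SEMANTIC_* enums.
 * The real values live in Mesa's headers and differ from these. */
enum slot {
   SLOT_POS,
   SLOT_COL0,
   SLOT_COL1,
   SLOT_EDGE,                  /* deliberately left out of the table below */
   SLOT_TEX0,
   SLOT_TEX7 = SLOT_TEX0 + 7,
   SLOT_VAR0
};

enum semantic { SEM_POSITION, SEM_COLOR, SEM_TEXCOORD, SEM_GENERIC };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Returns true and fills name/index if the slot can be remapped. */
static bool
slot_to_semantic(unsigned slot, unsigned *name, unsigned *index)
{
   /* Designated initializers keep the table keyed by slot value. */
   static const unsigned map[][2] = {
      [SLOT_POS]  = { SEM_POSITION, 0 },
      [SLOT_COL0] = { SEM_COLOR, 0 },
      [SLOT_COL1] = { SEM_COLOR, 1 },
   };

   /* Generic varyings form an open-ended range, so handle them first. */
   if (slot >= SLOT_VAR0) {
      *name = SEM_GENERIC;
      *index = slot - SLOT_VAR0;
      return true;
   }

   /* Texture coordinates are a contiguous block of eight slots. */
   if (slot >= SLOT_TEX0 && slot <= SLOT_TEX7) {
      *name = SEM_TEXCOORD;
      *index = slot - SLOT_TEX0;
      return true;
   }

   /* Anything past the end of the table has no mapping. */
   if (slot >= ARRAY_SIZE(map))
      return false;

   *name = map[slot][0];
   *index = map[slot][1];
   return true;
}
```

Note that `frag_result_to_tgsi_semantic` in the same diff indexes its table without a comparable range check, so callers there are expected to pass only slots that appear in the table.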

View File

@@ -28,3 +28,9 @@ struct nir_shader_compiler_options *options;
struct nir_shader *
tgsi_to_nir(const void *tgsi_tokens,
const struct nir_shader_compiler_options *options);
void
varying_slot_to_tgsi_semantic(gl_varying_slot slot,
unsigned *semantic_name, unsigned *semantic_index);
void
frag_result_to_tgsi_semantic(gl_frag_result slot,
unsigned *semantic_name, unsigned *semantic_index);

View File

@@ -96,11 +96,13 @@ os_log_message(const char *message)
}
#if !defined(PIPE_SUBSYSTEM_EMBEDDED)
const char *
os_get_option(const char *name)
{
return getenv(name);
}
#endif /* !PIPE_SUBSYSTEM_EMBEDDED */
/**

View File

@@ -166,6 +166,11 @@ pb_cache_manager_create(struct pb_manager *provider,
unsigned bypass_usage,
uint64_t maximum_cache_size);
/**
* Remove a buffer from the cache, but keep it alive.
*/
void
pb_cache_manager_remove_buffer(struct pb_buffer *buf);
struct pb_fence_ops;

View File

@@ -104,18 +104,42 @@ pb_cache_manager(struct pb_manager *mgr)
}
static void
_pb_cache_manager_remove_buffer_locked(struct pb_cache_buffer *buf)
{
struct pb_cache_manager *mgr = buf->mgr;
if (buf->head.next) {
LIST_DEL(&buf->head);
assert(mgr->numDelayed);
--mgr->numDelayed;
mgr->cache_size -= buf->base.size;
}
buf->mgr = NULL;
}
void
pb_cache_manager_remove_buffer(struct pb_buffer *pb_buf)
{
struct pb_cache_buffer *buf = (struct pb_cache_buffer*)pb_buf;
struct pb_cache_manager *mgr = buf->mgr;
if (!mgr)
return;
pipe_mutex_lock(mgr->mutex);
_pb_cache_manager_remove_buffer_locked(buf);
pipe_mutex_unlock(mgr->mutex);
}
/**
* Actually destroy the buffer.
*/
static inline void
_pb_cache_buffer_destroy(struct pb_cache_buffer *buf)
{
struct pb_cache_manager *mgr = buf->mgr;
LIST_DEL(&buf->head);
assert(mgr->numDelayed);
--mgr->numDelayed;
mgr->cache_size -= buf->base.size;
if (buf->mgr)
_pb_cache_manager_remove_buffer_locked(buf);
assert(!pipe_is_referenced(&buf->base.reference));
pb_reference(&buf->buffer, NULL);
FREE(buf);
@@ -156,6 +180,12 @@ pb_cache_buffer_destroy(struct pb_buffer *_buf)
struct pb_cache_buffer *buf = pb_cache_buffer(_buf);
struct pb_cache_manager *mgr = buf->mgr;
if (!mgr) {
pb_reference(&buf->buffer, NULL);
FREE(buf);
return;
}
pipe_mutex_lock(mgr->mutex);
assert(!pipe_is_referenced(&buf->base.reference));
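
The "remove from the cache but keep alive" behaviour added here can be modelled with a plain doubly linked list. The sketch below is an assumption-laden illustration rather than the pb_bufmgr code: `struct cache`, `struct cache_buffer`, `cache_add` and `cache_remove_buffer` are hypothetical names, and the mutex that the real code takes around the list update is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cache manager and its buffers. */
struct node { struct node *prev, *next; };

struct cache {
   struct node list;       /* list of cached (delayed) buffers */
   unsigned num_delayed;   /* number of buffers on the list */
   size_t cache_size;      /* total bytes held by the list */
};

struct cache_buffer {
   struct node head;       /* head.next == NULL means "not on the list" */
   struct cache *mgr;      /* NULL once detached from the cache */
   size_t size;
};

static void
cache_init(struct cache *c)
{
   c->list.prev = c->list.next = &c->list;
   c->num_delayed = 0;
   c->cache_size = 0;
}

/* Put a buffer on the cache list and account for its size. */
static void
cache_add(struct cache *c, struct cache_buffer *b)
{
   b->head.prev = c->list.prev;
   b->head.next = &c->list;
   c->list.prev->next = &b->head;
   c->list.prev = &b->head;
   b->mgr = c;
   c->num_delayed++;
   c->cache_size += b->size;
}

/* Detach the buffer from the cache without freeing it. */
static void
cache_remove_buffer(struct cache_buffer *b)
{
   struct cache *c = b->mgr;

   if (!c)
      return;               /* already detached: nothing to do */

   if (b->head.next) {
      b->head.prev->next = b->head.next;
      b->head.next->prev = b->head.prev;
      b->head.prev = b->head.next = NULL;
      assert(c->num_delayed);
      --c->num_delayed;
      c->cache_size -= b->size;
   }
   b->mgr = NULL;           /* later destruction must skip the cache path */
}
```

Clearing the manager pointer is what lets the destroy path in the diff tell a detached buffer apart from a cached one and free it directly.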

View File

@@ -11,6 +11,10 @@
* one or more debug driver: rbug, trace.
*/
#ifdef GALLIUM_DDEBUG
#include "ddebug/dd_public.h"
#endif
#ifdef GALLIUM_TRACE
#include "trace/tr_public.h"
#endif
@@ -30,6 +34,10 @@
static inline struct pipe_screen *
debug_screen_wrap(struct pipe_screen *screen)
{
#if defined(GALLIUM_DDEBUG)
screen = ddebug_screen_create(screen);
#endif
#if defined(GALLIUM_RBUG)
screen = rbug_screen_create(screen);
#endif

View File

@@ -0,0 +1,309 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/**
* This utility transforms the fragment shader to support antialiased points.
*/
#include "util/u_debug.h"
#include "util/u_math.h"
#include "tgsi_info.h"
#include "tgsi_aa_point.h"
#include "tgsi_transform.h"
#define INVALID_INDEX 9999
struct aa_transform_context
{
struct tgsi_transform_context base;
unsigned tmp; // temp register
unsigned color_out; // frag color out register
unsigned color_tmp; // frag color temp register
unsigned num_tmp; // number of temp registers
unsigned num_imm; // number of immediates
unsigned num_input; // number of inputs
unsigned aa_point_coord_index;
};
static inline struct aa_transform_context *
aa_transform_context(struct tgsi_transform_context *ctx)
{
return (struct aa_transform_context *) ctx;
}
/**
* TGSI declaration transform callback.
*/
static void
aa_decl(struct tgsi_transform_context *ctx,
struct tgsi_full_declaration *decl)
{
struct aa_transform_context *ts = aa_transform_context(ctx);
if (decl->Declaration.File == TGSI_FILE_OUTPUT &&
decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
decl->Semantic.Index == 0) {
ts->color_out = decl->Range.First;
}
else if (decl->Declaration.File == TGSI_FILE_INPUT) {
ts->num_input++;
}
else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
}
ctx->emit_declaration(ctx, decl);
}
/**
* TGSI immediate declaration transform callback.
*/
static void
aa_immediate(struct tgsi_transform_context *ctx,
struct tgsi_full_immediate *imm)
{
struct aa_transform_context *ts = aa_transform_context(ctx);
ctx->emit_immediate(ctx, imm);
ts->num_imm++;
}
/**
* TGSI transform prolog callback.
*/
static void
aa_prolog(struct tgsi_transform_context *ctx)
{
struct aa_transform_context *ts = aa_transform_context(ctx);
unsigned tmp0;
unsigned texIn;
unsigned imm;
/* Declare two temporary registers, one for scratch values and
* one for the fragment color.
*/
ts->tmp = ts->num_tmp++;
ts->color_tmp = ts->num_tmp++;
tgsi_transform_temps_decl(ctx, ts->tmp, ts->color_tmp);
/* Declare new generic input/texcoord */
texIn = ts->num_input++;
tgsi_transform_input_decl(ctx, texIn, TGSI_SEMANTIC_GENERIC,
ts->aa_point_coord_index, TGSI_INTERPOLATE_LINEAR);
/* Declare extra immediates */
imm = ts->num_imm++;
tgsi_transform_immediate_decl(ctx, 0.5, 0.5, 0.45, 1.0);
/*
* Emit code to compute fragment coverage.
* The point always has radius 0.5. The threshold value will be a
* value less than, but close to 0.5, such as 0.45.
* We compute a coverage factor from the distance and threshold.
* If the coverage is negative, the fragment is outside the circle and
* it's discarded.
* If the coverage is >= 1, the fragment is fully inside the threshold
* distance. We limit/clamp the coverage to 1.
* Otherwise, the fragment is between the threshold value and 0.5 and we
* compute a coverage value in [0,1].
*
* Input reg (texIn) usage:
* texIn.x = x point coord in [0,1]
* texIn.y = y point coord in [0,1]
* texIn.z = "k" the smoothing threshold distance
* texIn.w = unused
*
* Temp reg (t0) usage:
* t0.x = distance of fragment from center point (d)
* t0.y = y offset from center, then 0.5 - d
* t0.z = unused
* t0.w = 0.5 - k, then the final coverage value
*/
tmp0 = ts->tmp;
/* SUB t0.xy, texIn, (0.5, 0.5) */
tgsi_transform_op2_inst(ctx, TGSI_OPCODE_SUB,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_XY,
TGSI_FILE_INPUT, texIn,
TGSI_FILE_IMMEDIATE, imm);
/* DP2 t0.x, t0.xy, t0.xy; # t0.x = x^2 + y^2 */
tgsi_transform_op2_inst(ctx, TGSI_OPCODE_DP2,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_X,
TGSI_FILE_TEMPORARY, tmp0,
TGSI_FILE_TEMPORARY, tmp0);
/* SQRT t0.x, t0.x */
tgsi_transform_op1_inst(ctx, TGSI_OPCODE_SQRT,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_X,
TGSI_FILE_TEMPORARY, tmp0);
/* compute coverage factor = (0.5-d)/(0.5-k) */
/* SUB t0.w, 0.5, texIn.z; # t0.w = 0.5-k */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_SUB,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_W,
TGSI_FILE_IMMEDIATE, imm, TGSI_SWIZZLE_X,
TGSI_FILE_INPUT, texIn, TGSI_SWIZZLE_Z);
/* SUB t0.y, 0.5, t0.x; # t0.y = 0.5-d */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_SUB,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_Y,
TGSI_FILE_IMMEDIATE, imm, TGSI_SWIZZLE_X,
TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_X);
/* DIV t0.w, t0.y, t0.w; # coverage = (0.5-d)/(0.5-k) */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_DIV,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_W,
TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_Y,
TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_W);
/* If the coverage value is negative, it means the fragment is outside
* the point's circular boundary. Kill it.
*/
/* KILL_IF tmp0.w; # if tmp0.w < 0 KILL */
tgsi_transform_kill_inst(ctx, TGSI_FILE_TEMPORARY, tmp0,
TGSI_SWIZZLE_W, FALSE);
/* If the distance is less than the threshold, the coverage/alpha value
* will be greater than one. Clamp to one here.
*/
/* MIN tmp0.w, tmp0.w, 1.0 */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_MIN,
TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_W,
TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_W,
TGSI_FILE_IMMEDIATE, imm, TGSI_SWIZZLE_W);
}
/**
* TGSI instruction transform callback.
*/
static void
aa_inst(struct tgsi_transform_context *ctx,
struct tgsi_full_instruction *inst)
{
struct aa_transform_context *ts = aa_transform_context(ctx);
unsigned i;
/* Look for writes to color output reg and replace it with
* color temp reg.
*/
for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
struct tgsi_full_dst_register *dst = &inst->Dst[i];
if (dst->Register.File == TGSI_FILE_OUTPUT &&
dst->Register.Index == ts->color_out) {
dst->Register.File = TGSI_FILE_TEMPORARY;
dst->Register.Index = ts->color_tmp;
}
}
ctx->emit_instruction(ctx, inst);
}
/**
* TGSI transform epilog callback.
*/
static void
aa_epilog(struct tgsi_transform_context *ctx)
{
struct aa_transform_context *ts = aa_transform_context(ctx);
/* add alpha modulation code at tail of program */
assert(ts->color_out != INVALID_INDEX);
assert(ts->color_tmp != INVALID_INDEX);
/* MOV output.color.xyz colorTmp */
tgsi_transform_op1_inst(ctx, TGSI_OPCODE_MOV,
TGSI_FILE_OUTPUT, ts->color_out,
TGSI_WRITEMASK_XYZ,
TGSI_FILE_TEMPORARY, ts->color_tmp);
/* MUL output.color.w colorTmp.w tmp0.w */
tgsi_transform_op2_inst(ctx, TGSI_OPCODE_MUL,
TGSI_FILE_OUTPUT, ts->color_out,
TGSI_WRITEMASK_W,
TGSI_FILE_TEMPORARY, ts->color_tmp,
TGSI_FILE_TEMPORARY, ts->tmp);
}
/**
* TGSI utility to transform a fragment shader to support antialiased points.
*
* This utility accepts two inputs:
*\param tokens_in -- the original token string of the shader
*\param aa_point_coord_index -- the semantic index of the generic register
* that contains the point sprite texture coord
*
* For each fragment in the point, we compute the distance of the fragment
* from the point center using the point sprite texture coordinates.
* If the distance is greater than 0.5, we'll discard the fragment.
* Otherwise, we'll compute a coverage value which approximates how much
* of the fragment is inside the bounding circle of the point. If the distance
* is less than 'k', the coverage is 1. Else, the coverage is between 0 and 1.
* The final fragment color's alpha channel is then modulated by the coverage
* value.
*/
struct tgsi_token *
tgsi_add_aa_point(const struct tgsi_token *tokens_in,
const int aa_point_coord_index)
{
struct aa_transform_context transform;
const uint num_new_tokens = 200; /* should be enough */
const uint new_len = tgsi_num_tokens(tokens_in) + num_new_tokens;
struct tgsi_token *new_tokens;
/* allocate new tokens buffer */
new_tokens = tgsi_alloc_tokens(new_len);
if (!new_tokens)
return NULL;
/* setup transformation context */
memset(&transform, 0, sizeof(transform));
transform.base.transform_declaration = aa_decl;
transform.base.transform_instruction = aa_inst;
transform.base.transform_immediate = aa_immediate;
transform.base.prolog = aa_prolog;
transform.base.epilog = aa_epilog;
transform.tmp = INVALID_INDEX;
transform.color_out = INVALID_INDEX;
transform.color_tmp = INVALID_INDEX;
assert(aa_point_coord_index != -1);
transform.aa_point_coord_index = (unsigned)aa_point_coord_index;
transform.num_tmp = 0;
transform.num_imm = 0;
transform.num_input = 0;
/* transform the shader */
tgsi_transform_shader(tokens_in, new_tokens, new_len, &transform.base);
return new_tokens;
}
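
The coverage arithmetic that the prolog emits as TGSI instructions can be checked on the CPU with a few lines of C. This is a minimal sketch under stated assumptions: `aa_point_coverage` and `approx_sqrt` are hypothetical names, not part of Mesa, and a short Newton iteration stands in for the SQRT opcode so that the sketch needs no math library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SQRT opcode: Newton's method. */
static float
approx_sqrt(float v)
{
   float x = 1.0f;

   if (v <= 0.0f)
      return 0.0f;
   for (int i = 0; i < 32; i++)
      x = 0.5f * (x + v / x);
   return x;
}

/* CPU-side model of the shader prolog (illustrative only).
 * (x, y) is the point coordinate in [0,1]; k is the smoothing threshold
 * distance, less than 0.5. Returns false if the fragment would be killed,
 * otherwise stores the clamped coverage in *coverage. */
static bool
aa_point_coverage(float x, float y, float k, float *coverage)
{
   float dx = x - 0.5f;                     /* SUB t0.xy, texIn, 0.5 */
   float dy = y - 0.5f;
   float d = approx_sqrt(dx * dx + dy * dy); /* DP2 + SQRT */
   float c = (0.5f - d) / (0.5f - k);        /* SUB, SUB, DIV */

   if (c < 0.0f)                             /* KILL_IF */
      return false;

   *coverage = c < 1.0f ? c : 1.0f;          /* MIN with 1.0 */
   return true;
}
```

With k = 0.45, a fragment at the point center gets full coverage, a fragment in the corner of the sprite is discarded, and a fragment at distance 0.475 from the center lands about halfway through the smoothing band.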

View File

@@ -0,0 +1,35 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef TGSI_AA_POINT_H
#define TGSI_AA_POINT_H
struct tgsi_token;
struct tgsi_token *
tgsi_add_aa_point(const struct tgsi_token *tokens_in,
const int aa_point_coord_index);
#endif /* TGSI_AA_POINT_H */

View File

@@ -0,0 +1,169 @@
/*
* Copyright (C) 2015 Advanced Micro Devices, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*/
#include "tgsi/tgsi_transform.h"
#include "tgsi/tgsi_scan.h"
#include "tgsi/tgsi_dump.h"
#include "util/u_debug.h"
#include "tgsi_emulate.h"
struct tgsi_emulation_context {
struct tgsi_transform_context base;
struct tgsi_shader_info info;
unsigned flags;
bool first_instruction_emitted;
};
static inline struct tgsi_emulation_context *
tgsi_emulation_context(struct tgsi_transform_context *tctx)
{
return (struct tgsi_emulation_context *)tctx;
}
static void
transform_decl(struct tgsi_transform_context *tctx,
struct tgsi_full_declaration *decl)
{
struct tgsi_emulation_context *ctx = tgsi_emulation_context(tctx);
if (ctx->flags & TGSI_EMU_FORCE_PERSAMPLE_INTERP &&
decl->Declaration.File == TGSI_FILE_INPUT) {
assert(decl->Declaration.Interpolate);
decl->Interp.Location = TGSI_INTERPOLATE_LOC_SAMPLE;
}
tctx->emit_declaration(tctx, decl);
}
static void
passthrough_edgeflag(struct tgsi_transform_context *tctx)
{
struct tgsi_emulation_context *ctx = tgsi_emulation_context(tctx);
struct tgsi_full_declaration decl;
struct tgsi_full_instruction new_inst;
/* Input */
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_INPUT;
decl.Range.First = decl.Range.Last = ctx->info.num_inputs;
tctx->emit_declaration(tctx, &decl);
/* Output */
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_OUTPUT;
decl.Declaration.Semantic = true;
decl.Range.First = decl.Range.Last = ctx->info.num_outputs;
decl.Semantic.Name = TGSI_SEMANTIC_EDGEFLAG;
decl.Semantic.Index = 0;
tctx->emit_declaration(tctx, &decl);
/* MOV */
new_inst = tgsi_default_full_instruction();
new_inst.Instruction.Opcode = TGSI_OPCODE_MOV;
new_inst.Instruction.NumDstRegs = 1;
new_inst.Dst[0].Register.File = TGSI_FILE_OUTPUT;
new_inst.Dst[0].Register.Index = ctx->info.num_outputs;
new_inst.Dst[0].Register.WriteMask = TGSI_WRITEMASK_XYZW;
new_inst.Instruction.NumSrcRegs = 1;
new_inst.Src[0].Register.File = TGSI_FILE_INPUT;
new_inst.Src[0].Register.Index = ctx->info.num_inputs;
new_inst.Src[0].Register.SwizzleX = TGSI_SWIZZLE_X;
new_inst.Src[0].Register.SwizzleY = TGSI_SWIZZLE_X;
new_inst.Src[0].Register.SwizzleZ = TGSI_SWIZZLE_X;
new_inst.Src[0].Register.SwizzleW = TGSI_SWIZZLE_X;
tctx->emit_instruction(tctx, &new_inst);
}
static void
transform_instr(struct tgsi_transform_context *tctx,
struct tgsi_full_instruction *inst)
{
struct tgsi_emulation_context *ctx = tgsi_emulation_context(tctx);
/* Pass through edgeflags. */
if (!ctx->first_instruction_emitted) {
ctx->first_instruction_emitted = true;
if (ctx->flags & TGSI_EMU_PASSTHROUGH_EDGEFLAG)
passthrough_edgeflag(tctx);
}
/* Clamp color outputs. */
if (ctx->flags & TGSI_EMU_CLAMP_COLOR_OUTPUTS) {
int i;
for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
unsigned semantic;
if (inst->Dst[i].Register.File != TGSI_FILE_OUTPUT ||
inst->Dst[i].Register.Indirect)
continue;
semantic =
ctx->info.output_semantic_name[inst->Dst[i].Register.Index];
if (semantic == TGSI_SEMANTIC_COLOR ||
semantic == TGSI_SEMANTIC_BCOLOR)
inst->Instruction.Saturate = true;
}
}
tctx->emit_instruction(tctx, inst);
}
const struct tgsi_token *
tgsi_emulate(const struct tgsi_token *tokens, unsigned flags)
{
struct tgsi_emulation_context ctx;
struct tgsi_token *newtoks;
int newlen;
if (!(flags & (TGSI_EMU_CLAMP_COLOR_OUTPUTS |
TGSI_EMU_PASSTHROUGH_EDGEFLAG |
TGSI_EMU_FORCE_PERSAMPLE_INTERP)))
return NULL;
memset(&ctx, 0, sizeof(ctx));
ctx.flags = flags;
tgsi_scan_shader(tokens, &ctx.info);
if (flags & TGSI_EMU_FORCE_PERSAMPLE_INTERP)
ctx.base.transform_declaration = transform_decl;
if (flags & (TGSI_EMU_CLAMP_COLOR_OUTPUTS |
TGSI_EMU_PASSTHROUGH_EDGEFLAG))
ctx.base.transform_instruction = transform_instr;
newlen = tgsi_num_tokens(tokens) + 20;
newtoks = tgsi_alloc_tokens(newlen);
if (!newtoks)
return NULL;
tgsi_transform_shader(tokens, newtoks, newlen, &ctx.base);
return newtoks;
}
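
The way `tgsi_emulate` picks its callbacks from the flag mask can be sketched without any TGSI machinery. In the snippet below the flag values are copied from the header added in this change, while `struct emu_callbacks` and `select_callbacks` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the header added by this change. */
#define TGSI_EMU_CLAMP_COLOR_OUTPUTS    (1 << 0)
#define TGSI_EMU_PASSTHROUGH_EDGEFLAG   (1 << 1)
#define TGSI_EMU_FORCE_PERSAMPLE_INTERP (1 << 2)

/* Hypothetical record of which transform callbacks would be installed. */
struct emu_callbacks {
   bool decl;    /* declaration callback (per-sample interpolation) */
   bool instr;   /* instruction callback (clamping, edgeflag passthrough) */
};

/* Returns false when no known flag is set, i.e. there is nothing to do. */
static bool
select_callbacks(unsigned flags, struct emu_callbacks *cb)
{
   if (!(flags & (TGSI_EMU_CLAMP_COLOR_OUTPUTS |
                  TGSI_EMU_PASSTHROUGH_EDGEFLAG |
                  TGSI_EMU_FORCE_PERSAMPLE_INTERP)))
      return false;

   cb->decl = (flags & TGSI_EMU_FORCE_PERSAMPLE_INTERP) != 0;
   cb->instr = (flags & (TGSI_EMU_CLAMP_COLOR_OUTPUTS |
                         TGSI_EMU_PASSTHROUGH_EDGEFLAG)) != 0;
   return true;
}
```

In the function above, NULL is returned both when no flag is set and when token allocation fails, so a caller cannot tell the two cases apart from the return value alone and should be prepared to keep using the original tokens.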

View File

@@ -0,0 +1,38 @@
/*
* Copyright (C) 2015 Advanced Micro Devices, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*/
#ifndef TGSI_GL_EMULATION_H_
#define TGSI_GL_EMULATION_H_
#include "pipe/p_shader_tokens.h"
#define TGSI_EMU_CLAMP_COLOR_OUTPUTS (1 << 0)
#define TGSI_EMU_PASSTHROUGH_EDGEFLAG (1 << 1)
#define TGSI_EMU_FORCE_PERSAMPLE_INTERP (1 << 2)
const struct tgsi_token *
tgsi_emulate(const struct tgsi_token *tokens, unsigned flags);
#endif /* TGSI_GL_EMULATION_H_ */
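
The three TGSI_EMU_* bits are independent, and tgsi_emulate() uses them only to decide which transform callbacks to install. The following standalone sketch mirrors that gating logic without any Mesa headers. The struct emu_plan and the function plan_emulation are hypothetical names introduced here for illustration; only the flag values are taken from the header above.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the header above. */
#define TGSI_EMU_CLAMP_COLOR_OUTPUTS    (1 << 0)
#define TGSI_EMU_PASSTHROUGH_EDGEFLAG   (1 << 1)
#define TGSI_EMU_FORCE_PERSAMPLE_INTERP (1 << 2)

/* Hypothetical stand-in for the callbacks tgsi_emulate() installs. */
struct emu_plan {
   bool needs_transform;    /* false: tgsi_emulate() returns NULL early */
   bool hook_declarations;  /* models ctx.base.transform_declaration */
   bool hook_instructions;  /* models ctx.base.transform_instruction */
};

static struct emu_plan
plan_emulation(unsigned flags)
{
   struct emu_plan plan = { false, false, false };
   const unsigned known = TGSI_EMU_CLAMP_COLOR_OUTPUTS |
                          TGSI_EMU_PASSTHROUGH_EDGEFLAG |
                          TGSI_EMU_FORCE_PERSAMPLE_INTERP;

   /* No recognised flag set: nothing to emulate. */
   if (!(flags & known))
      return plan;

   plan.needs_transform = true;
   /* Per-sample interpolation is forced by rewriting declarations. */
   plan.hook_declarations = (flags & TGSI_EMU_FORCE_PERSAMPLE_INTERP) != 0;
   /* Colour clamping and edge-flag passthrough rewrite instructions. */
   plan.hook_instructions = (flags & (TGSI_EMU_CLAMP_COLOR_OUTPUTS |
                                      TGSI_EMU_PASSTHROUGH_EDGEFLAG)) != 0;
   return plan;
}
```

Note that in the real function a NULL return is ambiguous: it is returned both when no emulation flag is set and when the token allocation fails, so a caller that needs to tell the two apart has to check the flags itself.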

@@ -2021,7 +2021,7 @@ fetch_sampler_unit(struct tgsi_exec_machine *mach,
/*
* execute a texture instruction.
*
* modifier is used to control the channel routing for the\
* modifier is used to control the channel routing for the
* instruction variants like proj, lod, and texture with lod bias.
* sampler indicates which src register the sampler is contained in.
*/
@@ -2032,7 +2032,7 @@ exec_tex(struct tgsi_exec_machine *mach,
{
const union tgsi_exec_channel *args[5], *proj = NULL;
union tgsi_exec_channel r[5];
enum tgsi_sampler_control control = tgsi_sampler_lod_none;
enum tgsi_sampler_control control = TGSI_SAMPLER_LOD_NONE;
uint chan;
uint unit;
int8_t offsets[3];
@@ -2078,11 +2078,11 @@ exec_tex(struct tgsi_exec_machine *mach,
args[i] = &ZeroVec;
if (modifier == TEX_MODIFIER_EXPLICIT_LOD)
control = tgsi_sampler_lod_explicit;
control = TGSI_SAMPLER_LOD_EXPLICIT;
else if (modifier == TEX_MODIFIER_LOD_BIAS)
control = tgsi_sampler_lod_bias;
control = TGSI_SAMPLER_LOD_BIAS;
else if (modifier == TEX_MODIFIER_GATHER)
control = tgsi_sampler_gather;
control = TGSI_SAMPLER_GATHER;
}
else {
for (i = dim; i < Elements(args); i++)
@@ -2132,6 +2132,46 @@ exec_tex(struct tgsi_exec_machine *mach,
}
}
static void
exec_lodq(struct tgsi_exec_machine *mach,
const struct tgsi_full_instruction *inst)
{
uint unit;
int dim;
int i;
union tgsi_exec_channel coords[4];
const union tgsi_exec_channel *args[Elements(coords)];
union tgsi_exec_channel r[2];
unit = fetch_sampler_unit(mach, inst, 1);
dim = tgsi_util_get_texture_coord_dim(inst->Texture.Texture, NULL);
assert(dim <= Elements(coords));
/* fetch coordinates */
for (i = 0; i < dim; i++) {
FETCH(&coords[i], 0, TGSI_CHAN_X + i);
args[i] = &coords[i];
}
for (i = dim; i < Elements(coords); i++) {
args[i] = &ZeroVec;
}
mach->Sampler->query_lod(mach->Sampler, unit, unit,
args[0]->f,
args[1]->f,
args[2]->f,
args[3]->f,
TGSI_SAMPLER_LOD_NONE,
r[0].f,
r[1].f);
if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_X) {
store_dest(mach, &r[0], &inst->Dst[0], inst, TGSI_CHAN_X,
TGSI_EXEC_DATA_FLOAT);
}
if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Y) {
store_dest(mach, &r[1], &inst->Dst[0], inst, TGSI_CHAN_Y,
TGSI_EXEC_DATA_FLOAT);
}
}
static void
exec_txd(struct tgsi_exec_machine *mach,
@@ -2155,7 +2195,7 @@ exec_txd(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, unit, unit,
&r[0], &ZeroVec, &ZeroVec, &ZeroVec, &ZeroVec, /* S, T, P, C, LOD */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* R, G, B, A */
break;
@@ -2171,7 +2211,7 @@ exec_txd(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, unit, unit,
&r[0], &r[1], &r[2], &ZeroVec, &ZeroVec, /* S, T, P, C, LOD */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* R, G, B, A */
break;
@@ -2185,7 +2225,7 @@ exec_txd(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, unit, unit,
&r[0], &r[1], &ZeroVec, &ZeroVec, &ZeroVec, /* S, T, P, C, LOD */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* R, G, B, A */
break;
@@ -2205,7 +2245,7 @@ exec_txd(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, unit, unit,
&r[0], &r[1], &r[2], &r[3], &ZeroVec, /* inputs */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* outputs */
break;
@@ -2225,7 +2265,7 @@ exec_txd(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, unit, unit,
&r[0], &r[1], &r[2], &r[3], &ZeroVec, /* inputs */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* outputs */
break;
@@ -2364,7 +2404,7 @@ exec_sample(struct tgsi_exec_machine *mach,
const uint sampler_unit = inst->Src[2].Register.Index;
union tgsi_exec_channel r[5], c1;
const union tgsi_exec_channel *lod = &ZeroVec;
enum tgsi_sampler_control control = tgsi_sampler_lod_none;
enum tgsi_sampler_control control = TGSI_SAMPLER_LOD_NONE;
uint chan;
unsigned char swizzles[4];
int8_t offsets[3];
@@ -2378,16 +2418,16 @@ exec_sample(struct tgsi_exec_machine *mach,
if (modifier == TEX_MODIFIER_LOD_BIAS) {
FETCH(&c1, 3, TGSI_CHAN_X);
lod = &c1;
control = tgsi_sampler_lod_bias;
control = TGSI_SAMPLER_LOD_BIAS;
}
else if (modifier == TEX_MODIFIER_EXPLICIT_LOD) {
FETCH(&c1, 3, TGSI_CHAN_X);
lod = &c1;
control = tgsi_sampler_lod_explicit;
control = TGSI_SAMPLER_LOD_EXPLICIT;
}
else {
assert(modifier == TEX_MODIFIER_LEVEL_ZERO);
control = tgsi_sampler_lod_zero;
control = TGSI_SAMPLER_LOD_ZERO;
}
}
@@ -2513,7 +2553,7 @@ exec_sample_d(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, resource_unit, sampler_unit,
&r[0], &r[1], &ZeroVec, &ZeroVec, &ZeroVec, /* S, T, P, C, LOD */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* R, G, B, A */
break;
@@ -2529,7 +2569,7 @@ exec_sample_d(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, resource_unit, sampler_unit,
&r[0], &r[1], &r[2], &ZeroVec, &ZeroVec, /* inputs */
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]); /* outputs */
break;
@@ -2547,7 +2587,7 @@ exec_sample_d(struct tgsi_exec_machine *mach,
fetch_texel(mach->Sampler, resource_unit, sampler_unit,
&r[0], &r[1], &r[2], &r[3], &ZeroVec,
derivs, offsets, tgsi_sampler_derivs_explicit,
derivs, offsets, TGSI_SAMPLER_DERIVS_EXPLICIT,
&r[0], &r[1], &r[2], &r[3]);
break;
@@ -4378,6 +4418,12 @@ exec_instruction(
exec_tex(mach, inst, TEX_MODIFIER_GATHER, 2);
break;
case TGSI_OPCODE_LODQ:
/* src[0] = texcoord */
/* src[1] = sampler unit */
exec_lodq(mach, inst);
break;
case TGSI_OPCODE_UP2H:
assert (0);
break;

@@ -88,13 +88,14 @@ struct tgsi_interp_coef
float dady[TGSI_NUM_CHANNELS];
};
enum tgsi_sampler_control {
tgsi_sampler_lod_none,
tgsi_sampler_lod_bias,
tgsi_sampler_lod_explicit,
tgsi_sampler_lod_zero,
tgsi_sampler_derivs_explicit,
tgsi_sampler_gather,
enum tgsi_sampler_control
{
TGSI_SAMPLER_LOD_NONE,
TGSI_SAMPLER_LOD_BIAS,
TGSI_SAMPLER_LOD_EXPLICIT,
TGSI_SAMPLER_LOD_ZERO,
TGSI_SAMPLER_DERIVS_EXPLICIT,
TGSI_SAMPLER_GATHER,
};
/**
@@ -138,6 +139,16 @@ struct tgsi_sampler
const int j[TGSI_QUAD_SIZE], const int k[TGSI_QUAD_SIZE],
const int lod[TGSI_QUAD_SIZE], const int8_t offset[3],
float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE]);
void (*query_lod)(const struct tgsi_sampler *tgsi_sampler,
const unsigned sview_index,
const unsigned sampler_index,
const float s[TGSI_QUAD_SIZE],
const float t[TGSI_QUAD_SIZE],
const float p[TGSI_QUAD_SIZE],
const float c0[TGSI_QUAD_SIZE],
const enum tgsi_sampler_control control,
float mipmap[TGSI_QUAD_SIZE],
float lod[TGSI_QUAD_SIZE]);
};
#define TGSI_EXEC_NUM_TEMPS 4096

@@ -141,7 +141,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
{ 0, 0, 0, 1, 1, 0, NONE, "ENDLOOP", TGSI_OPCODE_ENDLOOP },
{ 0, 0, 0, 0, 1, 0, NONE, "ENDSUB", TGSI_OPCODE_ENDSUB },
{ 1, 1, 1, 0, 0, 0, OTHR, "TXQ_LZ", TGSI_OPCODE_TXQ_LZ },
{ 0, 0, 0, 0, 0, 0, NONE, "", 104 }, /* removed */
{ 1, 1, 1, 0, 0, 0, OTHR, "TXQS", TGSI_OPCODE_TXQS },
{ 0, 0, 0, 0, 0, 0, NONE, "", 105 }, /* removed */
{ 0, 0, 0, 0, 0, 0, NONE, "", 106 }, /* removed */
{ 0, 0, 0, 0, 0, 0, NONE, "NOP", TGSI_OPCODE_NOP },
@@ -331,6 +331,7 @@ tgsi_opcode_infer_type( uint opcode )
case TGSI_OPCODE_SAD: /* XXX some src args may be signed for SAD ? */
case TGSI_OPCODE_TXQ:
case TGSI_OPCODE_TXQ_LZ:
case TGSI_OPCODE_TXQS:
case TGSI_OPCODE_F2U:
case TGSI_OPCODE_UDIV:
case TGSI_OPCODE_UMAD:

@@ -0,0 +1,582 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/**
* This utility transforms the geometry shader to emulate point sprite by
* drawing a quad. It also adds an extra output for the original point position
* if the point position is to be written to a stream output buffer.
* Note: It assumes the driver will add a constant for the inverse viewport
* after the user defined constants.
*/
#include "util/u_debug.h"
#include "util/u_math.h"
#include "tgsi_info.h"
#include "tgsi_point_sprite.h"
#include "tgsi_transform.h"
#include "pipe/p_state.h"
#define INVALID_INDEX 9999
/* Set swizzle based on the immediates (0, 1, 0.5, -1) */
static inline unsigned
set_swizzle(int x, int y, int z, int w)
{
static const unsigned map[3] = {TGSI_SWIZZLE_W, TGSI_SWIZZLE_X,
TGSI_SWIZZLE_Y};
assert(x >= -1);
assert(x <= 1);
assert(y >= -1);
assert(y <= 1);
assert(z >= -1);
assert(z <= 1);
assert(w >= -1);
assert(w <= 1);
return map[x+1] | (map[y+1] << 2) | (map[z+1] << 4) | (map[w+1] << 6);
}
static inline unsigned
get_swizzle(unsigned swizzle, unsigned component)
{
assert(component < 4);
return (swizzle >> (component * 2)) & 0x3;
}
struct psprite_transform_context
{
struct tgsi_transform_context base;
unsigned num_tmp;
unsigned num_out;
unsigned num_orig_out;
unsigned num_const;
unsigned num_imm;
unsigned point_size_in; // point size input
unsigned point_size_out; // point size output
unsigned point_size_tmp; // point size temp
unsigned point_pos_in; // point pos input
unsigned point_pos_out; // point pos output
unsigned point_pos_sout; // original point pos for streamout
unsigned point_pos_tmp; // point pos temp
unsigned point_scale_tmp; // point scale temp
unsigned point_color_out; // point color output
unsigned point_color_tmp; // point color temp
unsigned point_imm; // point immediates
unsigned point_ivp; // point inverseViewport constant
unsigned point_dir_swz[4]; // point direction swizzle
unsigned point_coord_swz[4]; // point coord swizzle
unsigned point_coord_enable; // point coord enable mask
unsigned point_coord_decl; // point coord output declared mask
unsigned point_coord_out; // point coord output starting index
unsigned point_coord_aa; // aa point coord semantic index
unsigned point_coord_k; // aa point coord threshold distance
unsigned stream_out_point_pos:1; // set if streaming out the original point pos
unsigned aa_point:1; // set if doing aa point
unsigned out_tmp_index[PIPE_MAX_SHADER_OUTPUTS];
int max_generic;
};
static inline struct psprite_transform_context *
psprite_transform_context(struct tgsi_transform_context *ctx)
{
return (struct psprite_transform_context *) ctx;
}
/**
* TGSI declaration transform callback.
*/
static void
psprite_decl(struct tgsi_transform_context *ctx,
struct tgsi_full_declaration *decl)
{
struct psprite_transform_context *ts = psprite_transform_context(ctx);
if (decl->Declaration.File == TGSI_FILE_INPUT) {
if (decl->Semantic.Name == TGSI_SEMANTIC_PSIZE) {
ts->point_size_in = decl->Range.First;
}
else if (decl->Semantic.Name == TGSI_SEMANTIC_POSITION) {
ts->point_pos_in = decl->Range.First;
}
}
else if (decl->Declaration.File == TGSI_FILE_OUTPUT) {
if (decl->Semantic.Name == TGSI_SEMANTIC_PSIZE) {
ts->point_size_out = decl->Range.First;
}
else if (decl->Semantic.Name == TGSI_SEMANTIC_POSITION) {
ts->point_pos_out = decl->Range.First;
}
else if (decl->Semantic.Name == TGSI_SEMANTIC_GENERIC &&
decl->Semantic.Index < 32) {
ts->point_coord_decl |= 1 << decl->Semantic.Index;
ts->max_generic = MAX2(ts->max_generic, decl->Semantic.Index);
}
ts->num_out = MAX2(ts->num_out, decl->Range.Last + 1);
}
else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
}
else if (decl->Declaration.File == TGSI_FILE_CONSTANT) {
ts->num_const = MAX2(ts->num_const, decl->Range.Last + 1);
}
ctx->emit_declaration(ctx, decl);
}
/**
* TGSI immediate declaration transform callback.
*/
static void
psprite_immediate(struct tgsi_transform_context *ctx,
struct tgsi_full_immediate *imm)
{
struct psprite_transform_context *ts = psprite_transform_context(ctx);
ctx->emit_immediate(ctx, imm);
ts->num_imm++;
}
/**
* TGSI transform prolog callback.
*/
static void
psprite_prolog(struct tgsi_transform_context *ctx)
{
struct psprite_transform_context *ts = psprite_transform_context(ctx);
unsigned point_coord_enable, en;
int i;
/* Replace output registers with temporary registers */
for (i = 0; i < ts->num_out; i++) {
ts->out_tmp_index[i] = ts->num_tmp++;
}
ts->num_orig_out = ts->num_out;
/* Declare a tmp register for point scale */
ts->point_scale_tmp = ts->num_tmp++;
if (ts->point_size_out != INVALID_INDEX)
ts->point_size_tmp = ts->out_tmp_index[ts->point_size_out];
else
ts->point_size_tmp = ts->num_tmp++;
assert(ts->point_pos_out != INVALID_INDEX);
ts->point_pos_tmp = ts->out_tmp_index[ts->point_pos_out];
ts->out_tmp_index[ts->point_pos_out] = INVALID_INDEX;
/* Declare one more tmp register for the point coord threshold distance
* if we are generating anti-aliased points.
*/
if (ts->aa_point)
ts->point_coord_k = ts->num_tmp++;
tgsi_transform_temps_decl(ctx, ts->point_size_tmp, ts->num_tmp-1);
/* Declare an extra output for the original point position for stream out */
if (ts->stream_out_point_pos) {
ts->point_pos_sout = ts->num_out++;
tgsi_transform_output_decl(ctx, ts->point_pos_sout,
TGSI_SEMANTIC_GENERIC, 0, 0);
}
/* point coord outputs to be declared */
point_coord_enable = ts->point_coord_enable & ~ts->point_coord_decl;
/* Declare outputs for those point coord that are enabled but are not
* already declared in this shader.
*/
ts->point_coord_out = ts->num_out;
if (point_coord_enable) {
for (i = 0, en = point_coord_enable; en; en>>=1, i++) {
if (en & 0x1) {
tgsi_transform_output_decl(ctx, ts->num_out++,
TGSI_SEMANTIC_GENERIC, i, 0);
ts->max_generic = MAX2(ts->max_generic, i);
}
}
}
/* add an extra generic output for aa point texcoord */
if (ts->aa_point) {
ts->point_coord_aa = ts->max_generic + 1;
assert((ts->point_coord_enable & (1 << ts->point_coord_aa)) == 0);
ts->point_coord_enable |= 1 << (ts->point_coord_aa);
tgsi_transform_output_decl(ctx, ts->num_out++, TGSI_SEMANTIC_GENERIC,
ts->point_coord_aa, 0);
}
/* Declare extra immediates */
ts->point_imm = ts->num_imm;
tgsi_transform_immediate_decl(ctx, 0, 1, 0.5, -1);
/* Declare point constant -
* constant.xy -- inverseViewport
* constant.z -- current point size
* constant.w -- max point size
* The driver needs to add this constant to the constant buffer
*/
ts->point_ivp = ts->num_const++;
tgsi_transform_const_decl(ctx, ts->point_ivp, ts->point_ivp);
/* If this geometry shader does not specify point size,
* get the current point size from the point constant.
*/
if (ts->point_size_out == INVALID_INDEX) {
struct tgsi_full_instruction inst;
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_MOV;
inst.Instruction.NumDstRegs = 1;
tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_TEMPORARY,
ts->point_size_tmp, TGSI_WRITEMASK_XYZW);
inst.Instruction.NumSrcRegs = 1;
tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_CONSTANT,
ts->point_ivp, TGSI_SWIZZLE_Z,
TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z);
ctx->emit_instruction(ctx, &inst);
}
}
/**
* Add the point sprite emulation instructions at the emit vertex instruction
*/
static void
psprite_emit_vertex_inst(struct tgsi_transform_context *ctx,
struct tgsi_full_instruction *vert_inst)
{
struct psprite_transform_context *ts = psprite_transform_context(ctx);
struct tgsi_full_instruction inst;
unsigned point_coord_enable, en;
unsigned i, j, s;
/* new point coord outputs */
point_coord_enable = ts->point_coord_enable & ~ts->point_coord_decl;
/* OUTPUT[pos_sout] = TEMP[pos] */
if (ts->point_pos_sout != INVALID_INDEX) {
tgsi_transform_op1_inst(ctx, TGSI_OPCODE_MOV,
TGSI_FILE_OUTPUT, ts->point_pos_sout,
TGSI_WRITEMASK_XYZW,
TGSI_FILE_TEMPORARY, ts->point_pos_tmp);
}
/**
* Set up the point scale vector
* scale = pointSize * pos.w * inverseViewport
*/
/* MUL point_scale.x, point_size.x, point_pos.w */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_MUL,
TGSI_FILE_TEMPORARY, ts->point_scale_tmp, TGSI_WRITEMASK_X,
TGSI_FILE_TEMPORARY, ts->point_size_tmp, TGSI_SWIZZLE_X,
TGSI_FILE_TEMPORARY, ts->point_pos_tmp, TGSI_SWIZZLE_W);
/* MUL point_scale.xy, point_scale.xx, inverseViewport.xy */
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_MUL;
inst.Instruction.NumDstRegs = 1;
tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_TEMPORARY,
ts->point_scale_tmp, TGSI_WRITEMASK_XY);
inst.Instruction.NumSrcRegs = 2;
tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_TEMPORARY,
ts->point_scale_tmp, TGSI_SWIZZLE_X,
TGSI_SWIZZLE_X, TGSI_SWIZZLE_X, TGSI_SWIZZLE_X);
tgsi_transform_src_reg(&inst.Src[1], TGSI_FILE_CONSTANT,
ts->point_ivp, TGSI_SWIZZLE_X,
TGSI_SWIZZLE_Y, TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z);
ctx->emit_instruction(ctx, &inst);
/**
* Set up the point coord threshold distance
* k = 0.5 - 1 / pointsize
*/
if (ts->aa_point) {
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_DIV,
TGSI_FILE_TEMPORARY, ts->point_coord_k,
TGSI_WRITEMASK_X,
TGSI_FILE_IMMEDIATE, ts->point_imm,
TGSI_SWIZZLE_Y,
TGSI_FILE_TEMPORARY, ts->point_size_tmp,
TGSI_SWIZZLE_X);
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_SUB,
TGSI_FILE_TEMPORARY, ts->point_coord_k,
TGSI_WRITEMASK_X,
TGSI_FILE_IMMEDIATE, ts->point_imm,
TGSI_SWIZZLE_Z,
TGSI_FILE_TEMPORARY, ts->point_coord_k,
TGSI_SWIZZLE_X);
}
for (i = 0; i < 4; i++) {
unsigned point_dir_swz = ts->point_dir_swz[i];
unsigned point_coord_swz = ts->point_coord_swz[i];
/* All outputs need to be emitted for each vertex */
for (j = 0; j < ts->num_orig_out; j++) {
if (ts->out_tmp_index[j] != INVALID_INDEX) {
tgsi_transform_op1_inst(ctx, TGSI_OPCODE_MOV,
TGSI_FILE_OUTPUT, j,
TGSI_WRITEMASK_XYZW,
TGSI_FILE_TEMPORARY, ts->out_tmp_index[j]);
}
}
/* pos = point_scale * point_dir + point_pos */
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_MAD;
inst.Instruction.NumDstRegs = 1;
tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_OUTPUT, ts->point_pos_out,
TGSI_WRITEMASK_XYZW);
inst.Instruction.NumSrcRegs = 3;
tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_TEMPORARY, ts->point_scale_tmp,
TGSI_SWIZZLE_X, TGSI_SWIZZLE_Y, TGSI_SWIZZLE_X,
TGSI_SWIZZLE_X);
tgsi_transform_src_reg(&inst.Src[1], TGSI_FILE_IMMEDIATE, ts->point_imm,
get_swizzle(point_dir_swz, 0),
get_swizzle(point_dir_swz, 1),
get_swizzle(point_dir_swz, 2),
get_swizzle(point_dir_swz, 3));
tgsi_transform_src_reg(&inst.Src[2], TGSI_FILE_TEMPORARY, ts->point_pos_tmp,
TGSI_SWIZZLE_X, TGSI_SWIZZLE_Y, TGSI_SWIZZLE_Z,
TGSI_SWIZZLE_W);
ctx->emit_instruction(ctx, &inst);
/* point coord */
for (j = 0, s = 0, en = point_coord_enable; en; en>>=1, s++) {
unsigned dstReg;
if (en & 0x1) {
dstReg = ts->point_coord_out + j;
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_MOV;
inst.Instruction.NumDstRegs = 1;
tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_OUTPUT,
dstReg, TGSI_WRITEMASK_XYZW);
inst.Instruction.NumSrcRegs = 1;
tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_IMMEDIATE, ts->point_imm,
get_swizzle(point_coord_swz, 0),
get_swizzle(point_coord_swz, 1),
get_swizzle(point_coord_swz, 2),
get_swizzle(point_coord_swz, 3));
ctx->emit_instruction(ctx, &inst);
/* MOV point_coord.z point_coord_k.x */
if (s == ts->point_coord_aa) {
tgsi_transform_op1_swz_inst(ctx, TGSI_OPCODE_MOV,
TGSI_FILE_OUTPUT, dstReg, TGSI_WRITEMASK_Z,
TGSI_FILE_TEMPORARY, ts->point_coord_k,
TGSI_SWIZZLE_X);
}
j++; /* the next point coord output offset */
}
}
/* Emit the EMIT instruction for each vertex of the quad */
ctx->emit_instruction(ctx, vert_inst);
}
/* Emit the ENDPRIM instruction for the quad */
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_ENDPRIM;
inst.Instruction.NumDstRegs = 0;
inst.Instruction.NumSrcRegs = 1;
inst.Src[0] = vert_inst->Src[0];
ctx->emit_instruction(ctx, &inst);
}
/**
* TGSI instruction transform callback.
*/
static void
psprite_inst(struct tgsi_transform_context *ctx,
struct tgsi_full_instruction *inst)
{
struct psprite_transform_context *ts = psprite_transform_context(ctx);
if (inst->Instruction.Opcode == TGSI_OPCODE_EMIT) {
psprite_emit_vertex_inst(ctx, inst);
}
else if (inst->Dst[0].Register.File == TGSI_FILE_OUTPUT &&
inst->Dst[0].Register.Index == ts->point_size_out) {
/**
* Replace point size output reg with tmp reg.
* The tmp reg will be later used as a src reg for computing
* the point scale factor.
*/
inst->Dst[0].Register.File = TGSI_FILE_TEMPORARY;
inst->Dst[0].Register.Index = ts->point_size_tmp;
ctx->emit_instruction(ctx, inst);
/* Clamp the point size */
/* MAX point_size_tmp.x, point_size_tmp.x, point_imm.y */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_MAX,
TGSI_FILE_TEMPORARY, ts->point_size_tmp, TGSI_WRITEMASK_X,
TGSI_FILE_TEMPORARY, ts->point_size_tmp, TGSI_SWIZZLE_X,
TGSI_FILE_IMMEDIATE, ts->point_imm, TGSI_SWIZZLE_Y);
/* MIN point_size_tmp.x, point_size_tmp.x, point_ivp.w */
tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_MIN,
TGSI_FILE_TEMPORARY, ts->point_size_tmp, TGSI_WRITEMASK_X,
TGSI_FILE_TEMPORARY, ts->point_size_tmp, TGSI_SWIZZLE_X,
TGSI_FILE_CONSTANT, ts->point_ivp, TGSI_SWIZZLE_W);
}
else if (inst->Dst[0].Register.File == TGSI_FILE_OUTPUT &&
inst->Dst[0].Register.Index == ts->point_pos_out) {
/**
* Replace point pos output reg with tmp reg.
*/
inst->Dst[0].Register.File = TGSI_FILE_TEMPORARY;
inst->Dst[0].Register.Index = ts->point_pos_tmp;
ctx->emit_instruction(ctx, inst);
}
else if (inst->Dst[0].Register.File == TGSI_FILE_OUTPUT) {
/**
* Replace output reg with tmp reg.
*/
inst->Dst[0].Register.File = TGSI_FILE_TEMPORARY;
inst->Dst[0].Register.Index = ts->out_tmp_index[inst->Dst[0].Register.Index];
ctx->emit_instruction(ctx, inst);
}
else {
ctx->emit_instruction(ctx, inst);
}
}
/**
* TGSI property instruction transform callback.
* Transforms a point into a 4-vertex triangle strip.
*/
static void
psprite_property(struct tgsi_transform_context *ctx,
struct tgsi_full_property *prop)
{
switch (prop->Property.PropertyName) {
case TGSI_PROPERTY_GS_OUTPUT_PRIM:
prop->u[0].Data = PIPE_PRIM_TRIANGLE_STRIP;
break;
case TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES:
prop->u[0].Data *= 4;
break;
default:
break;
}
ctx->emit_property(ctx, prop);
}
/**
* TGSI utility to transform a geometry shader to support point sprite.
*/
struct tgsi_token *
tgsi_add_point_sprite(const struct tgsi_token *tokens_in,
const unsigned point_coord_enable,
const bool sprite_origin_lower_left,
const bool stream_out_point_pos,
int *aa_point_coord_index)
{
struct psprite_transform_context transform;
const uint num_new_tokens = 200; /* should be enough */
const uint new_len = tgsi_num_tokens(tokens_in) + num_new_tokens;
struct tgsi_token *new_tokens;
/* setup transformation context */
memset(&transform, 0, sizeof(transform));
transform.base.transform_declaration = psprite_decl;
transform.base.transform_instruction = psprite_inst;
transform.base.transform_property = psprite_property;
transform.base.transform_immediate = psprite_immediate;
transform.base.prolog = psprite_prolog;
transform.point_size_in = INVALID_INDEX;
transform.point_size_out = INVALID_INDEX;
transform.point_size_tmp = INVALID_INDEX;
transform.point_pos_in = INVALID_INDEX;
transform.point_pos_out = INVALID_INDEX;
transform.point_pos_sout = INVALID_INDEX;
transform.point_pos_tmp = INVALID_INDEX;
transform.point_scale_tmp = INVALID_INDEX;
transform.point_imm = INVALID_INDEX;
transform.point_coord_aa = INVALID_INDEX;
transform.point_coord_k = INVALID_INDEX;
transform.stream_out_point_pos = stream_out_point_pos;
transform.point_coord_enable = point_coord_enable;
transform.aa_point = aa_point_coord_index != NULL;
transform.max_generic = -1;
/* point sprite directions based on the immediates (0, 1, 0.5, -1) */
/* (-1, -1, 0, 0) */
transform.point_dir_swz[0] = set_swizzle(-1, -1, 0, 0);
/* (-1, 1, 0, 0) */
transform.point_dir_swz[1] = set_swizzle(-1, 1, 0, 0);
/* (1, -1, 0, 0) */
transform.point_dir_swz[2] = set_swizzle(1, -1, 0, 0);
/* (1, 1, 0, 0) */
transform.point_dir_swz[3] = set_swizzle(1, 1, 0, 0);
/* point coord based on the immediates (0, 1, 0.5, -1) */
if (sprite_origin_lower_left) {
/* (0, 0, 0, 1) */
transform.point_coord_swz[0] = set_swizzle(0, 0, 0, 1);
/* (0, 1, 0, 1) */
transform.point_coord_swz[1] = set_swizzle(0, 1, 0, 1);
/* (1, 0, 0, 1) */
transform.point_coord_swz[2] = set_swizzle(1, 0, 0, 1);
/* (1, 1, 0, 1) */
transform.point_coord_swz[3] = set_swizzle(1, 1, 0, 1);
}
else {
/* (0, 1, 0, 1) */
transform.point_coord_swz[0] = set_swizzle(0, 1, 0, 1);
/* (0, 0, 0, 1) */
transform.point_coord_swz[1] = set_swizzle(0, 0, 0, 1);
/* (1, 1, 0, 1) */
transform.point_coord_swz[2] = set_swizzle(1, 1, 0, 1);
/* (1, 0, 0, 1) */
transform.point_coord_swz[3] = set_swizzle(1, 0, 0, 1);
}
/* allocate new tokens buffer */
new_tokens = tgsi_alloc_tokens(new_len);
if (!new_tokens)
return NULL;
/* transform the shader */
tgsi_transform_shader(tokens_in, new_tokens, new_len, &transform.base);
if (aa_point_coord_index)
*aa_point_coord_index = transform.point_coord_aa;
return new_tokens;
}
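
The direction and texcoord tables in tgsi_add_point_sprite() are stored as packed swizzles into the single immediate (0, 1, 0.5, -1), two bits per component. The standalone sketch below reproduces that packing so the encoding can be checked by hand. The names pack_swizzle, unpack_swizzle, resolve_component and the SWZ_* constants are hypothetical; the SWZ_* values are assumed to match TGSI_SWIZZLE_X..W (0..3).

```c
#include <assert.h>

/* Assumed to match the TGSI_SWIZZLE_* values (X=0, Y=1, Z=2, W=3). */
enum { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3 };

/* The immediate declared in psprite_prolog(): (0, 1, 0.5, -1). */
static const float point_imm[4] = { 0.0f, 1.0f, 0.5f, -1.0f };

/* Map each requested value in {-1, 0, 1} to the immediate component
 * holding it (-1 -> W, 0 -> X, 1 -> Y) and pack two bits per channel. */
static unsigned
pack_swizzle(int x, int y, int z, int w)
{
   static const unsigned map[3] = { SWZ_W, SWZ_X, SWZ_Y };
   return map[x + 1] | (map[y + 1] << 2) |
          (map[z + 1] << 4) | (map[w + 1] << 6);
}

/* Extract the two-bit selector for one destination component. */
static unsigned
unpack_swizzle(unsigned swizzle, unsigned component)
{
   return (swizzle >> (component * 2)) & 0x3;
}

/* Value the shader would read for that component of the immediate. */
static float
resolve_component(unsigned swizzle, unsigned component)
{
   return point_imm[unpack_swizzle(swizzle, component)];
}
```

For example, the first quad corner direction (-1, -1, 0, 0) packs to the selectors W, W, X, X, and resolving those against the immediate gives back -1, -1, 0, 0.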

@@ -0,0 +1,38 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef TGSI_POINT_SPRITE_H
#define TGSI_POINT_SPRITE_H
struct tgsi_token;
struct tgsi_token *
tgsi_add_point_sprite(const struct tgsi_token *tokens_in,
const unsigned point_coord_enable,
const bool sprite_origin_lower_left,
const bool stream_out_point_pos,
int *aa_point_coord_index);
#endif /* TGSI_POINT_SPRITE_H */
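
The shader returned by tgsi_add_point_sprite() turns every emitted point into a four-vertex triangle strip. Each corner is computed as pos = scale * dir + point_pos, where scale.xy = pointSize * pos.w * inverseViewport.xy. The following is a rough CPU-side sketch of that arithmetic, useful for sanity-checking expected corner positions. The struct vec4 and the function sprite_corner are hypothetical names, and the inverse-viewport values are simply passed in as parameters.

```c
#include <assert.h>

struct vec4 { float x, y, z, w; };

/* Corner directions in the order set up by tgsi_add_point_sprite(). */
static const float quad_dir[4][2] = {
   { -1.0f, -1.0f },
   { -1.0f,  1.0f },
   {  1.0f, -1.0f },
   {  1.0f,  1.0f },
};

/* Compute one corner of the quad for a point at clip-space position pos. */
static struct vec4
sprite_corner(struct vec4 pos, float point_size,
              float inv_vp_x, float inv_vp_y, unsigned corner)
{
   /* scale = pointSize * pos.w * inverseViewport */
   const float sx = point_size * pos.w * inv_vp_x;
   const float sy = point_size * pos.w * inv_vp_y;
   struct vec4 out = pos;

   /* The direction has zero z and w, so only x and y move. */
   out.x = sx * quad_dir[corner][0] + pos.x;
   out.y = sy * quad_dir[corner][1] + pos.y;
   return out;
}
```

Multiplying by pos.w keeps the offset constant in screen space after the perspective divide, which is why the point size does not shrink with distance.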

@@ -56,6 +56,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
{
uint procType, i;
struct tgsi_parse_context parse;
unsigned current_depth = 0;
memset(info, 0, sizeof(*info));
for (i = 0; i < TGSI_FILE_COUNT; i++)
@@ -100,6 +101,72 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
assert(fullinst->Instruction.Opcode < TGSI_OPCODE_LAST);
info->opcode_count[fullinst->Instruction.Opcode]++;
switch (fullinst->Instruction.Opcode) {
case TGSI_OPCODE_IF:
case TGSI_OPCODE_UIF:
case TGSI_OPCODE_BGNLOOP:
current_depth++;
info->max_depth = MAX2(info->max_depth, current_depth);
break;
case TGSI_OPCODE_ENDIF:
case TGSI_OPCODE_ENDLOOP:
current_depth--;
break;
default:
break;
}
if (fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_CENTROID ||
fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_OFFSET ||
fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_SAMPLE) {
const struct tgsi_full_src_register *src0 = &fullinst->Src[0];
unsigned input;
if (src0->Register.Indirect && src0->Indirect.ArrayID)
input = info->input_array_first[src0->Indirect.ArrayID];
else
input = src0->Register.Index;
/* For the INTERP opcodes, the interpolation is always
* PERSPECTIVE unless LINEAR is specified.
*/
switch (info->input_interpolate[input]) {
case TGSI_INTERPOLATE_COLOR:
case TGSI_INTERPOLATE_CONSTANT:
case TGSI_INTERPOLATE_PERSPECTIVE:
switch (fullinst->Instruction.Opcode) {
case TGSI_OPCODE_INTERP_CENTROID:
info->uses_persp_opcode_interp_centroid = true;
break;
case TGSI_OPCODE_INTERP_OFFSET:
info->uses_persp_opcode_interp_offset = true;
break;
case TGSI_OPCODE_INTERP_SAMPLE:
info->uses_persp_opcode_interp_sample = true;
break;
}
break;
case TGSI_INTERPOLATE_LINEAR:
switch (fullinst->Instruction.Opcode) {
case TGSI_OPCODE_INTERP_CENTROID:
info->uses_linear_opcode_interp_centroid = true;
break;
case TGSI_OPCODE_INTERP_OFFSET:
info->uses_linear_opcode_interp_offset = true;
break;
case TGSI_OPCODE_INTERP_SAMPLE:
info->uses_linear_opcode_interp_sample = true;
break;
}
break;
}
}
if (fullinst->Instruction.Opcode >= TGSI_OPCODE_F2D &&
fullinst->Instruction.Opcode <= TGSI_OPCODE_DSSG)
info->uses_doubles = true;
for (i = 0; i < fullinst->Instruction.NumSrcRegs; i++) {
const struct tgsi_full_src_register *src =
&fullinst->Src[i];
@@ -216,8 +283,48 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
info->input_cylindrical_wrap[reg] = (ubyte)fulldecl->Interp.CylindricalWrap;
info->num_inputs++;
if (fulldecl->Interp.Location == TGSI_INTERPOLATE_LOC_CENTROID)
info->uses_centroid = TRUE;
/* Only interpolated varyings. Don't include POSITION.
* Don't include integer varyings, because they are not
* interpolated.
*/
if (semName == TGSI_SEMANTIC_GENERIC ||
semName == TGSI_SEMANTIC_TEXCOORD ||
semName == TGSI_SEMANTIC_COLOR ||
semName == TGSI_SEMANTIC_BCOLOR ||
semName == TGSI_SEMANTIC_FOG ||
semName == TGSI_SEMANTIC_CLIPDIST ||
semName == TGSI_SEMANTIC_CULLDIST) {
switch (fulldecl->Interp.Interpolate) {
case TGSI_INTERPOLATE_COLOR:
case TGSI_INTERPOLATE_PERSPECTIVE:
switch (fulldecl->Interp.Location) {
case TGSI_INTERPOLATE_LOC_CENTER:
info->uses_persp_center = true;
break;
case TGSI_INTERPOLATE_LOC_CENTROID:
info->uses_persp_centroid = true;
break;
case TGSI_INTERPOLATE_LOC_SAMPLE:
info->uses_persp_sample = true;
break;
}
break;
case TGSI_INTERPOLATE_LINEAR:
switch (fulldecl->Interp.Location) {
case TGSI_INTERPOLATE_LOC_CENTER:
info->uses_linear_center = true;
break;
case TGSI_INTERPOLATE_LOC_CENTROID:
info->uses_linear_centroid = true;
break;
case TGSI_INTERPOLATE_LOC_SAMPLE:
info->uses_linear_sample = true;
break;
}
break;
/* TGSI_INTERPOLATE_CONSTANT doesn't do any interpolation. */
}
}
if (semName == TGSI_SEMANTIC_PRIMID)
info->uses_primid = TRUE;
@@ -302,6 +409,8 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
info->writes_edgeflag = TRUE;
}
}
} else if (file == TGSI_FILE_SAMPLER) {
info->samplers_declared |= 1 << reg;
}
}
}
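
The max_depth bookkeeping added to tgsi_scan_shader() is a plain running counter: it is incremented on IF, UIF and BGNLOOP, decremented on ENDIF and ENDLOOP, and the peak is recorded. The sketch below shows the same idea on a bare opcode array. The enum op values and the function max_nesting_depth are hypothetical stand-ins for the TGSI opcodes, and the underflow guard is an addition that the scan code above does not have (it assumes balanced input).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TGSI control-flow opcodes. */
enum op { OP_IF, OP_UIF, OP_BGNLOOP, OP_ENDIF, OP_ENDLOOP, OP_OTHER };

/* Return the deepest nesting level of if/loop blocks in the stream. */
static unsigned
max_nesting_depth(const enum op *ops, size_t count)
{
   unsigned current = 0;
   unsigned max = 0;

   for (size_t i = 0; i < count; i++) {
      switch (ops[i]) {
      case OP_IF:
      case OP_UIF:
      case OP_BGNLOOP:
         current++;
         if (current > max)
            max = current;
         break;
      case OP_ENDIF:
      case OP_ENDLOOP:
         /* Guard against unbalanced input (not present in the original). */
         if (current > 0)
            current--;
         break;
      default:
         break;
      }
   }
   return max;
}
```

A driver could use such a value to size a fixed control-flow stack up front instead of growing it while compiling.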

@@ -64,6 +64,7 @@ struct tgsi_shader_info
uint file_count[TGSI_FILE_COUNT]; /**< number of declared registers */
int file_max[TGSI_FILE_COUNT]; /**< highest index of declared registers */
int const_file_max[PIPE_MAX_CONSTANT_BUFFERS];
unsigned samplers_declared; /**< bitmask of declared samplers */
ubyte input_array_first[PIPE_MAX_SHADER_INPUTS];
ubyte input_array_last[PIPE_MAX_SHADER_INPUTS];
@@ -82,7 +83,18 @@ struct tgsi_shader_info
boolean writes_stencil; /**< does fragment shader write stencil value? */
boolean writes_edgeflag; /**< vertex shader outputs edgeflag */
boolean uses_kill; /**< KILL or KILL_IF instruction used? */
boolean uses_centroid;
boolean uses_persp_center;
boolean uses_persp_centroid;
boolean uses_persp_sample;
boolean uses_linear_center;
boolean uses_linear_centroid;
boolean uses_linear_sample;
boolean uses_persp_opcode_interp_centroid;
boolean uses_persp_opcode_interp_offset;
boolean uses_persp_opcode_interp_sample;
boolean uses_linear_opcode_interp_centroid;
boolean uses_linear_opcode_interp_offset;
boolean uses_linear_opcode_interp_sample;
boolean uses_instanceid;
boolean uses_vertexid;
boolean uses_vertexid_nobase;
@@ -95,7 +107,7 @@ struct tgsi_shader_info
boolean writes_viewport_index;
boolean writes_layer;
boolean is_msaa_sampler[PIPE_MAX_SAMPLERS];
boolean uses_doubles; /**< uses any of the double instructions */
unsigned clipdist_writemask;
unsigned culldist_writemask;
unsigned num_written_culldistance;
@@ -113,6 +125,11 @@ struct tgsi_shader_info
unsigned indirect_files_written;
unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */
/**
* Max nesting limit of loops/if's
*/
unsigned max_depth;
};
extern void

@@ -95,19 +95,38 @@ struct tgsi_transform_context
* Helper for emitting temporary register declarations.
*/
static inline void
tgsi_transform_temp_decl(struct tgsi_transform_context *ctx,
unsigned index)
tgsi_transform_temps_decl(struct tgsi_transform_context *ctx,
unsigned firstIdx, unsigned lastIdx)
{
struct tgsi_full_declaration decl;
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_TEMPORARY;
decl.Range.First =
decl.Range.Last = index;
decl.Range.First = firstIdx;
decl.Range.Last = lastIdx;
ctx->emit_declaration(ctx, &decl);
}
static inline void
tgsi_transform_temp_decl(struct tgsi_transform_context *ctx,
unsigned index)
{
tgsi_transform_temps_decl(ctx, index, index);
}
static inline void
tgsi_transform_const_decl(struct tgsi_transform_context *ctx,
unsigned firstIdx, unsigned lastIdx)
{
struct tgsi_full_declaration decl;
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_CONSTANT;
decl.Range.First = firstIdx;
decl.Range.Last = lastIdx;
ctx->emit_declaration(ctx, &decl);
}
static inline void
tgsi_transform_input_decl(struct tgsi_transform_context *ctx,
unsigned index,
@@ -129,6 +148,26 @@ tgsi_transform_input_decl(struct tgsi_transform_context *ctx,
ctx->emit_declaration(ctx, &decl);
}
static inline void
tgsi_transform_output_decl(struct tgsi_transform_context *ctx,
unsigned index,
unsigned sem_name, unsigned sem_index,
unsigned interp)
{
struct tgsi_full_declaration decl;
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_OUTPUT;
decl.Declaration.Interpolate = 1;
decl.Declaration.Semantic = 1;
decl.Semantic.Name = sem_name;
decl.Semantic.Index = sem_index;
decl.Range.First =
decl.Range.Last = index;
decl.Interp.Interpolate = interp;
ctx->emit_declaration(ctx, &decl);
}
static inline void
tgsi_transform_sampler_decl(struct tgsi_transform_context *ctx,
@@ -182,6 +221,28 @@ tgsi_transform_immediate_decl(struct tgsi_transform_context *ctx,
ctx->emit_immediate(ctx, &immed);
}
static inline void
tgsi_transform_dst_reg(struct tgsi_full_dst_register *reg,
unsigned file, unsigned index, unsigned writemask)
{
reg->Register.File = file;
reg->Register.Index = index;
reg->Register.WriteMask = writemask;
}
static inline void
tgsi_transform_src_reg(struct tgsi_full_src_register *reg,
unsigned file, unsigned index,
unsigned swizzleX, unsigned swizzleY,
unsigned swizzleZ, unsigned swizzleW)
{
reg->Register.File = file;
reg->Register.Index = index;
reg->Register.SwizzleX = swizzleX;
reg->Register.SwizzleY = swizzleY;
reg->Register.SwizzleZ = swizzleZ;
reg->Register.SwizzleW = swizzleW;
}
/**
* Helper for emitting 1-operand instructions.
@@ -399,7 +460,8 @@ static inline void
tgsi_transform_kill_inst(struct tgsi_transform_context *ctx,
unsigned src_file,
unsigned src_index,
unsigned src_swizzle)
unsigned src_swizzle,
boolean negate)
{
struct tgsi_full_instruction inst;
@@ -413,7 +475,7 @@ tgsi_transform_kill_inst(struct tgsi_transform_context *ctx,
inst.Src[0].Register.SwizzleY =
inst.Src[0].Register.SwizzleZ =
inst.Src[0].Register.SwizzleW = src_swizzle;
inst.Src[0].Register.Negate = 1;
inst.Src[0].Register.Negate = negate;
ctx->emit_instruction(ctx, &inst);
}

@@ -0,0 +1,228 @@
/*
* Copyright 2013 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/**
* This utility transforms fragment shaders to facilitate two-sided lighting.
*
* Basically, if the FS has any color inputs (TGSI_SEMANTIC_COLOR) we'll:
* 1. create corresponding back-color inputs (TGSI_SEMANTIC_BCOLOR)
* 2. use the FACE register to choose between front/back colors and put the
* selected color in new temp regs.
* 3. replace reads of the original color inputs with the new temp regs.
*
* Then, the driver just needs to link the VS front/back output colors to
* the FS front/back input colors.
*/
#include "util/u_debug.h"
#include "util/u_math.h"
#include "tgsi_info.h"
#include "tgsi_two_side.h"
#include "tgsi_transform.h"
#define INVALID_INDEX 9999
struct two_side_transform_context
{
struct tgsi_transform_context base;
uint num_temps;
uint num_inputs;
uint face_input; /**< index of the FACE input */
uint front_color_input[2]; /**< INPUT regs */
uint front_color_interp[2];/**< TGSI_INTERPOLATE_x */
uint back_color_input[2]; /**< INPUT regs */
uint new_colors[2]; /**< TEMP regs */
};
static inline struct two_side_transform_context *
two_side_transform_context(struct tgsi_transform_context *ctx)
{
return (struct two_side_transform_context *) ctx;
}
static void
xform_decl(struct tgsi_transform_context *ctx,
struct tgsi_full_declaration *decl)
{
struct two_side_transform_context *ts = two_side_transform_context(ctx);
if (decl->Declaration.File == TGSI_FILE_INPUT) {
if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR) {
/* found a front color */
assert(decl->Semantic.Index < 2);
ts->front_color_input[decl->Semantic.Index] = decl->Range.First;
ts->front_color_interp[decl->Semantic.Index] = decl->Interp.Interpolate;
}
else if (decl->Semantic.Name == TGSI_SEMANTIC_FACE) {
ts->face_input = decl->Range.First;
}
ts->num_inputs = MAX2(ts->num_inputs, decl->Range.Last + 1);
}
else if (decl->Declaration.File == TGSI_FILE_TEMPORARY) {
ts->num_temps = MAX2(ts->num_temps, decl->Range.Last + 1);
}
ctx->emit_declaration(ctx, decl);
}
static void
emit_prolog(struct tgsi_transform_context *ctx)
{
struct two_side_transform_context *ts = two_side_transform_context(ctx);
struct tgsi_full_declaration decl;
struct tgsi_full_instruction inst;
uint num_colors = 0;
uint i;
/* Declare 0, 1 or 2 new BCOLOR inputs */
for (i = 0; i < 2; i++) {
if (ts->front_color_input[i] != INVALID_INDEX) {
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_INPUT;
decl.Declaration.Interpolate = 1;
decl.Declaration.Semantic = 1;
decl.Semantic.Name = TGSI_SEMANTIC_BCOLOR;
decl.Semantic.Index = i;
decl.Range.First = decl.Range.Last = ts->num_inputs++;
decl.Interp.Interpolate = ts->front_color_interp[i];
ctx->emit_declaration(ctx, &decl);
ts->back_color_input[i] = decl.Range.First;
num_colors++;
}
}
if (num_colors > 0) {
/* Declare 1 or 2 temp registers */
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_TEMPORARY;
decl.Range.First = ts->num_temps;
decl.Range.Last = ts->num_temps + num_colors - 1;
ctx->emit_declaration(ctx, &decl);
ts->new_colors[0] = ts->num_temps;
ts->new_colors[1] = ts->num_temps + 1;
if (ts->face_input == INVALID_INDEX) {
/* declare FACE INPUT register */
decl = tgsi_default_full_declaration();
decl.Declaration.File = TGSI_FILE_INPUT;
decl.Declaration.Semantic = 1;
decl.Semantic.Name = TGSI_SEMANTIC_FACE;
decl.Semantic.Index = 0;
decl.Range.First = decl.Range.Last = ts->num_inputs++;
ctx->emit_declaration(ctx, &decl);
ts->face_input = decl.Range.First;
}
/* CMP temp[c0], face, bcolor[c0], fcolor[c0]
* temp[c0] = face < 0.0 ? bcolor[c0] : fcolor[c0]
*/
for (i = 0; i < 2; i++) {
if (ts->front_color_input[i] != INVALID_INDEX) {
inst = tgsi_default_full_instruction();
inst.Instruction.Opcode = TGSI_OPCODE_CMP;
inst.Instruction.NumDstRegs = 1;
inst.Dst[0].Register.File = TGSI_FILE_TEMPORARY;
inst.Dst[0].Register.Index = ts->new_colors[i];
inst.Instruction.NumSrcRegs = 3;
inst.Src[0].Register.File = TGSI_FILE_INPUT;
inst.Src[0].Register.Index = ts->face_input;
inst.Src[1].Register.File = TGSI_FILE_INPUT;
inst.Src[1].Register.Index = ts->back_color_input[i];
inst.Src[2].Register.File = TGSI_FILE_INPUT;
inst.Src[2].Register.Index = ts->front_color_input[i];
ctx->emit_instruction(ctx, &inst);
}
}
}
}
static void
xform_inst(struct tgsi_transform_context *ctx,
struct tgsi_full_instruction *inst)
{
struct two_side_transform_context *ts = two_side_transform_context(ctx);
const struct tgsi_opcode_info *info =
tgsi_get_opcode_info(inst->Instruction.Opcode);
uint i, j;
/* Look for src regs which reference the input color and replace
* them with the temp color.
*/
for (i = 0; i < info->num_src; i++) {
if (inst->Src[i].Register.File == TGSI_FILE_INPUT) {
for (j = 0; j < 2; j++) {
if (inst->Src[i].Register.Index == ts->front_color_input[j]) {
/* replace color input with temp reg */
inst->Src[i].Register.File = TGSI_FILE_TEMPORARY;
inst->Src[i].Register.Index = ts->new_colors[j];
break;
}
}
}
}
ctx->emit_instruction(ctx, inst);
}
struct tgsi_token *
tgsi_add_two_side(const struct tgsi_token *tokens_in)
{
struct two_side_transform_context transform;
const uint num_new_tokens = 100; /* should be enough */
const uint new_len = tgsi_num_tokens(tokens_in) + num_new_tokens;
struct tgsi_token *new_tokens;
/* setup transformation context */
memset(&transform, 0, sizeof(transform));
transform.base.transform_declaration = xform_decl;
transform.base.transform_instruction = xform_inst;
transform.base.prolog = emit_prolog;
transform.face_input = INVALID_INDEX;
transform.front_color_input[0] = INVALID_INDEX;
transform.front_color_input[1] = INVALID_INDEX;
transform.front_color_interp[0] = TGSI_INTERPOLATE_COLOR;
transform.front_color_interp[1] = TGSI_INTERPOLATE_COLOR;
transform.back_color_input[0] = INVALID_INDEX;
transform.back_color_input[1] = INVALID_INDEX;
/* allocate new tokens buffer */
new_tokens = tgsi_alloc_tokens(new_len);
if (!new_tokens)
return NULL;
/* transform the shader */
tgsi_transform_shader(tokens_in, new_tokens, new_len, &transform.base);
return new_tokens;
}
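
The front/back selection emitted by `emit_prolog()` relies on the CMP semantics quoted in its comment (`temp = face < 0.0 ? bcolor : fcolor`). A per-component model in plain C, assuming as that comment does that a negative FACE value means back-facing; `cmp_select` is a hypothetical name, not a Mesa function:

```c
#include <assert.h>

/* Per-component model of "CMP dst, src0, src1, src2":
 * dst = (src0 < 0.0) ? src1 : src2
 */
static void
cmp_select(float dst[4], const float face[4],
           const float back[4], const float front[4])
{
   int c;

   for (c = 0; c < 4; c++)
      dst[c] = (face[c] < 0.0f) ? back[c] : front[c];
}
```

Later reads of the front color input are then redirected to the temporary holding this result, which is what `xform_inst()` does.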

@@ -0,0 +1,34 @@
/*
* Copyright 2013 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef TGSI_TWO_SIDE_H
#define TGSI_TWO_SIDE_H
struct tgsi_token;
struct tgsi_token *
tgsi_add_two_side(const struct tgsi_token *tokens_in);
#endif /* TGSI_TWO_SIDE_H */

@@ -462,3 +462,21 @@ tgsi_util_get_texture_coord_dim(int tgsi_tex, int *shadow_or_sample)
return dim;
}
boolean
tgsi_is_shadow_target(unsigned target)
{
switch (target) {
case TGSI_TEXTURE_SHADOW1D:
case TGSI_TEXTURE_SHADOW2D:
case TGSI_TEXTURE_SHADOWRECT:
case TGSI_TEXTURE_SHADOW1D_ARRAY:
case TGSI_TEXTURE_SHADOW2D_ARRAY:
case TGSI_TEXTURE_SHADOWCUBE:
case TGSI_TEXTURE_SHADOWCUBE_ARRAY:
return TRUE;
default:
return FALSE;
}
}

@@ -82,6 +82,9 @@ tgsi_util_get_src_from_ind(const struct tgsi_ind_register *reg);
int
tgsi_util_get_texture_coord_dim(int tgsi_tex, int *shadow_or_sample);
boolean
tgsi_is_shadow_target(unsigned target);
#if defined __cplusplus
}
#endif

@@ -1190,6 +1190,8 @@ static void blitter_draw(struct blitter_context_priv *ctx,
u_upload_data(ctx->upload, 0, sizeof(ctx->vertices), ctx->vertices,
&vb.buffer_offset, &vb.buffer);
if (!vb.buffer)
return;
u_upload_unmap(ctx->upload);
pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, &vb);
@@ -2063,7 +2065,7 @@ void util_blitter_clear_buffer(struct blitter_context *blitter,
struct blitter_context_priv *ctx = (struct blitter_context_priv*)blitter;
struct pipe_context *pipe = ctx->base.pipe;
struct pipe_vertex_buffer vb = {0};
struct pipe_stream_output_target *so_target;
struct pipe_stream_output_target *so_target = NULL;
unsigned offsets[PIPE_MAX_SO_BUFFERS] = {0};
assert(num_channels >= 1);
@@ -2089,6 +2091,9 @@ void util_blitter_clear_buffer(struct blitter_context *blitter,
u_upload_data(ctx->upload, 0, num_channels*4, clear_value,
&vb.buffer_offset, &vb.buffer);
if (!vb.buffer)
goto out;
vb.stride = 0;
blitter_set_running_flag(ctx);
@@ -2112,6 +2117,7 @@ void util_blitter_clear_buffer(struct blitter_context *blitter,
util_draw_arrays(pipe, PIPE_PRIM_POINTS, 0, size / 4);
out:
blitter_restore_vertex_states(ctx);
blitter_restore_render_cond(ctx);
blitter_unset_running_flag(ctx);

@@ -372,30 +372,28 @@ void util_blitter_custom_resolve_color(struct blitter_context *blitter,
*
* States not listed here are not affected by util_blitter. */
static inline
void util_blitter_save_blend(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_blend(struct blitter_context *blitter, void *state)
{
blitter->saved_blend_state = state;
}
static inline
void util_blitter_save_depth_stencil_alpha(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_depth_stencil_alpha(struct blitter_context *blitter,
void *state)
{
blitter->saved_dsa_state = state;
}
static inline
void util_blitter_save_vertex_elements(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_vertex_elements(struct blitter_context *blitter, void *state)
{
blitter->saved_velem_state = state;
}
static inline
void util_blitter_save_stencil_ref(struct blitter_context *blitter,
const struct pipe_stencil_ref *state)
static inline void
util_blitter_save_stencil_ref(struct blitter_context *blitter,
const struct pipe_stencil_ref *state)
{
blitter->saved_stencil_ref = *state;
}
@@ -407,23 +405,20 @@ void util_blitter_save_rasterizer(struct blitter_context *blitter,
blitter->saved_rs_state = state;
}
static inline
void util_blitter_save_fragment_shader(struct blitter_context *blitter,
void *fs)
static inline void
util_blitter_save_fragment_shader(struct blitter_context *blitter, void *fs)
{
blitter->saved_fs = fs;
}
static inline
void util_blitter_save_vertex_shader(struct blitter_context *blitter,
void *vs)
static inline void
util_blitter_save_vertex_shader(struct blitter_context *blitter, void *vs)
{
blitter->saved_vs = vs;
}
static inline
void util_blitter_save_geometry_shader(struct blitter_context *blitter,
void *gs)
static inline void
util_blitter_save_geometry_shader(struct blitter_context *blitter, void *gs)
{
blitter->saved_gs = gs;
}
@@ -442,24 +437,24 @@ util_blitter_save_tesseval_shader(struct blitter_context *blitter,
blitter->saved_tes = sh;
}
static inline
void util_blitter_save_framebuffer(struct blitter_context *blitter,
const struct pipe_framebuffer_state *state)
static inline void
util_blitter_save_framebuffer(struct blitter_context *blitter,
const struct pipe_framebuffer_state *state)
{
blitter->saved_fb_state.nr_cbufs = 0; /* It's ~0 now, meaning it's unsaved. */
util_copy_framebuffer_state(&blitter->saved_fb_state, state);
}
static inline
void util_blitter_save_viewport(struct blitter_context *blitter,
struct pipe_viewport_state *state)
static inline void
util_blitter_save_viewport(struct blitter_context *blitter,
struct pipe_viewport_state *state)
{
blitter->saved_viewport = *state;
}
static inline
void util_blitter_save_scissor(struct blitter_context *blitter,
struct pipe_scissor_state *state)
static inline void
util_blitter_save_scissor(struct blitter_context *blitter,
struct pipe_scissor_state *state)
{
blitter->saved_scissor = *state;
}

@@ -41,6 +41,7 @@
#include "util/u_tile.h"
#include "util/u_prim.h"
#include "util/u_surface.h"
#include <inttypes.h>
#include <stdio.h>
#include <limits.h> /* CHAR_BIT */
@@ -275,7 +276,7 @@ debug_get_flags_option(const char *name,
for (; flags->name; ++flags)
namealign = MAX2(namealign, strlen(flags->name));
for (flags = orig; flags->name; ++flags)
_debug_printf("| %*s [0x%0*lx]%s%s\n", namealign, flags->name,
_debug_printf("| %*s [0x%0*"PRIx64"]%s%s\n", namealign, flags->name,
(int)sizeof(uint64_t)*CHAR_BIT/4, flags->value,
flags->desc ? " " : "", flags->desc ? flags->desc : "");
}
@@ -290,9 +291,9 @@ debug_get_flags_option(const char *name,
if (debug_get_option_should_print()) {
if (str) {
debug_printf("%s: %s = 0x%lx (%s)\n", __FUNCTION__, name, result, str);
debug_printf("%s: %s = 0x%"PRIx64" (%s)\n", __FUNCTION__, name, result, str);
} else {
debug_printf("%s: %s = 0x%lx\n", __FUNCTION__, name, result);
debug_printf("%s: %s = 0x%"PRIx64"\n", __FUNCTION__, name, result);
}
}
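
The `0x` prefix in these format strings implies hexadecimal output. For a `uint64_t`, `<inttypes.h>` provides `PRIx64` for that purpose, whereas `PRIu64` prints decimal digits. A minimal sketch; `format_flag_value` is a hypothetical helper, not part of u_debug:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit flag value as zero-padded hexadecimal. */
static int
format_flag_value(char *buf, size_t size, uint64_t value)
{
   /* 16 hex digits cover all 64 bits. */
   return snprintf(buf, size, "0x%016" PRIx64, value);
}
```

With this helper, a value of `0x1f` is written as `0x000000000000001f`.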

@@ -21,7 +21,8 @@
* DEALINGS IN THE SOFTWARE.
*/
/* Copied from EXT_texture_shared_exponent and edited. */
/* Copied from EXT_texture_shared_exponent and edited, getting rid of
* expensive float math bits too. */
#ifndef RGB9E5_H
#define RGB9E5_H
@@ -39,7 +40,6 @@
#define RGB9E5_MANTISSA_VALUES (1<<RGB9E5_MANTISSA_BITS)
#define MAX_RGB9E5_MANTISSA (RGB9E5_MANTISSA_VALUES-1)
#define MAX_RGB9E5 (((float)MAX_RGB9E5_MANTISSA)/RGB9E5_MANTISSA_VALUES * (1<<MAX_RGB9E5_EXP))
#define EPSILON_RGB9E5 ((1.0/RGB9E5_MANTISSA_VALUES) / (1<<RGB9E5_EXP_BIAS))
typedef union {
unsigned int raw;
@@ -74,63 +74,59 @@ typedef union {
} field;
} rgb9e5;
static inline float rgb9e5_ClampRange(float x)
{
if (x > 0.0f) {
if (x >= MAX_RGB9E5) {
return MAX_RGB9E5;
} else {
return x;
}
} else {
/* NaN gets here too since comparisons with NaN always fail! */
return 0.0;
}
}
/* Ok, FloorLog2 is not correct for the denorm and zero values, but we
are going to do a max of this value with the minimum rgb9e5 exponent
that will hide these problem cases. */
static inline int rgb9e5_FloorLog2(float x)
static inline int rgb9e5_ClampRange(float x)
{
float754 f;
float754 max;
f.value = x;
return (f.field.biasedexponent - 127);
max.value = MAX_RGB9E5;
if (f.raw > 0x7f800000)
/* catches neg, NaNs */
return 0;
else if (f.raw >= max.raw)
return max.raw;
else
return f.raw;
}
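
The integer-based clamp works because non-negative IEEE 754 floats sort in the same order as their bit patterns, while any pattern above `0x7f800000` is either a NaN or has the sign bit set. A standalone sketch of the same idea, using `memcpy` instead of the `float754` union; the constant 65408.0 assumes the usual RGB9E5 parameters (9 mantissa bits, maximum exponent 16), which are not shown in this hunk:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_RGB9E5 65408.0f   /* (511 / 512) * 2^16, assumed */

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t
float_bits(float f)
{
   uint32_t u;

   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Clamp x to [0, SKETCH_MAX_RGB9E5] and return the result as float bits. */
static uint32_t
clamp_range_bits(float x)
{
   const uint32_t bits = float_bits(x);
   const uint32_t max_bits = float_bits(SKETCH_MAX_RGB9E5);

   if (bits > 0x7f800000u)
      return 0;          /* sign bit set (including -0.0), or NaN */
   if (bits >= max_bits)
      return max_bits;   /* too large, including +Inf */
   return bits;
}
```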
static inline unsigned float3_to_rgb9e5(const float rgb[3])
{
rgb9e5 retval;
float maxrgb;
int rm, gm, bm;
float rc, gc, bc;
int exp_shared, maxm;
double denom;
int rm, gm, bm, exp_shared;
float754 revdenom = {0};
float754 rc, bc, gc, maxrgb;
rc = rgb9e5_ClampRange(rgb[0]);
gc = rgb9e5_ClampRange(rgb[1]);
bc = rgb9e5_ClampRange(rgb[2]);
rc.raw = rgb9e5_ClampRange(rgb[0]);
gc.raw = rgb9e5_ClampRange(rgb[1]);
bc.raw = rgb9e5_ClampRange(rgb[2]);
maxrgb.raw = MAX3(rc.raw, gc.raw, bc.raw);
maxrgb = MAX3(rc, gc, bc);
exp_shared = MAX2(-RGB9E5_EXP_BIAS-1, rgb9e5_FloorLog2(maxrgb)) + 1 + RGB9E5_EXP_BIAS;
/*
* Compared to what the spec suggests, instead of conditionally adjusting
* the exponent after the fact do it here by doing the equivalent of +0.5 -
* the int add will spill over into the exponent in this case.
*/
maxrgb.raw += maxrgb.raw & (1 << (23-9));
exp_shared = MAX2((maxrgb.raw >> 23), -RGB9E5_EXP_BIAS - 1 + 127) +
1 + RGB9E5_EXP_BIAS - 127;
revdenom.field.biasedexponent = 127 - (exp_shared - RGB9E5_EXP_BIAS -
RGB9E5_MANTISSA_BITS) + 1;
assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP);
assert(exp_shared >= 0);
/* This exp2 function could be replaced by a table. */
denom = exp2(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS);
maxm = (int) floor(maxrgb / denom + 0.5);
if (maxm == MAX_RGB9E5_MANTISSA+1) {
denom *= 2;
exp_shared += 1;
assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP);
} else {
assert(maxm <= MAX_RGB9E5_MANTISSA);
}
rm = (int) floor(rc / denom + 0.5);
gm = (int) floor(gc / denom + 0.5);
bm = (int) floor(bc / denom + 0.5);
/*
* The spec uses strict round-up behavior (d3d10 disagrees, but in any case
* must match what is done above for figuring out exponent).
* We avoid the doubles ((int) rc * revdenom + 0.5) by doing the rounding
* ourselves (revdenom was adjusted by +1, above).
*/
rm = (int) (rc.value * revdenom.value);
gm = (int) (gc.value * revdenom.value);
bm = (int) (bc.value * revdenom.value);
rm = (rm & 1) + (rm >> 1);
gm = (gm & 1) + (gm >> 1);
bm = (bm & 1) + (bm >> 1);
assert(rm <= MAX_RGB9E5_MANTISSA);
assert(gm <= MAX_RGB9E5_MANTISSA);
@@ -151,15 +147,15 @@ static inline void rgb9e5_to_float3(unsigned rgb, float retval[3])
{
rgb9e5 v;
int exponent;
float scale;
float754 scale = {0};
v.raw = rgb;
exponent = v.field.biasedexponent - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS;
scale = exp2f(exponent);
scale.field.biasedexponent = exponent + 127;
retval[0] = v.field.r * scale;
retval[1] = v.field.g * scale;
retval[2] = v.field.b * scale;
retval[0] = v.field.r * scale.value;
retval[1] = v.field.g * scale.value;
retval[2] = v.field.b * scale.value;
}
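
Replacing `exp2f()` with a hand-built float works because a float with a zero mantissa and biased exponent `e + 127` equals 2^e. A standalone decode sketch under assumed parameters (exponent bias 15, 9 mantissa bits, red in the low bits and the exponent in the top 5 bits, as in EXT_texture_shared_exponent); `rgb9e5_decode` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_EXP_BIAS      15   /* assumed */
#define SKETCH_MANTISSA_BITS 9    /* assumed */

static void
rgb9e5_decode(uint32_t packed, float out[3])
{
   /* Ranges from -24 to 7, so exponent + 127 is always a normal exponent. */
   const int exponent =
      (int)(packed >> 27) - SKETCH_EXP_BIAS - SKETCH_MANTISSA_BITS;
   /* Zero mantissa plus biased exponent gives exactly 2^exponent. */
   const uint32_t scale_bits = (uint32_t)(exponent + 127) << 23;
   float scale;

   memcpy(&scale, &scale_bits, sizeof(scale));
   out[0] = (float)(packed & 0x1ff) * scale;
   out[1] = (float)((packed >> 9) & 0x1ff) * scale;
   out[2] = (float)((packed >> 18) & 0x1ff) * scale;
}
```

For example, a red mantissa of 256 with a shared exponent field of 16 decodes to 256 * 2^-8 = 1.0.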
#endif

@@ -88,3 +88,18 @@ void util_set_vertex_buffers_count(struct pipe_vertex_buffer *dst,
*dst_count = util_last_bit(enabled_buffers);
}
void
util_set_index_buffer(struct pipe_index_buffer *dst,
const struct pipe_index_buffer *src)
{
if (src) {
pipe_resource_reference(&dst->buffer, src->buffer);
memcpy(dst, src, sizeof(*dst));
}
else {
pipe_resource_reference(&dst->buffer, NULL);
memset(dst, 0, sizeof(*dst));
}
}

@@ -44,6 +44,9 @@ void util_set_vertex_buffers_count(struct pipe_vertex_buffer *dst,
const struct pipe_vertex_buffer *src,
unsigned start_slot, unsigned count);
void util_set_index_buffer(struct pipe_index_buffer *dst,
const struct pipe_index_buffer *src);
#ifdef __cplusplus
}
#endif

@@ -389,6 +389,26 @@ unsigned ffs( unsigned u )
#define ffs __builtin_ffs
#endif
#ifdef HAVE___BUILTIN_FFSLL
#define ffsll __builtin_ffsll
#else
static inline int
ffsll(long long int val)
{
int bit;
bit = ffs((unsigned) (val & 0xffffffff));
if (bit != 0)
return bit;
bit = ffs((unsigned) (val >> 32));
if (bit != 0)
return 32 + bit;
return 0;
}
#endif
#endif /* FFS_DEFINED */
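
The fallback builds a 64-bit find-first-set out of two 32-bit scans: the low word is checked first, and the high word's result is offset by 32. A portable sketch that does not depend on a system `ffs()`; the names `ffs32` and `ffs64` are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the lowest set bit, or 0 if v is zero. */
static int
ffs32(uint32_t v)
{
   int bit = 1;

   if (v == 0)
      return 0;
   while ((v & 1) == 0) {
      v >>= 1;
      bit++;
   }
   return bit;
}

/* Same contract for 64-bit values, composed from two 32-bit scans. */
static int
ffs64(uint64_t v)
{
   int bit = ffs32((uint32_t)(v & 0xffffffffu));

   if (bit != 0)
      return bit;
   bit = ffs32((uint32_t)(v >> 32));
   if (bit != 0)
      return 32 + bit;
   return 0;
}
```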
/**
@@ -483,6 +503,26 @@ u_bit_scan64(uint64_t *mask)
}
#endif
/* For looping over a bitmask when you want to loop over consecutive bits
* manually, for example:
*
* while (mask) {
* int start, count, i;
*
* u_bit_scan_consecutive_range(&mask, &start, &count);
*
* for (i = 0; i < count; i++)
* ... process element (start+i)
* }
*/
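static inline void
u_bit_scan_consecutive_range(unsigned *mask, int *start, int *count)
{
if (*mask == 0xffffffff) { /* avoid the undefined 32-bit shift below */
*start = 0;
*count = 32;
*mask = 0;
return;
}
*start = ffs(*mask) - 1;
*count = ffs(~(*mask >> *start)) - 1;
*mask &= ~(((1u << *count) - 1) << *start);
}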
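
A standalone sketch of the same scan follows. It handles a fully set mask separately, because shifting a 32-bit value by 32 bits is undefined in C, and it assumes the caller never passes a zero mask (the `while (mask)` loop in the comment guarantees that). `lowest_set_bit` and `scan_consecutive_range` are illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* 0-based index of the lowest set bit; v must be non-zero. */
static int
lowest_set_bit(uint32_t v)
{
   int bit = 0;

   while ((v & 1) == 0) {
      v >>= 1;
      bit++;
   }
   return bit;
}

/* Extract the lowest run of consecutive 1 bits from *mask and clear it. */
static void
scan_consecutive_range(uint32_t *mask, int *start, int *count)
{
   if (*mask == 0xffffffffu) {
      *start = 0;
      *count = 32;
      *mask = 0;
      return;
   }
   *start = lowest_set_bit(*mask);
   /* The lowest zero bit of the shifted mask marks the end of the run. */
   *count = lowest_set_bit(~(*mask >> *start));
   *mask &= ~(((1u << *count) - 1u) << *start);
}
```

For the mask `0xf0f0` this yields the ranges (start 4, count 4) and (start 12, count 4), after which the mask is zero.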
/**
* Return float bits.
*/

@@ -0,0 +1,267 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "u_inlines.h"
#include "u_memory.h"
#include "u_prim_restart.h"
/**
* Translate an index buffer for primitive restart.
* Create a new index buffer which is a copy of the original index buffer
* except that instances of 'restart_index' are converted to 0xffff or
* 0xffffffff.
* Also, index buffers using 1-byte indexes are converted to 2-byte indexes.
*/
enum pipe_error
util_translate_prim_restart_ib(struct pipe_context *context,
struct pipe_index_buffer *src_buffer,
struct pipe_resource **dst_buffer,
unsigned num_indexes,
unsigned restart_index)
{
struct pipe_screen *screen = context->screen;
struct pipe_transfer *src_transfer = NULL, *dst_transfer = NULL;
void *src_map = NULL, *dst_map = NULL;
const unsigned src_index_size = src_buffer->index_size;
unsigned dst_index_size;
/* 1-byte indexes are converted to 2-byte indexes, 4-byte stays 4-byte */
dst_index_size = MAX2(2, src_buffer->index_size);
assert(dst_index_size == 2 || dst_index_size == 4);
/* no user buffers for now */
assert(src_buffer->user_buffer == NULL);
/* Create new index buffer */
*dst_buffer = pipe_buffer_create(screen, PIPE_BIND_INDEX_BUFFER,
PIPE_USAGE_STREAM,
num_indexes * dst_index_size);
if (!*dst_buffer)
goto error;
/* Map new / dest index buffer */
dst_map = pipe_buffer_map(context, *dst_buffer,
PIPE_TRANSFER_WRITE, &dst_transfer);
if (!dst_map)
goto error;
/* Map original / src index buffer */
src_map = pipe_buffer_map_range(context, src_buffer->buffer,
src_buffer->offset,
num_indexes * src_index_size,
PIPE_TRANSFER_READ,
&src_transfer);
if (!src_map)
goto error;
if (src_index_size == 1 && dst_index_size == 2) {
uint8_t *src = (uint8_t *) src_map;
uint16_t *dst = (uint16_t *) dst_map;
unsigned i;
for (i = 0; i < num_indexes; i++) {
dst[i] = (src[i] == restart_index) ? 0xffff : src[i];
}
}
else if (src_index_size == 2 && dst_index_size == 2) {
uint16_t *src = (uint16_t *) src_map;
uint16_t *dst = (uint16_t *) dst_map;
unsigned i;
for (i = 0; i < num_indexes; i++) {
dst[i] = (src[i] == restart_index) ? 0xffff : src[i];
}
}
else {
uint32_t *src = (uint32_t *) src_map;
uint32_t *dst = (uint32_t *) dst_map;
unsigned i;
assert(src_index_size == 4);
assert(dst_index_size == 4);
for (i = 0; i < num_indexes; i++) {
dst[i] = (src[i] == restart_index) ? 0xffffffff : src[i];
}
}
pipe_buffer_unmap(context, src_transfer);
pipe_buffer_unmap(context, dst_transfer);
return PIPE_OK;
error:
if (src_transfer)
pipe_buffer_unmap(context, src_transfer);
if (dst_transfer)
pipe_buffer_unmap(context, dst_transfer);
if (*dst_buffer)
screen->resource_destroy(screen, *dst_buffer);
return PIPE_ERROR_OUT_OF_MEMORY;
}
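
Stripped of the buffer creation and mapping, the translation is a per-element copy that widens the index and swaps the application's restart index for the all-ones value. A sketch of the 1-byte to 2-byte case on plain arrays; `translate_restart_u8_to_u16` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen 8-bit indexes to 16 bits, mapping restart_index to 0xffff. */
static void
translate_restart_u8_to_u16(const uint8_t *src, uint16_t *dst,
                            size_t num_indexes, unsigned restart_index)
{
   size_t i;

   for (i = 0; i < num_indexes; i++)
      dst[i] = (src[i] == restart_index) ? 0xffff : src[i];
}
```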
/** Helper structs for util_draw_vbo_without_prim_restart() */
struct range {
unsigned start, count;
};
struct range_info {
struct range *ranges;
unsigned count, max;
};
/**
* Helper function for util_draw_vbo_without_prim_restart()
* \return true for success, false if out of memory
*/
static boolean
add_range(struct range_info *info, unsigned start, unsigned count)
{
if (info->max == 0) {
info->max = 10;
info->ranges = MALLOC(info->max * sizeof(struct range));
if (!info->ranges) {
return FALSE;
}
}
else if (info->count == info->max) {
/* grow the ranges[] array */
info->ranges = REALLOC(info->ranges,
info->max * sizeof(struct range),
2 * info->max * sizeof(struct range));
if (!info->ranges) {
return FALSE;
}
info->max *= 2;
}
/* save the range */
info->ranges[info->count].start = start;
info->ranges[info->count].count = count;
info->count++;
return TRUE;
}
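
`add_range()` is a doubling array. One variation worth considering, sketched here with the standard `realloc()` rather than Mesa's `MALLOC`/`REALLOC` macros, keeps the old pointer until the reallocation has succeeded, so a failure does not lose the existing allocation. The type and function names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct range { unsigned start, count; };
struct range_list { struct range *items; size_t count, max; };

/* Append a range, doubling the capacity when full. Returns false on OOM. */
static bool
range_list_add(struct range_list *list, unsigned start, unsigned count)
{
   if (list->count == list->max) {
      const size_t new_max = list->max ? 2 * list->max : 10;
      struct range *grown = realloc(list->items, new_max * sizeof(*grown));

      if (!grown)
         return false;   /* list->items is still valid and can be freed */
      list->items = grown;
      list->max = new_max;
   }
   list->items[list->count].start = start;
   list->items[list->count].count = count;
   list->count++;
   return true;
}
```

The caller starts from a zero-initialized `struct range_list` and releases `items` with `free()` when done.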
/**
* Implement primitive restart by breaking an indexed primitive into
* pieces which do not contain restart indexes. Each piece is then
* drawn by calling pipe_context::draw_vbo().
* \return PIPE_OK if no error, an error code otherwise.
*/
enum pipe_error
util_draw_vbo_without_prim_restart(struct pipe_context *context,
const struct pipe_index_buffer *ib,
const struct pipe_draw_info *info)
{
const void *src_map;
struct range_info ranges = {0};
struct pipe_draw_info new_info;
struct pipe_transfer *src_transfer = NULL;
unsigned i, start, count;
assert(info->indexed);
assert(info->primitive_restart);
/* Get pointer to the index data */
if (ib->buffer) {
/* map the index buffer (only the range we need to scan) */
src_map = pipe_buffer_map_range(context, ib->buffer,
ib->offset + info->start * ib->index_size,
info->count * ib->index_size,
PIPE_TRANSFER_READ,
&src_transfer);
if (!src_map) {
return PIPE_ERROR_OUT_OF_MEMORY;
}
}
else {
if (!ib->user_buffer) {
debug_printf("User-space index buffer is null!\n");
return PIPE_ERROR_BAD_INPUT;
}
src_map = (const uint8_t *) ib->user_buffer
+ ib->offset
+ info->start * ib->index_size;
}
#define SCAN_INDEXES(TYPE) \
for (i = 0; i <= info->count; i++) { \
if (i == info->count || \
((const TYPE *) src_map)[i] == info->restart_index) { \
/* cut / restart */ \
if (count > 0) { \
if (!add_range(&ranges, info->start + start, count)) { \
if (src_transfer) \
pipe_buffer_unmap(context, src_transfer); \
return PIPE_ERROR_OUT_OF_MEMORY; \
} \
} \
start = i + 1; \
count = 0; \
} \
else { \
count++; \
} \
}
start = 0; /* relative to info->start; add_range() adds the base */
count = 0;
switch (ib->index_size) {
case 1:
SCAN_INDEXES(uint8_t);
break;
case 2:
SCAN_INDEXES(uint16_t);
break;
case 4:
SCAN_INDEXES(uint32_t);
break;
default:
assert(!"Bad index size");
return PIPE_ERROR_BAD_INPUT;
}
/* unmap index buffer */
if (src_transfer)
pipe_buffer_unmap(context, src_transfer);
/* draw ranges between the restart indexes */
new_info = *info;
new_info.primitive_restart = FALSE;
for (i = 0; i < ranges.count; i++) {
new_info.start = ranges.ranges[i].start;
new_info.count = ranges.ranges[i].count;
context->draw_vbo(context, &new_info);
}
FREE(ranges.ranges);
return PIPE_OK;
}
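
The splitting step can be tried on a plain array. In the sketch below the running `start` is kept relative to the first scanned index and the base offset is added only when a range is recorded, so the first range is not offset twice. The function name, the fixed-size output array and the 16-bit index type are assumptions made for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct index_range { unsigned start, count; };

/* Split indexes[first .. first + num - 1] at every restart_index.
 * Writes at most max_ranges entries and returns how many were written.
 */
static size_t
split_at_restarts(const uint16_t *indexes, unsigned first, unsigned num,
                  unsigned restart_index,
                  struct index_range *out, size_t max_ranges)
{
   size_t n = 0;
   unsigned i, start = 0, count = 0;

   for (i = 0; i <= num; i++) {
      if (i == num || indexes[first + i] == restart_index) {
         /* cut / restart: flush the run collected so far, if any */
         if (count > 0 && n < max_ranges) {
            out[n].start = first + start;
            out[n].count = count;
            n++;
         }
         start = i + 1;
         count = 0;
      } else {
         count++;
      }
   }
   return n;
}
```

Each returned range can then be drawn with primitive restart disabled, as the loop over `ranges.ranges[]` does.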

@@ -0,0 +1,62 @@
/*
* Copyright 2014 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef U_PRIM_RESTART_H
#define U_PRIM_RESTART_H
#include "pipe/p_defines.h"
#ifdef __cplusplus
extern "C" {
#endif
struct pipe_context;
struct pipe_draw_info;
struct pipe_index_buffer;
struct pipe_resource;
enum pipe_error
util_translate_prim_restart_ib(struct pipe_context *context,
struct pipe_index_buffer *src_buffer,
struct pipe_resource **dst_buffer,
unsigned num_indexes,
unsigned restart_index);
enum pipe_error
util_draw_vbo_without_prim_restart(struct pipe_context *context,
const struct pipe_index_buffer *ib,
const struct pipe_draw_info *info);
#ifdef __cplusplus
}
#endif
#endif


@@ -339,7 +339,7 @@ pstip_transform_prolog(struct tgsi_transform_context *ctx)
/* KILL_IF -texTemp; # if -texTemp < 0, kill fragment */
tgsi_transform_kill_inst(ctx,
TGSI_FILE_TEMPORARY, texTemp,
TGSI_SWIZZLE_W);
TGSI_SWIZZLE_W, TRUE);
}


@@ -42,6 +42,7 @@ struct u_rect {
};
/* Do two rectangles intersect?
* Note: empty rectangles are valid as inputs (and never intersect).
*/
static inline boolean
u_rect_test_intersection(const struct u_rect *a,
@@ -50,7 +51,11 @@ u_rect_test_intersection(const struct u_rect *a,
return (!(a->x1 < b->x0 ||
b->x1 < a->x0 ||
a->y1 < b->y0 ||
b->y1 < a->y0));
b->y1 < a->y0 ||
a->x1 < a->x0 ||
a->y1 < a->y0 ||
b->x1 < b->x0 ||
b->y1 < b->y0));
}
/* Find the intersection of two rectangles known to intersect.
@@ -82,7 +87,12 @@ u_rect_possible_intersection(const struct u_rect *a,
u_rect_find_intersection(a,b);
}
else {
b->x0 = b->x1 = b->y0 = b->y1 = 0;
/*
* Note the u_rect_xx tests deal with inclusive coordinates
* hence all-zero would not be an empty box.
*/
b->x0 = b->y0 = 0;
b->x1 = b->y1 = -1;
}
}
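
The effect of inclusive coordinates can be checked with a small standalone sketch. struct rect, rect_is_empty() and rect_intersects() are hypothetical names that mirror the logic of the hunk above; they are not the u_rect API itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of struct u_rect: inclusive corner coordinates. */
struct rect {
   int x0, x1;
   int y0, y1;
};

/* With inclusive corners a rectangle is empty only if a max is below its min. */
static bool
rect_is_empty(const struct rect *r)
{
   return r->x1 < r->x0 || r->y1 < r->y0;
}

/* Two rectangles intersect if neither is empty and they overlap on both axes. */
static bool
rect_intersects(const struct rect *a, const struct rect *b)
{
   return !(a->x1 < b->x0 || b->x1 < a->x0 ||
            a->y1 < b->y0 || b->y1 < a->y0 ||
            rect_is_empty(a) || rect_is_empty(b));
}
```

Under these assumptions an all-zero rectangle still covers the single pixel (0, 0), which is why the hunk above marks the empty result with x1 = y1 = -1 instead.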


@@ -831,3 +831,54 @@ util_make_fs_msaa_resolve_bilinear(struct pipe_context *pipe,
return ureg_create_shader_and_destroy(ureg, pipe);
}
void *
util_make_geometry_passthrough_shader(struct pipe_context *pipe,
uint num_attribs,
const ubyte *semantic_names,
const ubyte *semantic_indexes)
{
static const unsigned zero[4] = {0, 0, 0, 0};
struct ureg_program *ureg;
struct ureg_dst dst[PIPE_MAX_SHADER_OUTPUTS];
struct ureg_src src[PIPE_MAX_SHADER_INPUTS];
struct ureg_src imm;
unsigned i;
ureg = ureg_create(TGSI_PROCESSOR_GEOMETRY);
if (ureg == NULL)
return NULL;
ureg_property(ureg, TGSI_PROPERTY_GS_INPUT_PRIM, PIPE_PRIM_POINTS);
ureg_property(ureg, TGSI_PROPERTY_GS_OUTPUT_PRIM, PIPE_PRIM_POINTS);
ureg_property(ureg, TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES, 1);
ureg_property(ureg, TGSI_PROPERTY_GS_INVOCATIONS, 1);
imm = ureg_DECL_immediate_uint(ureg, zero, 4);
/**
* Loop over all the attribs and declare the corresponding
* declarations in the geometry shader
*/
for (i = 0; i < num_attribs; i++) {
src[i] = ureg_DECL_input(ureg, semantic_names[i],
semantic_indexes[i], 0, 1);
src[i] = ureg_src_dimension(src[i], 0);
dst[i] = ureg_DECL_output(ureg, semantic_names[i], semantic_indexes[i]);
}
/* MOV dst[i] src[i] */
for (i = 0; i < num_attribs; i++) {
ureg_MOV(ureg, dst[i], src[i]);
}
/* EMIT IMM[0] */
ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
/* END */
ureg_END(ureg);
return ureg_create_shader_and_destroy(ureg, pipe);
}


@@ -146,6 +146,12 @@ util_make_fs_msaa_resolve_bilinear(struct pipe_context *pipe,
unsigned tgsi_tex, unsigned nr_samples,
enum tgsi_return_type stype);
extern void *
util_make_geometry_passthrough_shader(struct pipe_context *pipe,
uint num_attribs,
const ubyte *semantic_names,
const ubyte *semantic_indexes);
#ifdef __cplusplus
}
#endif


@@ -199,6 +199,8 @@ util_memmove(void *dest, const void *src, size_t n)
}
#define util_strcasecmp stricmp
#else
#define util_vsnprintf vsnprintf
@@ -211,6 +213,7 @@ util_memmove(void *dest, const void *src, size_t n)
#define util_strncat strncat
#define util_strstr strstr
#define util_memmove memmove
#define util_strcasecmp strcasecmp
#endif


@@ -457,7 +457,7 @@ null_constant_buffer(struct pipe_context *ctx)
void
util_run_tests(struct pipe_screen *screen)
{
struct pipe_context *ctx = screen->context_create(screen, NULL);
struct pipe_context *ctx = screen->context_create(screen, NULL, 0);
tgsi_vs_window_space_position(ctx);
null_sampler_view(ctx, TGSI_TEXTURE_2D);


@@ -129,9 +129,9 @@ void u_upload_destroy( struct u_upload_mgr *upload )
}
static enum pipe_error
u_upload_alloc_buffer( struct u_upload_mgr *upload,
unsigned min_size )
static void
u_upload_alloc_buffer(struct u_upload_mgr *upload,
unsigned min_size)
{
struct pipe_screen *screen = upload->pipe->screen;
struct pipe_resource buffer;
@@ -161,9 +161,8 @@ u_upload_alloc_buffer( struct u_upload_mgr *upload,
}
upload->buffer = screen->resource_create(screen, &buffer);
if (upload->buffer == NULL) {
return PIPE_ERROR_OUT_OF_MEMORY;
}
if (upload->buffer == NULL)
return;
/* Map the new buffer. */
upload->map = pipe_buffer_map_range(upload->pipe, upload->buffer,
@@ -172,52 +171,54 @@ u_upload_alloc_buffer( struct u_upload_mgr *upload,
if (upload->map == NULL) {
upload->transfer = NULL;
pipe_resource_reference(&upload->buffer, NULL);
return PIPE_ERROR_OUT_OF_MEMORY;
return;
}
upload->offset = 0;
return PIPE_OK;
}
enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
unsigned *out_offset,
struct pipe_resource **outbuf,
void **ptr )
void
u_upload_alloc(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
unsigned *out_offset,
struct pipe_resource **outbuf,
void **ptr)
{
unsigned alloc_size = align( size, upload->alignment );
unsigned alloc_size = align(size, upload->alignment);
unsigned alloc_offset = align(min_out_offset, upload->alignment);
unsigned buffer_size = upload->buffer ? upload->buffer->width0 : 0;
unsigned offset;
/* Init these return values here in case we fail below to make
* sure the caller doesn't get garbage values.
*/
*out_offset = ~0;
pipe_resource_reference(outbuf, NULL);
*ptr = NULL;
/* Make sure we have enough space in the upload buffer
* for the sub-allocation. */
if (!upload->buffer ||
MAX2(upload->offset, alloc_offset) + alloc_size > upload->buffer->width0) {
enum pipe_error ret = u_upload_alloc_buffer(upload,
alloc_offset + alloc_size);
if (ret != PIPE_OK)
return ret;
if (unlikely(MAX2(upload->offset, alloc_offset) + alloc_size > buffer_size)) {
u_upload_alloc_buffer(upload, alloc_offset + alloc_size);
if (unlikely(!upload->buffer)) {
*out_offset = ~0;
pipe_resource_reference(outbuf, NULL);
*ptr = NULL;
return;
}
buffer_size = upload->buffer->width0;
}
offset = MAX2(upload->offset, alloc_offset);
if (!upload->map) {
if (unlikely(!upload->map)) {
upload->map = pipe_buffer_map_range(upload->pipe, upload->buffer,
offset,
upload->buffer->width0 - offset,
buffer_size - offset,
upload->map_flags,
&upload->transfer);
if (!upload->map) {
if (unlikely(!upload->map)) {
upload->transfer = NULL;
return PIPE_ERROR_OUT_OF_MEMORY;
*out_offset = ~0;
pipe_resource_reference(outbuf, NULL);
*ptr = NULL;
return;
}
upload->map -= offset;
@@ -229,46 +230,37 @@ enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
/* Emit the return values: */
*ptr = upload->map + offset;
pipe_resource_reference( outbuf, upload->buffer );
pipe_resource_reference(outbuf, upload->buffer);
*out_offset = offset;
upload->offset = offset + alloc_size;
return PIPE_OK;
}
enum pipe_error u_upload_data( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
const void *data,
unsigned *out_offset,
struct pipe_resource **outbuf)
void u_upload_data(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
const void *data,
unsigned *out_offset,
struct pipe_resource **outbuf)
{
uint8_t *ptr;
enum pipe_error ret = u_upload_alloc(upload, min_out_offset, size,
out_offset, outbuf,
(void**)&ptr);
if (ret != PIPE_OK)
return ret;
memcpy(ptr, data, size);
return PIPE_OK;
u_upload_alloc(upload, min_out_offset, size,
out_offset, outbuf,
(void**)&ptr);
if (ptr)
memcpy(ptr, data, size);
}
/* As above, but upload the full contents of a buffer. Useful for
* uploading user buffers, avoids generating an explosion of GPU
* buffers if you have an app that does lots of small vertex buffer
* renders or DrawElements calls.
*/
enum pipe_error u_upload_buffer( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned offset,
unsigned size,
struct pipe_resource *inbuf,
unsigned *out_offset,
struct pipe_resource **outbuf)
/* XXX: Remove. It's basically a CPU fallback of resource_copy_region. */
void u_upload_buffer(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned offset,
unsigned size,
struct pipe_resource *inbuf,
unsigned *out_offset,
struct pipe_resource **outbuf)
{
enum pipe_error ret = PIPE_OK;
struct pipe_transfer *transfer = NULL;
const char *map = NULL;
@@ -279,20 +271,13 @@ enum pipe_error u_upload_buffer( struct u_upload_mgr *upload,
&transfer);
if (map == NULL) {
return PIPE_ERROR_OUT_OF_MEMORY;
pipe_resource_reference(outbuf, NULL);
return;
}
if (0)
debug_printf("upload ptr %p ofs %d sz %d\n", map, offset, size);
ret = u_upload_data( upload,
min_out_offset,
size,
map,
out_offset,
outbuf);
u_upload_data(upload, min_out_offset, size, map, out_offset, outbuf);
pipe_buffer_unmap( upload->pipe, transfer );
return ret;
}
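
The calling convention introduced here (void return, failure reported through NULL output values) can be illustrated with a CPU-only sketch. Everything below is hypothetical: struct upload_sketch stands in for struct u_upload_mgr, plain memory replaces the pipe_resource, and the sketch fails when the storage is full, whereas u_upload_alloc first tries to allocate a new buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, CPU-only stand-in for struct u_upload_mgr. */
struct upload_sketch {
   uint8_t *storage;     /* backing memory, NULL if none */
   unsigned size;        /* bytes in storage */
   unsigned offset;      /* first unused byte */
   unsigned alignment;   /* must be a power of two */
};

static unsigned
align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Reports failure by setting *ptr to NULL and *out_offset to ~0. */
static void
upload_sketch_alloc(struct upload_sketch *up, unsigned min_out_offset,
                    unsigned size, unsigned *out_offset, void **ptr)
{
   unsigned alloc_size = align_up(size, up->alignment);
   unsigned alloc_offset = align_up(min_out_offset, up->alignment);
   unsigned offset = up->offset > alloc_offset ? up->offset : alloc_offset;

   if (!up->storage || offset + alloc_size > up->size) {
      *out_offset = ~0u;
      *ptr = NULL;
      return;
   }

   *ptr = up->storage + offset;
   *out_offset = offset;
   up->offset = offset + alloc_size;
}

/* Same as above, but also copies `data`; the copy is skipped on failure. */
static void
upload_sketch_data(struct upload_sketch *up, unsigned min_out_offset,
                   unsigned size, const void *data,
                   unsigned *out_offset, void **ptr)
{
   upload_sketch_alloc(up, min_out_offset, size, out_offset, ptr);
   if (*ptr)
      memcpy(*ptr, data, size);
}
```

Callers test the returned pointer (or, in the real code, the returned buffer) instead of an error code, as the u_vbuf changes in this comparison do.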


@@ -78,12 +78,12 @@ void u_upload_unmap( struct u_upload_mgr *upload );
* \param outbuf Pointer to where the upload buffer will be returned.
* \param ptr Pointer to the allocated memory that is returned.
*/
enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
unsigned *out_offset,
struct pipe_resource **outbuf,
void **ptr );
void u_upload_alloc(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
unsigned *out_offset,
struct pipe_resource **outbuf,
void **ptr);
/**
@@ -92,12 +92,12 @@ enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
* Same as u_upload_alloc, but in addition to that, it copies "data"
* to the pointer returned from u_upload_alloc.
*/
enum pipe_error u_upload_data( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
const void *data,
unsigned *out_offset,
struct pipe_resource **outbuf);
void u_upload_data(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned size,
const void *data,
unsigned *out_offset,
struct pipe_resource **outbuf);
/**
@@ -106,13 +106,13 @@ enum pipe_error u_upload_data( struct u_upload_mgr *upload,
* Same as u_upload_data, except that the input data comes from a buffer
* instead of a user pointer.
*/
enum pipe_error u_upload_buffer( struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned offset,
unsigned size,
struct pipe_resource *inbuf,
unsigned *out_offset,
struct pipe_resource **outbuf);
void u_upload_buffer(struct u_upload_mgr *upload,
unsigned min_out_offset,
unsigned offset,
unsigned size,
struct pipe_resource *inbuf,
unsigned *out_offset,
struct pipe_resource **outbuf);


@@ -406,7 +406,6 @@ u_vbuf_translate_buffers(struct u_vbuf *mgr, struct translate_key *key,
struct pipe_resource *out_buffer = NULL;
uint8_t *out_map;
unsigned out_offset, mask;
enum pipe_error err;
/* Get a translate object. */
tr = translate_cache_find(mgr->translate_cache, key);
@@ -454,12 +453,12 @@ u_vbuf_translate_buffers(struct u_vbuf *mgr, struct translate_key *key,
assert((ib->buffer || ib->user_buffer) && ib->index_size);
/* Create and map the output buffer. */
err = u_upload_alloc(mgr->uploader, 0,
key->output_stride * num_indices,
&out_offset, &out_buffer,
(void**)&out_map);
if (err != PIPE_OK)
return err;
u_upload_alloc(mgr->uploader, 0,
key->output_stride * num_indices,
&out_offset, &out_buffer,
(void**)&out_map);
if (!out_buffer)
return PIPE_ERROR_OUT_OF_MEMORY;
if (ib->user_buffer) {
map = (uint8_t*)ib->user_buffer + offset;
@@ -486,13 +485,13 @@ u_vbuf_translate_buffers(struct u_vbuf *mgr, struct translate_key *key,
}
} else {
/* Create and map the output buffer. */
err = u_upload_alloc(mgr->uploader,
key->output_stride * start_vertex,
key->output_stride * num_vertices,
&out_offset, &out_buffer,
(void**)&out_map);
if (err != PIPE_OK)
return err;
u_upload_alloc(mgr->uploader,
key->output_stride * start_vertex,
key->output_stride * num_vertices,
&out_offset, &out_buffer,
(void**)&out_map);
if (!out_buffer)
return PIPE_ERROR_OUT_OF_MEMORY;
out_offset -= key->output_stride * start_vertex;
@@ -977,7 +976,6 @@ u_vbuf_upload_buffers(struct u_vbuf *mgr,
unsigned start, end;
struct pipe_vertex_buffer *real_vb;
const uint8_t *ptr;
enum pipe_error err;
i = u_bit_scan(&buffer_mask);
@@ -988,10 +986,10 @@ u_vbuf_upload_buffers(struct u_vbuf *mgr,
real_vb = &mgr->real_vertex_buffer[i];
ptr = mgr->vertex_buffer[i].user_buffer;
err = u_upload_data(mgr->uploader, start, end - start, ptr + start,
&real_vb->buffer_offset, &real_vb->buffer);
if (err != PIPE_OK)
return err;
u_upload_data(mgr->uploader, start, end - start, ptr + start,
&real_vb->buffer_offset, &real_vb->buffer);
if (!real_vb->buffer)
return PIPE_ERROR_OUT_OF_MEMORY;
real_vb->buffer_offset -= start;
}


@@ -1120,7 +1120,7 @@ vl_create_mpeg12_decoder(struct pipe_context *context,
dec->base = *templat;
dec->base.context = context;
dec->context = context->screen->context_create(context->screen, NULL);
dec->context = context->screen->context_create(context->screen, NULL, 0);
dec->base.destroy = vl_mpeg12_destroy;
dec->base.begin_frame = vl_mpeg12_begin_frame;


@@ -267,6 +267,16 @@ The integer capabilities:
* ``PIPE_CAP_DEPTH_BOUNDS_TEST``: Whether bounds_test, bounds_min, and
bounds_max states of pipe_depth_stencil_alpha_state behave according
to the GL_EXT_depth_bounds_test specification.
* ``PIPE_CAP_TGSI_TXQS``: Whether the ``TXQS`` opcode is supported.
* ``PIPE_CAP_FORCE_PERSAMPLE_INTERP``: Whether the driver can force per-sample
interpolation for all fragment shader inputs when
pipe_rasterizer_state::force_persample_interp is set. This is only used
by GL3-level sample shading (ARB_sample_shading). GL4-level sample shading
(ARB_gpu_shader5) doesn't use this. While GL3 hardware has a state for it,
GL4 hardware will likely need to emulate it with a shader variant, or by
selecting the interpolation weights with a conditional assignment
in the shader.
.. _pipe_capf:


@@ -960,7 +960,6 @@ XXX doesn't look like most of the opcodes really belong here.
For components which don't return a resource dimension, their value
is undefined.
.. math::
lod = src0.x
@@ -973,6 +972,17 @@ XXX doesn't look like most of the opcodes really belong here.
dst.w = texture\_levels(unit)
.. opcode:: TXQS - Texture Samples Query
This retrieves the number of samples in the texture, and stores it
into the x component. The other components are undefined.
.. math::
dst.x = texture\_samples(unit)
.. opcode:: TG4 - Texture Gather
As per ARB_texture_gather, gathers the four texels to be used in a bi-linear


@@ -0,0 +1,9 @@
include Makefile.sources
include $(top_srcdir)/src/gallium/Automake.inc
AM_CFLAGS = \
$(GALLIUM_DRIVER_CFLAGS)
noinst_LTLIBRARIES = libddebug.la
libddebug_la_SOURCES = $(C_SOURCES)


@@ -0,0 +1,7 @@
C_SOURCES := \
dd_context.c \
dd_draw.c \
dd_pipe.h \
dd_public.h \
dd_screen.c \
dd_util.h


@@ -0,0 +1,771 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "tgsi/tgsi_parse.h"
#include "util/u_memory.h"
static void
safe_memcpy(void *dst, const void *src, size_t size)
{
if (src)
memcpy(dst, src, size);
else
memset(dst, 0, size);
}
/********************************************************************
* queries
*/
static struct dd_query *
dd_query(struct pipe_query *query)
{
return (struct dd_query *)query;
}
static struct pipe_query *
dd_query_unwrap(struct pipe_query *query)
{
if (query) {
return dd_query(query)->query;
} else {
return NULL;
}
}
static struct pipe_query *
dd_context_create_query(struct pipe_context *_pipe, unsigned query_type,
unsigned index)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_query *query;
query = pipe->create_query(pipe, query_type, index);
/* Wrap query object. */
if (query) {
struct dd_query *dd_query = CALLOC_STRUCT(dd_query);
if (dd_query) {
dd_query->type = query_type;
dd_query->query = query;
query = (struct pipe_query *)dd_query;
} else {
pipe->destroy_query(pipe, query);
query = NULL;
}
}
return query;
}
static void
dd_context_destroy_query(struct pipe_context *_pipe,
struct pipe_query *query)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->destroy_query(pipe, dd_query_unwrap(query));
FREE(query);
}
static boolean
dd_context_begin_query(struct pipe_context *_pipe, struct pipe_query *query)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
return pipe->begin_query(pipe, dd_query_unwrap(query));
}
static void
dd_context_end_query(struct pipe_context *_pipe, struct pipe_query *query)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->end_query(pipe, dd_query_unwrap(query));
}
static boolean
dd_context_get_query_result(struct pipe_context *_pipe,
struct pipe_query *query, boolean wait,
union pipe_query_result *result)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->get_query_result(pipe, dd_query_unwrap(query), wait, result);
}
static void
dd_context_render_condition(struct pipe_context *_pipe,
struct pipe_query *query, boolean condition,
uint mode)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->render_condition(pipe, dd_query_unwrap(query), condition, mode);
dctx->render_cond.query = dd_query(query);
dctx->render_cond.condition = condition;
dctx->render_cond.mode = mode;
}
/********************************************************************
* constant (immutable) non-shader states
*/
#define DD_CSO_CREATE(name, shortname) \
static void * \
dd_context_create_##name##_state(struct pipe_context *_pipe, \
const struct pipe_##name##_state *state) \
{ \
struct pipe_context *pipe = dd_context(_pipe)->pipe; \
struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
\
if (!hstate) \
return NULL; \
hstate->cso = pipe->create_##name##_state(pipe, state); \
hstate->state.shortname = *state; \
return hstate; \
}
#define DD_CSO_BIND(name, shortname) \
static void \
dd_context_bind_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
dctx->shortname = hstate; \
pipe->bind_##name##_state(pipe, hstate ? hstate->cso : NULL); \
}
#define DD_CSO_DELETE(name) \
static void \
dd_context_delete_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
pipe->delete_##name##_state(pipe, hstate->cso); \
FREE(hstate); \
}
#define DD_CSO_WHOLE(name, shortname) \
DD_CSO_CREATE(name, shortname) \
DD_CSO_BIND(name, shortname) \
DD_CSO_DELETE(name)
DD_CSO_WHOLE(blend, blend)
DD_CSO_WHOLE(rasterizer, rs)
DD_CSO_WHOLE(depth_stencil_alpha, dsa)
DD_CSO_CREATE(sampler, sampler)
DD_CSO_DELETE(sampler)
static void
dd_context_bind_sampler_states(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned count, void **states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
/* states may be NULL (see below), so don't pass it to plain memcpy */
safe_memcpy(&dctx->sampler_states[shader][start], states,
sizeof(void*) * count);
if (states) {
void *samp[PIPE_MAX_SAMPLERS];
unsigned i;
for (i = 0; i < count; i++) {
struct dd_state *s = states[i];
samp[i] = s ? s->cso : NULL;
}
pipe->bind_sampler_states(pipe, shader, start, count, samp);
}
else
pipe->bind_sampler_states(pipe, shader, start, count, NULL);
}
static void *
dd_context_create_vertex_elements_state(struct pipe_context *_pipe,
unsigned num_elems,
const struct pipe_vertex_element *elems)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct dd_state *hstate = CALLOC_STRUCT(dd_state);
if (!hstate)
return NULL;
hstate->cso = pipe->create_vertex_elements_state(pipe, num_elems, elems);
memcpy(hstate->state.velems.velems, elems, sizeof(elems[0]) * num_elems);
hstate->state.velems.count = num_elems;
return hstate;
}
DD_CSO_BIND(vertex_elements, velems)
DD_CSO_DELETE(vertex_elements)
/********************************************************************
* shaders
*/
#define DD_SHADER(NAME, name) \
static void * \
dd_context_create_##name##_state(struct pipe_context *_pipe, \
const struct pipe_shader_state *state) \
{ \
struct pipe_context *pipe = dd_context(_pipe)->pipe; \
struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
\
if (!hstate) \
return NULL; \
hstate->cso = pipe->create_##name##_state(pipe, state); \
hstate->state.shader = *state; \
hstate->state.shader.tokens = tgsi_dup_tokens(state->tokens); \
return hstate; \
} \
\
static void \
dd_context_bind_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
dctx->shaders[PIPE_SHADER_##NAME] = hstate; \
pipe->bind_##name##_state(pipe, hstate ? hstate->cso : NULL); \
} \
\
static void \
dd_context_delete_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
pipe->delete_##name##_state(pipe, hstate->cso); \
tgsi_free_tokens(hstate->state.shader.tokens); \
FREE(hstate); \
}
DD_SHADER(FRAGMENT, fs)
DD_SHADER(VERTEX, vs)
DD_SHADER(GEOMETRY, gs)
DD_SHADER(TESS_CTRL, tcs)
DD_SHADER(TESS_EVAL, tes)
/********************************************************************
* immediate states
*/
#define DD_IMM_STATE(name, type, deref, ref) \
static void \
dd_context_set_##name(struct pipe_context *_pipe, type deref) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
\
dctx->name = deref; \
pipe->set_##name(pipe, ref); \
}
DD_IMM_STATE(blend_color, const struct pipe_blend_color, *state, state)
DD_IMM_STATE(stencil_ref, const struct pipe_stencil_ref, *state, state)
DD_IMM_STATE(clip_state, const struct pipe_clip_state, *state, state)
DD_IMM_STATE(sample_mask, unsigned, sample_mask, sample_mask)
DD_IMM_STATE(min_samples, unsigned, min_samples, min_samples)
DD_IMM_STATE(framebuffer_state, const struct pipe_framebuffer_state, *state, state)
DD_IMM_STATE(polygon_stipple, const struct pipe_poly_stipple, *state, state)
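
The record-then-forward pattern generated by DD_IMM_STATE can be sketched outside of Gallium. All names below (struct ctx, struct wrap_ctx, struct fake_driver and their functions) are hypothetical and reduced to a single entry point; the sketch only assumes, as the dd_context code does, that the wrapper embeds the base interface as its first member so pointers can be cast between the two.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down context interface with a single entry point. */
struct ctx {
   void (*set_sample_mask)(struct ctx *ctx, unsigned sample_mask);
};

/* Wrapper context: `base` must come first so the pointers can be cast. */
struct wrap_ctx {
   struct ctx base;
   struct ctx *pipe;        /* the wrapped driver context */
   unsigned sample_mask;    /* shadow copy kept for later state dumps */
};

static void
wrap_set_sample_mask(struct ctx *_ctx, unsigned sample_mask)
{
   struct wrap_ctx *w = (struct wrap_ctx *) _ctx;

   w->sample_mask = sample_mask;                    /* record */
   w->pipe->set_sample_mask(w->pipe, sample_mask);  /* forward */
}

/* Returns NULL if `pipe` is NULL or the allocation fails. */
static struct ctx *
wrap_ctx_create(struct ctx *pipe)
{
   struct wrap_ctx *w;

   if (!pipe)
      return NULL;
   w = calloc(1, sizeof(*w));
   if (!w)
      return NULL;
   w->pipe = pipe;
   w->base.set_sample_mask = wrap_set_sample_mask;
   return &w->base;
}

/* Trivial "driver" used only to observe the forwarded call. */
struct fake_driver {
   struct ctx base;
   unsigned last_mask;
};

static void
fake_set_sample_mask(struct ctx *ctx, unsigned sample_mask)
{
   ((struct fake_driver *) ctx)->last_mask = sample_mask;
}
```

Because the wrapper keeps its own copy of every piece of state, it can print the full pipeline state later without querying the driver.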
static void
dd_context_set_constant_buffer(struct pipe_context *_pipe,
uint shader, uint index,
struct pipe_constant_buffer *constant_buffer)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->constant_buffers[shader][index], constant_buffer,
sizeof(*constant_buffer));
pipe->set_constant_buffer(pipe, shader, index, constant_buffer);
}
static void
dd_context_set_scissor_states(struct pipe_context *_pipe,
unsigned start_slot, unsigned num_scissors,
const struct pipe_scissor_state *states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->scissors[start_slot], states,
sizeof(*states) * num_scissors);
pipe->set_scissor_states(pipe, start_slot, num_scissors, states);
}
static void
dd_context_set_viewport_states(struct pipe_context *_pipe,
unsigned start_slot, unsigned num_viewports,
const struct pipe_viewport_state *states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->viewports[start_slot], states,
sizeof(*states) * num_viewports);
pipe->set_viewport_states(pipe, start_slot, num_viewports, states);
}
static void dd_context_set_tess_state(struct pipe_context *_pipe,
const float default_outer_level[4],
const float default_inner_level[2])
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
memcpy(dctx->tess_default_levels, default_outer_level, sizeof(float) * 4);
memcpy(dctx->tess_default_levels+4, default_inner_level, sizeof(float) * 2);
pipe->set_tess_state(pipe, default_outer_level, default_inner_level);
}
/********************************************************************
* views
*/
static struct pipe_surface *
dd_context_create_surface(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_surface *surf_tmpl)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_surface *view =
pipe->create_surface(pipe, resource, surf_tmpl);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_surface_destroy(struct pipe_context *_pipe,
struct pipe_surface *surf)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->surface_destroy(pipe, surf);
}
static struct pipe_sampler_view *
dd_context_create_sampler_view(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_sampler_view *templ)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_sampler_view *view =
pipe->create_sampler_view(pipe, resource, templ);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_sampler_view_destroy(struct pipe_context *_pipe,
struct pipe_sampler_view *view)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->sampler_view_destroy(pipe, view);
}
static struct pipe_image_view *
dd_context_create_image_view(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_image_view *templ)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_image_view *view =
pipe->create_image_view(pipe, resource, templ);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_image_view_destroy(struct pipe_context *_pipe,
struct pipe_image_view *view)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->image_view_destroy(pipe, view);
}
static struct pipe_stream_output_target *
dd_context_create_stream_output_target(struct pipe_context *_pipe,
struct pipe_resource *res,
unsigned buffer_offset,
unsigned buffer_size)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_stream_output_target *view =
pipe->create_stream_output_target(pipe, res, buffer_offset,
buffer_size);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_stream_output_target_destroy(struct pipe_context *_pipe,
struct pipe_stream_output_target *target)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->stream_output_target_destroy(pipe, target);
}
/********************************************************************
* set states
*/
static void
dd_context_set_sampler_views(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num,
struct pipe_sampler_view **views)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->sampler_views[shader][start], views,
sizeof(views[0]) * num);
pipe->set_sampler_views(pipe, shader, start, num, views);
}
static void
dd_context_set_shader_images(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num,
struct pipe_image_view **views)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->shader_images[shader][start], views,
sizeof(views[0]) * num);
pipe->set_shader_images(pipe, shader, start, num, views);
}
static void
dd_context_set_shader_buffers(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num_buffers,
struct pipe_shader_buffer *buffers)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->shader_buffers[shader][start], buffers,
sizeof(buffers[0]) * num_buffers);
pipe->set_shader_buffers(pipe, shader, start, num_buffers, buffers);
}
static void
dd_context_set_vertex_buffers(struct pipe_context *_pipe,
unsigned start, unsigned num_buffers,
const struct pipe_vertex_buffer *buffers)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->vertex_buffers[start], buffers,
sizeof(buffers[0]) * num_buffers);
pipe->set_vertex_buffers(pipe, start, num_buffers, buffers);
}
static void
dd_context_set_index_buffer(struct pipe_context *_pipe,
const struct pipe_index_buffer *ib)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->index_buffer, ib, sizeof(*ib));
pipe->set_index_buffer(pipe, ib);
}
static void
dd_context_set_stream_output_targets(struct pipe_context *_pipe,
unsigned num_targets,
struct pipe_stream_output_target **tgs,
const unsigned *offsets)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
dctx->num_so_targets = num_targets;
safe_memcpy(dctx->so_targets, tgs, sizeof(*tgs) * num_targets);
safe_memcpy(dctx->so_offsets, offsets, sizeof(*offsets) * num_targets);
pipe->set_stream_output_targets(pipe, num_targets, tgs, offsets);
}
static void
dd_context_destroy(struct pipe_context *_pipe)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->destroy(pipe);
FREE(dctx);
}
/********************************************************************
* transfer
*/
static void *
dd_context_transfer_map(struct pipe_context *_pipe,
struct pipe_resource *resource, unsigned level,
unsigned usage, const struct pipe_box *box,
struct pipe_transfer **transfer)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->transfer_map(pipe, resource, level, usage, box, transfer);
}
static void
dd_context_transfer_flush_region(struct pipe_context *_pipe,
struct pipe_transfer *transfer,
const struct pipe_box *box)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_flush_region(pipe, transfer, box);
}
static void
dd_context_transfer_unmap(struct pipe_context *_pipe,
struct pipe_transfer *transfer)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_unmap(pipe, transfer);
}
static void
dd_context_transfer_inline_write(struct pipe_context *_pipe,
struct pipe_resource *resource,
unsigned level, unsigned usage,
const struct pipe_box *box,
const void *data, unsigned stride,
unsigned layer_stride)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_inline_write(pipe, resource, level, usage, box, data,
stride, layer_stride);
}
/********************************************************************
* miscellaneous
*/
static void
dd_context_texture_barrier(struct pipe_context *_pipe)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->texture_barrier(pipe);
}
static void
dd_context_memory_barrier(struct pipe_context *_pipe, unsigned flags)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->memory_barrier(pipe, flags);
}
static void
dd_context_get_sample_position(struct pipe_context *_pipe,
unsigned sample_count, unsigned sample_index,
float *out_value)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->get_sample_position(pipe, sample_count, sample_index,
out_value);
}
static void
dd_context_invalidate_resource(struct pipe_context *_pipe,
struct pipe_resource *resource)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->invalidate_resource(pipe, resource);
}
static enum pipe_reset_status
dd_context_get_device_reset_status(struct pipe_context *_pipe)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->get_device_reset_status(pipe);
}
static void
dd_context_dump_debug_state(struct pipe_context *_pipe, FILE *stream,
unsigned flags)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->dump_debug_state(pipe, stream, flags);
}
struct pipe_context *
dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
{
struct dd_context *dctx;
if (!pipe)
return NULL;
dctx = CALLOC_STRUCT(dd_context);
if (!dctx) {
pipe->destroy(pipe);
return NULL;
}
dctx->pipe = pipe;
dctx->base.priv = pipe->priv; /* expose wrapped priv data */
dctx->base.screen = &dscreen->base;
dctx->base.destroy = dd_context_destroy;
CTX_INIT(render_condition);
CTX_INIT(create_query);
CTX_INIT(destroy_query);
CTX_INIT(begin_query);
CTX_INIT(end_query);
CTX_INIT(get_query_result);
CTX_INIT(create_blend_state);
CTX_INIT(bind_blend_state);
CTX_INIT(delete_blend_state);
CTX_INIT(create_sampler_state);
CTX_INIT(bind_sampler_states);
CTX_INIT(delete_sampler_state);
CTX_INIT(create_rasterizer_state);
CTX_INIT(bind_rasterizer_state);
CTX_INIT(delete_rasterizer_state);
CTX_INIT(create_depth_stencil_alpha_state);
CTX_INIT(bind_depth_stencil_alpha_state);
CTX_INIT(delete_depth_stencil_alpha_state);
CTX_INIT(create_fs_state);
CTX_INIT(bind_fs_state);
CTX_INIT(delete_fs_state);
CTX_INIT(create_vs_state);
CTX_INIT(bind_vs_state);
CTX_INIT(delete_vs_state);
CTX_INIT(create_gs_state);
CTX_INIT(bind_gs_state);
CTX_INIT(delete_gs_state);
CTX_INIT(create_tcs_state);
CTX_INIT(bind_tcs_state);
CTX_INIT(delete_tcs_state);
CTX_INIT(create_tes_state);
CTX_INIT(bind_tes_state);
CTX_INIT(delete_tes_state);
CTX_INIT(create_vertex_elements_state);
CTX_INIT(bind_vertex_elements_state);
CTX_INIT(delete_vertex_elements_state);
CTX_INIT(set_blend_color);
CTX_INIT(set_stencil_ref);
CTX_INIT(set_sample_mask);
CTX_INIT(set_min_samples);
CTX_INIT(set_clip_state);
CTX_INIT(set_constant_buffer);
CTX_INIT(set_framebuffer_state);
CTX_INIT(set_polygon_stipple);
CTX_INIT(set_scissor_states);
CTX_INIT(set_viewport_states);
CTX_INIT(set_sampler_views);
CTX_INIT(set_tess_state);
CTX_INIT(set_shader_buffers);
CTX_INIT(set_shader_images);
CTX_INIT(set_vertex_buffers);
CTX_INIT(set_index_buffer);
CTX_INIT(create_stream_output_target);
CTX_INIT(stream_output_target_destroy);
CTX_INIT(set_stream_output_targets);
CTX_INIT(create_sampler_view);
CTX_INIT(sampler_view_destroy);
CTX_INIT(create_surface);
CTX_INIT(surface_destroy);
CTX_INIT(create_image_view);
CTX_INIT(image_view_destroy);
CTX_INIT(transfer_map);
CTX_INIT(transfer_flush_region);
CTX_INIT(transfer_unmap);
CTX_INIT(transfer_inline_write);
CTX_INIT(texture_barrier);
CTX_INIT(memory_barrier);
/* create_video_codec */
/* create_video_buffer */
/* create_compute_state */
/* bind_compute_state */
/* delete_compute_state */
/* set_compute_resources */
/* set_global_binding */
CTX_INIT(get_sample_position);
CTX_INIT(invalidate_resource);
CTX_INIT(get_device_reset_status);
CTX_INIT(dump_debug_state);
dd_init_draw_functions(dctx);
dctx->sample_mask = ~0;
return &dctx->base;
}
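
Review note: the state setters in this file (for example dd_context_set_index_buffer) all follow the same shape: keep a shadow copy of the state for later dumping, then forward the call to the wrapped driver. The sketch below reproduces that shape outside of Gallium. All toy_* names are hypothetical, and copy_or_clear is only an assumption about what safe_memcpy does with a NULL source (treating it as an unbind and zeroing the shadow copy).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state struct standing in for pipe_index_buffer. */
struct toy_index_buffer {
   unsigned index_size;
   unsigned offset;
};

/* Hypothetical driver context with a single hook. */
struct toy_driver {
   void (*set_index_buffer)(struct toy_driver *drv,
                            const struct toy_index_buffer *ib);
   int calls; /* counts forwarded calls, for the demo only */
};

/* Wrapper that remembers the last state it was given. */
struct toy_wrapper {
   struct toy_driver *drv;
   struct toy_index_buffer index_buffer; /* shadow copy */
};

/* Copy src into dst, or zero dst when src is NULL (assumed behaviour). */
static void copy_or_clear(void *dst, const void *src, size_t size)
{
   if (src)
      memcpy(dst, src, size);
   else
      memset(dst, 0, size);
}

/* Record the state first, then forward the untouched pointer. */
static void toy_wrapper_set_index_buffer(struct toy_wrapper *w,
                                         const struct toy_index_buffer *ib)
{
   copy_or_clear(&w->index_buffer, ib, sizeof(w->index_buffer));
   w->drv->set_index_buffer(w->drv, ib);
}

/* Toy driver hook: only counts how often it was called. */
static void toy_driver_set_index_buffer(struct toy_driver *drv,
                                        const struct toy_index_buffer *ib)
{
   (void)ib;
   drv->calls++;
}
```

Forwarding the caller's pointer rather than the shadow copy keeps the wrapper transparent to the driver, including for NULL unbinds.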

@@ -0,0 +1,784 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "util/u_dump.h"
#include "util/u_format.h"
#include "tgsi/tgsi_scan.h"
enum call_type
{
CALL_DRAW_VBO,
CALL_RESOURCE_COPY_REGION,
CALL_BLIT,
CALL_FLUSH_RESOURCE,
CALL_CLEAR,
CALL_CLEAR_BUFFER,
CALL_CLEAR_RENDER_TARGET,
CALL_CLEAR_DEPTH_STENCIL,
};
struct call_resource_copy_region
{
struct pipe_resource *dst;
unsigned dst_level;
unsigned dstx, dsty, dstz;
struct pipe_resource *src;
unsigned src_level;
const struct pipe_box *src_box;
};
struct call_clear
{
unsigned buffers;
const union pipe_color_union *color;
double depth;
unsigned stencil;
};
struct call_clear_buffer
{
struct pipe_resource *res;
unsigned offset;
unsigned size;
const void *clear_value;
int clear_value_size;
};
struct dd_call
{
enum call_type type;
union {
struct pipe_draw_info draw_vbo;
struct call_resource_copy_region resource_copy_region;
struct pipe_blit_info blit;
struct pipe_resource *flush_resource;
struct call_clear clear;
struct call_clear_buffer clear_buffer;
} info;
};
static FILE *
dd_get_file_stream(struct dd_context *dctx)
{
struct pipe_screen *screen = dctx->pipe->screen;
FILE *f = dd_get_debug_file();
if (!f)
return NULL;
fprintf(f, "Driver vendor: %s\n", screen->get_vendor(screen));
fprintf(f, "Device vendor: %s\n", screen->get_device_vendor(screen));
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
return f;
}
static void
dd_close_file_stream(FILE *f)
{
fclose(f);
}
static unsigned
dd_num_active_viewports(struct dd_context *dctx)
{
struct tgsi_shader_info info;
const struct tgsi_token *tokens;
if (dctx->shaders[PIPE_SHADER_GEOMETRY])
tokens = dctx->shaders[PIPE_SHADER_GEOMETRY]->state.shader.tokens;
else if (dctx->shaders[PIPE_SHADER_TESS_EVAL])
tokens = dctx->shaders[PIPE_SHADER_TESS_EVAL]->state.shader.tokens;
else if (dctx->shaders[PIPE_SHADER_VERTEX])
tokens = dctx->shaders[PIPE_SHADER_VERTEX]->state.shader.tokens;
else
return 1;
tgsi_scan_shader(tokens, &info);
return info.writes_viewport_index ? PIPE_MAX_VIEWPORTS : 1;
}
#define COLOR_RESET "\033[0m"
#define COLOR_SHADER "\033[1;32m"
#define COLOR_STATE "\033[1;33m"
#define DUMP(name, var) do { \
fprintf(f, COLOR_STATE #name ": " COLOR_RESET); \
util_dump_##name(f, var); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_I(name, var, i) do { \
fprintf(f, COLOR_STATE #name " %i: " COLOR_RESET, i); \
util_dump_##name(f, var); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_M(name, var, member) do { \
fprintf(f, " " #member ": "); \
util_dump_##name(f, (var)->member); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_M_ADDR(name, var, member) do { \
fprintf(f, " " #member ": "); \
util_dump_##name(f, &(var)->member); \
fprintf(f, "\n"); \
} while(0)
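
Review note: the DUMP* macros combine stringizing (#member) with token pasting (util_dump_##name) so that one macro call both labels a field and picks the matching printer. A minimal standalone sketch of the same trick follows; show_uint, show_hex, SHOW_M and toy_rect are hypothetical names, and, like the macros above, SHOW_M assumes a FILE *f is in scope at the call site.

```c
#include <stdio.h>

/* Two tiny printers; the macro selects one by pasting its suffix. */
static void show_uint(FILE *f, unsigned v) { fprintf(f, "%u", v); }
static void show_hex(FILE *f, unsigned v) { fprintf(f, "0x%x", v); }

/* SHOW_M(uint, r, width) expands to:
 *    fprintf(f, "  width: "); show_uint(f, (r)->width); fprintf(f, "\n");
 */
#define SHOW_M(kind, var, member) do { \
   fprintf(f, "  " #member ": "); \
   show_##kind(f, (var)->member); \
   fprintf(f, "\n"); \
} while (0)

struct toy_rect {
   unsigned width;
   unsigned flags;
};

/* Dump every field with the printer that suits its meaning. */
static void show_rect(FILE *f, const struct toy_rect *r)
{
   SHOW_M(uint, r, width);
   SHOW_M(hex, r, flags);
}
```

The do { ... } while (0) wrapper lets each macro call behave like a single statement, so it stays safe inside an unbraced if or for body.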
static void
print_named_value(FILE *f, const char *name, int value)
{
fprintf(f, COLOR_STATE "%s" COLOR_RESET " = %i\n", name, value);
}
static void
print_named_xvalue(FILE *f, const char *name, unsigned value)
{
fprintf(f, COLOR_STATE "%s" COLOR_RESET " = 0x%08x\n", name, value);
}
static void
util_dump_uint(FILE *f, unsigned i)
{
fprintf(f, "%u", i);
}
static void
util_dump_hex(FILE *f, unsigned i)
{
fprintf(f, "0x%x", i);
}
static void
util_dump_double(FILE *f, double d)
{
fprintf(f, "%f", d);
}
static void
util_dump_format(FILE *f, enum pipe_format format)
{
fprintf(f, "%s", util_format_name(format));
}
static void
util_dump_color_union(FILE *f, const union pipe_color_union *color)
{
fprintf(f, "{f = {%f, %f, %f, %f}, ui = {%u, %u, %u, %u}}",
color->f[0], color->f[1], color->f[2], color->f[3],
color->ui[0], color->ui[1], color->ui[2], color->ui[3]);
}
static void
util_dump_query(FILE *f, struct dd_query *query)
{
if (query->type >= PIPE_QUERY_DRIVER_SPECIFIC)
fprintf(f, "PIPE_QUERY_DRIVER_SPECIFIC + %i",
query->type - PIPE_QUERY_DRIVER_SPECIFIC);
else
fprintf(f, "%s", util_dump_query_type(query->type, false));
}
static void
dd_dump_render_condition(struct dd_context *dctx, FILE *f)
{
if (dctx->render_cond.query) {
fprintf(f, "render condition:\n");
DUMP_M(query, &dctx->render_cond, query);
DUMP_M(uint, &dctx->render_cond, condition);
DUMP_M(uint, &dctx->render_cond, mode);
fprintf(f, "\n");
}
}
static void
dd_dump_draw_vbo(struct dd_context *dctx, struct pipe_draw_info *info, FILE *f)
{
int sh, i;
const char *shader_str[PIPE_SHADER_TYPES];
shader_str[PIPE_SHADER_VERTEX] = "VERTEX";
shader_str[PIPE_SHADER_TESS_CTRL] = "TESS_CTRL";
shader_str[PIPE_SHADER_TESS_EVAL] = "TESS_EVAL";
shader_str[PIPE_SHADER_GEOMETRY] = "GEOMETRY";
shader_str[PIPE_SHADER_FRAGMENT] = "FRAGMENT";
shader_str[PIPE_SHADER_COMPUTE] = "COMPUTE";
DUMP(draw_info, info);
if (info->indexed) {
DUMP(index_buffer, &dctx->index_buffer);
if (dctx->index_buffer.buffer)
DUMP_M(resource, &dctx->index_buffer, buffer);
}
if (info->count_from_stream_output)
DUMP_M(stream_output_target, info,
count_from_stream_output);
if (info->indirect)
DUMP_M(resource, info, indirect);
fprintf(f, "\n");
/* TODO: dump active queries */
dd_dump_render_condition(dctx, f);
for (i = 0; i < PIPE_MAX_ATTRIBS; i++)
if (dctx->vertex_buffers[i].buffer ||
dctx->vertex_buffers[i].user_buffer) {
DUMP_I(vertex_buffer, &dctx->vertex_buffers[i], i);
if (dctx->vertex_buffers[i].buffer)
DUMP_M(resource, &dctx->vertex_buffers[i], buffer);
}
if (dctx->velems) {
print_named_value(f, "num vertex elements",
dctx->velems->state.velems.count);
for (i = 0; i < dctx->velems->state.velems.count; i++) {
fprintf(f, " ");
DUMP_I(vertex_element, &dctx->velems->state.velems.velems[i], i);
}
}
print_named_value(f, "num stream output targets", dctx->num_so_targets);
for (i = 0; i < dctx->num_so_targets; i++)
if (dctx->so_targets[i]) {
DUMP_I(stream_output_target, dctx->so_targets[i], i);
DUMP_M(resource, dctx->so_targets[i], buffer);
fprintf(f, " offset = %i\n", dctx->so_offsets[i]);
}
fprintf(f, "\n");
for (sh = 0; sh < PIPE_SHADER_TYPES; sh++) {
if (sh == PIPE_SHADER_COMPUTE)
continue;
if (sh == PIPE_SHADER_TESS_CTRL &&
!dctx->shaders[PIPE_SHADER_TESS_CTRL] &&
dctx->shaders[PIPE_SHADER_TESS_EVAL])
fprintf(f, "tess_state: {default_outer_level = {%f, %f, %f, %f}, "
"default_inner_level = {%f, %f}}\n",
dctx->tess_default_levels[0],
dctx->tess_default_levels[1],
dctx->tess_default_levels[2],
dctx->tess_default_levels[3],
dctx->tess_default_levels[4],
dctx->tess_default_levels[5]);
if (sh == PIPE_SHADER_FRAGMENT)
if (dctx->rs) {
unsigned num_viewports = dd_num_active_viewports(dctx);
if (dctx->rs->state.rs.clip_plane_enable)
DUMP(clip_state, &dctx->clip_state);
for (i = 0; i < num_viewports; i++)
DUMP_I(viewport_state, &dctx->viewports[i], i);
if (dctx->rs->state.rs.scissor)
for (i = 0; i < num_viewports; i++)
DUMP_I(scissor_state, &dctx->scissors[i], i);
DUMP(rasterizer_state, &dctx->rs->state.rs);
if (dctx->rs->state.rs.poly_stipple_enable)
DUMP(poly_stipple, &dctx->polygon_stipple);
fprintf(f, "\n");
}
if (!dctx->shaders[sh])
continue;
fprintf(f, COLOR_SHADER "begin shader: %s" COLOR_RESET "\n", shader_str[sh]);
DUMP(shader_state, &dctx->shaders[sh]->state.shader);
for (i = 0; i < PIPE_MAX_CONSTANT_BUFFERS; i++)
if (dctx->constant_buffers[sh][i].buffer ||
dctx->constant_buffers[sh][i].user_buffer) {
DUMP_I(constant_buffer, &dctx->constant_buffers[sh][i], i);
if (dctx->constant_buffers[sh][i].buffer)
DUMP_M(resource, &dctx->constant_buffers[sh][i], buffer);
}
for (i = 0; i < PIPE_MAX_SAMPLERS; i++)
if (dctx->sampler_states[sh][i])
DUMP_I(sampler_state, &dctx->sampler_states[sh][i]->state.sampler, i);
for (i = 0; i < PIPE_MAX_SAMPLERS; i++)
if (dctx->sampler_views[sh][i]) {
DUMP_I(sampler_view, dctx->sampler_views[sh][i], i);
DUMP_M(resource, dctx->sampler_views[sh][i], texture);
}
/* TODO: print shader images */
/* TODO: print shader buffers */
fprintf(f, COLOR_SHADER "end shader: %s" COLOR_RESET "\n\n", shader_str[sh]);
}
if (dctx->dsa)
DUMP(depth_stencil_alpha_state, &dctx->dsa->state.dsa);
DUMP(stencil_ref, &dctx->stencil_ref);
if (dctx->blend)
DUMP(blend_state, &dctx->blend->state.blend);
DUMP(blend_color, &dctx->blend_color);
print_named_value(f, "min_samples", dctx->min_samples);
print_named_xvalue(f, "sample_mask", dctx->sample_mask);
fprintf(f, "\n");
DUMP(framebuffer_state, &dctx->framebuffer_state);
for (i = 0; i < dctx->framebuffer_state.nr_cbufs; i++)
if (dctx->framebuffer_state.cbufs[i]) {
fprintf(f, " " COLOR_STATE "cbufs[%i]:" COLOR_RESET "\n ", i);
DUMP(surface, dctx->framebuffer_state.cbufs[i]);
fprintf(f, " ");
DUMP(resource, dctx->framebuffer_state.cbufs[i]->texture);
}
if (dctx->framebuffer_state.zsbuf) {
fprintf(f, " " COLOR_STATE "zsbuf:" COLOR_RESET "\n ");
DUMP(surface, dctx->framebuffer_state.zsbuf);
fprintf(f, " ");
DUMP(resource, dctx->framebuffer_state.zsbuf->texture);
}
fprintf(f, "\n");
}
static void
dd_dump_resource_copy_region(struct dd_context *dctx,
struct call_resource_copy_region *info,
FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, dst);
DUMP_M(uint, info, dst_level);
DUMP_M(uint, info, dstx);
DUMP_M(uint, info, dsty);
DUMP_M(uint, info, dstz);
DUMP_M(resource, info, src);
DUMP_M(uint, info, src_level);
DUMP_M(box, info, src_box);
}
static void
dd_dump_blit(struct dd_context *dctx, struct pipe_blit_info *info, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, dst.resource);
DUMP_M(uint, info, dst.level);
DUMP_M_ADDR(box, info, dst.box);
DUMP_M(format, info, dst.format);
DUMP_M(resource, info, src.resource);
DUMP_M(uint, info, src.level);
DUMP_M_ADDR(box, info, src.box);
DUMP_M(format, info, src.format);
DUMP_M(hex, info, mask);
DUMP_M(uint, info, filter);
DUMP_M(uint, info, scissor_enable);
DUMP_M_ADDR(scissor_state, info, scissor);
DUMP_M(uint, info, render_condition_enable);
if (info->render_condition_enable)
dd_dump_render_condition(dctx, f);
}
static void
dd_dump_flush_resource(struct dd_context *dctx, struct pipe_resource *res,
FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP(resource, res);
}
static void
dd_dump_clear(struct dd_context *dctx, struct call_clear *info, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(uint, info, buffers);
DUMP_M(color_union, info, color);
DUMP_M(double, info, depth);
DUMP_M(hex, info, stencil);
}
static void
dd_dump_clear_buffer(struct dd_context *dctx, struct call_clear_buffer *info,
FILE *f)
{
int i;
const char *value = (const char*)info->clear_value;
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, res);
DUMP_M(uint, info, offset);
DUMP_M(uint, info, size);
DUMP_M(uint, info, clear_value_size);
fprintf(f, " clear_value:");
for (i = 0; i < info->clear_value_size; i++)
fprintf(f, " %02x", (unsigned char)value[i]);
fprintf(f, "\n");
}
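
Review note: when raw bytes such as clear_value are printed with %02x, reading them through unsigned char matters. On platforms where plain char is signed, a byte like 0x80 is promoted to a negative int and prints as ffffff80. The helper below is a hypothetical sketch (format_bytes is not part of this patch) that formats bytes into a caller-supplied buffer so the result is easy to check.

```c
#include <stddef.h>
#include <stdio.h>

/* Format `size` bytes of `data` as " xx xx ..." into `out`.
 * `cap` is the capacity of `out` including the terminating NUL.
 * Returns the number of characters written, excluding the NUL. */
static size_t format_bytes(char *out, size_t cap,
                           const void *data, size_t size)
{
   const unsigned char *bytes = data; /* unsigned: no sign extension */
   size_t used = 0;
   size_t i;

   if (cap == 0)
      return 0;
   out[0] = '\0';

   /* Each byte needs 3 characters plus room for the NUL. */
   for (i = 0; i < size && used + 4 <= cap; i++)
      used += (size_t)snprintf(out + used, cap - used, " %02x",
                               (unsigned)bytes[i]);
   return used;
}
```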
static void
dd_dump_clear_render_target(struct dd_context *dctx, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
/* TODO */
}
static void
dd_dump_clear_depth_stencil(struct dd_context *dctx, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
/* TODO */
}
static void
dd_dump_driver_state(struct dd_context *dctx, FILE *f, unsigned flags)
{
if (dctx->pipe->dump_debug_state) {
fprintf(f,"\n\n**************************************************"
"***************************\n");
fprintf(f, "Driver-specific state:\n\n");
dctx->pipe->dump_debug_state(dctx->pipe, f, flags);
}
}
static void
dd_dump_call(struct dd_context *dctx, struct dd_call *call, unsigned flags)
{
FILE *f = dd_get_file_stream(dctx);
if (!f)
return;
switch (call->type) {
case CALL_DRAW_VBO:
dd_dump_draw_vbo(dctx, &call->info.draw_vbo, f);
break;
case CALL_RESOURCE_COPY_REGION:
dd_dump_resource_copy_region(dctx, &call->info.resource_copy_region, f);
break;
case CALL_BLIT:
dd_dump_blit(dctx, &call->info.blit, f);
break;
case CALL_FLUSH_RESOURCE:
dd_dump_flush_resource(dctx, call->info.flush_resource, f);
break;
case CALL_CLEAR:
dd_dump_clear(dctx, &call->info.clear, f);
break;
case CALL_CLEAR_BUFFER:
dd_dump_clear_buffer(dctx, &call->info.clear_buffer, f);
break;
case CALL_CLEAR_RENDER_TARGET:
dd_dump_clear_render_target(dctx, f);
break;
case CALL_CLEAR_DEPTH_STENCIL:
dd_dump_clear_depth_stencil(dctx, f);
break;
}
dd_dump_driver_state(dctx, f, flags);
dd_close_file_stream(f);
}
static void
dd_kill_process(void)
{
sync();
fprintf(stderr, "dd: Aborting the process...\n");
fflush(stdout);
fflush(stderr);
abort();
}
static bool
dd_flush_and_check_hang(struct dd_context *dctx,
struct pipe_fence_handle **flush_fence,
unsigned flush_flags)
{
struct pipe_fence_handle *fence = NULL;
struct pipe_context *pipe = dctx->pipe;
struct pipe_screen *screen = pipe->screen;
uint64_t timeout_ms = dd_screen(dctx->base.screen)->timeout_ms;
bool idle;
assert(timeout_ms > 0);
pipe->flush(pipe, &fence, flush_flags);
if (flush_fence)
screen->fence_reference(screen, flush_fence, fence);
if (!fence)
return false;
idle = screen->fence_finish(screen, fence, timeout_ms * 1000000);
screen->fence_reference(screen, &fence, NULL);
if (!idle)
fprintf(stderr, "dd: GPU hang detected!\n");
return !idle;
}
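
Review note: the hang check boils down to "flush, wait on the fence with a timeout, and treat a timeout as a hang". The timeout is stored in milliseconds while the wait above is given timeout_ms * 1000000, so the conversion has to happen in 64 bits. The sketch below models only that logic. toy_fence, toy_fence_finish and toy_check_hang are hypothetical stand-ins, not Gallium APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence that signals after `finish_ns` nanoseconds. */
struct toy_fence {
   uint64_t finish_ns;
};

/* Returns true if the fence signals within `timeout_ns`. */
static bool toy_fence_finish(const struct toy_fence *fence,
                             uint64_t timeout_ns)
{
   return fence->finish_ns <= timeout_ns;
}

/* Returns true if a hang is suspected. */
static bool toy_check_hang(const struct toy_fence *fence,
                           unsigned timeout_ms)
{
   /* Widen before multiplying: 5000 ms is 5e9 ns, which would not
    * fit in 32 bits. */
   uint64_t timeout_ns = (uint64_t)timeout_ms * 1000000u;

   if (!fence)
      return false; /* no fence was returned, so there is nothing to wait on */

   return !toy_fence_finish(fence, timeout_ns);
}
```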
static void
dd_flush_and_handle_hang(struct dd_context *dctx,
struct pipe_fence_handle **fence, unsigned flags,
const char *cause)
{
if (dd_flush_and_check_hang(dctx, fence, flags)) {
FILE *f = dd_get_file_stream(dctx);
if (f) {
fprintf(f, "dd: %s.\n", cause);
dd_dump_driver_state(dctx, f, PIPE_DEBUG_DEVICE_IS_HUNG);
dd_close_file_stream(f);
}
/* Terminate the process to prevent future hangs. */
dd_kill_process();
}
}
static void
dd_context_flush(struct pipe_context *_pipe,
struct pipe_fence_handle **fence, unsigned flags)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
switch (dd_screen(dctx->base.screen)->mode) {
case DD_DETECT_HANGS:
dd_flush_and_handle_hang(dctx, fence, flags,
"GPU hang detected in pipe->flush()");
break;
case DD_DUMP_ALL_CALLS:
pipe->flush(pipe, fence, flags);
break;
default:
assert(0);
}
}
static void
dd_before_draw(struct dd_context *dctx)
{
if (dd_screen(dctx->base.screen)->mode == DD_DETECT_HANGS &&
!dd_screen(dctx->base.screen)->no_flush)
dd_flush_and_handle_hang(dctx, NULL, 0,
"GPU hang most likely caused by internal "
"driver commands");
}
static void
dd_after_draw(struct dd_context *dctx, struct dd_call *call)
{
switch (dd_screen(dctx->base.screen)->mode) {
case DD_DETECT_HANGS:
if (!dd_screen(dctx->base.screen)->no_flush &&
dd_flush_and_check_hang(dctx, NULL, 0)) {
dd_dump_call(dctx, call, PIPE_DEBUG_DEVICE_IS_HUNG);
/* Terminate the process to prevent future hangs. */
dd_kill_process();
}
break;
case DD_DUMP_ALL_CALLS:
dd_dump_call(dctx, call, 0);
break;
default:
assert(0);
}
}
static void
dd_context_draw_vbo(struct pipe_context *_pipe,
const struct pipe_draw_info *info)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_DRAW_VBO;
call.info.draw_vbo = *info;
dd_before_draw(dctx);
pipe->draw_vbo(pipe, info);
dd_after_draw(dctx, &call);
}
static void
dd_context_resource_copy_region(struct pipe_context *_pipe,
struct pipe_resource *dst, unsigned dst_level,
unsigned dstx, unsigned dsty, unsigned dstz,
struct pipe_resource *src, unsigned src_level,
const struct pipe_box *src_box)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_RESOURCE_COPY_REGION;
call.info.resource_copy_region.dst = dst;
call.info.resource_copy_region.dst_level = dst_level;
call.info.resource_copy_region.dstx = dstx;
call.info.resource_copy_region.dsty = dsty;
call.info.resource_copy_region.dstz = dstz;
call.info.resource_copy_region.src = src;
call.info.resource_copy_region.src_level = src_level;
call.info.resource_copy_region.src_box = src_box;
dd_before_draw(dctx);
pipe->resource_copy_region(pipe,
dst, dst_level, dstx, dsty, dstz,
src, src_level, src_box);
dd_after_draw(dctx, &call);
}
static void
dd_context_blit(struct pipe_context *_pipe, const struct pipe_blit_info *info)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_BLIT;
call.info.blit = *info;
dd_before_draw(dctx);
pipe->blit(pipe, info);
dd_after_draw(dctx, &call);
}
static void
dd_context_flush_resource(struct pipe_context *_pipe,
struct pipe_resource *resource)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_FLUSH_RESOURCE;
call.info.flush_resource = resource;
dd_before_draw(dctx);
pipe->flush_resource(pipe, resource);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear(struct pipe_context *_pipe, unsigned buffers,
const union pipe_color_union *color, double depth,
unsigned stencil)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR;
call.info.clear.buffers = buffers;
call.info.clear.color = color;
call.info.clear.depth = depth;
call.info.clear.stencil = stencil;
dd_before_draw(dctx);
pipe->clear(pipe, buffers, color, depth, stencil);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_render_target(struct pipe_context *_pipe,
struct pipe_surface *dst,
const union pipe_color_union *color,
unsigned dstx, unsigned dsty,
unsigned width, unsigned height)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_RENDER_TARGET;
dd_before_draw(dctx);
pipe->clear_render_target(pipe, dst, color, dstx, dsty, width, height);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_depth_stencil(struct pipe_context *_pipe,
struct pipe_surface *dst, unsigned clear_flags,
double depth, unsigned stencil, unsigned dstx,
unsigned dsty, unsigned width, unsigned height)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_DEPTH_STENCIL;
dd_before_draw(dctx);
pipe->clear_depth_stencil(pipe, dst, clear_flags, depth, stencil,
dstx, dsty, width, height);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_buffer(struct pipe_context *_pipe, struct pipe_resource *res,
unsigned offset, unsigned size,
const void *clear_value, int clear_value_size)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_BUFFER;
call.info.clear_buffer.res = res;
call.info.clear_buffer.offset = offset;
call.info.clear_buffer.size = size;
call.info.clear_buffer.clear_value = clear_value;
call.info.clear_buffer.clear_value_size = clear_value_size;
dd_before_draw(dctx);
pipe->clear_buffer(pipe, res, offset, size, clear_value, clear_value_size);
dd_after_draw(dctx, &call);
}
void
dd_init_draw_functions(struct dd_context *dctx)
{
CTX_INIT(flush);
CTX_INIT(draw_vbo);
CTX_INIT(resource_copy_region);
CTX_INIT(blit);
CTX_INIT(clear);
CTX_INIT(clear_render_target);
CTX_INIT(clear_depth_stencil);
CTX_INIT(clear_buffer);
CTX_INIT(flush_resource);
/* launch_grid */
}

@@ -0,0 +1,139 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#ifndef DD_H_
#define DD_H_
#include "pipe/p_context.h"
#include "pipe/p_state.h"
#include "pipe/p_screen.h"
#include "dd_util.h"
enum dd_mode {
DD_DETECT_HANGS,
DD_DUMP_ALL_CALLS
};
struct dd_screen
{
struct pipe_screen base;
struct pipe_screen *screen;
unsigned timeout_ms;
enum dd_mode mode;
bool no_flush;
};
struct dd_query
{
unsigned type;
struct pipe_query *query;
};
struct dd_state
{
void *cso;
union {
struct pipe_blend_state blend;
struct pipe_depth_stencil_alpha_state dsa;
struct pipe_rasterizer_state rs;
struct pipe_sampler_state sampler;
struct {
struct pipe_vertex_element velems[PIPE_MAX_ATTRIBS];
unsigned count;
} velems;
struct pipe_shader_state shader;
} state;
};
struct dd_context
{
struct pipe_context base;
struct pipe_context *pipe;
struct {
struct dd_query *query;
bool condition;
unsigned mode;
} render_cond;
struct pipe_index_buffer index_buffer;
struct pipe_vertex_buffer vertex_buffers[PIPE_MAX_ATTRIBS];
unsigned num_so_targets;
struct pipe_stream_output_target *so_targets[PIPE_MAX_SO_BUFFERS];
unsigned so_offsets[PIPE_MAX_SO_BUFFERS];
struct dd_state *shaders[PIPE_SHADER_TYPES];
struct pipe_constant_buffer constant_buffers[PIPE_SHADER_TYPES][PIPE_MAX_CONSTANT_BUFFERS];
struct pipe_sampler_view *sampler_views[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS];
struct dd_state *sampler_states[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS];
struct pipe_image_view *shader_images[PIPE_SHADER_TYPES][PIPE_MAX_SHADER_IMAGES];
struct pipe_shader_buffer shader_buffers[PIPE_SHADER_TYPES][PIPE_MAX_SHADER_BUFFERS];
struct dd_state *velems;
struct dd_state *rs;
struct dd_state *dsa;
struct dd_state *blend;
struct pipe_blend_color blend_color;
struct pipe_stencil_ref stencil_ref;
unsigned sample_mask;
unsigned min_samples;
struct pipe_clip_state clip_state;
struct pipe_framebuffer_state framebuffer_state;
struct pipe_poly_stipple polygon_stipple;
struct pipe_scissor_state scissors[PIPE_MAX_VIEWPORTS];
struct pipe_viewport_state viewports[PIPE_MAX_VIEWPORTS];
float tess_default_levels[6];
};
struct pipe_context *
dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe);
void
dd_init_draw_functions(struct dd_context *dctx);
static inline struct dd_context *
dd_context(struct pipe_context *pipe)
{
return (struct dd_context *)pipe;
}
static inline struct dd_screen *
dd_screen(struct pipe_screen *screen)
{
return (struct dd_screen*)screen;
}
#define CTX_INIT(_member) \
dctx->base._member = dctx->pipe->_member ? dd_context_##_member : NULL
#endif /* DD_H_ */
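
Review note: two idioms in this header carry most of the wrapper design. The base struct is the first member, so a pointer to the base can be cast back to the wrapper, and CTX_INIT installs a wrapper hook only when the wrapped driver implements that hook. A self-contained sketch of both, using hypothetical mini_context / wrap_* names rather than the real Gallium types:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for pipe_context: a table of optional hooks. */
struct mini_context {
   void (*set_sample_mask)(struct mini_context *ctx, unsigned mask);
   void (*texture_barrier)(struct mini_context *ctx); /* may be NULL */
   unsigned last_mask; /* "driver" state, for the demo only */
};

/* `base` must stay the first member for the cast below to be valid. */
struct wrap_context {
   struct mini_context base;
   struct mini_context *inner;
   unsigned sample_mask; /* shadow copy */
};

static struct wrap_context *wrap_context(struct mini_context *ctx)
{
   return (struct wrap_context *)ctx;
}

static void wrap_set_sample_mask(struct mini_context *_ctx, unsigned mask)
{
   struct wrap_context *w = wrap_context(_ctx);
   w->sample_mask = mask;
   w->inner->set_sample_mask(w->inner, mask);
}

static void wrap_texture_barrier(struct mini_context *_ctx)
{
   struct mini_context *inner = wrap_context(_ctx)->inner;
   inner->texture_barrier(inner);
}

/* Expose a hook only if the wrapped driver provides it. */
#define WRAP_INIT(_member) \
   w->base._member = w->inner->_member ? wrap_##_member : NULL

static struct mini_context *wrap_create(struct mini_context *inner)
{
   struct wrap_context *w;

   if (!inner)
      return NULL;
   w = calloc(1, sizeof(*w));
   if (!w)
      return NULL;

   w->inner = inner;
   WRAP_INIT(set_sample_mask);
   WRAP_INIT(texture_barrier);
   w->sample_mask = ~0u;
   return &w->base;
}

/* Toy driver hook used to observe forwarding. */
static void drv_set_sample_mask(struct mini_context *ctx, unsigned mask)
{
   ctx->last_mask = mask;
}
```

Leaving unimplemented hooks as NULL means callers that probe for optional features see the same answer with or without the wrapper.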

@@ -1,5 +1,8 @@
-/*
-* Copyright 2012 Advanced Micro Devices, Inc.
+/**************************************************************************
+*
+* Copyright 2015 Advanced Micro Devices, Inc.
+* Copyright 2010 VMware, Inc.
+* All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
@@ -20,17 +23,14 @@
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
-* Authors:
-* Christian König <christian.koenig@amd.com>
-*/
+**************************************************************************/
-#include "sid.h"
-#include "si_pipe.h"
+#ifndef DD_PUBLIC_H_
+#define DD_PUBLIC_H_
-void si_cmd_context_control(struct si_pm4_state *pm4)
-{
-si_pm4_cmd_begin(pm4, PKT3_CONTEXT_CONTROL);
-si_pm4_cmd_add(pm4, 0x80000000);
-si_pm4_cmd_add(pm4, 0x80000000);
-si_pm4_cmd_end(pm4, false);
-}
+struct pipe_screen;
+struct pipe_screen *
+ddebug_screen_create(struct pipe_screen *screen);
+#endif /* DD_PUBLIC_H_ */

@@ -0,0 +1,353 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "dd_public.h"
#include "util/u_memory.h"
#include <stdio.h>
static const char *
dd_screen_get_name(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_name(screen);
}
static const char *
dd_screen_get_vendor(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_vendor(screen);
}
static const char *
dd_screen_get_device_vendor(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_device_vendor(screen);
}
static int
dd_screen_get_param(struct pipe_screen *_screen,
enum pipe_cap param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_param(screen, param);
}
static float
dd_screen_get_paramf(struct pipe_screen *_screen,
enum pipe_capf param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_paramf(screen, param);
}
static int
dd_screen_get_shader_param(struct pipe_screen *_screen, unsigned shader,
enum pipe_shader_cap param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_shader_param(screen, shader, param);
}
static uint64_t
dd_screen_get_timestamp(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_timestamp(screen);
}
static struct pipe_context *
dd_screen_context_create(struct pipe_screen *_screen, void *priv,
unsigned flags)
{
struct dd_screen *dscreen = dd_screen(_screen);
struct pipe_screen *screen = dscreen->screen;
flags |= PIPE_CONTEXT_DEBUG;
return dd_context_create(dscreen,
screen->context_create(screen, priv, flags));
}
static boolean
dd_screen_is_format_supported(struct pipe_screen *_screen,
enum pipe_format format,
enum pipe_texture_target target,
unsigned sample_count,
unsigned tex_usage)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->is_format_supported(screen, format, target, sample_count,
tex_usage);
}
static boolean
dd_screen_can_create_resource(struct pipe_screen *_screen,
const struct pipe_resource *templat)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->can_create_resource(screen, templat);
}
static void
dd_screen_flush_frontbuffer(struct pipe_screen *_screen,
struct pipe_resource *resource,
unsigned level, unsigned layer,
void *context_private,
struct pipe_box *sub_box)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->flush_frontbuffer(screen, resource, level, layer, context_private,
sub_box);
}
static int
dd_screen_get_driver_query_info(struct pipe_screen *_screen,
unsigned index,
struct pipe_driver_query_info *info)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_driver_query_info(screen, index, info);
}
static int
dd_screen_get_driver_query_group_info(struct pipe_screen *_screen,
unsigned index,
struct pipe_driver_query_group_info *info)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_driver_query_group_info(screen, index, info);
}
/********************************************************************
* resource
*/
static struct pipe_resource *
dd_screen_resource_create(struct pipe_screen *_screen,
const struct pipe_resource *templat)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res = screen->resource_create(screen, templat);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static struct pipe_resource *
dd_screen_resource_from_handle(struct pipe_screen *_screen,
const struct pipe_resource *templ,
struct winsys_handle *handle)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res =
screen->resource_from_handle(screen, templ, handle);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static struct pipe_resource *
dd_screen_resource_from_user_memory(struct pipe_screen *_screen,
const struct pipe_resource *templ,
void *user_memory)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res =
screen->resource_from_user_memory(screen, templ, user_memory);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static void
dd_screen_resource_destroy(struct pipe_screen *_screen,
struct pipe_resource *res)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->resource_destroy(screen, res);
}
static boolean
dd_screen_resource_get_handle(struct pipe_screen *_screen,
struct pipe_resource *resource,
struct winsys_handle *handle)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->resource_get_handle(screen, resource, handle);
}
/********************************************************************
* fence
*/
static void
dd_screen_fence_reference(struct pipe_screen *_screen,
struct pipe_fence_handle **pdst,
struct pipe_fence_handle *src)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->fence_reference(screen, pdst, src);
}
static boolean
dd_screen_fence_finish(struct pipe_screen *_screen,
struct pipe_fence_handle *fence,
uint64_t timeout)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->fence_finish(screen, fence, timeout);
}
/********************************************************************
* screen
*/
static void
dd_screen_destroy(struct pipe_screen *_screen)
{
struct dd_screen *dscreen = dd_screen(_screen);
struct pipe_screen *screen = dscreen->screen;
screen->destroy(screen);
FREE(dscreen);
}
struct pipe_screen *
ddebug_screen_create(struct pipe_screen *screen)
{
struct dd_screen *dscreen;
const char *option = debug_get_option("GALLIUM_DDEBUG", NULL);
bool dump_always = option && !strcmp(option, "always");
bool no_flush = option && strstr(option, "noflush");
bool help = option && !strcmp(option, "help");
unsigned timeout = 0;
if (help) {
puts("Gallium driver debugger");
puts("");
puts("Usage:");
puts("");
puts(" GALLIUM_DDEBUG=always");
puts(" Dump context and driver information after every draw call into");
puts(" $HOME/"DD_DIR"/.");
puts("");
puts(" GALLIUM_DDEBUG=[timeout in ms] noflush");
puts(" Flush and detect a device hang after every draw call based on the given");
puts(" fence timeout and dump context and driver information into");
puts(" $HOME/"DD_DIR"/ when a hang is detected.");
puts(" If 'noflush' is specified, only detect hangs in pipe->flush.");
puts("");
exit(0);
}
if (!option)
return screen;
if (!dump_always && sscanf(option, "%u", &timeout) != 1)
return screen;
dscreen = CALLOC_STRUCT(dd_screen);
if (!dscreen)
return NULL;
#define SCR_INIT(_member) \
dscreen->base._member = screen->_member ? dd_screen_##_member : NULL
dscreen->base.destroy = dd_screen_destroy;
dscreen->base.get_name = dd_screen_get_name;
dscreen->base.get_vendor = dd_screen_get_vendor;
dscreen->base.get_device_vendor = dd_screen_get_device_vendor;
dscreen->base.get_param = dd_screen_get_param;
dscreen->base.get_paramf = dd_screen_get_paramf;
dscreen->base.get_shader_param = dd_screen_get_shader_param;
/* get_video_param */
/* get_compute_param */
SCR_INIT(get_timestamp);
dscreen->base.context_create = dd_screen_context_create;
dscreen->base.is_format_supported = dd_screen_is_format_supported;
/* is_video_format_supported */
SCR_INIT(can_create_resource);
dscreen->base.resource_create = dd_screen_resource_create;
dscreen->base.resource_from_handle = dd_screen_resource_from_handle;
SCR_INIT(resource_from_user_memory);
dscreen->base.resource_get_handle = dd_screen_resource_get_handle;
dscreen->base.resource_destroy = dd_screen_resource_destroy;
SCR_INIT(flush_frontbuffer);
SCR_INIT(fence_reference);
SCR_INIT(fence_finish);
SCR_INIT(get_driver_query_info);
SCR_INIT(get_driver_query_group_info);
#undef SCR_INIT
dscreen->screen = screen;
dscreen->timeout_ms = timeout;
dscreen->mode = dump_always ? DD_DUMP_ALL_CALLS : DD_DETECT_HANGS;
dscreen->no_flush = no_flush;
switch (dscreen->mode) {
case DD_DUMP_ALL_CALLS:
fprintf(stderr, "Gallium debugger active. Logging all calls.\n");
break;
case DD_DETECT_HANGS:
fprintf(stderr, "Gallium debugger active. "
"The hang detection timeout is %u ms.\n", timeout);
break;
default:
assert(0);
}
return &dscreen->base;
}
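
The GALLIUM_DDEBUG handling in ddebug_screen_create() above can be summarized as a small standalone parser. This is only a sketch: the names dd_options_sketch, dd_mode_sketch and dd_parse_option_sketch are hypothetical and do not exist in Mesa, and the "help" branch is left out. It assumes the same rules as the code above: "always" selects dump-all mode, a leading unsigned number selects hang detection with that timeout in ms, and anything else leaves the wrapper disabled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the modes used by the ddebug screen wrapper. */
enum dd_mode_sketch {
   DD_SKETCH_OFF,          /* wrapper not installed, original screen returned */
   DD_SKETCH_DUMP_ALL,     /* GALLIUM_DDEBUG=always */
   DD_SKETCH_DETECT_HANGS  /* GALLIUM_DDEBUG=<timeout in ms> [noflush] */
};

struct dd_options_sketch {
   enum dd_mode_sketch mode;
   unsigned timeout_ms;
   bool no_flush;
};

static struct dd_options_sketch
dd_parse_option_sketch(const char *option)
{
   struct dd_options_sketch opts = { DD_SKETCH_OFF, 0, false };

   /* Variable unset: the debugger stays inactive. */
   if (!option)
      return opts;

   /* Exact match only, as with the strcmp() in the original. */
   if (!strcmp(option, "always")) {
      opts.mode = DD_SKETCH_DUMP_ALL;
      return opts;
   }

   /* Otherwise the value must start with an unsigned timeout. */
   if (sscanf(option, "%u", &opts.timeout_ms) != 1) {
      opts.timeout_ms = 0;
      return opts;
   }

   opts.mode = DD_SKETCH_DETECT_HANGS;
   /* "noflush" may appear anywhere in the string. */
   opts.no_flush = strstr(option, "noflush") != NULL;
   return opts;
}
```

With this sketch, "100 noflush" yields hang detection with a 100 ms timeout and no_flush set, while an unparsable value such as "bogus" leaves the mode off, matching the early "return screen" path above.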

View File

@@ -0,0 +1,71 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#ifndef DD_UTIL_H
#define DD_UTIL_H
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>
#include "os/os_process.h"
#include "util/u_debug.h"
/* name of the directory in home */
#define DD_DIR "ddebug_dumps"
static inline FILE *
dd_get_debug_file(void)
{
static unsigned index;
char proc_name[128], dir[256], name[512];
FILE *f;
if (!os_get_process_name(proc_name, sizeof(proc_name))) {
fprintf(stderr, "dd: can't get the process name\n");
return NULL;
}
snprintf(dir, sizeof(dir), "%s/"DD_DIR, debug_get_option("HOME", "."));
if (mkdir(dir, 0774) && errno != EEXIST) {
fprintf(stderr, "dd: can't create a directory (%i)\n", errno);
return NULL;
}
snprintf(name, sizeof(name), "%s/%s_%u_%08u", dir, proc_name, getpid(), index++);
f = fopen(name, "w");
if (!f) {
fprintf(stderr, "dd: can't open file %s\n", name);
return NULL;
}
return f;
}
#endif /* DD_UTIL_H */
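
The dump file naming used by dd_get_debug_file() above can be illustrated in isolation. The helper below is a hypothetical sketch, not part of Mesa: dd_format_dump_name_sketch is an invented name, the process name and pid are passed in rather than queried, and the directory name is hard-coded to the value of DD_DIR shown above. Unlike the original, it also reports truncation by checking the snprintf() return value.

```c
#include <stdio.h>

/* Builds "<home>/ddebug_dumps/<proc>_<pid>_<8-digit index>" into out.
 * Returns 0 on success, -1 if the result did not fit or formatting failed. */
static int
dd_format_dump_name_sketch(char *out, size_t out_size, const char *home,
                           const char *proc_name, unsigned pid, unsigned index)
{
   int n = snprintf(out, out_size, "%s/ddebug_dumps/%s_%u_%08u",
                    home, proc_name, pid, index);

   /* snprintf() returns the length it wanted to write; a value >= out_size
    * means the name was truncated. */
   return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The zero-padded index keeps the dumps of one process in creation order when the directory is listed alphabetically.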

View File

@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32901 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10755 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67771 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63970 bytes, from 2015-09-14 20:50:12)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)

View File

@@ -86,7 +86,7 @@ static const uint8_t a20x_primtypes[PIPE_PRIM_MAX] = {
};
struct pipe_context *
fd2_context_create(struct pipe_screen *pscreen, void *priv)
fd2_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
{
struct fd_screen *screen = fd_screen(pscreen);
struct fd2_context *fd2_ctx = CALLOC_STRUCT(fd2_context);

View File

@@ -47,6 +47,6 @@ fd2_context(struct fd_context *ctx)
}
struct pipe_context *
fd2_context_create(struct pipe_screen *pscreen, void *priv);
fd2_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
#endif /* FD2_CONTEXT_H_ */

View File

@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32901 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10755 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67771 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63970 bytes, from 2015-09-14 20:50:12)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -280,6 +280,8 @@ enum a3xx_rb_blend_opcode {
enum a3xx_intp_mode {
SMOOTH = 0,
FLAT = 1,
ZERO = 2,
ONE = 3,
};
enum a3xx_repl_mode {
@@ -680,9 +682,16 @@ static inline uint32_t REG_A3XX_CP_PROTECT_REG(uint32_t i0) { return 0x00000460
#define A3XX_GRAS_CL_CLIP_CNTL_VP_CLIP_CODE_IGNORE 0x00080000
#define A3XX_GRAS_CL_CLIP_CNTL_VP_XFORM_DISABLE 0x00100000
#define A3XX_GRAS_CL_CLIP_CNTL_PERSP_DIVISION_DISABLE 0x00200000
#define A3XX_GRAS_CL_CLIP_CNTL_ZERO_GB_SCALE_Z 0x00400000
#define A3XX_GRAS_CL_CLIP_CNTL_ZCOORD 0x00800000
#define A3XX_GRAS_CL_CLIP_CNTL_WCOORD 0x01000000
#define A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE 0x02000000
#define A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES__MASK 0x1c000000
#define A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES__SHIFT 26
static inline uint32_t A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(uint32_t val)
{
return ((val) << A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES__SHIFT) & A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES__MASK;
}
#define REG_A3XX_GRAS_CL_GB_CLIP_ADJ 0x00002044
#define A3XX_GRAS_CL_GB_CLIP_ADJ_HORZ__MASK 0x000003ff
@@ -773,7 +782,7 @@ static inline uint32_t A3XX_GRAS_SU_POINT_SIZE(float val)
#define A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT 0
static inline uint32_t A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL(float val)
{
return ((((int32_t)(val * 16384.0))) << A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__MASK;
return ((((int32_t)(val * 1048576.0))) << A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__SHIFT) & A3XX_GRAS_SU_POLY_OFFSET_SCALE_VAL__MASK;
}
#define REG_A3XX_GRAS_SU_POLY_OFFSET_OFFSET 0x0000206d
@@ -894,6 +903,9 @@ static inline uint32_t A3XX_RB_MODE_CONTROL_MRT(uint32_t val)
#define A3XX_RB_MODE_CONTROL_PACKER_TIMER_ENABLE 0x00010000
#define REG_A3XX_RB_RENDER_CONTROL 0x000020c1
#define A3XX_RB_RENDER_CONTROL_DUAL_COLOR_IN_ENABLE 0x00000001
#define A3XX_RB_RENDER_CONTROL_YUV_IN_ENABLE 0x00000002
#define A3XX_RB_RENDER_CONTROL_COV_VALUE_INPUT_ENABLE 0x00000004
#define A3XX_RB_RENDER_CONTROL_FACENESS 0x00000008
#define A3XX_RB_RENDER_CONTROL_BIN_WIDTH__MASK 0x00000ff0
#define A3XX_RB_RENDER_CONTROL_BIN_WIDTH__SHIFT 4
@@ -907,6 +919,8 @@ static inline uint32_t A3XX_RB_RENDER_CONTROL_BIN_WIDTH(uint32_t val)
#define A3XX_RB_RENDER_CONTROL_YCOORD 0x00008000
#define A3XX_RB_RENDER_CONTROL_ZCOORD 0x00010000
#define A3XX_RB_RENDER_CONTROL_WCOORD 0x00020000
#define A3XX_RB_RENDER_CONTROL_I_CLAMP_ENABLE 0x00080000
#define A3XX_RB_RENDER_CONTROL_COV_VALUE_OUTPUT_ENABLE 0x00100000
#define A3XX_RB_RENDER_CONTROL_ALPHA_TEST 0x00400000
#define A3XX_RB_RENDER_CONTROL_ALPHA_TEST_FUNC__MASK 0x07000000
#define A3XX_RB_RENDER_CONTROL_ALPHA_TEST_FUNC__SHIFT 24
@@ -914,6 +928,8 @@ static inline uint32_t A3XX_RB_RENDER_CONTROL_ALPHA_TEST_FUNC(enum adreno_compar
{
return ((val) << A3XX_RB_RENDER_CONTROL_ALPHA_TEST_FUNC__SHIFT) & A3XX_RB_RENDER_CONTROL_ALPHA_TEST_FUNC__MASK;
}
#define A3XX_RB_RENDER_CONTROL_ALPHA_TO_COVERAGE 0x40000000
#define A3XX_RB_RENDER_CONTROL_ALPHA_TO_ONE 0x80000000
#define REG_A3XX_RB_MSAA_CONTROL 0x000020c2
#define A3XX_RB_MSAA_CONTROL_DISABLE 0x00000400
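
The generated register helpers in this header all follow the same shift-then-mask pattern. The sketch below reproduces it for the new NUM_USER_CLIP_PLANES field, using the mask and shift values shown above under hypothetical SKETCH_ names so that it compiles on its own. It is an illustration of the pattern, not a replacement for the generated header.

```c
#include <stdint.h>

/* Values copied from the A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES
 * definitions above; the SKETCH_ prefix marks them as illustrative. */
#define SKETCH_NUM_USER_CLIP_PLANES__MASK  0x1c000000u
#define SKETCH_NUM_USER_CLIP_PLANES__SHIFT 26

static inline uint32_t
sketch_pack_num_user_clip_planes(uint32_t val)
{
   /* Shift the value into bits 26..28, then mask so that an out-of-range
    * value cannot spill into neighbouring fields. */
   return (val << SKETCH_NUM_USER_CLIP_PLANES__SHIFT) &
          SKETCH_NUM_USER_CLIP_PLANES__MASK;
}
```

The field is three bits wide, so values 0 to 7 fit. A larger value is silently masked (8 packs to 0), which is one reason the caller in fd3_emit_state() clamps the plane count with MIN2(..., 6) before packing.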

View File

@@ -28,6 +28,7 @@
#include "pipe/p_state.h"
#include "util/u_blend.h"
#include "util/u_dual_blend.h"
#include "util/u_string.h"
#include "util/u_memory.h"
@@ -131,5 +132,8 @@ fd3_blend_state_create(struct pipe_context *pctx,
so->rb_mrt[i].control |= A3XX_RB_MRT_CONTROL_DITHER_MODE(DITHER_ALWAYS);
}
if (cso->rt[0].blend_enable && util_blend_state_is_dual(cso, 0))
so->rb_render_control = A3XX_RB_RENDER_CONTROL_DUAL_COLOR_IN_ENABLE;
return so;
}

View File

@@ -36,6 +36,7 @@
struct fd3_blend_stateobj {
struct pipe_blend_state base;
uint32_t rb_render_control;
struct {
/* Blend control bits for color if there is an alpha channel */
uint32_t blend_control_rgb;

View File

@@ -98,7 +98,7 @@ static const uint8_t primtypes[PIPE_PRIM_MAX] = {
};
struct pipe_context *
fd3_context_create(struct pipe_screen *pscreen, void *priv)
fd3_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
{
struct fd_screen *screen = fd_screen(pscreen);
struct fd3_context *fd3_ctx = CALLOC_STRUCT(fd3_context);

View File

@@ -73,22 +73,6 @@ struct fd3_context {
*/
struct fd_vertex_state blit_vbuf_state;
/*
* Border color layout *appears* to be as arrays of 0x40 byte
* elements, with frag shader elements starting at (16 x 0x40).
* But at some point I should probably experiment more with
* samplers in vertex shaders to be sure. Unclear about why
* there is this offset when there are separate VS and FS base
* addr regs.
*
* The first 8 bytes of each entry are the requested border
* color in fp16. Unclear about the rest.. could be used for
* other formats, or could simply be for aligning the pitch
* to 32 pixels.
*/
#define BORDERCOLOR_SIZE 0x40
struct u_upload_mgr *border_color_uploader;
struct pipe_resource *border_color_buf;
@@ -119,6 +103,6 @@ fd3_context(struct fd_context *ctx)
}
struct pipe_context *
fd3_context_create(struct pipe_screen *pscreen, void *priv);
fd3_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
#endif /* FD3_CONTEXT_H_ */

View File

@@ -149,6 +149,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
&fd3_ctx->border_color_buf,
&ptr);
fd_setup_border_colors(tex, ptr, tex_off[sb]);
if (tex->num_samplers > 0) {
/* output sampler state: */
OUT_PKT3(ring, CP_LOAD_STATE, 2 + (2 * tex->num_samplers));
@@ -163,57 +165,6 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
const struct fd3_sampler_stateobj *sampler = tex->samplers[i] ?
fd3_sampler_stateobj(tex->samplers[i]) :
&dummy_sampler;
uint16_t *bcolor = (uint16_t *)((uint8_t *)ptr +
(BORDERCOLOR_SIZE * tex_off[sb]) +
(BORDERCOLOR_SIZE * i));
uint32_t *bcolor32 = (uint32_t *)&bcolor[16];
/*
* XXX HACK ALERT XXX
*
* The border colors need to be swizzled in a particular
* format-dependent order. Even though samplers don't know about
* formats, we can assume that with a GL state tracker, there's a
* 1:1 correspondence between sampler and texture. Take advantage
* of that knowledge.
*/
if (i < tex->num_textures && tex->textures[i]) {
const struct util_format_description *desc =
util_format_description(tex->textures[i]->format);
for (j = 0; j < 4; j++) {
if (desc->swizzle[j] >= 4)
continue;
const struct util_format_channel_description *chan =
&desc->channel[desc->swizzle[j]];
int size = chan->size;
/* The Z16 texture format we use seems to look in the
* 32-bit border color slots
*/
if (desc->colorspace == UTIL_FORMAT_COLORSPACE_ZS)
size = 32;
/* Formats like R11G11B10 or RGB9_E5 don't specify
* per-channel sizes properly.
*/
if (desc->layout == UTIL_FORMAT_LAYOUT_OTHER)
size = 16;
if (chan->pure_integer && size > 16)
bcolor32[desc->swizzle[j] + 4] =
sampler->base.border_color.i[j];
else if (size > 16)
bcolor32[desc->swizzle[j]] =
fui(sampler->base.border_color.f[j]);
else if (chan->pure_integer)
bcolor[desc->swizzle[j] + 8] =
sampler->base.border_color.i[j];
else
bcolor[desc->swizzle[j]] =
util_float_to_half(sampler->base.border_color.f[j]);
}
}
OUT_RING(ring, sampler->texsamp0);
OUT_RING(ring, sampler->texsamp1);
@@ -400,15 +351,27 @@ fd3_emit_vertex_bufs(struct fd_ringbuffer *ring, struct fd3_emit *emit)
unsigned vtxcnt_regid = regid(63, 0);
for (i = 0; i < vp->inputs_count; i++) {
uint8_t semantic = sem2name(vp->inputs[i].semantic);
if (semantic == TGSI_SEMANTIC_VERTEXID_NOBASE)
vertex_regid = vp->inputs[i].regid;
else if (semantic == TGSI_SEMANTIC_INSTANCEID)
instance_regid = vp->inputs[i].regid;
else if (semantic == IR3_SEMANTIC_VTXCNT)
vtxcnt_regid = vp->inputs[i].regid;
else if (i < vtx->vtx->num_elements && vp->inputs[i].compmask)
if (vp->inputs[i].sysval) {
switch(vp->inputs[i].slot) {
case SYSTEM_VALUE_BASE_VERTEX:
/* handled elsewhere */
break;
case SYSTEM_VALUE_VERTEX_ID_ZERO_BASE:
vertex_regid = vp->inputs[i].regid;
break;
case SYSTEM_VALUE_INSTANCE_ID:
instance_regid = vp->inputs[i].regid;
break;
case SYSTEM_VALUE_VERTEX_CNT:
vtxcnt_regid = vp->inputs[i].regid;
break;
default:
unreachable("invalid system value");
break;
}
} else if (i < vtx->vtx->num_elements && vp->inputs[i].compmask) {
last = i;
}
}
/* hw doesn't like to be configured for zero vbo's, it seems: */
@@ -419,7 +382,7 @@ fd3_emit_vertex_bufs(struct fd_ringbuffer *ring, struct fd3_emit *emit)
return;
for (i = 0, j = 0; i <= last; i++) {
assert(sem2name(vp->inputs[i].semantic) == 0);
assert(!vp->inputs[i].sysval);
if (vp->inputs[i].compmask) {
struct pipe_vertex_element *elem = &vtx->vtx->pipe[i];
const struct pipe_vertex_buffer *vb =
@@ -492,8 +455,10 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
A3XX_RB_MSAA_CONTROL_SAMPLE_MASK(ctx->sample_mask));
}
if ((dirty & (FD_DIRTY_ZSA | FD_DIRTY_PROG)) && !emit->key.binning_pass) {
uint32_t val = fd3_zsa_stateobj(ctx->zsa)->rb_render_control;
if ((dirty & (FD_DIRTY_ZSA | FD_DIRTY_PROG | FD_DIRTY_BLEND_DUAL)) &&
!emit->key.binning_pass) {
uint32_t val = fd3_zsa_stateobj(ctx->zsa)->rb_render_control |
fd3_blend_stateobj(ctx->blend)->rb_render_control;
val |= COND(fp->frag_face, A3XX_RB_RENDER_CONTROL_FACENESS);
val |= COND(fp->frag_coord, A3XX_RB_RENDER_CONTROL_XCOORD |
@@ -563,10 +528,30 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
val |= COND(fp->writes_pos, A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE);
val |= COND(fp->frag_coord, A3XX_GRAS_CL_CLIP_CNTL_ZCOORD |
A3XX_GRAS_CL_CLIP_CNTL_WCOORD);
/* TODO only use if prog doesn't use clipvertex/clipdist */
val |= A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(
MIN2(util_bitcount(ctx->rasterizer->clip_plane_enable), 6));
OUT_PKT0(ring, REG_A3XX_GRAS_CL_CLIP_CNTL, 1);
OUT_RING(ring, val);
}
if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_UCP)) {
uint32_t planes = ctx->rasterizer->clip_plane_enable;
int count = 0;
while (planes && count < 6) {
int i = ffs(planes) - 1;
planes &= ~(1U << i);
fd_wfi(ctx, ring);
OUT_PKT0(ring, REG_A3XX_GRAS_CL_USER_PLANE(count++), 4);
OUT_RING(ring, fui(ctx->ucp.ucp[i][0]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][1]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][2]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][3]));
}
}
/* NOTE: since primitive_restart is not actually part of any
* state object, we need to make sure that we always emit
* PRIM_VTX_CNTL.. either that or be more clever and detect
@@ -620,9 +605,13 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
OUT_RING(ring, A3XX_GRAS_CL_VPORT_ZSCALE(ctx->viewport.scale[2]));
}
if (dirty & (FD_DIRTY_PROG | FD_DIRTY_FRAMEBUFFER)) {
if (dirty & (FD_DIRTY_PROG | FD_DIRTY_FRAMEBUFFER | FD_DIRTY_BLEND_DUAL)) {
struct pipe_framebuffer_state *pfb = &ctx->framebuffer;
fd3_program_emit(ring, emit, pfb->nr_cbufs, pfb->cbufs);
int nr_cbufs = pfb->nr_cbufs;
if (fd3_blend_stateobj(ctx->blend)->rb_render_control &
A3XX_RB_RENDER_CONTROL_DUAL_COLOR_IN_ENABLE)
nr_cbufs++;
fd3_program_emit(ring, emit, nr_cbufs, pfb->cbufs);
}
/* TODO we should not need this or fd_wfi() before emit_constants():

View File

@@ -355,6 +355,8 @@ fd3_fs_output_format(enum pipe_format format)
case PIPE_FORMAT_R16G16_FLOAT:
case PIPE_FORMAT_R11G11B10_FLOAT:
return RB_R16G16B16A16_FLOAT;
case PIPE_FORMAT_L8_UNORM:
return RB_R8G8B8A8_UNORM;
default:
return fd3_pipe2color(format);
}

View File

@@ -194,24 +194,17 @@ fd3_program_emit(struct fd_ringbuffer *ring, struct fd3_emit *emit,
/* seems like vs->constlen + fs->constlen > 256, then CONSTMODE=1 */
constmode = ((vp->constlen + fp->constlen) > 256) ? 1 : 0;
pos_regid = ir3_find_output_regid(vp,
ir3_semantic_name(TGSI_SEMANTIC_POSITION, 0));
posz_regid = ir3_find_output_regid(fp,
ir3_semantic_name(TGSI_SEMANTIC_POSITION, 0));
psize_regid = ir3_find_output_regid(vp,
ir3_semantic_name(TGSI_SEMANTIC_PSIZE, 0));
pos_regid = ir3_find_output_regid(vp, VARYING_SLOT_POS);
posz_regid = ir3_find_output_regid(fp, FRAG_RESULT_DEPTH);
psize_regid = ir3_find_output_regid(vp, VARYING_SLOT_PSIZ);
if (fp->color0_mrt) {
color_regid[0] = color_regid[1] = color_regid[2] = color_regid[3] =
ir3_find_output_regid(fp, ir3_semantic_name(TGSI_SEMANTIC_COLOR, 0));
ir3_find_output_regid(fp, FRAG_RESULT_COLOR);
} else {
for (i = 0; i < fp->outputs_count; i++) {
ir3_semantic sem = fp->outputs[i].semantic;
unsigned idx = sem2idx(sem);
if (sem2name(sem) != TGSI_SEMANTIC_COLOR)
continue;
debug_assert(idx < ARRAY_SIZE(color_regid));
color_regid[idx] = fp->outputs[i].regid;
}
color_regid[0] = ir3_find_output_regid(fp, FRAG_RESULT_DATA0);
color_regid[1] = ir3_find_output_regid(fp, FRAG_RESULT_DATA1);
color_regid[2] = ir3_find_output_regid(fp, FRAG_RESULT_DATA2);
color_regid[3] = ir3_find_output_regid(fp, FRAG_RESULT_DATA3);
}
/* adjust regids for alpha output formats. there is no alpha render
@@ -280,14 +273,14 @@ fd3_program_emit(struct fd_ringbuffer *ring, struct fd3_emit *emit,
j = ir3_next_varying(fp, j);
if (j < fp->inputs_count) {
k = ir3_find_output(vp, fp->inputs[j].semantic);
k = ir3_find_output(vp, fp->inputs[j].slot);
reg |= A3XX_SP_VS_OUT_REG_A_REGID(vp->outputs[k].regid);
reg |= A3XX_SP_VS_OUT_REG_A_COMPMASK(fp->inputs[j].compmask);
}
j = ir3_next_varying(fp, j);
if (j < fp->inputs_count) {
k = ir3_find_output(vp, fp->inputs[j].semantic);
k = ir3_find_output(vp, fp->inputs[j].slot);
reg |= A3XX_SP_VS_OUT_REG_B_REGID(vp->outputs[k].regid);
reg |= A3XX_SP_VS_OUT_REG_B_COMPMASK(fp->inputs[j].compmask);
}
@@ -394,7 +387,6 @@ fd3_program_emit(struct fd_ringbuffer *ring, struct fd3_emit *emit,
/* figure out VARYING_INTERP / FLAT_SHAD register values: */
for (j = -1; (j = ir3_next_varying(fp, j)) < (int)fp->inputs_count; ) {
uint32_t interp = fp->inputs[j].interpolate;
/* TODO might be cleaner to just +8 in SP_VS_VPC_DST_REG
* instead.. rather than -8 everywhere else..
@@ -406,8 +398,8 @@ fd3_program_emit(struct fd_ringbuffer *ring, struct fd3_emit *emit,
*/
debug_assert((inloc % 4) == 0);
if ((interp == TGSI_INTERPOLATE_CONSTANT) ||
((interp == TGSI_INTERPOLATE_COLOR) && emit->rasterflat)) {
if ((fp->inputs[j].interpolate == INTERP_QUALIFIER_FLAT) ||
(fp->inputs[j].rasterflat && emit->rasterflat)) {
uint32_t loc = inloc;
for (i = 0; i < 4; i++, loc++) {
vinterp[loc / 16] |= FLAT << ((loc % 16) * 2);
@@ -415,14 +407,20 @@ fd3_program_emit(struct fd_ringbuffer *ring, struct fd3_emit *emit,
}
}
/* Replace the .xy coordinates with S/T from the point sprite. Set
* interpolation bits for .zw such that they become .01
*/
if (emit->sprite_coord_enable & (1 << sem2idx(fp->inputs[j].semantic))) {
vpsrepl[inloc / 16] |= (emit->sprite_coord_mode ? 0x0d : 0x09)
<< ((inloc % 16) * 2);
vinterp[(inloc + 2) / 16] |= 2 << (((inloc + 2) % 16) * 2);
vinterp[(inloc + 3) / 16] |= 3 << (((inloc + 3) % 16) * 2);
gl_varying_slot slot = fp->inputs[j].slot;
/* since we don't enable PIPE_CAP_TGSI_TEXCOORD: */
if (slot >= VARYING_SLOT_VAR0) {
unsigned texmask = 1 << (slot - VARYING_SLOT_VAR0);
/* Replace the .xy coordinates with S/T from the point sprite. Set
* interpolation bits for .zw such that they become .01
*/
if (emit->sprite_coord_enable & texmask) {
vpsrepl[inloc / 16] |= (emit->sprite_coord_mode ? 0x0d : 0x09)
<< ((inloc % 16) * 2);
vinterp[(inloc + 2) / 16] |= 2 << (((inloc + 2) % 16) * 2);
vinterp[(inloc + 3) / 16] |= 3 << (((inloc + 3) % 16) * 2);
}
}
}

View File

@@ -65,7 +65,8 @@ fd3_rasterizer_state_create(struct pipe_context *pctx,
if (cso->multisample)
TODO
*/
so->gras_cl_clip_cntl = A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER; /* ??? */
so->gras_cl_clip_cntl = A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER /* ??? */ |
COND(cso->clip_halfz, A3XX_GRAS_CL_CLIP_CNTL_ZERO_GB_SCALE_Z);
so->gras_su_point_minmax =
A3XX_GRAS_SU_POINT_MINMAX_MIN(psize_min) |
A3XX_GRAS_SU_POINT_MINMAX_MAX(psize_max);

View File

@@ -11,10 +11,10 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml ( 364 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml ( 1453 bytes, from 2015-05-20 20:03:07)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml ( 32901 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10755 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67771 bytes, from 2015-09-14 20:46:55)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63970 bytes, from 2015-09-14 20:50:12)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -162,10 +162,13 @@ enum a4xx_tex_fmt {
TFMT4_8_UNORM = 4,
TFMT4_8_8_UNORM = 14,
TFMT4_8_8_8_8_UNORM = 28,
TFMT4_8_SNORM = 5,
TFMT4_8_8_SNORM = 15,
TFMT4_8_8_8_8_SNORM = 29,
TFMT4_8_UINT = 6,
TFMT4_8_8_UINT = 16,
TFMT4_8_8_8_8_UINT = 30,
TFMT4_8_SINT = 7,
TFMT4_8_8_SINT = 17,
TFMT4_8_8_8_8_SINT = 31,
TFMT4_16_UINT = 21,
@@ -246,7 +249,8 @@ enum a4xx_tex_clamp {
A4XX_TEX_REPEAT = 0,
A4XX_TEX_CLAMP_TO_EDGE = 1,
A4XX_TEX_MIRROR_REPEAT = 2,
A4XX_TEX_CLAMP_NONE = 3,
A4XX_TEX_CLAMP_TO_BORDER = 3,
A4XX_TEX_MIRROR_CLAMP = 4,
};
enum a4xx_tex_aniso {

Some files were not shown because too many files have changed in this diff