Compare commits

..

168 Commits

Author SHA1 Message Date
Nanley Chery
76f17266ec mesa/texformat: use format conversion function in _mesa_choose_tex_format
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:21 -07:00
Nanley Chery
01024ded1e mesa/texcompress: correct mapping of S3TC formats in conversion function
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-31 15:03:08 -07:00
Dave Airlie
3063913f77 r600/sb: update last_cf for finalize if.
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.

I think this fixes one of the earlier shader going off end
of memory problems we've stopped.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-09-01 07:39:24 +10:00
Matt Turner
a4ba41638d i965/fs: Use greater-equal cmod to implement maximum.
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.

See commit 3b7f683f.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-08-31 11:51:59 -07:00
Ben Widawsky
d2e3638ef9 i965/chv|skl: Apply sampler bypass w/a
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).

Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-31 10:08:43 -07:00
Dave Airlie
78027c965a st/mesa: move to renumbering registers in a group
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:33 +01:00
Dave Airlie
aee73f2942 st/mesa: reduce time spent in calculating temp read/writes
The glsl->tgsi convertor does some temporary register reduction
however in profiling shader-db this shows up quite highly,

so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:27:18 +01:00
Dave Airlie
46968c1140 st/mesa: cache tgsi opcode info in the instruction
Instead of looking this up lots, lets just cache it in the instruction
translation up front. I just noticed this function what high in a profile
of shader-db on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 11:26:23 +01:00
Dave Airlie
03b7ec8778 r600: move prim convert from geom shader to function.
This should avoid C++ fail including this header.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-31 19:45:13 +10:00
Timothy Arceri
c8bc8d7235 glsl: remove specical case subroutine type counting
Unlike samplers we can get the correct value for subroutines from
component_slots()

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-31 13:10:44 +10:00
Edward O'Callaghan
0d19dc302f r600g: Use TGSI parse results instead of manually exfiltrating
This makes better use of the work that the TGSI API has done for
us.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:14 +02:00
Edward O'Callaghan
3eed81a97b r600g: Set geometry properties in r600_create_shader_state()
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said propertices in
the selector as read-only within r600_shader_from_tgsi().

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:41:00 +02:00
Edward O'Callaghan
b4dee1b636 r600g: Move geometry properties state from shader to selector
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:44 +02:00
Edward O'Callaghan
7b6369eb69 r600g: Remove dead assigment to 'gs_input_prim' in shader state
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-30 11:40:26 +02:00
Marek Olšák
7dc8a3497f radeonsi: don't use the emit qt keyword in si_init_atom
It confuses my editor.
2015-08-29 23:18:23 +02:00
Marek Olšák
379e3382e8 radeonsi: remove no-op 32-bit masking
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:21 +02:00
Marek Olšák
437cb1e3f4 gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packets
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:08 +02:00
Marek Olšák
e321596e9f winsys/radeon: handle non-zero finite timeout when waiting for buffers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-29 23:03:06 +02:00
Ilia Mirkin
a5a96118ed freedreno/a3xx: implement half-z clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-29 16:18:04 -04:00
Ilia Mirkin
58e24b4761 freedreno/a3xx: add basic clip plane support
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-29 16:18:04 -04:00
Samuel Pitoiset
c8a61ea4fb nvc0: change prefix of MP performance counters to HW_SM
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 11:04:00 +02:00
Samuel Pitoiset
21bdb4d8f3 nvc0: sort performance counter queries by name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:50 +02:00
Samuel Pitoiset
ebca85423c nvc0: make names of performance counter queries consistent
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:44 +02:00
Samuel Pitoiset
981f46aa95 nvc0: use enumerations for driver queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:40 +02:00
Samuel Pitoiset
0eac599001 nvc0: remove commented out code related to PCOUNTER queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-08-29 10:24:35 +02:00
Dave Airlie
6941883175 r600: port si_conv_prim_to_gs_out from radeonsi
This code was broken by the tess merge, and I totally missed it
until now. I'm not sure this fixes anything but it stops the assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
c149d84d45 r600g: use PRIi64 for some compute debug printfs
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Dave Airlie
8d6d0cc17d gallium/util: fix debug_get_flags_option on 32-bit
On 32-bit we need to use PRIu64 flags for printfs,
otherwise this segfaults in R600_DEBUG=help otherwise.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-29 09:06:04 +10:00
Ilia Mirkin
275c5810ca glsl: provide the option of using BFE for unpack builting lowering
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
Ilia Mirkin
889a946a45 glsl: use bitfield_insert instead of and + shift + or for packing
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-28 18:28:04 -04:00
Matt Turner
c676c432f3 i965/fs: Remove fs_visitor::try_replace_with_sel().
No shader-db changes on g4x, snb, hsw, or bdw.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
64e312d7fa i965/fs: Replace awful variable names.
start_to      -> dst_start
   end_to        -> dst_end
   start_from    -> src_start
   end_from      -> src_end
   var_to        -> dst_var
   var_from      -> src_var
   reg_to        -> dst_reg
   reg_to_offset -> dst_reg_offset
   reg_from      -> src_reg

Not sure how these made sense to me before.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
a2ff1e95a4 i965/fs: Skip blocks in register coalescing interference check.
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f2f8c43af9 i965/fs: Improve register coalescing interference check.
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.

The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.

With that fixed, we can calculate interference more accurately.

total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs:     346282 -> 342880 (-0.98%)
helped:                                1552

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
f3d0a894af i965/fs: Use overwrites_reg() instead of dst.equals().
equals() returns false for registers with different types, using it
isn't appropriate to determine whether an is overwriting a register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Matt Turner
8765f1d7dd i965: Only consider fixed_hw_reg in equals() if file is HW_REG/IMM.
Noticed when debugging things that lead to the next patch.

On G45 (and presumably ILK) this helps register coalescing:

total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs:     43751 -> 43718 (-0.08%)
helped:                                52
HURT:                                  2

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-28 11:30:47 -07:00
Marta Lofstedt
2581fe931a i965/fs: Do not set the size for zero-size uniforms
Zero sized uniforms can exist in the list, but they don't get get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them.  This was previously fixed in:

commit: 781dc7c0e1.

However,

commit: 259f7291de

removed the fix.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-28 09:52:59 -07:00
Daniel Scharrer
0516159613 mesa: return old name for deleted samplers for SAMPLER_BINDING queries
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should returned. None of the other binding point
queries check for deletion in another context.

Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:39 +02:00
Daniel Scharrer
5aaaaebf22 mesa: add missing queries for ARB_direct_state_access
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.

CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
2015-08-28 18:08:26 +02:00
Neil Roberts
2dbc6a0ad9 docs: Fix a typo in GL3.txt concerning GL_KHR_context_flush_control 2015-08-28 14:29:22 +01:00
Ilia Mirkin
b319fd7c14 mesa: fix dispatch sanity with GL_OES_texture_storage_multisample_2d_array
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91785
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-08-28 03:12:05 -04:00
Vinson Lee
2ef5a4f830 ABI-check: Use more portable bash invocation.
Fixes 'make check' on FreeBSD.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-27 23:48:43 -07:00
Boyan Ding
86c57ebe0e i965/nir: Make use of nir_opt_undef
Shader-db result on Ivy Bridge:
total instructions in shared programs: 145484 -> 145445 (-0.03%)
instructions in affected programs:     225 -> 186 (-17.33%)
helped:                                5
HURT:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2015-08-27 23:33:49 -07:00
Matt Turner
559b8842fa glapi: Remove _x86_64_get_get_dispatch symbol from x86-64 assembly.
Never used.

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-08-27 22:28:49 -07:00
Ilia Mirkin
4a6a47ed05 glsl: clean up textureSize prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-27 23:49:13 -04:00
Glenn Kennard
608c7b4a63 r600g/sb: Don't crash on empty if jump target
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:36 +10:00
Glenn Kennard
a830225adb r600g/sb: Don't read junk after EOP
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.

Add a couple of asserts, and print EOP bit if set in old asm printer.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:32 +10:00
Glenn Kennard
36f1999a87 r600g/sb: Handle undef in read port tracker
e8e443 missed adding check for undef values also in
unreserve function, leading to an assert triggering.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-28 12:32:14 +10:00
Brian Paul
52f7487923 mesa: rename rowStride to imageStride in texturesubimage()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 15:22:01 -06:00
Ilia Mirkin
2259b11100 mesa: only copy the requested teximage faces
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.

zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-27 17:18:43 -04:00
Kenneth Graunke
0a913a9d85 nir: Convert the builder to use the new NIR cursor API.
The NIR cursor API is exactly what we want for the builder's insertion
point.  This simplifies the API, the implementation, and is actually
more flexible as well.

This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.

v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
2015-08-27 13:36:57 -07:00
Kenneth Graunke
3e3cb77901 nir: Convert the NIR instruction insertion API to use cursors.
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point.  It then reworks the existing API to
simply be a wrapper around that for compatibility.

This largely involves moving the existing code into a new function.

Suggested by Connor Abbott.

v2: Make the legacy functions static inline in nir.h (requested by
    Connor Abbott).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
f90c6b1ce0 nir: Move nir_cursor to nir.h.
We want to use this for normal instruction insertion too, not just
control flow.  Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke
c44d507752 nir: Strengthen "no jumps" assertions in instruction insertion API.
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.

Previously, we only checked this when the new instruction being inserted
was a jump.  This is a red herring - inserting *any* kind of instruction
after a jump is illegal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Brian Paul
bcae4640c8 st/mesa: use PROGRAM_ARRAY for storing structs containing arrays
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices.  But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.

Before, we failed an assertion:
  state_tracker/st_glsl_to_tgsi.cpp:4900:
  Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-08-27 13:11:26 -06:00
Brian Paul
42c7be5877 glsl: fix comment typo: s/filed/field/ 2015-08-27 13:11:26 -06:00
Brian Paul
3c256f572b gallium/util: fix code formatting in u_blitter.h
Trivial.
2015-08-27 13:11:26 -06:00
Jason Ekstrand
fee0c5af11 i965/fs: Split VGRFs after lowering pull constants
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.

This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:36 -07:00
Jason Ekstrand
f2e667172a i964/fs: Refactor assign_constant_locations
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things.  In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-27 12:09:24 -07:00
Kenneth Graunke
885a9b058c i965: Rename INTEL_DEBUG=vec4vs to INTEL_DEBUG=vec4.
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().

This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.

We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
2015-08-27 11:38:50 -07:00
Tapani Pälli
16ad1d2a8d mesa: enable enums for OES_texture_storage_multisample_2d_array
v2: use _mesa_is_gles31(ctx) for verifying we are on ES 3.1,
    remove _es31 usage from get_hash_params.py

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:58:10 +03:00
Tapani Pälli
c2c64fd269 glsl: add support for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample enable bit

Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:54:41 +03:00
Tapani Pälli
b9101b1443 mesa: Add extension enable for OES_texture_storage_multisample_2d_array
v2: use ARB_texture_multisample bit to enable extension

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:53:15 +03:00
Tapani Pälli
f4280b740d glapi: add GL_OES_texture_storage_multisample_2d_array extension
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-27 10:52:46 +03:00
Nanley Chery
9a759a6ee0 swrast: add a new macro, FETCH_COMPRESSED
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
42ee16176d mesa: return bool instead of GLboolean in compressedteximage_only_format()
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.

Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
43d5b4db96 i965: refactor miptree alignment calculation code
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.

v2: reintroduce brw.
    don't include functional changes.
    don't adjust function parameters or create a new function.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
a687734135 i965: change the meaning of cpp for compressed textures
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.

Because the original definition was bytes per block divided by block width, all
references to the mipmap width must be divided the block width. This keeps the
address calculation formulas consistent. For example, the units for miptree_level
x_offset and miptree total_width has changed from pixels to blocks.

v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into seperate commit.
    simplify cpp assignment in copy_image_with_blitter().
    update miptree width and offset variables in: intel_miptree_copy_slice(),
        intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
1a9ceed4ba i965: correct mt->align_h for 2D textures on Skylake
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.

v2: add newlines to separate declarations, statments, and comments.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
10ff64fd3d i965: use ALIGN_NPOT for setting ASTC mipmap layouts
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.

v2: handle texture arrays and LDR-only systems.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
54d2aa4258 mesa/macros: move ALIGN_NPOT to macros.h
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.

v2: add greater than 0 assertion (Anuj).
    convert the macro to a static inline function.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
97f4efd573 mesa/macros: add power-of-two assertions for alignment macros
ALIGN and ROUND_DOWN_TO both require that the alignment value passed
into the macro be a power of two in the comments. Using software assertions
verifies this to be the case.

v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indendation (Brian).
v4: add greater than zero requirement (Anuj).

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
8b1f008e9a i965/surface_formats: add support for 2D ASTC surface formats
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.

ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.

v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
    update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
cd49b97a8a mesa/teximage: return the base internal format of the ASTC formats
This is necesary to initialize the gl_texture_image struct.

From the KHR_texture_compression_astc_ldr spec:
  "Added to Section 3.8.6, Compressed Texture Images

   Add the tokens specified above to Table 3.16, Compressed Internal Formats.
   In all cases, the base internal format will be RGBA. The encoding allows
   images to be encoded with fewer channels, but this is always presented as
   RGBA to the sampler."

v2. use _mesa_is_astc_format().

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
12b519b457 mesa/teximage: accept ASTC formats for 3D texture specification
The ASTC spec was revised as follows:

   Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
   commands accepting ASTC format tokens in the New Tokens section [...].

Support only exists in the HDR submode:

   Add a second new column "3D Tex." which is empty for all non-ASTC
   formats. If only the LDR profile is supported by the implementation,
   this column is also empty for all ASTC formats. If both the LDR and HDR
   profiles are supported only, this column is checked for all ASTC
   formats.

LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.

v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
23c9cd5a96 mesa/texcompress: enable translation between MESA and GL ASTC formats
v3. conform the ASTC MESA_FORMAT enums to the existing naming convention.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:43 -07:00
Nanley Chery
692578ed13 mesa/glformats: recognize ASTC formats as compressed
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
4143511b15 mesa: add ASTC extensions to the extensions table
v2: alphabetize the extensions.
    remove OES ASTC extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
582ce1ea97 mesa: don't enable online compression for ASTC formats
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.

v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
e9fd8e154f glapi: add support for KHR_texture_compression_astc_ldr
v2: correct the spelling of the sRGB variants.
    remove spaces around "=" when setting the enum value.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Nanley Chery
8ae37365f3 mesa/formats: define the 2D ASTC formats
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().

v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
    only used for generating mipmaps - something ASTC formats do not support
    due to lack of online compression.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-26 14:36:42 -07:00
Ilia Mirkin
c4cbaca327 nouveau: avoid build failures since 0fc21ecf
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 14:04:41 -04:00
Marek Olšák
6924ecac77 gallium/radeon: read_registers should return bool meaning success or failure
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:20 +02:00
Marek Olšák
16e5d8ad38 radeonsi: add IB parser support for CP DMA packets
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
2c14a6d3b1 radeonsi: add IB tracing support for debug contexts
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
189953ee13 radeonsi: remove old CS tracing code
Some of it is left there and it will be re-used in the next commit.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
df6a5666b6 radeonsi: parse and dump status registers on GPU hang
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]

This may print too much information that we might not understand yet,
but some of the bits are very useful.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
61df4f0cd3 radeonsi: add an IB parser
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
be6dc87776 radeonsi: save the contents of indirect buffers for debug contexts
This will be used by the IB parser.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
a6a6c68955 radeonsi: generate register and packet tables for an IB parser from sid.h
This makes writing a good IB parser a lot easier.

It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
d15b71b4bd radeonsi: remove duplicated register definitions and instruction definitions
Instruction encoding isn't needed in Mesa.

The border color address registers were duplicated.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:19 +02:00
Marek Olšák
c59ad265df r600g,radeonsi: remove unused ill-formed register field definitions
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
110873ed11 radeonsi: add an initial dump_debug_state implementation dumping shaders
This is usually called after a draw call.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
93d97db349 radeonsi: allow si_dump_key to write to a file
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
525921ed51 gallium/ddebug: new pipe for hang detection and driver state dumping (v2)
v2: lots of improvements

This is like identity or trace, but simpler. It doesn't wrap most states.

Run with:
  GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.

If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
  /home/$username/dd_dumps/$processname_$pid_$index.

Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.

You can also do:
  GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.

Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.

v2: drop pipe-loader changes, drop SConscript
    rename dd.h -> dd_pipe.h

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
0fc21ecfc0 gallium: add flags parameter to pipe_screen::context_create
This allows creating compute-only and debug contexts.

Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Marek Olšák
7b5c92391f gallium: add an interface for dumping debug driver state
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-26 19:25:18 +02:00
Ilia Mirkin
a3b617a258 mesa: remove pointless es31 checks, fix indirect to only be in es31
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 12:37:38 -04:00
Ilia Mirkin
332fb341dd mesa: uncomment checks in es31 computation, add texture_ms
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
2015-08-26 12:37:17 -04:00
Marek Olšák
f432ae899f mesa: create multisample fallback textures like normal textures
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-08-26 15:42:26 +02:00
Grazvydas Ignotas
f8b01ae47c radeonsi: mark unreachable paths to avoid warnings
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 15:42:26 +02:00
Tapani Pälli
e0c2ea0337 mesa: GetTexLevelParameter{if}v changes for OpenGL ES 3.1
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.

v2: remove extra is_gles31() checks (suggested by Ilia)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-08-26 08:38:25 +03:00
Marta Lofstedt
ae8d0e7abe mesa/es3.1: Allow GL_COMPUTE_WORK_GROUP_SIZE for OpenGL ES 3.1
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE, is supported by the
glGetProgramiv function.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Marta Lofstedt
c2a766880d mesa/es3.1: Enable getting MAX_COMPUTE_WORK_GROUP_ values for OpenGL ES 3.1
According to the OpenGL ES 3.1 specification chapter 17, the
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
is available for glGetIntegeri_v.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-26 08:25:07 +03:00
Dave Airlie
73e5adc4b2 mesa/formats: pass correct parameter to _mesa_is_format_compressed
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date:   Fri Jul 31 10:26:36 2015 -0700

    mesa/formats: remove compressed formats from matching function

caused a regression in my CTS testing, this looks like a clear
thinko.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
sSigned-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 14:13:27 +10:00
Roland Scheidegger
48e6404c04 gallium/auxiliary: optimize rgb9e5 helper some more
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:38 +02:00
Roland Scheidegger
941346a803 gallium/auxiliary: optimize rgb9e5 helper a bit
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
  getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that keeping the double math for now (float x + 0.5), as the results may
otherwise differ.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2015-08-26 02:57:37 +02:00
Dave Airlie
c1452983b4 mesa/texgetimage: fix missing stencil check
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.

This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-26 10:22:09 +10:00
Nanley Chery
1d2a844e7d mesa/teximage: Add GL error parameter to _mesa_target_can_be_compressed
Enables _mesa_target_can_be_compressed to return the appropriate GL error
depending on it's inputs. Use the parameter to return the appropriate GL error
for ETC2 formats on GLES3.

Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:53:46 -07:00
Nanley Chery
26c549e69d mesa/formats: remove compressed formats from matching function
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.

v2. Since the exhaustive switch is removed, add a gtest to ensure
    all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
    fix formatting and misc improvements (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:17 -07:00
Nanley Chery
8e581747d2 mesa/formats: make format testing a gtest
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhuastive switch for
_mesa_uncompressed_format_to_type_and_comps().

v2. indentation and conditional inclusion fixes (Chad).
    allow tests to continue running if any format fails
    and display the failing format name.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-25 15:45:13 -07:00
Kenneth Graunke
1bec29d04d gallium/ttn: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
78856194c1 prog_to_nir: Use nir_builder_insert() rather than poking at cf_list.
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly.  The proper way is to set the insertion point and
then simply insert things there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
5f14c417c8 nir: Use nir_shader::stage rather than passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke
d4d5b430a5 nir: Store gl_shader_stage in nir_shader.
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.

Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Jason Ekstrand
dfacae3a56 i965/fs: Combine assign_constant_locations and move_uniform_array_access_to_pull_constants
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assiging locations of pull constants.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c999a58f50 nir/lower_io: Remove assign_var_locations_direct_first
This is no longer used so we might as well get rid of it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
259f7291de i965/fs: Rework uniform handling
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect.  This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform.  This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate.  This should allow us
to more easily make decisions about what to push.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
cfa056c6a5 i965/vec4_nir: Get rid of the uniform_driver_location tracking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
ce5e9139aa nir/lower_io: Separate driver_location and base offset for uniforms
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
0db8e87b4a nir/intrinsics: Add a second const index to load_uniform
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path.  In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load.  Previously, it was completely flattened by
nir_lower_io which made things difficult.  This adds more information to
the intrinsic to make this easier for us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
6c33d6bbf9 nir: Pass a type_size() function pointer into nir_lower_io().
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play.  The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer.  This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
 - One side-effect of passing in a function pointer is that nir_lower_io is
   now aware of and properly allocates space for image uniforms, allowing
   us to drop hacks in the backend

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
a23f82053d prog_to_nir: Don't allocate nir_variable with type vec4[0] for uniforms.
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.

Should avoid i965 backend assertions in future patches Jason and I are
working on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Kenneth Graunke
640c472fd0 i965: Move type_size() methods out of visitor classes.
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
c56899f41a i965: Make setup_vec4_uniform_value and _image_uniform_values take an offset
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand
8d8b8f5854 i965: Rename setup_vector_uniform_values to setup_vec4_uniform_value
The new name more accurately represents what it does: Set up a single vec4
uniform value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Rob Clark
0ab29751b6 freedreno/ir3: fix compile break after splitting out nir_control_flow.h
The commit:

  commit b49371b8ed
  Author:     Connor Abbott <cwabbott0@gmail.com>
  AuthorDate: Tue Jul 21 19:54:18 2015 -0700

      nir: move control flow modification to its own file

split out some control flow related APIs into a separate header, but did
not update drivers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:17:30 -04:00
Rob Clark
8b2d0bb844 freedreno/ir3: fix compile break after fxn->start_block removal
The commit:

  commit 8e0d4ef341
  Author:     Kenneth Graunke <kenneth@whitecape.org>
  AuthorDate: Thu Aug 6 18:18:40 2015 -0700

      nir: Delete the nir_function_impl::start_block field.

removed the start_block field without fixing up drivers..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-25 08:13:04 -04:00
Dave Airlie
529acab22a mesa: enable texture stencil8 for multisample
This fixes GL45-CTS.gtf44.GL31Tests.texture_stencil8.texture_stencil8_gl44
from the ogl conform suite.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-25 11:06:58 +10:00
Brian Paul
e089ca26e1 mesa: make _mesa_bind_texture_unit() static
It's only called from the file it's defined in.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-24 18:23:19 -06:00
Nanley Chery
8f378d1083 mesa/formats: store whether or not a format is sRGB in gl_format_info
v2: remove extra newline.
v3: use bool instead of GLboolean.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 16:08:01 -07:00
Kenneth Graunke
4f2cdd8497 nir: Use !block_ends_in_jump() in a few places rather than open-coding.
Connor introduced this helper recently; we should use it here too.

I had to move the function earlier in the file for it to be available.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-24 15:10:55 -07:00
Connor Abbott
d7971b41ce nir/cf: reimplement nir_cf_node_remove() using the new API
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
fc7f2d2364 nir/cf: add new control modification API's
These will help us do a number of things, including:

- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:

if (foo) {
    ...
}
if (!foo) {
    ...
}

with:

if (foo) {
    ...
} else {
    ...
}

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
476eb5e4a1 nir/cf: use a cursor for inserting control flow
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
d356f84d4c nir/cf: add split_block_cursor()
This is a helper that will be shared between the new control flow
insertion and modification code.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
58a360c6b8 nir/cf: add split_block_before_instr()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6e47a34b29 nir/cf: add a cursor structure
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().

In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6f5c81f86f nir/cf: fix link_blocks() when there are no successors
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
6d028749ac nir/cf: clean up jumps when cleaning up CF nodes
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
211c79515d nir/cf: remove uses of SSA definitions that are being deleted
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
633cbbc068 nir/cf: handle jumps better in stitch_blocks()
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
940873bf22 nir/cf: handle jumps in split_block_end()
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
f596e4021c nir/cf: add block_ends_in_jump()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
788d45cb47 nir/cf: handle phi nodes better in split_block_beginning()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
747ddc3cdd nir/cf: split up and improve nir_handle_remove_jumps()
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott
13482111d0 nir/cf: add remove_phi_src() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
f41e108d8b nir: add nir_foreach_phi_src_safe()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
762ae436ea nir/cf: add insert_phi_undef() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
b49371b8ed nir: move control flow modification to its own file
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
1c53f89696 nir: make cleanup_cf_node() not use remove_defs_uses()
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
9d5944053c nir: inline block_add_pred() a few places
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott
c7df141c71 nir/validate: check successors/predecessors more carefully
We should be checking almost everything now.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Kenneth Graunke
8e0d4ef341 nir: Delete the nir_function_impl::start_block field.
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly.  Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.

Thanks to Connor Abbott for suggesting this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-24 13:31:41 -07:00
Nanley Chery
9f00af672b mesa/formats: only do type and component lookup for uncompressed formats
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.

v2. include compressed format cases to avoid gcc warnings (Chad).

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-08-24 11:27:46 -07:00
Rob Clark
000e225360 freedreno/a4xx: formats update
Fixes glamor, which wants to use R8 integer textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:16:27 -04:00
Rob Clark
afb6c24a20 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-08-24 13:15:57 -04:00
Chris Wilson
4e5752e2b7 i965: Always re-emit the pipeline select during invariant state emission
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of

commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Wed Apr 22 11:43:50 2015 -0700

    i965/state: Emit pipeline select when changing pipelines

However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.

Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-08-24 08:57:55 +01:00
Marek Olšák
a83c36b5c0 Revert "radeon/winsys: increase the IB size for VM"
This reverts commit 567394112d.

It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-08-23 19:01:15 +02:00
Ilia Mirkin
e18c29b031 nv50: fix 2d engine blits for 64- and 128-bit formats
This fixes bin/ext_framebuffer_multisample-formats all_samples

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 03:12:07 -04:00
Ilia Mirkin
a6ad49cbbd nv50: account for the int RT0 rule for alpha-to-one/cov
Same as commit 1af0641db but for nvc0. If an integer texture is
bound to RT0, don't do alpha-to-one or alpha-to-coverage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 02:58:58 -04:00
Dave Airlie
45971fd0df mesa/arb_gpu_shader_fp64: add support for glGetUniformdv
This was missed when I did fp64, I've sent a piglit test to cover
the case as well.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-08-23 15:56:35 +10:00
Ilia Mirkin
abbf05cfc2 nv50,nvc0: disable depth bounds test on blit
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-08-23 01:39:29 -04:00
Neil Roberts
3a1ab23480 i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.

v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
    when gl_VertexID is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:25:39 -07:00
Neil Roberts
fb02b4ec48 i965: Swap the order of the vertex ID and edge flag attributes
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.

v2: Also implement for BDW+

v3 [by Ben]: Remove 10.5 tag. Too late.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2015-08-22 22:20:33 -07:00
Glenn Kennard
50932268aa r600g: Fix assert in tgsi_cmp
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=91726

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-08-23 09:31:12 +10:00
Alexander von Gluck IV
5abbd1cacc egl: scons: fix the haiku build, do not build the dri2 backend
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 10:13:31 -05:00
Emil Velikov
a8c5c62359 docs: add 11.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-08-22 13:28:16 +01:00
248 changed files with 6947 additions and 6398 deletions

View File

@@ -1 +1 @@
11.0.0-devel
11.1.0-devel

View File

@@ -2317,6 +2317,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
src/gallium/drivers/freedreno/Makefile
src/gallium/drivers/ddebug/Makefile
src/gallium/drivers/i915/Makefile
src/gallium/drivers/ilo/Makefile
src/gallium/drivers/llvmpipe/Makefile

View File

@@ -196,7 +196,7 @@ GL 4.5, GLSL 4.50:
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples not started
GL_ARB_texture_barrier DONE (nv50, nvc0, r600, radeonsi)
GL_KHR_context_flush_control DONE (all - but needs GLX/EXT extension to be useful)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robust_buffer_access_behavior not started
GL_KHR_robustness 90% done (the ARB variant)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)

60
docs/relnotes/11.1.0.html Normal file
View File

@@ -0,0 +1,60 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.0 Release Notes / TBD</h1>
<p>
Mesa 11.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 11.1.1.
</p>
<p>
Mesa 11.1.0 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
TBD.
</ul>
<h2>Bug fixes</h2>
TBD.
<h2>Changes</h2>
TBD.
</div>
</body>
</html>

View File

@@ -15,7 +15,6 @@ env.Append(CPPPATH = [
# parse Makefile.sources
egl_sources = env.ParseSourceList('Makefile.sources', 'LIBEGL_C_FILES')
egl_sources.append(env.ParseSourceList('Makefile.sources', 'dri2_backend_core_FILES'))
env.Append(CPPDEFINES = [
'_EGL_NATIVE_PLATFORM=_EGL_PLATFORM_HAIKU',

View File

@@ -11,6 +11,7 @@ SUBDIRS += auxiliary
##
SUBDIRS += \
drivers/ddebug \
drivers/noop \
drivers/trace \
drivers/rbug

View File

@@ -24,6 +24,7 @@
#include "util/ralloc.h"
#include "glsl/nir/nir.h"
#include "glsl/nir/nir_control_flow.h"
#include "glsl/nir/nir_builder.h"
#include "glsl/list.h"
#include "glsl/shader_enums.h"
@@ -64,24 +65,24 @@ struct ttn_compile {
nir_register *addr_reg;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* For each IF/ELSE/ENDIF block, if_stack[if_stack_pos] has where the else
* instructions should be placed, and if_stack[if_stack_pos - 1] has where
* the next instructions outside of the if/then/else block go.
*/
struct exec_list **if_stack;
nir_cursor *if_stack;
unsigned if_stack_pos;
/**
* Stack of cf_node_lists where instructions should be pushed as we pop
* Stack of nir_cursors where instructions should be pushed as we pop
* back out of the control flow stack.
*
* loop_stack[loop_stack_pos - 1] contains the cf_node_list for the outside
* of the loop.
*/
struct exec_list **loop_stack;
nir_cursor *loop_stack;
unsigned loop_stack_pos;
/* How many TGSI_FILE_IMMEDIATE vec4s have been parsed so far. */
@@ -307,7 +308,7 @@ ttn_emit_immediate(struct ttn_compile *c)
for (i = 0; i < 4; i++)
load_const->value.u[i] = tgsi_imm->u[i].Uint;
nir_instr_insert_after_cf_list(b->cf_node_list, &load_const->instr);
nir_builder_instr_insert(b, &load_const->instr);
}
static nir_src
@@ -363,7 +364,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->variables[0] = ttn_array_deref(c, load, var, offset, indirect);
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
@@ -414,7 +415,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
load->num_components = ncomp;
nir_ssa_dest_init(&load->instr, &load->dest, ncomp, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -476,7 +477,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned file, unsigned index,
srcn++;
}
nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
src = nir_src_for_ssa(&load->dest.ssa);
break;
@@ -552,7 +553,7 @@ ttn_get_dest(struct ttn_compile *c, struct tgsi_full_dst_register *tgsi_fdst)
load->dest = nir_dest_for_reg(reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
nir_builder_instr_insert(b, &load->instr);
} else {
assert(!tgsi_dst->Indirect);
dest.dest.reg.reg = c->temp_regs[index].reg;
@@ -667,7 +668,7 @@ ttn_alu(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
instr->src[i].src = nir_src_for_ssa(src[i]);
instr->dest = dest;
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -683,7 +684,7 @@ ttn_move_dest_masked(nir_builder *b, nir_alu_dest dest,
mov->src[0].src = nir_src_for_ssa(def);
for (unsigned i = def->num_components; i < 4; i++)
mov->src[0].swizzle[i] = def->num_components - 1;
nir_instr_insert_after_cf_list(b->cf_node_list, &mov->instr);
nir_builder_instr_insert(b, &mov->instr);
}
static void
@@ -902,7 +903,7 @@ ttn_kill(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
{
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -912,7 +913,7 @@ ttn_kill_if(nir_builder *b, nir_op op, nir_alu_dest dest, nir_ssa_def **src)
nir_intrinsic_instr *discard =
nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
discard->src[0] = nir_src_for_ssa(cmp);
nir_instr_insert_after_cf_list(b->cf_node_list, &discard->instr);
nir_builder_instr_insert(b, &discard->instr);
}
static void
@@ -921,7 +922,7 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
nir_builder *b = &c->build;
/* Save the outside-of-the-if-statement node list. */
c->if_stack[c->if_stack_pos] = b->cf_node_list;
c->if_stack[c->if_stack_pos] = b->cursor;
c->if_stack_pos++;
src = ttn_channel(b, src, X);
@@ -932,11 +933,11 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool is_uint)
} else {
if_stmt->condition = nir_src_for_ssa(nir_fne(b, src, nir_imm_int(b, 0)));
}
nir_cf_node_insert_end(b->cf_node_list, &if_stmt->cf_node);
nir_builder_cf_insert(b, &if_stmt->cf_node);
nir_builder_insert_after_cf_list(b, &if_stmt->then_list);
b->cursor = nir_after_cf_list(&if_stmt->then_list);
c->if_stack[c->if_stack_pos] = &if_stmt->else_list;
c->if_stack[c->if_stack_pos] = nir_after_cf_list(&if_stmt->else_list);
c->if_stack_pos++;
}
@@ -945,7 +946,7 @@ ttn_else(struct ttn_compile *c)
{
nir_builder *b = &c->build;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos - 1]);
b->cursor = c->if_stack[c->if_stack_pos - 1];
}
static void
@@ -954,7 +955,7 @@ ttn_endif(struct ttn_compile *c)
nir_builder *b = &c->build;
c->if_stack_pos -= 2;
nir_builder_insert_after_cf_list(b, c->if_stack[c->if_stack_pos]);
b->cursor = c->if_stack[c->if_stack_pos];
}
static void
@@ -963,27 +964,27 @@ ttn_bgnloop(struct ttn_compile *c)
nir_builder *b = &c->build;
/* Save the outside-of-the-loop node list. */
c->loop_stack[c->loop_stack_pos] = b->cf_node_list;
c->loop_stack[c->loop_stack_pos] = b->cursor;
c->loop_stack_pos++;
nir_loop *loop = nir_loop_create(b->shader);
nir_cf_node_insert_end(b->cf_node_list, &loop->cf_node);
nir_builder_cf_insert(b, &loop->cf_node);
nir_builder_insert_after_cf_list(b, &loop->body);
b->cursor = nir_after_cf_list(&loop->body);
}
static void
ttn_cont(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_continue);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
ttn_brk(nir_builder *b)
{
nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_break);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
}
static void
@@ -992,7 +993,7 @@ ttn_endloop(struct ttn_compile *c)
nir_builder *b = &c->build;
c->loop_stack_pos--;
nir_builder_insert_after_cf_list(b, c->loop_stack[c->loop_stack_pos]);
b->cursor = c->loop_stack[c->loop_stack_pos];
}
static void
@@ -1279,7 +1280,7 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
assert(src_number == num_srcs);
nir_ssa_dest_init(&instr->instr, &instr->dest, 4, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &instr->instr);
nir_builder_instr_insert(b, &instr->instr);
/* Resolve the writemask on the texture op. */
ttn_move_dest(b, dest, &instr->dest.ssa);
@@ -1318,10 +1319,10 @@ ttn_txq(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
txs->src[0].src_type = nir_tex_src_lod;
nir_ssa_dest_init(&txs->instr, &txs->dest, 3, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &txs->instr);
nir_builder_instr_insert(b, &txs->instr);
nir_ssa_dest_init(&qlv->instr, &qlv->dest, 1, NULL);
nir_instr_insert_after_cf_list(b->cf_node_list, &qlv->instr);
nir_builder_instr_insert(b, &qlv->instr);
ttn_move_dest_masked(b, dest, &txs->dest.ssa, TGSI_WRITEMASK_XYZ);
ttn_move_dest_masked(b, dest, &qlv->dest.ssa, TGSI_WRITEMASK_W);
@@ -1730,7 +1731,7 @@ ttn_emit_instruction(struct ttn_compile *c)
store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
@@ -1759,11 +1760,26 @@ ttn_add_output_stores(struct ttn_compile *c)
store->const_index[0] = loc;
store->src[0].reg.reg = c->output_regs[loc].reg;
store->src[0].reg.base_offset = c->output_regs[loc].offset;
nir_instr_insert_after_cf_list(b->cf_node_list, &store->instr);
nir_builder_instr_insert(b, &store->instr);
}
}
}
static gl_shader_stage
tgsi_processor_to_shader_stage(unsigned processor)
{
switch (processor) {
case TGSI_PROCESSOR_FRAGMENT: return MESA_SHADER_FRAGMENT;
case TGSI_PROCESSOR_VERTEX: return MESA_SHADER_VERTEX;
case TGSI_PROCESSOR_GEOMETRY: return MESA_SHADER_GEOMETRY;
case TGSI_PROCESSOR_TESS_CTRL: return MESA_SHADER_TESS_CTRL;
case TGSI_PROCESSOR_TESS_EVAL: return MESA_SHADER_TESS_EVAL;
case TGSI_PROCESSOR_COMPUTE: return MESA_SHADER_COMPUTE;
default:
unreachable("invalid TGSI processor");
};
}
struct nir_shader *
tgsi_to_nir(const void *tgsi_tokens,
const nir_shader_compiler_options *options)
@@ -1775,17 +1791,19 @@ tgsi_to_nir(const void *tgsi_tokens,
int ret;
c = rzalloc(NULL, struct ttn_compile);
s = nir_shader_create(NULL, options);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
s = nir_shader_create(NULL, tgsi_processor_to_shader_stage(scan.processor),
options);
nir_function *func = nir_function_create(s, "main");
nir_function_overload *overload = nir_function_overload_create(func);
nir_function_impl *impl = nir_function_impl_create(overload);
nir_builder_init(&c->build, impl);
nir_builder_insert_after_cf_list(&c->build, &impl->body);
tgsi_scan_shader(tgsi_tokens, &scan);
c->scan = &scan;
c->build.cursor = nir_after_cf_list(&impl->body);
s->num_inputs = scan.file_max[TGSI_FILE_INPUT] + 1;
s->num_uniforms = scan.const_file_max[0] + 1;
@@ -1801,10 +1819,10 @@ tgsi_to_nir(const void *tgsi_tokens,
c->num_samp_types = scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1;
c->samp_types = rzalloc_array(c, nir_alu_type, c->num_samp_types);
c->if_stack = rzalloc_array(c, struct exec_list *,
c->if_stack = rzalloc_array(c, nir_cursor,
(scan.opcode_count[TGSI_OPCODE_IF] +
scan.opcode_count[TGSI_OPCODE_UIF]) * 2);
c->loop_stack = rzalloc_array(c, struct exec_list *,
c->loop_stack = rzalloc_array(c, nir_cursor,
scan.opcode_count[TGSI_OPCODE_BGNLOOP]);
ret = tgsi_parse_init(&parser, tgsi_tokens);

View File

@@ -11,6 +11,10 @@
* one or more debug driver: rbug, trace.
*/
#ifdef GALLIUM_DDEBUG
#include "ddebug/dd_public.h"
#endif
#ifdef GALLIUM_TRACE
#include "trace/tr_public.h"
#endif
@@ -30,6 +34,10 @@
static inline struct pipe_screen *
debug_screen_wrap(struct pipe_screen *screen)
{
#if defined(GALLIUM_DDEBUG)
screen = ddebug_screen_create(screen);
#endif
#if defined(GALLIUM_RBUG)
screen = rbug_screen_create(screen);
#endif

View File

@@ -372,30 +372,28 @@ void util_blitter_custom_resolve_color(struct blitter_context *blitter,
*
* States not listed here are not affected by util_blitter. */
static inline
void util_blitter_save_blend(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_blend(struct blitter_context *blitter, void *state)
{
blitter->saved_blend_state = state;
}
static inline
void util_blitter_save_depth_stencil_alpha(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_depth_stencil_alpha(struct blitter_context *blitter,
void *state)
{
blitter->saved_dsa_state = state;
}
static inline
void util_blitter_save_vertex_elements(struct blitter_context *blitter,
void *state)
static inline void
util_blitter_save_vertex_elements(struct blitter_context *blitter, void *state)
{
blitter->saved_velem_state = state;
}
static inline
void util_blitter_save_stencil_ref(struct blitter_context *blitter,
const struct pipe_stencil_ref *state)
static inline void
util_blitter_save_stencil_ref(struct blitter_context *blitter,
const struct pipe_stencil_ref *state)
{
blitter->saved_stencil_ref = *state;
}
@@ -407,23 +405,20 @@ void util_blitter_save_rasterizer(struct blitter_context *blitter,
blitter->saved_rs_state = state;
}
static inline
void util_blitter_save_fragment_shader(struct blitter_context *blitter,
void *fs)
static inline void
util_blitter_save_fragment_shader(struct blitter_context *blitter, void *fs)
{
blitter->saved_fs = fs;
}
static inline
void util_blitter_save_vertex_shader(struct blitter_context *blitter,
void *vs)
static inline void
util_blitter_save_vertex_shader(struct blitter_context *blitter, void *vs)
{
blitter->saved_vs = vs;
}
static inline
void util_blitter_save_geometry_shader(struct blitter_context *blitter,
void *gs)
static inline void
util_blitter_save_geometry_shader(struct blitter_context *blitter, void *gs)
{
blitter->saved_gs = gs;
}
@@ -442,24 +437,24 @@ util_blitter_save_tesseval_shader(struct blitter_context *blitter,
blitter->saved_tes = sh;
}
static inline
void util_blitter_save_framebuffer(struct blitter_context *blitter,
const struct pipe_framebuffer_state *state)
static inline void
util_blitter_save_framebuffer(struct blitter_context *blitter,
const struct pipe_framebuffer_state *state)
{
blitter->saved_fb_state.nr_cbufs = 0; /* It's ~0 now, meaning it's unsaved. */
util_copy_framebuffer_state(&blitter->saved_fb_state, state);
}
static inline
void util_blitter_save_viewport(struct blitter_context *blitter,
struct pipe_viewport_state *state)
static inline void
util_blitter_save_viewport(struct blitter_context *blitter,
struct pipe_viewport_state *state)
{
blitter->saved_viewport = *state;
}
static inline
void util_blitter_save_scissor(struct blitter_context *blitter,
struct pipe_scissor_state *state)
static inline void
util_blitter_save_scissor(struct blitter_context *blitter,
struct pipe_scissor_state *state)
{
blitter->saved_scissor = *state;
}

View File

@@ -41,6 +41,7 @@
#include "util/u_tile.h"
#include "util/u_prim.h"
#include "util/u_surface.h"
#include <inttypes.h>
#include <stdio.h>
#include <limits.h> /* CHAR_BIT */
@@ -275,7 +276,7 @@ debug_get_flags_option(const char *name,
for (; flags->name; ++flags)
namealign = MAX2(namealign, strlen(flags->name));
for (flags = orig; flags->name; ++flags)
_debug_printf("| %*s [0x%0*lx]%s%s\n", namealign, flags->name,
_debug_printf("| %*s [0x%0*"PRIu64"]%s%s\n", namealign, flags->name,
(int)sizeof(uint64_t)*CHAR_BIT/4, flags->value,
flags->desc ? " " : "", flags->desc ? flags->desc : "");
}
@@ -290,9 +291,9 @@ debug_get_flags_option(const char *name,
if (debug_get_option_should_print()) {
if (str) {
debug_printf("%s: %s = 0x%lx (%s)\n", __FUNCTION__, name, result, str);
debug_printf("%s: %s = 0x%"PRIu64" (%s)\n", __FUNCTION__, name, result, str);
} else {
debug_printf("%s: %s = 0x%lx\n", __FUNCTION__, name, result);
debug_printf("%s: %s = 0x%"PRIu64"\n", __FUNCTION__, name, result);
}
}

View File

@@ -21,7 +21,8 @@
* DEALINGS IN THE SOFTWARE.
*/
/* Copied from EXT_texture_shared_exponent and edited. */
/* Copied from EXT_texture_shared_exponent and edited, getting rid of
* expensive float math bits too. */
#ifndef RGB9E5_H
#define RGB9E5_H
@@ -39,7 +40,6 @@
#define RGB9E5_MANTISSA_VALUES (1<<RGB9E5_MANTISSA_BITS)
#define MAX_RGB9E5_MANTISSA (RGB9E5_MANTISSA_VALUES-1)
#define MAX_RGB9E5 (((float)MAX_RGB9E5_MANTISSA)/RGB9E5_MANTISSA_VALUES * (1<<MAX_RGB9E5_EXP))
#define EPSILON_RGB9E5 ((1.0/RGB9E5_MANTISSA_VALUES) / (1<<RGB9E5_EXP_BIAS))
typedef union {
unsigned int raw;
@@ -74,63 +74,59 @@ typedef union {
} field;
} rgb9e5;
static inline float rgb9e5_ClampRange(float x)
{
if (x > 0.0f) {
if (x >= MAX_RGB9E5) {
return MAX_RGB9E5;
} else {
return x;
}
} else {
/* NaN gets here too since comparisons with NaN always fail! */
return 0.0;
}
}
/* Ok, FloorLog2 is not correct for the denorm and zero values, but we
are going to do a max of this value with the minimum rgb9e5 exponent
that will hide these problem cases. */
static inline int rgb9e5_FloorLog2(float x)
static inline int rgb9e5_ClampRange(float x)
{
float754 f;
float754 max;
f.value = x;
return (f.field.biasedexponent - 127);
max.value = MAX_RGB9E5;
if (f.raw > 0x7f800000)
/* catches neg, NaNs */
return 0;
else if (f.raw >= max.raw)
return max.raw;
else
return f.raw;
}
static inline unsigned float3_to_rgb9e5(const float rgb[3])
{
rgb9e5 retval;
float maxrgb;
int rm, gm, bm;
float rc, gc, bc;
int exp_shared, maxm;
double denom;
int rm, gm, bm, exp_shared;
float754 revdenom = {0};
float754 rc, bc, gc, maxrgb;
rc = rgb9e5_ClampRange(rgb[0]);
gc = rgb9e5_ClampRange(rgb[1]);
bc = rgb9e5_ClampRange(rgb[2]);
rc.raw = rgb9e5_ClampRange(rgb[0]);
gc.raw = rgb9e5_ClampRange(rgb[1]);
bc.raw = rgb9e5_ClampRange(rgb[2]);
maxrgb.raw = MAX3(rc.raw, gc.raw, bc.raw);
maxrgb = MAX3(rc, gc, bc);
exp_shared = MAX2(-RGB9E5_EXP_BIAS-1, rgb9e5_FloorLog2(maxrgb)) + 1 + RGB9E5_EXP_BIAS;
/*
* Compared to what the spec suggests, instead of conditionally adjusting
* the exponent after the fact do it here by doing the equivalent of +0.5 -
* the int add will spill over into the exponent in this case.
*/
maxrgb.raw += maxrgb.raw & (1 << (23-9));
exp_shared = MAX2((maxrgb.raw >> 23), -RGB9E5_EXP_BIAS - 1 + 127) +
1 + RGB9E5_EXP_BIAS - 127;
revdenom.field.biasedexponent = 127 - (exp_shared - RGB9E5_EXP_BIAS -
RGB9E5_MANTISSA_BITS) + 1;
assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP);
assert(exp_shared >= 0);
/* This exp2 function could be replaced by a table. */
denom = exp2(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS);
maxm = (int) floor(maxrgb / denom + 0.5);
if (maxm == MAX_RGB9E5_MANTISSA+1) {
denom *= 2;
exp_shared += 1;
assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP);
} else {
assert(maxm <= MAX_RGB9E5_MANTISSA);
}
rm = (int) floor(rc / denom + 0.5);
gm = (int) floor(gc / denom + 0.5);
bm = (int) floor(bc / denom + 0.5);
/*
* The spec uses strict round-up behavior (d3d10 disagrees, but in any case
* must match what is done above for figuring out exponent).
* We avoid the doubles ((int) rc * revdenom + 0.5) by doing the rounding
* ourselves (revdenom was adjusted by +1, above).
*/
rm = (int) (rc.value * revdenom.value);
gm = (int) (gc.value * revdenom.value);
bm = (int) (bc.value * revdenom.value);
rm = (rm & 1) + (rm >> 1);
gm = (gm & 1) + (gm >> 1);
bm = (bm & 1) + (bm >> 1);
assert(rm <= MAX_RGB9E5_MANTISSA);
assert(gm <= MAX_RGB9E5_MANTISSA);
@@ -151,15 +147,15 @@ static inline void rgb9e5_to_float3(unsigned rgb, float retval[3])
{
rgb9e5 v;
int exponent;
float scale;
float754 scale = {0};
v.raw = rgb;
exponent = v.field.biasedexponent - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS;
scale = exp2f(exponent);
scale.field.biasedexponent = exponent + 127;
retval[0] = v.field.r * scale;
retval[1] = v.field.g * scale;
retval[2] = v.field.b * scale;
retval[0] = v.field.r * scale.value;
retval[1] = v.field.g * scale.value;
retval[2] = v.field.b * scale.value;
}
#endif

View File

@@ -457,7 +457,7 @@ null_constant_buffer(struct pipe_context *ctx)
void
util_run_tests(struct pipe_screen *screen)
{
struct pipe_context *ctx = screen->context_create(screen, NULL);
struct pipe_context *ctx = screen->context_create(screen, NULL, 0);
tgsi_vs_window_space_position(ctx);
null_sampler_view(ctx, TGSI_TEXTURE_2D);

View File

@@ -1120,7 +1120,7 @@ vl_create_mpeg12_decoder(struct pipe_context *context,
dec->base = *templat;
dec->base.context = context;
dec->context = context->screen->context_create(context->screen, NULL);
dec->context = context->screen->context_create(context->screen, NULL, 0);
dec->base.destroy = vl_mpeg12_destroy;
dec->base.begin_frame = vl_mpeg12_begin_frame;

View File

@@ -0,0 +1,9 @@
include Makefile.sources
include $(top_srcdir)/src/gallium/Automake.inc
AM_CFLAGS = \
$(GALLIUM_DRIVER_CFLAGS)
noinst_LTLIBRARIES = libddebug.la
libddebug_la_SOURCES = $(C_SOURCES)

View File

@@ -0,0 +1,6 @@
C_SOURCES := \
dd_pipe.h \
dd_public.h \
dd_context.c \
dd_draw.c \
dd_screen.c

View File

@@ -0,0 +1,771 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "tgsi/tgsi_parse.h"
#include "util/u_memory.h"
static void
safe_memcpy(void *dst, const void *src, size_t size)
{
if (src)
memcpy(dst, src, size);
else
memset(dst, 0, size);
}
/********************************************************************
* queries
*/
static struct dd_query *
dd_query(struct pipe_query *query)
{
return (struct dd_query *)query;
}
static struct pipe_query *
dd_query_unwrap(struct pipe_query *query)
{
if (query) {
return dd_query(query)->query;
} else {
return NULL;
}
}
static struct pipe_query *
dd_context_create_query(struct pipe_context *_pipe, unsigned query_type,
unsigned index)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_query *query;
query = pipe->create_query(pipe, query_type, index);
/* Wrap query object. */
if (query) {
struct dd_query *dd_query = CALLOC_STRUCT(dd_query);
if (dd_query) {
dd_query->type = query_type;
dd_query->query = query;
query = (struct pipe_query *)dd_query;
} else {
pipe->destroy_query(pipe, query);
query = NULL;
}
}
return query;
}
static void
dd_context_destroy_query(struct pipe_context *_pipe,
struct pipe_query *query)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->destroy_query(pipe, dd_query_unwrap(query));
FREE(query);
}
static boolean
dd_context_begin_query(struct pipe_context *_pipe, struct pipe_query *query)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
return pipe->begin_query(pipe, dd_query_unwrap(query));
}
static void
dd_context_end_query(struct pipe_context *_pipe, struct pipe_query *query)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->end_query(pipe, dd_query_unwrap(query));
}
static boolean
dd_context_get_query_result(struct pipe_context *_pipe,
struct pipe_query *query, boolean wait,
union pipe_query_result *result)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->get_query_result(pipe, dd_query_unwrap(query), wait, result);
}
static void
dd_context_render_condition(struct pipe_context *_pipe,
struct pipe_query *query, boolean condition,
uint mode)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->render_condition(pipe, dd_query_unwrap(query), condition, mode);
dctx->render_cond.query = dd_query(query);
dctx->render_cond.condition = condition;
dctx->render_cond.mode = mode;
}
/********************************************************************
* constant (immutable) non-shader states
*/
#define DD_CSO_CREATE(name, shortname) \
static void * \
dd_context_create_##name##_state(struct pipe_context *_pipe, \
const struct pipe_##name##_state *state) \
{ \
struct pipe_context *pipe = dd_context(_pipe)->pipe; \
struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
\
if (!hstate) \
return NULL; \
hstate->cso = pipe->create_##name##_state(pipe, state); \
hstate->state.shortname = *state; \
return hstate; \
}
#define DD_CSO_BIND(name, shortname) \
static void \
dd_context_bind_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
dctx->shortname = hstate; \
pipe->bind_##name##_state(pipe, hstate ? hstate->cso : NULL); \
}
#define DD_CSO_DELETE(name) \
static void \
dd_context_delete_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
pipe->delete_##name##_state(pipe, hstate->cso); \
FREE(hstate); \
}
#define DD_CSO_WHOLE(name, shortname) \
DD_CSO_CREATE(name, shortname) \
DD_CSO_BIND(name, shortname) \
DD_CSO_DELETE(name)
DD_CSO_WHOLE(blend, blend)
DD_CSO_WHOLE(rasterizer, rs)
DD_CSO_WHOLE(depth_stencil_alpha, dsa)
DD_CSO_CREATE(sampler, sampler)
DD_CSO_DELETE(sampler)
static void
dd_context_bind_sampler_states(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned count, void **states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
memcpy(&dctx->sampler_states[shader][start], states,
sizeof(void*) * count);
if (states) {
void *samp[PIPE_MAX_SAMPLERS];
int i;
for (i = 0; i < count; i++) {
struct dd_state *s = states[i];
samp[i] = s ? s->cso : NULL;
}
pipe->bind_sampler_states(pipe, shader, start, count, samp);
}
else
pipe->bind_sampler_states(pipe, shader, start, count, NULL);
}
static void *
dd_context_create_vertex_elements_state(struct pipe_context *_pipe,
unsigned num_elems,
const struct pipe_vertex_element *elems)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct dd_state *hstate = CALLOC_STRUCT(dd_state);
if (!hstate)
return NULL;
hstate->cso = pipe->create_vertex_elements_state(pipe, num_elems, elems);
memcpy(hstate->state.velems.velems, elems, sizeof(elems[0]) * num_elems);
hstate->state.velems.count = num_elems;
return hstate;
}
DD_CSO_BIND(vertex_elements, velems)
DD_CSO_DELETE(vertex_elements)
/********************************************************************
* shaders
*/
#define DD_SHADER(NAME, name) \
static void * \
dd_context_create_##name##_state(struct pipe_context *_pipe, \
const struct pipe_shader_state *state) \
{ \
struct pipe_context *pipe = dd_context(_pipe)->pipe; \
struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
\
if (!hstate) \
return NULL; \
hstate->cso = pipe->create_##name##_state(pipe, state); \
hstate->state.shader = *state; \
hstate->state.shader.tokens = tgsi_dup_tokens(state->tokens); \
return hstate; \
} \
\
static void \
dd_context_bind_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
dctx->shaders[PIPE_SHADER_##NAME] = hstate; \
pipe->bind_##name##_state(pipe, hstate ? hstate->cso : NULL); \
} \
\
static void \
dd_context_delete_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
struct dd_state *hstate = state; \
\
pipe->delete_##name##_state(pipe, hstate->cso); \
tgsi_free_tokens(hstate->state.shader.tokens); \
FREE(hstate); \
}
DD_SHADER(FRAGMENT, fs)
DD_SHADER(VERTEX, vs)
DD_SHADER(GEOMETRY, gs)
DD_SHADER(TESS_CTRL, tcs)
DD_SHADER(TESS_EVAL, tes)
/********************************************************************
* immediate states
*/
#define DD_IMM_STATE(name, type, deref, ref) \
static void \
dd_context_set_##name(struct pipe_context *_pipe, type deref) \
{ \
struct dd_context *dctx = dd_context(_pipe); \
struct pipe_context *pipe = dctx->pipe; \
\
dctx->name = deref; \
pipe->set_##name(pipe, ref); \
}
DD_IMM_STATE(blend_color, const struct pipe_blend_color, *state, state)
DD_IMM_STATE(stencil_ref, const struct pipe_stencil_ref, *state, state)
DD_IMM_STATE(clip_state, const struct pipe_clip_state, *state, state)
DD_IMM_STATE(sample_mask, unsigned, sample_mask, sample_mask)
DD_IMM_STATE(min_samples, unsigned, min_samples, min_samples)
DD_IMM_STATE(framebuffer_state, const struct pipe_framebuffer_state, *state, state)
DD_IMM_STATE(polygon_stipple, const struct pipe_poly_stipple, *state, state)
static void
dd_context_set_constant_buffer(struct pipe_context *_pipe,
uint shader, uint index,
struct pipe_constant_buffer *constant_buffer)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->constant_buffers[shader][index], constant_buffer,
sizeof(*constant_buffer));
pipe->set_constant_buffer(pipe, shader, index, constant_buffer);
}
static void
dd_context_set_scissor_states(struct pipe_context *_pipe,
unsigned start_slot, unsigned num_scissors,
const struct pipe_scissor_state *states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->scissors[start_slot], states,
sizeof(*states) * num_scissors);
pipe->set_scissor_states(pipe, start_slot, num_scissors, states);
}
static void
dd_context_set_viewport_states(struct pipe_context *_pipe,
unsigned start_slot, unsigned num_viewports,
const struct pipe_viewport_state *states)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->viewports[start_slot], states,
sizeof(*states) * num_viewports);
pipe->set_viewport_states(pipe, start_slot, num_viewports, states);
}
static void dd_context_set_tess_state(struct pipe_context *_pipe,
const float default_outer_level[4],
const float default_inner_level[2])
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
memcpy(dctx->tess_default_levels, default_outer_level, sizeof(float) * 4);
memcpy(dctx->tess_default_levels+4, default_inner_level, sizeof(float) * 2);
pipe->set_tess_state(pipe, default_outer_level, default_inner_level);
}
/********************************************************************
* views
*/
static struct pipe_surface *
dd_context_create_surface(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_surface *surf_tmpl)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_surface *view =
pipe->create_surface(pipe, resource, surf_tmpl);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_surface_destroy(struct pipe_context *_pipe,
struct pipe_surface *surf)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->surface_destroy(pipe, surf);
}
static struct pipe_sampler_view *
dd_context_create_sampler_view(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_sampler_view *templ)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_sampler_view *view =
pipe->create_sampler_view(pipe, resource, templ);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_sampler_view_destroy(struct pipe_context *_pipe,
struct pipe_sampler_view *view)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->sampler_view_destroy(pipe, view);
}
static struct pipe_image_view *
dd_context_create_image_view(struct pipe_context *_pipe,
struct pipe_resource *resource,
const struct pipe_image_view *templ)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_image_view *view =
pipe->create_image_view(pipe, resource, templ);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_image_view_destroy(struct pipe_context *_pipe,
struct pipe_image_view *view)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->image_view_destroy(pipe, view);
}
static struct pipe_stream_output_target *
dd_context_create_stream_output_target(struct pipe_context *_pipe,
struct pipe_resource *res,
unsigned buffer_offset,
unsigned buffer_size)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
struct pipe_stream_output_target *view =
pipe->create_stream_output_target(pipe, res, buffer_offset,
buffer_size);
if (!view)
return NULL;
view->context = _pipe;
return view;
}
static void
dd_context_stream_output_target_destroy(struct pipe_context *_pipe,
struct pipe_stream_output_target *target)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->stream_output_target_destroy(pipe, target);
}
/********************************************************************
* set states
*/
static void
dd_context_set_sampler_views(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num,
struct pipe_sampler_view **views)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->sampler_views[shader][start], views,
sizeof(views[0]) * num);
pipe->set_sampler_views(pipe, shader, start, num, views);
}
static void
dd_context_set_shader_images(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num,
struct pipe_image_view **views)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->shader_images[shader][start], views,
sizeof(views[0]) * num);
pipe->set_shader_images(pipe, shader, start, num, views);
}
static void
dd_context_set_shader_buffers(struct pipe_context *_pipe, unsigned shader,
unsigned start, unsigned num_buffers,
struct pipe_shader_buffer *buffers)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->shader_buffers[shader][start], buffers,
sizeof(buffers[0]) * num_buffers);
pipe->set_shader_buffers(pipe, shader, start, num_buffers, buffers);
}
static void
dd_context_set_vertex_buffers(struct pipe_context *_pipe,
unsigned start, unsigned num_buffers,
const struct pipe_vertex_buffer *buffers)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->vertex_buffers[start], buffers,
sizeof(buffers[0]) * num_buffers);
pipe->set_vertex_buffers(pipe, start, num_buffers, buffers);
}
static void
dd_context_set_index_buffer(struct pipe_context *_pipe,
const struct pipe_index_buffer *ib)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
safe_memcpy(&dctx->index_buffer, ib, sizeof(*ib));
pipe->set_index_buffer(pipe, ib);
}
static void
dd_context_set_stream_output_targets(struct pipe_context *_pipe,
unsigned num_targets,
struct pipe_stream_output_target **tgs,
const unsigned *offsets)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
dctx->num_so_targets = num_targets;
safe_memcpy(dctx->so_targets, tgs, sizeof(*tgs) * num_targets);
safe_memcpy(dctx->so_offsets, offsets, sizeof(*offsets) * num_targets);
pipe->set_stream_output_targets(pipe, num_targets, tgs, offsets);
}
static void
dd_context_destroy(struct pipe_context *_pipe)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
pipe->destroy(pipe);
FREE(dctx);
}
/********************************************************************
* transfer
*/
static void *
dd_context_transfer_map(struct pipe_context *_pipe,
struct pipe_resource *resource, unsigned level,
unsigned usage, const struct pipe_box *box,
struct pipe_transfer **transfer)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->transfer_map(pipe, resource, level, usage, box, transfer);
}
static void
dd_context_transfer_flush_region(struct pipe_context *_pipe,
struct pipe_transfer *transfer,
const struct pipe_box *box)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_flush_region(pipe, transfer, box);
}
static void
dd_context_transfer_unmap(struct pipe_context *_pipe,
struct pipe_transfer *transfer)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_unmap(pipe, transfer);
}
static void
dd_context_transfer_inline_write(struct pipe_context *_pipe,
struct pipe_resource *resource,
unsigned level, unsigned usage,
const struct pipe_box *box,
const void *data, unsigned stride,
unsigned layer_stride)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->transfer_inline_write(pipe, resource, level, usage, box, data,
stride, layer_stride);
}
/********************************************************************
* miscellaneous
*/
static void
dd_context_texture_barrier(struct pipe_context *_pipe)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->texture_barrier(pipe);
}
static void
dd_context_memory_barrier(struct pipe_context *_pipe, unsigned flags)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->memory_barrier(pipe, flags);
}
static void
dd_context_get_sample_position(struct pipe_context *_pipe,
unsigned sample_count, unsigned sample_index,
float *out_value)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->get_sample_position(pipe, sample_count, sample_index,
out_value);
}
static void
dd_context_invalidate_resource(struct pipe_context *_pipe,
struct pipe_resource *resource)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
pipe->invalidate_resource(pipe, resource);
}
static enum pipe_reset_status
dd_context_get_device_reset_status(struct pipe_context *_pipe)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->get_device_reset_status(pipe);
}
static void
dd_context_dump_debug_state(struct pipe_context *_pipe, FILE *stream,
unsigned flags)
{
struct pipe_context *pipe = dd_context(_pipe)->pipe;
return pipe->dump_debug_state(pipe, stream, flags);
}
struct pipe_context *
dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
{
struct dd_context *dctx;
if (!pipe)
return NULL;
dctx = CALLOC_STRUCT(dd_context);
if (!dctx) {
pipe->destroy(pipe);
return NULL;
}
dctx->pipe = pipe;
dctx->base.priv = pipe->priv; /* expose wrapped priv data */
dctx->base.screen = &dscreen->base;
dctx->base.destroy = dd_context_destroy;
CTX_INIT(render_condition);
CTX_INIT(create_query);
CTX_INIT(destroy_query);
CTX_INIT(begin_query);
CTX_INIT(end_query);
CTX_INIT(get_query_result);
CTX_INIT(create_blend_state);
CTX_INIT(bind_blend_state);
CTX_INIT(delete_blend_state);
CTX_INIT(create_sampler_state);
CTX_INIT(bind_sampler_states);
CTX_INIT(delete_sampler_state);
CTX_INIT(create_rasterizer_state);
CTX_INIT(bind_rasterizer_state);
CTX_INIT(delete_rasterizer_state);
CTX_INIT(create_depth_stencil_alpha_state);
CTX_INIT(bind_depth_stencil_alpha_state);
CTX_INIT(delete_depth_stencil_alpha_state);
CTX_INIT(create_fs_state);
CTX_INIT(bind_fs_state);
CTX_INIT(delete_fs_state);
CTX_INIT(create_vs_state);
CTX_INIT(bind_vs_state);
CTX_INIT(delete_vs_state);
CTX_INIT(create_gs_state);
CTX_INIT(bind_gs_state);
CTX_INIT(delete_gs_state);
CTX_INIT(create_tcs_state);
CTX_INIT(bind_tcs_state);
CTX_INIT(delete_tcs_state);
CTX_INIT(create_tes_state);
CTX_INIT(bind_tes_state);
CTX_INIT(delete_tes_state);
CTX_INIT(create_vertex_elements_state);
CTX_INIT(bind_vertex_elements_state);
CTX_INIT(delete_vertex_elements_state);
CTX_INIT(set_blend_color);
CTX_INIT(set_stencil_ref);
CTX_INIT(set_sample_mask);
CTX_INIT(set_min_samples);
CTX_INIT(set_clip_state);
CTX_INIT(set_constant_buffer);
CTX_INIT(set_framebuffer_state);
CTX_INIT(set_polygon_stipple);
CTX_INIT(set_scissor_states);
CTX_INIT(set_viewport_states);
CTX_INIT(set_sampler_views);
CTX_INIT(set_tess_state);
CTX_INIT(set_shader_buffers);
CTX_INIT(set_shader_images);
CTX_INIT(set_vertex_buffers);
CTX_INIT(set_index_buffer);
CTX_INIT(create_stream_output_target);
CTX_INIT(stream_output_target_destroy);
CTX_INIT(set_stream_output_targets);
CTX_INIT(create_sampler_view);
CTX_INIT(sampler_view_destroy);
CTX_INIT(create_surface);
CTX_INIT(surface_destroy);
CTX_INIT(create_image_view);
CTX_INIT(image_view_destroy);
CTX_INIT(transfer_map);
CTX_INIT(transfer_flush_region);
CTX_INIT(transfer_unmap);
CTX_INIT(transfer_inline_write);
CTX_INIT(texture_barrier);
CTX_INIT(memory_barrier);
/* create_video_codec */
/* create_video_buffer */
/* create_compute_state */
/* bind_compute_state */
/* delete_compute_state */
/* set_compute_resources */
/* set_global_binding */
CTX_INIT(get_sample_position);
CTX_INIT(invalidate_resource);
CTX_INIT(get_device_reset_status);
CTX_INIT(dump_debug_state);
dd_init_draw_functions(dctx);
dctx->sample_mask = ~0;
return &dctx->base;
}

View File

@@ -0,0 +1,807 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "util/u_dump.h"
#include "util/u_format.h"
#include "tgsi/tgsi_scan.h"
#include "os/os_process.h"
#include <errno.h>
#include <sys/stat.h>
enum call_type
{
CALL_DRAW_VBO,
CALL_RESOURCE_COPY_REGION,
CALL_BLIT,
CALL_FLUSH_RESOURCE,
CALL_CLEAR,
CALL_CLEAR_BUFFER,
CALL_CLEAR_RENDER_TARGET,
CALL_CLEAR_DEPTH_STENCIL,
};
struct call_resource_copy_region
{
struct pipe_resource *dst;
unsigned dst_level;
unsigned dstx, dsty, dstz;
struct pipe_resource *src;
unsigned src_level;
const struct pipe_box *src_box;
};
struct call_clear
{
unsigned buffers;
const union pipe_color_union *color;
double depth;
unsigned stencil;
};
struct call_clear_buffer
{
struct pipe_resource *res;
unsigned offset;
unsigned size;
const void *clear_value;
int clear_value_size;
};
struct dd_call
{
enum call_type type;
union {
struct pipe_draw_info draw_vbo;
struct call_resource_copy_region resource_copy_region;
struct pipe_blit_info blit;
struct pipe_resource *flush_resource;
struct call_clear clear;
struct call_clear_buffer clear_buffer;
} info;
};
static FILE *
dd_get_file_stream(struct dd_context *dctx)
{
struct pipe_screen *screen = dctx->pipe->screen;
static unsigned index;
char proc_name[128], dir[256], name[512];
FILE *f;
if (!os_get_process_name(proc_name, sizeof(proc_name))) {
fprintf(stderr, "dd: can't get the process name\n");
return NULL;
}
snprintf(dir, sizeof(dir), "%s/"DD_DIR, debug_get_option("HOME", "."));
if (mkdir(dir, 0774) && errno != EEXIST) {
fprintf(stderr, "dd: can't create a directory (%i)\n", errno);
return NULL;
}
snprintf(name, sizeof(name), "%s/%s_%u_%08u", dir, proc_name, getpid(), index++);
f = fopen(name, "w");
if (!f) {
fprintf(stderr, "dd: can't open file %s\n", name);
return NULL;
}
fprintf(f, "Driver vendor: %s\n", screen->get_vendor(screen));
fprintf(f, "Device vendor: %s\n", screen->get_device_vendor(screen));
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
return f;
}
static void
dd_close_file_stream(FILE *f)
{
fclose(f);
}
static unsigned
dd_num_active_viewports(struct dd_context *dctx)
{
struct tgsi_shader_info info;
const struct tgsi_token *tokens;
if (dctx->shaders[PIPE_SHADER_GEOMETRY])
tokens = dctx->shaders[PIPE_SHADER_GEOMETRY]->state.shader.tokens;
else if (dctx->shaders[PIPE_SHADER_TESS_EVAL])
tokens = dctx->shaders[PIPE_SHADER_TESS_EVAL]->state.shader.tokens;
else if (dctx->shaders[PIPE_SHADER_VERTEX])
tokens = dctx->shaders[PIPE_SHADER_VERTEX]->state.shader.tokens;
else
return 1;
tgsi_scan_shader(tokens, &info);
return info.writes_viewport_index ? PIPE_MAX_VIEWPORTS : 1;
}
#define COLOR_RESET "\033[0m"
#define COLOR_SHADER "\033[1;32m"
#define COLOR_STATE "\033[1;33m"
#define DUMP(name, var) do { \
fprintf(f, COLOR_STATE #name ": " COLOR_RESET); \
util_dump_##name(f, var); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_I(name, var, i) do { \
fprintf(f, COLOR_STATE #name " %i: " COLOR_RESET, i); \
util_dump_##name(f, var); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_M(name, var, member) do { \
fprintf(f, " " #member ": "); \
util_dump_##name(f, (var)->member); \
fprintf(f, "\n"); \
} while(0)
#define DUMP_M_ADDR(name, var, member) do { \
fprintf(f, " " #member ": "); \
util_dump_##name(f, &(var)->member); \
fprintf(f, "\n"); \
} while(0)
static void
print_named_value(FILE *f, const char *name, int value)
{
fprintf(f, COLOR_STATE "%s" COLOR_RESET " = %i\n", name, value);
}
static void
print_named_xvalue(FILE *f, const char *name, int value)
{
fprintf(f, COLOR_STATE "%s" COLOR_RESET " = 0x%08x\n", name, value);
}
static void
util_dump_uint(FILE *f, unsigned i)
{
fprintf(f, "%u", i);
}
static void
util_dump_hex(FILE *f, unsigned i)
{
fprintf(f, "0x%x", i);
}
static void
util_dump_double(FILE *f, double d)
{
fprintf(f, "%f", d);
}
static void
util_dump_format(FILE *f, enum pipe_format format)
{
fprintf(f, "%s", util_format_name(format));
}
static void
util_dump_color_union(FILE *f, const union pipe_color_union *color)
{
fprintf(f, "{f = {%f, %f, %f, %f}, ui = {%u, %u, %u, %u}",
color->f[0], color->f[1], color->f[2], color->f[3],
color->ui[0], color->ui[1], color->ui[2], color->ui[3]);
}
static void
util_dump_query(FILE *f, struct dd_query *query)
{
if (query->type >= PIPE_QUERY_DRIVER_SPECIFIC)
fprintf(f, "PIPE_QUERY_DRIVER_SPECIFIC + %i",
query->type - PIPE_QUERY_DRIVER_SPECIFIC);
else
fprintf(f, "%s", util_dump_query_type(query->type, false));
}
static void
dd_dump_render_condition(struct dd_context *dctx, FILE *f)
{
if (dctx->render_cond.query) {
fprintf(f, "render condition:\n");
DUMP_M(query, &dctx->render_cond, query);
DUMP_M(uint, &dctx->render_cond, condition);
DUMP_M(uint, &dctx->render_cond, mode);
fprintf(f, "\n");
}
}
static void
dd_dump_draw_vbo(struct dd_context *dctx, struct pipe_draw_info *info, FILE *f)
{
int sh, i;
const char *shader_str[PIPE_SHADER_TYPES];
shader_str[PIPE_SHADER_VERTEX] = "VERTEX";
shader_str[PIPE_SHADER_TESS_CTRL] = "TESS_CTRL";
shader_str[PIPE_SHADER_TESS_EVAL] = "TESS_EVAL";
shader_str[PIPE_SHADER_GEOMETRY] = "GEOMETRY";
shader_str[PIPE_SHADER_FRAGMENT] = "FRAGMENT";
shader_str[PIPE_SHADER_COMPUTE] = "COMPUTE";
DUMP(draw_info, info);
if (info->indexed) {
DUMP(index_buffer, &dctx->index_buffer);
if (dctx->index_buffer.buffer)
DUMP_M(resource, &dctx->index_buffer, buffer);
}
if (info->count_from_stream_output)
DUMP_M(stream_output_target, info,
count_from_stream_output);
if (info->indirect)
DUMP_M(resource, info, indirect);
fprintf(f, "\n");
/* TODO: dump active queries */
dd_dump_render_condition(dctx, f);
for (i = 0; i < PIPE_MAX_ATTRIBS; i++)
if (dctx->vertex_buffers[i].buffer ||
dctx->vertex_buffers[i].user_buffer) {
DUMP_I(vertex_buffer, &dctx->vertex_buffers[i], i);
if (dctx->vertex_buffers[i].buffer)
DUMP_M(resource, &dctx->vertex_buffers[i], buffer);
}
if (dctx->velems) {
print_named_value(f, "num vertex elements",
dctx->velems->state.velems.count);
for (i = 0; i < dctx->velems->state.velems.count; i++) {
fprintf(f, " ");
DUMP_I(vertex_element, &dctx->velems->state.velems.velems[i], i);
}
}
print_named_value(f, "num stream output targets", dctx->num_so_targets);
for (i = 0; i < dctx->num_so_targets; i++)
if (dctx->so_targets[i]) {
DUMP_I(stream_output_target, dctx->so_targets[i], i);
DUMP_M(resource, dctx->so_targets[i], buffer);
fprintf(f, " offset = %i\n", dctx->so_offsets[i]);
}
fprintf(f, "\n");
for (sh = 0; sh < PIPE_SHADER_TYPES; sh++) {
if (sh == PIPE_SHADER_COMPUTE)
continue;
if (sh == PIPE_SHADER_TESS_CTRL &&
!dctx->shaders[PIPE_SHADER_TESS_CTRL] &&
dctx->shaders[PIPE_SHADER_TESS_EVAL])
fprintf(f, "tess_state: {default_outer_level = {%f, %f, %f, %f}, "
"default_inner_level = {%f, %f}}\n",
dctx->tess_default_levels[0],
dctx->tess_default_levels[1],
dctx->tess_default_levels[2],
dctx->tess_default_levels[3],
dctx->tess_default_levels[4],
dctx->tess_default_levels[5]);
if (sh == PIPE_SHADER_FRAGMENT)
if (dctx->rs) {
unsigned num_viewports = dd_num_active_viewports(dctx);
if (dctx->rs->state.rs.clip_plane_enable)
DUMP(clip_state, &dctx->clip_state);
for (i = 0; i < num_viewports; i++)
DUMP_I(viewport_state, &dctx->viewports[i], i);
if (dctx->rs->state.rs.scissor)
for (i = 0; i < num_viewports; i++)
DUMP_I(scissor_state, &dctx->scissors[i], i);
DUMP(rasterizer_state, &dctx->rs->state.rs);
if (dctx->rs->state.rs.poly_stipple_enable)
DUMP(poly_stipple, &dctx->polygon_stipple);
fprintf(f, "\n");
}
if (!dctx->shaders[sh])
continue;
fprintf(f, COLOR_SHADER "begin shader: %s" COLOR_RESET "\n", shader_str[sh]);
DUMP(shader_state, &dctx->shaders[sh]->state.shader);
for (i = 0; i < PIPE_MAX_CONSTANT_BUFFERS; i++)
if (dctx->constant_buffers[sh][i].buffer ||
dctx->constant_buffers[sh][i].user_buffer) {
DUMP_I(constant_buffer, &dctx->constant_buffers[sh][i], i);
if (dctx->constant_buffers[sh][i].buffer)
DUMP_M(resource, &dctx->constant_buffers[sh][i], buffer);
}
for (i = 0; i < PIPE_MAX_SAMPLERS; i++)
if (dctx->sampler_states[sh][i])
DUMP_I(sampler_state, &dctx->sampler_states[sh][i]->state.sampler, i);
for (i = 0; i < PIPE_MAX_SAMPLERS; i++)
if (dctx->sampler_views[sh][i]) {
DUMP_I(sampler_view, dctx->sampler_views[sh][i], i);
DUMP_M(resource, dctx->sampler_views[sh][i], texture);
}
/* TODO: print shader images */
/* TODO: print shader buffers */
fprintf(f, COLOR_SHADER "end shader: %s" COLOR_RESET "\n\n", shader_str[sh]);
}
if (dctx->dsa)
DUMP(depth_stencil_alpha_state, &dctx->dsa->state.dsa);
DUMP(stencil_ref, &dctx->stencil_ref);
if (dctx->blend)
DUMP(blend_state, &dctx->blend->state.blend);
DUMP(blend_color, &dctx->blend_color);
print_named_value(f, "min_samples", dctx->min_samples);
print_named_xvalue(f, "sample_mask", dctx->sample_mask);
fprintf(f, "\n");
DUMP(framebuffer_state, &dctx->framebuffer_state);
for (i = 0; i < dctx->framebuffer_state.nr_cbufs; i++)
if (dctx->framebuffer_state.cbufs[i]) {
fprintf(f, " " COLOR_STATE "cbufs[%i]:" COLOR_RESET "\n ", i);
DUMP(surface, dctx->framebuffer_state.cbufs[i]);
fprintf(f, " ");
DUMP(resource, dctx->framebuffer_state.cbufs[i]->texture);
}
if (dctx->framebuffer_state.zsbuf) {
fprintf(f, " " COLOR_STATE "zsbuf:" COLOR_RESET "\n ");
DUMP(surface, dctx->framebuffer_state.zsbuf);
fprintf(f, " ");
DUMP(resource, dctx->framebuffer_state.zsbuf->texture);
}
fprintf(f, "\n");
}
static void
dd_dump_resource_copy_region(struct dd_context *dctx,
struct call_resource_copy_region *info,
FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, dst);
DUMP_M(uint, info, dst_level);
DUMP_M(uint, info, dstx);
DUMP_M(uint, info, dsty);
DUMP_M(uint, info, dstz);
DUMP_M(resource, info, src);
DUMP_M(uint, info, src_level);
DUMP_M(box, info, src_box);
}
static void
dd_dump_blit(struct dd_context *dctx, struct pipe_blit_info *info, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, dst.resource);
DUMP_M(uint, info, dst.level);
DUMP_M_ADDR(box, info, dst.box);
DUMP_M(format, info, dst.format);
DUMP_M(resource, info, src.resource);
DUMP_M(uint, info, src.level);
DUMP_M_ADDR(box, info, src.box);
DUMP_M(format, info, src.format);
DUMP_M(hex, info, mask);
DUMP_M(uint, info, filter);
DUMP_M(uint, info, scissor_enable);
DUMP_M_ADDR(scissor_state, info, scissor);
DUMP_M(uint, info, render_condition_enable);
if (info->render_condition_enable)
dd_dump_render_condition(dctx, f);
}
static void
dd_dump_flush_resource(struct dd_context *dctx, struct pipe_resource *res,
FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP(resource, res);
}
static void
dd_dump_clear(struct dd_context *dctx, struct call_clear *info, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
DUMP_M(uint, info, buffers);
DUMP_M(color_union, info, color);
DUMP_M(double, info, depth);
DUMP_M(hex, info, stencil);
}
static void
dd_dump_clear_buffer(struct dd_context *dctx, struct call_clear_buffer *info,
FILE *f)
{
int i;
const char *value = (const char*)info->clear_value;
fprintf(f, "%s:\n", __func__+8);
DUMP_M(resource, info, res);
DUMP_M(uint, info, offset);
DUMP_M(uint, info, size);
DUMP_M(uint, info, clear_value_size);
fprintf(f, " clear_value:");
for (i = 0; i < info->clear_value_size; i++)
fprintf(f, " %02x", value[i]);
fprintf(f, "\n");
}
static void
dd_dump_clear_render_target(struct dd_context *dctx, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
/* TODO */
}
static void
dd_dump_clear_depth_stencil(struct dd_context *dctx, FILE *f)
{
fprintf(f, "%s:\n", __func__+8);
/* TODO */
}
static void
dd_dump_driver_state(struct dd_context *dctx, FILE *f, unsigned flags)
{
if (dctx->pipe->dump_debug_state) {
fprintf(f,"\n\n**************************************************"
"***************************\n");
fprintf(f, "Driver-specific state:\n\n");
dctx->pipe->dump_debug_state(dctx->pipe, f, flags);
}
}
static void
dd_dump_call(struct dd_context *dctx, struct dd_call *call, unsigned flags)
{
FILE *f = dd_get_file_stream(dctx);
if (!f)
return;
switch (call->type) {
case CALL_DRAW_VBO:
dd_dump_draw_vbo(dctx, &call->info.draw_vbo, f);
break;
case CALL_RESOURCE_COPY_REGION:
dd_dump_resource_copy_region(dctx, &call->info.resource_copy_region, f);
break;
case CALL_BLIT:
dd_dump_blit(dctx, &call->info.blit, f);
break;
case CALL_FLUSH_RESOURCE:
dd_dump_flush_resource(dctx, call->info.flush_resource, f);
break;
case CALL_CLEAR:
dd_dump_clear(dctx, &call->info.clear, f);
break;
case CALL_CLEAR_BUFFER:
dd_dump_clear_buffer(dctx, &call->info.clear_buffer, f);
break;
case CALL_CLEAR_RENDER_TARGET:
dd_dump_clear_render_target(dctx, f);
break;
case CALL_CLEAR_DEPTH_STENCIL:
dd_dump_clear_depth_stencil(dctx, f);
}
dd_dump_driver_state(dctx, f, flags);
dd_close_file_stream(f);
}
static void
dd_kill_process(void)
{
sync();
fprintf(stderr, "dd: Aborting the process...\n");
fflush(stdout);
fflush(stderr);
abort();
}
static bool
dd_flush_and_check_hang(struct dd_context *dctx,
struct pipe_fence_handle **flush_fence,
unsigned flush_flags)
{
struct pipe_fence_handle *fence = NULL;
struct pipe_context *pipe = dctx->pipe;
struct pipe_screen *screen = pipe->screen;
uint64_t timeout_ms = dd_screen(dctx->base.screen)->timeout_ms;
bool idle;
assert(timeout_ms > 0);
pipe->flush(pipe, &fence, flush_flags);
if (flush_fence)
screen->fence_reference(screen, flush_fence, fence);
if (!fence)
return false;
idle = screen->fence_finish(screen, fence, timeout_ms * 1000000);
screen->fence_reference(screen, &fence, NULL);
if (!idle)
fprintf(stderr, "dd: GPU hang detected!\n");
return !idle;
}
static void
dd_flush_and_handle_hang(struct dd_context *dctx,
struct pipe_fence_handle **fence, unsigned flags,
const char *cause)
{
if (dd_flush_and_check_hang(dctx, fence, flags)) {
FILE *f = dd_get_file_stream(dctx);
if (f) {
fprintf(f, "dd: %s.\n", cause);
dd_dump_driver_state(dctx, f, PIPE_DEBUG_DEVICE_IS_HUNG);
dd_close_file_stream(f);
}
/* Terminate the process to prevent future hangs. */
dd_kill_process();
}
}
static void
dd_context_flush(struct pipe_context *_pipe,
struct pipe_fence_handle **fence, unsigned flags)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
switch (dd_screen(dctx->base.screen)->mode) {
case DD_DETECT_HANGS:
dd_flush_and_handle_hang(dctx, fence, flags,
"GPU hang detected in pipe->flush()");
break;
case DD_DUMP_ALL_CALLS:
pipe->flush(pipe, fence, flags);
break;
default:
assert(0);
}
}
static void
dd_before_draw(struct dd_context *dctx)
{
if (dd_screen(dctx->base.screen)->mode == DD_DETECT_HANGS &&
!dd_screen(dctx->base.screen)->no_flush)
dd_flush_and_handle_hang(dctx, NULL, 0,
"GPU hang most likely caused by internal "
"driver commands");
}
static void
dd_after_draw(struct dd_context *dctx, struct dd_call *call)
{
switch (dd_screen(dctx->base.screen)->mode) {
case DD_DETECT_HANGS:
if (!dd_screen(dctx->base.screen)->no_flush &&
dd_flush_and_check_hang(dctx, NULL, 0)) {
dd_dump_call(dctx, call, PIPE_DEBUG_DEVICE_IS_HUNG);
/* Terminate the process to prevent future hangs. */
dd_kill_process();
}
break;
case DD_DUMP_ALL_CALLS:
dd_dump_call(dctx, call, 0);
break;
default:
assert(0);
}
}
static void
dd_context_draw_vbo(struct pipe_context *_pipe,
const struct pipe_draw_info *info)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_DRAW_VBO;
call.info.draw_vbo = *info;
dd_before_draw(dctx);
pipe->draw_vbo(pipe, info);
dd_after_draw(dctx, &call);
}
static void
dd_context_resource_copy_region(struct pipe_context *_pipe,
struct pipe_resource *dst, unsigned dst_level,
unsigned dstx, unsigned dsty, unsigned dstz,
struct pipe_resource *src, unsigned src_level,
const struct pipe_box *src_box)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_RESOURCE_COPY_REGION;
call.info.resource_copy_region.dst = dst;
call.info.resource_copy_region.dst_level = dst_level;
call.info.resource_copy_region.dstx = dstx;
call.info.resource_copy_region.dsty = dsty;
call.info.resource_copy_region.dstz = dstz;
call.info.resource_copy_region.src = src;
call.info.resource_copy_region.src_level = src_level;
call.info.resource_copy_region.src_box = src_box;
dd_before_draw(dctx);
pipe->resource_copy_region(pipe,
dst, dst_level, dstx, dsty, dstz,
src, src_level, src_box);
dd_after_draw(dctx, &call);
}
static void
dd_context_blit(struct pipe_context *_pipe, const struct pipe_blit_info *info)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_BLIT;
call.info.blit = *info;
dd_before_draw(dctx);
pipe->blit(pipe, info);
dd_after_draw(dctx, &call);
}
static void
dd_context_flush_resource(struct pipe_context *_pipe,
struct pipe_resource *resource)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_FLUSH_RESOURCE;
call.info.flush_resource = resource;
dd_before_draw(dctx);
pipe->flush_resource(pipe, resource);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear(struct pipe_context *_pipe, unsigned buffers,
const union pipe_color_union *color, double depth,
unsigned stencil)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR;
call.info.clear.buffers = buffers;
call.info.clear.color = color;
call.info.clear.depth = depth;
call.info.clear.stencil = stencil;
dd_before_draw(dctx);
pipe->clear(pipe, buffers, color, depth, stencil);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_render_target(struct pipe_context *_pipe,
struct pipe_surface *dst,
const union pipe_color_union *color,
unsigned dstx, unsigned dsty,
unsigned width, unsigned height)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_RENDER_TARGET;
dd_before_draw(dctx);
pipe->clear_render_target(pipe, dst, color, dstx, dsty, width, height);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_depth_stencil(struct pipe_context *_pipe,
struct pipe_surface *dst, unsigned clear_flags,
double depth, unsigned stencil, unsigned dstx,
unsigned dsty, unsigned width, unsigned height)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_DEPTH_STENCIL;
dd_before_draw(dctx);
pipe->clear_depth_stencil(pipe, dst, clear_flags, depth, stencil,
dstx, dsty, width, height);
dd_after_draw(dctx, &call);
}
static void
dd_context_clear_buffer(struct pipe_context *_pipe, struct pipe_resource *res,
unsigned offset, unsigned size,
const void *clear_value, int clear_value_size)
{
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
struct dd_call call;
call.type = CALL_CLEAR_BUFFER;
call.info.clear_buffer.res = res;
call.info.clear_buffer.offset = offset;
call.info.clear_buffer.size = size;
call.info.clear_buffer.clear_value = clear_value;
call.info.clear_buffer.clear_value_size = clear_value_size;
dd_before_draw(dctx);
pipe->clear_buffer(pipe, res, offset, size, clear_value, clear_value_size);
dd_after_draw(dctx, &call);
}
void
dd_init_draw_functions(struct dd_context *dctx)
{
CTX_INIT(flush);
CTX_INIT(draw_vbo);
CTX_INIT(resource_copy_region);
CTX_INIT(blit);
CTX_INIT(clear);
CTX_INIT(clear_render_target);
CTX_INIT(clear_depth_stencil);
CTX_INIT(clear_buffer);
CTX_INIT(flush_resource);
/* launch_grid */
}

View File

@@ -0,0 +1,141 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#ifndef DD_H_
#define DD_H_
#include "pipe/p_context.h"
#include "pipe/p_state.h"
#include "pipe/p_screen.h"
/* name of the directory in home */
#define DD_DIR "ddebug_dumps"
enum dd_mode {
DD_DETECT_HANGS,
DD_DUMP_ALL_CALLS
};
struct dd_screen
{
struct pipe_screen base;
struct pipe_screen *screen;
unsigned timeout_ms;
enum dd_mode mode;
bool no_flush;
};
struct dd_query
{
unsigned type;
struct pipe_query *query;
};
struct dd_state
{
void *cso;
union {
struct pipe_blend_state blend;
struct pipe_depth_stencil_alpha_state dsa;
struct pipe_rasterizer_state rs;
struct pipe_sampler_state sampler;
struct {
struct pipe_vertex_element velems[PIPE_MAX_ATTRIBS];
unsigned count;
} velems;
struct pipe_shader_state shader;
} state;
};
struct dd_context
{
struct pipe_context base;
struct pipe_context *pipe;
struct {
struct dd_query *query;
bool condition;
unsigned mode;
} render_cond;
struct pipe_index_buffer index_buffer;
struct pipe_vertex_buffer vertex_buffers[PIPE_MAX_ATTRIBS];
unsigned num_so_targets;
struct pipe_stream_output_target *so_targets[PIPE_MAX_SO_BUFFERS];
unsigned so_offsets[PIPE_MAX_SO_BUFFERS];
struct dd_state *shaders[PIPE_SHADER_TYPES];
struct pipe_constant_buffer constant_buffers[PIPE_SHADER_TYPES][PIPE_MAX_CONSTANT_BUFFERS];
struct pipe_sampler_view *sampler_views[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS];
struct dd_state *sampler_states[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS];
struct pipe_image_view *shader_images[PIPE_SHADER_TYPES][PIPE_MAX_SHADER_IMAGES];
struct pipe_shader_buffer shader_buffers[PIPE_SHADER_TYPES][PIPE_MAX_SHADER_BUFFERS];
struct dd_state *velems;
struct dd_state *rs;
struct dd_state *dsa;
struct dd_state *blend;
struct pipe_blend_color blend_color;
struct pipe_stencil_ref stencil_ref;
unsigned sample_mask;
unsigned min_samples;
struct pipe_clip_state clip_state;
struct pipe_framebuffer_state framebuffer_state;
struct pipe_poly_stipple polygon_stipple;
struct pipe_scissor_state scissors[PIPE_MAX_VIEWPORTS];
struct pipe_viewport_state viewports[PIPE_MAX_VIEWPORTS];
float tess_default_levels[6];
};
struct pipe_context *
dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe);
void
dd_init_draw_functions(struct dd_context *dctx);
static inline struct dd_context *
dd_context(struct pipe_context *pipe)
{
return (struct dd_context *)pipe;
}
static inline struct dd_screen *
dd_screen(struct pipe_screen *screen)
{
return (struct dd_screen*)screen;
}
#define CTX_INIT(_member) \
dctx->base._member = dctx->pipe->_member ? dd_context_##_member : NULL
#endif /* DD_H_ */

View File

@@ -0,0 +1,36 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2010 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#ifndef DD_PUBLIC_H_
#define DD_PUBLIC_H_
struct pipe_screen;
struct pipe_screen *
ddebug_screen_create(struct pipe_screen *screen);
#endif /* DD_PUBLIC_H_ */

View File

@@ -0,0 +1,353 @@
/**************************************************************************
*
* Copyright 2015 Advanced Micro Devices, Inc.
* Copyright 2008 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "dd_pipe.h"
#include "dd_public.h"
#include "util/u_memory.h"
#include <stdio.h>
static const char *
dd_screen_get_name(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_name(screen);
}
static const char *
dd_screen_get_vendor(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_vendor(screen);
}
static const char *
dd_screen_get_device_vendor(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_device_vendor(screen);
}
static int
dd_screen_get_param(struct pipe_screen *_screen,
enum pipe_cap param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_param(screen, param);
}
static float
dd_screen_get_paramf(struct pipe_screen *_screen,
enum pipe_capf param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_paramf(screen, param);
}
static int
dd_screen_get_shader_param(struct pipe_screen *_screen, unsigned shader,
enum pipe_shader_cap param)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_shader_param(screen, shader, param);
}
static uint64_t
dd_screen_get_timestamp(struct pipe_screen *_screen)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_timestamp(screen);
}
static struct pipe_context *
dd_screen_context_create(struct pipe_screen *_screen, void *priv,
unsigned flags)
{
struct dd_screen *dscreen = dd_screen(_screen);
struct pipe_screen *screen = dscreen->screen;
flags |= PIPE_CONTEXT_DEBUG;
return dd_context_create(dscreen,
screen->context_create(screen, priv, flags));
}
static boolean
dd_screen_is_format_supported(struct pipe_screen *_screen,
enum pipe_format format,
enum pipe_texture_target target,
unsigned sample_count,
unsigned tex_usage)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->is_format_supported(screen, format, target, sample_count,
tex_usage);
}
static boolean
dd_screen_can_create_resource(struct pipe_screen *_screen,
const struct pipe_resource *templat)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->can_create_resource(screen, templat);
}
static void
dd_screen_flush_frontbuffer(struct pipe_screen *_screen,
struct pipe_resource *resource,
unsigned level, unsigned layer,
void *context_private,
struct pipe_box *sub_box)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->flush_frontbuffer(screen, resource, level, layer, context_private,
sub_box);
}
static int
dd_screen_get_driver_query_info(struct pipe_screen *_screen,
unsigned index,
struct pipe_driver_query_info *info)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_driver_query_info(screen, index, info);
}
static int
dd_screen_get_driver_query_group_info(struct pipe_screen *_screen,
unsigned index,
struct pipe_driver_query_group_info *info)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->get_driver_query_group_info(screen, index, info);
}
/********************************************************************
* resource
*/
static struct pipe_resource *
dd_screen_resource_create(struct pipe_screen *_screen,
const struct pipe_resource *templat)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res = screen->resource_create(screen, templat);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static struct pipe_resource *
dd_screen_resource_from_handle(struct pipe_screen *_screen,
const struct pipe_resource *templ,
struct winsys_handle *handle)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res =
screen->resource_from_handle(screen, templ, handle);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static struct pipe_resource *
dd_screen_resource_from_user_memory(struct pipe_screen *_screen,
const struct pipe_resource *templ,
void *user_memory)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
struct pipe_resource *res =
screen->resource_from_user_memory(screen, templ, user_memory);
if (!res)
return NULL;
res->screen = _screen;
return res;
}
static void
dd_screen_resource_destroy(struct pipe_screen *_screen,
struct pipe_resource *res)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->resource_destroy(screen, res);
}
static boolean
dd_screen_resource_get_handle(struct pipe_screen *_screen,
struct pipe_resource *resource,
struct winsys_handle *handle)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->resource_get_handle(screen, resource, handle);
}
/********************************************************************
* fence
*/
static void
dd_screen_fence_reference(struct pipe_screen *_screen,
struct pipe_fence_handle **pdst,
struct pipe_fence_handle *src)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
screen->fence_reference(screen, pdst, src);
}
static boolean
dd_screen_fence_finish(struct pipe_screen *_screen,
struct pipe_fence_handle *fence,
uint64_t timeout)
{
struct pipe_screen *screen = dd_screen(_screen)->screen;
return screen->fence_finish(screen, fence, timeout);
}
/********************************************************************
* screen
*/
static void
dd_screen_destroy(struct pipe_screen *_screen)
{
struct dd_screen *dscreen = dd_screen(_screen);
struct pipe_screen *screen = dscreen->screen;
screen->destroy(screen);
FREE(dscreen);
}
struct pipe_screen *
ddebug_screen_create(struct pipe_screen *screen)
{
struct dd_screen *dscreen;
const char *option = debug_get_option("GALLIUM_DDEBUG", NULL);
bool dump_always = option && !strcmp(option, "always");
bool no_flush = option && strstr(option, "noflush");
bool help = option && !strcmp(option, "help");
unsigned timeout = 0;
if (help) {
puts("Gallium driver debugger");
puts("");
puts("Usage:");
puts("");
puts(" GALLIUM_DDEBUG=always");
puts(" Dump context and driver information after every draw call into");
puts(" $HOME/"DD_DIR"/.");
puts("");
puts(" GALLIUM_DDEBUG=[timeout in ms] noflush");
puts(" Flush and detect a device hang after every draw call based on the given");
puts(" fence timeout and dump context and driver information into");
puts(" $HOME/"DD_DIR"/ when a hang is detected.");
puts(" If 'noflush' is specified, only detect hangs in pipe->flush.");
puts("");
exit(0);
}
if (!option)
return screen;
if (!dump_always && sscanf(option, "%u", &timeout) != 1)
return screen;
dscreen = CALLOC_STRUCT(dd_screen);
if (!dscreen)
return NULL;
#define SCR_INIT(_member) \
dscreen->base._member = screen->_member ? dd_screen_##_member : NULL
dscreen->base.destroy = dd_screen_destroy;
dscreen->base.get_name = dd_screen_get_name;
dscreen->base.get_vendor = dd_screen_get_vendor;
dscreen->base.get_device_vendor = dd_screen_get_device_vendor;
dscreen->base.get_param = dd_screen_get_param;
dscreen->base.get_paramf = dd_screen_get_paramf;
dscreen->base.get_shader_param = dd_screen_get_shader_param;
/* get_video_param */
/* get_compute_param */
SCR_INIT(get_timestamp);
dscreen->base.context_create = dd_screen_context_create;
dscreen->base.is_format_supported = dd_screen_is_format_supported;
/* is_video_format_supported */
SCR_INIT(can_create_resource);
dscreen->base.resource_create = dd_screen_resource_create;
dscreen->base.resource_from_handle = dd_screen_resource_from_handle;
SCR_INIT(resource_from_user_memory);
dscreen->base.resource_get_handle = dd_screen_resource_get_handle;
dscreen->base.resource_destroy = dd_screen_resource_destroy;
SCR_INIT(flush_frontbuffer);
SCR_INIT(fence_reference);
SCR_INIT(fence_finish);
SCR_INIT(get_driver_query_info);
SCR_INIT(get_driver_query_group_info);
#undef SCR_INIT
dscreen->screen = screen;
dscreen->timeout_ms = timeout;
dscreen->mode = dump_always ? DD_DUMP_ALL_CALLS : DD_DETECT_HANGS;
dscreen->no_flush = no_flush;
switch (dscreen->mode) {
case DD_DUMP_ALL_CALLS:
fprintf(stderr, "Gallium debugger active. Logging all calls.\n");
break;
case DD_DETECT_HANGS:
fprintf(stderr, "Gallium debugger active. "
"The hang detection timout is %i ms.\n", timeout);
break;
default:
assert(0);
}
return &dscreen->base;
}

View File

@@ -14,7 +14,7 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63915 bytes, from 2015-08-24 16:56:28)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)

View File

@@ -86,7 +86,7 @@ static const uint8_t a20x_primtypes[PIPE_PRIM_MAX] = {
};
struct pipe_context *
fd2_context_create(struct pipe_screen *pscreen, void *priv)
fd2_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
{
struct fd_screen *screen = fd_screen(pscreen);
struct fd2_context *fd2_ctx = CALLOC_STRUCT(fd2_context);

View File

@@ -47,6 +47,6 @@ fd2_context(struct fd_context *ctx)
}
struct pipe_context *
fd2_context_create(struct pipe_screen *pscreen, void *priv);
fd2_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
#endif /* FD2_CONTEXT_H_ */

View File

@@ -14,7 +14,7 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63915 bytes, from 2015-08-24 16:56:28)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -680,6 +680,7 @@ static inline uint32_t REG_A3XX_CP_PROTECT_REG(uint32_t i0) { return 0x00000460
#define A3XX_GRAS_CL_CLIP_CNTL_VP_CLIP_CODE_IGNORE 0x00080000
#define A3XX_GRAS_CL_CLIP_CNTL_VP_XFORM_DISABLE 0x00100000
#define A3XX_GRAS_CL_CLIP_CNTL_PERSP_DIVISION_DISABLE 0x00200000
#define A3XX_GRAS_CL_CLIP_CNTL_ZERO_GB_SCALE_Z 0x00400000
#define A3XX_GRAS_CL_CLIP_CNTL_ZCOORD 0x00800000
#define A3XX_GRAS_CL_CLIP_CNTL_WCOORD 0x01000000
#define A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE 0x02000000

View File

@@ -98,7 +98,7 @@ static const uint8_t primtypes[PIPE_PRIM_MAX] = {
};
struct pipe_context *
fd3_context_create(struct pipe_screen *pscreen, void *priv)
fd3_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
{
struct fd_screen *screen = fd_screen(pscreen);
struct fd3_context *fd3_ctx = CALLOC_STRUCT(fd3_context);

View File

@@ -119,6 +119,6 @@ fd3_context(struct fd_context *ctx)
}
struct pipe_context *
fd3_context_create(struct pipe_screen *pscreen, void *priv);
fd3_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
#endif /* FD3_CONTEXT_H_ */

View File

@@ -563,10 +563,29 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
val |= COND(fp->writes_pos, A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE);
val |= COND(fp->frag_coord, A3XX_GRAS_CL_CLIP_CNTL_ZCOORD |
A3XX_GRAS_CL_CLIP_CNTL_WCOORD);
/* TODO only use if prog doesn't use clipvertex/clipdist */
val |= MIN2(util_bitcount(ctx->rasterizer->clip_plane_enable), 6) << 26;
OUT_PKT0(ring, REG_A3XX_GRAS_CL_CLIP_CNTL, 1);
OUT_RING(ring, val);
}
if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_UCP)) {
uint32_t planes = ctx->rasterizer->clip_plane_enable;
int count = 0;
while (planes && count < 6) {
int i = ffs(planes) - 1;
planes &= ~(1U << i);
fd_wfi(ctx, ring);
OUT_PKT0(ring, REG_A3XX_GRAS_CL_USER_PLANE(count++), 4);
OUT_RING(ring, fui(ctx->ucp.ucp[i][0]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][1]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][2]));
OUT_RING(ring, fui(ctx->ucp.ucp[i][3]));
}
}
/* NOTE: since primitive_restart is not actually part of any
* state object, we need to make sure that we always emit
* PRIM_VTX_CNTL.. either that or be more clever and detect

View File

@@ -65,7 +65,8 @@ fd3_rasterizer_state_create(struct pipe_context *pctx,
if (cso->multisample)
TODO
*/
so->gras_cl_clip_cntl = A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER; /* ??? */
so->gras_cl_clip_cntl = A3XX_GRAS_CL_CLIP_CNTL_IJ_PERSP_CENTER /* ??? */ |
COND(cso->clip_halfz, A3XX_GRAS_CL_CLIP_CNTL_ZERO_GB_SCALE_Z);
so->gras_su_point_minmax =
A3XX_GRAS_SU_POINT_MINMAX_MIN(psize_min) |
A3XX_GRAS_SU_POINT_MINMAX_MAX(psize_max);

View File

@@ -14,7 +14,7 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63915 bytes, from 2015-08-24 16:56:28)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)
@@ -162,10 +162,13 @@ enum a4xx_tex_fmt {
TFMT4_8_UNORM = 4,
TFMT4_8_8_UNORM = 14,
TFMT4_8_8_8_8_UNORM = 28,
TFMT4_8_SNORM = 5,
TFMT4_8_8_SNORM = 15,
TFMT4_8_8_8_8_SNORM = 29,
TFMT4_8_UINT = 6,
TFMT4_8_8_UINT = 16,
TFMT4_8_8_8_8_UINT = 30,
TFMT4_8_SINT = 7,
TFMT4_8_8_SINT = 17,
TFMT4_8_8_8_8_SINT = 31,
TFMT4_16_UINT = 21,

View File

@@ -96,7 +96,7 @@ static const uint8_t primtypes[PIPE_PRIM_MAX] = {
};
struct pipe_context *
fd4_context_create(struct pipe_screen *pscreen, void *priv)
fd4_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
{
struct fd_screen *screen = fd_screen(pscreen);
struct fd4_context *fd4_ctx = CALLOC_STRUCT(fd4_context);

View File

@@ -97,6 +97,6 @@ fd4_context(struct fd_context *ctx)
}
struct pipe_context *
fd4_context_create(struct pipe_screen *pscreen, void *priv);
fd4_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
#endif /* FD4_CONTEXT_H_ */

View File

@@ -79,9 +79,9 @@ struct fd4_format {
static struct fd4_format formats[PIPE_FORMAT_COUNT] = {
/* 8-bit */
VT(R8_UNORM, 8_UNORM, R8_UNORM, WZYX),
V_(R8_SNORM, 8_SNORM, NONE, WZYX),
V_(R8_UINT, 8_UINT, NONE, WZYX),
V_(R8_SINT, 8_SINT, NONE, WZYX),
VT(R8_SNORM, 8_SNORM, NONE, WZYX),
VT(R8_UINT, 8_UINT, NONE, WZYX),
VT(R8_SINT, 8_SINT, NONE, WZYX),
V_(R8_USCALED, 8_UINT, NONE, WZYX),
V_(R8_SSCALED, 8_UINT, NONE, WZYX),
@@ -115,8 +115,8 @@ static struct fd4_format formats[PIPE_FORMAT_COUNT] = {
VT(R8G8_UNORM, 8_8_UNORM, R8G8_UNORM, WZYX),
VT(R8G8_SNORM, 8_8_SNORM, R8G8_SNORM, WZYX),
VT(R8G8_UINT, 8_8_UINT, NONE, WZYX),
VT(R8G8_SINT, 8_8_SINT, NONE, WZYX),
VT(R8G8_UINT, 8_8_UINT, R8G8_UINT, WZYX),
VT(R8G8_SINT, 8_8_SINT, R8G8_SINT, WZYX),
V_(R8G8_USCALED, 8_8_UINT, NONE, WZYX),
V_(R8G8_SSCALED, 8_8_SINT, NONE, WZYX),

View File

@@ -14,7 +14,7 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63915 bytes, from 2015-08-24 16:56:28)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)

View File

@@ -14,7 +14,7 @@ The rules-ng-ng source files this header was generated from are:
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml ( 10551 bytes, from 2015-05-20 20:03:14)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml ( 14968 bytes, from 2015-05-20 20:12:27)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml ( 67120 bytes, from 2015-08-14 23:22:03)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63785 bytes, from 2015-08-14 18:27:06)
- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml ( 63915 bytes, from 2015-08-24 16:56:28)
Copyright (C) 2013-2015 by the following authors:
- Rob Clark <robdclark@gmail.com> (robclark)

View File

@@ -334,6 +334,7 @@ struct fd_context {
FD_DIRTY_INDEXBUF = (1 << 16),
FD_DIRTY_SCISSOR = (1 << 17),
FD_DIRTY_STREAMOUT = (1 << 18),
FD_DIRTY_UCP = (1 << 19),
} dirty;
struct pipe_blend_state *blend;
@@ -355,6 +356,7 @@ struct fd_context {
struct fd_constbuf_stateobj constbuf[PIPE_SHADER_TYPES];
struct pipe_index_buffer indexbuf;
struct fd_streamout_stateobj streamout;
struct pipe_clip_state ucp;
/* GMEM/tile handling fxns: */
void (*emit_tile_init)(struct fd_context *ctx);

View File

@@ -191,6 +191,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
return 16383;
case PIPE_CAP_DEPTH_CLIP_DISABLE:
case PIPE_CAP_CLIP_HALFZ:
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
return is_a3xx(screen);
@@ -228,7 +229,6 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
case PIPE_CAP_SAMPLER_VIEW_TARGET:
case PIPE_CAP_CLIP_HALFZ:
case PIPE_CAP_POLYGON_OFFSET_CLAMP:
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:

View File

@@ -65,7 +65,9 @@ static void
fd_set_clip_state(struct pipe_context *pctx,
const struct pipe_clip_state *clip)
{
DBG("TODO: ");
struct fd_context *ctx = fd_context(pctx);
ctx->ucp = *clip;
ctx->dirty |= FD_DIRTY_UCP;
}
static void

View File

@@ -2312,7 +2312,7 @@ emit_instructions(struct ir3_compile *ctx)
ctx->ir = ir3_create(ctx->compiler, ninputs, noutputs);
/* Create inputs in first block: */
ctx->block = get_block(ctx, fxn->start_block);
ctx->block = get_block(ctx, nir_start_block(fxn));
ctx->in_block = ctx->block;
list_addtail(&ctx->block->node, &ctx->ir->block_list);

View File

@@ -29,6 +29,7 @@
#include "ir3_nir.h"
#include "glsl/nir/nir_builder.h"
#include "glsl/nir/nir_control_flow.h"
/* Based on nir_opt_peephole_select, and hacked up to more aggressively
* flatten anything that can be flattened
@@ -171,7 +172,7 @@ flatten_block(nir_builder *bld, nir_block *if_block, nir_block *prev_block,
(intr->intrinsic == nir_intrinsic_discard_if)) {
nir_ssa_def *discard_cond;
nir_builder_insert_after_instr(bld,
bld->cursor = nir_after_instr(
nir_block_last_instr(prev_block));
if (invert) {

View File

@@ -155,7 +155,7 @@ static void i915_destroy(struct pipe_context *pipe)
}
struct pipe_context *
i915_create_context(struct pipe_screen *screen, void *priv)
i915_create_context(struct pipe_screen *screen, void *priv, unsigned flags)
{
struct i915_context *i915;

View File

@@ -401,7 +401,7 @@ void i915_init_string_functions( struct i915_context *i915 );
* i915_context.c
*/
struct pipe_context *i915_create_context(struct pipe_screen *screen,
void *priv);
void *priv, unsigned flags);
/***********************************************************************

View File

@@ -135,7 +135,7 @@ ilo_context_destroy(struct pipe_context *pipe)
}
static struct pipe_context *
ilo_context_create(struct pipe_screen *screen, void *priv)
ilo_context_create(struct pipe_screen *screen, void *priv, unsigned flags)
{
struct ilo_screen *is = ilo_screen(screen);
struct ilo_context *ilo;

View File

@@ -128,7 +128,8 @@ llvmpipe_render_condition ( struct pipe_context *pipe,
}
struct pipe_context *
llvmpipe_create_context( struct pipe_screen *screen, void *priv )
llvmpipe_create_context(struct pipe_screen *screen, void *priv,
unsigned flags)
{
struct llvmpipe_context *llvmpipe;

View File

@@ -160,7 +160,8 @@ struct llvmpipe_context {
struct pipe_context *
llvmpipe_create_context( struct pipe_screen *screen, void *priv );
llvmpipe_create_context(struct pipe_screen *screen, void *priv,
unsigned flags);
struct pipe_resource *
llvmpipe_user_buffer_create(struct pipe_screen *screen,

View File

@@ -260,7 +260,8 @@ static void noop_destroy_context(struct pipe_context *ctx)
FREE(ctx);
}
static struct pipe_context *noop_create_context(struct pipe_screen *screen, void *priv)
static struct pipe_context *noop_create_context(struct pipe_screen *screen,
void *priv, unsigned flags)
{
struct pipe_context *ctx = CALLOC_STRUCT(pipe_context);

View File

@@ -190,7 +190,7 @@ nv30_context_destroy(struct pipe_context *pipe)
} while(0)
struct pipe_context *
nv30_context_create(struct pipe_screen *pscreen, void *priv)
nv30_context_create(struct pipe_screen *pscreen, void *priv, unsigned ctxflags)
{
struct nv30_screen *screen = nv30_screen(pscreen);
struct nv30_context *nv30 = CALLOC_STRUCT(nv30_context);

View File

@@ -132,7 +132,7 @@ nv30_context(struct pipe_context *pipe)
}
struct pipe_context *
nv30_context_create(struct pipe_screen *pscreen, void *priv);
nv30_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags);
void
nv30_vbo_init(struct pipe_context *pipe);

View File

@@ -240,7 +240,7 @@ nv50_context_get_sample_position(struct pipe_context *, unsigned, unsigned,
float *);
struct pipe_context *
nv50_create(struct pipe_screen *pscreen, void *priv)
nv50_create(struct pipe_screen *pscreen, void *priv, unsigned ctxflags)
{
struct nv50_screen *screen = nv50_screen(pscreen);
struct nv50_context *nv50;

View File

@@ -186,7 +186,7 @@ nv50_context_shader_stage(unsigned pipe)
}
/* nv50_context.c */
struct pipe_context *nv50_create(struct pipe_screen *, void *);
struct pipe_context *nv50_create(struct pipe_screen *, void *, unsigned flags);
void nv50_bufctx_fence(struct nouveau_bufctx *, bool on_flush);

View File

@@ -117,7 +117,6 @@ nv50_blend_state_create(struct pipe_context *pipe,
struct nv50_blend_stateobj *so = CALLOC_STRUCT(nv50_blend_stateobj);
int i;
bool emit_common_func = cso->rt[0].blend_enable;
uint32_t ms;
if (nv50_context(pipe)->screen->tesla->oclass >= NVA3_3D_CLASS) {
SB_BEGIN_3D(so, BLEND_INDEPENDENT, 1);
@@ -189,15 +188,6 @@ nv50_blend_state_create(struct pipe_context *pipe,
SB_DATA (so, nv50_colormask(cso->rt[0].colormask));
}
ms = 0;
if (cso->alpha_to_coverage)
ms |= NV50_3D_MULTISAMPLE_CTRL_ALPHA_TO_COVERAGE;
if (cso->alpha_to_one)
ms |= NV50_3D_MULTISAMPLE_CTRL_ALPHA_TO_ONE;
SB_BEGIN_3D(so, MULTISAMPLE_CTRL, 1);
SB_DATA (so, ms);
assert(so->size <= (sizeof(so->state) / sizeof(so->state[0])));
return so;
}

View File

@@ -1,4 +1,6 @@
#include "util/u_format.h"
#include "nv50/nv50_context.h"
#include "nv50/nv50_defs.xml.h"
@@ -313,6 +315,25 @@ nv50_validate_derived_2(struct nv50_context *nv50)
}
}
static void
nv50_validate_derived_3(struct nv50_context *nv50)
{
struct nouveau_pushbuf *push = nv50->base.pushbuf;
struct pipe_framebuffer_state *fb = &nv50->framebuffer;
uint32_t ms = 0;
if ((!fb->nr_cbufs || !fb->cbufs[0] ||
!util_format_is_pure_integer(fb->cbufs[0]->format)) && nv50->blend) {
if (nv50->blend->pipe.alpha_to_coverage)
ms |= NV50_3D_MULTISAMPLE_CTRL_ALPHA_TO_COVERAGE;
if (nv50->blend->pipe.alpha_to_one)
ms |= NV50_3D_MULTISAMPLE_CTRL_ALPHA_TO_ONE;
}
BEGIN_NV04(push, NV50_3D(MULTISAMPLE_CTRL), 1);
PUSH_DATA (push, ms);
}
static void
nv50_validate_clip(struct nv50_context *nv50)
{
@@ -474,6 +495,7 @@ static struct state_validate {
{ nv50_validate_derived_rs, NV50_NEW_FRAGPROG | NV50_NEW_RASTERIZER |
NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG },
{ nv50_validate_derived_2, NV50_NEW_ZSA | NV50_NEW_FRAMEBUFFER },
{ nv50_validate_derived_3, NV50_NEW_BLEND | NV50_NEW_FRAMEBUFFER },
{ nv50_validate_clip, NV50_NEW_CLIP | NV50_NEW_RASTERIZER |
NV50_NEW_VERTPROG | NV50_NEW_GMTYPROG },
{ nv50_constbufs_validate, NV50_NEW_CONSTBUF },

View File

@@ -19,7 +19,7 @@
struct nv50_blend_stateobj {
struct pipe_blend_state pipe;
int size;
uint32_t state[84]; // TODO: allocate less if !independent_blend_enable
uint32_t state[82]; // TODO: allocate less if !independent_blend_enable
};
struct nv50_rasterizer_stateobj {

View File

@@ -68,6 +68,10 @@ nv50_2d_format(enum pipe_format format, bool dst, bool dst_src_equal)
return NV50_SURFACE_FORMAT_R16_UNORM;
case 4:
return NV50_SURFACE_FORMAT_BGRA8_UNORM;
case 8:
return NV50_SURFACE_FORMAT_RGBA16_FLOAT;
case 16:
return NV50_SURFACE_FORMAT_RGBA32_FLOAT;
default:
return 0;
}
@@ -1003,6 +1007,8 @@ nv50_blitctx_prepare_state(struct nv50_blitctx *blit)
/* zsa state */
BEGIN_NV04(push, NV50_3D(DEPTH_TEST_ENABLE), 1);
PUSH_DATA (push, 0);
BEGIN_NV04(push, NV50_3D(DEPTH_BOUNDS_EN), 1);
PUSH_DATA (push, 0);
BEGIN_NV04(push, NV50_3D(STENCIL_ENABLE), 1);
PUSH_DATA (push, 0);
BEGIN_NV04(push, NV50_3D(ALPHA_TEST_ENABLE), 1);

View File

@@ -262,7 +262,7 @@ nvc0_context_get_sample_position(struct pipe_context *, unsigned, unsigned,
float *);
struct pipe_context *
nvc0_create(struct pipe_screen *pscreen, void *priv)
nvc0_create(struct pipe_screen *pscreen, void *priv, unsigned ctxflags)
{
struct nvc0_screen *screen = nvc0_screen(pscreen);
struct nvc0_context *nvc0;

View File

@@ -214,7 +214,7 @@ nvc0_shader_stage(unsigned pipe)
/* nvc0_context.c */
struct pipe_context *nvc0_create(struct pipe_screen *, void *);
struct pipe_context *nvc0_create(struct pipe_screen *, void *, unsigned flags);
void nvc0_bufctx_fence(struct nvc0_context *, struct nouveau_bufctx *,
bool on_flush);
void nvc0_default_kick_notify(struct nouveau_pushbuf *);

View File

@@ -56,10 +56,10 @@ struct nvc0_query {
#define NVC0_QUERY_ALLOC_SPACE 256
static boolean nvc0_mp_pm_query_begin(struct nvc0_context *,
static boolean nvc0_hw_sm_query_begin(struct nvc0_context *,
struct nvc0_query *);
static void nvc0_mp_pm_query_end(struct nvc0_context *, struct nvc0_query *);
static boolean nvc0_mp_pm_query_result(struct nvc0_context *,
static void nvc0_hw_sm_query_end(struct nvc0_context *, struct nvc0_query *);
static boolean nvc0_hw_sm_query_result(struct nvc0_context *,
struct nvc0_query *, void *, boolean);
static inline struct nvc0_query *
@@ -159,7 +159,7 @@ nvc0_query_create(struct pipe_context *pipe, unsigned type, unsigned index)
} else
#endif
if (nvc0->screen->base.device->drm_version >= 0x01000101) {
if (type >= NVE4_PM_QUERY(0) && type <= NVE4_PM_QUERY_LAST) {
if (type >= NVE4_HW_SM_QUERY(0) && type <= NVE4_HW_SM_QUERY_LAST) {
/* for each MP:
* [00] = WS0.C0
* [04] = WS0.C1
@@ -189,7 +189,7 @@ nvc0_query_create(struct pipe_context *pipe, unsigned type, unsigned index)
space = (4 * 4 + 4 + 4) * nvc0->screen->mp_count * sizeof(uint32_t);
break;
} else
if (type >= NVC0_PM_QUERY(0) && type <= NVC0_PM_QUERY_LAST) {
if (type >= NVC0_HW_SM_QUERY(0) && type <= NVC0_HW_SM_QUERY_LAST) {
/* for each MP:
* [00] = MP.C0
* [04] = MP.C1
@@ -327,9 +327,9 @@ nvc0_query_begin(struct pipe_context *pipe, struct pipe_query *pq)
q->u.value = 0;
} else
#endif
if ((q->type >= NVE4_PM_QUERY(0) && q->type <= NVE4_PM_QUERY_LAST) ||
(q->type >= NVC0_PM_QUERY(0) && q->type <= NVC0_PM_QUERY_LAST)) {
ret = nvc0_mp_pm_query_begin(nvc0, q);
if ((q->type >= NVE4_HW_SM_QUERY(0) && q->type <= NVE4_HW_SM_QUERY_LAST) ||
(q->type >= NVC0_HW_SM_QUERY(0) && q->type <= NVC0_HW_SM_QUERY_LAST)) {
ret = nvc0_hw_sm_query_begin(nvc0, q);
}
break;
}
@@ -412,9 +412,9 @@ nvc0_query_end(struct pipe_context *pipe, struct pipe_query *pq)
return;
} else
#endif
if ((q->type >= NVE4_PM_QUERY(0) && q->type <= NVE4_PM_QUERY_LAST) ||
(q->type >= NVC0_PM_QUERY(0) && q->type <= NVC0_PM_QUERY_LAST)) {
nvc0_mp_pm_query_end(nvc0, q);
if ((q->type >= NVE4_HW_SM_QUERY(0) && q->type <= NVE4_HW_SM_QUERY_LAST) ||
(q->type >= NVC0_HW_SM_QUERY(0) && q->type <= NVC0_HW_SM_QUERY_LAST)) {
nvc0_hw_sm_query_end(nvc0, q);
}
break;
}
@@ -453,9 +453,9 @@ nvc0_query_result(struct pipe_context *pipe, struct pipe_query *pq,
return true;
} else
#endif
if ((q->type >= NVE4_PM_QUERY(0) && q->type <= NVE4_PM_QUERY_LAST) ||
(q->type >= NVC0_PM_QUERY(0) && q->type <= NVC0_PM_QUERY_LAST)) {
return nvc0_mp_pm_query_result(nvc0, q, result, wait);
if ((q->type >= NVE4_HW_SM_QUERY(0) && q->type <= NVE4_HW_SM_QUERY_LAST) ||
(q->type >= NVC0_HW_SM_QUERY(0) && q->type <= NVC0_HW_SM_QUERY_LAST)) {
return nvc0_hw_sm_query_result(nvc0, q, result, wait);
}
if (q->state != NVC0_QUERY_STATE_READY)
@@ -692,7 +692,7 @@ static const char *nvc0_drv_stat_names[] =
* We could add a kernel interface for it, but reading the counters like this
* has the advantage of being async (if get_result isn't called immediately).
*/
static const uint64_t nve4_read_mp_pm_counters_code[] =
static const uint64_t nve4_read_hw_sm_counters_code[] =
{
/* sched 0x20 0x20 0x20 0x20 0x20 0x20 0x20
* mov b32 $r8 $tidx
@@ -776,6 +776,33 @@ static const uint64_t nve4_read_mp_pm_counters_code[] =
static const char *nve4_pm_query_names[] =
{
/* MP counters */
"active_cycles",
"active_warps",
"atom_count",
"branch",
"divergent_branch",
"gld_request",
"global_ld_mem_divergence_replays",
"global_store_transaction",
"global_st_mem_divergence_replays",
"gred_count",
"gst_request",
"inst_executed",
"inst_issued",
"inst_issued1",
"inst_issued2",
"l1_global_load_hit",
"l1_global_load_miss",
"l1_local_load_hit",
"l1_local_load_miss",
"l1_local_store_hit",
"l1_local_store_miss",
"l1_shared_load_transactions",
"l1_shared_store_transactions",
"local_load",
"local_load_transactions",
"local_store",
"local_store_transactions",
"prof_trigger_00",
"prof_trigger_01",
"prof_trigger_02",
@@ -784,41 +811,14 @@ static const char *nve4_pm_query_names[] =
"prof_trigger_05",
"prof_trigger_06",
"prof_trigger_07",
"warps_launched",
"threads_launched",
"sm_cta_launched",
"inst_issued1",
"inst_issued2",
"inst_executed",
"local_load",
"local_store",
"shared_load",
"shared_store",
"l1_local_load_hit",
"l1_local_load_miss",
"l1_local_store_hit",
"l1_local_store_miss",
"gld_request",
"gst_request",
"l1_global_load_hit",
"l1_global_load_miss",
"uncached_global_load_transaction",
"global_store_transaction",
"branch",
"divergent_branch",
"active_warps",
"active_cycles",
"inst_issued",
"atom_count",
"gred_count",
"shared_load_replay",
"shared_store",
"shared_store_replay",
"local_load_transactions",
"local_store_transactions",
"l1_shared_load_transactions",
"l1_shared_store_transactions",
"global_ld_mem_divergence_replays",
"global_st_mem_divergence_replays",
"sm_cta_launched",
"threads_launched",
"uncached_global_load_transaction",
"warps_launched",
/* metrics, i.e. functions of the MP counters */
"metric-ipc", /* inst_executed, clock */
"metric-ipac", /* inst_executed, active_cycles */
@@ -852,7 +852,7 @@ struct nvc0_mp_counter_cfg
#define NVC0_COUNTER_OP2_AVG_DIV_MM 5 /* avg(ctr0 / ctr1) */
#define NVC0_COUNTER_OP2_AVG_DIV_M0 6 /* avg(ctr0) / ctr1 of MP[0]) */
struct nvc0_mp_pm_query_cfg
struct nvc0_hw_sm_query_cfg
{
struct nvc0_mp_counter_cfg ctr[4];
uint8_t num_counters;
@@ -860,17 +860,17 @@ struct nvc0_mp_pm_query_cfg
uint8_t norm[2]; /* normalization num,denom */
};
#define _Q1A(n, f, m, g, s, nu, dn) [NVE4_PM_QUERY_##n] = { { { f, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m, 0, 0, NVE4_COMPUTE_MP_PM_A_SIGSEL_##g, s }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { nu, dn } }
#define _Q1B(n, f, m, g, s, nu, dn) [NVE4_PM_QUERY_##n] = { { { f, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m, 0, 1, NVE4_COMPUTE_MP_PM_B_SIGSEL_##g, s }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { nu, dn } }
#define _M2A(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_PM_QUERY_METRIC_##n] = { { \
#define _Q1A(n, f, m, g, s, nu, dn) [NVE4_HW_SM_QUERY_##n] = { { { f, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m, 0, 0, NVE4_COMPUTE_MP_PM_A_SIGSEL_##g, s }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { nu, dn } }
#define _Q1B(n, f, m, g, s, nu, dn) [NVE4_HW_SM_QUERY_##n] = { { { f, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m, 0, 1, NVE4_COMPUTE_MP_PM_B_SIGSEL_##g, s }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { nu, dn } }
#define _M2A(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_HW_SM_QUERY_METRIC_##n] = { { \
{ f0, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m0, 0, 0, NVE4_COMPUTE_MP_PM_A_SIGSEL_##g0, s0 }, \
{ f1, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m1, 0, 0, NVE4_COMPUTE_MP_PM_A_SIGSEL_##g1, s1 }, \
{}, {}, }, 2, NVC0_COUNTER_OP2_##o, { nu, dn } }
#define _M2B(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_PM_QUERY_METRIC_##n] = { { \
#define _M2B(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_HW_SM_QUERY_METRIC_##n] = { { \
{ f0, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m0, 0, 1, NVE4_COMPUTE_MP_PM_B_SIGSEL_##g0, s0 }, \
{ f1, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m1, 0, 1, NVE4_COMPUTE_MP_PM_B_SIGSEL_##g1, s1 }, \
{}, {}, }, 2, NVC0_COUNTER_OP2_##o, { nu, dn } }
#define _M2AB(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_PM_QUERY_METRIC_##n] = { { \
#define _M2AB(n, f0, m0, g0, s0, f1, m1, g1, s1, o, nu, dn) [NVE4_HW_SM_QUERY_METRIC_##n] = { { \
{ f0, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m0, 0, 0, NVE4_COMPUTE_MP_PM_A_SIGSEL_##g0, s0 }, \
{ f1, NVE4_COMPUTE_MP_PM_FUNC_MODE_##m1, 0, 1, NVE4_COMPUTE_MP_PM_B_SIGSEL_##g1, s1 }, \
{}, {}, }, 2, NVC0_COUNTER_OP2_##o, { nu, dn } }
@@ -881,8 +881,35 @@ struct nvc0_mp_pm_query_cfg
* metric-ipXc: we simply multiply by 4 to account for the 4 warp schedulers;
* this is inaccurate !
*/
static const struct nvc0_mp_pm_query_cfg nve4_mp_pm_queries[] =
static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
{
_Q1B(ACTIVE_CYCLES, 0x0001, B6, WARP, 0x00000000, 1, 1),
_Q1B(ACTIVE_WARPS, 0x003f, B6, WARP, 0x31483104, 2, 1),
_Q1A(ATOM_COUNT, 0x0001, B6, BRANCH, 0x00000000, 1, 1),
_Q1A(BRANCH, 0x0001, B6, BRANCH, 0x0000000c, 1, 1),
_Q1A(DIVERGENT_BRANCH, 0x0001, B6, BRANCH, 0x00000010, 1, 1),
_Q1A(GLD_REQUEST, 0x0001, B6, LDST, 0x00000010, 1, 1),
_Q1B(GLD_MEM_DIV_REPLAY, 0x0001, B6, REPLAY, 0x00000010, 1, 1),
_Q1B(GST_TRANSACTIONS, 0x0001, B6, MEM, 0x00000004, 1, 1),
_Q1B(GST_MEM_DIV_REPLAY, 0x0001, B6, REPLAY, 0x00000014, 1, 1),
_Q1A(GRED_COUNT, 0x0001, B6, BRANCH, 0x00000008, 1, 1),
_Q1A(GST_REQUEST, 0x0001, B6, LDST, 0x00000014, 1, 1),
_Q1A(INST_EXECUTED, 0x0003, B6, EXEC, 0x00000398, 1, 1),
_Q1A(INST_ISSUED, 0x0003, B6, ISSUE, 0x00000104, 1, 1),
_Q1A(INST_ISSUED1, 0x0001, B6, ISSUE, 0x00000004, 1, 1),
_Q1A(INST_ISSUED2, 0x0001, B6, ISSUE, 0x00000008, 1, 1),
_Q1B(L1_GLD_HIT, 0x0001, B6, L1, 0x00000010, 1, 1),
_Q1B(L1_GLD_MISS, 0x0001, B6, L1, 0x00000014, 1, 1),
_Q1B(L1_LOCAL_LD_HIT, 0x0001, B6, L1, 0x00000000, 1, 1),
_Q1B(L1_LOCAL_LD_MISS, 0x0001, B6, L1, 0x00000004, 1, 1),
_Q1B(L1_LOCAL_ST_HIT, 0x0001, B6, L1, 0x00000008, 1, 1),
_Q1B(L1_LOCAL_ST_MISS, 0x0001, B6, L1, 0x0000000c, 1, 1),
_Q1B(L1_SHARED_LD_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000008, 1, 1),
_Q1B(L1_SHARED_ST_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x0000000c, 1, 1),
_Q1A(LOCAL_LD, 0x0001, B6, LDST, 0x00000008, 1, 1),
_Q1B(LOCAL_LD_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000000, 1, 1),
_Q1A(LOCAL_ST, 0x0001, B6, LDST, 0x0000000c, 1, 1),
_Q1B(LOCAL_ST_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000004, 1, 1),
_Q1A(PROF_TRIGGER_0, 0x0001, B6, USER, 0x00000000, 1, 1),
_Q1A(PROF_TRIGGER_1, 0x0001, B6, USER, 0x00000004, 1, 1),
_Q1A(PROF_TRIGGER_2, 0x0001, B6, USER, 0x00000008, 1, 1),
@@ -891,41 +918,14 @@ static const struct nvc0_mp_pm_query_cfg nve4_mp_pm_queries[] =
_Q1A(PROF_TRIGGER_5, 0x0001, B6, USER, 0x00000014, 1, 1),
_Q1A(PROF_TRIGGER_6, 0x0001, B6, USER, 0x00000018, 1, 1),
_Q1A(PROF_TRIGGER_7, 0x0001, B6, USER, 0x0000001c, 1, 1),
_Q1A(LAUNCHED_WARPS, 0x0001, B6, LAUNCH, 0x00000004, 1, 1),
_Q1A(LAUNCHED_THREADS, 0x003f, B6, LAUNCH, 0x398a4188, 1, 1),
_Q1B(LAUNCHED_CTA, 0x0001, B6, WARP, 0x0000001c, 1, 1),
_Q1A(INST_ISSUED1, 0x0001, B6, ISSUE, 0x00000004, 1, 1),
_Q1A(INST_ISSUED2, 0x0001, B6, ISSUE, 0x00000008, 1, 1),
_Q1A(INST_ISSUED, 0x0003, B6, ISSUE, 0x00000104, 1, 1),
_Q1A(INST_EXECUTED, 0x0003, B6, EXEC, 0x00000398, 1, 1),
_Q1A(LD_SHARED, 0x0001, B6, LDST, 0x00000000, 1, 1),
_Q1A(ST_SHARED, 0x0001, B6, LDST, 0x00000004, 1, 1),
_Q1A(LD_LOCAL, 0x0001, B6, LDST, 0x00000008, 1, 1),
_Q1A(ST_LOCAL, 0x0001, B6, LDST, 0x0000000c, 1, 1),
_Q1A(GLD_REQUEST, 0x0001, B6, LDST, 0x00000010, 1, 1),
_Q1A(GST_REQUEST, 0x0001, B6, LDST, 0x00000014, 1, 1),
_Q1B(L1_LOCAL_LOAD_HIT, 0x0001, B6, L1, 0x00000000, 1, 1),
_Q1B(L1_LOCAL_LOAD_MISS, 0x0001, B6, L1, 0x00000004, 1, 1),
_Q1B(L1_LOCAL_STORE_HIT, 0x0001, B6, L1, 0x00000008, 1, 1),
_Q1B(L1_LOCAL_STORE_MISS, 0x0001, B6, L1, 0x0000000c, 1, 1),
_Q1B(L1_GLOBAL_LOAD_HIT, 0x0001, B6, L1, 0x00000010, 1, 1),
_Q1B(L1_GLOBAL_LOAD_MISS, 0x0001, B6, L1, 0x00000014, 1, 1),
_Q1B(GLD_TRANSACTIONS_UNCACHED, 0x0001, B6, MEM, 0x00000000, 1, 1),
_Q1B(GST_TRANSACTIONS, 0x0001, B6, MEM, 0x00000004, 1, 1),
_Q1A(BRANCH, 0x0001, B6, BRANCH, 0x0000000c, 1, 1),
_Q1A(BRANCH_DIVERGENT, 0x0001, B6, BRANCH, 0x00000010, 1, 1),
_Q1B(ACTIVE_WARPS, 0x003f, B6, WARP, 0x31483104, 2, 1),
_Q1B(ACTIVE_CYCLES, 0x0001, B6, WARP, 0x00000000, 1, 1),
_Q1A(ATOM_COUNT, 0x0001, B6, BRANCH, 0x00000000, 1, 1),
_Q1A(GRED_COUNT, 0x0001, B6, BRANCH, 0x00000008, 1, 1),
_Q1B(LD_SHARED_REPLAY, 0x0001, B6, REPLAY, 0x00000008, 1, 1),
_Q1B(ST_SHARED_REPLAY, 0x0001, B6, REPLAY, 0x0000000c, 1, 1),
_Q1B(LD_LOCAL_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000000, 1, 1),
_Q1B(ST_LOCAL_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000004, 1, 1),
_Q1B(L1_LD_SHARED_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x00000008, 1, 1),
_Q1B(L1_ST_SHARED_TRANSACTIONS, 0x0001, B6, TRANSACTION, 0x0000000c, 1, 1),
_Q1B(GLD_MEM_DIV_REPLAY, 0x0001, B6, REPLAY, 0x00000010, 1, 1),
_Q1B(GST_MEM_DIV_REPLAY, 0x0001, B6, REPLAY, 0x00000014, 1, 1),
_Q1A(SHARED_LD, 0x0001, B6, LDST, 0x00000000, 1, 1),
_Q1B(SHARED_LD_REPLAY, 0x0001, B6, REPLAY, 0x00000008, 1, 1),
_Q1A(SHARED_ST, 0x0001, B6, LDST, 0x00000004, 1, 1),
_Q1B(SHARED_ST_REPLAY, 0x0001, B6, REPLAY, 0x0000000c, 1, 1),
_Q1B(SM_CTA_LAUNCHED, 0x0001, B6, WARP, 0x0000001c, 1, 1),
_Q1A(THREADS_LAUNCHED, 0x003f, B6, LAUNCH, 0x398a4188, 1, 1),
_Q1B(UNCACHED_GLD_TRANSACTIONS, 0x0001, B6, MEM, 0x00000000, 1, 1),
_Q1A(WARPS_LAUNCHED, 0x0001, B6, LAUNCH, 0x00000004, 1, 1),
_M2AB(IPC, 0x3, B6, EXEC, 0x398, 0xffff, LOGOP, WARP, 0x0, DIV_SUM_M0, 10, 1),
_M2AB(IPAC, 0x3, B6, EXEC, 0x398, 0x1, B6, WARP, 0x0, AVG_DIV_MM, 10, 1),
_M2A(IPEC, 0x3, B6, EXEC, 0x398, 0xe, LOGOP, EXEC, 0x398, AVG_DIV_MM, 10, 1),
@@ -940,7 +940,7 @@ static const struct nvc0_mp_pm_query_cfg nve4_mp_pm_queries[] =
#undef _M2B
/* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
static const uint64_t nvc0_read_mp_pm_counters_code[] =
static const uint64_t nvc0_read_hw_sm_counters_code[] =
{
/* mov b32 $r8 $tidx
* mov b32 $r9 $physid
@@ -993,29 +993,21 @@ static const uint64_t nvc0_read_mp_pm_counters_code[] =
static const char *nvc0_pm_query_names[] =
{
/* MP counters */
"inst_executed",
"active_cycles",
"active_warps",
"atom_count",
"branch",
"divergent_branch",
"active_warps",
"active_cycles",
"warps_launched",
"threads_launched",
"shared_load",
"shared_store",
"local_load",
"local_store",
"gred_count",
"atom_count",
"gld_request",
"gred_count",
"gst_request",
"inst_executed",
"inst_issued1_0",
"inst_issued1_1",
"inst_issued2_0",
"inst_issued2_1",
"thread_inst_executed_0",
"thread_inst_executed_1",
"thread_inst_executed_2",
"thread_inst_executed_3",
"local_load",
"local_store",
"prof_trigger_00",
"prof_trigger_01",
"prof_trigger_02",
@@ -1024,35 +1016,35 @@ static const char *nvc0_pm_query_names[] =
"prof_trigger_05",
"prof_trigger_06",
"prof_trigger_07",
"shared_load",
"shared_store",
"threads_launched",
"thread_inst_executed_0",
"thread_inst_executed_1",
"thread_inst_executed_2",
"thread_inst_executed_3",
"warps_launched",
};
#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_PM_QUERY_##n] = { { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
static const struct nvc0_mp_pm_query_cfg nvc0_mp_pm_queries[] =
static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
{
_Q(INST_EXECUTED, 0xaaaa, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 0x00, 0x00),
_Q(BRANCH, 0xaaaa, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00),
_Q(BRANCH_DIVERGENT, 0xaaaa, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 0x00, 0x00),
_Q(ACTIVE_WARPS, 0xaaaa, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
_Q(ACTIVE_CYCLES, 0xaaaa, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(LAUNCHED_WARPS, 0xaaaa, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(LAUNCHED_THREADS, 0xaaaa, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
_Q(LD_SHARED, 0xaaaa, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(ST_SHARED, 0xaaaa, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(LD_LOCAL, 0xaaaa, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(ST_LOCAL, 0xaaaa, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(GRED_COUNT, 0xaaaa, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(ACTIVE_WARPS, 0xaaaa, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
_Q(ATOM_COUNT, 0xaaaa, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(BRANCH, 0xaaaa, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00),
_Q(DIVERGENT_BRANCH, 0xaaaa, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 0x00, 0x00),
_Q(GLD_REQUEST, 0xaaaa, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(GRED_COUNT, 0xaaaa, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(GST_REQUEST, 0xaaaa, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(INST_EXECUTED, 0xaaaa, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 0x00, 0x00),
_Q(INST_ISSUED1_0, 0xaaaa, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(INST_ISSUED1_1, 0xaaaa, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(INST_ISSUED2_0, 0xaaaa, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(INST_ISSUED2_1, 0xaaaa, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(TH_INST_EXECUTED_0, 0xaaaa, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_1, 0xaaaa, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_2, 0xaaaa, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_3, 0xaaaa, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(LOCAL_LD, 0xaaaa, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(LOCAL_ST, 0xaaaa, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(PROF_TRIGGER_0, 0xaaaa, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(PROF_TRIGGER_1, 0xaaaa, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(PROF_TRIGGER_2, 0xaaaa, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
@@ -1061,38 +1053,46 @@ static const struct nvc0_mp_pm_query_cfg nvc0_mp_pm_queries[] =
_Q(PROF_TRIGGER_5, 0xaaaa, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(PROF_TRIGGER_6, 0xaaaa, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(PROF_TRIGGER_7, 0xaaaa, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(SHARED_LD, 0xaaaa, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(SHARED_ST, 0xaaaa, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
_Q(THREADS_LAUNCHED, 0xaaaa, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
_Q(TH_INST_EXECUTED_0, 0xaaaa, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_1, 0xaaaa, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_2, 0xaaaa, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(TH_INST_EXECUTED_3, 0xaaaa, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
_Q(WARPS_LAUNCHED, 0xaaaa, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
};
#undef _Q
static const struct nvc0_mp_pm_query_cfg *
nvc0_mp_pm_query_get_cfg(struct nvc0_context *nvc0, struct nvc0_query *q)
static const struct nvc0_hw_sm_query_cfg *
nvc0_hw_sm_query_get_cfg(struct nvc0_context *nvc0, struct nvc0_query *q)
{
struct nvc0_screen *screen = nvc0->screen;
if (screen->base.class_3d >= NVE4_3D_CLASS)
return &nve4_mp_pm_queries[q->type - PIPE_QUERY_DRIVER_SPECIFIC];
return &nvc0_mp_pm_queries[q->type - NVC0_PM_QUERY(0)];
return &nve4_hw_sm_queries[q->type - PIPE_QUERY_DRIVER_SPECIFIC];
return &nvc0_hw_sm_queries[q->type - NVC0_HW_SM_QUERY(0)];
}
boolean
nvc0_mp_pm_query_begin(struct nvc0_context *nvc0, struct nvc0_query *q)
nvc0_hw_sm_query_begin(struct nvc0_context *nvc0, struct nvc0_query *q)
{
struct nvc0_screen *screen = nvc0->screen;
struct nouveau_pushbuf *push = nvc0->base.pushbuf;
const bool is_nve4 = screen->base.class_3d >= NVE4_3D_CLASS;
const struct nvc0_mp_pm_query_cfg *cfg;
const struct nvc0_hw_sm_query_cfg *cfg;
unsigned i, c;
unsigned num_ab[2] = { 0, 0 };
cfg = nvc0_mp_pm_query_get_cfg(nvc0, q);
cfg = nvc0_hw_sm_query_get_cfg(nvc0, q);
/* check if we have enough free counter slots */
for (i = 0; i < cfg->num_counters; ++i)
num_ab[cfg->ctr[i].sig_dom]++;
if (screen->pm.num_mp_pm_active[0] + num_ab[0] > 4 ||
screen->pm.num_mp_pm_active[1] + num_ab[1] > 4) {
if (screen->pm.num_hw_sm_active[0] + num_ab[0] > 4 ||
screen->pm.num_hw_sm_active[1] + num_ab[1] > 4) {
NOUVEAU_ERR("Not enough free MP counter slots !\n");
return false;
}
@@ -1113,14 +1113,14 @@ nvc0_mp_pm_query_begin(struct nvc0_context *nvc0, struct nvc0_query *q)
for (i = 0; i < cfg->num_counters; ++i) {
const unsigned d = cfg->ctr[i].sig_dom;
if (!screen->pm.num_mp_pm_active[d]) {
if (!screen->pm.num_hw_sm_active[d]) {
uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
if (screen->pm.num_mp_pm_active[!d])
if (screen->pm.num_hw_sm_active[!d])
m |= 1 << (7 + (8 * d));
BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
PUSH_DATA (push, m);
}
screen->pm.num_mp_pm_active[d]++;
screen->pm.num_hw_sm_active[d]++;
for (c = d * 4; c < (d * 4 + 4); ++c) {
if (!screen->pm.mp_counter[c]) {
@@ -1163,7 +1163,7 @@ nvc0_mp_pm_query_begin(struct nvc0_context *nvc0, struct nvc0_query *q)
}
static void
nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
nvc0_hw_sm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
{
struct nvc0_screen *screen = nvc0->screen;
struct pipe_context *pipe = &nvc0->base.pipe;
@@ -1174,9 +1174,9 @@ nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
const uint block[3] = { 32, is_nve4 ? 4 : 1, 1 };
const uint grid[3] = { screen->mp_count, 1, 1 };
unsigned c;
const struct nvc0_mp_pm_query_cfg *cfg;
const struct nvc0_hw_sm_query_cfg *cfg;
cfg = nvc0_mp_pm_query_get_cfg(nvc0, q);
cfg = nvc0_hw_sm_query_get_cfg(nvc0, q);
if (unlikely(!screen->pm.prog)) {
struct nvc0_program *prog = CALLOC_STRUCT(nvc0_program);
@@ -1185,11 +1185,11 @@ nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
prog->num_gprs = 14;
prog->parm_size = 12;
if (is_nve4) {
prog->code = (uint32_t *)nve4_read_mp_pm_counters_code;
prog->code_size = sizeof(nve4_read_mp_pm_counters_code);
prog->code = (uint32_t *)nve4_read_hw_sm_counters_code;
prog->code_size = sizeof(nve4_read_hw_sm_counters_code);
} else {
prog->code = (uint32_t *)nvc0_read_mp_pm_counters_code;
prog->code_size = sizeof(nvc0_read_mp_pm_counters_code);
prog->code = (uint32_t *)nvc0_read_hw_sm_counters_code;
prog->code_size = sizeof(nvc0_read_hw_sm_counters_code);
}
screen->pm.prog = prog;
}
@@ -1207,7 +1207,7 @@ nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
/* release counters for this query */
for (c = 0; c < 8; ++c) {
if (nvc0_query(screen->pm.mp_counter[c]) == q) {
screen->pm.num_mp_pm_active[c / 4]--;
screen->pm.num_hw_sm_active[c / 4]--;
screen->pm.mp_counter[c] = NULL;
}
}
@@ -1234,7 +1234,7 @@ nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
q = nvc0_query(screen->pm.mp_counter[c]);
if (!q)
continue;
cfg = nvc0_mp_pm_query_get_cfg(nvc0, q);
cfg = nvc0_hw_sm_query_get_cfg(nvc0, q);
for (i = 0; i < cfg->num_counters; ++i) {
if (mask & (1 << q->ctr[i]))
break;
@@ -1250,10 +1250,10 @@ nvc0_mp_pm_query_end(struct nvc0_context *nvc0, struct nvc0_query *q)
}
static inline bool
nvc0_mp_pm_query_read_data(uint32_t count[32][4],
nvc0_hw_sm_query_read_data(uint32_t count[32][4],
struct nvc0_context *nvc0, bool wait,
struct nvc0_query *q,
const struct nvc0_mp_pm_query_cfg *cfg,
const struct nvc0_hw_sm_query_cfg *cfg,
unsigned mp_count)
{
unsigned p, c;
@@ -1275,10 +1275,10 @@ nvc0_mp_pm_query_read_data(uint32_t count[32][4],
}
static inline bool
nve4_mp_pm_query_read_data(uint32_t count[32][4],
nve4_hw_sm_query_read_data(uint32_t count[32][4],
struct nvc0_context *nvc0, bool wait,
struct nvc0_query *q,
const struct nvc0_mp_pm_query_cfg *cfg,
const struct nvc0_hw_sm_query_cfg *cfg,
unsigned mp_count)
{
unsigned p, c, d;
@@ -1317,22 +1317,22 @@ nve4_mp_pm_query_read_data(uint32_t count[32][4],
* NOTE: Interpretation of IPC requires knowledge of MP count.
*/
static boolean
nvc0_mp_pm_query_result(struct nvc0_context *nvc0, struct nvc0_query *q,
nvc0_hw_sm_query_result(struct nvc0_context *nvc0, struct nvc0_query *q,
void *result, boolean wait)
{
uint32_t count[32][4];
uint64_t value = 0;
unsigned mp_count = MIN2(nvc0->screen->mp_count_compute, 32);
unsigned p, c;
const struct nvc0_mp_pm_query_cfg *cfg;
const struct nvc0_hw_sm_query_cfg *cfg;
bool ret;
cfg = nvc0_mp_pm_query_get_cfg(nvc0, q);
cfg = nvc0_hw_sm_query_get_cfg(nvc0, q);
if (nvc0->screen->base.class_3d >= NVE4_3D_CLASS)
ret = nve4_mp_pm_query_read_data(count, nvc0, wait, q, cfg, mp_count);
ret = nve4_hw_sm_query_read_data(count, nvc0, wait, q, cfg, mp_count);
else
ret = nvc0_mp_pm_query_read_data(count, nvc0, wait, q, cfg, mp_count);
ret = nvc0_hw_sm_query_read_data(count, nvc0, wait, q, cfg, mp_count);
if (!ret)
return false;
@@ -1410,11 +1410,11 @@ nvc0_screen_get_driver_query_info(struct pipe_screen *pscreen,
if (screen->base.device->drm_version >= 0x01000101) {
if (screen->compute) {
if (screen->base.class_3d == NVE4_3D_CLASS) {
count += NVE4_PM_QUERY_COUNT;
count += NVE4_HW_SM_QUERY_COUNT;
} else
if (screen->base.class_3d < NVE4_3D_CLASS) {
/* NVC0_COMPUTE is not always enabled */
count += NVC0_PM_QUERY_COUNT;
count += NVC0_HW_SM_QUERY_COUNT;
}
}
}
@@ -1444,15 +1444,15 @@ nvc0_screen_get_driver_query_info(struct pipe_screen *pscreen,
if (screen->compute) {
if (screen->base.class_3d == NVE4_3D_CLASS) {
info->name = nve4_pm_query_names[id - NVC0_QUERY_DRV_STAT_COUNT];
info->query_type = NVE4_PM_QUERY(id - NVC0_QUERY_DRV_STAT_COUNT);
info->query_type = NVE4_HW_SM_QUERY(id - NVC0_QUERY_DRV_STAT_COUNT);
info->max_value.u64 =
(id < NVE4_PM_QUERY_METRIC_MP_OCCUPANCY) ? 0 : 100;
(id < NVE4_HW_SM_QUERY_METRIC_MP_OCCUPANCY) ? 0 : 100;
info->group_id = NVC0_QUERY_MP_COUNTER_GROUP;
return 1;
} else
if (screen->base.class_3d < NVE4_3D_CLASS) {
info->name = nvc0_pm_query_names[id - NVC0_QUERY_DRV_STAT_COUNT];
info->query_type = NVC0_PM_QUERY(id - NVC0_QUERY_DRV_STAT_COUNT);
info->query_type = NVC0_HW_SM_QUERY(id - NVC0_QUERY_DRV_STAT_COUNT);
info->group_id = NVC0_QUERY_MP_COUNTER_GROUP;
return 1;
}
@@ -1494,7 +1494,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen *pscreen,
info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU;
if (screen->base.class_3d == NVE4_3D_CLASS) {
info->num_queries = NVE4_PM_QUERY_COUNT;
info->num_queries = NVE4_HW_SM_QUERY_COUNT;
/* On NVE4+, each multiprocessor have 8 hardware counters separated
* in two distinct domains, but we allow only one active query
@@ -1504,7 +1504,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen *pscreen,
return 1;
} else
if (screen->base.class_3d < NVE4_3D_CLASS) {
info->num_queries = NVC0_PM_QUERY_COUNT;
info->num_queries = NVC0_HW_SM_QUERY_COUNT;
/* On NVC0:NVE4, each multiprocessor have 8 hardware counters
* in a single domain. */

View File

@@ -95,7 +95,7 @@ struct nvc0_screen {
struct {
struct nvc0_program *prog; /* compute state object to read MP counters */
struct pipe_query *mp_counter[8]; /* counter to query allocation */
uint8_t num_mp_pm_active[2];
uint8_t num_hw_sm_active[2];
bool mp_counters_enabled;
} pm;
@@ -120,156 +120,139 @@ nvc0_screen(struct pipe_screen *screen)
/* Performance counter queries:
*/
#define NVE4_PM_QUERY_COUNT 49
#define NVE4_PM_QUERY(i) (PIPE_QUERY_DRIVER_SPECIFIC + (i))
#define NVE4_PM_QUERY_LAST NVE4_PM_QUERY(NVE4_PM_QUERY_COUNT - 1)
#define NVE4_PM_QUERY_PROF_TRIGGER_0 0
#define NVE4_PM_QUERY_PROF_TRIGGER_1 1
#define NVE4_PM_QUERY_PROF_TRIGGER_2 2
#define NVE4_PM_QUERY_PROF_TRIGGER_3 3
#define NVE4_PM_QUERY_PROF_TRIGGER_4 4
#define NVE4_PM_QUERY_PROF_TRIGGER_5 5
#define NVE4_PM_QUERY_PROF_TRIGGER_6 6
#define NVE4_PM_QUERY_PROF_TRIGGER_7 7
#define NVE4_PM_QUERY_LAUNCHED_WARPS 8
#define NVE4_PM_QUERY_LAUNCHED_THREADS 9
#define NVE4_PM_QUERY_LAUNCHED_CTA 10
#define NVE4_PM_QUERY_INST_ISSUED1 11
#define NVE4_PM_QUERY_INST_ISSUED2 12
#define NVE4_PM_QUERY_INST_EXECUTED 13
#define NVE4_PM_QUERY_LD_LOCAL 14
#define NVE4_PM_QUERY_ST_LOCAL 15
#define NVE4_PM_QUERY_LD_SHARED 16
#define NVE4_PM_QUERY_ST_SHARED 17
#define NVE4_PM_QUERY_L1_LOCAL_LOAD_HIT 18
#define NVE4_PM_QUERY_L1_LOCAL_LOAD_MISS 19
#define NVE4_PM_QUERY_L1_LOCAL_STORE_HIT 20
#define NVE4_PM_QUERY_L1_LOCAL_STORE_MISS 21
#define NVE4_PM_QUERY_GLD_REQUEST 22
#define NVE4_PM_QUERY_GST_REQUEST 23
#define NVE4_PM_QUERY_L1_GLOBAL_LOAD_HIT 24
#define NVE4_PM_QUERY_L1_GLOBAL_LOAD_MISS 25
#define NVE4_PM_QUERY_GLD_TRANSACTIONS_UNCACHED 26
#define NVE4_PM_QUERY_GST_TRANSACTIONS 27
#define NVE4_PM_QUERY_BRANCH 28
#define NVE4_PM_QUERY_BRANCH_DIVERGENT 29
#define NVE4_PM_QUERY_ACTIVE_WARPS 30
#define NVE4_PM_QUERY_ACTIVE_CYCLES 31
#define NVE4_PM_QUERY_INST_ISSUED 32
#define NVE4_PM_QUERY_ATOM_COUNT 33
#define NVE4_PM_QUERY_GRED_COUNT 34
#define NVE4_PM_QUERY_LD_SHARED_REPLAY 35
#define NVE4_PM_QUERY_ST_SHARED_REPLAY 36
#define NVE4_PM_QUERY_LD_LOCAL_TRANSACTIONS 37
#define NVE4_PM_QUERY_ST_LOCAL_TRANSACTIONS 38
#define NVE4_PM_QUERY_L1_LD_SHARED_TRANSACTIONS 39
#define NVE4_PM_QUERY_L1_ST_SHARED_TRANSACTIONS 40
#define NVE4_PM_QUERY_GLD_MEM_DIV_REPLAY 41
#define NVE4_PM_QUERY_GST_MEM_DIV_REPLAY 42
#define NVE4_PM_QUERY_METRIC_IPC 43
#define NVE4_PM_QUERY_METRIC_IPAC 44
#define NVE4_PM_QUERY_METRIC_IPEC 45
#define NVE4_PM_QUERY_METRIC_MP_OCCUPANCY 46
#define NVE4_PM_QUERY_METRIC_MP_EFFICIENCY 47
#define NVE4_PM_QUERY_METRIC_INST_REPLAY_OHEAD 48
#define NVE4_HW_SM_QUERY(i) (PIPE_QUERY_DRIVER_SPECIFIC + (i))
#define NVE4_HW_SM_QUERY_LAST NVE4_HW_SM_QUERY(NVE4_HW_SM_QUERY_COUNT - 1)
enum nve4_pm_queries
{
NVE4_HW_SM_QUERY_ACTIVE_CYCLES = 0,
NVE4_HW_SM_QUERY_ACTIVE_WARPS,
NVE4_HW_SM_QUERY_ATOM_COUNT,
NVE4_HW_SM_QUERY_BRANCH,
NVE4_HW_SM_QUERY_DIVERGENT_BRANCH,
NVE4_HW_SM_QUERY_GLD_REQUEST,
NVE4_HW_SM_QUERY_GLD_MEM_DIV_REPLAY,
NVE4_HW_SM_QUERY_GST_TRANSACTIONS,
NVE4_HW_SM_QUERY_GST_MEM_DIV_REPLAY,
NVE4_HW_SM_QUERY_GRED_COUNT,
NVE4_HW_SM_QUERY_GST_REQUEST,
NVE4_HW_SM_QUERY_INST_EXECUTED,
NVE4_HW_SM_QUERY_INST_ISSUED,
NVE4_HW_SM_QUERY_INST_ISSUED1,
NVE4_HW_SM_QUERY_INST_ISSUED2,
NVE4_HW_SM_QUERY_L1_GLD_HIT,
NVE4_HW_SM_QUERY_L1_GLD_MISS,
NVE4_HW_SM_QUERY_L1_LOCAL_LD_HIT,
NVE4_HW_SM_QUERY_L1_LOCAL_LD_MISS,
NVE4_HW_SM_QUERY_L1_LOCAL_ST_HIT,
NVE4_HW_SM_QUERY_L1_LOCAL_ST_MISS,
NVE4_HW_SM_QUERY_L1_SHARED_LD_TRANSACTIONS,
NVE4_HW_SM_QUERY_L1_SHARED_ST_TRANSACTIONS,
NVE4_HW_SM_QUERY_LOCAL_LD,
NVE4_HW_SM_QUERY_LOCAL_LD_TRANSACTIONS,
NVE4_HW_SM_QUERY_LOCAL_ST,
NVE4_HW_SM_QUERY_LOCAL_ST_TRANSACTIONS,
NVE4_HW_SM_QUERY_PROF_TRIGGER_0,
NVE4_HW_SM_QUERY_PROF_TRIGGER_1,
NVE4_HW_SM_QUERY_PROF_TRIGGER_2,
NVE4_HW_SM_QUERY_PROF_TRIGGER_3,
NVE4_HW_SM_QUERY_PROF_TRIGGER_4,
NVE4_HW_SM_QUERY_PROF_TRIGGER_5,
NVE4_HW_SM_QUERY_PROF_TRIGGER_6,
NVE4_HW_SM_QUERY_PROF_TRIGGER_7,
NVE4_HW_SM_QUERY_SHARED_LD,
NVE4_HW_SM_QUERY_SHARED_LD_REPLAY,
NVE4_HW_SM_QUERY_SHARED_ST,
NVE4_HW_SM_QUERY_SHARED_ST_REPLAY,
NVE4_HW_SM_QUERY_SM_CTA_LAUNCHED,
NVE4_HW_SM_QUERY_THREADS_LAUNCHED,
NVE4_HW_SM_QUERY_UNCACHED_GLD_TRANSACTIONS,
NVE4_HW_SM_QUERY_WARPS_LAUNCHED,
NVE4_HW_SM_QUERY_METRIC_IPC,
NVE4_HW_SM_QUERY_METRIC_IPAC,
NVE4_HW_SM_QUERY_METRIC_IPEC,
NVE4_HW_SM_QUERY_METRIC_MP_OCCUPANCY,
NVE4_HW_SM_QUERY_METRIC_MP_EFFICIENCY,
NVE4_HW_SM_QUERY_METRIC_INST_REPLAY_OHEAD,
NVE4_HW_SM_QUERY_COUNT
};
/*
#define NVE4_PM_QUERY_GR_IDLE 50
#define NVE4_PM_QUERY_BSP_IDLE 51
#define NVE4_PM_QUERY_VP_IDLE 52
#define NVE4_PM_QUERY_PPP_IDLE 53
#define NVE4_PM_QUERY_CE0_IDLE 54
#define NVE4_PM_QUERY_CE1_IDLE 55
#define NVE4_PM_QUERY_CE2_IDLE 56
*/
/* L2 queries (PCOUNTER) */
/*
#define NVE4_PM_QUERY_L2_SUBP_WRITE_L1_SECTOR_QUERIES 57
...
*/
/* TEX queries (PCOUNTER) */
/*
#define NVE4_PM_QUERY_TEX0_CACHE_SECTOR_QUERIES 58
...
*/
#define NVC0_PM_QUERY_COUNT 31
#define NVC0_PM_QUERY(i) (PIPE_QUERY_DRIVER_SPECIFIC + 2048 + (i))
#define NVC0_PM_QUERY_LAST NVC0_PM_QUERY(NVC0_PM_QUERY_COUNT - 1)
#define NVC0_PM_QUERY_INST_EXECUTED 0
#define NVC0_PM_QUERY_BRANCH 1
#define NVC0_PM_QUERY_BRANCH_DIVERGENT 2
#define NVC0_PM_QUERY_ACTIVE_WARPS 3
#define NVC0_PM_QUERY_ACTIVE_CYCLES 4
#define NVC0_PM_QUERY_LAUNCHED_WARPS 5
#define NVC0_PM_QUERY_LAUNCHED_THREADS 6
#define NVC0_PM_QUERY_LD_SHARED 7
#define NVC0_PM_QUERY_ST_SHARED 8
#define NVC0_PM_QUERY_LD_LOCAL 9
#define NVC0_PM_QUERY_ST_LOCAL 10
#define NVC0_PM_QUERY_GRED_COUNT 11
#define NVC0_PM_QUERY_ATOM_COUNT 12
#define NVC0_PM_QUERY_GLD_REQUEST 13
#define NVC0_PM_QUERY_GST_REQUEST 14
#define NVC0_PM_QUERY_INST_ISSUED1_0 15
#define NVC0_PM_QUERY_INST_ISSUED1_1 16
#define NVC0_PM_QUERY_INST_ISSUED2_0 17
#define NVC0_PM_QUERY_INST_ISSUED2_1 18
#define NVC0_PM_QUERY_TH_INST_EXECUTED_0 19
#define NVC0_PM_QUERY_TH_INST_EXECUTED_1 20
#define NVC0_PM_QUERY_TH_INST_EXECUTED_2 21
#define NVC0_PM_QUERY_TH_INST_EXECUTED_3 22
#define NVC0_PM_QUERY_PROF_TRIGGER_0 23
#define NVC0_PM_QUERY_PROF_TRIGGER_1 24
#define NVC0_PM_QUERY_PROF_TRIGGER_2 25
#define NVC0_PM_QUERY_PROF_TRIGGER_3 26
#define NVC0_PM_QUERY_PROF_TRIGGER_4 27
#define NVC0_PM_QUERY_PROF_TRIGGER_5 28
#define NVC0_PM_QUERY_PROF_TRIGGER_6 29
#define NVC0_PM_QUERY_PROF_TRIGGER_7 30
#define NVC0_HW_SM_QUERY(i) (PIPE_QUERY_DRIVER_SPECIFIC + 2048 + (i))
#define NVC0_HW_SM_QUERY_LAST NVC0_HW_SM_QUERY(NVC0_HW_SM_QUERY_COUNT - 1)
enum nvc0_pm_queries
{
NVC0_HW_SM_QUERY_ACTIVE_CYCLES = 0,
NVC0_HW_SM_QUERY_ACTIVE_WARPS,
NVC0_HW_SM_QUERY_ATOM_COUNT,
NVC0_HW_SM_QUERY_BRANCH,
NVC0_HW_SM_QUERY_DIVERGENT_BRANCH,
NVC0_HW_SM_QUERY_GLD_REQUEST,
NVC0_HW_SM_QUERY_GRED_COUNT,
NVC0_HW_SM_QUERY_GST_REQUEST,
NVC0_HW_SM_QUERY_INST_EXECUTED,
NVC0_HW_SM_QUERY_INST_ISSUED1_0,
NVC0_HW_SM_QUERY_INST_ISSUED1_1,
NVC0_HW_SM_QUERY_INST_ISSUED2_0,
NVC0_HW_SM_QUERY_INST_ISSUED2_1,
NVC0_HW_SM_QUERY_LOCAL_LD,
NVC0_HW_SM_QUERY_LOCAL_ST,
NVC0_HW_SM_QUERY_PROF_TRIGGER_0,
NVC0_HW_SM_QUERY_PROF_TRIGGER_1,
NVC0_HW_SM_QUERY_PROF_TRIGGER_2,
NVC0_HW_SM_QUERY_PROF_TRIGGER_3,
NVC0_HW_SM_QUERY_PROF_TRIGGER_4,
NVC0_HW_SM_QUERY_PROF_TRIGGER_5,
NVC0_HW_SM_QUERY_PROF_TRIGGER_6,
NVC0_HW_SM_QUERY_PROF_TRIGGER_7,
NVC0_HW_SM_QUERY_SHARED_LD,
NVC0_HW_SM_QUERY_SHARED_ST,
NVC0_HW_SM_QUERY_THREADS_LAUNCHED,
NVC0_HW_SM_QUERY_TH_INST_EXECUTED_0,
NVC0_HW_SM_QUERY_TH_INST_EXECUTED_1,
NVC0_HW_SM_QUERY_TH_INST_EXECUTED_2,
NVC0_HW_SM_QUERY_TH_INST_EXECUTED_3,
NVC0_HW_SM_QUERY_WARPS_LAUNCHED,
NVC0_HW_SM_QUERY_COUNT
};
/* Driver statistics queries:
*/
#ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS
#define NVC0_QUERY_DRV_STAT(i) (PIPE_QUERY_DRIVER_SPECIFIC + 1024 + (i))
#define NVC0_QUERY_DRV_STAT_COUNT 29
#define NVC0_QUERY_DRV_STAT_LAST NVC0_QUERY_DRV_STAT(NVC0_QUERY_DRV_STAT_COUNT - 1)
#define NVC0_QUERY_DRV_STAT_TEX_OBJECT_CURRENT_COUNT 0
#define NVC0_QUERY_DRV_STAT_TEX_OBJECT_CURRENT_BYTES 1
#define NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_COUNT 2
#define NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_BYTES_VID 3
#define NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_BYTES_SYS 4
#define NVC0_QUERY_DRV_STAT_TEX_TRANSFERS_READ 5
#define NVC0_QUERY_DRV_STAT_TEX_TRANSFERS_WRITE 6
#define NVC0_QUERY_DRV_STAT_TEX_COPY_COUNT 7
#define NVC0_QUERY_DRV_STAT_TEX_BLIT_COUNT 8
#define NVC0_QUERY_DRV_STAT_TEX_CACHE_FLUSH_COUNT 9
#define NVC0_QUERY_DRV_STAT_BUF_TRANSFERS_READ 10
#define NVC0_QUERY_DRV_STAT_BUF_TRANSFERS_WRITE 11
#define NVC0_QUERY_DRV_STAT_BUF_READ_BYTES_STAGING_VID 12
#define NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_DIRECT 13
#define NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_STAGING_VID 14
#define NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_STAGING_SYS 15
#define NVC0_QUERY_DRV_STAT_BUF_COPY_BYTES 16
#define NVC0_QUERY_DRV_STAT_BUF_NON_KERNEL_FENCE_SYNC_COUNT 17
#define NVC0_QUERY_DRV_STAT_ANY_NON_KERNEL_FENCE_SYNC_COUNT 18
#define NVC0_QUERY_DRV_STAT_QUERY_SYNC_COUNT 19
#define NVC0_QUERY_DRV_STAT_GPU_SERIALIZE_COUNT 20
#define NVC0_QUERY_DRV_STAT_DRAW_CALLS_ARRAY 21
#define NVC0_QUERY_DRV_STAT_DRAW_CALLS_INDEXED 22
#define NVC0_QUERY_DRV_STAT_DRAW_CALLS_FALLBACK_COUNT 23
#define NVC0_QUERY_DRV_STAT_USER_BUFFER_UPLOAD_BYTES 24
#define NVC0_QUERY_DRV_STAT_CONSTBUF_UPLOAD_COUNT 25
#define NVC0_QUERY_DRV_STAT_CONSTBUF_UPLOAD_BYTES 26
#define NVC0_QUERY_DRV_STAT_PUSHBUF_COUNT 27
#define NVC0_QUERY_DRV_STAT_RESOURCE_VALIDATE_COUNT 28
#else
#define NVC0_QUERY_DRV_STAT_COUNT 0
enum nvc0_drv_stats_queries
{
#ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS
NVC0_QUERY_DRV_STAT_TEX_OBJECT_CURRENT_COUNT = 0,
NVC0_QUERY_DRV_STAT_TEX_OBJECT_CURRENT_BYTES,
NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_COUNT,
NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_BYTES_VID,
NVC0_QUERY_DRV_STAT_BUF_OBJECT_CURRENT_BYTES_SYS,
NVC0_QUERY_DRV_STAT_TEX_TRANSFERS_READ,
NVC0_QUERY_DRV_STAT_TEX_TRANSFERS_WRITE,
NVC0_QUERY_DRV_STAT_TEX_COPY_COUNT,
NVC0_QUERY_DRV_STAT_TEX_BLIT_COUNT,
NVC0_QUERY_DRV_STAT_TEX_CACHE_FLUSH_COUNT,
NVC0_QUERY_DRV_STAT_BUF_TRANSFERS_READ,
NVC0_QUERY_DRV_STAT_BUF_TRANSFERS_WRITE,
NVC0_QUERY_DRV_STAT_BUF_READ_BYTES_STAGING_VID,
NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_DIRECT,
NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_STAGING_VID,
NVC0_QUERY_DRV_STAT_BUF_WRITE_BYTES_STAGING_SYS,
NVC0_QUERY_DRV_STAT_BUF_COPY_BYTES,
NVC0_QUERY_DRV_STAT_BUF_NON_KERNEL_FENCE_SYNC_COUNT,
NVC0_QUERY_DRV_STAT_ANY_NON_KERNEL_FENCE_SYNC_COUNT,
NVC0_QUERY_DRV_STAT_QUERY_SYNC_COUNT,
NVC0_QUERY_DRV_STAT_GPU_SERIALIZE_COUNT,
NVC0_QUERY_DRV_STAT_DRAW_CALLS_ARRAY,
NVC0_QUERY_DRV_STAT_DRAW_CALLS_INDEXED,
NVC0_QUERY_DRV_STAT_DRAW_CALLS_FALLBACK_COUNT,
NVC0_QUERY_DRV_STAT_USER_BUFFER_UPLOAD_BYTES,
NVC0_QUERY_DRV_STAT_CONSTBUF_UPLOAD_COUNT,
NVC0_QUERY_DRV_STAT_CONSTBUF_UPLOAD_BYTES,
NVC0_QUERY_DRV_STAT_PUSHBUF_COUNT,
NVC0_QUERY_DRV_STAT_RESOURCE_VALIDATE_COUNT,
#endif
NVC0_QUERY_DRV_STAT_COUNT
};
int nvc0_screen_get_driver_query_info(struct pipe_screen *, unsigned,
struct pipe_driver_query_info *);

View File

@@ -887,6 +887,7 @@ nvc0_blitctx_prepare_state(struct nvc0_blitctx *blit)
/* zsa state */
IMMED_NVC0(push, NVC0_3D(DEPTH_TEST_ENABLE), 0);
IMMED_NVC0(push, NVC0_3D(DEPTH_BOUNDS_EN), 0);
IMMED_NVC0(push, NVC0_3D(STENCIL_ENABLE), 0);
IMMED_NVC0(push, NVC0_3D(ALPHA_TEST_ENABLE), 0);

View File

@@ -363,7 +363,7 @@ static void r300_init_states(struct pipe_context *pipe)
}
struct pipe_context* r300_create_context(struct pipe_screen* screen,
void *priv)
void *priv, unsigned flags)
{
struct r300_context* r300 = CALLOC_STRUCT(r300_context);
struct r300_screen* r300screen = r300_screen(screen);

View File

@@ -705,7 +705,7 @@ r300_get_nonnull_cb(struct pipe_framebuffer_state *fb, unsigned i)
}
struct pipe_context* r300_create_context(struct pipe_screen* screen,
void *priv);
void *priv, unsigned flags);
/* Context initialization. */
struct draw_stage* r300_draw_stage(struct r300_context* r300);

View File

@@ -120,7 +120,7 @@ int64_t compute_memory_prealloc_chunk(
assert(size_in_dw <= pool->size_in_dw);
COMPUTE_DBG(pool->screen, "* compute_memory_prealloc_chunk() size_in_dw = %ld\n",
COMPUTE_DBG(pool->screen, "* compute_memory_prealloc_chunk() size_in_dw = %"PRIi64"\n",
size_in_dw);
LIST_FOR_EACH_ENTRY(item, pool->item_list, link) {
@@ -151,7 +151,7 @@ struct list_head *compute_memory_postalloc_chunk(
struct compute_memory_item *next;
struct list_head *next_link;
COMPUTE_DBG(pool->screen, "* compute_memory_postalloc_chunck() start_in_dw = %ld\n",
COMPUTE_DBG(pool->screen, "* compute_memory_postalloc_chunck() start_in_dw = %"PRIi64"\n",
start_in_dw);
/* Check if we can insert it in the front of the list */
@@ -568,7 +568,7 @@ void compute_memory_free(struct compute_memory_pool* pool, int64_t id)
struct pipe_screen *screen = (struct pipe_screen *)pool->screen;
struct pipe_resource *res;
COMPUTE_DBG(pool->screen, "* compute_memory_free() id + %ld \n", id);
COMPUTE_DBG(pool->screen, "* compute_memory_free() id + %"PRIi64" \n", id);
LIST_FOR_EACH_ENTRY_SAFE(item, next, pool->item_list, link) {
@@ -628,7 +628,7 @@ struct compute_memory_item* compute_memory_alloc(
{
struct compute_memory_item *new_item = NULL;
COMPUTE_DBG(pool->screen, "* compute_memory_alloc() size_in_dw = %ld (%ld bytes)\n",
COMPUTE_DBG(pool->screen, "* compute_memory_alloc() size_in_dw = %"PRIi64" (%"PRIi64" bytes)\n",
size_in_dw, 4 * size_in_dw);
new_item = (struct compute_memory_item *)

View File

@@ -2143,11 +2143,11 @@ static void evergreen_emit_shader_stages(struct r600_context *rctx, struct r600_
if (state->geom_enable) {
uint32_t cut_val;
if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 128)
if (rctx->gs_shader->gs_max_out_vertices <= 128)
cut_val = V_028A40_GS_CUT_128;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 256)
else if (rctx->gs_shader->gs_max_out_vertices <= 256)
cut_val = V_028A40_GS_CUT_256;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 512)
else if (rctx->gs_shader->gs_max_out_vertices <= 512)
cut_val = V_028A40_GS_CUT_512;
else
cut_val = V_028A40_GS_CUT_1024;
@@ -3013,7 +3013,7 @@ void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader
struct r600_shader *rshader = &shader->shader;
struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
unsigned gsvs_itemsize =
(cp_shader->ring_item_size * rshader->gs_max_out_vertices) >> 2;
(cp_shader->ring_item_size * shader->selector->gs_max_out_vertices) >> 2;
r600_init_command_buffer(cb, 64);
@@ -3022,14 +3022,14 @@ void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader
r600_store_context_reg(cb, R_028AB8_VGT_VTX_CNT_EN, 1);
r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
S_028B38_MAX_VERT_OUT(shader->selector->gs_max_out_vertices));
r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
r600_conv_prim_to_gs_out(rshader->gs_output_prim));
r600_conv_prim_to_gs_out(shader->selector->gs_output_prim));
if (rctx->screen->b.info.drm_minor >= 35) {
r600_store_context_reg(cb, R_028B90_VGT_GS_INSTANCE_CNT,
S_028B90_CNT(MIN2(rshader->gs_num_invocations, 127)) |
S_028B90_ENABLE(rshader->gs_num_invocations > 0));
S_028B90_CNT(MIN2(shader->selector->gs_num_invocations, 127)) |
S_028B90_ENABLE(shader->selector->gs_num_invocations > 0));
}
r600_store_context_reg_seq(cb, R_02891C_SQ_GS_VERT_ITEMSIZE, 4);
r600_store_value(cb, cp_shader->ring_item_size >> 2);

View File

@@ -2029,6 +2029,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
fprintf(stderr, "CND:%X ", cf->cond);
if (cf->pop_count)
fprintf(stderr, "POP:%X ", cf->pop_count);
if (cf->end_of_program)
fprintf(stderr, "EOP ");
fprintf(stderr, "\n");
}
}

View File

@@ -108,7 +108,8 @@ static void r600_destroy_context(struct pipe_context *context)
FREE(rctx);
}
static struct pipe_context *r600_create_context(struct pipe_screen *screen, void *priv)
static struct pipe_context *r600_create_context(struct pipe_screen *screen,
void *priv, unsigned flags)
{
struct r600_context *rctx = CALLOC_STRUCT(r600_context);
struct r600_screen* rscreen = (struct r600_screen *)screen;
@@ -624,7 +625,7 @@ struct pipe_screen *r600_screen_create(struct radeon_winsys *ws)
rscreen->global_pool = compute_memory_pool_new(rscreen);
/* Create the auxiliary context. This must be done last. */
rscreen->b.aux_context = rscreen->b.b.context_create(&rscreen->b.b, NULL);
rscreen->b.aux_context = rscreen->b.b.context_create(&rscreen->b.b, NULL, 0);
#if 0 /* This is for testing whether aux_context and buffer clearing work correctly. */
struct pipe_resource templ = {};

View File

@@ -36,6 +36,8 @@
#include "util/list.h"
#include "util/u_transfer.h"
#include "tgsi/tgsi_scan.h"
#define R600_NUM_ATOMS 75
#define R600_MAX_VIEWPORTS 16
@@ -305,12 +307,18 @@ struct r600_pipe_shader_selector {
struct tgsi_token *tokens;
struct pipe_stream_output_info so;
struct tgsi_shader_info info;
unsigned num_shaders;
/* PIPE_SHADER_[VERTEX|FRAGMENT|...] */
unsigned type;
/* geometry shader properties */
unsigned gs_output_prim;
unsigned gs_max_out_vertices;
unsigned gs_num_invocations;
unsigned nr_ps_max_color_exports;
};
@@ -936,28 +944,5 @@ static inline bool r600_can_read_depth(struct r600_texture *rtex)
#define V_028A6C_OUTPRIM_TYPE_LINESTRIP 1
#define V_028A6C_OUTPRIM_TYPE_TRISTRIP 2
static inline unsigned r600_conv_prim_to_gs_out(unsigned mode)
{
static const int prim_conv[] = {
V_028A6C_OUTPRIM_TYPE_POINTLIST,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));
return prim_conv[mode];
}
unsigned r600_conv_prim_to_gs_out(unsigned mode);
#endif

View File

@@ -1809,7 +1809,6 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
struct tgsi_token *tokens = pipeshader->selector->tokens;
struct pipe_stream_output_info so = pipeshader->selector->so;
struct tgsi_full_immediate *immediate;
struct tgsi_full_property *property;
struct r600_shader_ctx ctx;
struct r600_bytecode_output output[32];
unsigned output_done, noutput;
@@ -1840,7 +1839,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
shader->indirect_files = ctx.info.indirect_files;
indirect_gprs = ctx.info.indirect_files & ~(1 << TGSI_FILE_CONSTANT);
tgsi_parse_init(&ctx.parse, tokens);
ctx.type = ctx.parse.FullHeader.Processor.Processor;
ctx.type = ctx.info.processor;
shader->processor_type = ctx.type;
ctx.bc->type = shader->processor_type;
@@ -1968,6 +1967,12 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
ctx.nliterals = 0;
ctx.literals = NULL;
shader->fs_write_all = FALSE;
if (ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS])
shader->fs_write_all = TRUE;
shader->vs_position_window_space = FALSE;
if (ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION])
shader->vs_position_window_space = TRUE;
if (shader->vs_as_gs_a)
vs_add_primid_output(&ctx, key.vs.prim_id_out);
@@ -1994,34 +1999,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
goto out_err;
break;
case TGSI_TOKEN_TYPE_INSTRUCTION:
break;
case TGSI_TOKEN_TYPE_PROPERTY:
property = &ctx.parse.FullToken.FullProperty;
switch (property->Property.PropertyName) {
case TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS:
if (property->u[0].Data == 1)
shader->fs_write_all = TRUE;
break;
case TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION:
if (property->u[0].Data == 1)
shader->vs_position_window_space = TRUE;
break;
case TGSI_PROPERTY_VS_PROHIBIT_UCPS:
/* we don't need this one */
break;
case TGSI_PROPERTY_GS_INPUT_PRIM:
shader->gs_input_prim = property->u[0].Data;
break;
case TGSI_PROPERTY_GS_OUTPUT_PRIM:
shader->gs_output_prim = property->u[0].Data;
break;
case TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES:
shader->gs_max_out_vertices = property->u[0].Data;
break;
case TGSI_PROPERTY_GS_INVOCATIONS:
shader->gs_num_invocations = property->u[0].Data;
break;
}
break;
default:
R600_ERR("unsupported token type %d\n", ctx.parse.FullToken.Token.Type);
@@ -6151,10 +6129,10 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx)
r = tgsi_make_src_for_op3(ctx, temp_regs[0], i, &alu.src[0], &ctx->src[0]);
if (r)
return r;
r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[1], &ctx->src[2]);
r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[1], &ctx->src[2]);
if (r)
return r;
r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[2], &ctx->src[1]);
r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[2], &ctx->src[1]);
if (r)
return r;
tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);

View File

@@ -78,11 +78,6 @@ struct r600_shader {
/* Temporarily workaround SB not handling CF_INDEX_[01] index registers */
boolean uses_index_registers;
/* geometry shader properties */
unsigned gs_input_prim;
unsigned gs_output_prim;
unsigned gs_max_out_vertices;
unsigned gs_num_invocations;
/* size in bytes of a data item in the ring (single vertex data) */
unsigned ring_item_size;

View File

@@ -1951,11 +1951,11 @@ static void r600_emit_shader_stages(struct r600_context *rctx, struct r600_atom
if (state->geom_enable) {
uint32_t cut_val;
if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 128)
if (rctx->gs_shader->gs_max_out_vertices <= 128)
cut_val = V_028A40_GS_CUT_128;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 256)
else if (rctx->gs_shader->gs_max_out_vertices <= 256)
cut_val = V_028A40_GS_CUT_256;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 512)
else if (rctx->gs_shader->gs_max_out_vertices <= 512)
cut_val = V_028A40_GS_CUT_512;
else
cut_val = V_028A40_GS_CUT_1024;
@@ -2650,7 +2650,7 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
struct r600_shader *rshader = &shader->shader;
struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
unsigned gsvs_itemsize =
(cp_shader->ring_item_size * rshader->gs_max_out_vertices) >> 2;
(cp_shader->ring_item_size * shader->selector->gs_max_out_vertices) >> 2;
r600_init_command_buffer(cb, 64);
@@ -2659,10 +2659,10 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
if (rctx->b.chip_class >= R700) {
r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
S_028B38_MAX_VERT_OUT(shader->selector->gs_max_out_vertices));
}
r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
r600_conv_prim_to_gs_out(rshader->gs_output_prim));
r600_conv_prim_to_gs_out(shader->selector->gs_output_prim));
r600_store_context_reg(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE,
cp_shader->ring_item_size >> 2);

View File

@@ -34,6 +34,7 @@
#include "util/u_upload_mgr.h"
#include "util/u_math.h"
#include "tgsi/tgsi_parse.h"
#include "tgsi/tgsi_scan.h"
void r600_init_command_buffer(struct r600_command_buffer *cb, unsigned num_dw)
{
@@ -123,6 +124,31 @@ static unsigned r600_conv_pipe_prim(unsigned prim)
return prim_conv[prim];
}
unsigned r600_conv_prim_to_gs_out(unsigned mode)
{
static const int prim_conv[] = {
[PIPE_PRIM_POINTS] = V_028A6C_OUTPRIM_TYPE_POINTLIST,
[PIPE_PRIM_LINES] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
[PIPE_PRIM_LINE_LOOP] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
[PIPE_PRIM_LINE_STRIP] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
[PIPE_PRIM_TRIANGLES] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_TRIANGLE_STRIP] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_TRIANGLE_FAN] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_QUADS] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_QUAD_STRIP] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_POLYGON] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_LINES_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
[PIPE_PRIM_LINE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
[PIPE_PRIM_TRIANGLES_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
[PIPE_PRIM_PATCHES] = V_028A6C_OUTPRIM_TYPE_POINTLIST,
[R600_PRIM_RECTANGLE_LIST] = V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));
return prim_conv[mode];
}
/* common state between evergreen and r600 */
static void r600_bind_blend_state_internal(struct r600_context *rctx,
@@ -818,6 +844,19 @@ static void *r600_create_shader_state(struct pipe_context *ctx,
sel->type = pipe_shader_type;
sel->tokens = tgsi_dup_tokens(state->tokens);
sel->so = state->stream_output;
tgsi_scan_shader(state->tokens, &sel->info);
switch (pipe_shader_type) {
case PIPE_SHADER_GEOMETRY:
sel->gs_output_prim =
sel->info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
sel->gs_max_out_vertices =
sel->info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES];
sel->gs_num_invocations =
sel->info.properties[TGSI_PROPERTY_GS_INVOCATIONS];
break;
}
return sel;
}
@@ -1524,7 +1563,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
unsigned prim = info.mode;
if (rctx->gs_shader) {
prim = rctx->gs_shader->current->shader.gs_output_prim;
prim = rctx->gs_shader->gs_output_prim;
}
prim = r600_conv_prim_to_gs_out(prim); /* decrease the number of types to 3 */

View File

@@ -3428,7 +3428,6 @@
#define S_0085F0_SO3_DEST_BASE_ENA(x) (((x) & 0x1) << 5)
#define G_0085F0_SO3_DEST_BASE_ENA(x) (((x) >> 5) & 0x1)
#define C_0085F0_SO3_DEST_BASE_ENA 0xFFFFFFDF
#define S_0085F0_CB0_DEST_BASE_ENA_SHIFT 6
#define S_0085F0_CB0_DEST_BASE_ENA(x) (((x) & 0x1) << 6)
#define G_0085F0_CB0_DEST_BASE_ENA(x) (((x) >> 6) & 0x1)
#define C_0085F0_CB0_DEST_BASE_ENA 0xFFFFFFBF

View File

@@ -32,6 +32,7 @@ int bc_decoder::decode_cf(unsigned &i, bc_cf& bc) {
int r = 0;
uint32_t dw0 = dw[i];
uint32_t dw1 = dw[i+1];
assert(i+1 <= ndw);
if ((dw1 >> 29) & 1) { // CF_ALU
return decode_cf_alu(i, bc);

View File

@@ -199,6 +199,9 @@ void bc_finalizer::finalize_if(region_node* r) {
cf_node *if_jump = sh.create_cf(CF_OP_JUMP);
cf_node *if_pop = sh.create_cf(CF_OP_POP);
if (!last_cf || last_cf->get_parent_region() == r) {
last_cf = if_pop;
}
if_pop->bc.pop_count = 1;
if_pop->jump_after(if_pop);

View File

@@ -95,7 +95,7 @@ int bc_parser::decode_shader() {
if ((r = decode_cf(i, eop)))
return r;
} while (!eop || (i >> 1) <= max_cf);
} while (!eop || (i >> 1) < max_cf);
return 0;
}
@@ -769,6 +769,7 @@ int bc_parser::prepare_ir() {
}
int bc_parser::prepare_loop(cf_node* c) {
assert(c->bc.addr-1 < cf_map.size());
cf_node *end = cf_map[c->bc.addr - 1];
assert(end->bc.op == CF_OP_LOOP_END);
@@ -788,8 +789,12 @@ int bc_parser::prepare_loop(cf_node* c) {
}
int bc_parser::prepare_if(cf_node* c) {
assert(c->bc.addr-1 < cf_map.size());
cf_node *c_else = NULL, *end = cf_map[c->bc.addr];
if (!end)
return 0; // not quite sure how this happens, malformed input?
BCP_DUMP(
sblog << "parsing JUMP @" << c->bc.id;
sblog << "\n";
@@ -815,7 +820,7 @@ int bc_parser::prepare_if(cf_node* c) {
if (c_else->parent != c->parent)
c_else = NULL;
if (end->parent != c->parent)
if (end && end->parent != c->parent)
end = NULL;
region_node *reg = sh->create_region();

View File

@@ -236,7 +236,7 @@ void rp_gpr_tracker::unreserve(alu_node* n) {
for (i = 0; i < nsrc; ++i) {
value *v = n->src[i];
if (v->is_readonly())
if (v->is_readonly() || v->is_undef())
continue;
if (i == 1 && opt)
continue;

View File

@@ -197,7 +197,7 @@ static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_q
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_ZPASS_DONE) | EVENT_INDEX(1));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
case PIPE_QUERY_PRIMITIVES_EMITTED:
case PIPE_QUERY_PRIMITIVES_GENERATED:
@@ -206,13 +206,13 @@ static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_q
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | EVENT_INDEX(3));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
case PIPE_QUERY_TIME_ELAPSED:
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5));
radeon_emit(cs, va);
radeon_emit(cs, (3 << 29) | ((va >> 32UL) & 0xFF));
radeon_emit(cs, (3 << 29) | ((va >> 32) & 0xFFFF));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
break;
@@ -220,7 +220,7 @@ static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_q
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_PIPELINESTAT) | EVENT_INDEX(2));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
default:
assert(0);
@@ -254,7 +254,7 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_ZPASS_DONE) | EVENT_INDEX(1));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
case PIPE_QUERY_PRIMITIVES_EMITTED:
case PIPE_QUERY_PRIMITIVES_GENERATED:
@@ -264,7 +264,7 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | EVENT_INDEX(3));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
case PIPE_QUERY_TIME_ELAPSED:
va += query->buffer.results_end + query->result_size/2;
@@ -273,7 +273,7 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOP, 4, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5));
radeon_emit(cs, va);
radeon_emit(cs, (3 << 29) | ((va >> 32UL) & 0xFF));
radeon_emit(cs, (3 << 29) | ((va >> 32) & 0xFFFF));
radeon_emit(cs, 0);
radeon_emit(cs, 0);
break;
@@ -282,7 +282,7 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_PIPELINESTAT) | EVENT_INDEX(2));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
radeon_emit(cs, (va >> 32) & 0xFFFF);
break;
default:
assert(0);
@@ -341,8 +341,8 @@ static void r600_emit_query_predication(struct r600_common_context *ctx, struct
while (results_base < qbuf->results_end) {
radeon_emit(cs, PKT3(PKT3_SET_PREDICATION, 1, 0));
radeon_emit(cs, (va + results_base) & 0xFFFFFFFFUL);
radeon_emit(cs, op | (((va + results_base) >> 32UL) & 0xFF));
radeon_emit(cs, va + results_base);
radeon_emit(cs, op | (((va + results_base) >> 32) & 0xFF));
r600_emit_reloc(ctx, &ctx->rings.gfx, qbuf->buf, RADEON_USAGE_READ,
RADEON_PRIO_MIN);
results_base += query->result_size;

View File

@@ -680,7 +680,7 @@ struct radeon_winsys {
uint64_t (*query_value)(struct radeon_winsys *ws,
enum radeon_value_id value);
void (*read_registers)(struct radeon_winsys *ws, unsigned reg_offset,
bool (*read_registers)(struct radeon_winsys *ws, unsigned reg_offset,
unsigned num_registers, uint32_t *out);
};

View File

@@ -0,0 +1 @@
sid_tables.h

View File

@@ -31,3 +31,12 @@ AM_CFLAGS = \
noinst_LTLIBRARIES = libradeonsi.la
libradeonsi_la_SOURCES = $(C_SOURCES)
sid_tables.h: $(srcdir)/sid_tables.py $(srcdir)/sid.h
$(AM_V_GEN) $(PYTHON2) $(srcdir)/sid_tables.py $(srcdir)/sid.h > $@
EXTRA_DIST = \
sid_tables.py
BUILT_SOURCES =\
sid_tables.h

View File

@@ -4,8 +4,10 @@ C_SOURCES := \
si_commands.c \
si_compute.c \
si_cp_dma.c \
si_debug.c \
si_descriptors.c \
sid.h \
sid_tables.h \
si_dma.c \
si_hw_context.c \
si_pipe.c \

View File

@@ -362,7 +362,7 @@ static void si_launch_grid(
shader_va += pc;
#endif
si_pm4_add_bo(pm4, shader->bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA);
si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, (shader_va >> 8) & 0xffffffff);
si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, shader_va >> 8);
si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40);
si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1,

View File

@@ -47,10 +47,11 @@ static void si_emit_cp_dma_copy_buffer(struct si_context *sctx,
unsigned size, unsigned flags)
{
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? PKT3_CP_DMA_CP_SYNC : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t sel = flags & CIK_CP_DMA_USE_L2 ?
PKT3_CP_DMA_SRC_SEL(3) | PKT3_CP_DMA_DST_SEL(3) : 0;
S_411_SRC_SEL(V_411_SRC_ADDR_TC_L2) |
S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
assert(size);
assert((size & ((1<<21)-1)) == size);
@@ -79,16 +80,16 @@ static void si_emit_cp_dma_clear_buffer(struct si_context *sctx,
uint32_t clear_value, unsigned flags)
{
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? PKT3_CP_DMA_CP_SYNC : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
uint32_t dst_sel = flags & CIK_CP_DMA_USE_L2 ? PKT3_CP_DMA_DST_SEL(3) : 0;
uint32_t sync_flag = flags & R600_CP_DMA_SYNC ? S_411_CP_SYNC(1) : 0;
uint32_t raw_wait = flags & SI_CP_DMA_RAW_WAIT ? S_414_RAW_WAIT(1) : 0;
uint32_t dst_sel = flags & CIK_CP_DMA_USE_L2 ? S_411_DSL_SEL(V_411_DST_ADDR_TC_L2) : 0;
assert(size);
assert((size & ((1<<21)-1)) == size);
if (sctx->b.chip_class >= CIK) {
radeon_emit(cs, PKT3(PKT3_DMA_DATA, 5, 0));
radeon_emit(cs, sync_flag | dst_sel | PKT3_CP_DMA_SRC_SEL(2)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, sync_flag | dst_sel | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, 0);
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
@@ -97,7 +98,7 @@ static void si_emit_cp_dma_clear_buffer(struct si_context *sctx,
} else {
radeon_emit(cs, PKT3(PKT3_CP_DMA, 4, 0));
radeon_emit(cs, clear_value); /* DATA [31:0] */
radeon_emit(cs, sync_flag | PKT3_CP_DMA_SRC_SEL(2)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, sync_flag | S_411_SRC_SEL(V_411_DATA)); /* CP_SYNC [31] | SRC_SEL[30:29] */
radeon_emit(cs, dst_va); /* DST_ADDR_LO [31:0] */
radeon_emit(cs, (dst_va >> 32) & 0xffff); /* DST_ADDR_HI [15:0] */
radeon_emit(cs, size | raw_wait); /* COMMAND [29:22] | BYTE_COUNT [20:0] */

View File

@@ -0,0 +1,439 @@
/*
* Copyright 2015 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* Authors:
* Marek Olšák <maraeo@gmail.com>
*/
#include "si_pipe.h"
#include "si_shader.h"
#include "sid.h"
#include "sid_tables.h"
static void si_dump_shader(struct si_shader_selector *sel, const char *name,
FILE *f)
{
if (!sel || !sel->current)
return;
fprintf(f, "%s shader disassembly:\n", name);
si_dump_shader_key(sel->type, &sel->current->key, f);
fprintf(f, "%s\n\n", sel->current->binary.disasm_string);
}
/* Parsed IBs are difficult to read without colors. Use "less -R file" to
* read them, or use "aha -b -f file" to convert them to html.
*/
#define COLOR_RESET "\033[0m"
#define COLOR_RED "\033[31m"
#define COLOR_GREEN "\033[1;32m"
#define COLOR_YELLOW "\033[1;33m"
#define COLOR_CYAN "\033[1;36m"
#define INDENT_PKT 8
static void print_spaces(FILE *f, unsigned num)
{
fprintf(f, "%*s", num, "");
}
static void print_value(FILE *file, uint32_t value, int bits)
{
/* Guess if it's int or float */
if (value <= (1 << 15))
fprintf(file, "%u\n", value);
else {
float f = uif(value);
if (fabs(f) < 100000 && f*10 == floor(f*10))
fprintf(file, "%.1ff\n", f);
else
/* Don't print more leading zeros than there are bits. */
fprintf(file, "0x%0*x\n", bits / 4, value);
}
}
static void print_named_value(FILE *file, const char *name, uint32_t value,
int bits)
{
print_spaces(file, INDENT_PKT);
fprintf(file, COLOR_YELLOW "%s" COLOR_RESET " <- ", name);
print_value(file, value, bits);
}
static void si_dump_reg(FILE *file, unsigned offset, uint32_t value,
uint32_t field_mask)
{
int r, f;
for (r = 0; r < ARRAY_SIZE(reg_table); r++) {
const struct si_reg *reg = &reg_table[r];
if (reg->offset == offset) {
bool first_field = true;
print_spaces(file, INDENT_PKT);
fprintf(file, COLOR_YELLOW "%s" COLOR_RESET " <- ",
reg->name);
if (!reg->num_fields) {
print_value(file, value, 32);
return;
}
for (f = 0; f < reg->num_fields; f++) {
const struct si_field *field = &reg->fields[f];
uint32_t val = (value & field->mask) >>
(ffs(field->mask) - 1);
if (!(field->mask & field_mask))
continue;
/* Indent the field. */
if (!first_field)
print_spaces(file,
INDENT_PKT + strlen(reg->name) + 4);
/* Print the field. */
fprintf(file, "%s = ", field->name);
if (val < field->num_values && field->values[val])
fprintf(file, "%s\n", field->values[val]);
else
print_value(file, val,
util_bitcount(field->mask));
first_field = false;
}
return;
}
}
fprintf(file, COLOR_YELLOW "0x%05x" COLOR_RESET " = 0x%08x", offset, value);
}
static void si_parse_set_reg_packet(FILE *f, uint32_t *ib, unsigned count,
unsigned reg_offset)
{
unsigned reg = (ib[1] << 2) + reg_offset;
int i;
for (i = 0; i < count; i++)
si_dump_reg(f, reg + i*4, ib[2+i], ~0);
}
static uint32_t *si_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
int trace_id)
{
unsigned count = PKT_COUNT_G(ib[0]);
unsigned op = PKT3_IT_OPCODE_G(ib[0]);
const char *predicate = PKT3_PREDICATE(ib[0]) ? "(predicate)" : "";
int i;
/* Print the name first. */
for (i = 0; i < ARRAY_SIZE(packet3_table); i++)
if (packet3_table[i].op == op)
break;
if (i < ARRAY_SIZE(packet3_table))
if (op == PKT3_SET_CONTEXT_REG ||
op == PKT3_SET_CONFIG_REG ||
op == PKT3_SET_UCONFIG_REG ||
op == PKT3_SET_SH_REG)
fprintf(f, COLOR_CYAN "%s%s" COLOR_CYAN ":\n",
packet3_table[i].name, predicate);
else
fprintf(f, COLOR_GREEN "%s%s" COLOR_RESET ":\n",
packet3_table[i].name, predicate);
else
fprintf(f, COLOR_RED "PKT3_UNKNOWN 0x%x%s" COLOR_RESET ":\n",
op, predicate);
/* Print the contents. */
switch (op) {
case PKT3_SET_CONTEXT_REG:
si_parse_set_reg_packet(f, ib, count, SI_CONTEXT_REG_OFFSET);
break;
case PKT3_SET_CONFIG_REG:
si_parse_set_reg_packet(f, ib, count, SI_CONFIG_REG_OFFSET);
break;
case PKT3_SET_UCONFIG_REG:
si_parse_set_reg_packet(f, ib, count, CIK_UCONFIG_REG_OFFSET);
break;
case PKT3_SET_SH_REG:
si_parse_set_reg_packet(f, ib, count, SI_SH_REG_OFFSET);
break;
case PKT3_DRAW_PREAMBLE:
si_dump_reg(f, R_030908_VGT_PRIMITIVE_TYPE, ib[1], ~0);
si_dump_reg(f, R_028AA8_IA_MULTI_VGT_PARAM, ib[2], ~0);
si_dump_reg(f, R_028B58_VGT_LS_HS_CONFIG, ib[3], ~0);
break;
case PKT3_ACQUIRE_MEM:
si_dump_reg(f, R_0301F0_CP_COHER_CNTL, ib[1], ~0);
si_dump_reg(f, R_0301F4_CP_COHER_SIZE, ib[2], ~0);
si_dump_reg(f, R_030230_CP_COHER_SIZE_HI, ib[3], ~0);
si_dump_reg(f, R_0301F8_CP_COHER_BASE, ib[4], ~0);
si_dump_reg(f, R_0301E4_CP_COHER_BASE_HI, ib[5], ~0);
print_named_value(f, "POLL_INTERVAL", ib[6], 16);
break;
case PKT3_SURFACE_SYNC:
si_dump_reg(f, R_0085F0_CP_COHER_CNTL, ib[1], ~0);
si_dump_reg(f, R_0085F4_CP_COHER_SIZE, ib[2], ~0);
si_dump_reg(f, R_0085F8_CP_COHER_BASE, ib[3], ~0);
print_named_value(f, "POLL_INTERVAL", ib[4], 16);
break;
case PKT3_EVENT_WRITE:
si_dump_reg(f, R_028A90_VGT_EVENT_INITIATOR, ib[1],
S_028A90_EVENT_TYPE(~0));
print_named_value(f, "EVENT_INDEX", (ib[1] >> 8) & 0xf, 4);
print_named_value(f, "INV_L2", (ib[1] >> 20) & 0x1, 1);
if (count > 0) {
print_named_value(f, "ADDRESS_LO", ib[2], 32);
print_named_value(f, "ADDRESS_HI", ib[3], 16);
}
break;
case PKT3_DRAW_INDEX_AUTO:
si_dump_reg(f, R_030930_VGT_NUM_INDICES, ib[1], ~0);
si_dump_reg(f, R_0287F0_VGT_DRAW_INITIATOR, ib[2], ~0);
break;
case PKT3_DRAW_INDEX_2:
si_dump_reg(f, R_028A78_VGT_DMA_MAX_SIZE, ib[1], ~0);
si_dump_reg(f, R_0287E8_VGT_DMA_BASE, ib[2], ~0);
si_dump_reg(f, R_0287E4_VGT_DMA_BASE_HI, ib[3], ~0);
si_dump_reg(f, R_030930_VGT_NUM_INDICES, ib[4], ~0);
si_dump_reg(f, R_0287F0_VGT_DRAW_INITIATOR, ib[5], ~0);
break;
case PKT3_INDEX_TYPE:
si_dump_reg(f, R_028A7C_VGT_DMA_INDEX_TYPE, ib[1], ~0);
break;
case PKT3_NUM_INSTANCES:
si_dump_reg(f, R_030934_VGT_NUM_INSTANCES, ib[1], ~0);
break;
case PKT3_WRITE_DATA:
si_dump_reg(f, R_370_CONTROL, ib[1], ~0);
si_dump_reg(f, R_371_DST_ADDR_LO, ib[2], ~0);
si_dump_reg(f, R_372_DST_ADDR_HI, ib[3], ~0);
for (i = 2; i < count; i++) {
print_spaces(f, INDENT_PKT);
fprintf(f, "0x%08x\n", ib[2+i]);
}
break;
case PKT3_CP_DMA:
si_dump_reg(f, R_410_CP_DMA_WORD0, ib[1], ~0);
si_dump_reg(f, R_411_CP_DMA_WORD1, ib[2], ~0);
si_dump_reg(f, R_412_CP_DMA_WORD2, ib[3], ~0);
si_dump_reg(f, R_413_CP_DMA_WORD3, ib[4], ~0);
si_dump_reg(f, R_414_COMMAND, ib[5], ~0);
break;
case PKT3_DMA_DATA:
si_dump_reg(f, R_500_DMA_DATA_WORD0, ib[1], ~0);
si_dump_reg(f, R_501_SRC_ADDR_LO, ib[2], ~0);
si_dump_reg(f, R_502_SRC_ADDR_HI, ib[3], ~0);
si_dump_reg(f, R_503_DST_ADDR_LO, ib[4], ~0);
si_dump_reg(f, R_504_DST_ADDR_HI, ib[5], ~0);
si_dump_reg(f, R_414_COMMAND, ib[6], ~0);
break;
case PKT3_NOP:
if (ib[0] == 0xffff1000) {
count = -1; /* One dword NOP. */
break;
} else if (count == 0 && SI_IS_TRACE_POINT(ib[1])) {
unsigned packet_id = SI_GET_TRACE_POINT_ID(ib[1]);
print_spaces(f, INDENT_PKT);
fprintf(f, COLOR_RED "Trace point ID: %u\n", packet_id);
if (trace_id == -1)
break; /* tracing was disabled */
print_spaces(f, INDENT_PKT);
if (packet_id < trace_id)
fprintf(f, COLOR_RED
"This trace point was reached by the CP."
COLOR_RESET "\n");
else if (packet_id == trace_id)
fprintf(f, COLOR_RED
"!!!!! This is the last trace point that "
"was reached by the CP !!!!!"
COLOR_RESET "\n");
else if (packet_id+1 == trace_id)
fprintf(f, COLOR_RED
"!!!!! This is the first trace point that "
"was NOT been reached by the CP !!!!!"
COLOR_RESET "\n");
else
fprintf(f, COLOR_RED
"!!!!! This trace point was NOT reached "
"by the CP !!!!!"
COLOR_RESET "\n");
break;
}
/* fall through, print all dwords */
default:
for (i = 0; i < count+1; i++) {
print_spaces(f, INDENT_PKT);
fprintf(f, "0x%08x\n", ib[1+i]);
}
}
ib += count + 2;
*num_dw -= count + 2;
return ib;
}
/**
* Parse and print an IB into a file.
*
* \param f file
* \param ib IB
* \param num_dw size of the IB
* \param chip_class chip class
* \param trace_id the last trace ID that is known to have been reached
* and executed by the CP, typically read from a buffer
*/
static void si_parse_ib(FILE *f, uint32_t *ib, int num_dw, int trace_id)
{
fprintf(f, "------------------ IB begin ------------------\n");
while (num_dw > 0) {
unsigned type = PKT_TYPE_G(ib[0]);
switch (type) {
case 3:
ib = si_parse_packet3(f, ib, &num_dw, trace_id);
break;
case 2:
/* type-2 nop */
if (ib[0] == 0x80000000) {
fprintf(f, COLOR_GREEN "NOP (type 2)" COLOR_RESET "\n");
ib++;
break;
}
/* fall through */
default:
fprintf(f, "Unknown packet type %i\n", type);
return;
}
}
fprintf(f, "------------------- IB end -------------------\n");
if (num_dw < 0) {
printf("Packet ends after the end of IB.\n");
exit(0);
}
}
static void si_dump_mmapped_reg(struct si_context *sctx, FILE *f,
unsigned offset)
{
struct radeon_winsys *ws = sctx->b.ws;
uint32_t value;
if (ws->read_registers(ws, offset, 1, &value))
si_dump_reg(f, offset, value, ~0);
}
static void si_dump_debug_registers(struct si_context *sctx, FILE *f)
{
if (sctx->screen->b.info.drm_major == 2 &&
sctx->screen->b.info.drm_minor < 42)
return; /* no radeon support */
fprintf(f, "Memory-mapped registers:\n");
si_dump_mmapped_reg(sctx, f, R_008010_GRBM_STATUS);
/* No other registers can be read on DRM < 3.1.0. */
if (sctx->screen->b.info.drm_major < 3 ||
sctx->screen->b.info.drm_minor < 1) {
fprintf(f, "\n");
return;
}
si_dump_mmapped_reg(sctx, f, R_008008_GRBM_STATUS2);
si_dump_mmapped_reg(sctx, f, R_008014_GRBM_STATUS_SE0);
si_dump_mmapped_reg(sctx, f, R_008018_GRBM_STATUS_SE1);
si_dump_mmapped_reg(sctx, f, R_008038_GRBM_STATUS_SE2);
si_dump_mmapped_reg(sctx, f, R_00803C_GRBM_STATUS_SE3);
si_dump_mmapped_reg(sctx, f, R_00D034_SDMA0_STATUS_REG);
si_dump_mmapped_reg(sctx, f, R_00D834_SDMA1_STATUS_REG);
si_dump_mmapped_reg(sctx, f, R_000E50_SRBM_STATUS);
si_dump_mmapped_reg(sctx, f, R_000E4C_SRBM_STATUS2);
si_dump_mmapped_reg(sctx, f, R_000E54_SRBM_STATUS3);
si_dump_mmapped_reg(sctx, f, R_008680_CP_STAT);
si_dump_mmapped_reg(sctx, f, R_008674_CP_STALLED_STAT1);
si_dump_mmapped_reg(sctx, f, R_008678_CP_STALLED_STAT2);
si_dump_mmapped_reg(sctx, f, R_008670_CP_STALLED_STAT3);
si_dump_mmapped_reg(sctx, f, R_008210_CP_CPC_STATUS);
si_dump_mmapped_reg(sctx, f, R_008214_CP_CPC_BUSY_STAT);
si_dump_mmapped_reg(sctx, f, R_008218_CP_CPC_STALLED_STAT1);
si_dump_mmapped_reg(sctx, f, R_00821C_CP_CPF_STATUS);
si_dump_mmapped_reg(sctx, f, R_008220_CP_CPF_BUSY_STAT);
si_dump_mmapped_reg(sctx, f, R_008224_CP_CPF_STALLED_STAT1);
fprintf(f, "\n");
}
static void si_dump_debug_state(struct pipe_context *ctx, FILE *f,
unsigned flags)
{
struct si_context *sctx = (struct si_context*)ctx;
if (flags & PIPE_DEBUG_DEVICE_IS_HUNG)
si_dump_debug_registers(sctx, f);
si_dump_shader(sctx->vs_shader, "Vertex", f);
si_dump_shader(sctx->tcs_shader, "Tessellation control", f);
si_dump_shader(sctx->tes_shader, "Tessellation evaluation", f);
si_dump_shader(sctx->gs_shader, "Geometry", f);
si_dump_shader(sctx->ps_shader, "Fragment", f);
if (sctx->last_ib) {
int last_trace_id = -1;
if (sctx->last_trace_buf) {
/* We are expecting that the ddebug pipe has already
* waited for the context, so this buffer should be idle.
* If the GPU is hung, there is no point in waiting for it.
*/
uint32_t *map =
sctx->b.ws->buffer_map(sctx->last_trace_buf->cs_buf,
NULL,
PIPE_TRANSFER_UNSYNCHRONIZED |
PIPE_TRANSFER_READ);
if (map)
last_trace_id = *map;
}
si_parse_ib(f, sctx->last_ib, sctx->last_ib_dw_size,
last_trace_id);
free(sctx->last_ib); /* dump only once */
sctx->last_ib = NULL;
r600_resource_reference(&sctx->last_trace_buf, NULL);
}
fprintf(f, "Done.\n");
}
void si_init_debug_functions(struct si_context *sctx)
{
sctx->b.b.dump_debug_state = si_dump_debug_state;
}

View File

@@ -426,7 +426,7 @@ static bool si_upload_vertex_buffer_descriptors(struct si_context *sctx)
va = rbuffer->gpu_address + offset;
/* Fill in T# buffer resource description */
desc[0] = va & 0xFFFFFFFF;
desc[0] = va;
desc[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) |
S_008F04_STRIDE(vb->stride);

View File

@@ -86,8 +86,8 @@ static void si_dma_copy_buffer(struct si_context *ctx,
for (i = 0; i < ncopy; i++) {
csize = size < max_csize ? size : max_csize;
cs->buf[cs->cdw++] = SI_DMA_PACKET(SI_DMA_PACKET_COPY, sub_cmd, csize);
cs->buf[cs->cdw++] = dst_offset & 0xffffffff;
cs->buf[cs->cdw++] = src_offset & 0xffffffff;
cs->buf[cs->cdw++] = dst_offset;
cs->buf[cs->cdw++] = src_offset;
cs->buf[cs->cdw++] = (dst_offset >> 32UL) & 0xff;
cs->buf[cs->cdw++] = (src_offset >> 32UL) & 0xff;
dst_offset += csize << shift;

View File

@@ -88,11 +88,8 @@ void si_need_cs_space(struct si_context *ctx, unsigned num_dw,
/* Count in framebuffer cache flushes at the end of CS. */
num_dw += ctx->atoms.s.cache_flush->num_dw;
#if SI_TRACE_CS
if (ctx->screen->b.trace_bo) {
num_dw += SI_TRACE_CS_DWORDS;
}
#endif
if (ctx->screen->b.trace_bo)
num_dw += SI_TRACE_CS_DWORDS * 2;
/* Flush if there's not enough space. */
if (num_dw > cs->max_dw) {
@@ -130,6 +127,19 @@ void si_context_gfx_flush(void *context, unsigned flags,
/* force to keep tiling flags */
flags |= RADEON_FLUSH_KEEP_TILING_FLAGS;
if (ctx->trace_buf)
si_trace_emit(ctx);
/* Save the IB for debug contexts. */
if (ctx->is_debug) {
free(ctx->last_ib);
ctx->last_ib_dw_size = cs->cdw;
ctx->last_ib = malloc(cs->cdw * 4);
memcpy(ctx->last_ib, cs->buf, cs->cdw * 4);
r600_resource_reference(&ctx->last_trace_buf, ctx->trace_buf);
r600_resource_reference(&ctx->trace_buf, NULL);
}
/* Flush the CS. */
ws->cs_flush(cs, flags, &ctx->last_gfx_fence,
ctx->screen->b.cs_count++);
@@ -138,31 +148,28 @@ void si_context_gfx_flush(void *context, unsigned flags,
if (fence)
ws->fence_reference(fence, ctx->last_gfx_fence);
#if SI_TRACE_CS
if (ctx->screen->b.trace_bo) {
struct si_screen *sscreen = ctx->screen;
unsigned i;
for (i = 0; i < 10; i++) {
usleep(5);
if (!ws->buffer_is_busy(sscreen->b.trace_bo->buf, RADEON_USAGE_READWRITE)) {
break;
}
}
if (i == 10) {
fprintf(stderr, "timeout on cs lockup likely happen at cs %d dw %d\n",
sscreen->b.trace_ptr[1], sscreen->b.trace_ptr[0]);
} else {
fprintf(stderr, "cs %d executed in %dms\n", sscreen->b.trace_ptr[1], i * 5);
}
}
#endif
si_begin_new_cs(ctx);
}
void si_begin_new_cs(struct si_context *ctx)
{
if (ctx->is_debug) {
uint32_t zero = 0;
/* Create a buffer used for writing trace IDs and initialize it to 0. */
assert(!ctx->trace_buf);
ctx->trace_buf = (struct r600_resource*)
pipe_buffer_create(ctx->b.b.screen, PIPE_BIND_CUSTOM,
PIPE_USAGE_STAGING, 4);
if (ctx->trace_buf)
pipe_buffer_write_nooverlap(&ctx->b.b, &ctx->trace_buf->b.b,
0, sizeof(zero), &zero);
ctx->trace_id = 0;
}
if (ctx->trace_buf)
si_trace_emit(ctx);
/* Flush read caches at the beginning of CS. */
ctx->b.flags |= SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER |
SI_CONTEXT_INV_TC_L1 |

View File

@@ -81,6 +81,9 @@ static void si_destroy_context(struct pipe_context *context)
LLVMDisposeTargetMachine(sctx->tm);
#endif
r600_resource_reference(&sctx->trace_buf, NULL);
r600_resource_reference(&sctx->last_trace_buf, NULL);
free(sctx->last_ib);
FREE(sctx);
}
@@ -92,7 +95,8 @@ si_amdgpu_get_reset_status(struct pipe_context *ctx)
return sctx->b.ws->ctx_query_reset_status(sctx->b.ctx);
}
static struct pipe_context *si_create_context(struct pipe_screen *screen, void *priv)
static struct pipe_context *si_create_context(struct pipe_screen *screen,
void *priv, unsigned flags)
{
struct si_context *sctx = CALLOC_STRUCT(si_context);
struct si_screen* sscreen = (struct si_screen *)screen;
@@ -111,6 +115,7 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, void *
sctx->b.b.destroy = si_destroy_context;
sctx->b.set_atom_dirty = (void *)si_set_atom_dirty;
sctx->screen = sscreen; /* Easy accessing of screen/winsys. */
sctx->is_debug = (flags & PIPE_CONTEXT_DEBUG) != 0;
if (!r600_common_context_init(&sctx->b, &sscreen->b))
goto fail;
@@ -121,6 +126,7 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, void *
si_init_blit_functions(sctx);
si_init_compute_functions(sctx);
si_init_cp_dma_functions(sctx);
si_init_debug_functions(sctx);
if (sscreen->b.info.has_uvd) {
sctx->b.b.create_video_codec = si_uvd_create_decoder;
@@ -586,7 +592,7 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws)
sscreen->b.debug_flags |= DBG_FS | DBG_VS | DBG_GS | DBG_PS | DBG_CS;
/* Create the auxiliary context. This must be done last. */
sscreen->b.aux_context = sscreen->b.b.context_create(&sscreen->b.b, NULL);
sscreen->b.aux_context = sscreen->b.b.context_create(&sscreen->b.b, NULL, 0);
return &sscreen->b.b;
}

View File

@@ -43,8 +43,7 @@
#define SI_RESTART_INDEX_UNKNOWN INT_MIN
#define SI_NUM_SMOOTH_AA_SAMPLES 8
#define SI_TRACE_CS 0
#define SI_TRACE_CS_DWORDS 6
#define SI_TRACE_CS_DWORDS 7
#define SI_MAX_DRAW_CS_DWORDS \
(/*scratch:*/ 3 + /*derived prim state:*/ 3 + \
@@ -82,6 +81,10 @@
SI_CONTEXT_FLUSH_AND_INV_DB | \
SI_CONTEXT_FLUSH_AND_INV_DB_META)
#define SI_ENCODE_TRACE_POINT(id) (0xcafe0000 | ((id) & 0xffff))
#define SI_IS_TRACE_POINT(x) (((x) & 0xcafe0000) == 0xcafe0000)
#define SI_GET_TRACE_POINT_ID(x) ((x) & 0xffff)
struct si_compute;
struct si_screen {
@@ -243,6 +246,14 @@ struct si_context {
struct si_shader_selector *last_tcs;
int last_num_tcs_input_cp;
int last_tes_sh_base;
/* Debug state. */
bool is_debug;
uint32_t *last_ib;
unsigned last_ib_dw_size;
struct r600_resource *last_trace_buf;
struct r600_resource *trace_buf;
unsigned trace_id;
};
/* cik_sdma.c */
@@ -275,6 +286,9 @@ void si_copy_buffer(struct si_context *sctx,
bool is_framebuffer);
void si_init_cp_dma_functions(struct si_context *sctx);
/* si_debug.c */
void si_init_debug_functions(struct si_context *sctx);
/* si_dma.c */
void si_dma_copy(struct pipe_context *ctx,
struct pipe_resource *dst,
@@ -290,10 +304,6 @@ void si_context_gfx_flush(void *context, unsigned flags,
void si_begin_new_cs(struct si_context *ctx);
void si_need_cs_space(struct si_context *ctx, unsigned num_dw, boolean count_draw_in);
#if SI_TRACE_CS
void si_trace_emit(struct si_context *sctx);
#endif
/* si_compute.c */
void si_init_compute_functions(struct si_context *sctx);

View File

@@ -135,12 +135,6 @@ unsigned si_pm4_dirty_dw(struct si_context *sctx)
continue;
count += state->ndw;
#if SI_TRACE_CS
/* for tracing each states */
if (sctx->screen->b.trace_bo) {
count += SI_TRACE_CS_DWORDS;
}
#endif
}
return count;
@@ -161,12 +155,6 @@ void si_pm4_emit(struct si_context *sctx, struct si_pm4_state *state)
}
cs->cdw += state->ndw;
#if SI_TRACE_CS
if (sctx->screen->b.trace_bo) {
si_trace_emit(sctx);
}
#endif
}
void si_pm4_emit_dirty(struct si_context *sctx)

View File

@@ -2418,7 +2418,7 @@ static void tex_fetch_args(
num_deriv_channels = 1;
break;
default:
assert(0); /* no other targets are valid here */
unreachable("invalid target");
}
for (param = 0; param < 2; param++)
@@ -3781,7 +3781,7 @@ void si_shader_apply_scratch_relocs(struct si_context *sctx,
uint64_t scratch_va)
{
unsigned i;
uint32_t scratch_rsrc_dword0 = scratch_va & 0xffffffff;
uint32_t scratch_rsrc_dword0 = scratch_va;
uint32_t scratch_rsrc_dword1 =
S_008F04_BASE_ADDRESS_HI(scratch_va >> 32)
| S_008F04_STRIDE(shader->scratch_bytes_per_wave / 64);
@@ -3964,48 +3964,48 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
return r;
}
static void si_dump_key(unsigned shader, union si_shader_key *key)
void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
{
int i;
fprintf(stderr, "SHADER KEY\n");
fprintf(f, "SHADER KEY\n");
switch (shader) {
case PIPE_SHADER_VERTEX:
fprintf(stderr, " instance_divisors = {");
fprintf(f, " instance_divisors = {");
for (i = 0; i < Elements(key->vs.instance_divisors); i++)
fprintf(stderr, !i ? "%u" : ", %u",
fprintf(f, !i ? "%u" : ", %u",
key->vs.instance_divisors[i]);
fprintf(stderr, "}\n");
fprintf(f, "}\n");
if (key->vs.as_es)
fprintf(stderr, " es_enabled_outputs = 0x%"PRIx64"\n",
fprintf(f, " es_enabled_outputs = 0x%"PRIx64"\n",
key->vs.es_enabled_outputs);
fprintf(stderr, " as_es = %u\n", key->vs.as_es);
fprintf(stderr, " as_ls = %u\n", key->vs.as_ls);
fprintf(f, " as_es = %u\n", key->vs.as_es);
fprintf(f, " as_ls = %u\n", key->vs.as_ls);
break;
case PIPE_SHADER_TESS_CTRL:
fprintf(stderr, " prim_mode = %u\n", key->tcs.prim_mode);
fprintf(f, " prim_mode = %u\n", key->tcs.prim_mode);
break;
case PIPE_SHADER_TESS_EVAL:
if (key->tes.as_es)
fprintf(stderr, " es_enabled_outputs = 0x%"PRIx64"\n",
fprintf(f, " es_enabled_outputs = 0x%"PRIx64"\n",
key->tes.es_enabled_outputs);
fprintf(stderr, " as_es = %u\n", key->tes.as_es);
fprintf(f, " as_es = %u\n", key->tes.as_es);
break;
case PIPE_SHADER_GEOMETRY:
break;
case PIPE_SHADER_FRAGMENT:
fprintf(stderr, " export_16bpc = 0x%X\n", key->ps.export_16bpc);
fprintf(stderr, " last_cbuf = %u\n", key->ps.last_cbuf);
fprintf(stderr, " color_two_side = %u\n", key->ps.color_two_side);
fprintf(stderr, " alpha_func = %u\n", key->ps.alpha_func);
fprintf(stderr, " alpha_to_one = %u\n", key->ps.alpha_to_one);
fprintf(stderr, " poly_stipple = %u\n", key->ps.poly_stipple);
fprintf(f, " export_16bpc = 0x%X\n", key->ps.export_16bpc);
fprintf(f, " last_cbuf = %u\n", key->ps.last_cbuf);
fprintf(f, " color_two_side = %u\n", key->ps.color_two_side);
fprintf(f, " alpha_func = %u\n", key->ps.alpha_func);
fprintf(f, " alpha_to_one = %u\n", key->ps.alpha_to_one);
fprintf(f, " poly_stipple = %u\n", key->ps.poly_stipple);
break;
default:
@@ -4036,7 +4036,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
/* Dump TGSI code before doing TGSI->LLVM conversion in case the
* conversion fails. */
if (dump && !(sscreen->b.debug_flags & DBG_NO_TGSI)) {
si_dump_key(sel->type, &shader->key);
si_dump_shader_key(sel->type, &shader->key, stderr);
tgsi_dump(tokens, 0);
si_dump_streamout(&sel->so);
}

View File

@@ -304,6 +304,7 @@ static inline bool si_vs_exports_prim_id(struct si_shader *shader)
/* radeonsi_shader.c */
int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
struct si_shader *shader);
void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f);
int si_compile_llvm(struct si_screen *sscreen, struct si_shader *shader,
LLVMTargetMachineRef tm, LLVMModuleRef mod);
void si_shader_destroy(struct pipe_context *ctx, struct si_shader *shader);

View File

@@ -35,10 +35,10 @@
#include "util/u_pstipple.h"
static void si_init_atom(struct r600_atom *atom, struct r600_atom **list_elem,
void (*emit)(struct si_context *ctx, struct r600_atom *state),
void (*emit_func)(struct si_context *ctx, struct r600_atom *state),
unsigned num_dw)
{
atom->emit = (void*)emit;
atom->emit = (void*)emit_func;
atom->num_dw = num_dw;
atom->dirty = false;
*list_elem = atom;

View File

@@ -281,6 +281,7 @@ extern const struct r600_atom si_atom_msaa_sample_locs;
extern const struct r600_atom si_atom_msaa_config;
void si_emit_cache_flush(struct r600_common_context *sctx, struct r600_atom *atom);
void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *dinfo);
void si_trace_emit(struct si_context *sctx);
/* si_commands.c */
void si_cmd_context_control(struct si_pm4_state *pm4);

View File

@@ -835,11 +835,8 @@ void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
si_emit_draw_registers(sctx, info);
si_emit_draw_packets(sctx, info, &ib);
#if SI_TRACE_CS
if (sctx->screen->b.trace_bo) {
if (sctx->trace_buf)
si_trace_emit(sctx);
}
#endif
/* Workaround for a VGT hang when streamout is enabled.
* It must be done after drawing. */
@@ -874,23 +871,20 @@ void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
sctx->b.num_draw_calls++;
}
#if SI_TRACE_CS
void si_trace_emit(struct si_context *sctx)
{
struct si_screen *sscreen = sctx->screen;
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
uint64_t va;
va = sscreen->b.trace_bo->gpu_address;
r600_context_bo_reloc(&sctx->b, &sctx->b.rings.gfx, sscreen->b.trace_bo,
sctx->trace_id++;
r600_context_bo_reloc(&sctx->b, &sctx->b.rings.gfx, sctx->trace_buf,
RADEON_USAGE_READWRITE, RADEON_PRIO_MIN);
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, 0));
radeon_emit(cs, PKT3_WRITE_DATA_DST_SEL(PKT3_WRITE_DATA_DST_SEL_MEM_SYNC) |
PKT3_WRITE_DATA_WR_CONFIRM |
PKT3_WRITE_DATA_ENGINE_SEL(PKT3_WRITE_DATA_ENGINE_SEL_ME));
radeon_emit(cs, va & 0xFFFFFFFFUL);
radeon_emit(cs, (va >> 32UL) & 0xFFFFFFFFUL);
radeon_emit(cs, cs->cdw);
radeon_emit(cs, sscreen->b.cs_count);
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEMORY_SYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, sctx->trace_buf->gpu_address);
radeon_emit(cs, sctx->trace_buf->gpu_address >> 32);
radeon_emit(cs, sctx->trace_id);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, SI_ENCODE_TRACE_POINT(sctx->trace_id));
}
#endif

View File

@@ -181,7 +181,7 @@ static void si_shader_es(struct si_shader *shader)
vgpr_comp_cnt = 3; /* all components are needed for TES */
num_user_sgprs = SI_TES_NUM_USER_SGPR;
} else
assert(0);
unreachable("invalid shader selector type");
num_sgprs = shader->num_sgprs;
/* One SGPR after user SGPRs is pre-loaded with es2gs_offset */
@@ -338,7 +338,7 @@ static void si_shader_vs(struct si_shader *shader)
vgpr_comp_cnt = 3; /* all components are needed for TES */
num_user_sgprs = SI_TES_NUM_USER_SGPR;
} else
assert(0);
unreachable("invalid shader selector type");
num_sgprs = shader->num_sgprs;
if (num_user_sgprs > num_sgprs) {

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,179 @@
#!/usr/bin/env python
CopyRight = '''
/*
* Copyright 2015 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* on the rights to use, copy, modify, merge, publish, distribute, sub
* license, and/or sell copies of the Software, and to permit persons to whom
* the Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
'''
import sys
import re
class Field:
def __init__(self, reg, s_name):
self.s_name = s_name
self.name = strip_prefix(s_name)
self.values = []
self.varname_values = '%s__%s__values' % (reg.r_name.lower(), self.name.lower())
class Reg:
def __init__(self, r_name):
self.r_name = r_name
self.name = strip_prefix(r_name)
self.fields = []
self.varname_fields = '%s__fields' % self.r_name.lower()
self.own_fields = True
def strip_prefix(s):
'''Strip prefix in the form ._.*_, e.g. R_001234_'''
return s[s[2:].find('_')+3:]
def parse(filename):
stream = open(filename)
regs = []
packets = []
for line in stream:
if not line.startswith('#define '):
continue
line = line[8:].strip()
if line.startswith('R_'):
reg = Reg(line.split()[0])
regs.append(reg)
elif line.startswith('S_'):
field = Field(reg, line[:line.find('(')])
reg.fields.append(field)
elif line.startswith('V_'):
field.values.append(line.split()[0])
elif line.startswith('PKT3_') and line.find('0x') != -1 and line.find('(') == -1:
packets.append(line.split()[0])
# Copy fields to indexed registers which have their fields only defined
# at register index 0.
# For example, copy fields from CB_COLOR0_INFO to CB_COLORn_INFO, n > 0.
match_number = re.compile('[0-9]+')
reg_dict = dict()
# Create a dict of registers with fields and '0' in their name
for reg in regs:
if len(reg.fields) and reg.name.find('0') != -1:
reg_dict[reg.name] = reg
# Assign fields
for reg in regs:
if not len(reg.fields):
reg0 = reg_dict.get(match_number.sub('0', reg.name))
if reg0 != None:
reg.fields = reg0.fields
reg.varname_fields = reg0.varname_fields
reg.own_fields = False
return (regs, packets)
def write_tables(tables):
regs = tables[0]
packets = tables[1]
print '/* This file is autogenerated by sid_tables.py from sid.h. Do not edit directly. */'
print
print CopyRight.strip()
print '''
#ifndef SID_TABLES_H
#define SID_TABLES_H
struct si_field {
const char *name;
unsigned mask;
unsigned num_values;
const char **values;
};
struct si_reg {
const char *name;
unsigned offset;
unsigned num_fields;
const struct si_field *fields;
};
struct si_packet3 {
const char *name;
unsigned op;
};
'''
print 'static const struct si_packet3 packet3_table[] = {'
for pkt in packets:
print '\t{"%s", %s},' % (pkt[5:], pkt)
print '};'
print
for reg in regs:
if len(reg.fields) and reg.own_fields:
for field in reg.fields:
if len(field.values):
print 'static const char *%s[] = {' % (field.varname_values)
for value in field.values:
print '\t[%s] = "%s",' % (value, strip_prefix(value))
print '};'
print
print 'static const struct si_field %s[] = {' % (reg.varname_fields)
for field in reg.fields:
if len(field.values):
print '\t{"%s", %s(~0u), ARRAY_SIZE(%s), %s},' % (field.name,
field.s_name, field.varname_values, field.varname_values)
else:
print '\t{"%s", %s(~0u)},' % (field.name, field.s_name)
print '};'
print
print 'static const struct si_reg reg_table[] = {'
for reg in regs:
if len(reg.fields):
print '\t{"%s", %s, ARRAY_SIZE(%s), %s},' % (reg.name, reg.r_name,
reg.varname_fields, reg.varname_fields)
else:
print '\t{"%s", %s},' % (reg.name, reg.r_name)
print '};'
print
print '#endif'
def main():
tables = []
for arg in sys.argv[1:]:
tables.extend(parse(arg))
write_tables(tables)
if __name__ == '__main__':
main()

View File

@@ -129,13 +129,13 @@ rbug_screen_is_format_supported(struct pipe_screen *_screen,
static struct pipe_context *
rbug_screen_context_create(struct pipe_screen *_screen,
void *priv)
void *priv, unsigned flags)
{
struct rbug_screen *rb_screen = rbug_screen(_screen);
struct pipe_screen *screen = rb_screen->screen;
struct pipe_context *result;
result = screen->context_create(screen, priv);
result = screen->context_create(screen, priv, flags);
if (result)
return rbug_context_create(_screen, result);
return NULL;
@@ -281,7 +281,7 @@ rbug_screen_create(struct pipe_screen *screen)
rb_screen->screen = screen;
rb_screen->private_context = screen->context_create(screen, NULL);
rb_screen->private_context = screen->context_create(screen, NULL, 0);
if (!rb_screen->private_context)
goto err_free;

View File

@@ -186,8 +186,8 @@ softpipe_render_condition( struct pipe_context *pipe,
struct pipe_context *
softpipe_create_context( struct pipe_screen *screen,
void *priv )
softpipe_create_context(struct pipe_screen *screen,
void *priv, unsigned flags)
{
struct softpipe_screen *sp_screen = softpipe_screen(screen);
struct softpipe_context *softpipe = CALLOC_STRUCT(softpipe_context);

View File

@@ -211,7 +211,7 @@ softpipe_context( struct pipe_context *pipe )
struct pipe_context *
softpipe_create_context( struct pipe_screen *, void *priv );
softpipe_create_context(struct pipe_screen *, void *priv, unsigned flags);
struct pipe_resource *
softpipe_user_buffer_create(struct pipe_screen *screen,

View File

@@ -81,8 +81,8 @@ static void svga_destroy( struct pipe_context *pipe )
struct pipe_context *svga_context_create( struct pipe_screen *screen,
void *priv )
struct pipe_context *svga_context_create(struct pipe_screen *screen,
void *priv, unsigned flags)
{
struct svga_screen *svgascreen = svga_screen(screen);
struct svga_context *svga = NULL;

Some files were not shown because too many files have changed in this diff Show More