Compare commits

20 Commits

Author SHA1 Message Date
Ian Romanick
1e6bba58d8 mesa: Bump version to 10.1-rc1
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2014-02-07 18:34:08 -08:00
Christoph Bumiller
137a0fe5c8 nvc0: handle TGSI_SEMANTIC_LAYER
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 882e98e5e6)
2014-02-07 17:10:11 -08:00
Kristian Høgsberg
70e8ec38b5 glx: Pass NULL DRI drawables into the DRI driver for None GLX drawables
GLX_ARB_create_context allows making a GLX context current with None
drawable and readables, but this was never implemented correctly in GLX.
We would create a __DRIdrawable for the None GLX drawable and pass that
to the DRI driver and that would somehow work.  Now it's somehow broken.

The way this should have worked is that we pass a NULL DRI drawable
to the DRI driver when the GLX user calls glXMakeContextCurrent()
with None for drawable and readables.

https://bugs.freedesktop.org/show_bug.cgi?id=74143
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit f658150639)
2014-02-07 17:10:11 -08:00
Kristian Høgsberg
c79a7ef9a3 i965: Move intel_prepare_render() above first buffer access
The driver is supposed to ensure buffers before any drawing operation, but in
do_blit_drawpixels() and do_blit_copypixels() we inspect the buffer format
before calling intel_prepare_render().  That was covered up by the
unconditional call to intel_prepare_render() in intelMakeCurrent(), but we
now only do this on the initial intelMakeCurrent call for a context
(to get the size for the initial viewport values).

https://bugs.freedesktop.org/show_bug.cgi?id=74083

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexander Monakov <amonakov@gmail.com>
(cherry picked from commit 44338cd826)
2014-02-07 17:10:11 -08:00
Christoph Bumiller
17aeb3fdc9 nvc0/ir/emit: hardcode vertex output stream to 0 for now
(cherry picked from commit b7233acf78)
2014-02-07 17:10:11 -08:00
Kenneth Graunke
ecaf9259e9 glsl: Don't lose precision qualifiers when encountering "centroid".
Mesa fails to retain the precision qualifier when parsing:

   #version 300 es
   centroid in mediump vec2 v;

Consider how the parser's type_qualifier production is applied.
First, the precision_qualifier rule creates a new ast_type_qualifier:

    <precision: mediump>

Then the storage_qualifier rule creates a second one:

    <flags: in>

and calls merge_qualifier() to fold in any previous qualifications,
returning:

    <flags: in, precision: mediump>

Finally, the auxiliary_storage_qualifier creates one for "centroid":

    <flags: centroid>

it then does $$ = $1 and $$.flags |= $2.flags, resulting in:

    <flags: centroid, in>

Since precision isn't stored in the flags bitfield, it is lost.  We need
to instead call merge_qualifier to combine all the fields.
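
The failure mode described above can be sketched in a few lines of C. This is only an illustration: the struct, enum values, and function names below are hypothetical stand-ins, not Mesa's actual ast_type_qualifier or merge_qualifier().

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a type qualifier record. */
enum { QUAL_IN = 1u << 0, QUAL_CENTROID = 1u << 1 };
enum precision { PREC_NONE = 0, PREC_LOW, PREC_MEDIUM, PREC_HIGH };

struct qualifier {
    unsigned flags;            /* storage / auxiliary bits */
    enum precision precision;  /* stored outside the flags bitfield */
};

/* Old behaviour: start from the new qualifier and OR in only the flag
 * bits of the previous one, so the previous precision is dropped. */
static struct qualifier combine_flags_only(struct qualifier cur,
                                           struct qualifier prev)
{
    struct qualifier out = cur;
    out.flags |= prev.flags;
    return out;
}

/* Fixed behaviour: merge every field, not just the flags. */
static struct qualifier combine_all_fields(struct qualifier cur,
                                           struct qualifier prev)
{
    struct qualifier out = combine_flags_only(cur, prev);

    /* In this sketch at most one side may carry a precision. */
    assert(cur.precision == PREC_NONE || prev.precision == PREC_NONE);
    out.precision = cur.precision != PREC_NONE ? cur.precision
                                               : prev.precision;
    return out;
}
```

With cur = <flags: centroid> and prev = <flags: in, precision: mediump>, the first function yields <flags: centroid, in> with no precision, while the second keeps mediump.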

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 2062f40d81)
2014-02-07 17:10:10 -08:00
Brian Paul
0fb761b404 st/mesa: avoid sw fallback for getting/decompressing textures
If st_GetTexImage() is to decompress the texture, avoid the fallback
path even if prefer_blit_based_texture_transfer = false.  For drivers
that returned PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER = 0, we
were always taking the fallback path for texture decompression rather
than rendering a quad.  The latter is a lot faster.

Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f47e596288)
2014-02-07 17:10:10 -08:00
Ilia Mirkin
31911f8d37 nv50: only over-allocate by a page for code
The pre-fetching doesn't go too far. Tested with over-allocating by only
a page, and didn't see any errors in dmesg. Saves ~512KB of VRAM.
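
The saving can be checked with a little arithmetic on the two size expressions used in the patch. The sketch below assumes that NV50_CODE_BO_SIZE_LOG2 is 19 (512 KiB per segment), a value inferred from the "~512KB" figure rather than stated in this log; the macro and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: log2 of one code segment size, inferred from "~512KB". */
#define CODE_BO_SIZE_LOG2 19
#define PAGE_SIZE_BYTES   0x1000u

/* Old allocation: three segments plus one whole extra segment. */
static uint32_t code_alloc_old(void)
{
    return 4u << CODE_BO_SIZE_LOG2;
}

/* New allocation: three segments plus a single page. */
static uint32_t code_alloc_new(void)
{
    return (3u << CODE_BO_SIZE_LOG2) + PAGE_SIZE_BYTES;
}

/* Bytes of VRAM no longer allocated. */
static uint32_t code_alloc_saved(void)
{
    assert(code_alloc_old() > code_alloc_new());
    return code_alloc_old() - code_alloc_new();
}
```

Under that assumption the difference is one segment minus one page, i.e. 508 KiB, which matches the rounded figure above.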

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
(cherry picked from commit f76c7ad5b1)
2014-02-07 17:10:10 -08:00
Ilia Mirkin
142f6cc0b4 nv50: fix layerid to be the fp input number rather than vp output number
In the tests they were the same so it didn't matter, but indications are
that this is the correct behaviour. Also take this opportunity to
(trivially) support using gl_Layer in fp.

Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
(cherry picked from commit 364bdd2419)
2014-02-07 17:10:10 -08:00
Ilia Mirkin
156ac628a8 nv50: rework primid logic
Functionally identical but much simpler. Should also better integrate
with future layer/viewport changes/fixes.

Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
(cherry picked from commit c7373b7dc7)
2014-02-07 17:10:10 -08:00
Matt Turner
7aa84761b6 glsl: Initialize ubo_binding_mask flags to zero.
Missed in commit e63bb298. Caused sporadic test failures, like
incorrect-in-layout-qualifier-repeated-prim.geom.

Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit e2ef93cf94)
2014-02-07 17:10:10 -08:00
Marek Olšák
61219adb3d st/mesa: fix crash when a shader uses a TBO and it's not bound
This binds a NULL sampler view in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74251

Cc: "10.1" "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit c6dbcf10df)
2014-02-07 17:10:10 -08:00
Paul Berry
ee632e68bd glsl: Fix continue statements in do-while loops.
From the GLSL 4.40 spec, section 6.4 (Jumps):

    The continue jump is used only in loops. It skips the remainder of
    the body of the inner most loop of which it is inside. For while
    and do-while loops, this jump is to the next evaluation of the
    loop condition-expression from which the loop continues as
    previously defined.

Previously, we incorrectly treated a "continue" statement as jumping
to the top of a do-while loop.

This patch fixes the problem by replicating the loop condition when
converting the "continue" statement to IR.  (We already do a similar
thing in "for" loops, to ensure that "continue" causes the loop
expression to be executed).
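
C has the same "continue" semantics for do-while loops, so the difference can be sketched in plain C. The two for (;;) functions are only a model of the lowering, assuming the loop is represented as an endless loop with a conditional break at the bottom; they are not Mesa's IR, and all names are illustrative.

```c
#include <assert.h>

/* Native do-while: `continue` goes to the condition test. */
static int runs_native(void)
{
    int runs = 0;
    do {
        runs++;
        if (runs < 100)
            continue;          /* next step is the `while (0)` test */
    } while (0);
    return runs;
}

/* Model of the old lowering: the condition check sits only at the
 * bottom, so `continue` skips it and restarts the body. */
static int runs_old_lowering(void)
{
    int runs = 0;
    for (;;) {
        runs++;
        if (runs < 100)
            continue;          /* condition never evaluated on this path */
        if (!0)                /* the do-while condition, negated */
            break;
    }
    return runs;
}

/* Model of the fix: the condition check is replicated before `continue`. */
static int runs_fixed_lowering(void)
{
    int runs = 0;
    for (;;) {
        runs++;
        if (runs < 100) {
            if (!0)            /* replicated condition check */
                break;
            continue;
        }
        if (!0)
            break;
    }
    return runs;
}

/* The fixed model agrees with the native loop; the old model does not. */
static void check_lowerings(void)
{
    assert(runs_native() == 1);
    assert(runs_fixed_lowering() == runs_native());
    assert(runs_old_lowering() == 100);
}
```

With a condition that is always false, the body should run exactly once; the old model keeps re-running it until the "continue" is no longer taken.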

Fixes piglit tests:
- glsl-fs-continue-inside-do-while.shader_test
- glsl-vs-continue-inside-do-while.shader_test
- glsl-fs-continue-in-switch-in-do-while.shader_test
- glsl-vs-continue-in-switch-in-do-while.shader_test

Cc: mesa-stable@lists.freedesktop.org

Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7f5740899f)
2014-02-07 17:10:10 -08:00
Paul Berry
b5c99be4af glsl: Make condition_to_hir() callable from outside ast_iteration_statement.
In addition to making it public, we also need to change its first
argument from an ir_loop * to an exec_list *, so that it can be used
to insert the condition anywhere in the IR (rather than just in the
body of the loop).

This will be necessary in order to make continue statements work
properly in do-while loops.

Cc: mesa-stable@lists.freedesktop.org

Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 56790856b3)
2014-02-07 17:10:10 -08:00
Topi Pohjolainen
165868d45e i965/blorp: do not use unnecessary hw-blending support
This is really not needed as blorp blit programs already sample
XRGB normally and get alpha channel set to 1.0 automatically by
the sampler engine. This is simply copied directly to the payload
of the render target write message and hence there is no need for
any additional blending support from the pixel processing pipeline.

The blending formula is broken for color components anyway: it
multiplies the color component by itself (the blend factor is the
component itself).
Alpha blending, in turn, would not fix the alpha to one independent
of the source; it simply used the source alpha as is
(1.0 * src_alpha + 0.0 * dst_alpha).
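
A small numeric sketch of the two formulas described above, using the generic blend equation src * src_factor + dst * dst_factor. The zero destination factor for color is an assumption (the message only says the component is multiplied by itself), and the function names are illustrative.

```c
#include <assert.h>

/* Generic blend equation: result = src * src_factor + dst * dst_factor. */
static float blend(float src, float src_factor, float dst, float dst_factor)
{
    return src * src_factor + dst * dst_factor;
}

/* Color path: the source factor is the source component itself.
 * Assumption: the destination factor is zero. */
static float blend_color(float src, float dst)
{
    assert(src >= 0.0f && src <= 1.0f);
    return blend(src, src, dst, 0.0f);
}

/* Alpha path as quoted: 1.0 * src_alpha + 0.0 * dst_alpha. */
static float blend_alpha(float src_alpha, float dst_alpha)
{
    return blend(src_alpha, 1.0f, dst_alpha, 0.0f);
}
```

Under these assumptions blend_color(0.5, dst) gives 0.25 instead of 0.5, while inputs of 0.0 and 1.0 pass through unchanged, which would explain why a test using only zeros and ones does not notice. blend_alpha() returns the source alpha rather than 1.0.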

Quoting Eric:

 "If we want to actually make the no-alpha-bits-present thing work,
  we need to override the bits in the surface state or in the
  generated code.  In the normal draw path, it's done for sampling
  by the swizzling code in brw_wm_surface_state.c, and the blending
  overrides is just to fix up the alpha blending stage which
  doesn't pay attention to that for the destination surface."

If one modifies piglit test gl-3.2-layered-rendering-blit to use
color component values other than zero or one, this change will
kick in on IVB. No regressions on IVB.

This is effectively a revert of c0554141a9:

    i965/blorp: Support overriding destination alpha to 1.0.

    Currently, Blorp requires the source and destination formats to be
    equal.  However, we'd really like to be able to blit between XRGB and
    ARGB formats; our BLT engine paths have supported this for a long time.

    For ARGB -> XRGB, nothing needs to occur: the missing alpha is already
    interpreted as 1.0.  For XRGB -> ARGB, we need to smash the alpha
    channel to 1.0 when writing the destination colors.  This is fairly
    straightforward with blending.

    For now, this code is never used, as the source and destination formats
    still must be equal.  The next patch will relax that restriction.

    NOTE: This is a candidate for the 9.1 branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit 933be19cdf)
2014-02-07 17:10:10 -08:00
Christian König
bbcd975881 radeon/uvd: fix feedback buffer handling v2
Without the correct feedback buffer size UVD runs
into an error on each frame, reducing the maximum FPS.

v2: fix Michel's comments

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "10.1" "10.0" "9.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c3c24c3acc)
2014-02-07 17:10:10 -08:00
Brian Paul
6cfcc4fccf draw: fix incorrect color of flat-shaded clipped lines
When we clipped a line, we weren't copying the provoking vertex
color to the second vertex.  We also weren't checking for
first vs. last provoking vertex.
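
The rule being fixed can be sketched as follows: with flat shading, both vertices of a clipped line take the color of the provoking vertex, which is the first or the last vertex depending on the flatshade_first setting. The struct and function names are hypothetical simplifications, not the draw module's types.

```c
#include <assert.h>
#include <string.h>

struct vertex {
    float color[4];
    unsigned clipmask;   /* non-zero when the vertex is outside the clip volume */
};

/* Copy the flat-shaded color from one vertex to another. */
static void copy_flat_color(struct vertex *dst, const struct vertex *src)
{
    memcpy(dst->color, src->color, sizeof dst->color);
}

/* new0/new1 are the replacement vertices produced by clipping v0/v1.
 * Each clipped endpoint receives the provoking vertex color. */
static void color_clipped_line(struct vertex *new0, struct vertex *new1,
                               const struct vertex *v0,
                               const struct vertex *v1,
                               int flatshade_first)
{
    const struct vertex *provoking;

    assert(new0 && new1 && v0 && v1);
    provoking = flatshade_first ? v0 : v1;

    if (v0->clipmask)
        copy_flat_color(new0, provoking);
    if (v1->clipmask)
        copy_flat_color(new1, provoking);
}
```

Before the fix, only the first clipped vertex was given a color and it was always taken from v0, so a last-vertex convention or a clipped second vertex produced the wrong color.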

Fixes failures found with the new piglit line-flat-clip-color test.

Cc: "10.0, 10.1" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit fc3fcd1e01)
2014-02-07 17:10:09 -08:00
Brian Paul
39a3b0313b gallium/auxiliary/indices: replace free() with FREE()
To match the CALLOC_STRUCT() call.

Cc: "10.0, 10.1" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 307fd76053)
2014-02-07 17:10:09 -08:00
Dave Airlie
9e59e41266 docs: update 10.1 relnotes to note GL 3.3 on r600 and radeonsi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-06 01:03:09 +00:00
Dave Airlie
1289080c4d r600g: Add GL 3.3 support for 10.1 release
All patches on master below, except max samplers
which was removed on master.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>

commit 57c6bb18822ebf88a98b98714c846608ff3ba42b
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Feb 6 00:48:57 2014 +0000

    bump max samplers

commit 2e4bd244493bebd41edf725a2c3c4e793282a5bb
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Jan 30 04:19:57 2014 +0000

    r600g: add support for geom shaders to r600/r700 chipsets (v2)

    This is my first attempt at enabling r600/r700 geometry shaders;
    the basic tests pass on both my rv770 and my rv635.

    It requires this kernel patch:
    http://www.spinics.net/lists/dri-devel/msg52745.html

    v2: address Alex comments.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 0ed4f769d77c4db2259befba5fc1707f1cb5cb98
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Jan 29 21:48:09 2014 +0000

    r600g: enable GLSL 3.30 on evergreen GPUs

    This throws the switch to enable GL 3.3 and GLSL 330.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit aeca8f21dd42b9ecd3932ef028fa8846036c1307
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Feb 4 10:48:42 2014 +1000

    r600g: properly propagate clip dist write value

    This moves the value from the GS shader to the copy shader so the registers
    are setup correctly.

    fixes tests/spec/glsl-1.50/execution/geometry/clip-distance-out-values.shader_test

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit e1bc410fe670bb17078a55876f1700a504127fef
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Feb 3 15:31:26 2014 +1000

    r600g: calculate a better value for array_size (v2)

    attempt to calculate a better value for array size to avoid breaking apps.

    v2: use 0xfff like streamout, suggested by Grigori

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 6f2f117dec51eb51c1b09e86e829e176a98e3bfc
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jan 31 03:35:51 2014 +0000

    r600g: fix CAYMAN geometry shader support

    cayman has a different end of program bit, so do that properly.

    fixes hangs with geom shader tests on cayman.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 305ea22fd517f83406aba3e3930d710fd42a3049
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Jan 29 00:17:15 2014 +0000

    r600g: fix up shader out misc stuff for copy shader

    set the correct values so the misc out register is setup correctly
    for the copy shader.

    This also updates the state for the gs copy shader so the hw
    gets programmed correctly.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 53630e14c8791a84798a03d74653bf46bd013fc7
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jan 28 23:15:29 2014 +0000

    r600g: port the layered surface rendering patch from radeonsi

    This just makes r600 and evergreen do what the radeonsi codepaths do
    for layered rendering. This makes the 2d amd_vertex_shader_layer test
    pass on evergreen.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit aa4cd3b9bed1ea23468fba4aa5c428153e8cddc1
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jan 28 13:04:00 2014 +1000

    r600g: initial VS output layer support

    This just adds support for emitting the proper value in the VS out misc.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 75a93f2e1e0f4d6015cdf63570ec4d3d12478b8d
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jan 28 12:06:49 2014 +1000

    r600g: setup const texture buffers for geom shaders

    This just enables the workarounds we have for vertex/pixel shaders
    for geom shaders as well.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 88697a860635aae54e56dce2d6a839a06dea0c5a
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jan 24 17:14:26 2014 +1000

    r600g: calculate correct cut value

    This selects the cut value depending on the shader selected.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit dfb88bef3e13112a838773e700c35052774f8a63
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jan 24 14:46:37 2014 +1000

    r600g: fix dynamic_input_array_index.shader_test

    This follows what fglrx does, it unpacks the input we are
    going to indirect into a bunch of registers and indirects
    inside them.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit a3c6373f8cf3aab750399654a4b77150ec30bce9
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jan 24 13:39:36 2014 +1000

    r600g: add support for indirect geom ring writes

    We need to be able to write to the ring using a base register
    for when we emit vertices in a loop, in theory the SB compiler
    could collapse these indirect writes to direct writes if the
    register value is constant and known, but that is outside my
    pay grade.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit dbc6a13adf935b118eaa6b396593f50d7b7e16e6
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Dec 24 05:59:19 2013 +0000

    r600g: write proper output prim type

    Vadim's code derived it from the info.mode, but it needs
    to be taken from the geometry shader output primitive.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit f7f51b0b775f652967e2b972cf7c183482a771be
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Dec 24 05:30:37 2013 +0000

    r600g: enable instance cnt register with new enough kernel

    The instance cnt register was missing for a few kernels,
    with a new enough kernel we can output it.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 9e6ce37f66372018ec5398f74c3b43ff5f5bf309
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Dec 23 01:30:03 2013 +0000

    r600g: add primitive input support for gs

    only enable prim id if gs uses it

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit fa932dfc7df3cf9ff63d08fb0e1db2119fc2ac93
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Dec 19 05:17:00 2013 +0000

    r600g: emit streamout from dma copy shader

    This enables streamout with GS in the mix, from the
    VS dma shader.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 205defb542ac185b7f46508fd51a4077a4702107
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Dec 18 15:55:07 2013 +1000

    r600g/gs: fix cases where number of gs inputs != number of gs outputs

    this fixes a bunch of the geom shader built-in tests

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit d9e7ab40bc45644194c86f842599c76d0675243c
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jan 28 10:21:03 2014 +1000

    r600g: increase array base for exported parameters

    Trivial fix to Vadim's code.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 82d67fbd3b96b6b2cc0124a19b6f31b7912ec152
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jan 24 16:41:32 2014 +1000

    r600g: initialise the geom shader loop registers.

    As we do for vertex and pixel shaders.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 78be55d98d290d708bd1b3df3ef6cd5fa89865c7
Author: Dave Airlie <airlied@redhat.com>
Date:   Sat Nov 30 06:26:13 2013 +0000

    r600g: emit NOPs at end of shaders in more cases

    If the shader has no CF clauses at all, emit a NOP.
    If the last instruction is an ENDLOOP, add a NOP for the LOOP to go to.
    If the last instruction is CALL_FS, add a NOP.

    These fix a bunch of hangs in the geometry shader tests.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 634b2498dc73efa3cca5a6fc3ed35c5bea6bb2e9
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Nov 28 23:38:35 2013 +0000

    r600g: don't enable SB for geom shaders

    SB needs fixes for three GS instructions; it seems to raise
    them outside loops etc. despite my best efforts.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 5b61dd0e917e54625ac227b8b1c2c82955f51ab1
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Dec 24 04:56:25 2013 +0000

    r600g/sb: add MEM_RING support

    Although we don't use SB on geom shaders, the VS copy shader will use it
    so we might as well implement MEM_RING support in sb.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 0247375aec4681c154ae4d14b8cd637e7a9e0e3e
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Jan 29 04:08:43 2014 +0000

    r600g: don't fail if we can't map VS->GS ring entries

    This can happen in normal operation, so don't report an error on it,
    just continue.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 2c986600fac6cb5692e9e377cb04f9f50389172c
Author: Vadim Girlin <vadimgirlin@gmail.com>
Date:   Fri Aug 2 06:38:23 2013 +0400

    r600g: initial support for geometry shaders on evergreen (v2)

    This is Vadim's initial work with a few regression fixes squashed in.

    v2: (airlied)
    fix regression in glsl-max-varyings - need to use vs and ps_dirty
    fix regression in shader exports from rebasing.
    whitespace fixing.
    v2.1: squash fix assert

    Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit ce23c43e2b611f30964afe4d1c02c4d0361ba430
Author: Vadim Girlin <vadimgirlin@gmail.com>
Date:   Fri Aug 2 06:32:32 2013 +0400

    r600g: add hw register definitions for GS block setup

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit b0ec79c28d6373930ca0dc19168dd504204456b5
Author: Vadim Girlin <vadimgirlin@gmail.com>
Date:   Wed Jul 31 23:09:39 2013 +0400

    r600g: defer shader variant selection and depending state updates

    [airlied: fix dropped streamout line - fix for master]

    Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit e41cbfb4d15d519f9301699f39d7dd0153f2edf4
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Jan 13 10:19:00 2014 +1000

    r600g/bc: add support for indexed memory writes.

    It looks like we need these for geom shaders in the future.

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

commit 46efb1648e883b2cb231cca38c1540e7e9ec1ecc
Author: Vadim Girlin <vadimgirlin@gmail.com>
Date:   Wed Jul 31 20:02:22 2013 +0400

    r600g: move barrier and end_of_program bits from output to cf struct (v2)

    v2: fix regression on r600 NOP instructions.

    Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

commit 42802d5d8d145f07cf3fca1bb6e8ab0cd1fd5c85
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Jan 29 01:33:14 2014 +0000

    r600g: split streamout emit code into a separate function

    For geometry shaders we need to call this code from a second place.

    Just move it out for now to keep future patches cleaner.

    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-06 00:49:58 +00:00
45 changed files with 1856 additions and 576 deletions

View File

@@ -1 +1 @@
-10.1.0-devel
+10.1.0-rc1

View File

@@ -52,7 +52,7 @@ it.</li>
<li>GL_AMD_shader_trinary_minmax.</li>
<li>GL_EXT_framebuffer_blit on r200 and radeon.</li>
<li>Reduced memory usage for display lists.</li>
-<li>OpenGL 3.3 support on nv50, nvc0</li>
+<li>OpenGL 3.3 support on nv50, nvc0, r600 and radeonsi</li>
</ul>

View File

@@ -588,7 +588,12 @@ do_clip_line( struct draw_stage *stage,
if (v0->clipmask) {
interp( clipper, stage->tmp[0], t0, v0, v1, viewport_index );
-copy_flat(stage, stage->tmp[0], v0);
+if (stage->draw->rasterizer->flatshade_first) {
+copy_flat(stage, stage->tmp[0], v0); /* copy v0 color to tmp[0] */
+}
+else {
+copy_flat(stage, stage->tmp[0], v1); /* copy v1 color to tmp[0] */
+}
newprim.v[0] = stage->tmp[0];
}
else {
@@ -597,6 +602,12 @@ do_clip_line( struct draw_stage *stage,
if (v1->clipmask) {
interp( clipper, stage->tmp[1], t1, v1, v0, viewport_index );
+if (stage->draw->rasterizer->flatshade_first) {
+copy_flat(stage, stage->tmp[1], v0); /* copy v0 color to tmp[1] */
+}
+else {
+copy_flat(stage, stage->tmp[1], v1); /* copy v1 color to tmp[1] */
+}
newprim.v[1] = stage->tmp[1];
}
else {

View File

@@ -74,7 +74,7 @@ void
util_primconvert_destroy(struct primconvert_context *pc)
{
util_primconvert_save_index_buffer(pc, NULL);
-free(pc);
+FREE(pc);
}
void

View File

@@ -71,7 +71,6 @@ struct nv50_ir_varying
#define NV50_SEMANTIC_CLIPDISTANCE (TGSI_SEMANTIC_COUNT + 0)
#define NV50_SEMANTIC_VIEWPORTINDEX (TGSI_SEMANTIC_COUNT + 4)
-#define NV50_SEMANTIC_LAYER (TGSI_SEMANTIC_COUNT + 5)
#define NV50_SEMANTIC_INVOCATIONID (TGSI_SEMANTIC_COUNT + 6)
#define NV50_SEMANTIC_TESSFACTOR (TGSI_SEMANTIC_COUNT + 7)
#define NV50_SEMANTIC_TESSCOORD (TGSI_SEMANTIC_COUNT + 8)

View File

@@ -1488,8 +1488,13 @@ CodeEmitterNVC0::emitOUT(const Instruction *i)
// vertex stream
if (i->src(1).getFile() == FILE_IMMEDIATE) {
-code[1] |= 0xc000;
-code[0] |= SDATA(i->src(1)).u32 << 26;
+// Using immediate encoding here triggers an invalid opcode error
+// or random results when error reporting is disabled.
+// TODO: figure this out when we get multiple vertex streams
+assert(SDATA(i->src(1)).u32 == 0);
+srcId(NULL, 26);
+// code[1] |= 0xc000;
+// code[0] |= SDATA(i->src(1)).u32 << 26;
} else {
srcId(i->src(1), 26);
}

View File

@@ -861,8 +861,8 @@ int Source::inferSysValDirection(unsigned sn) const
case TGSI_SEMANTIC_INSTANCEID:
case TGSI_SEMANTIC_VERTEXID:
return 1;
-#if 0
case TGSI_SEMANTIC_LAYER:
+#if 0
case TGSI_SEMANTIC_VIEWPORTINDEX:
return 0;
#endif

View File

@@ -532,7 +532,7 @@ recordLocation(uint16_t *locs, uint8_t *masks,
case TGSI_SEMANTIC_INSTANCEID: locs[SV_INSTANCE_ID] = addr; break;
case TGSI_SEMANTIC_VERTEXID: locs[SV_VERTEX_ID] = addr; break;
case TGSI_SEMANTIC_PRIMID: locs[SV_PRIMITIVE_ID] = addr; break;
-case NV50_SEMANTIC_LAYER: locs[SV_LAYER] = addr; break;
+case TGSI_SEMANTIC_LAYER: locs[SV_LAYER] = addr; break;
case NV50_SEMANTIC_VIEWPORTINDEX: locs[SV_VIEWPORT_INDEX] = addr; break;
default:
break;

View File

@@ -104,7 +104,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
prog->vp.bfc[info->out[i].si] = i;
break;
case TGSI_SEMANTIC_LAYER:
-prog->gp.has_layer = true;
+prog->gp.has_layer = TRUE;
prog->gp.layerid = n;
break;
default:
@@ -170,10 +170,8 @@ nv50_fragprog_assign_slots(struct nv50_ir_prog_info *info)
if (info->in[i].sn == TGSI_SEMANTIC_COLOR)
prog->vp.bfc[info->in[i].si] = j;
-else if (info->in[i].sn == TGSI_SEMANTIC_PRIMID) {
+else if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
-prog->gp.primid = j;
-}
prog->in[j].id = i;
prog->in[j].mask = info->in[i].mask;
@@ -345,7 +343,6 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
prog->vp.clpd[0] = map_undef;
prog->vp.clpd[1] = map_undef;
prog->vp.psiz = map_undef;
-prog->gp.primid = 0x80;
prog->gp.has_layer = 0;
info->driverPriv = prog;

View File

@@ -88,9 +88,8 @@ struct nv50_program {
struct {
uint32_t vert_count;
-ubyte primid; /* primitive id output register */
uint8_t prim_type; /* point, line strip or tri strip */
-bool has_layer;
+uint8_t has_layer;
ubyte layerid; /* hw value of layer output */
} gp;

View File

@@ -741,12 +741,13 @@ nv50_screen_create(struct nouveau_device *dev)
goto fail;
}
-/* This over-allocates by a whole code BO. The GP, which would execute at
- * the end of the last page, would trigger faults. The going theory is that
- * it prefetches up to a certain amount. This avoids dmesg spam.
+/* This over-allocates by a page. The GP, which would execute at the end of
+ * the last page, would trigger faults. The going theory is that it
+ * prefetches up to a certain amount.
*/
ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 16,
-4 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
+(3 << NV50_CODE_BO_SIZE_LOG2) + 0x1000,
+NULL, &screen->code);
if (ret) {
NOUVEAU_ERR("Failed to allocate code bo: %d\n", ret);
goto fail;

View File

@@ -346,7 +346,7 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
struct nv50_varying dummy;
int i, n, c, m;
uint32_t primid = 0;
-uint32_t layerid = vp->gp.layerid;
+uint32_t layerid = 0;
uint32_t psiz = 0x000;
uint32_t interp = fp->fp.interp;
uint32_t colors = fp->fp.colors;
@@ -401,17 +401,21 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
if (vp->out[n].sn == fp->in[i].sn &&
vp->out[n].si == fp->in[i].si)
break;
if (i == fp->gp.primid) {
switch (fp->in[i].sn) {
case TGSI_SEMANTIC_PRIMID:
primid = m;
break;
case TGSI_SEMANTIC_LAYER:
layerid = m;
break;
}
m = nv50_vec4_map(map, m, lin,
&fp->in[i], (n < vp->out_nr) ? &vp->out[n] : &dummy);
}
if (vp->gp.has_layer) {
// In GL4.x, layer can be an fp input, but not in 3.x. Make sure to add
// it to the output map.
map[m++] = layerid;
if (vp->gp.has_layer && !layerid) {
layerid = m;
map[m++] = vp->gp.layerid;
}
if (nv50->rast->pipe.point_size_per_vertex) {

View File

@@ -64,7 +64,7 @@ nvc0_shader_output_address(unsigned sn, unsigned si, unsigned ubase)
switch (sn) {
case NV50_SEMANTIC_TESSFACTOR: return 0x000 + si * 0x4;
case TGSI_SEMANTIC_PRIMID: return 0x060;
-case NV50_SEMANTIC_LAYER: return 0x064;
+case TGSI_SEMANTIC_LAYER: return 0x064;
case NV50_SEMANTIC_VIEWPORTINDEX: return 0x068;
case TGSI_SEMANTIC_PSIZE: return 0x06c;
case TGSI_SEMANTIC_POSITION: return 0x070;

View File

@@ -190,7 +190,7 @@ nvc0_gmtyprog_validate(struct nvc0_context *nvc0)
/* we allow GPs with no code for specifying stream output state only */
if (gp && gp->code_size) {
-const boolean gp_selects_layer = gp->hdr[13] & (1 << 9);
+const boolean gp_selects_layer = !!(gp->hdr[13] & (1 << 9));
BEGIN_NVC0(push, NVC0_3D(MACRO_GP_SELECT), 1);
PUSH_DATA (push, 0x41);
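
A plausible reason for the "!!" in the hunk above, sketched in C: if the boolean type is an 8-bit integer (assumed here; narrow_bool is a hypothetical stand-in for it), storing the result of a test on bit 9 directly truncates the value to zero. The function names are illustrative.

```c
#include <assert.h>

/* Hypothetical stand-in for a boolean typedef'd to an 8-bit integer. */
typedef unsigned char narrow_bool;

/* Storing the masked value directly keeps only the low 8 bits,
 * so a test on bit 9 always reads as false. */
static narrow_bool flag_truncated(unsigned hdr)
{
    return (narrow_bool)(hdr & (1u << 9));
}

/* `!!` first collapses the masked value to 0 or 1, which always fits. */
static narrow_bool flag_normalized(unsigned hdr)
{
    narrow_bool result = !!(hdr & (1u << 9));

    assert(result == 0 || result == 1);
    return result;
}
```

For an input with bit 9 set, flag_truncated() yields 0 and flag_normalized() yields 1.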

View File

@@ -79,45 +79,49 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
-S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
+S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
+S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
bc->bytecode[id] =
S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_X(cf->output.swizzle_x) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Y(cf->output.swizzle_y) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Z(cf->output.swizzle_z) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_W(cf->output.swizzle_w) |
-S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
+S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode);
if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
-bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program);
+bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
id++;
-} else if (cfop->flags & CF_STRM) {
-/* MEM_STREAM instructions */
+} else if (cfop->flags & CF_MEM) {
+/* MEM_STREAM, MEM_RING instructions */
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
bc->bytecode[id] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(cf->output.comp_mask) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size);
if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program);
bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
id++;
} else {
/* branch, loop, call, return instructions */
/* other instructions */
bc->bytecode[id++] = S_SQ_CF_WORD0_ADDR(cf->cf_addr >> 1);
bc->bytecode[id++] = S_SQ_CF_WORD1_CF_INST(opcode)|
S_SQ_CF_WORD1_BARRIER(1) |
S_SQ_CF_WORD1_COND(cf->cond) |
S_SQ_CF_WORD1_POP_COUNT(cf->pop_count);
S_SQ_CF_WORD1_POP_COUNT(cf->pop_count) |
S_SQ_CF_WORD1_END_OF_PROGRAM(cf->end_of_program);
}
}
return 0;
}
#if 0
void eg_bytecode_export_read(struct r600_bytecode *bc,
struct r600_bytecode_output *output, uint32_t word0, uint32_t word1)
{
@@ -138,3 +142,4 @@ void eg_bytecode_export_read(struct r600_bytecode *bc,
output->array_size = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(word1);
output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
}
#endif

@@ -1407,7 +1407,7 @@ void evergreen_init_color_surface(struct r600_context *rctx,
struct pipe_resource *pipe_tex = surf->base.texture;
unsigned level = surf->base.u.tex.level;
unsigned pitch, slice;
unsigned color_info, color_attrib, color_dim = 0;
unsigned color_info, color_attrib, color_dim = 0, color_view;
unsigned format, swap, ntype, endian;
uint64_t offset, base_offset;
unsigned non_disp_tiling, macro_aspect, tile_split, bankh, bankw, fmask_bankh, nbanks;
@@ -1416,10 +1416,15 @@ void evergreen_init_color_surface(struct r600_context *rctx,
bool blend_clamp = 0, blend_bypass = 0;
offset = rtex->surface.level[level].offset;
if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
if (rtex->surface.level[level].mode == RADEON_SURF_MODE_LINEAR) {
assert(surf->base.u.tex.first_layer == surf->base.u.tex.last_layer);
offset += rtex->surface.level[level].slice_size *
surf->base.u.tex.first_layer;
}
color_view = 0;
} else
color_view = S_028C6C_SLICE_START(surf->base.u.tex.first_layer) |
S_028C6C_SLICE_MAX(surf->base.u.tex.last_layer);
pitch = (rtex->surface.level[level].nblk_x) / 8 - 1;
slice = (rtex->surface.level[level].nblk_x * rtex->surface.level[level].nblk_y) / 64;
if (slice) {
@@ -1569,12 +1574,7 @@ void evergreen_init_color_surface(struct r600_context *rctx,
surf->cb_color_info = color_info;
surf->cb_color_pitch = S_028C64_PITCH_TILE_MAX(pitch);
surf->cb_color_slice = S_028C68_SLICE_TILE_MAX(slice);
if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
surf->cb_color_view = 0;
} else {
surf->cb_color_view = S_028C6C_SLICE_START(surf->base.u.tex.first_layer) |
S_028C6C_SLICE_MAX(surf->base.u.tex.last_layer);
}
surf->cb_color_view = color_view;
surf->cb_color_attrib = color_attrib;
if (rtex->fmask.size) {
surf->cb_color_fmask = (base_offset + rtex->fmask.offset) >> 8;
@@ -1829,7 +1829,6 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx,
rctx->db_misc_state.atom.dirty = true;
}
evergreen_update_db_shader_control(rctx);
/* Calculate the CS size. */
rctx->framebuffer.atom.num_dw = 4; /* SCISSOR */
@@ -2519,6 +2518,7 @@ static void evergreen_emit_constant_buffers(struct r600_context *rctx,
struct r600_resource *rbuffer;
uint64_t va;
unsigned buffer_index = ffs(dirty_mask) - 1;
unsigned gs_ring_buffer = (buffer_index == R600_GS_RING_CONST_BUFFER);
cb = &state->cb[buffer_index];
rbuffer = (struct r600_resource*)cb->buffer;
@@ -2527,10 +2527,12 @@ static void evergreen_emit_constant_buffers(struct r600_context *rctx,
va = r600_resource_va(&rctx->screen->b.b, &rbuffer->b.b);
va += cb->buffer_offset;
r600_write_context_reg_flag(cs, reg_alu_constbuf_size + buffer_index * 4,
ALIGN_DIVUP(cb->buffer_size >> 4, 16), pkt_flags);
r600_write_context_reg_flag(cs, reg_alu_const_cache + buffer_index * 4, va >> 8,
pkt_flags);
if (!gs_ring_buffer) {
r600_write_context_reg_flag(cs, reg_alu_constbuf_size + buffer_index * 4,
ALIGN_DIVUP(cb->buffer_size >> 4, 16), pkt_flags);
r600_write_context_reg_flag(cs, reg_alu_const_cache + buffer_index * 4, va >> 8,
pkt_flags);
}
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0) | pkt_flags);
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READ));
@@ -2540,10 +2542,12 @@ static void evergreen_emit_constant_buffers(struct r600_context *rctx,
radeon_emit(cs, va); /* RESOURCEi_WORD0 */
radeon_emit(cs, rbuffer->buf->size - cb->buffer_offset - 1); /* RESOURCEi_WORD1 */
radeon_emit(cs, /* RESOURCEi_WORD2 */
S_030008_ENDIAN_SWAP(r600_endian_swap(32)) |
S_030008_STRIDE(16) |
S_030008_BASE_ADDRESS_HI(va >> 32UL));
S_030008_ENDIAN_SWAP(gs_ring_buffer ? ENDIAN_NONE : r600_endian_swap(32)) |
S_030008_STRIDE(gs_ring_buffer ? 4 : 16) |
S_030008_BASE_ADDRESS_HI(va >> 32UL) |
S_030008_DATA_FORMAT(FMT_32_32_32_32_FLOAT));
radeon_emit(cs, /* RESOURCEi_WORD3 */
S_03000C_UNCACHED(gs_ring_buffer ? 1 : 0) |
S_03000C_DST_SEL_X(V_03000C_SQ_SEL_X) |
S_03000C_DST_SEL_Y(V_03000C_SQ_SEL_Y) |
S_03000C_DST_SEL_Z(V_03000C_SQ_SEL_Z) |
@@ -2551,7 +2555,8 @@ static void evergreen_emit_constant_buffers(struct r600_context *rctx,
radeon_emit(cs, 0); /* RESOURCEi_WORD4 */
radeon_emit(cs, 0); /* RESOURCEi_WORD5 */
radeon_emit(cs, 0); /* RESOURCEi_WORD6 */
radeon_emit(cs, 0xc0000000); /* RESOURCEi_WORD7 */
radeon_emit(cs, /* RESOURCEi_WORD7 */
S_03001C_TYPE(V_03001C_SQ_TEX_VTX_VALID_BUFFER));
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0) | pkt_flags);
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READ));
@@ -2715,6 +2720,77 @@ static void evergreen_emit_vertex_fetch_shader(struct r600_context *rctx, struct
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, shader->buffer, RADEON_USAGE_READ));
}
static void evergreen_emit_shader_stages(struct r600_context *rctx, struct r600_atom *a)
{
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_shader_stages_state *state = (struct r600_shader_stages_state*)a;
uint32_t v = 0, v2 = 0, primid = 0;
if (state->geom_enable) {
uint32_t cut_val;
if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 128)
cut_val = V_028A40_GS_CUT_128;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 256)
cut_val = V_028A40_GS_CUT_256;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 512)
cut_val = V_028A40_GS_CUT_512;
else
cut_val = V_028A40_GS_CUT_1024;
v = S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
S_028B54_GS_EN(1) |
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
v2 = S_028A40_MODE(V_028A40_GS_SCENARIO_G) |
S_028A40_CUT_MODE(cut_val);
if (rctx->gs_shader->current->shader.gs_prim_id_input)
primid = 1;
}
r600_write_context_reg(cs, R_028B54_VGT_SHADER_STAGES_EN, v);
r600_write_context_reg(cs, R_028A40_VGT_GS_MODE, v2);
r600_write_context_reg(cs, R_028A84_VGT_PRIMITIVEID_EN, primid);
}
static void evergreen_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
{
struct pipe_screen *screen = rctx->b.b.screen;
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a;
struct r600_resource *rbuffer;
r600_write_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1));
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_VGT_FLUSH));
if (state->enable) {
rbuffer =(struct r600_resource*)state->esgs_ring.buffer;
r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE,
(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE));
r600_write_config_reg(cs, R_008C44_SQ_ESGS_RING_SIZE,
state->esgs_ring.buffer_size >> 8);
rbuffer =(struct r600_resource*)state->gsvs_ring.buffer;
r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE,
(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE));
r600_write_config_reg(cs, R_008C4C_SQ_GSVS_RING_SIZE,
state->gsvs_ring.buffer_size >> 8);
} else {
r600_write_config_reg(cs, R_008C44_SQ_ESGS_RING_SIZE, 0);
r600_write_config_reg(cs, R_008C4C_SQ_GSVS_RING_SIZE, 0);
}
r600_write_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1));
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_VGT_FLUSH));
}
void cayman_init_common_regs(struct r600_command_buffer *cb,
enum chip_class ctx_chip_class,
enum radeon_family ctx_family,
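
Note on `evergreen_emit_shader_stages` in the hunk above: the following is a sketch of the register values it computes when a geometry shader is bound, using the field encodings that this diff adds to the register header. The helper names (`pick_gs_cut_mode`, `shader_stages_en_with_gs`, `vgt_gs_mode_with_gs`) are hypothetical; the driver computes these inline.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings copied from the register definitions added in this diff. */
#define S_028B54_ES_EN(x)              (((x) & 0x3) << 3)
#define S_028B54_GS_EN(x)              (((x) & 0x1) << 5)
#define S_028B54_VS_EN(x)              (((x) & 0x3) << 6)
#define V_028B54_ES_STAGE_REAL         0x02
#define V_028B54_VS_STAGE_COPY_SHADER  0x02
#define S_028A40_MODE(x)               (((x) & 0x3) << 0)
#define S_028A40_CUT_MODE(x)           (((x) & 0x3) << 3)
#define V_028A40_GS_SCENARIO_G         3
#define V_028A40_GS_CUT_1024           0
#define V_028A40_GS_CUT_512            1
#define V_028A40_GS_CUT_256            2
#define V_028A40_GS_CUT_128            3

/* Hypothetical helper: smallest cut mode that covers max_out_vertices. */
static inline uint32_t pick_gs_cut_mode(unsigned max_out_vertices)
{
    if (max_out_vertices <= 128)
        return V_028A40_GS_CUT_128;
    if (max_out_vertices <= 256)
        return V_028A40_GS_CUT_256;
    if (max_out_vertices <= 512)
        return V_028A40_GS_CUT_512;
    return V_028A40_GS_CUT_1024;
}

/* VGT_SHADER_STAGES_EN with a GS bound: real ES, GS on, VS runs the copy shader. */
static inline uint32_t shader_stages_en_with_gs(void)
{
    return S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
           S_028B54_GS_EN(1) |
           S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
}

/* VGT_GS_MODE with a GS bound: scenario G plus the selected cut mode. */
static inline uint32_t vgt_gs_mode_with_gs(unsigned max_out_vertices)
{
    return S_028A40_MODE(V_028A40_GS_SCENARIO_G) |
           S_028A40_CUT_MODE(pick_gs_cut_mode(max_out_vertices));
}

static inline void shader_stages_selfcheck(void)
{
    assert(shader_stages_en_with_gs() == 0xB0);
    assert(vgt_gs_mode_with_gs(100) == 0x1B);
    assert(vgt_gs_mode_with_gs(1024) == 0x03);
}
```

Without a geometry shader the function in the diff leaves all three register values at zero, so both registers fall back to their "off" encodings.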
@@ -2905,6 +2981,7 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx)
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0, 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (64 * 4), 0x01000FFF);
}
void evergreen_init_common_regs(struct r600_command_buffer *cb,
@@ -3363,6 +3440,7 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0, 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (64 * 4), 0x01000FFF);
}
void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
@@ -3510,6 +3588,102 @@ void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader
shader->flatshade = rctx->rasterizer->flatshade;
}
void evergreen_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{
struct r600_command_buffer *cb = &shader->command_buffer;
struct r600_shader *rshader = &shader->shader;
r600_init_command_buffer(cb, 32);
r600_store_context_reg(cb, R_028890_SQ_PGM_RESOURCES_ES,
S_028890_NUM_GPRS(rshader->bc.ngpr) |
S_028890_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_02888C_SQ_PGM_START_ES,
r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
}
static unsigned r600_conv_prim_to_gs_out(unsigned mode)
{
static const int prim_conv[] = {
V_028A6C_OUTPRIM_TYPE_POINTLIST,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));
return prim_conv[mode];
}
void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_command_buffer *cb = &shader->command_buffer;
struct r600_shader *rshader = &shader->shader;
struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
unsigned gsvs_itemsize =
(cp_shader->ring_item_size * rshader->gs_max_out_vertices) >> 2;
r600_init_command_buffer(cb, 64);
/* VGT_GS_MODE is written by evergreen_emit_shader_stages */
r600_store_context_reg(cb, R_028AB8_VGT_VTX_CNT_EN, 1);
r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
r600_conv_prim_to_gs_out(rshader->gs_output_prim));
if (rctx->screen->b.info.drm_minor >= 35) {
r600_store_context_reg(cb, R_028B90_VGT_GS_INSTANCE_CNT,
S_028B90_CNT(0) |
S_028B90_ENABLE(0));
}
r600_store_context_reg_seq(cb, R_02891C_SQ_GS_VERT_ITEMSIZE, 4);
r600_store_value(cb, cp_shader->ring_item_size >> 2);
r600_store_value(cb, 0);
r600_store_value(cb, 0);
r600_store_value(cb, 0);
r600_store_context_reg(cb, R_028900_SQ_ESGS_RING_ITEMSIZE,
(rshader->ring_item_size) >> 2);
r600_store_context_reg(cb, R_028904_SQ_GSVS_RING_ITEMSIZE,
gsvs_itemsize);
r600_store_context_reg_seq(cb, R_02892C_SQ_GSVS_RING_OFFSET_1, 3);
r600_store_value(cb, gsvs_itemsize);
r600_store_value(cb, gsvs_itemsize);
r600_store_value(cb, gsvs_itemsize);
/* FIXME calculate these values somehow ??? */
r600_store_context_reg_seq(cb, R_028A54_GS_PER_ES, 3);
r600_store_value(cb, 0x80); /* GS_PER_ES */
r600_store_value(cb, 0x100); /* ES_PER_GS */
r600_store_value(cb, 0x2); /* GS_PER_VS */
r600_store_context_reg(cb, R_028878_SQ_PGM_RESOURCES_GS,
S_028878_NUM_GPRS(rshader->bc.ngpr) |
S_028878_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_028874_SQ_PGM_START_GS,
r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
}
void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{
struct r600_command_buffer *cb = &shader->command_buffer;
@@ -3552,7 +3726,8 @@ void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader
S_02881C_VS_OUT_CCDIST0_VEC_ENA((rshader->clip_dist_write & 0x0F) != 0) |
S_02881C_VS_OUT_CCDIST1_VEC_ENA((rshader->clip_dist_write & 0xF0) != 0) |
S_02881C_VS_OUT_MISC_VEC_ENA(rshader->vs_out_misc_write) |
S_02881C_USE_VTX_POINT_SIZE(rshader->vs_out_point_size);
S_02881C_USE_VTX_POINT_SIZE(rshader->vs_out_point_size) |
S_02881C_USE_VTX_RENDER_TARGET_INDX(rshader->vs_out_layer);
}
void *evergreen_create_resolve_blend(struct r600_context *rctx)
@@ -3919,6 +4094,10 @@ void evergreen_init_state_functions(struct r600_context *rctx)
rctx->atoms[id++] = &rctx->b.streamout.begin_atom;
r600_init_atom(rctx, &rctx->vertex_shader.atom, id++, r600_emit_shader, 23);
r600_init_atom(rctx, &rctx->pixel_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->geometry_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->export_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->shader_stages.atom, id++, evergreen_emit_shader_stages, 6);
r600_init_atom(rctx, &rctx->gs_rings.atom, id++, evergreen_emit_gs_rings, 26);
rctx->b.b.create_blend_state = evergreen_create_blend_state;
rctx->b.b.create_depth_stencil_alpha_state = evergreen_create_dsa_state;

@@ -48,6 +48,7 @@
#define EVENT_TYPE_ZPASS_DONE 0x15
#define EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT 0x16
#define EVENT_TYPE_SO_VGTSTREAMOUT_FLUSH 0x1f
#define EVENT_TYPE_VGT_FLUSH 0x24
#define EVENT_TYPE_FLUSH_AND_INV_DB_META 0x2c
#define EVENT_TYPE(x) ((x) << 0)
@@ -274,6 +275,11 @@
#define G_008E2C_NUM_LS_LDS(x) (((x) >> 16) & 0xFFFF)
#define C_008E2C_NUM_LS_LDS(x) 0xFFFF0000
#define R_008C40_SQ_ESGS_RING_BASE 0x00008C40
#define R_008C44_SQ_ESGS_RING_SIZE 0x00008C44
#define R_008C48_SQ_GSVS_RING_BASE 0x00008C48
#define R_008C4C_SQ_GSVS_RING_SIZE 0x00008C4C
#define R_008CF0_SQ_MS_FIFO_SIZES 0x00008CF0
#define S_008CF0_CACHE_FIFO_SIZE(x) (((x) & 0xFF) << 0)
#define G_008CF0_CACHE_FIFO_SIZE(x) (((x) >> 0) & 0xFF)
@@ -821,12 +827,22 @@
#define S_028A40_MODE(x) (((x) & 0x3) << 0)
#define G_028A40_MODE(x) (((x) >> 0) & 0x3)
#define C_028A40_MODE 0xFFFFFFFC
#define V_028A40_GS_OFF 0
#define V_028A40_GS_SCENARIO_A 1
#define V_028A40_GS_SCENARIO_B 2
#define V_028A40_GS_SCENARIO_G 3
#define V_028A40_GS_SCENARIO_C 4
#define V_028A40_SPRITE_EN 5
#define S_028A40_ES_PASSTHRU(x) (((x) & 0x1) << 2)
#define G_028A40_ES_PASSTHRU(x) (((x) >> 2) & 0x1)
#define C_028A40_ES_PASSTHRU 0xFFFFFFFB
#define S_028A40_CUT_MODE(x) (((x) & 0x3) << 3)
#define G_028A40_CUT_MODE(x) (((x) >> 3) & 0x3)
#define C_028A40_CUT_MODE 0xFFFFFFE7
#define V_028A40_GS_CUT_1024 0
#define V_028A40_GS_CUT_512 1
#define V_028A40_GS_CUT_256 2
#define V_028A40_GS_CUT_128 3
#define S_028A40_COMPUTE_MODE(x) (x << 14)
#define S_028A40_PARTIAL_THD_AT_EOI(x) (x << 17)
#define R_028A6C_VGT_GS_OUT_PRIM_TYPE 0x028A6C
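
Note on the `S_`/`G_`/`C_` macro triples in the hunk above: as a sketch of how they are meant to combine, `S_` shifts a value into its field, `G_` extracts it, and `C_` is the mask that clears the field. The helper `set_cut_mode` is hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hunk above: CUT_MODE is a 2-bit field at bits [4:3]. */
#define S_028A40_CUT_MODE(x)  (((x) & 0x3) << 3)
#define G_028A40_CUT_MODE(x)  (((x) >> 3) & 0x3)
#define C_028A40_CUT_MODE     0xFFFFFFE7
#define V_028A40_GS_CUT_128   3

/* Hypothetical helper: replace the CUT_MODE field and keep all other bits. */
static inline uint32_t set_cut_mode(uint32_t reg, uint32_t cut)
{
    assert(cut <= 0x3); /* the field is only two bits wide */
    return (reg & C_028A40_CUT_MODE) | S_028A40_CUT_MODE(cut);
}

static inline void cut_mode_selfcheck(void)
{
    uint32_t reg = set_cut_mode(0x3, V_028A40_GS_CUT_128);

    assert(G_028A40_CUT_MODE(reg) == V_028A40_GS_CUT_128);
    assert((reg & 0x3) == 0x3); /* MODE bits [1:0] are untouched */
}
```

The clear mask is simply the complement of the field mask: `~(0x3 << 3)` is `0xFFFFFFE7`, which matches `C_028A40_CUT_MODE`.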
@@ -1201,6 +1217,7 @@
#define C_030008_ENDIAN_SWAP 0x3FFFFFFF
#define R_03000C_SQ_VTX_CONSTANT_WORD3_0 0x03000C
#define S_03000C_UNCACHED(x) (((x) & 0x1) << 2)
#define S_03000C_DST_SEL_X(x) (((x) & 0x7) << 3)
#define G_03000C_DST_SEL_X(x) (((x) >> 3) & 0x7)
#define V_03000C_SQ_SEL_X 0x00000000
@@ -1457,6 +1474,34 @@
#define G_028860_UNCACHED_FIRST_INST(x) (((x) >> 28) & 0x1)
#define C_028860_UNCACHED_FIRST_INST 0xEFFFFFFF
#define R_028878_SQ_PGM_RESOURCES_GS 0x028878
#define S_028878_NUM_GPRS(x) (((x) & 0xFF) << 0)
#define G_028878_NUM_GPRS(x) (((x) >> 0) & 0xFF)
#define C_028878_NUM_GPRS 0xFFFFFF00
#define S_028878_STACK_SIZE(x) (((x) & 0xFF) << 8)
#define G_028878_STACK_SIZE(x) (((x) >> 8) & 0xFF)
#define C_028878_STACK_SIZE 0xFFFF00FF
#define S_028878_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_028878_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_028878_DX10_CLAMP 0xFFDFFFFF
#define S_028878_UNCACHED_FIRST_INST(x) (((x) & 0x1) << 28)
#define G_028878_UNCACHED_FIRST_INST(x) (((x) >> 28) & 0x1)
#define C_028878_UNCACHED_FIRST_INST 0xEFFFFFFF
#define R_028890_SQ_PGM_RESOURCES_ES 0x028890
#define S_028890_NUM_GPRS(x) (((x) & 0xFF) << 0)
#define G_028890_NUM_GPRS(x) (((x) >> 0) & 0xFF)
#define C_028890_NUM_GPRS 0xFFFFFF00
#define S_028890_STACK_SIZE(x) (((x) & 0xFF) << 8)
#define G_028890_STACK_SIZE(x) (((x) >> 8) & 0xFF)
#define C_028890_STACK_SIZE 0xFFFF00FF
#define S_028890_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_028890_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_028890_DX10_CLAMP 0xFFDFFFFF
#define S_028890_UNCACHED_FIRST_INST(x) (((x) & 0x1) << 28)
#define G_028890_UNCACHED_FIRST_INST(x) (((x) >> 28) & 0x1)
#define C_028890_UNCACHED_FIRST_INST 0xEFFFFFFF
#define R_028864_SQ_PGM_RESOURCES_2_VS 0x028864
#define S_028864_SINGLE_ROUND(x) (((x) & 0x3) << 0)
#define G_028864_SINGLE_ROUND(x) (((x) >> 0) & 0x3)
@@ -1880,6 +1925,8 @@
#define G_02884C_EXPORT_Z(x) (((x) >> 0) & 0x1)
#define C_02884C_EXPORT_Z 0xFFFFFFFE
#define R_02885C_SQ_PGM_START_VS 0x0002885C
#define R_028874_SQ_PGM_START_GS 0x00028874
#define R_02888C_SQ_PGM_START_ES 0x0002888C
#define R_0288A4_SQ_PGM_START_FS 0x000288A4
#define R_0288D0_SQ_PGM_START_LS 0x000288d0
#define R_0288A8_SQ_PGM_RESOURCES_FS 0x000288A8
@@ -1894,6 +1941,9 @@
#define R_028920_SQ_GS_VERT_ITEMSIZE_1 0x00028920
#define R_028924_SQ_GS_VERT_ITEMSIZE_2 0x00028924
#define R_028928_SQ_GS_VERT_ITEMSIZE_3 0x00028928
#define R_02892C_SQ_GSVS_RING_OFFSET_1 0x0002892C
#define R_028930_SQ_GSVS_RING_OFFSET_2 0x00028930
#define R_028934_SQ_GSVS_RING_OFFSET_3 0x00028934
#define R_028940_ALU_CONST_CACHE_PS_0 0x00028940
#define R_028944_ALU_CONST_CACHE_PS_1 0x00028944
#define R_028980_ALU_CONST_CACHE_VS_0 0x00028980
@@ -1928,6 +1978,15 @@
#define S_028A48_VPORT_SCISSOR_ENABLE(x) (((x) & 0x1) << 1)
#define S_028A48_LINE_STIPPLE_ENABLE(x) (((x) & 0x1) << 2)
#define R_028A4C_PA_SC_MODE_CNTL_1 0x00028A4C
#define R_028A54_GS_PER_ES 0x00028A54
#define R_028A58_ES_PER_GS 0x00028A58
#define R_028A5C_GS_PER_VS 0x00028A5C
#define R_028A84_VGT_PRIMITIVEID_EN 0x028A84
#define S_028A84_PRIMITIVEID_EN(x) (((x) & 0x1) << 0)
#define G_028A84_PRIMITIVEID_EN(x) (((x) >> 0) & 0x1)
#define C_028A84_PRIMITIVEID_EN 0xFFFFFFFE
#define R_028A94_VGT_MULTI_PRIM_IB_RESET_EN 0x00028A94
#define S_028A94_RESET_EN(x) (((x) & 0x1) << 0)
#define G_028A94_RESET_EN(x) (((x) >> 0) & 0x1)
@@ -1962,11 +2021,27 @@
#define R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET 0x028B28
#define R_028B2C_VGT_STRMOUT_DRAW_OPAQUE_BUFFER_FILLED_SIZE 0x028B2C
#define R_028B30_VGT_STRMOUT_DRAW_OPAQUE_VERTEX_STRIDE 0x028B30
#define R_028B38_VGT_GS_MAX_VERT_OUT 0x028B38
#define S_028B38_MAX_VERT_OUT(x) (((x) & 0x7FF) << 0)
#define R_028B44_VGT_STRMOUT_BASE_OFFSET_HI_0 0x028B44
#define R_028B48_VGT_STRMOUT_BASE_OFFSET_HI_1 0x028B48
#define R_028B4C_VGT_STRMOUT_BASE_OFFSET_HI_2 0x028B4C
#define R_028B50_VGT_STRMOUT_BASE_OFFSET_HI_3 0x028B50
#define R_028B54_VGT_SHADER_STAGES_EN 0x00028B54
#define S_028B54_LS_EN(x) (((x) & 0x3) << 0)
#define V_028B54_LS_STAGE_OFF 0x00
#define V_028B54_LS_STAGE_ON 0x01
#define V_028B54_CS_STAGE_ON 0x02
#define S_028B54_HS_EN(x) (((x) & 0x1) << 2)
#define S_028B54_ES_EN(x) (((x) & 0x3) << 3)
#define V_028B54_ES_STAGE_OFF 0x00
#define V_028B54_ES_STAGE_DS 0x01
#define V_028B54_ES_STAGE_REAL 0x02
#define S_028B54_GS_EN(x) (((x) & 0x1) << 5)
#define S_028B54_VS_EN(x) (((x) & 0x3) << 6)
#define V_028B54_VS_STAGE_REAL 0x00
#define V_028B54_VS_STAGE_DS 0x01
#define V_028B54_VS_STAGE_COPY_SHADER 0x02
#define R_028B70_DB_ALPHA_TO_MASK 0x00028B70
#define S_028B70_ALPHA_TO_MASK_ENABLE(x) (((x) & 0x1) << 0)
#define S_028B70_ALPHA_TO_MASK_OFFSET0(x) (((x) & 0x3) << 8)
@@ -1998,12 +2073,9 @@
#define S_028B8C_OFFSET(x) (((x) & 0xFFFFFFFF) << 0)
#define G_028B8C_OFFSET(x) (((x) >> 0) & 0xFFFFFFFF)
#define C_028B8C_OFFSET 0x00000000
#define R_028B94_VGT_STRMOUT_CONFIG 0x028B94
#define S_028B94_STREAMOUT_0_EN(x) (((x) & 0x1) << 0)
#define S_028B94_STREAMOUT_1_EN(x) (((x) & 0x1) << 1)
#define S_028B94_STREAMOUT_2_EN(x) (((x) & 0x1) << 2)
#define S_028B94_STREAMOUT_3_EN(x) (((x) & 0x1) << 3)
#define S_028B94_RAST_STREAM(x) (((x) & 0x07) << 4)
#define R_028B90_VGT_GS_INSTANCE_CNT 0x00028B90
#define S_028B90_ENABLE(x) (((x) & 0x1) << 0)
#define S_028B90_CNT(x) (((x) & 0x7F) << 2)
#define R_028B98_VGT_STRMOUT_BUFFER_CONFIG 0x028B98
#define S_028B98_STREAM_0_BUFFER_EN(x) (((x) & 0x0F) << 0)
#define S_028B98_STREAM_1_BUFFER_EN(x) (((x) & 0x0F) << 4)

@@ -193,7 +193,6 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
if ((output->gpr + output->burst_count) == bc->cf_last->output.gpr &&
(output->array_base + output->burst_count) == bc->cf_last->output.array_base) {
bc->cf_last->output.end_of_program |= output->end_of_program;
bc->cf_last->op = bc->cf_last->output.op = output->op;
bc->cf_last->output.gpr = output->gpr;
bc->cf_last->output.array_base = output->array_base;
@@ -203,7 +202,6 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
} else if (output->gpr == (bc->cf_last->output.gpr + bc->cf_last->output.burst_count) &&
output->array_base == (bc->cf_last->output.array_base + bc->cf_last->output.burst_count)) {
bc->cf_last->output.end_of_program |= output->end_of_program;
bc->cf_last->op = bc->cf_last->output.op = output->op;
bc->cf_last->output.burst_count += output->burst_count;
return 0;
@@ -215,6 +213,7 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
return r;
bc->cf_last->op = output->op;
memcpy(&bc->cf_last->output, output, sizeof(struct r600_bytecode_output));
bc->cf_last->barrier = 1;
return 0;
}
@@ -1526,24 +1525,26 @@ static int r600_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_X(cf->output.swizzle_x) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Y(cf->output.swizzle_y) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Z(cf->output.swizzle_z) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_W(cf->output.swizzle_w) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program);
} else if (cfop->flags & CF_STRM) {
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
} else if (cfop->flags & CF_MEM) {
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program) |
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(cf->output.comp_mask);
} else {
@@ -1551,7 +1552,8 @@ static int r600_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode
bc->bytecode[id++] = S_SQ_CF_WORD1_CF_INST(opcode) |
S_SQ_CF_WORD1_BARRIER(1) |
S_SQ_CF_WORD1_COND(cf->cond) |
S_SQ_CF_WORD1_POP_COUNT(cf->pop_count);
S_SQ_CF_WORD1_POP_COUNT(cf->pop_count) |
S_SQ_CF_WORD1_END_OF_PROGRAM(cf->end_of_program);
}
return 0;
}
@@ -1932,12 +1934,12 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
print_indent(o, 67);
fprintf(stderr, " ES:%X ", cf->output.elem_size);
if (!cf->output.barrier)
if (!cf->barrier)
fprintf(stderr, "NO_BARRIER ");
if (cf->output.end_of_program)
if (cf->end_of_program)
fprintf(stderr, "EOP ");
fprintf(stderr, "\n");
} else if (r600_isa_cf(cf->op)->flags & CF_STRM) {
} else if (r600_isa_cf(cf->op)->flags & CF_MEM) {
int o = 0;
const char *exp_type[] = {"WRITE", "WRITE_IND", "WRITE_ACK",
"WRITE_IND_ACK"};
@@ -1963,14 +1965,17 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
o += print_swizzle(7);
}
if (cf->output.type == V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND)
o += fprintf(stderr, " R%d", cf->output.index_gpr);
o += print_indent(o, 67);
fprintf(stderr, " ES:%i ", cf->output.elem_size);
if (cf->output.array_size != 0xFFF)
fprintf(stderr, "AS:%i ", cf->output.array_size);
if (!cf->output.barrier)
if (!cf->barrier)
fprintf(stderr, "NO_BARRIER ");
if (cf->output.end_of_program)
if (cf->end_of_program)
fprintf(stderr, "EOP ");
fprintf(stderr, "\n");
} else {
@@ -2486,6 +2491,7 @@ void r600_bytecode_alu_read(struct r600_bytecode *bc,
}
}
#if 0
void r600_bytecode_export_read(struct r600_bytecode *bc,
struct r600_bytecode_output *output, uint32_t word0, uint32_t word1)
{
@@ -2506,3 +2512,4 @@ void r600_bytecode_export_read(struct r600_bytecode *bc,
output->array_size = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(word1);
output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
}
#endif

@@ -115,7 +115,6 @@ struct r600_bytecode_output {
unsigned array_size;
unsigned comp_mask;
unsigned type;
unsigned end_of_program;
unsigned op;
@@ -126,7 +125,7 @@ struct r600_bytecode_output {
unsigned swizzle_z;
unsigned swizzle_w;
unsigned burst_count;
unsigned barrier;
unsigned index_gpr;
};
struct r600_bytecode_kcache {
@@ -148,6 +147,8 @@ struct r600_bytecode_cf {
struct r600_bytecode_kcache kcache[4];
unsigned r6xx_uses_waterfall;
unsigned eg_alu_extended;
unsigned barrier;
unsigned end_of_program;
struct list_head alu;
struct list_head tex;
struct list_head vtx;

@@ -59,6 +59,7 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op
util_blitter_save_vertex_buffer_slot(rctx->blitter, rctx->vertex_buffer_state.vb);
util_blitter_save_vertex_elements(rctx->blitter, rctx->vertex_fetch_shader.cso);
util_blitter_save_vertex_shader(rctx->blitter, rctx->vs_shader);
util_blitter_save_geometry_shader(rctx->blitter, rctx->gs_shader);
util_blitter_save_so_targets(rctx->blitter, rctx->b.streamout.num_targets,
(struct pipe_stream_output_target**)rctx->b.streamout.targets);
util_blitter_save_rasterizer(rctx->blitter, rctx->rasterizer_state.cso);

@@ -301,6 +301,12 @@ void r600_begin_new_cs(struct r600_context *ctx)
ctx->config_state.atom.dirty = true;
ctx->stencil_ref.atom.dirty = true;
ctx->vertex_fetch_shader.atom.dirty = true;
ctx->export_shader.atom.dirty = true;
if (ctx->gs_shader) {
ctx->geometry_shader.atom.dirty = true;
ctx->shader_stages.atom.dirty = true;
ctx->gs_rings.atom.dirty = true;
}
ctx->vertex_shader.atom.dirty = true;
ctx->viewport.atom.dirty = true;

@@ -372,6 +372,11 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return 1;
case PIPE_CAP_GLSL_FEATURE_LEVEL:
if (family >= CHIP_CEDAR)
return 330;
/* pre-evergreen geom shaders need newer kernel */
if (rscreen->b.info.drm_minor >= 37)
return 330;
return 140;
/* Supported except the original R600. */
@@ -383,6 +388,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
/* Supported on Evergreen. */
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
case PIPE_CAP_CUBE_MAP_ARRAY:
case PIPE_CAP_TGSI_VS_LAYER:
return family >= CHIP_CEDAR ? 1 : 0;
/* Unsupported features. */
@@ -392,7 +398,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_TGSI_VS_LAYER:
return 0;
/* Stream output. */
@@ -416,7 +421,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return rscreen->b.info.drm_minor >= 9 ?
(family >= CHIP_CEDAR ? 16384 : 8192) : 0;
case PIPE_CAP_MAX_COMBINED_SAMPLERS:
return 32;
return 48;
/* Render targets. */
case PIPE_CAP_MAX_RENDER_TARGETS:
@@ -449,14 +454,20 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, enum pipe_shader_cap param)
{
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
switch(shader)
{
case PIPE_SHADER_FRAGMENT:
case PIPE_SHADER_VERTEX:
case PIPE_SHADER_COMPUTE:
case PIPE_SHADER_COMPUTE:
break;
case PIPE_SHADER_GEOMETRY:
/* XXX: support and enable geometry programs */
if (rscreen->b.family >= CHIP_CEDAR)
break;
/* pre-evergreen geom shaders need newer kernel */
if (rscreen->b.info.drm_minor >= 37)
break;
return 0;
default:
/* XXX: support tessellation on Evergreen */
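
Note on the capability hunks above: both `PIPE_CAP_GLSL_FEATURE_LEVEL` and the `PIPE_SHADER_GEOMETRY` case appear to use the same gate, namely Evergreen or later, or an older chip with DRM minor version 37 or newer. A sketch of that decision follows; the `evergreen_or_later` flag stands in for `family >= CHIP_CEDAR`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the gate used in both hunks.
 * evergreen_or_later stands in for (family >= CHIP_CEDAR). */
static inline bool geometry_shaders_supported(bool evergreen_or_later, int drm_minor)
{
    if (evergreen_or_later)
        return true;
    /* pre-evergreen geometry shaders need a newer kernel */
    return drm_minor >= 37;
}

/* GLSL 3.30 is reported only where geometry shaders are available, else 1.40. */
static inline int glsl_feature_level(bool evergreen_or_later, int drm_minor)
{
    return geometry_shaders_supported(evergreen_or_later, drm_minor) ? 330 : 140;
}

static inline void caps_selfcheck(void)
{
    assert(glsl_feature_level(true, 0) == 330);
    assert(glsl_feature_level(false, 37) == 330);
    assert(glsl_feature_level(false, 36) == 140);
}
```

Keeping the two answers derived from one predicate avoids reporting GLSL 3.30 on a configuration where geometry programs are rejected.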

@@ -38,7 +38,7 @@
#include "util/u_double_list.h"
#include "util/u_transfer.h"
#define R600_NUM_ATOMS 41
#define R600_NUM_ATOMS 42
/* the number of CS dwords for flushing and drawing */
#define R600_MAX_FLUSH_CS_DWORDS 16
@@ -46,13 +46,14 @@
#define R600_TRACE_CS_DWORDS 7
#define R600_MAX_USER_CONST_BUFFERS 13
#define R600_MAX_DRIVER_CONST_BUFFERS 3
#define R600_MAX_DRIVER_CONST_BUFFERS 4
#define R600_MAX_CONST_BUFFERS (R600_MAX_USER_CONST_BUFFERS + R600_MAX_DRIVER_CONST_BUFFERS)
/* start driver buffers after user buffers */
#define R600_UCP_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS)
#define R600_TXQ_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS + 1)
#define R600_BUFFER_INFO_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS + 2)
#define R600_GS_RING_CONST_BUFFER (R600_MAX_USER_CONST_BUFFERS + 3)
#define R600_MAX_CONST_BUFFER_SIZE 4096
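
Note on the constant buffer slots in the hunk above: a sketch of the resulting layout, with the new GS ring buffer taking the fourth driver slot after the 13 user slots. The helper `vtx_resource_stride` is hypothetical; it mirrors the `gs_ring_buffer ? 4 : 16` stride choice made in the `evergreen_emit_constant_buffers` hunk earlier in this diff.

```c
#include <assert.h>

/* Slot layout copied from the hunk above (new values). */
#define R600_MAX_USER_CONST_BUFFERS    13
#define R600_MAX_DRIVER_CONST_BUFFERS  4
#define R600_MAX_CONST_BUFFERS \
    (R600_MAX_USER_CONST_BUFFERS + R600_MAX_DRIVER_CONST_BUFFERS)

/* Driver buffers start after the user buffers. */
#define R600_UCP_CONST_BUFFER          (R600_MAX_USER_CONST_BUFFERS)
#define R600_TXQ_CONST_BUFFER          (R600_MAX_USER_CONST_BUFFERS + 1)
#define R600_BUFFER_INFO_CONST_BUFFER  (R600_MAX_USER_CONST_BUFFERS + 2)
#define R600_GS_RING_CONST_BUFFER      (R600_MAX_USER_CONST_BUFFERS + 3)

/* The new slot must still fit inside the enlarged array. */
_Static_assert(R600_GS_RING_CONST_BUFFER < R600_MAX_CONST_BUFFERS,
               "GS ring slot must be a valid constant buffer index");

/* Hypothetical helper: vertex resource stride in bytes for a given slot. */
static inline unsigned vtx_resource_stride(unsigned buffer_index)
{
    return buffer_index == R600_GS_RING_CONST_BUFFER ? 4 : 16;
}

static inline void const_slots_selfcheck(void)
{
    assert(R600_MAX_CONST_BUFFERS == 17);
    assert(R600_GS_RING_CONST_BUFFER == 16);
    assert(vtx_resource_stride(R600_GS_RING_CONST_BUFFER) == 4);
    assert(vtx_resource_stride(R600_UCP_CONST_BUFFER) == 16);
}
```

Bumping `R600_MAX_DRIVER_CONST_BUFFERS` from 3 to 4 is what keeps index 16 in range; with the old value the static assertion above would fail.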
@@ -159,6 +160,7 @@ struct r600_sample_mask {
struct r600_config_state {
struct r600_atom atom;
unsigned sq_gpr_resource_mgmt_1;
unsigned sq_gpr_resource_mgmt_2;
};
struct r600_stencil_ref
@@ -179,6 +181,18 @@ struct r600_viewport_state {
struct pipe_viewport_state state;
};
struct r600_shader_stages_state {
struct r600_atom atom;
unsigned geom_enable;
};
struct r600_gs_rings_state {
struct r600_atom atom;
unsigned enable;
struct pipe_constant_buffer esgs_ring;
struct pipe_constant_buffer gsvs_ring;
};
/* This must start from 16. */
/* features */
#define DBG_NO_LLVM (1 << 17)
@@ -353,7 +367,7 @@ struct r600_fetch_shader {
struct r600_shader_state {
struct r600_atom atom;
struct r600_pipe_shader_selector *shader;
struct r600_pipe_shader *shader;
};
struct r600_context {
@@ -415,7 +429,11 @@ struct r600_context {
struct r600_cso_state vertex_fetch_shader;
struct r600_shader_state vertex_shader;
struct r600_shader_state pixel_shader;
struct r600_shader_state geometry_shader;
struct r600_shader_state export_shader;
struct r600_cs_shader_state cs_shader_state;
struct r600_shader_stages_state shader_stages;
struct r600_gs_rings_state gs_rings;
struct r600_constbuf_state constbuf_state[PIPE_SHADER_TYPES];
struct r600_textures_info samplers[PIPE_SHADER_TYPES];
/** Vertex buffers for fetch shaders */
@@ -427,6 +445,7 @@ struct r600_context {
unsigned compute_cb_target_mask;
struct r600_pipe_shader_selector *ps_shader;
struct r600_pipe_shader_selector *vs_shader;
struct r600_pipe_shader_selector *gs_shader;
struct r600_rasterizer_state *rasterizer;
bool alpha_to_one;
bool force_blend_disable;
@@ -506,6 +525,8 @@ void cayman_init_common_regs(struct r600_command_buffer *cb,
void evergreen_init_state_functions(struct r600_context *rctx);
void evergreen_init_atom_start_cs(struct r600_context *rctx);
void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void evergreen_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void *evergreen_create_db_flush_dsa(struct r600_context *rctx);
void *evergreen_create_resolve_blend(struct r600_context *rctx);
@@ -545,6 +566,8 @@ r600_create_sampler_view_custom(struct pipe_context *ctx,
void r600_init_state_functions(struct r600_context *rctx);
void r600_init_atom_start_cs(struct r600_context *rctx);
void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void r600_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
void *r600_create_db_flush_dsa(struct r600_context *rctx);
void *r600_create_resolve_blend(struct r600_context *rctx);

File diff suppressed because it is too large


@@ -37,6 +37,7 @@ struct r600_shader_io {
unsigned lds_pos; /* for evergreen */
unsigned back_color_input;
unsigned write_mask;
int ring_offset;
};
struct r600_shader {
@@ -61,12 +62,22 @@ struct r600_shader {
/* flag is set if the shader writes VS_OUT_MISC_VEC (e.g. for PSIZE) */
boolean vs_out_misc_write;
boolean vs_out_point_size;
boolean vs_out_layer;
boolean has_txq_cube_array_z_comp;
boolean uses_tex_buffers;
boolean gs_prim_id_input;
/* geometry shader properties */
unsigned gs_input_prim;
unsigned gs_output_prim;
unsigned gs_max_out_vertices;
/* size in bytes of a data item in the ring (single vertex data) */
unsigned ring_item_size;
unsigned indirect_files;
unsigned max_arrays;
unsigned num_arrays;
unsigned vs_as_es;
struct r600_shader_array * arrays;
};
@@ -74,6 +85,7 @@ struct r600_shader_key {
unsigned color_two_side:1;
unsigned alpha_to_one:1;
unsigned nr_cbufs:4;
unsigned vs_as_es:1;
};
struct r600_shader_array {
@@ -85,6 +97,8 @@ struct r600_shader_array {
struct r600_pipe_shader {
struct r600_pipe_shader_selector *selector;
struct r600_pipe_shader *next_variant;
/* for GS - corresponding copy shader (installed as VS) */
struct r600_pipe_shader *gs_copy_shader;
struct r600_shader shader;
struct r600_command_buffer command_buffer; /* register writes */
struct r600_resource *bo;
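The `ring_item_size` field added in this header is documented as a size in bytes, while the state code later in this change shifts it right by 2 before writing it to the ring item-size registers. A small sketch of that unit conversion follows. The helper names are hypothetical, and the 4-byte alignment check is an assumption of this sketch rather than something the diff states.

```c
#include <assert.h>

/* Hypothetical helper: convert a byte count to dwords, as the
 * ">> 2" shifts in r600_update_gs_state do. Assumes the size is
 * a multiple of 4 bytes (an assumption of this sketch). */
static unsigned bytes_to_dwords(unsigned bytes)
{
    assert((bytes & 3) == 0);
    return bytes >> 2;
}

/* Mirrors the gsvs_itemsize expression from the diff:
 * (copy shader ring_item_size * gs_max_out_vertices) >> 2. */
static unsigned gsvs_ring_itemsize_dw(unsigned copy_shader_item_bytes,
                                      unsigned gs_max_out_vertices)
{
    return bytes_to_dwords(copy_shader_item_bytes * gs_max_out_vertices);
}
```

For example, a copy shader that stores 16 bytes per vertex, paired with a geometry shader that emits at most 4 vertices, would give a GSVS item size of 16 dwords under this calculation.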


@@ -1264,6 +1264,7 @@ static void r600_init_color_surface(struct r600_context *rctx,
unsigned level = surf->base.u.tex.level;
unsigned pitch, slice;
unsigned color_info;
unsigned color_view;
unsigned format, swap, ntype, endian;
unsigned offset;
const struct util_format_description *desc;
@@ -1277,10 +1278,15 @@ static void r600_init_color_surface(struct r600_context *rctx,
}
offset = rtex->surface.level[level].offset;
if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
if (rtex->surface.level[level].mode == RADEON_SURF_MODE_LINEAR) {
assert(surf->base.u.tex.first_layer == surf->base.u.tex.last_layer);
offset += rtex->surface.level[level].slice_size *
surf->base.u.tex.first_layer;
}
surf->base.u.tex.first_layer;
color_view = 0;
} else
color_view = S_028080_SLICE_START(surf->base.u.tex.first_layer) |
S_028080_SLICE_MAX(surf->base.u.tex.last_layer);
pitch = rtex->surface.level[level].nblk_x / 8 - 1;
slice = (rtex->surface.level[level].nblk_x * rtex->surface.level[level].nblk_y) / 64;
if (slice) {
@@ -1466,14 +1472,7 @@ static void r600_init_color_surface(struct r600_context *rctx,
}
surf->cb_color_info = color_info;
if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
surf->cb_color_view = 0;
} else {
surf->cb_color_view = S_028080_SLICE_START(surf->base.u.tex.first_layer) |
S_028080_SLICE_MAX(surf->base.u.tex.last_layer);
}
surf->cb_color_view = color_view;
surf->color_initialized = true;
}
@@ -1667,8 +1666,6 @@ static void r600_set_framebuffer_state(struct pipe_context *ctx,
rctx->alphatest_state.atom.dirty = true;
}
r600_update_db_shader_control(rctx);
/* Calculate the CS size. */
rctx->framebuffer.atom.num_dw =
10 /*COLOR_INFO*/ + 4 /*SCISSOR*/ + 3 /*SHADER_CONTROL*/ + 8 /*MSAA*/;
@@ -2067,6 +2064,7 @@ static void r600_emit_config_state(struct r600_context *rctx, struct r600_atom *
struct r600_config_state *a = (struct r600_config_state*)atom;
r600_write_config_reg(cs, R_008C04_SQ_GPR_RESOURCE_MGMT_1, a->sq_gpr_resource_mgmt_1);
r600_write_config_reg(cs, R_008C08_SQ_GPR_RESOURCE_MGMT_2, a->sq_gpr_resource_mgmt_2);
}
static void r600_emit_vertex_buffers(struct r600_context *rctx, struct r600_atom *atom)
@@ -2118,16 +2116,18 @@ static void r600_emit_constant_buffers(struct r600_context *rctx,
struct r600_resource *rbuffer;
unsigned offset;
unsigned buffer_index = ffs(dirty_mask) - 1;
unsigned gs_ring_buffer = (buffer_index == R600_GS_RING_CONST_BUFFER);
cb = &state->cb[buffer_index];
rbuffer = (struct r600_resource*)cb->buffer;
assert(rbuffer);
offset = cb->buffer_offset;
r600_write_context_reg(cs, reg_alu_constbuf_size + buffer_index * 4,
ALIGN_DIVUP(cb->buffer_size >> 4, 16));
r600_write_context_reg(cs, reg_alu_const_cache + buffer_index * 4, offset >> 8);
if (!gs_ring_buffer) {
r600_write_context_reg(cs, reg_alu_constbuf_size + buffer_index * 4,
ALIGN_DIVUP(cb->buffer_size >> 4, 16));
r600_write_context_reg(cs, reg_alu_const_cache + buffer_index * 4, offset >> 8);
}
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READ));
@@ -2137,8 +2137,8 @@ static void r600_emit_constant_buffers(struct r600_context *rctx,
radeon_emit(cs, offset); /* RESOURCEi_WORD0 */
radeon_emit(cs, rbuffer->buf->size - offset - 1); /* RESOURCEi_WORD1 */
radeon_emit(cs, /* RESOURCEi_WORD2 */
S_038008_ENDIAN_SWAP(r600_endian_swap(32)) |
S_038008_STRIDE(16));
S_038008_ENDIAN_SWAP(gs_ring_buffer ? ENDIAN_NONE : r600_endian_swap(32)) |
S_038008_STRIDE(gs_ring_buffer ? 4 : 16));
radeon_emit(cs, 0); /* RESOURCEi_WORD3 */
radeon_emit(cs, 0); /* RESOURCEi_WORD4 */
radeon_emit(cs, 0); /* RESOURCEi_WORD5 */
@@ -2323,34 +2323,124 @@ static void r600_emit_vertex_fetch_shader(struct r600_context *rctx, struct r600
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, shader->buffer, RADEON_USAGE_READ));
}
static void r600_emit_shader_stages(struct r600_context *rctx, struct r600_atom *a)
{
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_shader_stages_state *state = (struct r600_shader_stages_state*)a;
uint32_t v2 = 0, primid = 0;
if (state->geom_enable) {
uint32_t cut_val;
if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 128)
cut_val = V_028A40_GS_CUT_128;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 256)
cut_val = V_028A40_GS_CUT_256;
else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 512)
cut_val = V_028A40_GS_CUT_512;
else
cut_val = V_028A40_GS_CUT_1024;
v2 = S_028A40_MODE(V_028A40_GS_SCENARIO_G) |
S_028A40_CUT_MODE(cut_val);
if (rctx->gs_shader->current->shader.gs_prim_id_input)
primid = 1;
}
r600_write_context_reg(cs, R_028A40_VGT_GS_MODE, v2);
r600_write_context_reg(cs, R_028A84_VGT_PRIMITIVEID_EN, primid);
}
static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
{
struct pipe_screen *screen = rctx->b.b.screen;
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a;
struct r600_resource *rbuffer;
r600_write_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1));
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_VGT_FLUSH));
if (state->enable) {
rbuffer =(struct r600_resource*)state->esgs_ring.buffer;
r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE,
(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE));
r600_write_config_reg(cs, R_008C44_SQ_ESGS_RING_SIZE,
state->esgs_ring.buffer_size >> 8);
rbuffer =(struct r600_resource*)state->gsvs_ring.buffer;
r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE,
(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE));
r600_write_config_reg(cs, R_008C4C_SQ_GSVS_RING_SIZE,
state->gsvs_ring.buffer_size >> 8);
} else {
r600_write_config_reg(cs, R_008C44_SQ_ESGS_RING_SIZE, 0);
r600_write_config_reg(cs, R_008C4C_SQ_GSVS_RING_SIZE, 0);
}
r600_write_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1));
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_VGT_FLUSH));
}
/* Adjust GPR allocation on R6xx/R7xx */
bool r600_adjust_gprs(struct r600_context *rctx)
{
unsigned num_ps_gprs = rctx->ps_shader->current->shader.bc.ngpr;
unsigned num_vs_gprs = rctx->vs_shader->current->shader.bc.ngpr;
unsigned num_vs_gprs, num_es_gprs, num_gs_gprs;
unsigned new_num_ps_gprs = num_ps_gprs;
unsigned new_num_vs_gprs = num_vs_gprs;
unsigned new_num_vs_gprs, new_num_es_gprs, new_num_gs_gprs;
unsigned cur_num_ps_gprs = G_008C04_NUM_PS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
unsigned cur_num_vs_gprs = G_008C04_NUM_VS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
unsigned cur_num_gs_gprs = G_008C08_NUM_GS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_2);
unsigned cur_num_es_gprs = G_008C08_NUM_ES_GPRS(rctx->config_state.sq_gpr_resource_mgmt_2);
unsigned def_num_ps_gprs = rctx->default_ps_gprs;
unsigned def_num_vs_gprs = rctx->default_vs_gprs;
unsigned def_num_gs_gprs = 0;
unsigned def_num_es_gprs = 0;
unsigned def_num_clause_temp_gprs = rctx->r6xx_num_clause_temp_gprs;
/* hardware will reserve twice num_clause_temp_gprs */
unsigned max_gprs = def_num_ps_gprs + def_num_vs_gprs + def_num_clause_temp_gprs * 2;
unsigned tmp;
unsigned max_gprs = def_num_gs_gprs + def_num_es_gprs + def_num_ps_gprs + def_num_vs_gprs + def_num_clause_temp_gprs * 2;
unsigned tmp, tmp2;
if (rctx->gs_shader) {
num_es_gprs = rctx->vs_shader->current->shader.bc.ngpr;
num_gs_gprs = rctx->gs_shader->current->shader.bc.ngpr;
num_vs_gprs = rctx->gs_shader->current->gs_copy_shader->shader.bc.ngpr;
} else {
num_es_gprs = 0;
num_gs_gprs = 0;
num_vs_gprs = rctx->vs_shader->current->shader.bc.ngpr;
}
new_num_vs_gprs = num_vs_gprs;
new_num_es_gprs = num_es_gprs;
new_num_gs_gprs = num_gs_gprs;
/* the sum of all SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS must <= to max_gprs */
if (new_num_ps_gprs > cur_num_ps_gprs || new_num_vs_gprs > cur_num_vs_gprs) {
if (new_num_ps_gprs > cur_num_ps_gprs || new_num_vs_gprs > cur_num_vs_gprs ||
new_num_es_gprs > cur_num_es_gprs || new_num_gs_gprs > cur_num_gs_gprs) {
/* try to use switch back to default */
if (new_num_ps_gprs > def_num_ps_gprs || new_num_vs_gprs > def_num_vs_gprs) {
if (new_num_ps_gprs > def_num_ps_gprs || new_num_vs_gprs > def_num_vs_gprs ||
new_num_gs_gprs > def_num_gs_gprs || new_num_es_gprs > def_num_es_gprs) {
/* always privilege vs stage so that at worst we have the
* pixel stage producing wrong output (not the vertex
* stage) */
new_num_ps_gprs = max_gprs - (new_num_vs_gprs + def_num_clause_temp_gprs * 2);
new_num_ps_gprs = max_gprs - ((new_num_vs_gprs - new_num_es_gprs - new_num_gs_gprs) + def_num_clause_temp_gprs * 2);
new_num_vs_gprs = num_vs_gprs;
new_num_gs_gprs = num_gs_gprs;
new_num_es_gprs = num_es_gprs;
} else {
new_num_ps_gprs = def_num_ps_gprs;
new_num_vs_gprs = def_num_vs_gprs;
new_num_es_gprs = def_num_es_gprs;
new_num_gs_gprs = def_num_gs_gprs;
}
} else {
return true;
@@ -2362,10 +2452,11 @@ bool r600_adjust_gprs(struct r600_context *rctx)
* it will lockup. So in this case just discard the draw command
* and don't change the current gprs repartitions.
*/
if (num_ps_gprs > new_num_ps_gprs || num_vs_gprs > new_num_vs_gprs) {
R600_ERR("ps & vs shader require too many register (%d + %d) "
if (num_ps_gprs > new_num_ps_gprs || num_vs_gprs > new_num_vs_gprs ||
num_gs_gprs > new_num_gs_gprs || num_es_gprs > new_num_es_gprs) {
R600_ERR("shaders require too many register (%d + %d + %d + %d) "
"for a combined maximum of %d\n",
num_ps_gprs, num_vs_gprs, max_gprs);
num_ps_gprs, num_vs_gprs, num_es_gprs, num_gs_gprs, max_gprs);
return false;
}
@@ -2373,8 +2464,12 @@ bool r600_adjust_gprs(struct r600_context *rctx)
tmp = S_008C04_NUM_PS_GPRS(new_num_ps_gprs) |
S_008C04_NUM_VS_GPRS(new_num_vs_gprs) |
S_008C04_NUM_CLAUSE_TEMP_GPRS(def_num_clause_temp_gprs);
if (rctx->config_state.sq_gpr_resource_mgmt_1 != tmp) {
tmp2 = S_008C08_NUM_ES_GPRS(new_num_es_gprs) |
S_008C08_NUM_GS_GPRS(new_num_gs_gprs);
if (rctx->config_state.sq_gpr_resource_mgmt_1 != tmp || rctx->config_state.sq_gpr_resource_mgmt_2 != tmp2) {
rctx->config_state.sq_gpr_resource_mgmt_1 = tmp;
rctx->config_state.sq_gpr_resource_mgmt_2 = tmp2;
rctx->config_state.atom.dirty = true;
rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE;
}
@@ -2492,19 +2587,19 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
num_es_stack_entries = 16;
break;
case CHIP_RV770:
num_ps_gprs = 192;
num_ps_gprs = 130;
num_vs_gprs = 56;
num_temp_gprs = 4;
num_gs_gprs = 0;
num_es_gprs = 0;
num_ps_threads = 188;
num_gs_gprs = 31;
num_es_gprs = 31;
num_ps_threads = 180;
num_vs_threads = 60;
num_gs_threads = 0;
num_es_threads = 0;
num_ps_stack_entries = 256;
num_vs_stack_entries = 256;
num_gs_stack_entries = 0;
num_es_stack_entries = 0;
num_gs_threads = 4;
num_es_threads = 4;
num_ps_stack_entries = 128;
num_vs_stack_entries = 128;
num_gs_stack_entries = 128;
num_es_stack_entries = 128;
break;
case CHIP_RV730:
case CHIP_RV740:
@@ -2513,10 +2608,10 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
num_temp_gprs = 4;
num_gs_gprs = 0;
num_es_gprs = 0;
num_ps_threads = 188;
num_ps_threads = 180;
num_vs_threads = 60;
num_gs_threads = 0;
num_es_threads = 0;
num_gs_threads = 4;
num_es_threads = 4;
num_ps_stack_entries = 128;
num_vs_stack_entries = 128;
num_gs_stack_entries = 0;
@@ -2528,10 +2623,10 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
num_temp_gprs = 4;
num_gs_gprs = 0;
num_es_gprs = 0;
num_ps_threads = 144;
num_ps_threads = 136;
num_vs_threads = 48;
num_gs_threads = 0;
num_es_threads = 0;
num_gs_threads = 4;
num_es_threads = 4;
num_ps_stack_entries = 128;
num_vs_stack_entries = 128;
num_gs_stack_entries = 0;
@@ -2707,9 +2802,12 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
r600_store_value(cb, 0); /* R_028240_PA_SC_GENERIC_SCISSOR_TL */
r600_store_value(cb, S_028244_BR_X(8192) | S_028244_BR_Y(8192)); /* R_028244_PA_SC_GENERIC_SCISSOR_BR */
r600_store_context_reg_seq(cb, R_0288CC_SQ_PGM_CF_OFFSET_PS, 2);
r600_store_context_reg_seq(cb, R_0288CC_SQ_PGM_CF_OFFSET_PS, 5);
r600_store_value(cb, 0); /* R_0288CC_SQ_PGM_CF_OFFSET_PS */
r600_store_value(cb, 0); /* R_0288D0_SQ_PGM_CF_OFFSET_VS */
r600_store_value(cb, 0); /* R_0288D4_SQ_PGM_CF_OFFSET_GS */
r600_store_value(cb, 0); /* R_0288D8_SQ_PGM_CF_OFFSET_ES */
r600_store_value(cb, 0); /* R_0288DC_SQ_PGM_CF_OFFSET_FS */
r600_store_context_reg(cb, R_0288E0_SQ_VTX_SEMANTIC_CLEAR, ~0);
@@ -2718,7 +2816,6 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
r600_store_value(cb, 0); /* R_028404_VGT_MIN_VTX_INDX */
r600_store_context_reg(cb, R_0288A4_SQ_PGM_RESOURCES_FS, 0);
r600_store_context_reg(cb, R_0288DC_SQ_PGM_CF_OFFSET_FS, 0);
if (rctx->b.chip_class == R700 && rctx->screen->b.has_streamout)
r600_store_context_reg(cb, R_028354_SX_SURFACE_SYNC, S_028354_SURFACE_SYNC_MASK(0xf));
@@ -2729,6 +2826,7 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
r600_store_loop_const(cb, R_03E200_SQ_LOOP_CONST_0, 0x1000FFF);
r600_store_loop_const(cb, R_03E200_SQ_LOOP_CONST_0 + (32 * 4), 0x1000FFF);
r600_store_loop_const(cb, R_03E200_SQ_LOOP_CONST_0 + (64 * 4), 0x1000FFF);
}
void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
@@ -2901,6 +2999,94 @@ void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
S_02881C_USE_VTX_POINT_SIZE(rshader->vs_out_point_size);
}
static unsigned r600_conv_prim_to_gs_out(unsigned mode)
{
static const int prim_conv[] = {
V_028A6C_OUTPRIM_TYPE_POINTLIST,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));
return prim_conv[mode];
}
void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_command_buffer *cb = &shader->command_buffer;
struct r600_shader *rshader = &shader->shader;
struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
unsigned gsvs_itemsize =
(cp_shader->ring_item_size * rshader->gs_max_out_vertices) >> 2;
r600_init_command_buffer(cb, 64);
/* VGT_GS_MODE is written by r600_emit_shader_stages */
r600_store_context_reg(cb, R_028AB8_VGT_VTX_CNT_EN, 1);
if (rctx->b.chip_class >= R700) {
r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
}
r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
r600_conv_prim_to_gs_out(rshader->gs_output_prim));
r600_store_context_reg_seq(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE, 4);
r600_store_value(cb, cp_shader->ring_item_size >> 2);
r600_store_value(cb, 0);
r600_store_value(cb, 0);
r600_store_value(cb, 0);
r600_store_context_reg(cb, R_0288A8_SQ_ESGS_RING_ITEMSIZE,
(rshader->ring_item_size) >> 2);
r600_store_context_reg(cb, R_0288AC_SQ_GSVS_RING_ITEMSIZE,
gsvs_itemsize);
/* FIXME calculate these values somehow ??? */
r600_store_config_reg_seq(cb, R_0088C8_VGT_GS_PER_ES, 2);
r600_store_value(cb, 0x80); /* GS_PER_ES */
r600_store_value(cb, 0x100); /* ES_PER_GS */
r600_store_config_reg_seq(cb, R_0088E8_VGT_GS_PER_VS, 1);
r600_store_value(cb, 0x2); /* GS_PER_VS */
r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_GS,
S_02887C_NUM_GPRS(rshader->bc.ngpr) |
S_02887C_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_02886C_SQ_PGM_START_GS,
r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
}
void r600_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
{
struct r600_command_buffer *cb = &shader->command_buffer;
struct r600_shader *rshader = &shader->shader;
r600_init_command_buffer(cb, 32);
r600_store_context_reg(cb, R_028890_SQ_PGM_RESOURCES_ES,
S_028890_NUM_GPRS(rshader->bc.ngpr) |
S_028890_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_028880_SQ_PGM_START_ES,
r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
}
void *r600_create_resolve_blend(struct r600_context *rctx)
{
struct pipe_blend_state blend;
@@ -3262,6 +3448,10 @@ void r600_init_state_functions(struct r600_context *rctx)
rctx->atoms[id++] = &rctx->b.streamout.begin_atom;
r600_init_atom(rctx, &rctx->vertex_shader.atom, id++, r600_emit_shader, 23);
r600_init_atom(rctx, &rctx->pixel_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->geometry_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->export_shader.atom, id++, r600_emit_shader, 0);
r600_init_atom(rctx, &rctx->shader_stages.atom, id++, r600_emit_shader_stages, 0);
r600_init_atom(rctx, &rctx->gs_rings.atom, id++, r600_emit_gs_rings, 0);
rctx->b.b.create_blend_state = r600_create_blend_state;
rctx->b.b.create_depth_stencil_alpha_state = r600_create_dsa_state;
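The comments in `r600_adjust_gprs` state that the sum of all `NUM_*_GPRS` fields must not exceed `max_gprs`, and that the hardware reserves twice the clause-temp count. The sketch below expresses only that budget check, not the full repartitioning logic. The struct and function names are hypothetical. The 256 used in the usage note is simply what the RV770 defaults in this diff add up to, before and after the change; it is not a limit documented on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for one GPR partitioning. */
struct gpr_split {
    unsigned ps, vs, gs, es;
    unsigned clause_temp;
};

/* Total registers consumed; clause temps count twice, following the
 * "hardware will reserve twice num_clause_temp_gprs" comment. */
static unsigned gpr_total(const struct gpr_split *s)
{
    assert(s != NULL);
    return s->ps + s->vs + s->gs + s->es + s->clause_temp * 2;
}

/* True when the partitioning stays within the given budget. */
static bool gpr_split_fits(const struct gpr_split *s, unsigned max_gprs)
{
    return gpr_total(s) <= max_gprs;
}
```

Plugging in the RV770 defaults from `r600_init_atom_start_cs`, the old split (192 PS, 56 VS, 4 clause temps) and the new one (130 PS, 56 VS, 31 GS, 31 ES, 4 clause temps) both total 256, so the GS and ES registers appear to be carved out of the former PS allocation.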


@@ -301,11 +301,6 @@ static void r600_bind_dsa_state(struct pipe_context *ctx, void *state)
rctx->alphatest_state.sx_alpha_test_control = dsa->sx_alpha_test_control;
rctx->alphatest_state.sx_alpha_ref = dsa->alpha_ref;
rctx->alphatest_state.atom.dirty = true;
if (rctx->b.chip_class >= EVERGREEN) {
evergreen_update_db_shader_control(rctx);
} else {
r600_update_db_shader_control(rctx);
}
}
}
@@ -698,6 +693,8 @@ static INLINE struct r600_shader_key r600_shader_selector_key(struct pipe_contex
/* Dual-source blending only makes sense with nr_cbufs == 1. */
if (key.nr_cbufs == 1 && rctx->dual_src_blend)
key.nr_cbufs = 2;
} else if (sel->type == PIPE_SHADER_VERTEX) {
key.vs_as_es = (rctx->gs_shader != NULL);
}
return key;
}
@@ -709,7 +706,6 @@ static int r600_shader_select(struct pipe_context *ctx,
bool *dirty)
{
struct r600_shader_key key;
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_pipe_shader * shader = NULL;
int r;
@@ -771,11 +767,6 @@ static int r600_shader_select(struct pipe_context *ctx,
shader->next_variant = sel->current;
sel->current = shader;
if (rctx->ps_shader &&
rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) {
rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs;
rctx->cb_misc_state.atom.dirty = true;
}
return 0;
}
@@ -784,16 +775,10 @@ static void *r600_create_shader_state(struct pipe_context *ctx,
unsigned pipe_shader_type)
{
struct r600_pipe_shader_selector *sel = CALLOC_STRUCT(r600_pipe_shader_selector);
int r;
sel->type = pipe_shader_type;
sel->tokens = tgsi_dup_tokens(state->tokens);
sel->so = state->stream_output;
r = r600_shader_select(ctx, sel, NULL);
if (r)
return NULL;
return sel;
}
@@ -809,6 +794,12 @@ static void *r600_create_vs_state(struct pipe_context *ctx,
return r600_create_shader_state(ctx, state, PIPE_SHADER_VERTEX);
}
static void *r600_create_gs_state(struct pipe_context *ctx,
const struct pipe_shader_state *state)
{
return r600_create_shader_state(ctx, state, PIPE_SHADER_GEOMETRY);
}
static void r600_bind_ps_state(struct pipe_context *ctx, void *state)
{
struct r600_context *rctx = (struct r600_context *)ctx;
@@ -816,31 +807,7 @@ static void r600_bind_ps_state(struct pipe_context *ctx, void *state)
if (!state)
state = rctx->dummy_pixel_shader;
rctx->pixel_shader.shader = rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
rctx->pixel_shader.atom.num_dw = rctx->ps_shader->current->command_buffer.num_dw;
rctx->pixel_shader.atom.dirty = true;
r600_context_add_resource_size(ctx, (struct pipe_resource *)rctx->ps_shader->current->bo);
if (rctx->b.chip_class <= R700) {
bool multiwrite = rctx->ps_shader->current->shader.fs_write_all;
if (rctx->cb_misc_state.multiwrite != multiwrite) {
rctx->cb_misc_state.multiwrite = multiwrite;
rctx->cb_misc_state.atom.dirty = true;
}
}
if (rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) {
rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs;
rctx->cb_misc_state.atom.dirty = true;
}
if (rctx->b.chip_class >= EVERGREEN) {
evergreen_update_db_shader_control(rctx);
} else {
r600_update_db_shader_control(rctx);
}
rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
}
static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
@@ -850,19 +817,19 @@ static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
if (!state)
return;
rctx->vertex_shader.shader = rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
rctx->vertex_shader.atom.dirty = true;
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
rctx->b.streamout.stride_in_dw = rctx->vs_shader->so.stride;
}
r600_context_add_resource_size(ctx, (struct pipe_resource *)rctx->vs_shader->current->bo);
static void r600_bind_gs_state(struct pipe_context *ctx, void *state)
{
struct r600_context *rctx = (struct r600_context *)ctx;
/* Update clip misc state. */
if (rctx->vs_shader->current->pa_cl_vs_out_cntl != rctx->clip_misc_state.pa_cl_vs_out_cntl ||
rctx->vs_shader->current->shader.clip_dist_write != rctx->clip_misc_state.clip_dist_write) {
rctx->clip_misc_state.pa_cl_vs_out_cntl = rctx->vs_shader->current->pa_cl_vs_out_cntl;
rctx->clip_misc_state.clip_dist_write = rctx->vs_shader->current->shader.clip_dist_write;
rctx->clip_misc_state.atom.dirty = true;
}
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
if (!state)
return;
rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
}
static void r600_delete_shader_selector(struct pipe_context *ctx,
@@ -905,6 +872,20 @@ static void r600_delete_vs_state(struct pipe_context *ctx, void *state)
r600_delete_shader_selector(ctx, sel);
}
static void r600_delete_gs_state(struct pipe_context *ctx, void *state)
{
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_pipe_shader_selector *sel = (struct r600_pipe_shader_selector *)state;
if (rctx->gs_shader == sel) {
rctx->gs_shader = NULL;
}
r600_delete_shader_selector(ctx, sel);
}
void r600_constant_buffers_dirty(struct r600_context *rctx, struct r600_constbuf_state *state)
{
if (state->dirty_mask) {
@@ -1098,10 +1079,65 @@ static void r600_setup_txq_cube_array_constants(struct r600_context *rctx, int s
pipe_resource_reference(&cb.buffer, NULL);
}
static void update_shader_atom(struct pipe_context *ctx,
struct r600_shader_state *state,
struct r600_pipe_shader *shader)
{
state->shader = shader;
if (shader) {
state->atom.num_dw = shader->command_buffer.num_dw;
state->atom.dirty = true;
r600_context_add_resource_size(ctx, (struct pipe_resource *)shader->bo);
} else {
state->atom.num_dw = 0;
state->atom.dirty = false;
}
}
static void update_gs_block_state(struct r600_context *rctx, unsigned enable)
{
if (rctx->shader_stages.geom_enable != enable) {
rctx->shader_stages.geom_enable = enable;
rctx->shader_stages.atom.dirty = true;
}
if (rctx->gs_rings.enable != enable) {
rctx->gs_rings.enable = enable;
rctx->gs_rings.atom.dirty = true;
if (enable && !rctx->gs_rings.esgs_ring.buffer) {
unsigned size = 0x1C000;
rctx->gs_rings.esgs_ring.buffer =
pipe_buffer_create(rctx->b.b.screen, PIPE_BIND_CUSTOM,
PIPE_USAGE_STATIC, size);
rctx->gs_rings.esgs_ring.buffer_size = size;
size = 0x4000000;
rctx->gs_rings.gsvs_ring.buffer =
pipe_buffer_create(rctx->b.b.screen, PIPE_BIND_CUSTOM,
PIPE_USAGE_STATIC, size);
rctx->gs_rings.gsvs_ring.buffer_size = size;
}
if (enable) {
r600_set_constant_buffer(&rctx->b.b, PIPE_SHADER_GEOMETRY,
R600_GS_RING_CONST_BUFFER, &rctx->gs_rings.esgs_ring);
r600_set_constant_buffer(&rctx->b.b, PIPE_SHADER_VERTEX,
R600_GS_RING_CONST_BUFFER, &rctx->gs_rings.gsvs_ring);
} else {
r600_set_constant_buffer(&rctx->b.b, PIPE_SHADER_GEOMETRY,
R600_GS_RING_CONST_BUFFER, NULL);
r600_set_constant_buffer(&rctx->b.b, PIPE_SHADER_VERTEX,
R600_GS_RING_CONST_BUFFER, NULL);
}
}
}
static bool r600_update_derived_state(struct r600_context *rctx)
{
struct pipe_context * ctx = (struct pipe_context*)rctx;
bool ps_dirty = false;
bool ps_dirty = false, vs_dirty = false, gs_dirty = false;
bool blend_disable;
if (!rctx->blitter->running) {
@@ -1119,23 +1155,101 @@ static bool r600_update_derived_state(struct r600_context *rctx)
}
}
r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
update_gs_block_state(rctx, rctx->gs_shader != NULL);
if (rctx->ps_shader && rctx->rasterizer &&
((rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable) ||
(rctx->rasterizer->flatshade != rctx->ps_shader->current->flatshade))) {
if (rctx->gs_shader) {
r600_shader_select(ctx, rctx->gs_shader, &gs_dirty);
if (unlikely(!rctx->gs_shader->current))
return false;
if (rctx->b.chip_class >= EVERGREEN)
evergreen_update_ps_state(ctx, rctx->ps_shader->current);
else
r600_update_ps_state(ctx, rctx->ps_shader->current);
if (!rctx->shader_stages.geom_enable) {
rctx->shader_stages.geom_enable = true;
rctx->shader_stages.atom.dirty = true;
}
ps_dirty = true;
/* gs_shader provides GS and VS (copy shader) */
if (unlikely(rctx->geometry_shader.shader != rctx->gs_shader->current)) {
update_shader_atom(ctx, &rctx->geometry_shader, rctx->gs_shader->current);
update_shader_atom(ctx, &rctx->vertex_shader, rctx->gs_shader->current->gs_copy_shader);
/* Update clip misc state. */
if (rctx->gs_shader->current->gs_copy_shader->pa_cl_vs_out_cntl != rctx->clip_misc_state.pa_cl_vs_out_cntl ||
rctx->gs_shader->current->gs_copy_shader->shader.clip_dist_write != rctx->clip_misc_state.clip_dist_write) {
rctx->clip_misc_state.pa_cl_vs_out_cntl = rctx->gs_shader->current->gs_copy_shader->pa_cl_vs_out_cntl;
rctx->clip_misc_state.clip_dist_write = rctx->gs_shader->current->gs_copy_shader->shader.clip_dist_write;
rctx->clip_misc_state.atom.dirty = true;
}
}
r600_shader_select(ctx, rctx->vs_shader, &vs_dirty);
if (unlikely(!rctx->vs_shader->current))
return false;
/* vs_shader is used as ES */
if (unlikely(vs_dirty || rctx->export_shader.shader != rctx->vs_shader->current)) {
update_shader_atom(ctx, &rctx->export_shader, rctx->vs_shader->current);
}
} else {
if (unlikely(rctx->geometry_shader.shader)) {
update_shader_atom(ctx, &rctx->geometry_shader, NULL);
update_shader_atom(ctx, &rctx->export_shader, NULL);
rctx->shader_stages.geom_enable = false;
rctx->shader_stages.atom.dirty = true;
}
r600_shader_select(ctx, rctx->vs_shader, &vs_dirty);
if (unlikely(!rctx->vs_shader->current))
return false;
if (unlikely(vs_dirty || rctx->vertex_shader.shader != rctx->vs_shader->current)) {
update_shader_atom(ctx, &rctx->vertex_shader, rctx->vs_shader->current);
/* Update clip misc state. */
if (rctx->vs_shader->current->pa_cl_vs_out_cntl != rctx->clip_misc_state.pa_cl_vs_out_cntl ||
rctx->vs_shader->current->shader.clip_dist_write != rctx->clip_misc_state.clip_dist_write) {
rctx->clip_misc_state.pa_cl_vs_out_cntl = rctx->vs_shader->current->pa_cl_vs_out_cntl;
rctx->clip_misc_state.clip_dist_write = rctx->vs_shader->current->shader.clip_dist_write;
rctx->clip_misc_state.atom.dirty = true;
}
}
}
if (ps_dirty) {
rctx->pixel_shader.atom.num_dw = rctx->ps_shader->current->command_buffer.num_dw;
rctx->pixel_shader.atom.dirty = true;
r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
if (unlikely(!rctx->ps_shader->current))
return false;
if (unlikely(ps_dirty || rctx->pixel_shader.shader != rctx->ps_shader->current)) {
if (rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) {
rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs;
rctx->cb_misc_state.atom.dirty = true;
}
if (rctx->b.chip_class <= R700) {
bool multiwrite = rctx->ps_shader->current->shader.fs_write_all;
if (rctx->cb_misc_state.multiwrite != multiwrite) {
rctx->cb_misc_state.multiwrite = multiwrite;
rctx->cb_misc_state.atom.dirty = true;
}
}
if (rctx->b.chip_class >= EVERGREEN) {
evergreen_update_db_shader_control(rctx);
} else {
r600_update_db_shader_control(rctx);
}
if (unlikely(!ps_dirty && rctx->ps_shader && rctx->rasterizer &&
((rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable) ||
(rctx->rasterizer->flatshade != rctx->ps_shader->current->flatshade)))) {
if (rctx->b.chip_class >= EVERGREEN)
evergreen_update_ps_state(ctx, rctx->ps_shader->current);
else
r600_update_ps_state(ctx, rctx->ps_shader->current);
}
update_shader_atom(ctx, &rctx->pixel_shader, rctx->ps_shader->current);
}
/* on R600 we stuff masks + txq info into one constant buffer */
@@ -1145,11 +1259,15 @@ static bool r600_update_derived_state(struct r600_context *rctx)
r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
} else {
if (rctx->ps_shader && rctx->ps_shader->current->shader.uses_tex_buffers)
eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
}
@@ -1157,6 +1275,8 @@ static bool r600_update_derived_state(struct r600_context *rctx)
r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_FRAGMENT);
if (rctx->vs_shader && rctx->vs_shader->current->shader.has_txq_cube_array_z_comp)
r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_VERTEX);
if (rctx->gs_shader && rctx->gs_shader->current->shader.has_txq_cube_array_z_comp)
r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_GEOMETRY);
if (rctx->b.chip_class < EVERGREEN && rctx->ps_shader && rctx->vs_shader) {
if (!r600_adjust_gprs(rctx)) {
@@ -1174,33 +1294,10 @@ static bool r600_update_derived_state(struct r600_context *rctx)
rctx->blend_state.cso,
blend_disable);
}
return true;
}
static unsigned r600_conv_prim_to_gs_out(unsigned mode)
{
static const int prim_conv[] = {
V_028A6C_OUTPRIM_TYPE_POINTLIST,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_LINESTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP,
V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));
return prim_conv[mode];
}
void r600_emit_clip_misc_state(struct r600_context *rctx, struct r600_atom *atom)
{
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
@@ -1227,7 +1324,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
return;
}
if (!rctx->vs_shader) {
if (!rctx->vs_shader || !rctx->ps_shader) {
assert(0);
return;
}
@@ -1330,8 +1427,6 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
r600_write_context_reg(cs, R_028A0C_PA_SC_LINE_STIPPLE,
S_028A0C_AUTO_RESET_CNTL(ls_mask) |
(rctx->rasterizer ? rctx->rasterizer->pa_sc_line_stipple : 0));
r600_write_context_reg(cs, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
r600_conv_prim_to_gs_out(info.mode));
r600_write_config_reg(cs, R_008958_VGT_PRIMITIVE_TYPE,
r600_conv_pipe_prim(info.mode));
@@ -1615,11 +1710,14 @@ bool sampler_state_needs_border_color(const struct pipe_sampler_state *state)
void r600_emit_shader(struct r600_context *rctx, struct r600_atom *a)
{
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_pipe_shader *shader = ((struct r600_shader_state*)a)->shader->current;
struct r600_pipe_shader *shader = ((struct r600_shader_state*)a)->shader;
if (!shader)
return;
r600_emit_command_buffer(cs, &shader->command_buffer);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, shader->bo, RADEON_USAGE_READ));
}
@@ -1633,7 +1731,6 @@ struct pipe_surface *r600_create_surface_custom(struct pipe_context *pipe,
assert(templ->u.tex.first_layer <= util_max_layer(texture, templ->u.tex.level));
assert(templ->u.tex.last_layer <= util_max_layer(texture, templ->u.tex.level));
assert(templ->u.tex.first_layer == templ->u.tex.last_layer);
if (surface == NULL)
return NULL;
pipe_reference_init(&surface->base.reference, 1);
@@ -2148,6 +2245,7 @@ void r600_init_common_state_functions(struct r600_context *rctx)
{
rctx->b.b.create_fs_state = r600_create_ps_state;
rctx->b.b.create_vs_state = r600_create_vs_state;
rctx->b.b.create_gs_state = r600_create_gs_state;
rctx->b.b.create_vertex_elements_state = r600_create_vertex_fetch_shader;
rctx->b.b.bind_blend_state = r600_bind_blend_state;
rctx->b.b.bind_depth_stencil_alpha_state = r600_bind_dsa_state;
@@ -2156,6 +2254,7 @@ void r600_init_common_state_functions(struct r600_context *rctx)
rctx->b.b.bind_rasterizer_state = r600_bind_rs_state;
rctx->b.b.bind_vertex_elements_state = r600_bind_vertex_elements;
rctx->b.b.bind_vs_state = r600_bind_vs_state;
rctx->b.b.bind_gs_state = r600_bind_gs_state;
rctx->b.b.delete_blend_state = r600_delete_blend_state;
rctx->b.b.delete_depth_stencil_alpha_state = r600_delete_dsa_state;
rctx->b.b.delete_fs_state = r600_delete_ps_state;
@@ -2163,6 +2262,7 @@ void r600_init_common_state_functions(struct r600_context *rctx)
rctx->b.b.delete_sampler_state = r600_delete_sampler_state;
rctx->b.b.delete_vertex_elements_state = r600_delete_vertex_elements;
rctx->b.b.delete_vs_state = r600_delete_vs_state;
rctx->b.b.delete_gs_state = r600_delete_gs_state;
rctx->b.b.set_blend_color = r600_set_blend_color;
rctx->b.b.set_clip_state = r600_set_clip_state;
rctx->b.b.set_constant_buffer = r600_set_constant_buffer;

@@ -123,6 +123,7 @@
#define EVENT_TYPE_SO_VGTSTREAMOUT_FLUSH 0x1f
#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS 0x20
#define EVENT_TYPE_FLUSH_AND_INV_DB_META 0x2c /* supported on r700+ */
#define EVENT_TYPE_VGT_FLUSH 0x24
#define EVENT_TYPE_FLUSH_AND_INV_CB_META 46 /* supported on r700+ */
#define EVENT_TYPE(x) ((x) << 0)
#define EVENT_INDEX(x) ((x) << 8)
@@ -200,6 +201,19 @@
/* Registers */
#define R_008490_CP_STRMOUT_CNTL 0x008490
#define S_008490_OFFSET_UPDATE_DONE(x) (((x) & 0x1) << 0)
#define R_008C40_SQ_ESGS_RING_BASE 0x008C40
#define R_008C44_SQ_ESGS_RING_SIZE 0x008C44
#define R_008C48_SQ_GSVS_RING_BASE 0x008C48
#define R_008C4C_SQ_GSVS_RING_SIZE 0x008C4C
#define R_008C50_SQ_ESTMP_RING_BASE 0x008C50
#define R_008C54_SQ_ESTMP_RING_SIZE 0x008C54
#define R_008C50_SQ_GSTMP_RING_BASE 0x008C58
#define R_008C54_SQ_GSTMP_RING_SIZE 0x008C5C
#define R_0088C8_VGT_GS_PER_ES 0x0088C8
#define R_0088CC_VGT_ES_PER_GS 0x0088CC
#define R_0088E8_VGT_GS_PER_VS 0x0088E8
#define R_008960_VGT_STRMOUT_BUFFER_FILLED_SIZE_0 0x008960 /* read-only */
#define R_008964_VGT_STRMOUT_BUFFER_FILLED_SIZE_1 0x008964 /* read-only */
#define R_008968_VGT_STRMOUT_BUFFER_FILLED_SIZE_2 0x008968 /* read-only */
@@ -1824,12 +1838,20 @@
#define S_028A40_MODE(x) (((x) & 0x3) << 0)
#define G_028A40_MODE(x) (((x) >> 0) & 0x3)
#define C_028A40_MODE 0xFFFFFFFC
#define V_028A40_GS_OFF 0
#define V_028A40_GS_SCENARIO_A 1
#define V_028A40_GS_SCENARIO_B 2
#define V_028A40_GS_SCENARIO_G 3
#define S_028A40_ES_PASSTHRU(x) (((x) & 0x1) << 2)
#define G_028A40_ES_PASSTHRU(x) (((x) >> 2) & 0x1)
#define C_028A40_ES_PASSTHRU 0xFFFFFFFB
#define S_028A40_CUT_MODE(x) (((x) & 0x3) << 3)
#define G_028A40_CUT_MODE(x) (((x) >> 3) & 0x3)
#define C_028A40_CUT_MODE 0xFFFFFFE7
#define V_028A40_GS_CUT_1024 0
#define V_028A40_GS_CUT_512 1
#define V_028A40_GS_CUT_256 2
#define V_028A40_GS_CUT_128 3
#define R_008DFC_SQ_CF_WORD0 0x008DFC
#define S_008DFC_ADDR(x) (((x) & 0xFFFFFFFF) << 0)
#define G_008DFC_ADDR(x) (((x) >> 0) & 0xFFFFFFFF)
@@ -2332,6 +2354,26 @@
#define S_028D44_ALPHA_TO_MASK_OFFSET3(x) (((x) & 0x3) << 14)
#define S_028D44_OFFSET_ROUND(x) (((x) & 0x1) << 16)
#define R_028868_SQ_PGM_RESOURCES_VS 0x028868
#define R_028890_SQ_PGM_RESOURCES_ES 0x028890
#define S_028890_NUM_GPRS(x) (((x) & 0xFF) << 0)
#define G_028890_NUM_GPRS(x) (((x) >> 0) & 0xFF)
#define C_028890_NUM_GPRS 0xFFFFFF00
#define S_028890_STACK_SIZE(x) (((x) & 0xFF) << 8)
#define G_028890_STACK_SIZE(x) (((x) >> 8) & 0xFF)
#define C_028890_STACK_SIZE 0xFFFF00FF
#define S_028890_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_028890_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_028890_DX10_CLAMP 0xFFDFFFFF
#define R_02887C_SQ_PGM_RESOURCES_GS 0x02887C
#define S_02887C_NUM_GPRS(x) (((x) & 0xFF) << 0)
#define G_02887C_NUM_GPRS(x) (((x) >> 0) & 0xFF)
#define C_02887C_NUM_GPRS 0xFFFFFF00
#define S_02887C_STACK_SIZE(x) (((x) & 0xFF) << 8)
#define G_02887C_STACK_SIZE(x) (((x) >> 8) & 0xFF)
#define C_02887C_STACK_SIZE 0xFFFF00FF
#define S_02887C_DX10_CLAMP(x) (((x) & 0x1) << 21)
#define G_02887C_DX10_CLAMP(x) (((x) >> 21) & 0x1)
#define C_02887C_DX10_CLAMP 0xFFDFFFFF
#define R_0286CC_SPI_PS_IN_CONTROL_0 0x0286CC
#define R_0286D0_SPI_PS_IN_CONTROL_1 0x0286D0
#define R_028644_SPI_PS_INPUT_CNTL_0 0x028644
@@ -2421,11 +2463,15 @@
#define G_028C04_MAX_SAMPLE_DIST(x) (((x) >> 13) & 0xF)
#define C_028C04_MAX_SAMPLE_DIST 0xFFFE1FFF
#define R_0288CC_SQ_PGM_CF_OFFSET_PS 0x0288CC
#define R_0288DC_SQ_PGM_CF_OFFSET_FS 0x0288DC
#define R_0288D0_SQ_PGM_CF_OFFSET_VS 0x0288D0
#define R_0288D4_SQ_PGM_CF_OFFSET_GS 0x0288D4
#define R_0288D8_SQ_PGM_CF_OFFSET_ES 0x0288D8
#define R_0288DC_SQ_PGM_CF_OFFSET_FS 0x0288DC
#define R_028840_SQ_PGM_START_PS 0x028840
#define R_028894_SQ_PGM_START_FS 0x028894
#define R_028858_SQ_PGM_START_VS 0x028858
#define R_02886C_SQ_PGM_START_GS 0x02886C
#define R_028880_SQ_PGM_START_ES 0x028880
#define R_028080_CB_COLOR0_VIEW 0x028080
#define S_028080_SLICE_START(x) (((x) & 0x7FF) << 0)
#define G_028080_SLICE_START(x) (((x) >> 0) & 0x7FF)
@@ -2863,6 +2909,7 @@
#define R_0283F4_SQ_VTX_SEMANTIC_29 0x0283F4
#define R_0283F8_SQ_VTX_SEMANTIC_30 0x0283F8
#define R_0283FC_SQ_VTX_SEMANTIC_31 0x0283FC
#define R_0288C8_SQ_GS_VERT_ITEMSIZE 0x0288C8
#define R_0288E0_SQ_VTX_SEMANTIC_CLEAR 0x0288E0
#define R_028400_VGT_MAX_VTX_INDX 0x028400
#define S_028400_MAX_INDX(x) (((x) & 0xFFFFFFFF) << 0)
@@ -3287,6 +3334,8 @@
#define R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET 0x028B28
#define R_028B2C_VGT_STRMOUT_DRAW_OPAQUE_BUFFER_FILLED_SIZE 0x028B2C
#define R_028B30_VGT_STRMOUT_DRAW_OPAQUE_VERTEX_STRIDE 0x028B30
#define R_028B38_VGT_GS_MAX_VERT_OUT 0x028B38 /* r7xx */
#define S_028B38_MAX_VERT_OUT(x) (((x) & 0x7FF) << 0)
#define R_028B44_VGT_STRMOUT_BASE_OFFSET_HI_0 0x028B44
#define R_028B48_VGT_STRMOUT_BASE_OFFSET_HI_1 0x028B48
#define R_028B4C_VGT_STRMOUT_BASE_OFFSET_HI_2 0x028B4C
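
The register header above follows a fixed naming convention: `S_<reg>_<FIELD>(x)` shifts a value into its field, `G_<reg>_<FIELD>(x)` extracts it, `C_<reg>_<FIELD>` is the mask that clears it, and `V_<reg>_<NAME>` is a symbolic field value. A minimal sketch of how the new 0x028A40 fields might be combined is shown here. The macros are copied from the hunk; the two `example_` helpers are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Field macros copied from the hunk above (register 0x028A40). */
#define S_028A40_MODE(x)       (((x) & 0x3) << 0)
#define G_028A40_MODE(x)       (((x) >> 0) & 0x3)
#define C_028A40_MODE          0xFFFFFFFC
#define V_028A40_GS_SCENARIO_G 3
#define S_028A40_CUT_MODE(x)   (((x) & 0x3) << 3)
#define G_028A40_CUT_MODE(x)   (((x) >> 3) & 0x3)
#define C_028A40_CUT_MODE      0xFFFFFFE7
#define V_028A40_GS_CUT_1024   0
#define V_028A40_GS_CUT_128    3

/* Hypothetical helper: build a register value from the two fields. */
static uint32_t example_pack_gs_mode(unsigned mode, unsigned cut_mode)
{
    return S_028A40_MODE(mode) | S_028A40_CUT_MODE(cut_mode);
}

/* Hypothetical helper: replace only CUT_MODE, leaving other bits intact. */
static uint32_t example_set_cut_mode(uint32_t reg, unsigned cut_mode)
{
    return (reg & C_028A40_CUT_MODE) | S_028A40_CUT_MODE(cut_mode);
}
```

Clearing with the `C_` mask before OR-ing in the new field keeps the read-modify-write independent of whatever value the field held before.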

@@ -169,8 +169,10 @@ enum shader_target
{
TARGET_UNKNOWN,
TARGET_VS,
TARGET_ES,
TARGET_PS,
TARGET_GS,
TARGET_GS_COPY,
TARGET_COMPUTE,
TARGET_FETCH,

@@ -137,7 +137,7 @@ void bc_dump::dump(cf_node& n) {
for (int k = 0; k < 4; ++k)
s << chans[n.bc.sel[k]];
} else if (n.bc.op_ptr->flags & (CF_STRM | CF_RAT)) {
} else if (n.bc.op_ptr->flags & CF_MEM) {
static const char *exp_type[] = {"WRITE", "WRITE_IND", "WRITE_ACK",
"WRITE_IND_ACK"};
fill_to(s, 18);
@@ -150,6 +150,9 @@ void bc_dump::dump(cf_node& n) {
if ((n.bc.op_ptr->flags & CF_RAT) && (n.bc.type & 1)) {
s << ", @R" << n.bc.index_gpr << ".xyz";
}
if ((n.bc.op_ptr->flags & CF_MEM) && (n.bc.type & 1)) {
s << ", @R" << n.bc.index_gpr << ".x";
}
s << " ES:" << n.bc.elem_size;

@@ -63,7 +63,7 @@ int bc_finalizer::run() {
// workaround for some problems on r6xx/7xx
// add ALU NOP to each vertex shader
if (!ctx.is_egcm() && sh.target == TARGET_VS) {
if (!ctx.is_egcm() && (sh.target == TARGET_VS || sh.target == TARGET_ES)) {
cf_node *c = sh.create_clause(NST_ALU_CLAUSE);
alu_group_node *g = sh.create_alu_group();
@@ -695,7 +695,7 @@ void bc_finalizer::finalize_cf(cf_node* c) {
c->bc.rw_gpr = reg >= 0 ? reg : 0;
c->bc.comp_mask = mask;
if ((flags & CF_RAT) && (c->bc.type & 1)) {
if (((flags & CF_RAT) || (!(flags & CF_STRM))) && (c->bc.type & 1)) {
reg = -1;

@@ -58,7 +58,10 @@ int bc_parser::decode() {
if (pshader) {
switch (bc->type) {
case TGSI_PROCESSOR_FRAGMENT: t = TARGET_PS; break;
case TGSI_PROCESSOR_VERTEX: t = TARGET_VS; break;
case TGSI_PROCESSOR_VERTEX:
t = pshader->vs_as_es ? TARGET_ES : TARGET_VS;
break;
case TGSI_PROCESSOR_GEOMETRY: t = TARGET_GS; break;
case TGSI_PROCESSOR_COMPUTE: t = TARGET_COMPUTE; break;
default: assert(!"unknown shader target"); return -1; break;
}
@@ -134,8 +137,12 @@ int bc_parser::parse_decls() {
}
}
if (sh->target == TARGET_VS)
if (sh->target == TARGET_VS || sh->target == TARGET_ES)
sh->add_input(0, 1, 0x0F);
else if (sh->target == TARGET_GS) {
sh->add_input(0, 1, 0x0F);
sh->add_input(1, 1, 0x0F);
}
bool ps_interp = ctx.hw_class >= HW_CLASS_EVERGREEN
&& sh->target == TARGET_PS;
@@ -202,7 +209,7 @@ int bc_parser::decode_cf(unsigned &i, bool &eop) {
if (cf->bc.rw_rel)
gpr_reladdr = true;
assert(!cf->bc.rw_rel);
} else if (flags & (CF_STRM | CF_RAT)) {
} else if (flags & CF_MEM) {
if (cf->bc.rw_rel)
gpr_reladdr = true;
assert(!cf->bc.rw_rel);
@@ -676,7 +683,7 @@ int bc_parser::prepare_ir() {
} while (1);
c->bc.end_of_program = eop;
} else if (flags & (CF_STRM | CF_RAT)) {
} else if (flags & CF_MEM) {
unsigned burst_count = c->bc.burst_count;
unsigned eop = c->bc.end_of_program;
@@ -694,7 +701,7 @@ int bc_parser::prepare_ir() {
sh->get_gpr_value(true, c->bc.rw_gpr, s, false);
}
if ((flags & CF_RAT) && (c->bc.type & 1)) { // indexed write
if (((flags & CF_RAT) || (!(flags & CF_STRM))) && (c->bc.type & 1)) { // indexed write
c->src.resize(8);
for(int s = 0; s < 3; ++s) {
c->src[4 + s] =

@@ -349,7 +349,7 @@ void dump::dump_op(node &n, const char *name) {
static const char *exp_type[] = {"PIXEL", "POS ", "PARAM"};
sblog << " " << exp_type[c->bc.type] << " " << c->bc.array_base;
has_dst = false;
} else if (c->bc.op_ptr->flags & CF_STRM) {
} else if (c->bc.op_ptr->flags & (CF_MEM)) {
static const char *exp_type[] = {"WRITE", "WRITE_IND", "WRITE_ACK",
"WRITE_IND_ACK"};
sblog << " " << exp_type[c->bc.type] << " " << c->bc.array_base

@@ -215,7 +215,7 @@ void shader::init() {
void shader::init_call_fs(cf_node* cf) {
unsigned gpr = 0;
assert(target == TARGET_VS);
assert(target == TARGET_VS || target == TARGET_ES);
for(inputs_vec::const_iterator I = inputs.begin(),
E = inputs.end(); I != E; ++I, ++gpr) {
@@ -433,6 +433,7 @@ std::string shader::get_full_target_name() {
const char* shader::get_shader_target_name() {
switch (target) {
case TARGET_VS: return "VS";
case TARGET_ES: return "ES";
case TARGET_PS: return "PS";
case TARGET_GS: return "GS";
case TARGET_COMPUTE: return "COMPUTE";

@@ -58,6 +58,9 @@
#define NUM_H264_REFS 17
#define NUM_VC1_REFS 5
#define FB_BUFFER_OFFSET 0x1000
#define FB_BUFFER_SIZE 2048
/* UVD buffer representation */
struct ruvd_buffer
{
@@ -81,6 +84,7 @@ struct ruvd_decoder {
struct ruvd_buffer msg_fb_buffers[NUM_BUFFERS];
struct ruvd_msg *msg;
uint32_t *fb;
struct ruvd_buffer bs_buffers[NUM_BUFFERS];
void* bs_ptr;
@@ -131,16 +135,21 @@ static void send_cmd(struct ruvd_decoder *dec, unsigned cmd,
set_reg(dec, RUVD_GPCOM_VCPU_CMD, cmd << 1);
}
/* map the next available message buffer */
static void map_msg_buf(struct ruvd_decoder *dec)
/* map the next available message/feedback buffer */
static void map_msg_fb_buf(struct ruvd_decoder *dec)
{
struct ruvd_buffer* buf;
uint8_t *ptr;
/* grap the current message buffer */
/* grab the current message/feedback buffer */
buf = &dec->msg_fb_buffers[dec->cur_buffer];
/* copy the message into it */
dec->msg = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
/* and map it for CPU access */
ptr = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
/* calc buffer offsets */
dec->msg = (struct ruvd_msg *)ptr;
dec->fb = (uint32_t *)(ptr + FB_BUFFER_OFFSET);
}
/* unmap and send a message command to the VCPU */
@@ -148,8 +157,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)
{
struct ruvd_buffer* buf;
/* ignore the request if message buffer isn't mapped */
if (!dec->msg)
/* ignore the request if message/feedback buffer isn't mapped */
if (!dec->msg || !dec->fb)
return;
/* grap the current message buffer */
@@ -157,6 +166,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)
/* unmap the buffer */
dec->ws->buffer_unmap(buf->cs_handle);
dec->msg = NULL;
dec->fb = NULL;
/* and send it to the hardware */
send_cmd(dec, RUVD_CMD_MSG_BUFFER, buf->cs_handle, 0,
@@ -644,7 +655,7 @@ static void ruvd_destroy(struct pipe_video_codec *decoder)
assert(decoder);
map_msg_buf(dec);
map_msg_fb_buf(dec);
memset(dec->msg, 0, sizeof(*dec->msg));
dec->msg->size = sizeof(*dec->msg);
dec->msg->msg_type = RUVD_MSG_DESTROY;
@@ -773,7 +784,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
memset(dec->bs_ptr, 0, bs_size - dec->bs_size);
dec->ws->buffer_unmap(bs_buf->cs_handle);
map_msg_buf(dec);
map_msg_fb_buf(dec);
dec->msg->size = sizeof(*dec->msg);
dec->msg->msg_type = RUVD_MSG_DECODE;
dec->msg->stream_handle = dec->stream_handle;
@@ -813,6 +824,10 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
dec->msg->body.decode.db_surf_tile_config = dec->msg->body.decode.dt_surf_tile_config;
dec->msg->body.decode.extension_support = 0x1;
/* set at least the feedback buffer size */
dec->fb[0] = FB_BUFFER_SIZE;
send_msg_buf(dec);
send_cmd(dec, RUVD_CMD_DPB_BUFFER, dec->dpb.cs_handle, 0,
@@ -822,7 +837,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
send_cmd(dec, RUVD_CMD_DECODING_TARGET_BUFFER, dt, 0,
RADEON_USAGE_WRITE, RADEON_DOMAIN_VRAM);
send_cmd(dec, RUVD_CMD_FEEDBACK_BUFFER, msg_fb_buf->cs_handle,
0x1000, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
FB_BUFFER_OFFSET, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
set_reg(dec, RUVD_ENGINE_CNTL, 1);
flush(dec);
@@ -898,7 +913,8 @@ struct pipe_video_codec *ruvd_create_decoder(struct pipe_context *context,
bs_buf_size = width * height * 512 / (16 * 16);
for (i = 0; i < NUM_BUFFERS; ++i) {
unsigned msg_fb_size = align(sizeof(struct ruvd_msg), 0x1000) + 0x1000;
unsigned msg_fb_size = FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
STATIC_ASSERT(sizeof(struct ruvd_msg) <= FB_BUFFER_OFFSET);
if (!create_buffer(dec, &dec->msg_fb_buffers[i], msg_fb_size)) {
RUVD_ERR("Can't allocated message buffers.\n");
goto error;
@@ -920,7 +936,7 @@ struct pipe_video_codec *ruvd_create_decoder(struct pipe_context *context,
clear_buffer(dec, &dec->dpb);
map_msg_buf(dec);
map_msg_fb_buf(dec);
dec->msg->size = sizeof(*dec->msg);
dec->msg->msg_type = RUVD_MSG_CREATE;
dec->msg->stream_handle = dec->stream_handle;

@@ -888,14 +888,13 @@ public:
ast_node *body;
private:
/**
* Generate IR from the condition of a loop
*
* This is factored out of ::hir because some loops have the condition
* test at the top (for and while), and others have it at the end (do-while).
*/
void condition_to_hir(class ir_loop *, struct _mesa_glsl_parse_state *);
void condition_to_hir(exec_list *, struct _mesa_glsl_parse_state *);
};

@@ -4029,17 +4029,22 @@ ast_jump_statement::hir(exec_list *instructions,
_mesa_glsl_error(& loc, state,
"break may only appear in a loop or a switch");
} else {
/* For a loop, inline the for loop expression again,
* since we don't know where near the end of
* the loop body the normal copy of it
* is going to be placed.
/* For a loop, inline the for loop expression again, since we don't
* know where near the end of the loop body the normal copy of it is
* going to be placed. Same goes for the condition for a do-while
* loop.
*/
if (state->loop_nesting_ast != NULL &&
mode == ast_continue &&
state->loop_nesting_ast->rest_expression) {
state->loop_nesting_ast->rest_expression->hir(instructions,
state);
}
mode == ast_continue) {
if (state->loop_nesting_ast->rest_expression) {
state->loop_nesting_ast->rest_expression->hir(instructions,
state);
}
if (state->loop_nesting_ast->mode ==
ast_iteration_statement::ast_do_while) {
state->loop_nesting_ast->condition_to_hir(instructions, state);
}
}
if (state->switch_state.is_switch_innermost &&
mode == ast_break) {
@@ -4369,14 +4374,14 @@ ast_case_label::hir(exec_list *instructions,
}
void
ast_iteration_statement::condition_to_hir(ir_loop *stmt,
ast_iteration_statement::condition_to_hir(exec_list *instructions,
struct _mesa_glsl_parse_state *state)
{
void *ctx = state;
if (condition != NULL) {
ir_rvalue *const cond =
condition->hir(& stmt->body_instructions, state);
condition->hir(instructions, state);
if ((cond == NULL)
|| !cond->type->is_boolean() || !cond->type->is_scalar()) {
@@ -4397,7 +4402,7 @@ ast_iteration_statement::condition_to_hir(ir_loop *stmt,
new(ctx) ir_loop_jump(ir_loop_jump::jump_break);
if_stmt->then_instructions.push_tail(break_stmt);
stmt->body_instructions.push_tail(if_stmt);
instructions->push_tail(if_stmt);
}
}
}
@@ -4432,7 +4437,7 @@ ast_iteration_statement::hir(exec_list *instructions,
state->switch_state.is_switch_innermost = false;
if (mode != ast_do_while)
condition_to_hir(stmt, state);
condition_to_hir(&stmt->body_instructions, state);
if (body != NULL)
body->hir(& stmt->body_instructions, state);
@@ -4441,7 +4446,7 @@ ast_iteration_statement::hir(exec_list *instructions,
rest_expression->hir(& stmt->body_instructions, state);
if (mode == ast_do_while)
condition_to_hir(stmt, state);
condition_to_hir(&stmt->body_instructions, state);
if (mode != ast_do_while)
state->symbols->pop_scope();

@@ -118,6 +118,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
ubo_layout_mask.flags.q.shared = 1;
ast_type_qualifier ubo_binding_mask;
ubo_binding_mask.flags.i = 0;
ubo_binding_mask.flags.q.explicit_binding = 1;
ubo_binding_mask.flags.q.explicit_offset = 1;

@@ -1466,7 +1466,7 @@ type_qualifier:
"just before storage qualifiers");
}
$$ = $1;
$$.flags.i |= $2.flags.i;
$$.merge_qualifier(&@1, state, $2);
}
| storage_qualifier type_qualifier
{

@@ -141,6 +141,7 @@ dri2_bind_context(struct glx_context *context, struct glx_context *old,
struct dri2_context *pcp = (struct dri2_context *) context;
struct dri2_screen *psc = (struct dri2_screen *) pcp->base.psc;
struct dri2_drawable *pdraw, *pread;
__DRIdrawable *dri_draw = NULL, *dri_read = NULL;
struct dri2_display *pdp;
pdraw = (struct dri2_drawable *) driFetchDrawable(context, draw);
@@ -148,20 +149,26 @@ dri2_bind_context(struct glx_context *context, struct glx_context *old,
driReleaseDrawables(&pcp->base);
if (pdraw == NULL || pread == NULL)
if (pdraw)
dri_draw = pdraw->driDrawable;
else if (draw != None)
return GLXBadDrawable;
if (!(*psc->core->bindContext) (pcp->driContext,
pdraw->driDrawable, pread->driDrawable))
if (pread)
dri_read = pread->driDrawable;
else if (read != None)
return GLXBadDrawable;
if (!(*psc->core->bindContext) (pcp->driContext, dri_draw, dri_read))
return GLXBadContext;
/* If the server doesn't send invalidate events, we may miss a
* resize before the rendering starts. Invalidate the buffers now
* so the driver will recheck before rendering starts. */
pdp = (struct dri2_display *) psc->base.display;
if (!pdp->invalidateAvailable) {
if (!pdp->invalidateAvailable && pdraw) {
dri2InvalidateBuffers(psc->base.dpy, pdraw->base.xDrawable);
if (pread != pdraw)
if (pread != pdraw && pread)
dri2InvalidateBuffers(psc->base.dpy, pread->base.xDrawable);
}

@@ -392,6 +392,9 @@ driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
if (priv == NULL)
return NULL;
if (glxDrawable == None)
return NULL;
psc = priv->screens[gc->screen];
if (priv->drawHash == NULL)
return NULL;

@@ -254,26 +254,6 @@ gen6_blorp_emit_blend_state(struct brw_context *brw,
blend->blend1.write_disable_b = params->color_write_disable[2];
blend->blend1.write_disable_a = params->color_write_disable[3];
/* When blitting from an XRGB source to a ARGB destination, we need to
* interpret the missing channel as 1.0. Blending can do that for us:
* we simply use the RGB values from the fragment shader ("source RGB"),
* but smash the alpha channel to 1.
*/
if (params->src.mt &&
_mesa_get_format_bits(params->dst.mt->format, GL_ALPHA_BITS) > 0 &&
_mesa_get_format_bits(params->src.mt->format, GL_ALPHA_BITS) == 0) {
blend->blend0.blend_enable = 1;
blend->blend0.ia_blend_enable = 1;
blend->blend0.blend_func = BRW_BLENDFUNCTION_ADD;
blend->blend0.ia_blend_func = BRW_BLENDFUNCTION_ADD;
blend->blend0.source_blend_factor = BRW_BLENDFACTOR_SRC_COLOR;
blend->blend0.dest_blend_factor = BRW_BLENDFACTOR_ZERO;
blend->blend0.ia_source_blend_factor = BRW_BLENDFACTOR_ONE;
blend->blend0.ia_dest_blend_factor = BRW_BLENDFACTOR_ZERO;
}
return cc_blend_state_offset;
}

View File

@@ -66,6 +66,8 @@ do_blit_copypixels(struct gl_context * ctx,
/* Update draw buffer bounds */
_mesa_update_state(ctx);
intel_prepare_render(brw);
switch (type) {
case GL_COLOR:
if (fb->_NumColorDrawBuffers != 1) {
@@ -148,8 +150,6 @@ do_blit_copypixels(struct gl_context * ctx,
return false;
}
intel_prepare_render(brw);
intel_batchbuffer_flush(brw);
/* Clip to destination buffer. */

@@ -72,6 +72,8 @@ do_blit_drawpixels(struct gl_context * ctx,
return false;
}
intel_prepare_render(brw);
struct gl_renderbuffer *rb = ctx->DrawBuffer->_ColorDrawBuffers[0];
struct intel_renderbuffer *irb = intel_renderbuffer(rb);
@@ -101,8 +103,6 @@ do_blit_drawpixels(struct gl_context * ctx,
src_offset += _mesa_image_offset(2, unpack, width, height,
format, type, 0, 0, 0);
intel_prepare_render(brw);
src_buffer = intel_bufferobj_buffer(brw, src,
src_offset, width * height *
irb->mt->cpp);

@@ -865,7 +865,9 @@ st_GetTexImage(struct gl_context * ctx,
ubyte *map = NULL;
boolean done = FALSE;
if (!st->prefer_blit_based_texture_transfer) {
if (!st->prefer_blit_based_texture_transfer &&
!_mesa_is_format_compressed(texImage->TexFormat)) {
/* Try to avoid the fallback if we're doing texture decompression here */
goto fallback;
}
@@ -1483,6 +1485,12 @@ st_finalize_texture(struct gl_context *ctx,
if (tObj->Target == GL_TEXTURE_BUFFER) {
struct st_buffer_object *st_obj = st_buffer_object(tObj->BufferObject);
if (!st_obj) {
pipe_resource_reference(&stObj->pt, NULL);
pipe_sampler_view_reference(&stObj->sampler_view, NULL);
return GL_TRUE;
}
if (st_obj->buffer != stObj->pt) {
pipe_resource_reference(&stObj->pt, st_obj->buffer);
pipe_sampler_view_release(st->pipe, &stObj->sampler_view);